4.3 Descriptives statistics
TABLE OF CONTENTS
- 4.3.1 Making a sum
- 4.3.2 Calculating an average
- 4.3.3 Calculating the median
- 4.3.4 Calculating the mode
- 4.3.5 Calculating the minimum or maximum
- 4.3.6 Calculating the standard deviation
- 4.3.7 Calculating the variance
- 4.3.8 Presenting descriptive statistics: rounding numbers
Descriptive statistics summarise a statistical population, unlike inferential statistics, which aim to generalise conclusions drawn from samples to a population as a whole.
Table of the main descriptive statistical functions applied to Excel
Action | Formula |
---|---|
Sum | =SUM() |
Average | =AVERAGE() |
Median | =MEDIAN() |
Mode | =MODE.SNGL() |
Minimum | =MIN() |
Maximum | =MAX() |
Standard deviation | =STDEV.S() |
Variance | =VAR.S() |
4.3.1 Making a sum
Under the Stats tab, we want to calculate the total of the Active Health Centre column present in the Database1 tab:
=SUM(‘Database1’!C2:C37)
The first argument ‘Database 1’! indicates the tab where the data to be summed up is located, C2:C37 indicates the range of data to be summed up.
Thus, in total, there are 5,510 active health centres for all provinces.
4.3.2 Calculating an average
Under the Stats tab, we want to calculate the average of the Active Health Centre column present in the Database1 tab using the =AVERAGE() function:
=AVERAGE(‘Database1’!C2:C37)
Where the first argument ‘Database1’! indicates the tab where the data to be summed up is located, C2:C37 indicates the range of data to be averaged.
On average, there are thus 153 active health centres per province.
4.3.3 Calculating the median
The median will only look for the value that is half the data range. Accordingly, for the distribution of active health centres by province, 50% of provinces fall below the median and 50% above.
The median represents the value that lies in the middle of the other values. In the example below, there are 36 provinces. The median therefore lies midway between the 18th and 19th provinces, ranking the values in ascending order in terms of the number of active health centres using the sort button located at the top right of the screen:
The 18th province has a value of 84 and the 19th province has a value of 110. To obtain the median, these 2 values are added together and then divided by two:
84 + 110 = 194 194 / 2 = 97
It is also possible to obtain the median directly using the =MEDIAN() function:
=MEDIAN(‘database1’!C2:C37)
4.3.4 Calculating the mode
The mode is the most common value in a statistical series. This measurement can readily be calculated in Excel using the =MODE.SNGL() function:
=MODE.SNGL(‘Database1’!C2:C37)
The mode of the Active Health Centres variable is 2 (it appears 3 times in the statistical series).
4.3.5 Calculating the minimum or maximum
It is possible to easily determine the minimum or maximum of a statistical series with Excel.
The =MIN() et =MAX() functions return the smallest and largest values in the statistical series respectively:
=MIN(‘Database1’!C2:C37) =MAX(‘Database1’!C2:C37)
For example, some provinces have no active health centres (minimum value = 0). On the other hand, one province has 773 active health centres, which is the maximum value.
4.3.6 Calculating the standard deviation
The standard deviation is used to calculate the dispersion of a dataset. For example, for the distribution of active health centres by province, the smaller the standard deviation, the more homogeneous the number of health centres (i.e., close from one province to another). Conversely, if the deviation is higher, the number of active health centres per province is not as tight (the deviations between the number of active health centres across provinces are greater).
The standard deviation has several peculiarities:
- It is used to calculate the dispersion around the average of a data range.
- The standard deviation cannot be negative.
- The result may be zero if all data is equal.
- The standard deviation is sensitive to outliers, such as the average. A data can therefore influence the result.
- If the data have a similar average, the greater the dispersion, the greater the standard deviation.
The formula for calculating the standard deviation is =STDEV.S():
=STDEV.S(‘Database1’!C2:C37)
The standard deviation of the variable “Active Health Centre” is 180. This means that, on average, each province has a total of X number of health centres that differs by +/- 180 from the variable average.
4.3.7 Calculating the variance
The variance is the square of the standard deviation. Calculating variance involves squaring deviations, which is why it may seem confusing, because it does not have the same unit of measurement as the original observations. For example, for data measured as the number of active health centres, the measure of the variance will be expressed as the number of active health centres squared. This is why in the example below, our data ranges from 0 to 773 active health centres per province but the variance is 32,627, if we calculate the square root of this variance √32 627 = 180, which corresponds to our standard deviation.
The formula used in Excel to calculate the variance is as follows, using the =VAR.S() function:
=VAR.S(‘Database1’!C2:C37)
4.3.8 Presenting descriptive statistics: rounding numbers
The number of digits to display depends on the desired level of accuracy, but the following general rules can be applied:
- Numbers should be rounded up if the last digit is greater than five (for example: 12.12507 is rounded to 12.13), or down if the last digit is less than or equal to 5 (for example: 12.12500 is rounded to 12.12).
- When drafting a report, for example, it is recommended to remove decimals wherever possible. Indeed, the emphasis is on the importance of a trend or the differences it reflects, not on the digits themselves. Put yourself in the shoes of the people who will read the narrative portion to realise what they will need to remember and understand.
The following exercise is only available on the French page at this stage- translation coming soon!
In order to practice using the various functions available for the calculation of the main descriptive statistics, here is a practical exercise using clean data from the case study - developed by CartONG as part of the Data Analysis Toolbox publication. A more complete presentation of this case study can be found in the “Data Analysis Toolbox” – Part “7 The case study ”.
In this example, we propose to generate descriptive statistics using certain functions, for:
- The age of household members
- The Food Consumption Score (FCS) of households
Use the [“Excel Toolbox - Analysis 4.3 - Practice exercise - V1 - Statement”] file to practice calculating descriptive analyses using functions.
In case of difficulties, or to compare your results with those of the case study, refer to the file [“Excel Toolbox - Analysis 4.3 - Practical exercise - V1 – Solutions”].