Quantitative data analysis Toolbox

# 4.3.2 Measures of central tendency

Measures of central tendency describe the typical, or central, value of a dataset. Three commonly used measures are described below: the mean, the median (and quartiles), and the mode.

### Mean

The mean is one of the most common summary statistics. It is calculated by summing all observations and dividing this sum by the total number of observations. Note that the mean is quite sensitive to outliers in your dataset. If some of your observations are extremely high or low relative to most of the data, the mean may be misleading, because it is pulled in the direction of these outliers. How to calculate it in Excel (available in French).
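The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed tool: the function name `arithmetic_mean` and the water quantities are hypothetical, chosen only to show how a single extreme value (here, a data-entry error) shifts the mean.

```python
def arithmetic_mean(values):
    """Sum all observations and divide by the number of observations."""
    if not values:
        raise ValueError("cannot compute the mean of an empty dataset")
    return sum(values) / len(values)


# Hypothetical water quantities in litres/person/day (illustrative values only)
water = [18, 20, 21, 22, 19, 20, 21, 19]
print(arithmetic_mean(water))  # 20.0

# The same data with one entry error: 200 typed instead of 20
water_with_outlier = water + [200]
print(arithmetic_mean(water_with_outlier))  # 40.0
```

One wrong entry out of nine doubles the mean, even though eight of the nine observations lie between 18 and 22 l/p/d.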

What you need to know: The mean is a very “simple” statistical parameter. It is easy to calculate and widely used. However, it has two drawbacks:

1. Given its sensitivity to outliers, it can quickly become meaningless if you are working in a complex context, or if your data collection quality is lacking.
2. If the distribution of the data is not symmetrical, you should avoid the mean or use it with caution, as it will not be representative or statistically robust.

For example, the figure below shows two series of KAP survey data on the average quantity of water delivered (litres/person/day) in two different locations. Both have the same mean (the dotted line), 21.2 l/p/d, but the two distributions are completely different. The green one is centred around the mean, with most values close to it and very little spread, so the information the mean conveys is quite robust: most people have access to around 21 l/p/d.

The second one (the red), however, shows a very different reality: many people have access to a very large quantity of water and many have access to very little. Despite an acceptable mean, the program still has a lot of work to do to cover the majority of the population, and we would be wrong to assume that we have reached the standard.

One of the elements that will allow you to highlight this difference, and avoid being fooled by similar means, is the standard deviation. To calculate it in Excel, you can refer to this link (available in French).
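As a rough illustration of the point above, the sketch below compares two hypothetical series with the same mean but very different spreads, using Python's standard `statistics` module. The values are invented for the example and are not the survey data behind the figure. `stdev` computes the sample standard deviation (dividing by n − 1), which corresponds to Excel's `STDEV.S`; `pstdev` would give the population version.

```python
from statistics import mean, stdev

# Two hypothetical series of water quantities (litres/person/day) with the
# same mean but very different spreads; not the survey data behind the figure
site_green = [19, 20, 21, 21, 22, 23]
site_red = [2, 5, 10, 32, 37, 40]

for name, data in (("green", site_green), ("red", site_red)):
    # stdev() is the sample standard deviation (n - 1 denominator)
    print(f"{name}: mean = {mean(data)}, standard deviation = {stdev(data):.1f}")
```

Both series have a mean of 21, but the standard deviation is about 1.4 l/p/d for the green series and about 17.2 l/p/d for the red one. This is the kind of gap that reveals two very different realities behind an identical mean.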

In our case study, we have calculated the mean of the Food Consumption Scores (FCS), which can give us an indication of food insecurity among the sample population.

As seen in the table below, the mean FCS is 42.7, which corresponds to an ‘Acceptable’ FCS according to the standardized indicator thresholds.

| FCS threshold | Score range |
|---|---|
| Poor | 0 to 21 |
| Borderline | 21.5 to 35 |
| Acceptable | 35.5+ |
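The threshold table can be expressed as a small lookup function, which is handy when classifying every household in a dataset rather than a single summary value. This is a minimal sketch: the function name `classify_fcs` is hypothetical. It assumes that individual FCS values come in steps of 0.5, so the gaps in the table (for example, between 21 and 21.5) are never hit by a real score. Values that do fall in a gap, such as a mean, are assigned to the higher category.

```python
def classify_fcs(score: float) -> str:
    """Map a Food Consumption Score to the Poor / Borderline / Acceptable thresholds.

    Assumption: individual scores are non-negative multiples of 0.5, so the
    table's gaps (21 to 21.5, 35 to 35.5) only matter for averaged values,
    which this sketch assigns to the higher category.
    """
    if score < 0:
        raise ValueError("FCS cannot be negative")
    if score <= 21:
        return "Poor"
    if score <= 35:
        return "Borderline"
    return "Acceptable"


print(classify_fcs(42.7))  # case-study mean -> Acceptable
print(classify_fcs(40.8))  # case-study median -> Acceptable
```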

However, as indicated below, this measure of central tendency alone may be misleading: if we stopped there, we might conclude that the sample population is not food insecure. The mean must be considered in conjunction with further analysis!

### Median (and quartiles)

We can also calculate the median value of a variable by ordering all observations from smallest to largest and selecting the observation in the middle. When there is an even number of observations, the median is the average of the two middle values. Check out how to do it in Excel (available in French).
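The ordering-and-selecting procedure translates almost directly into code. The sketch below is illustrative: `median_of` is a hypothetical helper name, and in practice Python's built-in `statistics.median` gives the same result. The sample reuses invented water quantities that include one extreme data-entry error.

```python
def median_of(values):
    """Order the observations and take the middle one (or the average of the two middle ones)."""
    if not values:
        raise ValueError("cannot compute the median of an empty dataset")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of observations: a single middle value
        return ordered[mid]
    # Even number of observations: average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical water quantities (l/p/d) with one entry error (200 instead of 20)
water_with_outlier = [18, 20, 21, 22, 19, 20, 21, 19, 200]
print(median_of(water_with_outlier))  # 20
```

The extreme value of 200 simply becomes the last item in the ordered list, so it does not move the median. The mean of the same data, by contrast, is 40.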

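The median always corresponds to the second quartile (Q2): a quarter of the observations lie below the first quartile (Q1), half below the median, and three quarters below the third quartile (Q3). The median can be applied to most of the situations you will encounter in the field.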
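Python's `statistics.quantiles` splits a dataset into equal-probability intervals and returns the cut points. With `n=4`, these are Q1, Q2 and Q3. The FCS values below are hypothetical, not the case-study data. Different tools, and different `method` settings (`"exclusive"` is the default, `"inclusive"` is the alternative), can give slightly different Q1 and Q3. The sketch checks that Q2 matches the median under both methods.

```python
from math import isclose
from statistics import median, quantiles

# Hypothetical Food Consumption Scores (illustrative values, not the case-study data)
fcs = [12.5, 24.0, 28.5, 33.0, 38.5, 41.0, 45.5, 52.0, 61.5, 78.0]

for method in ("exclusive", "inclusive"):
    q1, q2, q3 = quantiles(fcs, n=4, method=method)
    print(f"{method}: Q1 = {q1}, Q2 = {q2}, Q3 = {q3}")
    # Whatever the method, the second quartile equals the median
    assert isclose(q2, median(fcs))
```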

What you need to know: You should always calculate the median, as it is far less sensitive to outliers than the mean. After calculating it, compare it to your mean: the gap between the two gives you a first idea of how your data are distributed, and in particular whether they are skewed.
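The comparison recommended above can be turned into a quick screening check. This is only a rule of thumb, not a formal test: the mean-versus-median gap usually points to the direction of skew, but not in every distribution. The function name `skew_hint` and the 5% tolerance are arbitrary choices made for this sketch.

```python
from statistics import mean, median


def skew_hint(values, tolerance=0.05):
    """Rough first look at asymmetry by comparing the mean with the median.

    tolerance: relative gap (as a share of the median) below which the two
    are treated as close; 5% is an arbitrary choice for this sketch.
    """
    m, md = mean(values), median(values)
    if abs(m - md) <= tolerance * abs(md):
        return "roughly symmetric"
    if m > md:
        # A few high values pull the mean above the median
        return "right-skewed"
    # A few low values pull the mean below the median
    return "left-skewed"


# Hypothetical water quantities (l/p/d) with one very high entry
print(skew_hint([18, 20, 21, 22, 19, 20, 21, 19, 200]))  # right-skewed
```

A gap flagged this way is a prompt to look at the full distribution (a histogram, the quartiles, the standard deviation) before reporting the mean.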

In our case study, we have calculated the median of the Food Consumption Scores. The median FCS is 40.8.

The median also corresponds to an ‘Acceptable’ FCS according to the standardized indicator thresholds. However, since the median is lower than the mean, we can infer that at least half of the sample population has an FCS below the mean. The data are therefore concentrated in the lower range of FCS scores, and the mean is pulled upwards by a smaller number of higher, possibly ‘outlier’, values.

### Mode

The mode is the value that appears most frequently in our observations. It is used most often for qualitative (categorical) data, where the mean and median cannot be meaningfully calculated. Calculating it in Excel is simple (available in French)!
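For categorical answers, finding the mode amounts to counting how often each answer occurs. The sketch below uses Python's `collections.Counter` to build a frequency table and `statistics.multimode` to return every value tied for the highest count, since a dataset can have more than one mode. The responses are invented for illustration, and the answer options other than ‘Household own production’ are hypothetical.

```python
from collections import Counter
from statistics import multimode

# Hypothetical answers to "How was the food acquired?" (illustrative only)
responses = [
    "Household own production", "Purchase", "Household own production",
    "Food assistance", "Purchase", "Household own production", "Gift",
]

counts = Counter(responses)
print(counts.most_common())  # frequency table, most common answer first
print(multimode(responses))  # ['Household own production']
```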

In our case study, respondents were asked a number of questions that provided qualitative data. One example is the question: “How was the food acquired?”, which included a list of potential answer options. The mode indicates the most common response, which was ‘Household own production’. In conjunction with the food security indicators, we could hypothesize that the target population continues to have high levels of agricultural production that has not been fully disrupted by the conflict.