Data visualisation toolbox

4.4 Scatter plots

When to use: Scatter plots are useful for showing correlation or clustering in large datasets, particularly when the order of the points is not essential (for example, when the data was not collected over time). Trend lines can be added to show linear or exponential relationships, whether positive or negative, and they also make outliers easier to spot.

In our case study, we created a scatter plot of our respondents' household sizes against the number of days the food security assistance lasts each household. Given that food security assistance is often a standardized size (or value, in the case of cash-based assistance), our hypothesis was that the length of time the assistance lasts decreases as household size increases.

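[Figure: scatter plot of household size against the number of days food security assistance lasts, with a trend line]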

As seen above, the plotted points and the trend line show a clear pattern: the number of days food security assistance lasts decreases as household size increases.
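A chart like this can be sketched in a few lines of Python. The example below is a minimal illustration, not the method used in the case study: the survey values are invented for demonstration, the function names `fit_trend_line` and `plot_scatter_with_trend` are hypothetical, and a straight-line (ordinary least-squares) trend is an assumption. The plotting helper assumes matplotlib is installed.

```python
# Hypothetical survey responses, invented for illustration only.
household_sizes = [2, 3, 4, 5, 6, 7, 8]
days_assistance_lasts = [28, 25, 24, 20, 17, 15, 11]


def fit_trend_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variance terms (unnormalised; the factor 1/n cancels).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def plot_scatter_with_trend(xs, ys):
    """Draw the points and a single linear trend line (requires matplotlib)."""
    import matplotlib.pyplot as plt  # imported here so the fit works without it

    slope, intercept = fit_trend_line(xs, ys)
    fig, ax = plt.subplots()
    ax.scatter(xs, ys)
    # A straight line only needs its two end points.
    x_ends = [min(xs), max(xs)]
    ax.plot(x_ends, [slope * x + intercept for x in x_ends])
    ax.set_xlabel("Household size")
    ax.set_ylabel("Days assistance lasts")
    return fig


slope, intercept = fit_trend_line(household_sizes, days_assistance_lasts)
print(f"slope = {slope:.2f} days per additional household member")
```

A negative slope matches the hypothesis that assistance lasts fewer days in larger households. Calling `plot_scatter_with_trend(household_sizes, days_assistance_lasts)` returns a matplotlib figure that can be displayed or saved.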

Best practices: Include no more than two trend lines, as otherwise the relationships between variables can become crowded and difficult to decipher. Scatter plots should also only be used with quantitative variables.

When to avoid: Scatter plots only reveal patterns when the dataset is large, so avoid them when there are few data points. They also only add visual value if they show a clear correlation or clustering between two variables. Because they can only be used with quantitative variables, scatter plots are not suited to showing frequency distributions (common in the humanitarian sector).