3.3.4 Missing data
28-Feb-2022
Missing values are present in virtually all databases. Like extreme values, they can lead to spurious analyses.
You should therefore be able to spot and highlight missing values. If you are using Excel to analyze your data, here (available in French) is how you can spot them.
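If you work outside Excel, the same check can be scripted. The following is a minimal sketch in Python, not a prescribed method: the records, the column names and the helper functions `is_missing` and `missing_report` are all hypothetical, and it assumes that a blank cell is read either as `None` or as an empty string.

```python
# Hypothetical survey records; None and "" both stand for a blank cell.
records = [
    {"id": 1, "age": 34, "income": 1200, "sex": "female"},
    {"id": 2, "age": None, "income": 800, "sex": "male"},
    {"id": 3, "age": 51, "income": "", "sex": "female"},
    {"id": 4, "age": 29, "income": None, "sex": ""},
]


def is_missing(value):
    """Treat None and empty or whitespace-only strings as missing."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def missing_report(rows):
    """Return {column: [ids of the records where that column is blank]}."""
    report = {}
    for row in rows:
        for column, value in row.items():
            if is_missing(value):
                report.setdefault(column, []).append(row["id"])
    return report


report = missing_report(records)
for column, ids in report.items():
    print(f"{column}: {len(ids)} missing (record ids {ids})")
```

Listing the record ids, rather than only counting blanks, makes it easier to highlight the affected cells and follow up on them.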
There are actually different types of missing values that you should be aware of (Source: the ACAPS 2016 resource Data Cleaning):
- A blank cell can actually mean zero, ‘none’, ‘no’, or ‘not applicable’.
- A question may be skipped unintentionally during the survey and left unanswered.
- A question may have been left unanswered on purpose. This often happens when the question is confusing, inappropriate, or perceived as sensitive by the enumerator or the respondent. Contextual factors can cause this, although proper tool design and testing should prevent it. When it does occur, it is worth mentioning in the analysis report as a potential bias, and it can justify adjusting the questions (or response options) if the questionnaire is to be repeated.
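Some of these cases can be told apart from the data itself. The sketch below assumes a hypothetical questionnaire in which `children_in_school` is only asked when `has_children` is "yes", so a blank after a "no" means "not applicable". The column names and the `classify_blank` helper are illustrative assumptions, not part of any real tool.

```python
# Hypothetical records: "children_in_school" is only asked
# when "has_children" is "yes" (assumed skip logic).
records = [
    {"id": 1, "has_children": "yes", "children_in_school": 2},
    {"id": 2, "has_children": "no", "children_in_school": None},
    {"id": 3, "has_children": "yes", "children_in_school": None},
]


def classify_blank(row):
    """Label the follow-up question as answered, not applicable or unanswered."""
    if row["children_in_school"] is not None:
        return "answered"
    if row["has_children"] == "no":
        # The question was legitimately skipped, so the blank is not a gap.
        return "not applicable"
    # The question applied but has no answer: a true missing value.
    return "unanswered"


labels = {row["id"]: classify_blank(row) for row in records}
print(labels)
```

The data alone cannot tell whether an "unanswered" blank was accidental or deliberate. That usually requires checking with the enumerator or looking for questions that are left blank unusually often.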
What to do with missing values?
- Where it makes sense, replace blank cells with zero, “no”, or “not applicable”.
  - Be careful when replacing blank cells with zero, as this will affect the findings.
- Exclude the subjects / data points that have missing values on any of the variables under analysis.
  - This means that the sample size will change from one variable to another.
- Delete all cases with missing values, keeping a dataset with only complete data.
  - This could leave an insufficient sample size, and it could bias the analysis if the subjects with missing values share a similar profile (e.g. all female).
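The consequences of the three options above can be compared on a small example. This is a sketch on invented data: the records, the column names and the choice to recode blank `assets` as zero are assumptions made purely for illustration.

```python
from collections import Counter

# Hypothetical records; None stands for a blank cell.
records = [
    {"id": 1, "sex": "female", "age": 34, "income": 1200, "assets": None},
    {"id": 2, "sex": "male", "age": None, "income": 800, "assets": 3},
    {"id": 3, "sex": "female", "age": 51, "income": None, "assets": 1},
    {"id": 4, "sex": "male", "age": 29, "income": 950, "assets": 2},
    {"id": 5, "sex": "female", "age": 42, "income": None, "assets": None},
]
VARIABLES = ["age", "income", "assets"]


def mean_of(rows, column):
    """Mean of a column, ignoring blank cells."""
    values = [row[column] for row in rows if row[column] is not None]
    return sum(values) / len(values)


# Option 1: recode blanks as zero (assumes a blank really means "none").
recoded = [
    {**row, "assets": 0 if row["assets"] is None else row["assets"]}
    for row in records
]
mean_excluding_blanks = mean_of(records, "assets")
mean_blanks_as_zero = mean_of(recoded, "assets")

# Option 2: exclude blanks variable by variable; the sample size varies.
sample_sizes = {
    v: sum(1 for row in records if row[v] is not None) for v in VARIABLES
}

# Option 3: keep only the cases that are complete on every variable.
complete = [row for row in records if all(row[v] is not None for v in VARIABLES)]
dropped = [row for row in records if row not in complete]
dropped_profile = Counter(row["sex"] for row in dropped)

print("Mean assets, blanks excluded:", mean_excluding_blanks)
print("Mean assets, blanks as zero:", mean_blanks_as_zero)
print("Sample size per variable:", sample_sizes)
print("Complete cases:", [row["id"] for row in complete])
print("Profile of dropped cases:", dict(dropped_profile))
```

In this invented dataset, recoding blanks as zero lowers the mean, the sample size differs between variables, and keeping only complete cases leaves a single record while dropping mostly female respondents. Checking the profile of dropped cases in this way is one simple way to look for the bias described above.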