Quantitative data analysis Toolbox

3.4 Recoding variables

To facilitate analysis, you may need to recode some variables so that they fit your analysis needs.

Say, for instance, you want to analyze a specific survey answer against the age of the respondents. You would need to recode age into age groups to get a digestible analysis.

image info

image info

You can see that the information is easier to process in the second example, where ages are grouped.

In our case study, age was recoded into age groups to facilitate demographic analysis, including developing Population Pyramids of the sample and calculating the dependency ratio.
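As a rough illustration of the dependency ratio mentioned above, the sketch below uses the common definition: people aged 0-14 and 65 or older, divided by people aged 15-64, times 100. The cut-offs used in the case study are not stated here, so `young_max`, `old_min`, the function name, and the sample ages are all assumptions made for illustration.

```python
def dependency_ratio(ages, young_max=14, old_min=65):
    """Dependants (young and old) per 100 people of working age.

    The default cut-offs (0-14 and 65+) follow the common demographic
    convention; they are assumptions, not values taken from the case study.
    """
    dependants = sum(1 for a in ages if a <= young_max or a >= old_min)
    working_age = sum(1 for a in ages if young_max < a < old_min)
    if working_age == 0:
        raise ValueError("no working-age individuals in the sample")
    return 100 * dependants / working_age


# Hypothetical ages, not the case-study data.
sample_ages = [2, 9, 14, 15, 30, 42, 64, 65, 70, 33]
print(dependency_ratio(sample_ages))
```

Because the ratio is built from age bands, it can equally be computed from the recoded age groups rather than from individual ages.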

Our case study uses a sample of 92 households, with data on the age and gender of 392 individuals.

image info

First, we can recode each individual’s age into 5-year categories. In Excel, a COUNTIFS function over the variables ‘age’ and ‘gender’ counts how many individuals fall into each age category for each gender.
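For readers working outside Excel, the same step can be sketched in Python. This is a minimal sketch, not the case study's actual syntax: the `age_group` helper, the open-ended top category of 80+, and the sample records are assumptions made for illustration.

```python
from collections import Counter


def age_group(age, width=5, top=80):
    """Return a label such as '0-4' or '80+' for an age in completed years.

    The band width and the open-ended top band are illustrative choices.
    """
    if age < 0:
        raise ValueError("age must be non-negative")
    if age >= top:
        return f"{top}+"
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


# Hypothetical (age, gender) records, not the case-study data.
individuals = [(3, "F"), (4, "M"), (17, "F"), (22, "M"),
               (23, "M"), (67, "F"), (85, "F")]

# One count per (age group, gender) pair, which is what a grid of
# COUNTIFS cells would produce in a spreadsheet.
counts = Counter((age_group(age), gender) for age, gender in individuals)

for (group, gender), n in sorted(counts.items()):
    print(group, gender, n)
```

Keeping the recoding in one small function means the band width can be changed in a single place if 10-year groups turn out to be more readable.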

image info

Based on the recoding, we can then calculate the percentage of each age group by gender across the entire sample, and visualize the data in a Population Pyramid.
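The percentage step can be sketched as follows. The counts are hypothetical, the function names are invented for this example, and the text bars are only a stand-in for a proper chart; each share is expressed as a percentage of the entire sample, as described above.

```python
def pyramid_shares(counts):
    """Convert {(age_group, gender): n} into percentages of the whole sample."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("the sample is empty")
    return {key: 100 * n / total for key, n in counts.items()}


def lower_bound(label):
    """Numeric lower bound of a label such as '10-14' or '80+', for sorting."""
    return int(label.rstrip("+").split("-")[0])


# Hypothetical counts per (age group, gender), not the case-study data.
counts = {
    ("0-4", "M"): 6, ("0-4", "F"): 4,
    ("5-9", "M"): 5, ("5-9", "F"): 5,
    ("10-14", "M"): 10, ("10-14", "F"): 10,
}
shares = pyramid_shares(counts)

# Oldest group on top, males on the left and females on the right,
# one '#' per rounded percentage point.
groups = sorted({group for group, _ in counts}, key=lower_bound, reverse=True)
for group in groups:
    male = "#" * round(shares.get((group, "M"), 0))
    female = "#" * round(shares.get((group, "F"), 0))
    print(f"{male:>30} | {group:^7} | {female}")
```

Sorting on the numeric lower bound rather than on the label itself avoids the usual pitfall where "10-14" sorts before "5-9" as text.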

image info

(Review the case study analysis here to see the Excel syntax of the recoding and analysis.)

In its technical brief on data cleaning, ACAPS provides a useful list of recoding types to review prior to analysis (source: Data Cleaning, ACAPS, 2016, p. 7):

  • Formatting: date (day, month, and year), prefixes to create better sorting in tables
  • Rounding continuous variables
  • Syntax: translation, language style and simplification
  • Recoding a categorical variable (e.g. ethnicity, occupation, an “other” category, spelling corrections, etc.)
  • Recoding a continuous variable (e.g. age) into a categorical variable (e.g. age group)
  • Combining the values of a variable into fewer categories (e.g. grouping all problems caused by access constraints)
  • Combining several variables to create a new variable (e.g., the food consumption score, building an index based on a set of variables)
  • Defining a condition based on certain cut-off points (e.g., population “at risk” vs. “at acute risk”)
  • Changing a level of measurement (e.g. from interval to ordinal scale)
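Three of the recoding types listed above (combining values into fewer categories, combining several variables into an index, and applying cut-off points) can be sketched in a few lines. Everything specific here is hypothetical: the category mapping, the variable names, the weights, and the cut-off values are placeholders, not figures from ACAPS or from any standard indicator.

```python
# Hypothetical mapping used to collapse detailed answers into fewer categories.
PROBLEM_GROUPS = {
    "checkpoint": "access constraint",
    "road blocked": "access constraint",
    "insecurity": "access constraint",
    "no staff": "service gap",
}


def collapse(value, mapping, default="other"):
    """Map a raw answer to its broader category, ignoring case and spacing."""
    return mapping.get(value.strip().lower(), default)


def weighted_index(values, weights):
    """Combine several variables into one score as a weighted sum."""
    return sum(weights[name] * values[name] for name in weights)


def risk_category(score, at_risk=40, acute=70):
    """Turn a continuous score into ordered categories using cut-off points.

    The cut-offs 40 and 70 are placeholders for illustration only.
    """
    if score >= acute:
        return "at acute risk"
    if score >= at_risk:
        return "at risk"
    return "not at risk"


# Hypothetical household record and weights.
household = {"water_gap": 10, "food_gap": 20, "shelter_gap": 5}
weights = {"water_gap": 1, "food_gap": 2, "shelter_gap": 1}

score = weighted_index(household, weights)
print(collapse(" Road Blocked ", PROBLEM_GROUPS), score, risk_category(score))
```

The last function is also an instance of changing the level of measurement, since a continuous score becomes an ordinal category.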