4.2 Recognizing basic data types
There are a number of different ways of classifying data. At the simplest level, we can think of data as either qualitative or quantitative.
Quantitative data is numerical and tells us about things that can be measured or counted (such as your height or weight). Quantitative data can be continuous or discrete. Continuous data can take any value along a numerical range, with arbitrarily small separation between observations. For example, 155 cm, 155.124 cm, and 116 cm would all be valid measurements of an individual’s height. Discrete variables, on the other hand, can only take certain valid values. For example, we can only count people using integer values, such as 1,200 or 54. It isn’t possible to count 54.5 people.
In the case study, there are a number of variables that provide discrete, quantitative data. For example, ‘Total number of household members’ must be recorded as an integer, and is therefore a discrete, quantitative variable. Others include:
- Age of head of household and other household members.
- Number of days the food from the last distribution lasted the household.
- Number of days the household engaged in negative coping strategies to access food (as part of the reduced Coping Strategies Index).
- Number of days in the last week the household ate food from different food groups (as part of the Food Consumption Score calculation).
Note: ‘Time’ is a continuous variable that can be seen in the survey ‘metadata’ under the variable ‘Submission time’, but the indicator above specifically requests the number of ‘days’, rather than the time.
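The distinction between discrete and continuous values can be checked programmatically. The following is a minimal Python sketch, not part of the case study: the variable names and values are hypothetical, and the helper `is_valid_count` is an illustrative assumption rather than a standard function.

```python
# Hypothetical values for illustration only (not taken from the case study data).
household_members = [4, 7, 2, 5]        # discrete: counts of people
heights_cm = [155.0, 155.124, 116.0]    # continuous: any value within a range


def is_valid_count(value):
    """Return True if value is a whole, non-negative number (a valid count)."""
    return float(value).is_integer() and value >= 0


# Every household size should pass the check for a discrete count.
print(all(is_valid_count(v) for v in household_members))

# A fractional value is fine for a continuous variable such as height,
# but it is not a valid count of people.
print(is_valid_count(54.5))
```

A check like this can help flag data entry errors in variables that should only contain whole numbers, such as a number of days or household members.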
Qualitative data is descriptive and tells us about the attributes, categories, or descriptions of things (such as the color of your hair). Qualitative data can be nominal or ordinal. Nominal variables do not have any logical ordering or ranking. For example, data on blue vs brown vs green eye color does not have any logical order. As the name suggests, ordinal data does have a logical ordering. The Integrated Food Security Phase Classification (IPC) phases are a good example of an ordinal variable.
As the IPC example above demonstrates, qualitative data will sometimes be coded as numerical levels (IPC phases, for instance, are numbered 1 to 5). This means that the presence of numbers in your data doesn’t necessarily mean that a variable is quantitative. A data dictionary is valuable for identifying where this is the case.
The case study has a number of different variables that provide qualitative data, including both nominal and ordinal. For example, data relating to the question: “What is the principal source of drinking water for members of your household?” is a qualitative, nominal variable; respondents can select among pre-defined answer options that do not have a logical order (e.g., public tap/handpumps, water sellers, etc.). The answer options are stored as coded numbers, but this does not make the data quantitative.
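To illustrate why coded numbers should not be treated as quantities, here is a minimal Python sketch. The codes, labels, and responses are hypothetical and do not come from the case study’s data dictionary; they only stand in for the kind of mapping a data dictionary would provide.

```python
from collections import Counter

# Hypothetical data dictionary entry: these codes and labels are assumptions.
water_source_labels = {
    1: "Public tap/handpump",
    2: "Water seller",
    3: "Other",
}

# Hypothetical coded responses, one per household.
responses = [1, 2, 1, 3, 1]

# Decode the responses so the codes are not mistaken for quantities.
decoded = [water_source_labels[code] for code in responses]

# Counting households per category is meaningful;
# averaging the codes (here, 1.6) would not be.
counts = Counter(decoded)
print(counts.most_common())
```

Decoding early in the analysis makes it harder to accidentally compute a mean or sum on a nominal variable.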
The Food Consumption Score (FCS) assigns each household to a ‘threshold’ category (‘Acceptable’, ‘Borderline’ or ‘Poor’), based on a score calculated from the household’s consumption of different food groups over the past week. These thresholds provide qualitative, ordinal data for each household, as there is a clear, logical order among the categories.
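The ordering of FCS thresholds can be represented explicitly, as in the rough Python sketch below. The cut-off values of 21 and 35 are assumptions used for illustration; check the analysis plan for the cut-offs that apply to your context. The scores and function names are also hypothetical.

```python
# Ordered from worst to best; the position in the list defines the ranking.
FCS_ORDER = ["Poor", "Borderline", "Acceptable"]


def fcs_category(score, poor_max=21, borderline_max=35):
    """Assign an FCS threshold label.

    The default cut-offs are assumptions for illustration only.
    """
    if score <= poor_max:
        return "Poor"
    if score <= borderline_max:
        return "Borderline"
    return "Acceptable"


def is_worse(category_a, category_b):
    """Return True if category_a ranks below category_b."""
    return FCS_ORDER.index(category_a) < FCS_ORDER.index(category_b)


scores = [18.5, 30, 52]  # hypothetical household scores
categories = [fcs_category(s) for s in scores]
print(categories)
print(is_worse("Poor", "Acceptable"))
```

Because the categories are ordinal, comparisons such as "worse than" make sense, even though the distance between categories is not defined.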
As you can see in the analysis plan (and analysis), we can still analyze qualitative data using quantitative measures. For example, we can calculate the percentage of households in each FCS threshold as a measure of food security in the sample population.
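The percentage calculation described above could look something like the following Python sketch. The list of household categories is hypothetical and is not drawn from the case study data.

```python
from collections import Counter

# Hypothetical FCS threshold for each of eight households.
categories = [
    "Acceptable", "Poor", "Borderline", "Acceptable",
    "Acceptable", "Borderline", "Poor", "Acceptable",
]

counts = Counter(categories)
total = len(categories)

# Percentage of households in each threshold, listed from worst to best.
percentages = {
    label: 100 * counts[label] / total
    for label in ["Poor", "Borderline", "Acceptable"]
}

for label, pct in percentages.items():
    print(f"{label}: {pct:.1f}%")
```

The underlying variable remains qualitative; only the summary (a count or percentage per category) is numerical.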
You’ll also want to check how your analysis software stores each of your variables. In some cases, numerical variables may be stored as text, which will leave your software unable to perform numerical calculations on them. Software such as Excel will often flag this and allow you to adjust the data format. See section 3, Getting clean and usable data, for more detail.
Qualitative and quantitative data are handled differently in any analysis. Generally, quantitative data is more amenable to statistical analysis. As we’ll discuss in more detail in the next section, we can calculate things like the mean, median, and standard deviation of quantitative variables.
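Outside of a spreadsheet, the same problem of numbers stored as text can be handled with a small conversion step. This is a minimal Python sketch under stated assumptions: the helper `to_number` and the sample values are hypothetical, and commas are assumed to be thousands separators.

```python
def to_number(text):
    """Convert text such as ' 1,200 ' to a number; return None if not numeric.

    Assumes commas are thousands separators, which may not hold for all data.
    """
    cleaned = text.strip().replace(",", "")
    try:
        value = float(cleaned)
    except ValueError:
        return None
    # Return whole numbers as integers so counts stay discrete.
    return int(value) if value.is_integer() else value


# Hypothetical values as they might appear when numbers are stored as text.
raw_values = ["7", " 12", "1,200", "n/a"]
converted = [to_number(v) for v in raw_values]
print(converted)  # [7, 12, 1200, None]
```

Returning `None` for values that cannot be converted keeps problem entries visible, so they can be reviewed rather than silently dropped.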
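As a brief preview of these summary statistics, the sketch below uses Python’s built-in `statistics` module on a hypothetical list of household sizes. The values are illustrative only and do not come from the case study.

```python
import statistics

# Hypothetical number of household members for five households.
household_sizes = [3, 5, 4, 8, 5]

mean_size = statistics.mean(household_sizes)      # arithmetic average
median_size = statistics.median(household_sizes)  # middle value when sorted
sd_size = statistics.stdev(household_sizes)       # sample standard deviation

print(f"Mean: {mean_size}")
print(f"Median: {median_size}")
print(f"Standard deviation: {sd_size:.2f}")
```

Note that `statistics.stdev` computes the sample standard deviation; `statistics.pstdev` would be the choice if the data covered an entire population.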
This article on variable types provides some further information.
You can review the questions in the case study survey and the resulting variable types in the data dictionary, here.