Quantitative data analysis Toolbox

4.1 Checking the metadata


Metadata is data about data. It covers many of the basic characteristics of a dataset, such as the date it was created, how it was collected, and the license for sharing or publishing it. Important metadata for geographic datasets also includes things like the spatial resolution and the geographic or projected coordinate system.


Movement Range Maps - Humanitarian Data Exchange

For example, take a look at the screenshot above from Facebook’s Movement Range Maps on the Humanitarian Data Exchange (HDX). Under the Metadata tab, we can see a lot of information about this dataset, including a link to a methodology document that explains in more detail how the data was created.

Checking the metadata of any dataset is a valuable way to judge whether that dataset is appropriate for your use case and to ensure that you interpret and use it properly. For example, imagine that you are interested in the population of various camps for internally displaced persons (IDPs). You have a dataset with this information, but the metadata tells you that it was last updated in 2005. You might then decide that this data isn’t appropriate for the task at hand.
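The "last updated" check described above can be sketched in a few lines of Python. This is only an illustration: the metadata record, its field names, the five-year threshold, and the reference date are all assumptions made up for the example, not the schema of HDX or of any real dataset.

```python
from datetime import date

# Hypothetical metadata record. The field names and values are invented
# for illustration; only the 2005 update year comes from the example above.
metadata = {
    "title": "IDP camp population estimates",
    "last_updated": date(2005, 6, 30),
    "license": "CC BY 4.0",
}


def is_too_old(meta, max_age_years, today):
    """Return True if the dataset was last updated more than max_age_years ago."""
    age_days = (today - meta["last_updated"]).days
    return age_days > max_age_years * 365.25


# A fixed reference date keeps the example reproducible.
too_old = is_too_old(metadata, max_age_years=5, today=date(2024, 1, 1))
print(too_old)
```

What counts as "too old" depends on the task: a five-year limit might be far too generous for fast-changing camp populations and needlessly strict for administrative boundaries.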

Metadata is often organized in what’s called a data dictionary. Usually stored in its own document or table, a data dictionary contains information about the structure and content of a dataset. In particular, it may include a table with descriptions of all the variables in the dataset. This matters because variables in a table are not always clearly named: column headers can sometimes include only a limited number of characters and may not have spaces between words. For example, a variable that captures data on the population of women over 65 years old may be named ‘w_ov65’.
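One way to picture a data dictionary is as a lookup table from short column names to readable descriptions. The sketch below assumes a tiny, made-up dictionary; only ‘w_ov65’ comes from the text, and the other names are hypothetical.

```python
# Hypothetical data dictionary: variable name -> description.
# Only "w_ov65" appears in the text; "m_ov65" is invented for illustration.
data_dictionary = {
    "w_ov65": "Population of women over 65 years old",
    "m_ov65": "Population of men over 65 years old",
}


def describe(columns, dictionary):
    """Look up each column name, flagging any that the dictionary does not cover."""
    return {name: dictionary.get(name, "(not documented)") for name in columns}


# "hh_size" is a hypothetical column that is missing from the dictionary.
labels = describe(["w_ov65", "hh_size"], data_dictionary)
for name, description in labels.items():
    print(f"{name}: {description}")
```

Flagging undocumented columns rather than skipping them makes gaps in the data dictionary visible before any analysis starts.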

In our case study, the data dictionary provides the following key information for each variable: the column/variable name, the description or related survey question (if applicable), the data type, and the codes or ranges for the data in each cell.


Through the data dictionary, we can see that the variable ‘YNOFOOD’ records the respondents’ answers to the question “Why do you not have access to the local food assistance programmes?” We can also see that the variable is ‘Nominal’ (the data can be sorted into categories), so the potential responses are listed in the ‘Codes/Ranges’ column. The data for this variable is coded as ‘1’, ‘2’, ‘3’, ‘6’, or ‘8’, with each code representing one of the possible answers given by the respondents.

Therefore, the key role of the data dictionary is to help you navigate the dataset and understand what type of analysis can be conducted based on the data type (further information below).
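As a rough sketch of how the ‘Codes/Ranges’ column can guide a first check of the data, the Python snippet below validates a list of ‘YNOFOOD’ values against the allowed codes and then counts how often each code occurs. The response values are invented for illustration and are not taken from the case study. The answer text behind each code is deliberately left out, since it would have to come from the data dictionary itself.

```python
from collections import Counter

# Allowed codes for YNOFOOD, as listed in the Codes/Ranges column.
ALLOWED_CODES = {1, 2, 3, 6, 8}

# Made-up sample responses for illustration; 4 is deliberately not an allowed code.
responses = [1, 3, 3, 8, 4, 2, 6, 1, 3]

# Separate values the data dictionary permits from values it does not.
valid = [r for r in responses if r in ALLOWED_CODES]
invalid = [r for r in responses if r not in ALLOWED_CODES]

# For a nominal variable, frequency counts are meaningful; an average of the codes is not.
counts = Counter(valid)

print("Unexpected codes:", invalid)
for code, n in sorted(counts.items()):
    print(f"code {code}: {n} responses")
```

Because the codes are only labels for categories, the snippet counts them instead of averaging them: a "mean" of 1, 3, and 8 would not correspond to any answer.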