3.2 Export and format data


TABLE OF CONTENTS
3.2.1. Get your data
The way you get your data is dependent on how the data was collected in the first place. For example:
- If data was collected from mobile data collection or other digital tools, you will need to download, export or pull data from an online server (for example, your Kobo Account). Make sure to choose a format for the download that is compatible with your analysis software so you will be able to use for analysis.
- If data was collected on paper-based surveys, you will first need to conduct data entry in a dedicated software, such as a spreadsheet on Excel.
Before editing your data, make sure to create a non-editable copy to avoid modification of the original dataset.
The process can be replicated periodically when you do a lot of transformation of the data (so that you don’t start from scratch again if there is a major issue).
3.2.2. Put your data into a table format
To transform your dataset into a database, it is essential to ensure that the data is in a tabular format of rows and columns with the following characteristics:
- Data points (or subjects) must be transformed into a row with a unique identifier for each one. The data points are the unit of analysis from which the data collection sample was derived, i.e. one row represents one time a survey was filled-out. The data points of each row often represent an individual, household, or community, etc.
- A unique Identifier is usually a numeric/alphanumeric code that can be used to identify unique data points/subjects.
- Variables are transformed into columns. Variables usually signify each question that was asked in a survey0
- No cells should be merged.
- No ‘multiple answers’ in one cell0
- No “multiple answers” in a cell.
- No blank rows within the range.
A database design guide was produced by ACAPS in 2013, and can help you design your database.
How to practically manage these aspects in Excel? Check the Excel toolbox section about it (available in French).
3.2.3. Format your dataset
To easily read and use your dataset, it makes sense to ‘tidy’ it before cleaning the potential errors. Doing so will allow you to more easily visualize outliers, misspellings, duplications or other issues that need to later be cleaned. Here are a few things you can look at to tidy:
- Harmonize the font
- Align cells content (ex. Text align to left and number to right)
- Delete blank rows and columns
- Make sure your column names are clear, short and correct
- Delete useless spaces (How to do in Excel? - available in French)
While downloading data from MDC tools in Excel, it often happens that the format of cells is not well recognized. You should pay specific attention to cell format (ex. numeric field stored as text) to avoid later issues with analysis.