Excel toolbox

3 Data cleaning

Data are unorganised raw facts or figures that need to be processed and analysed. Variables are a type of data that can change. They form the basis of most analyses performed to understand situations, trends and linkages.

Data and variables can take different forms: simple and random in appearance, or statistical and complex. In either case, they are of no use until they are processed, analysed and finally converted into information. Before being analysed, the data must first be checked for possible errors. Database cleaning is therefore primarily a logical process that includes data consistency analysis and triangulation with other available information.

Some errors are difficult to detect before the analysis begins; for example, some outliers can be identified only once the data are better understood. It is nevertheless preferable to detect as many errors as possible beforehand, to avoid having to backtrack during the analysis.

Make sure that every change made to your dataset is recorded in a “change log”.
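As an illustration of what such a change log might contain, here is a minimal sketch in Python. The text above does not prescribe a layout, so everything here is an assumption made for the example: the field names (`record_id`, `variable`, `old_value`, `new_value`, `reason`, `changed_on`), the helper `apply_change`, and the sample records. The idea is simply that a value is never corrected without also writing down what it was, what it became, and why.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ChangeLogEntry:
    """One recorded change to a single cell of the dataset (hypothetical layout)."""
    record_id: str
    variable: str
    old_value: str
    new_value: str
    reason: str
    changed_on: str


def apply_change(rows, log, record_id, variable, new_value, reason, changed_on=None):
    """Correct one value in `rows` and append a matching entry to `log`."""
    for row in rows:
        if row["id"] == record_id:
            entry = ChangeLogEntry(
                record_id=record_id,
                variable=variable,
                old_value=str(row[variable]),   # keep the original value for traceability
                new_value=str(new_value),
                reason=reason,
                changed_on=changed_on or date.today().isoformat(),
            )
            row[variable] = new_value           # change the data only after logging it
            log.append(entry)
            return entry
    raise KeyError(f"no record with id {record_id!r}")


# Illustrative data only: two made-up records, one with an implausible age.
rows = [
    {"id": "HH001", "age": 34},
    {"id": "HH002", "age": 340},
]
change_log = []

apply_change(rows, change_log, "HH002", "age", 34,
             reason="typing error, checked against paper form",
             changed_on="2024-01-15")
```

In a spreadsheet, the same information could be kept on a separate sheet with one column per field and one row per change.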

This module consists of 7 sub-parts:

This section often refers to: