Mobile Data Collection toolbox

5.5.3 Frequent beginner errors



Designing survey forms directly online in user-friendly interfaces has led many to forget basic good practices in form design!

Here is a list of the 6 most frequent recommendations for beginners. It is by no means complete, but will help you avoid the usual pitfalls we have all been through.

  • Constrain your data collection as much as you can
  • Triple check your skip patterns
  • Close your open questions when you can
  • Make sure your lists of options are comprehensive
  • Add cascading lists when you can
  • Ensure your variable names are meaningful

To learn more about the parameters you can include in your form to ensure the quality of the output data, refer to section 5.6.7 The XLSForm Cheat Sheet, which gives examples of constraints, repeats, calculations, regex, cascading lists and appearance settings.

Constrain your data collection as much as you can

To facilitate later data processing and improve data quality, you can specify constraints when creating a form.

Risk of not following this suggestion: inconsistent values in the dataset, and therefore heavy data cleaning at the time of data analysis.

It is therefore advisable to:

  • Prevent negative values from being entered when they are impossible (e.g. there cannot be -3 L of water in a container), and more generally always specify the minimum and maximum values of a figure to be entered.
  • In a multiple choice question, prevent the selection of two incompatible values (e.g. “reason 1” and “don’t know”).
  • Check that values are consistent with each other, for example that a subset does not exceed the set it belongs to: make the validation of an entered value conditional on a previous response (e.g. a household cannot have more children than members in total, and a date recorded as the current date cannot be in the future).
  • Force a specific input format when necessary, using the regex constraint (e.g. a telephone number, or an email address that must follow the format __@__._).
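
As a rough illustration, the four kinds of constraint listed above can be mirrored in Python. This is only a sketch: the function names, the 0 to 20 litre bounds, the `hh_size` variable and the email pattern are assumptions made for this example, and the XLSForm expressions quoted in the comments are typical forms that should be checked against the cheat sheet before use.

```python
import re
from datetime import date

# Each function mirrors a typical XLSForm constraint, quoted above it.
# All names and bounds are hypothetical and chosen for illustration only.

# XLSForm constraint: . >= 0 and . <= 20
def litres_ok(litres, low=0, high=20):
    """Reject negative or implausibly large volumes."""
    return low <= litres <= high

# XLSForm constraint: not(selected(., 'dont_know') and count-selected(.) > 1)
def reasons_ok(selected):
    """'dont_know' cannot be combined with any other choice."""
    return not ("dont_know" in selected and len(selected) > 1)

# XLSForm constraint: . <= ${hh_size}
def children_ok(children, hh_size):
    """A household cannot have more children than members."""
    return children <= hh_size

# XLSForm constraint: . <= today()
def date_ok(entered, today=None):
    """The entered date cannot be in the future."""
    return entered <= (today or date.today())

# XLSForm constraint: regex(., '^[^@\s]+@[^@\s]+\.[^@\s]+$')
EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def email_ok(text):
    """Very loose check for the shape something@something.something."""
    return EMAIL.fullmatch(text) is not None
```

The email pattern is deliberately permissive: it only checks the general shape of the address, which is usually enough to catch typing errors in the field.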

For more on regexes, see section 5.7.5 To go further: “All about regex() in xls form: when, how and examples in the humanitarian and development fields”.

Warning: In the case of a survey form designed to be taken up and adapted in different contexts, having constraints that are difficult to understand may make it more difficult for someone else who does not have your level of knowledge to use it. In this case, be sure to document aspects that may appear complex at first glance.

Triple check your skip patterns

It is common for a questionnaire to have skip patterns, i.e. questions that are only asked depending on the answer to other questions. If these are not set up correctly, the quality of your data suffers, as it will be incomplete or inappropriate (e.g. asking a household that did not benefit from a seed distribution what they thought of it).

Risks of not following this suggestion:

  • Wasting respondents’ time on questions that do not concern them.
  • Forcing enumerators to choose inconsistent answers in order to move on to the next questions when these are mandatory.
  • Having inconsistencies when analysing the data (some of which could be remedied by tedious post-processing, but others not).

It is therefore advisable to:

  • Identify questions that are only relevant in certain cases.
  • Make the possibility of answering this type of question conditional on previous answers.
  • Make sure that you also handle question skips consistently in the calculations you make (see section “Calculations: dealing with blanks” in the 1st advanced tutorial in section 5.7.5 To go further).
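
A minimal sketch of how skip patterns can be double-checked on exported data, and how blanks left by skipped questions can be handled in a calculation. The variable names (`received_seeds`, `seed_satisfaction`, `seed_kg`, `fertiliser_kg`) are hypothetical, and the XLSForm expressions in the comments are typical forms given for comparison only.

```python
def is_blank(value):
    """Treat None and empty or whitespace-only strings as unanswered."""
    return value is None or str(value).strip() == ""

# Typical XLSForm "relevant" expression for the follow-up question:
#   ${received_seeds} = 'yes'
def skip_pattern_issues(record):
    """Return the inconsistencies found in one submission (a dict)."""
    issues = []
    relevant = record.get("received_seeds") == "yes"
    answered = not is_blank(record.get("seed_satisfaction"))
    if relevant and not answered:
        issues.append("seed_satisfaction missing although seeds were received")
    if not relevant and answered:
        issues.append("seed_satisfaction answered although no seeds were received")
    return issues

# Comparable XLSForm calculation:
#   coalesce(${seed_kg}, 0) + coalesce(${fertiliser_kg}, 0)
def total_kg(record):
    """Sum two quantities, counting a skipped (blank) question as 0."""
    parts = [record.get("seed_kg"), record.get("fertiliser_kg")]
    return sum(0.0 if is_blank(p) else float(p) for p in parts)
```

Running a check of this kind on test submissions before deployment is one way to "triple check" that each conditional question only appears when it should.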

Close your open questions when you can

Open-ended questions are often necessary to get qualitative feedback on an aspect. However, if the analysis also needs to be quantitative, always consider whether the question could be replaced by a list of response options, especially when certain answers come up frequently. You can still add an “other” option or an associated “comment” text variable to capture detail where it is needed.

Risk of not following this suggestion:

  • Time wasted on data cleansing
  • Reduced data quality

For this reason, when designing a form where quantitative analysis is desired, prefer a closed question to an open-ended one. This:

  • Makes it easier for the enumerator to choose the answer from a well-established list.
  • Avoids recoding of responses during analysis (e.g., instead of leaving an open-ended answer for “What did you eat for lunch?”, it is possible to propose a list such as “vegetables, starchy foods, meat, fruit, other, nothing” and to allow respondents to select several answers).
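
To illustrate why closed questions avoid recoding, here is a small sketch that tallies a multiple choice question. It assumes that each answer is exported as a single string of choice names separated by spaces; check how your own tool exports multiple choice answers. The choice names and the sample answers are made up for the example.

```python
from collections import Counter

def tally(answers):
    """Count how often each choice was selected across all submissions."""
    counts = Counter()
    for answer in answers:
        counts.update(answer.split())  # one choice name per token
    return counts

# Hypothetical exported answers to "What did you eat for lunch?"
lunches = ["vegetables meat", "starchy_foods vegetables", "nothing"]
counts = tally(lunches)
```

With free text, the same tally would first require grouping spellings and synonyms by hand.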

Warning: Having a single or multiple choice question implies the need to include the necessary safeguards!

Make sure your lists of options are comprehensive

Working with single or multiple choice questions as suggested in the point above requires you to think carefully about the list of options you are planning, to ensure that you do not miss any.

Risk of not following this suggestion:

  • Incorrect or randomly selected responses (if the question is mandatory), which are also difficult to identify during data cleaning
  • Frustration of enumerators and respondents

For each list of answer options, consider identifying any safeguards that might be needed for that question:

  • Other
  • None
  • Don’t know
  • Not applicable
  • Do not wish to answer
  • The right person to answer is not present
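
As a simple review aid, the list above can be turned into a check that reports which safeguards a choice list does not yet include. This is a sketch only: the choice names are hypothetical, and not every safeguard is appropriate for every question, so the result is a prompt for review rather than a list of errors.

```python
# Hypothetical choice names for the safeguards listed above.
SAFEGUARDS = [
    "other",
    "none",
    "dont_know",
    "not_applicable",
    "refused",
    "respondent_absent",
]

def missing_safeguards(choice_names):
    """Return the safeguard options absent from a choice list, in order."""
    present = set(choice_names)
    return [name for name in SAFEGUARDS if name not in present]
```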

Add cascading lists when you can

As mentioned under “Triple check your skip patterns”, some questions may not be relevant depending on previous answers: the same is true of some answer choices.
Indeed, the most convincing example is when a questionnaire deals with different administrative levels and asks the respondent to select the region, then the district and finally the locality. (E.g.: If a form asks the respondent for their country of origin and city of birth, it is rather strange that the respondent selects “Democratic Republic of Congo” for their country and then has the option to select “Ouagadougou” for their city).

Risk of not following this suggestion:

  • Reduced data quality if incorrect data is inadvertently entered
  • Extra burden on enumerators, who have to search through long lists

It is therefore necessary to set up a cascading list to:

  • Facilitate data entry by avoiding endless lists of responses
  • Facilitate the analysis, because it reduces the possible values at each level filled in
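
The idea behind a cascading list can be sketched in a few lines of Python: each city carries the code of its country, and only the cities matching the previous answer are offered. The data, the `country` column and the function name are assumptions for this example; in an XLSForm the same filtering is typically declared with a choice_filter expression such as `country=${country}`.

```python
# Hypothetical "choices" rows for the city question, with an extra
# "country" column used for filtering.
CITIES = [
    {"name": "kinshasa", "label": "Kinshasa", "country": "cod"},
    {"name": "goma", "label": "Goma", "country": "cod"},
    {"name": "ouagadougou", "label": "Ouagadougou", "country": "bfa"},
]

def city_options(country):
    """Return the city labels offered once a country has been selected."""
    return [row["label"] for row in CITIES if row["country"] == country]
```

Selecting the Democratic Republic of Congo (`cod` here) would then no longer offer Ouagadougou.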

For more information on cascading lists, refer to section 5.6.7 The XLSForm Cheat Sheet.

Ensure your variable names are meaningful

One of the last aspects to consider when creating forms directly online is that the “names” (computer names) assigned to the questions by the design tool are generated automatically. Renaming them is good practice in data management: it is especially useful when the questionnaire is going to be used by several people or on several occasions, or when the database is very large and you want short, meaningful names for the columns rather than full labels. While the designer of the form will understand what each name refers to, someone who takes over a database without having worked on its design can quickly get stuck.

Risks of not following this suggestion:

  • It complicates the analysis: the computer names of the variables are the names that will be used for the questions in the data export, which you can use instead of the question labels, which are sometimes a bit long. Beware, if these are not clear, you will need to consult the initial questionnaire to understand what they refer to.
  • This makes it difficult to understand the technical aspects of the form: reading a computerised variable name that has not been reworked does not necessarily make it possible to identify what we are talking about, nor to distinguish between two variables that are close to each other easily - particularly for someone who has not designed the form.
    (e.g. 001 here indicates a question about the under 30s but we have no way of knowing this)


It is therefore advisable to:

  • Name variables directly in an understandable way to allow people outside the construction of the form to understand it (remember to avoid special characters, accents, and spaces which will not be considered valid)
  • Make sure that the names are not too long to remain easily readable in the analysis tool
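
A minimal sketch of a name check that applies the two recommendations above. It assumes a conservative rule (start with a letter or underscore, then only unaccented letters, digits, underscores, hyphens or periods) and an arbitrary maximum length of 32 characters; both the pattern and the limit are assumptions to adapt to your own tools.

```python
import re

# Conservative pattern: no spaces, accents or special characters.
NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")
MAX_LEN = 32  # assumed readability limit, not a rule of any tool

def name_problems(name):
    """Return a list of reasons why a variable name should be reworked."""
    problems = []
    if not NAME.fullmatch(name):
        problems.append("invalid characters or first character")
    if len(name) > MAX_LEN:
        problems.append("longer than %d characters" % MAX_LEN)
    return problems
```

A check like this cannot tell whether a name is meaningful; a name such as `q_001` passes the test but still says nothing about the question it belongs to.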


Refer to the previous section for good practice on naming variables and options.
Refer to section 5.6.2 Getting started with the form builder to find out where to rename variables and options in both the KoBoToolbox online form builder and XLSForm.