5.6.4 Validation criteria


In form design, the terms validation criteria and constraint refer to the same function, the purpose of which is to constrain the input of responses to a certain format in order to improve the quality of the data by making it uniform.
TABLE OF CONTENTS
- Online KoBoToolbox form builder
- XLSForm
- Use of regex : When, how and examples in the humanitarian and development fields
Online KoBoToolbox form builder
The third menu gives the possibility to add validation criteria, a function that allows you to limit the response range and avoid invalid answers to be recorded. Here is a tutorial that provides few examples for online KoBoToolbox builder.
You can add one or more constraints by clicking on “+ Add another condition”. An error message can also be included to inform the interviewer that the data entered is not valid.
XLSForm
This page explains how to set up a validation criteria to an answer to ensure data quality. This page provides few examples as well as the CartONG XLSForm cheat sheet that gives many example of constraints.
You can also see the playlist Kobotoolbox Validation of Humanitarian Data Solution youtube channel that offers some tutorials on how to set up constraints in XLSForm.
Use of regex : When, how and examples in the humanitarian and development fields
Coding in XLSForm has become more and more documented and developed on the web. By following a few tutorials, you can start your XLSForm from scratch and build it up. However, there are some aspects of the coding that still remain in the dark and thus very blurry in our minds.
Regex (or regular expression) refers to the specific, standard textual syntax for representing patterns that matching text or integers need to conform to (source: Wikipedia). As this dedicated website perfectly shows, regex() are rich in terms of possibilities. However, only some specific parts of it come in useful in the humanitarian and development fields. In this resource, we will first briefly have a look at when and how to use regex() in XLSForm. We will then dig into more concrete examples of regex() use in our respective fields of work.
Interesting links to check out:
When to use regex()
When familiar with regex() structure, you might be tempted to use it whenever you can. However, regex() can get very complicated and sometimes it is not worth the pain. Let’s have a look at when regex() is most necessary and useful.
The most common and useful use of regex() is when you want the answer to your “text” or “integer” type of questions to have a specific format: you can do so by inputting a regex() in your question’s “constraint” column. The regex() formula will then make your answer be for instance, a 5-digit number or a word written in capital case letters etc.
Here’s an example of how a regex can look like in XLSForm:
And its result in your data collection app (ODK Collect here):
How to use regex()
The regex() constraint formula is expressed in the constraint column as follows:
regex(${value_to_apply_regex},[regular expression])
- ${value_to_apply_regex} can also be expressed by ”.” when in the constraint column of that same variable.
- As for the regular expression [regular expression] part of the regex() it has to be expressed between apostrophes (‘…’) for any “text” types of questions and between ‘^…$’ for any “integer” type of questions or “text” questions that have been transformed into integer by adding “numbers” in the “appearance” column.
Inside the regular expression, you can use […] to express all the characters that you allow into that part of the answer and {…} to express how many characters the enumerator can input from the range of characters that you have allowed to be used. You can also use ”+” in order to express the fact that the answer will contain several different parts in it. For any capital/lower case letters or numbers, there is no need to enumerate them. You can simply determine the ranges that can be used, such as:
- A-Z: all capital case letters can be used
- a-z: all lower case letters can be used
- 0-9: all digits can be used
Furthermore, as you are restraining the answer’s format by using regex(), it makes sense to make the question mandatory so that the enumerator will not to be able to skip it. By doing so, input a “yes” into the “required” column.
For further information on how to code a regex() you can consult those very useful websites:
However, be cautious as most of the available regex() information on the web is expressed in XML, which is a more complex codification.
Examples of regex() in forms
Simple example
Let’s first start with a simple example of a regex() in order to better understand how it is constructed. Suppose that you create a “text” question requesting to enter the code of the country where the survey takes place. The regex() will be expressed as follows:
regex(.,’[A-Z]{2}’)
It implies that only capital case letters can be used [A-Z] and that only 2 of them can be entered {2} for the country code to be validated as a response.
Email address
Source: regular-expressions.info
Let’s suppose that you want to ask for the interviewee’s email address and that you want to avoid any typing errors in the answer and make sure that it has a —-@—-.— format. You will have to create a “text” type of question and use the following regex() in the constraint column of your question:
regex(., ‘[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+[.][A-Za-z]{2,4}’)
Now let’s have a look at how this formula is built up:
- ., : refers to the variable/question to be constrained ; here it is the question requesting the interviewee’s email address.
- [A-Za-z0-9._%+-] : it refers to all the characters that you can input before the “@” in your email address. Here, capital/lower case letters, numbers and several specific symbols (. _ % + -) are allowed. The number of characters that you can input is unlimited here.
- +@[A-Za-z0-9-] : it refers to all the characters that you can input after the “@” symbol in the email address. Here, Capital letters, regular letters, numbers and – symbols are allowed. The number of characters that you can input is unlimited here.
- +[.][A-Za-z]{2,4} : it refers to the end of your email address. First a dot (.) is required. Then 2 to 4 characters in capital or regular letters are required (i.e. .com, .int etc.)
- ‘…’ : as this is a text question, you can put the regular expression between apostrophes.
Name and Surname
Source: forum.opendatakit.org
This is another example of a “text” type of question which format you would like to constraint. Here you are asking for the interviewee’s name and surname and want to make sure that both start with a capital case letter and that there is a space in-between:
regex(.,’[A-Z]{1}[a-z]{1,50}[ ]{1}[A-Z]{1}[a-z]{1,50}’)
The formula is composed of the following elements:
- ”.,”: refers to the variable/question to be constrained ; here it is the question requesting the interviewee’s name and surname.
- “[A-Z]{1}[a-z]{1,50}”: this implies that the first letter of the name has to be a capital case letter and that the rest of it is written in lower case letters (up to a maximum of 50 letters). 50 is a safe bet as usually names are much shorter than that.
- ”[ ]{1}”: here you require a mandatory (and unique) space after the name.
- “[A-Z]{1}[a-z]{1,50}”: this implies exactly the same structure as for the name, but this time for the surname.
- “‘…’”: as this is a text question, you can put all the regular expression between apostrophes.
Household/serial numbers (containing numbers only)
Source : forum.opendatakit.org
For each “integer” type of question, ODK allows you to enter up to 9-digit integers. If you wish to limit those integers to a number containing less than 9 digits or to make the number contain more than 9 digits, you can use regex() in the constraint in order to do so. Typical examples of such “integer” questions in the humanitarian and development fields are household/serial/registration numbers or p-codes.
There are two different ways to do so, as expressed in this table:
Type of regex to be used | When? | How? |
---|---|---|
An “integer“ type of question |
|
|
A “text” type of question |
|
|
A regex() formula for an “integer” type of question – Example:
regex(., ‘^[0-9]{8}$’)
This formula requires that your number has exactly 8 digits, starting by a digit from 1 to 9 and followed by digits between 0 and 9.
As it is an “integer” type of question, the regular expression is to be put between ‘^–$’, where “^” shows where the number has to begin and “$” where it has to end.
A regex() formula for a “text” type of question – Examples:
regex(., ‘^[0-9]{10}$’) + “numbers” in the “appearance” column
This formula requires that your number has exactly 10 digits and it can start by any digit, “0” included.
Even though it is a “text “ type of question, by specifying “numbers” in appearance you transform it into an “integer” type of question. Thus the same rules as for the integer questions apply, meaning that your regular expression is to be put between ‘^…$’.
regex(., ‘^[0-9]{5,}$’) + “numbers” in the “appearance” column
This formula requires that your number has 5 or more digits ({5,}) and it can start by any digit, “0” included. The regular expression is also to be put between ‘^…$’.
Codes containing numbers and letters
A last example of regex() constraint that you might need to codify in your survey is a code containing both numbers and letters. Project codes and registration numbers are a few examples of those codes. As the codes contain both numbers and letters, you will need to create a “text” type of question. Also, there is no need to mention “numbers” in appearance.
Example – “text” type of question requesting the project’s code:
regex(.,’[M][O][B][0-9]{2}’)
This code needs to start with MOB and by followed by two numbers from 0 to 9 (i.e. MOB21, MOB76). This regex() is composed of the following:
- ”.,”: refers to the variable/question to be constrained ; here it is the question requesting the project’s code.
- “[M][O][B]”: you necessarily have to start your code with MOB (in capital case letters).
- “[0-9]{2}”: your code then has to contain two digits from 0 to 9.
- “‘…’”: as this is a text question, you can put all the regular expression between apostrophes.
You will find all the examples from this resources in the attached XLSForm. And don’t forget, testing is key to success in Mobile Data Collection, so test, test, test!