Mobile Data Collection toolbox

5.5.2 Naming variables and options




The naming of your questions (variables) and answer choices (options) is an important aspect of form design. Why is it so important?

  1. For your data analysis: exporting your data with variable and option names rather than labels makes it much easier to read, with shorter and more expressive column headers and values.
  2. Whenever you need to modify your form: imagine that you have created your form and, after significant testing, you realize that you still need to modify it a bit. A consistent and harmonised naming convention throughout your survey, with descriptive, meaningful and easy-to-remember names for your questions and choices, lets you quickly find the variable(s) to be modified. With a non-harmonised system of non-descriptive names, on the contrary, finding and modifying the variables of interest will take much longer than expected.
  3. Whenever somebody else needs to modify your form: imagine that one of your co-workers needs to run a survey very similar to yours and wishes to start from your survey instead of starting from scratch (to avoid duplication). The same principles apply: the more consistent, descriptive, meaningful and easy to remember your naming system is, the easier it is for them to start working from your survey. As the saying goes, be good to your neighbour (i.e. your co-worker) and put a good naming system in place!

You can find here a visual explanation of the definitions of name and label in coding, and of the difference between them.

How do you establish a consistent and harmonised naming convention? There isn’t one “best” answer; it boils down to: “Do what you want, but do it for a reason.” The fewer decisions you make without reasoning behind them, the better your coding method will be. The following tips have all been written with this in mind, so feel free to adopt them as they are or adapt them to your needs and preferences. This section also draws attention to the need to consider how the data will be used when naming your questions and choices.

How to name a variable or an option

General rules for both questions (variables) and answer choices (options)

Each general rule below is followed by its explanation and examples.

General rule: name should be short
  • The longer the name, the more time you will spend searching your database for the variable or choice it refers to.
  • Also, KoBoToolbox does not accept answer choice names longer than 32 characters or question/variable names longer than 30 characters.
Good examples:
- enumeratorGender
- beneficiaryGender
Bad examples:
- GenderOfTheEnumerator
- genderOfTheBeneficiary
General rule: name should be descriptive, meaningful and easy to remember
The name should mean something to you and make clear what it refers to; otherwise you will spend a lot of time working out what information it holds.
Good example:
- firstWaterSource
Bad example:
- wsf1
General rule: name should be consistent and follow the same pattern throughout your groups and the whole survey
You may have to ask the same set of questions several times in your survey, for example when interviewing several members of a household. Giving those questions the same root, followed by information about the question itself (and, for several household members, about which member), helps you identify more quickly what information each one holds.
Good examples:
- beneficiaryName
- beneficiaryGender
- beneficiaryAge
Bad examples:
- nameOfTheBeneficiary
- genderOfTheFirstBeneficiary
General rule: name should not contain spaces, accented characters (“à è ô”) or special characters (“+”, “%”, “&”, “(”, “/”, etc.)
You might be tempted to use symbols in a name to make it more specific. However, KoBoToolbox does not understand them and will not accept them.
It only accepts:
- lowercase letters (abc)
- uppercase letters (ABC)
- numbers (123)
- underscores (“_”)
Good examples:
- waterConsum15LPercent
- nonApplicable
- dnk
Bad examples:
- water consum/15L%
- non/Applicable
- Don’tKnow
General rule: use Camel Case between two different words
You may need several words in one name to be specific enough. Because you cannot use spaces or special characters, such a name can be hard to read. Camel Case, which capitalizes the first letter of each word after the first, makes it more visual and thus clearer.
Good examples of Camel Case:
- firstWaterSource
- beneficiaryName
- beneficiaryGender
General rule: no numbers at the beginning of the name
You can use numbers in your names, but place them in the middle or at the end so that it is instantly clear what information the name refers to.
Good examples:
- marketsOpenBefore2018
- marketsOpenSince2018
Bad examples:
- before2018MarketsOpen
- since2018MarketsOpen
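The general rules above can be checked mechanically before you upload a form. The sketch below is a minimal, hypothetical helper (`check_name` is not part of KoBoToolbox or XLSForm); it encodes only the rules as stated in this section (allowed characters, no leading digit, and the 30/32-character limits quoted above), so KoBoToolbox’s own validation may be stricter or more lenient.

```python
import re

# Rules as stated in this section; the limits (30 characters for question
# names, 32 for choice names) are the figures quoted above.
ALLOWED = re.compile(r"[A-Za-z0-9_]+")
MAX_LENGTH = {"question": 30, "choice": 32}


def check_name(name: str, kind: str = "question") -> list[str]:
    """Return the general rules that a question or choice name breaks."""
    problems = []
    if not ALLOWED.fullmatch(name):
        problems.append("only letters, digits and underscores are allowed")
    if name[:1].isdigit():
        problems.append("should not start with a number")
    if len(name) > MAX_LENGTH[kind]:
        problems.append(f"longer than {MAX_LENGTH[kind]} characters")
    return problems


for candidate in ["firstWaterSource", "water consum/15L%", "Don’tKnow", "2018marketsOpen"]:
    print(candidate, check_name(candidate))
```

Numeric choice values such as 1 or 96 (discussed further below) would be flagged by the leading-digit check, so you may want to skip that check for choice lists that deliberately use numbers.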

Specific rules for the naming of questions (variables)

Specific rule 1: the name of the question (variable) should be unique
Whatever you do, regardless of how many variables you create and of how you name them (XLSForm or the KoBoToolbox form builder), each name must be unique! KoBoToolbox will refuse to deploy your form if it detects that the same name has been given to more than one variable.

Why does the name of the question (variable) have to be unique?

Question (variable) names have to be unique so that KoBoToolbox (or any other MDC platform) knows exactly which one you are referring to. Take the example of a “constraint”, a “skip pattern” or a “calculate”: when coding one of these, you are telling KoBoToolbox what to do, and you sometimes have to specify in your formula exactly which variable/question you mean.

For instance, suppose you want question B to appear only if the answer to question A is “yes” (skip pattern): you then have to specify exactly which variable B depends on. If variable C had the exact same name as variable A, KoBoToolbox would not know which of the two you mean.
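To see why uniqueness matters, consider how a skip pattern points to another question: in XLSForm, a `relevant` expression such as `${attendsSchool} = 'no'` refers to a question by its name. The sketch below uses made-up survey rows (all names are hypothetical) to list duplicate names and flag references that are ambiguous or point to nothing. It only illustrates the problem; it is not KoBoToolbox’s actual deployment check.

```python
import re
from collections import Counter

# Hypothetical survey rows: (type, name, relevant expression).
survey = [
    ("select_one yesNoL", "attendsSchool", ""),
    ("text", "whyNoSchool", "${attendsSchool} = 'no'"),
    ("integer", "attendsSchool", ""),  # duplicate name: ambiguous target
    ("text", "schoolName", "${schoolAttended} = 'yes'"),  # no such question
]

# ${name} references, as written in XLSForm relevant/constraint/calculation columns.
REFERENCE = re.compile(r"\$\{([A-Za-z0-9_]+)\}")


def find_duplicates(rows):
    """Return question names used more than once."""
    counts = Counter(name for _, name, _ in rows)
    return sorted(name for name, n in counts.items() if n > 1)


def check_references(rows):
    """Report references to missing or duplicated question names."""
    counts = Counter(name for _, name, _ in rows)
    issues = []
    for _, name, relevant in rows:
        for target in REFERENCE.findall(relevant):
            if counts[target] == 0:
                issues.append(f"{name}: ${{{target}}} does not exist")
            elif counts[target] > 1:
                issues.append(f"{name}: ${{{target}}} is ambiguous")
    return issues


print(find_duplicates(survey))
print(check_references(survey))
```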

Example: let’s consider the following question in a survey and the possible variable names for it: “What are the two most common reasons for not attending school?”

| Variable name suggestion | Evaluation |
|---|---|
| school | Certainly short, but not descriptive enough: there may be other questions about school in the survey, or about attendance at something other than school. |
| attendance | Same as above. |
| why_not_attending_school_2_reasons | Very descriptive and easily linked back to the original question, but rather long. |
| reasons_noSchool | Strikes a better balance between length and description, but mixes underscores and capital letters, so it is less consistent. |
| reasons_no_school | Probably one of the best options: short, descriptive and likely to be remembered after looking it up once. |
| whyNoSchool | Same as above. |

Now let’s assume that in this survey we will ask the question separately for boys and for girls. Assuming we chose the last option above as a question name, there are a few ways we could adapt our naming:

  1. whyNoSchool_boys, whyNoSchoolGirls
  2. whyNoSchoolboys, WhyNoSchoolGirls
  3. whyNoSchoolBoys, whyNoSchoolGirls

Consistency would be the determining factor in this case; hence the third combination is probably the best convention (using camel case, consistent capitalization pattern).
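One way to keep such variants consistent is to build them from a shared root rather than typing each one by hand. This is a minimal sketch with a hypothetical helper, `variant_name`, that appends each suffix in camel case, so the capitalization pattern of the third option is applied the same way every time.

```python
def variant_name(root: str, suffix: str) -> str:
    """Append a suffix to a camelCase root, capitalizing the suffix's first letter."""
    return root + suffix[:1].upper() + suffix[1:]


names = [variant_name("whyNoSchool", group) for group in ("boys", "girls")]
print(names)  # ['whyNoSchoolBoys', 'whyNoSchoolGirls']
```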

Refer to section 5.6.2 Getting started with the form builder to learn where and how to rename variables and options in both the KoBoToolbox online form builder and XLSForm.

Specific rule 2: when creating a new variable name, try to be both concise and descriptive
For example, let’s say we want to assign a name to a prompt about vulnerable groups of internally displaced people (IDPs) by shelter type. There are many ways to name it:

  1. vulnerable_groups_displaced_people
  2. vulnerableGroupIDP
  3. VULNGRP_IDP
  4. Question_5_2A

Of course, the first one is the most descriptive, but it isn’t concise. What if that question is nested inside a group of questions related to Protection and Security, which is itself nested inside a group of vulnerability factors? If you’re being consistent, you might have named the groups “protection_security” and “vulnerability_factors”, so the database contains a column name such as “vulnerability_factors/protection_security/vulnerable_groups_displaced_people”. There are limits to the length of those column names, and they are database-specific. Hitting them can cause problems further down the road.
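The effect of nesting on column names is easy to quantify. The sketch below joins group and question names with “/”, as in the column name above, and compares the result with a limit. The 63-character value is only an example (it is PostgreSQL’s default maximum identifier length, in bytes); the limit that matters for you depends on the database or tool you export to.

```python
# Example limit only: PostgreSQL truncates identifiers longer than 63 bytes
# by default. Check the actual limit of the database you export to.
COLUMN_LIMIT = 63


def column_name(groups: list[str], question: str) -> str:
    """Build the exported column name from the enclosing groups and the question."""
    return "/".join(groups + [question])


groups = ["vulnerability_factors", "protection_security"]
for question in ["vulnerable_groups_displaced_people", "VULNGRP_IDP"]:
    column = column_name(groups, question)
    status = "too long" if len(column) > COLUMN_LIMIT else "ok"
    print(len(column), status, column)
```

The descriptive name produces a 76-character column, while the shorter VULNGRP_IDP keeps it at 53.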

The last option, naming the prompt after the question number (probably taken from the paper form), is common but has many shortcomings:

  • Adding or removing questions means the numbering has to be adjusted (unnecessary work).
  • It forces you to go back and forth between the XLSForm, the paper form and the output to know what means what.
  • Finally, your XLSForm isn’t self-contained anymore: to really know how the form is structured, you almost need to refer to this other document (Word file or printed form).

From our perspective, the third option works best, especially if the expression “vulnerable groups” comes up often: in no time, VULNGRP will become synonymous with “vulnerable groups”. It strikes a balance between being descriptive enough to be recognizable at first glance and not being too cumbersome.

Specific rules for the naming of answer choices (options or values)

Specific rule 1: the name of the choices doesn’t necessarily have to be unique
Unlike question (variable) names, choice names only have to be unique within a given list; the same name can appear in several lists.
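This distinction can be checked on the choices sheet by grouping rows per list. In this sketch, the rows are hypothetical `(list name, name)` pairs: the same name (`other`) may appear in several lists, but a name repeated inside one list is reported.

```python
from collections import defaultdict

# Hypothetical rows from the "choices" sheet: (list name, name).
choices = [
    ("containerL", "jerrycan"),
    ("containerL", "bucket"),
    ("containerL", "other"),
    ("waterSourceL", "borehole"),
    ("waterSourceL", "other"),     # fine: "other" is reused in another list
    ("waterSourceL", "borehole"),  # problem: repeated within the same list
]


def duplicates_within_lists(rows):
    """Return {list name: [names used more than once in that list]}."""
    seen = defaultdict(set)
    repeated = {}
    for list_name, name in rows:
        if name in seen[list_name]:
            names = repeated.setdefault(list_name, [])
            if name not in names:
                names.append(name)
        seen[list_name].add(name)
    return repeated


print(duplicates_within_lists(choices))
```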

Specific rule 2: keep the same pattern (i.e. use the same names for similar labels such as “other” or “do not know”)
As specified above, choice names do not need to be unique across lists. On the contrary, whenever you have similar labels such as “other” or “do not know”, it is highly recommended to give them the same names. Harmonizing option names across a given survey (and even across a mission, country or organization if possible) facilitates the analysis of the data collected.
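A quick way to audit this harmonization is to look, for each label, at the set of names it has been given across lists. The rows below are hypothetical; any label that maps to more than one name is a candidate for harmonization.

```python
from collections import defaultdict

# Hypothetical rows from the "choices" sheet: (list name, name, label).
choices = [
    ("containerL", "other", "Other"),
    ("waterSourceL", "otherSource", "Other"),
    ("containerL", "dnk", "Do not know"),
    ("waterSourceL", "dnk", "Do not know"),
]


def inconsistent_labels(rows):
    """Return {label: sorted names} for labels that use more than one name."""
    names_by_label = defaultdict(set)
    for _, name, label in rows:
        names_by_label[label].add(name)
    return {
        label: sorted(names)
        for label, names in names_by_label.items()
        if len(names) > 1
    }


print(inconsistent_labels(choices))
```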

Specific rule 3: a list can be reused as many times as necessary in the form (only in XLSForm)
If you have to use the same list of answers several times in your survey, there is no need to create a new list of options for each question: you can reuse the same list as many times as needed. Here are some tips for managing long lists of choices in long forms with many single or multiple choice questions:

  1. Give the “list name” items in the “choices” tab (the lists behind the multiple choice questions) names similar to the corresponding question names in the “survey” sheet, with a differentiator such as an “L” at the end (for list). This way, you can change any of the list choices without constantly consulting the survey sheet (and vice versa): you know that to change the relationHosts question, you have to change the relationHostsL list in the “choices” tab.
  2. If you also sort the choice lists alphabetically, they are easier to find in the choices tab (this is why the differentiator goes at the end of the name). The exception to this rule would be very common choice lists shared by several questions, such as Yes/No, Increase/Decrease/Stable, etc.
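The pairing convention in tip 1 can also be checked automatically. The sketch assumes the “L” suffix from the example above and the XLSForm type syntax `select_one <list>` / `select_multiple <list>`; the question rows are hypothetical. A shared list such as `yesNoL` is exempted explicitly, following the exception in tip 2.

```python
# Hypothetical rows from the "survey" sheet: (type, name).
survey = [
    ("select_one relationHostsL", "relationHosts"),
    ("select_multiple shelterTypeL", "shelterType"),
    ("select_one yesNoL", "hasCertificate"),   # shared list, exempt
    ("select_one waterL", "mainWaterSource"),  # breaks the convention
]
SHARED_LISTS = {"yesNoL"}  # assumed set of common lists reused by many questions


def mismatched_lists(rows, suffix="L"):
    """Return (question, list) pairs where the list is not question name + suffix."""
    mismatches = []
    for qtype, name in rows:
        parts = qtype.split()
        if len(parts) == 2 and parts[0] in ("select_one", "select_multiple"):
            list_name = parts[1]
            if list_name not in SHARED_LISTS and list_name != name + suffix:
                mismatches.append((name, list_name))
    return mismatches


print(mismatched_lists(survey))
```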

Specific rule 4: commonly used options (yes, no, don’t know…) should receive special treatment
In choice lists, some options recur across different questions, such as “Yes”, “No”, “Other”, “I don’t know”, etc. The objective in these cases should be to respect the characteristics of a good naming convention mentioned above, but perhaps also to consider how the data will be used:

  • Because during data analysis, or even in built-in calculations in the survey, you may well need to sum up all the “yes” answers to a given question, assigning 1 to “Yes” and 0 to “No” can be a smart choice.
| name | label |
|---|---|
| 1 | Yes |
| 0 | No |
| dnk | I don’t know |
| other | Other |


  • To illustrate these principles: the sample data output below can, for the most part, be understood without even looking at the survey that produced it.


The question “consent_survey” can easily be summed up directly to show the number of responses, % of non-responses, etc.
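With “Yes” stored as 1 and “No” as 0, counting “yes” answers reduces to a sum, as long as the non-numeric codes (`dnk`, `other`, blanks) are set aside first. The responses below are made up for illustration, and treating everything other than 0/1 as a non-response is an assumption of this sketch.

```python
# Made-up exported values for a yes/no question using the coding above.
consent_survey = ["1", "1", "0", "dnk", "1", "0", "1", ""]

# Keep only the 0/1 answers; dnk and blanks count as non-responses here.
answered = [v for v in consent_survey if v in ("0", "1")]
yes_count = sum(int(v) for v in answered)
non_response_pct = 100 * (len(consent_survey) - len(answered)) / len(consent_survey)

print(f"yes: {yes_count} of {len(answered)} answered")
print(f"non-responses: {non_response_pct:.1f}%")
```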

For more tips on choosing a naming convention, refer to part I, “Sorry, what’s your name again?”, of Advanced XLSForm coding part 1.

Specific rule 5 (optional): using numeric values or names
In a choice list, you can use numbers as values instead of letters. There are pros and cons to this, but it adds the objectivity that comes with assigning a number to each option: a 5 is a 5, whereas the choice “Religious Minorities” could be given any number of “values” (endless combinations of camel case or underscore separators, capital letters, etc.).

Another potential benefit (although this is also possible using letters as values) is to use the same value for very common options that appear in many lists:

  • -88 for “Others”
  • -99 for “Do not know”
  • 0 for “No”, 1 for “Yes”

You can establish similar conventions for any common value in your form. The upside is that when coding constraints, relevant conditions or “If other, please specify” prompts, the value to check for is always the same, so you never need to go back to the choice list for it. Common relevant and constraint expressions can also be copied and pasted, because the action to take if the answer is “Other” or “No” is often similar.
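Because “Other” always carries the same value, the relevant condition of every “If other, please specify” prompt can follow a single template. The sketch below generates those conditions as text for hypothetical question names. It uses the XLSForm `selected()` function, which checks whether a given choice was picked in a select question, and the -88 convention from the list above.

```python
OTHER = "-88"  # common value for "Other" in every choice list (convention above)


def other_specify_relevant(question: str) -> str:
    """Relevant condition for the 'If other, please specify' prompt of a question."""
    # ${question} is XLSForm's syntax for referring to another question by name.
    return f"selected(${{{question}}}, '{OTHER}')"


# Hypothetical select questions that all offer "Other".
for question in ["mainWaterSource", "shelterType", "incomeSource"]:
    print(question, "->", other_specify_relevant(question))
```

The same expression can be pasted into the relevant column of each specify prompt; only the question name changes.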

Warning: this is optional and should be used only in very specific cases. Indeed, a database containing numeric values can be harder to analyse than one containing descriptive names. Depending on who will be working with the output and on your analysis set-up, you might decide that it is better to have the values readable right in the output file, especially if your analysis isn’t programmed and a real person will have to interpret the data manually, directly from the server output.

For the questions themselves, actual names (not numeric values) have to be used. However, for the choices of select_one questions, there are two main conventions: descriptive names or numeric values. Consider the following, which is simply the “choices” tab of an XLSForm:

Descriptive names: choices tab

| list name | name | label::English |
|---|---|---|
| container | jerrycan | Jerrycan |
| container | bucket | Bucket |
| container | basin | Basin |
| container | bottle | Bottle |
| container | saucepan | Saucepan |
| container | drum | Drum |
| container | plastic_pouch | Plastic pouches |
| container | other | Other |

Numeric values: choices tab

| list name | name | label::English |
|---|---|---|
| container | 1 | Jerrycan |
| container | 2 | Bucket |
| container | 3 | Basin |
| container | 4 | Bottle |
| container | 5 | Saucepan |
| container | 6 | Drum |
| container | 7 | Plastic pouches |
| container | 96 | Other |


One of them assigns a numeric value to each option, from 1 to 7 (and 96 for “other”), while the other uses a descriptive name (perhaps slightly shortened) instead. Here are their data outputs when downloaded from KoBoToolbox:

| Descriptive names: output data (container) | Numeric values: output data (container) |
|---|---|
| jerrycan | 1 |
| bucket | 2 |
| bucket | 2 |
| other | 96 |
| jerrycan | 1 |
| jerrycan | 1 |
| saucepan | 5 |
| drum | 6 |
| bottle | 4 |
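If you choose numeric values, the output has to be decoded before a person can read it. A minimal sketch, using the “container” list from the numeric values choices tab above as a lookup table; the output column is copied from the numeric values output above.

```python
from collections import Counter

# "container" list from the numeric values choices tab above.
CONTAINER = {
    1: "Jerrycan", 2: "Bucket", 3: "Basin", 4: "Bottle",
    5: "Saucepan", 6: "Drum", 7: "Plastic pouches", 96: "Other",
}

# Numeric output column shown above.
output = [1, 2, 2, 96, 1, 1, 5, 6, 4]

decoded = [CONTAINER[code] for code in output]
print(decoded)
print(Counter(decoded).most_common(1))  # most frequent container
```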


Example: let’s say that the general conventions adopted in a survey are:

  • Use “camel case”
  • Use descriptive names for list choices

That survey then contains the following question, with the options listed below: “What is your household’s income?”

  • Less than 100
  • 100-200
  • 200-300
  • More than 300

While the convention (descriptive names) would have assigned some sort of description to each option (e.g. “moreThan300”), it may be worthwhile (depending on your analysis tool and plan) to use the middle value of each bracket instead (50, 150, 250, and some higher value for “More than 300”). The analysis is likely to use such a middle value, and it is more convenient to have it directly as a number than to have to convert “lessThan100” into a value before using it.
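The difference shows up as soon as you compute something. In this sketch, the descriptive names (`lessThan100`, etc.) and the value 350 for “More than 300” are hypothetical choices, and the responses are made up: descriptive names need a lookup table before averaging, while midpoint values can be used directly.

```python
from statistics import mean

# Convention 1: descriptive names need a conversion table before analysis.
MIDPOINT = {
    "lessThan100": 50,
    "from100to200": 150,
    "from200to300": 250,
    "moreThan300": 350,  # assumed value for the open-ended bracket
}
descriptive = ["lessThan100", "from100to200", "from100to200", "moreThan300"]

# Convention 2: the choice names are the midpoints themselves.
midpoints = ["50", "150", "150", "350"]

print(mean(MIDPOINT[name] for name in descriptive))
print(mean(int(value) for value in midpoints))
```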

If you are using numeric values for each option (the numeric values case above), consider using higher numbers for “other”, “don’t know” and similar answers. With a high number (such as 96 above), you can keep the same value for “other” in all questions regardless of the length of the list, whereas with a low number (e.g. 5), any list of 5 options or more may require you to change the value assigned to “other”. High numbers also make the data output easier to read, because 96 or 99 stand out from regular choices.
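The benefit of a fixed high value can be shown by generating codes for lists of different lengths. In this sketch, regular options are numbered from 1 and special options get reserved codes; 96 for “other” follows the example above, while 99 for “don’t know” (`dnk`) is an assumed convention, and `assign_codes` is a hypothetical helper.

```python
SPECIAL = {"other": 96, "dnk": 99}  # 99 for "don't know" is an assumed convention


def assign_codes(options: list[str]) -> dict[str, int]:
    """Number regular options 1..n and give special options their fixed codes."""
    regular = [opt for opt in options if opt not in SPECIAL]
    # Regular codes would collide with the reserved ones from 96 options upward.
    if len(regular) >= min(SPECIAL.values()):
        raise ValueError("too many options: regular codes would collide with special ones")
    codes = {opt: i for i, opt in enumerate(regular, start=1)}
    codes.update({opt: SPECIAL[opt] for opt in options if opt in SPECIAL})
    return codes


print(assign_codes(["jerrycan", "bucket", "other"]))
print(assign_codes(["borehole", "well", "river", "rain", "tanker", "other", "dnk"]))
```

However long the list, “other” keeps the value 96, so expressions that check for it never need updating.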

Consider the use of the data

How the data is likely to be used should be taken into consideration when establishing a naming convention. This includes:

  • Existing conventions or standards in your sector, within partner organizations with which you frequently collaborate, or in similar surveys. To understand what standards are and how they can facilitate data usability and interoperability, refer to Module 5: Making data useful, useable and shareable of the IFRC Data Playbook, especially:
    • 5 - 3 Should We Apply Standards to Our Data?
    • 5 - 4 Understanding data standards
    • 5 - 2 Standards support humanitarian action
  • Your analysis plan, including whether calculations are likely to be performed directly for a given question.

For example, if P-codes are already established and available in your area, you should consider using them in your XLSForm for administrative entities. This makes sharing the data with other organizations easier (assuming they also use P-codes).

Humanitarian Exchange Language (HXL)

Humanitarian Exchange Language (HXL) is a standard for classifying and naming data in the humanitarian sector. By standardizing data, it creates a common understanding of it and avoids wasted effort on manually copying, preparing, validating and cleaning data, which is very useful for data sharing. Its tag and attribute system is simple, allows a good level of precision and helps avoid confusion over what the data actually means. It therefore helps overcome language barriers in datasets.

HXL is a data standard for messy data that uses hashtags to speed up data processing and create interoperability across data sources. It is an ingenious approach to coding data with hashtags (#), similar to Twitter.
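In practice, HXL hashtags are added as an extra row between the column headers and the data, and attributes are appended to a tag with “+”. The sketch below writes a small CSV in that layout with Python’s `csv` module; the column names, values and the tag chosen for each column are illustrative only, so check your own tags against the HXL documentation or the HXL Tag Assist.

```python
import csv
import io

# Illustrative columns, each paired with an HXL hashtag (+attribute where needed).
columns = [
    ("region", "#adm1+name"),
    ("regionPcode", "#adm1+code"),
    ("surveyDate", "#date"),
    ("peopleAffected", "#affected"),
]
rows = [["Region A", "XX01", "2024-03-01", 1200]]  # made-up data

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow([name for name, _ in columns])  # human-readable headers
writer.writerow([tag for _, tag in columns])    # HXL hashtag row
writer.writerows(rows)
print(buffer.getvalue())
```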

Here is a tutorial explaining how to choose the right tags and attributes for your data; you can use it together with the HXL Tag Assist.

Warning: there is no tag for sensitive or personal data, as the main purpose of this language is to help share data!

Refer to section 5.6.2 Getting started with the form builder to know how to add HXL tags and attributes in both KoBoToolbox online form builder and XLSForm.