Link Search Menu Expand Document
Mobile Data Collection toolbox

5.7.1 Managing external data


TABLE OF CONTENTS


What is external data ?

External data is data that is not directly included in the survey conception but that is injected into the dataset. Importing and managing external data can be useful to integrate in the database information that already exists without the need to ask for it again, and also use existing databases for follow-up surveys, or in case of long lists that would take very long to integrate in the Mobile Data Collection (MDC) form (and can also slow down the application quite a bit). In addition, as external data lists are often characterised by their length, this tutorial also deals with the search function in a list.

Warning: The use of external data may slow down the application slightly and reduce the functionality available (e.g. the use of the Enketo online interface to create or modify a submission is not compatible with the use of external data).

Importing external data

Importing external data in .csv (comma separated value) can be useful to:

  • Update easily a list without having to create a new version of the XLSForm each time (i.e. make the versioning of forms easier)
  • Facilitate the integration of some database information that is already known, without re-asking for it

Importing external data is for example useful when having to use long list of locations (e.g. full list of villages with their districts and provinces information), or data from a previous survey (e.g. registration data with ID into a follow up form etc.).

Note: Several different csv can be used with the same XLSForm.

Using external data has however the following disadvantages:

  • Requires strong rigor in files management when the csv needs to be updated regularly (e.g. in case of follow up): proper file naming, no deletion of previous data… as well as rigor in the management of forms on the devices (needs resynchronization of the form on all the devices every time than a csv is changed)
  • Difficult to use for multi-language forms as the data stored in the csv doesn’t use the “::language” feature of XLSForm and are is therefore usually only available in one language
  • Use of external csv isn’t compatible with the “Enketo” web version of form and can only be used on mobile devices.

Comma Separated Value database (.csv)

image info

Tool focus : Steps to import external data into your KoBoToolbox project

To import external data you need to:

  1. Create an Excel database with the data you want to include (simple database with header with UNIQUE name for each column, no merged cells etc.). Each csv file should contain at least one column that can be used to uniquely identify each row.
  2. Convert your Excel file in a .csv (comma separated value) file in Excel, with the data that you want to include

    Be careful to select “CSV UTF-8” format (and not any other format of csv) when converting an Excel files. Some languages not using Latin characters can moreover not be compatible with the use of an external csv.

    If your Windows, Office or Excel is configured in French the csv generated by Excel won’t be separated by a comma but by a semicolon that isn’t compatible with KoBoToolbox. In that case either you need to:

    • Change your separators in Excel here
    • Switch the language of your Excel in English (automatically change the separator)
    • Use an Excel alternative such as LibreOfficeCalc to convert your csv with comma
    • Use an online convertor such as this one If you export directly from KoBoToolbox the data with the csv format (download data / csv export type), the exported data can NOT be automatically re-imported as the csv generated used semicolon instead of comma. It’s advised to export in Excel and then convert it in csv.
  3. Add the .csv file in the dedicated section in the settings of your form (in KoBoToolbox, look for “add document”)
  4. In your form, add :
    • [Optional] “select one” type question with search appearance to select in the external csv list the right record (see below)
    • calculate” type questions for each information you want to retrieve from your external database, and use the “pulldata function (see below)
    • If needed, a “note” or “acknowledge” question for displaying the results (otherwise the information won’t be visible for the enumerator)

If you want to have a csv with different language you can look at this blog post

3 different ways of integrating external data

As explained, using an external file as an attachment to your form can be very useful when you work with a big dataset or you would like to link your form to another database as it can both improve the performance and make it possible to update a list of options without having to upload a new version of the form. To manage external data, you have several options: here is a recap of when to use each of them.

Diffifulty Method Summary of what it does
Lowest csv file and the pulldata() function
  • Pulls into the list of options of a variable **all the elements from a column in your csv**.
  • Uses data from a csv file in a form but cannot filter the data.
Moderate csv file and the search () function
  • Pulls into the list of options of a variable all the elements from a column in your csv **based on a simple criterion**.
  • Use data from a csv file in a form. More advanced possibilities than a pulldata() function, it can filter the data based on several criteria.
Avanced XML file and an XML function Uses data form an XML file in a form. Advanced possibilities include the use of metadata and combining several filter criteria.

The first part below details how the pulldata() function works with an external csv. Unfortunately, as shown in the table above, using such a function isn’t always possible depending on what you wish to do.

Note: The use of these functions can be effective in several XLSForm fields: choice_filter, calculation, relevant, choices, etc. It will depend on the purpose of your use of external data.

Setting up the pulldata() function in your XLSForm

You can also check this video tutorial on “How to extract and access external data using “Pulldata” in XLSForm, KoBoToolbox, and ODK” and the KoBotoolbox tutorial on Pull Data functionality.

The general structure of this expression is:

  • pulldata(‘name_of_attachment’, ‘name_of_column_to_return’, ‘unique_id_column_in_attachment’, unique_id_to_match)

An example of that for a Follow-up survey is:

  • pulldata(‘preLoad’,’age’,’id_key’,${unique_id})
    • preLoad is the name of the csv file containing the constituent database
    • age is the name of the column from the csv file that we want to fetch
    • id_key is the title of the column from the csv file that contains the unique ID of the beneficiary list
    • ${unique_id} is the variable name, in the Follow-up survey, that contains the unique ID we are looking for (e.g. our current beneficiary)

The name of the constituent database, renamed “preLoad”

image info

The “preLoad” file contains an id_key column (with the unique identifier for each beneficiary in our program) as well as an “age” column (which is the value we are interested in displaying on the phone during follow-up)

image info

The related follow-up XLSForm: The question “unique_id” in the follow-up survey is the one where the enumerators indicate which beneficiary answers

image info

Using the above as an example, if the unique_id indicated by the enumerator was “idaho-3-31”, then the value returned on the phone will be “3”.

Filtering a list through the search function

Using the pulldata() function has several limitations. It can only pull data from one column of your csv file and if there are multiple rows which match your pulldata() criteria, it will pull only the first one. The search function allows for greater flexibility in filtering the csv data and can also return multiple matching values which is useful for example when populating a dynamic choice list for a select_one or select_multiple question. The search function is useful when you have a long list of elements (be it names, administrative entities etc) and want to select it.

We usually add to search function the autocomplete appearance that adds an autocomplete feature to the search (filter based on the text you begin to capture).For the respective question, it will allow the respondent to type the first few letters of their answer and will display a filtered list based on the text that has been typed. The autocomplete feature works on both Enketo and ODK Collect.

In the survey tab of the XLSForm :

type name label::English appearance
select_one nameC What is the name of the child? autocomplete

Screen of ODK Collect with autocomplete appearance

image info

Here is a video tutorial on “How to Use the Search Function in XLSForm, ODK, and KoBoToolbox - when you have thousands of choices”.

This feature could work for “select_onev” or “select_multiple” questions.

To use a search function within the .csv list uploaded it is important to adapt the following XLS syntax in the “appearance” column: autocomplete search(‘preLoad’).

The list name should then have the same name the column in the csv for which you want to filter the data.

XLSForm (Survey Tab)

image info

XLSForm (Choice tab)

image info

csv preLoad.csv

image info

The first way of using it is rather simple: for instance, you can search (‘Jean’) and it will extract all data from the list (csv file) called “Jean”. You just have to put the name of the csv file in the choice tab.

More complex : A search function within the .csv list uploaded can be set up – for this, it is important to adapt the following XLS syntax in the “appearance” column: autocomplete search{‘preLoad’,’matches’,’regionid’,${region}}:

  • preload” is the name of your csv file that was imported
  • regionid” is the filter on which the search function is based (often the name of the column where you want values to be taken from)
  • ${region} is the name of the variable to which “regionid” needs to correspond within the XLSForm

Remember that the search function only works if your list of data is in an external csv! Data fields pulled from a .csv file are considered to be text strings - therefore use the int() or number() functions to convert a pre-loaded field into numeric format.

You can also set it up in a more advanced way. For that, this function needs to be combined with options such as:

  • startswith
  • endswith
  • contains
  • matches

This will allow you have all rows with match in any specified column to be included (in the search).

For example: search(‘externalCSV’,’startswith’,’location’,${city})

  • the first parameter (“externalCSV”) is the name of your csv file
  • the second parameter is the function you want to use to search it: here the function is going to look at the beginning of the words
  • the third parameter specifies the filter on which the search function is based (often the name of the column where you want values to be taken from)
  • the fourth parameter is the name of the variable to which the filter needs to correspond

Warning: The search() function does not work with all the tools from the ODK world. It is in fact perfectly compatible with tools such as ONA, on both the Collect side but also the Enketo online capture interface. However, for KoboToolbox users, this function works on the Collect side but the version of Enketo implemented within KoboToolbox is not updated and therefore does not work for this function (on the day of writing of this resource i.e. 26/11/2019). Therefore, depending on the tools you plan to use make sure that you test it thoroughly to avoid any unwelcome surprises

Warning: The search() function will only work if the list is in an external csv. It cannot be used to filter data from the “choices” tab or from an external xml file (see “work with an XML file below” section).

Further information on the search() function can be found here:

c. Without using a csv: work with an XML file

This second function is dedicated to advanced users. It’s possible to use XML files as an alternative to CSV files to store external data which can be referred to in calculations, constraints or relevancies. This method allows the form creator to deal with yet more complex cases than those seen with the two previous functions by using the instance() function. It makes it possible to have more advanced filter criteria than the search() function presented above and also to work with metadata and really complex nodes (interdependency between several variables, existence of grouped questions inside a form, etc.) in databases. You have several possibilities:

  • Have a select_one or select_multiple question where your XML file contains the list of choices. You need to specify the question type: “xml-external” and the function instance will be in the “choices” sheet
  • Include data from your XML File in other formulas: calculation, constraint, relevant or even choice_filter.

To code this function properly please follow the instruction here.

For example in an XLSForm :

type name label::English hint::English calculation
calculate province_cal     instance(‘site_list)/root/item[name=${site}]/provincename

What the formula means:

  • your XML file is called “site_list
  • data is going to be taken in the “provincename” column
  • it will only display data from this column when the values match specific values of the column “name”, the specific value is the answer to the question “site” in the XLSForm
  • /root/item/ is the path in the XML file to get data