GIS Toolbox

5.3 How to process sensitive data




Reduce risk by clarifying needs

It is important to realize that we are all responsible for sensitive data, and that we need to use common sense when dealing with it.

One of the first steps is to step back and ask what is expected from this data:

  • Is this use really relevant?
  • Can we find a way to reduce the risk of using this data?

Asking these questions helps to define or redefine the need, either to avoid using personal or sensitive data altogether or to judge its sensitivity in the context of that need. Decisions should always be made jointly and clearly: nobody should make choices in isolation that may generate risks. If doubt persists after discussion, it is best not to use the data.

Making the data usable

Whenever we are confronted with sensitive data, we can ask ourselves how to integrate it into our analyses and products (maps, data) without putting the people concerned at risk.

**As a reminder, this is about applying the humanitarian principle of “do no harm” to data management.**{: .text-orange}

The only way to process and exploit sensitive data is to make it anonymous, so that it can be represented without any identification at the individual level. In fact, when a dataset is properly anonymized, it loses the attributes of personal data and becomes easier and more secure to process.

With the data processing and analysis methods presented below, it is possible to take advantage of data that at first sight seems unusable because of its sensitive content. The following sections describe the main scenarios that could breach data protection or privacy, and the GIS solutions that make it possible to deal with them and still represent your data.

Aggregation

The main method of anonymizing sensitive geographic data is aggregation, which involves “embedding” individual information within a group or spatial area so that individual elements can no longer be identified. This is also known as clustering.

Type of data: Personal geographic coordinates.

Important point: While this data is essential for spatial analysis, it puts the persons concerned at risk and can be very dangerous in some cases (for example, coordinates that allow the identification of patients with infections such as HIV, Ebola or tuberculosis).

Example: the location of patients for a catchment area of a health center. It is necessary to know the location of people to measure the distance from home to the nearest health facility. In this case, the distance to the nearest hospital is not sensitive, but the coordinates are. The coordinates can be used to identify the precise location of a home/household/person.

Solution: A compromise is needed between the precision of the location and the usability of the data, while respecting confidentiality. To achieve this, the data can be aggregated into a zonal entity, such as an administrative zone or a regular grid. This also gives a better visualization of the data (via graduated colors or proportional symbols) while keeping it confidential.

Aggregation by administrative boundaries


A representation of patients as points (left) and in aggregated form (right)

While this method allows for confidentiality regarding the location of individuals, it may be too general and imprecise for analysis and decision making.
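As an illustration only, aggregation by administrative boundaries can be sketched as a point-in-polygon count. The zone names, coordinates and helper functions below are hypothetical, and the ray-casting test is just one simple way to do this; in practice a GIS tool or spatial library would normally perform the spatial join.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside polygon, a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray starting at the point cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_by_zone(points, zones):
    """Return {zone_name: number_of_points}; points outside every zone are ignored."""
    counts = {name: 0 for name in zones}
    for x, y in points:
        for name, polygon in zones.items():
            if point_in_polygon(x, y, polygon):
                counts[name] += 1
                break  # a point is counted in one zone only
    return counts


# Hypothetical data: two rectangular "districts" and four patient locations.
zones = {
    "District A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "District B": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
patients = [(2.5, 3.1), (7.8, 9.2), (12.4, 5.5), (18.0, 1.2)]

zone_counts = count_by_zone(patients, zones)
print(zone_counts)
```

Only the per-zone counts would then be mapped or shared; the individual coordinates stay in the restricted source file.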

Aggregation on a regular grid

Another method consists of aggregating the data on a grid that covers the study area. This grid can be of different types (square grid, mesh, honeycomb).


Example of aggregation by gridding to go from a point representation (left) to an aggregated representation (right)
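A minimal sketch of square-grid aggregation, assuming the points are already in a projected coordinate system expressed in metres. The function names, the sample coordinates and the 500 m cell size are illustrative assumptions, not values from this toolbox.

```python
import math
from collections import Counter


def aggregate_to_grid(points, cell_size):
    """Count points per square cell; each cell is keyed by its (column, row) index."""
    counts = Counter()
    for x, y in points:
        col = math.floor(x / cell_size)
        row = math.floor(y / cell_size)
        counts[(col, row)] += 1
    return counts


def cell_center(cell, cell_size):
    """Return the centre of a cell, e.g. to place a proportional symbol."""
    col, row = cell
    return ((col + 0.5) * cell_size, (row + 0.5) * cell_size)


# Hypothetical household locations in projected coordinates (metres).
households = [
    (120.0, 80.0), (450.0, 300.0), (480.0, 20.0),
    (1020.0, 640.0), (1490.0, 990.0),
]

grid_counts = aggregate_to_grid(households, cell_size=500.0)
for cell, count in sorted(grid_counts.items()):
    print(cell, cell_center(cell, 500.0), count)
```

The map then shows one value per cell (a colour or a symbol at the cell centre) instead of one point per household.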

The square grid is the most commonly used shape in aggregation methods, but hexagons (or honeycombs) may sometimes be better suited to the situation.


Source: Smiley, K.T., Noy, I., Wehner, M.F. et al. Social inequalities in climate change-attributed impacts of Hurricane Harvey. Nat Commun 13, 3418 (2022).

Fig. of Hex Grid: Each hexagon represents the number of residential buildings that would not have flooded had it not been for the impact of climate change in Harris County, Texas, during Hurricane Harvey.

Among other reasons, hexagonal grid aggregation is favored because:

  • The hexagon has a smaller perimeter-to-area ratio than the square, which reduces the sampling bias caused by edge effects; these are more pronounced in a square grid.


Source: ESRI, Why Hexagons?
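The perimeter-to-area argument can be checked with a short calculation. The sketch below compares a square and a regular hexagon of the same area, using only the standard geometric formulas; the function names are illustrative.

```python
import math


def square_perimeter(area):
    """Perimeter of a square with the given area (side = sqrt(area))."""
    return 4 * math.sqrt(area)


def hexagon_perimeter(area):
    """Perimeter of a regular hexagon with the given area.

    For side length s, the area is (3 * sqrt(3) / 2) * s**2.
    """
    side = math.sqrt(2 * area / (3 * math.sqrt(3)))
    return 6 * side


area = 1.0
ratio = hexagon_perimeter(area) / square_perimeter(area)
print(f"hexagon perimeter / square perimeter = {ratio:.3f}")
```

For equal areas, the hexagon's perimeter is roughly 7% shorter than the square's, whatever the cell size.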

To determine the size of the cell, you must respect two important criteria:

  • Sufficient granularity to allow for relevant analysis
  • Ensure that a person's location still cannot be identified visually (with particular attention to rural or very sparsely populated areas).
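One possible way to balance these two criteria is to try several cell sizes and keep the smallest one for which no occupied cell contains too few points. This is only a sketch: the minimum count per cell (`min_count`), the candidate sizes and the function names are assumptions introduced here, and the appropriate threshold depends on the context and must be agreed with the people responsible for the data.

```python
import math
from collections import Counter


def count_per_cell(points, cell_size):
    """Count points per square cell, keyed by (column, row) index."""
    return Counter(
        (math.floor(x / cell_size), math.floor(y / cell_size)) for x, y in points
    )


def smallest_safe_cell_size(points, candidate_sizes, min_count):
    """Return the smallest candidate size for which every occupied cell
    holds at least min_count points, or None if no candidate qualifies."""
    for size in sorted(candidate_sizes):
        counts = count_per_cell(points, size)
        if all(n >= min_count for n in counts.values()):
            return size
    return None


# Hypothetical projected coordinates (metres).
points = [
    (10, 10), (40, 30), (60, 70), (90, 20),
    (130, 150), (160, 120), (180, 190),
]

chosen = smallest_safe_cell_size(points, candidate_sizes=[50, 100, 200], min_count=3)
print(chosen)
```

A result of `None` means that none of the candidate sizes is coarse enough, which is itself a signal to reconsider whether the data should be mapped at all.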


The right level of granularity: making GPS coordinates approximate to preserve anonymity

Type of data: Personal geographic coordinates (example: location of surveyed households).

Problem: We want to show the information from the interviews as points without making it possible to identify the surveyed households.

Solution: In order to reduce the sensitivity of the location of the information, a simple measure is to make the GPS coordinates approximate. This step can be performed as soon as the information is collected by choosing to collect only approximate locations. Depending on the collection context, one can reduce the precision to a hundred meters or more.

In the case where the data has a precise location, it is always possible to process it to make it less precise.

To do this, one of the simplest methods is to round the coordinates in decimal degrees. The smaller the number of decimals in the coordinates, the less precise the location will be.


Example of coordinates rounding and rendering according to the number of decimals

This reduces the precision, but it can create a grid effect in which several points overlap at the same coordinates.
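A minimal sketch of coordinate rounding, with hypothetical household coordinates. As a rule of thumb, one degree of latitude is roughly 111 km, so three decimals correspond to a precision of about 110 m and two decimals to about 1.1 km; the spacing in longitude shrinks as you move away from the equator.

```python
from collections import Counter


def round_coordinates(points, decimals):
    """Round (latitude, longitude) pairs in decimal degrees to the given number of decimals."""
    return [(round(lat, decimals), round(lon, decimals)) for lat, lon in points]


# Hypothetical household locations (latitude, longitude) in decimal degrees.
households = [
    (12.345678, -1.234567),
    (12.345912, -1.234702),
    (12.351204, -1.229876),
]

rounded = round_coordinates(households, decimals=3)

# Points that now share the same coordinates illustrate the grid effect.
overlaps = {coord: n for coord, n in Counter(rounded).items() if n > 1}
print(rounded)
print(overlaps)
```

Here the first two households end up on exactly the same rounded coordinates, so they would be drawn on top of each other unless the map shows a count or a proportional symbol instead.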

Anonymization

It can be tempting to use maps to locate a team or beneficiaries, but doing so raises immediate issues of confidentiality and consent.

If the need is specific and legitimate, a map containing names or other identifying data can of course be created, on the condition that the persons concerned are informed, have the opportunity to give their consent, and are able to exercise their rights (which can be complicated). Failing that, or if time is lacking, access to such maps must be strictly controlled and limited to those who really need this tool.

Finally, we must take into account how quickly such information becomes obsolete and plan for the destruction of the map.

Type of data: Personal geographic coordinates (example: location of personnel).

Problem: The aim is to represent the location of personnel in a context of increased violence (war, riots) without indicating their names and professions in the organization.

The very purpose of such a map is also its main risk: if the map falls into the hands of people with bad intentions, all the persons concerned are in danger.

Solution: If an internal request requires the location of the organization’s personnel to be put on a map, it is still possible to produce a confidential map that contains these locations but anonymizes the employees through identifiers that are created for this purpose and communicated only to the persons concerned.

To do this, a unique identifier can be created for each location, and will be displayed on the map. Then, a table in the appendix will match each identifier to a name and an occupation for example.
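The separation between the map and the lookup table could be sketched as follows. The field names, the `LOC-` prefix and the use of random identifiers are assumptions made for illustration; random identifiers are chosen here so that the identifier itself reveals nothing, such as a person's position in a staff list.

```python
import secrets


def new_identifier(existing):
    """Generate a random identifier that is not already in use."""
    while True:
        candidate = "LOC-" + secrets.token_hex(3).upper()
        if candidate not in existing:
            return candidate


def assign_identifiers(staff):
    """Split staff records into a map layer (identifier + location only)
    and a confidential lookup table (identifier -> name and occupation)."""
    map_layer = []
    lookup_table = {}
    for person in staff:
        identifier = new_identifier(lookup_table)
        map_layer.append(
            {"id": identifier, "lat": person["lat"], "lon": person["lon"]}
        )
        lookup_table[identifier] = {
            "name": person["name"],
            "occupation": person["occupation"],
        }
    return map_layer, lookup_table


# Hypothetical personnel records.
staff = [
    {"name": "Person One", "occupation": "Logistician", "lat": 12.35, "lon": -1.52},
    {"name": "Person Two", "occupation": "Nurse", "lat": 12.37, "lon": -1.49},
]

map_layer, lookup_table = assign_identifiers(staff)
print(map_layer)  # safe to display: identifiers and locations only
```

The lookup table is the sensitive part: it should be stored separately from the map, with access limited to those who really need it.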

Here again, it is necessary to think about how much granularity the represented information really needs.

Restrict the attributes

Finally, in all cases of personal data processing, it is important to keep only those attributes that are necessary for the map or analyses. Files often record and store a lot of information that is not required.
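A minimal sketch of attribute restriction, assuming the records are available as dictionaries. The field names and the list of allowed attributes are hypothetical; the point is to state explicitly what is kept rather than what is removed, so that any new field is excluded by default.

```python
def keep_attributes(records, allowed):
    """Return copies of the records containing only the allowed attributes."""
    return [
        {key: record[key] for key in allowed if key in record}
        for record in records
    ]


# Hypothetical survey records with more attributes than the map needs.
survey = [
    {"name": "Person One", "phone": "000-0000", "district": "District A", "household_size": 5},
    {"name": "Person Two", "phone": "000-0001", "district": "District B", "household_size": 3},
]

restricted = keep_attributes(survey, allowed=["district", "household_size"])
print(restricted)
```

The original records are left untouched, so the restricted copy can be shared while the full file remains under controlled access.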