Quantitative data analysis Toolbox

2.4.3 Building an analysis plan


Research plan

Once you’ve identified the question(s) or problem driving your analysis and the resources that you have available, it is time to pull together an overall analysis plan for how to proceed. Below is a table of the information needed to get you started with creating a research plan. This is just a starting point and you may want to make some modifications to this template depending on your specific context.

  • Research question: the specific research question you are trying to answer
  • Indicator / variable: the indicator being calculated (from your logframe!)
  • Questionnaire question: the relevant question on the survey
  • Data collection unit: individual; household; community; etc.
  • Desired disaggregation: village; district; region; country; etc.
  • Analysis type: frequency distribution; descriptive statistics; correlation; etc.

Although it is not always done, the analysis plan can also specify how the data should be visualised. This helps to clarify the type of analysis you want and to anticipate the final output.

Building a research plan will help you to be more aware of the data you already have available and the data you really need. To identify your data needs, you can also refer to part 4 - 1 Making decisions to get the data we need and part 4 - 4 Hands-on Review with External Data sets in Module 4: Getting the data we need of the IFRC Data Playbook.
You can also consult the one-page point 1D Developing the analysis plan in The Kap Survey model - Knowledge Attitude and Practices, which gives theoretical and concrete guidelines on analysis plans.

Specific examples on each section are provided in the analysis plan for the case study that can be seen here.


To ensure your analysis will be correct, your sample should be representative of what you are trying to study.

  • The first step is to define the survey population (for example, the total population of a city or a camp) and the sampling unit (for example, individuals or households).
  • Then you need to choose your sampling method: it can be random or non-random. For statistically representative results random sampling is recommended, but it is not always possible.
    • There are 5 main types of non-random sampling methods that you can use: self-selection, snowball, quota, convenience and purposive sampling (the last two are not presented here). Keep in mind that a non-random sample may not be representative, which can bias your observations.
    • Similarly, there are 3 main types of random sampling that you can use: simple random sampling, two-stage cluster sampling and systematic random sampling.

Non-random sampling methods

Self-selection sampling


Principle: Individuals in the target population are invited to complete the survey. They are free to respond or not.

Example of use: Health needs assessment
The organisation in the field has set up a questionnaire. It is present in a community where it wants to assess health needs. The inhabitants are invited to come and answer the questionnaire. There is no obligation or compensation for completing the survey.

  • Advantages: Easy to administer
  • Limitations:
    • Difficulty in obtaining a representative sample of the target population.
    • Requires that the entire target population has access to the information and is able to express interest in participating in the survey.
  • Potential biases:
    • Over-representation of cooperators.
    • Over-representation of people who have a sensitivity/attraction to the survey topic.

Snowball sampling


  1. Selection of the first n1 respondents (= seeds) from the target population.
  2. These respondents refer us to other people, who become the respondents of the next wave.
  3. And so on, wave after wave, ideally until the composition of the final sample no longer depends on the composition of the initial seeds.

Example of use: Assessing the needs of a homeless population in a city.
As there is no list of all homeless people in the city, random sampling is not possible. Through the NGOs working with this population, you get in touch with people who are willing to respond. You then ask them to put you in touch with other homeless people in that city.

  • Advantages:
    • Allows access to hard-to-reach respondents.
    • Better participation rate (*)
    • More sincere responses (*)

(*) Because respondents are identified and approached through a personal network

  • Limitations:
    • Difficulty in obtaining a representative sample of the target population.
  • Potential biases:
    • Over-representation of people with an extensive social network. Conversely, under-representation of isolated people.
    • Over-representation of cooperative people.
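
The wave-by-wave procedure can be sketched in Python. This is a minimal illustration, not a field tool: the referral network, the names in it and the function `snowball_sample` are all hypothetical, and the initial respondents are called `seeds` in the code.

```python
def snowball_sample(referrals, seeds, max_waves):
    """Follow referrals wave by wave, starting from the initial respondents.

    `referrals` maps each respondent to the people they can introduce.
    """
    sampled = list(seeds)
    seen = set(seeds)
    current_wave = list(seeds)
    for _ in range(max_waves):
        next_wave = []
        for person in current_wave:
            for contact in referrals.get(person, []):
                if contact not in seen:  # interview each person only once
                    seen.add(contact)
                    next_wave.append(contact)
        if not next_wave:  # nobody new was referred, so stop early
            break
        sampled.extend(next_wave)
        current_wave = next_wave
    return sampled


# Hypothetical referral network: "G" knows nobody and nobody knows "G".
referrals = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": [],
    "E": ["F"],
    "G": [],
}
print(snowball_sample(referrals, seeds=["A"], max_waves=3))
# → ['A', 'B', 'C', 'D', 'E', 'F']
```

In this toy network the isolated person "G" is never reached, however many waves are run, which is the bias towards well-connected people described above.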

Quota sampling

Principle: establish quotas of people to be surveyed according to the characteristics and proportions of the population. The aim is to give the sample a structure similar to that of the parent population on certain criteria (normalisation hypothesis).


The quota variables (x1 and x2) must be correlated with the variable of interest. This requires a priori knowledge of the phenomena studied and the main variables that determine behaviour. This method is used when the parent population is well known.

  • Advantages: Easy to administer
  • Limitation: Information about cross-quotas is often lacking. Because the interviewer only has to respect the marginal quotas, the sample can match the population on each quota variable separately while its cross-distribution (for example, x1 by x2) differs from that of the population.
  • Potential biases: The probability of selection of individuals is unknown. Some types of people may remain unsurveyed (selection bias).
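
The difference between marginal quotas and cross-quotas can be illustrated with a short Python sketch. The population shares, the two tiny samples and the helper names (`quota_targets`, `marginals`) are hypothetical and chosen only to make the point visible.

```python
from collections import Counter


def quota_targets(population_shares, sample_size):
    """Number of interviews per category, proportional to population shares."""
    return {cat: round(share * sample_size) for cat, share in population_shares.items()}


def marginals(sample):
    """Counts for each quota variable taken separately (sex, then age group)."""
    sex = Counter(s for s, _ in sample)
    age = Counter(a for _, a in sample)
    return dict(sex), dict(age)


# Hypothetical shares for one quota variable and a sample of 200 interviews.
print(quota_targets({"women": 0.52, "men": 0.48}, 200))
# → {'women': 104, 'men': 96}

# Two hypothetical samples of 10 people described by (sex, age group).
balanced = [("F", "young")] * 3 + [("F", "old")] * 2 + [("M", "young")] * 2 + [("M", "old")] * 3
skewed = [("F", "young")] * 5 + [("M", "old")] * 5

# Both samples satisfy the same marginal quotas (5 F, 5 M, 5 young, 5 old)...
print(marginals(balanced) == marginals(skewed))  # → True
# ...but their cross-distributions are very different.
print(Counter(balanced) == Counter(skewed))      # → False
```

The `skewed` sample contains no older women and no younger men even though every marginal quota is met, which is why cross-quotas matter when the information to set them is available.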

Random sampling methods

Simple random sampling

Principle: All individuals in the target population have the same probability of being selected into the sample.

Example of use:
Target population: all (N) members of a refugee camp.
Selection of n individuals from a list of refugees maintained by UNHCR.

In practice, use Excel’s RAND function to generate a random number for each individual, sort the list by that number, then select the first n individuals.
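
The same draw can be scripted. The sketch below is a minimal Python illustration that mirrors the spreadsheet approach (one random number per individual, sort, keep the first n). The identifiers in `refugee_ids`, the population of 5,000 and the sample of 350 are hypothetical.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n units so that every unit has the same chance of selection.

    Attach a random number to each unit, sort by that number,
    and keep the first n units.
    """
    if n > len(population):
        raise ValueError("sample size cannot exceed population size")
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    keyed = [(rng.random(), unit) for unit in population]
    keyed.sort(key=lambda pair: pair[0])
    return [unit for _, unit in keyed[:n]]


# Hypothetical sampling frame: 5,000 registration numbers.
refugee_ids = [f"ID-{i:04d}" for i in range(1, 5001)]
sample = simple_random_sample(refugee_ids, n=350, seed=42)
print(len(sample), sample[:3])
```

Python's built-in `random.sample(population, n)` performs an equivalent draw in one call. Recording the seed lets someone else reproduce exactly the same selection.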

  • Limitations:
    • Requires an exhaustive list of the entire target population: very heavy databases to manage.
    • The list must allow individuals to be identified and contacted without ambiguity.
    • Information must be up to date and of good quality.
    • Geographical dispersion of selected individuals: additional costs and logistics.
  • Potential biases:
    • If the list does not meet the above criteria.
    • If the individuals in the target population have very different characteristics (very heterogeneous target population).

Example: Administrative lists: exclusion of households that are not listed.
Do the excluded households have different health behaviours? If so, then the sampling is biased.

Simple random sampling can be used for in-depth evaluations, especially when:

  • There is available data on the population
  • All parts of the affected area are accessible
  • The situation is reasonably stable
  • There is sufficient time to visit all selected households and conduct the required number of interviews.

Stratified random sampling

Principle: Individuals are partitioned into homogeneous subgroups (= strata) according to certain distinct characteristics of the target population. A simple random sample is then drawn within each stratum.

When to use this method? When the individuals in the target population have very different characteristics. In that case, a simple random sample could by chance miss or badly under-represent some subgroups.

Example of use: Final evaluation of a health programme
Target population: members of 2 neighbouring villages in which the community health programme took place.

  1. Village 1 without clinic - N1 inhabitants
  2. Village 2 with clinic - N2 inhabitants

If n1/N1 = n2/N2 (where n1 and n2 are the numbers of inhabitants sampled in each village), every individual has the same probability of being selected. This is called proportional stratification.

Selection of individuals to be surveyed

  • Village 1: n1 randomly selected inhabitants
  • Village 2: n2 randomly selected inhabitants

Example 2: A school has 180 female students and 260 male students. You want the sample to reflect the gender balance, so you sort the students into 2 strata based on gender. You then use random sampling within each group, selecting 40 girls and 60 boys (roughly proportional to the size of each group), giving you a representative sample of 100 students.
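
Proportional stratification can be sketched in Python as follows. This is a minimal illustration: the function names and the student identifiers are hypothetical, and the largest-remainder rounding is one reasonable way (an assumption, not a prescribed rule) to make the allocations add up to the total sample size.

```python
import random


def proportional_allocation(strata_sizes, n):
    """Split a total sample of n across strata in proportion to their size.

    Uses largest-remainder rounding so the allocations sum exactly to n.
    """
    total = sum(strata_sizes.values())
    exact = {name: n * size / total for name, size in strata_sizes.items()}
    alloc = {name: int(share) for name, share in exact.items()}  # round down
    leftover = n - sum(alloc.values())
    # Give the remaining units to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda name: exact[name] - alloc[name], reverse=True)
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc


def stratified_sample(strata, n, seed=None):
    """Draw a simple random sample inside each stratum."""
    rng = random.Random(seed)
    sizes = {name: len(units) for name, units in strata.items()}
    alloc = proportional_allocation(sizes, n)
    return {name: rng.sample(units, alloc[name]) for name, units in strata.items()}


# Hypothetical student lists matching the school example.
strata = {
    "girls": [f"G{i}" for i in range(180)],
    "boys": [f"B{i}" for i in range(260)],
}
print(proportional_allocation({"girls": 180, "boys": 260}, 100))
# → {'girls': 41, 'boys': 59}
chosen = stratified_sample(strata, 100, seed=1)
```

For the school example, strict proportionality gives 41 girls and 59 boys, close to the rounded figures of 40 and 60 used in the text.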


  • Advantages:
    • Allows the heterogeneity of the target population to be represented. For this to work, the behaviour of individuals within a given stratum should be as similar as possible with respect to the variable of interest.
    • Reflects the true composition of the population, if done well.
  • Limitations:
    • Difficulty in choosing the stratification variable.
    • Requires auxiliary information for all units in the target population.

Often the stratification variables are qualitative and not quantitative which makes stratification more complex.

The random walk

Principle: The interviewer walks through the survey area following a specific pattern to randomly select the individuals to be interviewed.

Example of a random walk pattern:

  1. Randomly select a starting point (for example, a randomly selected intersection).
  2. When you arrive at this point, spin a pen to choose the direction in which you will walk first.
  3. To decide which household you will visit first, randomly select a number from 1 to 10 and visit the corresponding household (for example, if you randomly select 5, visit the 5th household).
  4. Then survey every Xth household (for example, every 7th household), where X is based on the total number of households and your sample size (this is called the sampling interval).
  5. When you reach the end of the area in the given direction, spin the pen again.
  6. Continue in this new direction until you have interviewed the required number of respondents.

  • Advantages: No need to have a list of households prior to collection.
  • Limitations: Representativeness suffers if some households are absent during the survey.
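
The sampling interval and the resulting sequence of households can be sketched in Python. This is a minimal illustration with hypothetical function names and figures. One assumption differs from the pattern above: the first household is drawn within the first interval (1 to X) rather than from 1 to 10, which guarantees that the route yields the required number of households.

```python
import random


def sampling_interval(total_households, sample_size):
    """Interval X between visited households (at least 1)."""
    return max(1, total_households // sample_size)


def households_to_visit(total_households, sample_size, seed=None):
    """Positions along the route: a random start, then every X-th household."""
    rng = random.Random(seed)
    interval = sampling_interval(total_households, sample_size)
    # Assumption: the start is drawn within the first interval.
    start = rng.randint(1, interval)
    positions = list(range(start, total_households + 1, interval))
    return positions[:sample_size]


# Hypothetical area with about 700 households and a target of 100 interviews.
print(sampling_interval(700, 100))  # → 7
route = households_to_visit(700, 100, seed=3)
print(route[:5])
```

In the field the total number of households is often only an estimate, so the interval is usually fixed before departure and applied consistently by all interviewers.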

Define the sample size

The sample size varies according to:

  • The size of the population of interest
  • The desired level of confidence
  • The desired level of precision (sampling error)

Confidence level

First of all, you need to define the desired level of confidence. This is the certainty that the results of the sample accurately reflect (are representative of) the entire population under study. The commonly used confidence level is 95%.
A 95% confidence level means that if the survey were repeated 100 times with different samples, about 95 of them would give results within the margin of error of the true population value. If evaluators are willing to be 90% sure, they can use a smaller sample. If they want to be 99% sure, they will need a larger sample.

Level of precision

It is then important to consider the desired level of precision of the estimates. This is known as the sampling error, or margin of error. It is the largest difference expected, at the chosen confidence level, between the estimate from a sample and the value that would be obtained if the entire population had been surveyed.

Example: a survey may indicate that 48% of women attend a health centre, with a sampling error of +/- 3 points. This means that if all women in the parent population were surveyed, the actual proportion of women attending a health centre would be expected to lie between 45% and 51% (48% +/- 3 points). This range is called the confidence interval.
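
The arithmetic behind such an interval can be sketched in Python. This is a minimal illustration that assumes a simple random sample and the usual normal approximation for a proportion. The sample size of 1,066 is a hypothetical figure chosen so that the margin comes out near 3 points; it is not taken from a real survey.

```python
from math import sqrt
from statistics import NormalDist


def z_score(confidence):
    """Two-sided critical value of the standard normal (about 1.96 for 95%)."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def margin_of_error(p, n, confidence=0.95):
    """Margin of error for an observed proportion p from a simple random sample of n."""
    return z_score(confidence) * sqrt(p * (1 - p) / n)


def confidence_interval(p, n, confidence=0.95):
    """Lower and upper bounds of the interval, kept within 0 and 1."""
    moe = margin_of_error(p, n, confidence)
    return max(0.0, p - moe), min(1.0, p + moe)


# Hypothetical survey: 48% of 1,066 women report attending a health centre.
low, high = confidence_interval(0.48, 1066)
print(f"{low:.1%} to {high:.1%}")
# → 45.0% to 51.0%
```

With the same observed proportion, a smaller sample widens the interval and a larger one narrows it.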

Defining the sample size

The sample size is a function of the size of the population of interest, the desired level of confidence and the desired level of precision. It can be determined by:

  1. Using a formula. The following online calculators allow you to estimate the sample size you need for :
  2. Using a reference table that indicates the sample size needed for a given confidence level:

[Image: reference table of sample sizes]

Depending on the sampling method, the sample size, margin of error and confidence level will vary:

  • Sample size: number of units surveyed = size of the sample (observed) population.
  • Confidence level: the certainty that the sample results reflect reality, within the margin of error. A higher confidence level requires a larger sample size.
    • Usually 95%.
  • Sampling error (margin of error): the largest difference expected, at the chosen confidence level, between the sample estimate and what exists in the target (parent) population (that is, actual value minus estimated/observed value).
    • Usually 5%.

The desired confidence level and precision can be improved by increasing the sample size. The standard to aim for is a 95% confidence level with a margin of error of +/- 5%. The larger the margin of error, the less precise the results. The smaller the population, the larger the share of it that must be sampled (the ratio of sample size to population size).
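
These relationships can be checked with a short Python sketch. It uses Cochran's formula for a proportion with a finite population correction, which is one common choice and an assumption here (the calculators and reference table mentioned above may use a different method). It applies to simple random sampling only, and the function name `sample_size` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size(population, margin=0.05, confidence=0.95, p=0.5):
    """Sample size for estimating a proportion under simple random sampling.

    Cochran's formula with a finite population correction. p=0.5 is the
    most conservative assumption about the proportion being measured.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    n0 = z**2 * p * (1 - p) / margin**2   # size for a very large population
    n = n0 / (1 + (n0 - 1) / population)  # finite population correction
    return ceil(n)


# Hypothetical population sizes, 95% confidence, margin of error +/- 5%.
for N in (500, 1_000, 10_000, 100_000):
    n = sample_size(N)
    print(f"N={N:>7}  n={n:>4}  n/N={n / N:.1%}")
```

Running the loop shows the required sample growing only slowly with the population while the ratio n/N shrinks. Raising `confidence` or lowering `margin` increases the required sample.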