Sampling, representativeness and validity

Before you choose the method that you’ll use to collect data, you need to think about the size of the population you are studying (e.g. the group of visitors to your museum or online exhibition), the characteristics of that population and the amount of data (the sample) that will be representative enough for you to report what you learn with confidence.

Intended learning outcomes

This page is designed to help you:

  • Understand the principles of sampling, representativeness and validity for the type of data collection that you are likely to be doing.

  • Apply the principles to your data collection plan so that you can work towards getting a good sample.

  • Feel confident when reporting your results.

Where do I start?

While this might seem like a challenging part of Phase two - and it’s not easy - it can be broken into several key components.

Tip.

Start writing up your methodology right away.

1. Define the population.

Population = the whole group at the centre of the research question.

Who are you trying to learn from? This is your population. You’ve already done this in Phase one - you should know who your key stakeholder(s) are. Ask yourself: can - or should - you survey the whole population?

If you are interested in feedback from the general public who might visit your website, this is a very big population. For example, we can include all Europeana Network Association (ENA) members in our population, but we don’t include the wider heritage workforce in Europe who are not members.

However, if we’re only interested in a segment of the Network membership - educators - then this segment becomes our target population. This is a much smaller population.

What are the characteristics of the group? Is your target population homogeneous (similar) or heterogeneous (different)? You can go ahead and ask yourself whether you will collect and segment your data based on characteristics like age, gender or location. If you have a diverse and heterogeneous group, this has implications for your sample.

2. Think about validity.

There are two types of validity. They raise different questions that help you determine your sample, choose your method and shape your data collection plan.

Internal validity - to what extent are the outcomes you are observing actually the result of your activity, rather than other external influences (variables)? You can think about things like causality and attribution. We can increase our internal validity by controlling other variables. For example, if we were to survey the Europeana Network Association (ENA) membership, we might consider focussing our study on those who are not involved in European projects where Europeana is a partner. Involvement in such a project might explain why a person counts themselves as ‘active’ in the Network, though there may be other factors too.

External validity - to what extent can your findings be applied to the real world and to other settings (e.g. if you were to repeat the study)? This is the issue of generalisability, or ‘transferability’ in qualitative studies. You can normally generalise results to other groups (or samples) who share the same characteristics. For example, when surveying ENA members, we might try to compare results across different clusters to see to what extent the results are generalisable (i.e. that they are true for all groups and not just one specific sample).

See more in this Guide to Experimental Design | Overview, Steps, & Examples.

Tip.

It is useful to ask those that you are surveying if they attribute any positive impact to your intervention. You should be careful how you ask this, as some people may feel uncomfortable in saying ‘no’, e.g. because you are the organiser and you are asking the question, or because culturally, this is seen as very rude. An external pair of hands in data collection is sometimes valuable for this reason.

3. Agree your confidence interval and confidence level.

Confidence intervals, confidence levels and margins of error: how do we use them?

  • Primarily quantitative research (numbers)

  • Share your perceptions about the representativeness of the sample and robustness of the findings

  • Help others to interpret the data

  • Guide you to use a tool to determine the sample for your research

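Confidence means probability. If you were to repeat the same study, what scale of difference would there be if you compared the results? For example, would they be within five percentage points above or below? Ten? This range is your confidence interval, and half of its width (e.g. ± 5) is the margin of error. The confidence level is a separate choice: it is how sure you want to be that the true value lies within that range. A 95% confidence level means that, if you repeated the study many times, around 95 out of 100 of the resulting ranges would contain the true value; a 90% confidence level means around 90 out of 100.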

What level of risk or uncertainty am I willing to accept?

You should ask yourself what confidence interval you would expect and/or find acceptable. You can use this both to report on your findings and, even more importantly, plan your sampling approach. If you are happy that the results are not easily generalisable, then you can proceed with a lower confidence level. 95% is a fairly common and accepted confidence level. E.g. you are 95% sure that if you repeated the survey, the results would lie within these parameters.
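The idea of "repeating the survey" can be illustrated with a small simulation. The sketch below is only an illustration under assumed numbers: the true share of 60%, the sample size of 400 and the function name `simulate_coverage` are all hypothetical, and it uses the standard normal-approximation interval for a yes/no question.

```python
import random

def simulate_coverage(true_share=0.6, sample_size=400, repeats=1000, z=1.96, seed=1):
    """Repeat a yes/no survey many times and report how often the
    interval around each result contains the true share.

    All default values are illustrative assumptions, not real data.
    z = 1.96 corresponds to a 95% confidence level.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(repeats):
        # One simulated survey: each respondent says "yes" with probability true_share.
        yes = sum(rng.random() < true_share for _ in range(sample_size))
        p = yes / sample_size
        # Margin of error for this survey (normal approximation).
        margin = z * (p * (1 - p) / sample_size) ** 0.5
        if p - margin <= true_share <= p + margin:
            hits += 1
    return hits / repeats

coverage = simulate_coverage()
print(f"Share of repeated surveys whose interval contains the true value: {coverage:.1%}")
```

With a 95% confidence level, the printed share should come out close to 95%, which is what the confidence level is describing.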

A higher confidence level, or a narrower confidence interval, demands a bigger sample. Resource limitations and an acceptance of some degree of uncertainty mean that a lower confidence level may be appropriate.
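To see how the confidence level affects the sample you need, here is a minimal sketch of the kind of calculation that online sample size calculators typically perform. It assumes a yes/no style question, uses Cochran's formula with a finite population correction, and takes the most cautious assumption that responses are split 50/50. The population of 2,000 and the function name `sample_size` are hypothetical.

```python
import math

# Standard z-scores for common confidence levels.
Z_SCORES = {0.90: 1.645, 0.95: 1.960, 0.99: 2.576}

def sample_size(population, confidence_level=0.95, margin_of_error=0.05, share=0.5):
    """Estimate how many responses are needed from a population.

    share=0.5 is the most cautious assumption (it gives the largest sample).
    """
    z = Z_SCORES[confidence_level]
    # Cochran's formula for a very large population.
    n0 = (z ** 2) * share * (1 - share) / margin_of_error ** 2
    # Finite population correction: smaller populations need fewer responses.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)

# Hypothetical population of 2,000 people, margin of error of ± 5.
for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence level: {sample_size(2000, level)} responses")
```

Running this shows the required number of responses rising as the confidence level rises, which is the trade-off between certainty and resources described above.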

What is a margin of error?

  • Helps you ascertain the sample size

  • Looks similar to the confidence level but is different (it describes the size of the range around a result, not how sure you are of it)

  • Expressed as

    • ⧮ (or similar)

    • Percentage (e.g. 9%)

    • ± 9
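If you already know how many responses you have, the margin of error can be estimated from the sample size. The following is a rough sketch using the normal approximation for a percentage result; the figure of 120 responses and the function names are hypothetical, and `z=1.96` assumes a 95% confidence level.

```python
def margin_of_error(sample_size, share=0.5, z=1.96, population=None):
    """Margin of error (as a fraction) for a percentage result.

    share is the observed proportion (0.5 is the most cautious value).
    If the population size is given, a finite population correction is applied.
    """
    moe = z * (share * (1 - share) / sample_size) ** 0.5
    if population is not None:
        moe *= ((population - sample_size) / (population - 1)) ** 0.5
    return moe

def format_result(share, sample_size, z=1.96):
    """Write a result in the '± n' style used when reporting."""
    moe = margin_of_error(sample_size, share, z)
    return f"{share:.0%} ± {moe * 100:.0f} percentage points"

# Hypothetical survey: 120 responses, half of whom answered "yes".
print(format_result(0.5, 120))
```

For this hypothetical survey the margin of error works out at roughly 9 percentage points, so a reported 50% could plausibly be anywhere from about 41% to 59%.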

4. Calculate your sample.

Sample = a small part of the whole population intended to show what the group is like or experiences

Sampling = gathering information or data from a subset of a larger population, rather than from everyone

You now need to work out how you will sample the (target) population/stakeholder. This is based on an understanding that you can’t hear from everyone, so you have to try to get a sample that is representative enough so that you can report confidently on what you have learned. How can the sample be representative of the whole population?

A common rule of thumb is that 10% is a good sample, as long as the population is homogeneous (that is to say, a group made up of people with, for example, the same backgrounds or experience). This is likely to be the case if you focus on a target population, but less likely if you focus on a general population. With more heterogeneous groups you need a bigger sample size.
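One simple way to check representativeness is to compare the make-up of your sample with the make-up of the population. The sketch below assumes you know how many people fall into each group; the groups, the numbers and the function name `composition_gap` are invented for illustration.

```python
def composition_gap(population_counts, sample_counts):
    """Compare each group's share of the sample with its share of the population.

    Returns the difference (sample share minus population share) per group.
    Positive values mean the group is over-represented in the sample.
    """
    pop_total = sum(population_counts.values())
    sample_total = sum(sample_counts.values())
    gaps = {}
    for group, pop_count in population_counts.items():
        pop_share = pop_count / pop_total
        sample_share = sample_counts.get(group, 0) / sample_total
        gaps[group] = sample_share - pop_share
    return gaps

# Hypothetical population of 1,000 educators and a 10% sample of 100 responses.
population = {"primary": 500, "secondary": 300, "higher education": 200}
sample = {"primary": 70, "secondary": 20, "higher education": 10}

gaps = composition_gap(population, sample)
for group, gap in gaps.items():
    print(f"{group}: {gap * 100:+.0f} percentage points")
```

In this invented example the sample is 10% of the population, but primary educators are over-represented and the other groups under-represented, so the size of the sample alone does not make it representative of a heterogeneous group.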

The sample you need will define what method you use, and each method has different considerations for agreeing your sample. Below we think about how you can work out the sample you need based on two of the most commonly used data collection methods.

Sampling, rather than surveying everyone, has several benefits:

  • Cost efficiency: collecting data from an entire population can be expensive and time-consuming. Sampling helps you to collect enough information with fewer resources.

  • Time efficiency: it may be impractical or impossible to collect data from a whole population, especially when the population is large or constantly changing. Sampling allows researchers to obtain results more quickly.

  • Feasibility: in some cases, it may be impossible to study an entire population due to logistical constraints.

  • Accuracy: good sampling can provide accurate estimates of population experiences. Statistical methods are used to make inferences from the sample to the population, and these methods are well-established and reliable. (See more below)

  • Fewer errors: sampling reduces the chances of errors that can occur when trying to collect data from every member of a (large) population.

  • Ethical considerations: in some situations, it may be ethically or practically inappropriate to collect data from every member of a population.

  • Generalisability: if a sample is chosen correctly and represents the population well, the results from the sample can be generalised to the entire population, meaning that researchers can draw conclusions about a larger group based on the sampled data.

  • Variability management: knowing more about the characteristics of a sample and the variables that may affect their experiences helps you to manage the variability that may exist within a population.


Types of sampling

Probability sampling (selection involves an element of randomness):

  1. Simple random sampling - randomly generate a list of the people who you will survey out of a bigger population.

  2. Stratified sampling - the population is divided into separate groups or ‘strata’, each made up of people who share the same characteristics. Each of these groups is then randomly sampled.

  3. Cluster sampling - the whole population is broken up into a number of clusters, and a random selection of these clusters is then surveyed.

  4. Systematic sampling - sampling at a regular interval.
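The four approaches above can be sketched in a few lines of Python using only the standard library. This is an illustrative sketch rather than a recommended tool: the member list, the `role` and `country` fields and all function names are hypothetical.

```python
import random

def simple_random_sample(people, k, rng):
    """Pick k people at random, each with an equal chance."""
    return rng.sample(people, k)

def systematic_sample(people, interval, rng):
    """Pick every nth person, starting from a random position."""
    start = rng.randrange(interval)
    return people[start::interval]

def stratified_sample(people, stratum_of, fraction, rng):
    """Split people into strata, then randomly sample within each stratum."""
    strata = {}
    for person in people:
        strata.setdefault(stratum_of(person), []).append(person)
    chosen = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))
        chosen.extend(rng.sample(members, k))
    return chosen

def cluster_sample(people, cluster_of, n_clusters, rng):
    """Randomly pick whole clusters and include everyone in them."""
    clusters = {}
    for person in people:
        clusters.setdefault(cluster_of(person), []).append(person)
    picked = rng.sample(sorted(clusters), n_clusters)
    return [person for name in picked for person in clusters[name]]

# Hypothetical membership list of 100 people.
countries = ["NL", "FR", "IT", "PL", "ES"]
members = [
    {"id": i, "role": "educator" if i % 4 == 0 else "other", "country": countries[i % 5]}
    for i in range(100)
]

rng = random.Random(42)  # fixed seed so the example is repeatable
simple = simple_random_sample(members, 10, rng)
systematic = systematic_sample(members, 10, rng)
stratified = stratified_sample(members, lambda m: m["role"], 0.2, rng)
clustered = cluster_sample(members, lambda m: m["country"], 2, rng)

print(len(simple), len(systematic), len(stratified), len(clustered))
```

Note that stratified sampling guarantees every group is present in the sample, whereas cluster sampling surveys only the clusters that happen to be picked.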

Non-probability (purposive) sampling (people are selected deliberately rather than at random):

  1. Maximum variation - surveying a diversity of the population

  2. Theory-based - defining who you want to sample as new theories emerge from your research

  3. Criterion - selecting people based on particular criteria relevant to the research

  4. Snowball - those who you survey recommend others


Next step