Sampling, representativeness and validity

Before you choose the method that you’ll use to collect data, you need to think about the size of the population you are studying (e.g. the visitors to your museum or online exhibition), the characteristics of that population, and how much data (the sample) you need for it to be representative enough that you can report what you learn with confidence.

Intended learning outcomes

This page is designed to help you:

  • Understand the principles of sampling, representativeness and validity for the type of data collection that you are likely to be doing.

  • Apply the principles to your data collection plan so that you can work towards getting a good sample.

  • Feel confident when reporting your results.

Where do I start?

While this might seem like a challenging part of Phase two - and it’s not easy - it can be broken into several key components.

Tip.

Start writing up your methodology right away.

1. Define the population.

Population = the whole group at the centre of the research question.

Who are you trying to learn from? This is your population. You’ve already done this in Phase one - you should know who your key stakeholder(s) are. Ask yourself: can - or should - you survey the whole population?

If you are interested in feedback from the general public who might visit your website, this is a very big population. In our own case, for example, we can include all Europeana Network Association (ENA) members in our population, but not the wider heritage workforce in Europe who are not members.

If, however, we’re only interested in one segment of the Network membership - educators - this segment becomes our target population. This is a much smaller population.

What are the characteristics of the group? Is your target population homogeneous (similar) or heterogeneous (different)? You can go ahead and ask yourself whether you will collect and segment your data based on characteristics like age, gender or location. If you have a diverse and heterogeneous group, this has implications for your sample.

2. Think about validity.

There are two types of validity. They raise different questions that help you determine your sample, choose your method and shape your data collection plan.

Internal validity - to what extent are the outcomes you are observing actually the result of your activity, rather than of other external influences (variables)? Here you are thinking about causality and attribution. We can increase internal validity by controlling for other variables. For example, if we were to survey the Europeana Network Association (ENA) membership, we might consider focussing our study on those who are not involved in European projects where Europeana is a partner, because such involvement might itself explain why a person counts themselves as ‘active’ in the Network, though there may be other factors too.

External validity - to what extent can your findings be applied to the real world and to other settings (e.g. if you were to repeat the study)? This is the issue of generalisability, or ‘transferability’ in qualitative studies. You can normally generalise results to other groups (or samples) who share the same characteristics. For example, when surveying ENA members, we might compare results across different clusters to see to what extent the results are generalisable (i.e. that they hold true for all groups and not just one specific sample).

See more in this Guide to Experimental Design | Overview, Steps, & Examples.

Tip.

It is useful to ask the people you are surveying whether they attribute any positive impact to your intervention. Be careful how you ask this, as some people may feel uncomfortable saying ‘no’, e.g. because you are the organiser and you are the one asking the question, or because culturally this is seen as very rude. An external pair of hands in data collection is sometimes valuable for this reason.

3. Agree your confidence interval and confidence level.

Confidence intervals, confidence levels and margins of error: how do we use them?

  • Primarily in quantitative research (numbers)

  • To share your perceptions about the representativeness of the sample and the robustness of the findings

  • To help others to interpret the data

  • To guide you when you use a tool to determine the sample for your research

Confidence means probability. If you were to repeat the same study, how much would the results differ? For example, would they be within five percentage points above or below your original result? Ten? This range is your confidence interval. The confidence level is how sure you are that the results would fall within that range: a 95% confidence level means that you would expect this in about 95 out of 100 repetitions of the study, and a 90% confidence level in about 90 out of 100.

What level of risk or uncertainty am I willing to accept?

You should ask yourself what confidence interval and confidence level you would expect and/or find acceptable. You can use these both to report on your findings and, even more importantly, to plan your sampling approach. If you accept that the results will not be easily generalisable, you can proceed with a lower confidence level. 95% is a common and widely accepted confidence level, i.e. you are 95% sure that if you repeated the survey, the results would lie within these parameters.

A higher confidence level, or a narrower confidence interval, demands a bigger sample. Resource limitations, and a willingness to accept some degree of uncertainty, mean that a lower confidence level may be appropriate.
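The link between confidence level and sample size can be sketched in a few lines of code. This is an illustration rather than a prescribed method: it assumes a simple random sample, a large population, a question with a yes/no style answer and the most cautious 50/50 split, and it uses the standard normal-approximation formula n = z² × p(1 − p) / e². The function name `required_sample_size` and its parameters are our own, chosen for this example.

```python
import math
from statistics import NormalDist


def required_sample_size(confidence_level, margin_of_error, proportion=0.5):
    """Approximate number of responses needed for a large population (illustrative only)."""
    # z-score for a two-sided interval, e.g. about 1.96 for a 95% confidence level
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    # proportion=0.5 is the most cautious assumption: it gives the largest sample
    n = (z ** 2) * proportion * (1 - proportion) / margin_of_error ** 2
    # Round up, because you cannot collect a fraction of a response
    return math.ceil(n)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence, ±5%: {required_sample_size(level, 0.05)} responses")
```

Under these assumptions, keeping the same ±5% range but moving from a 90% to a 99% confidence level more than doubles the number of responses needed.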

What is a margin of error?

  • Helps you ascertain the sample size

  • Looks similar to the confidence level but is different: it is the range (plus or minus) around your result, not how sure you are of it

  • Expressed as

    • ⧮ (or similar)

    • Percentage (e.g. 9%)

    • ± 9

4. Calculate your sample.

Sample = a small part of the whole population intended to show what the group is like or experiences

Sampling = gathering information or data from a subset of a larger population, rather than from everyone

You now need to work out how you will sample the (target) population/stakeholder. This is based on an understanding that you can’t hear from everyone, so you have to try to get a sample that is representative enough so that you can report confidently on what you have learned. How can the sample be representative of the whole population?

A common rule of thumb is that 10% of the population is a good sample, as long as that population is homogeneous (that is to say, a group made up of people with similar backgrounds or experience, for example). This is likely to be the case if you focus on a target population, but less likely if you focus on a general population. With more heterogeneous groups, you need a bigger sample.

The sample you need will define what method you use, and each method has different considerations for agreeing your sample. Below we think about how you can work out the sample you need based on two of the most commonly used data collection methods.

Sampling, rather than collecting data from everyone, has several advantages:

  • Cost efficiency: collecting data from an entire population can be expensive and time-consuming. Sampling helps you to collect enough information with fewer resources.

  • Time efficiency: it may be impractical or impossible to collect data from a whole population, especially when the population is large or constantly changing. Sampling allows researchers to obtain results more quickly.

  • Feasibility: in some cases, it may be impossible to study an entire population due to logistical constraints.

  • Accuracy: good sampling can provide accurate estimates of population experiences. Statistical methods are used to make inferences from the sample to the population, and these methods are well-established and reliable. (See more below)

  • Fewer errors: sampling reduces the chances of errors that can occur when trying to collect data from every member of a (large) population.

  • Ethical considerations: in some situations, it may be ethically or practically inappropriate to collect data from every member of a population.

  • Generalisability: if a sample is chosen correctly and represents the population well, the results from the sample can be generalised to the entire population, meaning that researchers can draw conclusions about a larger group based on the sampled data.

  • Variability management: knowing more about the characteristics of a sample and the variables that may affect their experiences helps you to manage the variability that may exist within a population.

Questionnaires

10% is the minimum you should aim for in a representative survey sample. This applies until you reach 1,000 responses. For example, if you only have 600 visitors, try to collect at least 60 responses.

Once you have collected 1,000 responses, you should normally have a good sample, no matter how big your population. For example, if you have 60,000 visitors, you don’t need to collect 6,000 responses - 1,000 should normally give you a representative perspective (see more in How to choose a sample size (for the statistically challenged) - tools4dev).
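The rule of thumb in this section can be written down as a small function. This is only a sketch of the guidance above (10% of the population, capped at 1,000 responses), not a statistical calculation; the function name `rule_of_thumb_sample` and its parameters are our own.

```python
import math


def rule_of_thumb_sample(population_size, percent=10, cap=1000):
    """Questionnaire target from the rule of thumb: 10% of the population, up to 1,000."""
    # Round up so that small populations never fall below the 10% minimum
    target = math.ceil(population_size * percent / 100)
    # Beyond the cap, extra responses add little, whatever the population size
    return min(target, cap)


print(rule_of_thumb_sample(600))     # 60
print(rule_of_thumb_sample(60_000))  # 1000
```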

Sample sizes in qualitative research

Qualitative methods like interviews are likely to result in rich, qualitative data. The number of responses, or of people that you interview, matters less in this context.

For a small population size, you might interview everyone involved. This will give you a very complete insight into the perspectives of the population.

For a big population size, it’s unlikely that you will have the time or the money to interview 10% of the population. Rather, you should aim for a smaller sample that is representative of the overall group. If your overall group is very diverse, think about whose perspectives you need most (e.g. educators) and how many people you would need to interview to get their perspectives. You should also think about how many people you would need to interview before you start seeing the same patterns or trends - this is called saturation. After this point, there is less value in interviewing more people because you do not learn anything new.
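One way to keep an eye on saturation is to note the themes that come up in each interview and watch for the point where new ones stop appearing. The sketch below is an illustration only: the themes are invented, and treating three interviews in a row without a new theme as saturation (the `quiet_run` parameter) is an assumption for this example, not a fixed rule.

```python
def saturation_point(interview_themes, quiet_run=3):
    """Return the number of the last interview that added a new theme.

    interview_themes: one set of themes per interview, in the order conducted.
    quiet_run: how many interviews in a row without a new theme count as
               saturation (an assumption for this sketch).
    Returns None if saturation has not been reached yet.
    """
    seen = set()
    without_new = 0
    for number, themes in enumerate(interview_themes, start=1):
        new_themes = set(themes) - seen
        seen |= new_themes
        without_new = 0 if new_themes else without_new + 1
        if without_new == quiet_run:
            return number - quiet_run
    return None


# Hypothetical themes noted in six interviews
interviews = [
    {"access", "reuse"},
    {"reuse", "copyright"},
    {"access", "training"},
    {"copyright"},
    {"reuse", "training"},
    {"access"},
]
print(saturation_point(interviews))  # 3
```

Here the fourth, fifth and sixth interviews only repeat themes already heard, so the third interview is the last one that taught us something new.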

The sample therefore depends on how homogeneous (how similar) or heterogeneous (how different) your population is. If you have a very heterogeneous or diverse population, you need to interview more people. If your population is homogeneous, you might only need to interview a small group. There is therefore no fixed right or wrong interview sample size, though some sources suggest that between 10 and 30 interviews are usually sufficient.

The sample will also depend, of course, on how much time you have available to conduct interviews. You have to be realistic and consider what you want to know and what you can do to get this perspective. If you need more data than you think you can get through interviews, consider combining them with another method, like a survey.

Tips:

  • Be clear on the diversity of the population and your target population.

  • Define clear criteria for who should be interviewed.

  • Be realistic about the time it takes to schedule, prepare for, deliver and transcribe an interview. This might have to inform your sample size.

  • If the sample is big, consider first sending a questionnaire then following up with interviews.

  • Acknowledge the time demands of data collection and analysis.

Tips:

  • Don’t let the complexity stop you from tackling quantitative data analysis

  • Use the principles to define your ideal sample size

  • Use margin of error or sample size calculators - look at different sample size calculators to help you determine what sample size will work for you.

Such tools ask you to consider: 

  • The population size (e.g. everyone who participated in an event); 

  • The sample size that you were able to survey (in terms of numbers or the percentage of respondents); and 

  • Your confidence level, namely, how confident you are (up to 100%) that the sample that you surveyed has the same attitudes or perspectives as the overall population.

The calculator then works out your margin of error, which you should ideally report with your findings. Based on observation in the non-academic cultural sector, such margins of error are rarely reported.

Examples
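As a worked example, the sketch below shows roughly what a margin of error calculator does with the three inputs listed above. It is an illustration under stated assumptions, not the method of any particular tool: it assumes a simple random sample, uses the normal approximation with the most cautious 50/50 split of answers, and applies a finite population correction. The function name `margin_of_error` is our own, and the figures (600 participants, 60 responses) are hypothetical.

```python
import math
from statistics import NormalDist


def margin_of_error(population_size, sample_size, confidence_level=0.95, proportion=0.5):
    """Approximate margin of error for a survey percentage, as a fraction (0.05 = ±5%)."""
    # z-score for a two-sided interval, e.g. about 1.96 for a 95% confidence level
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    standard_error = math.sqrt(proportion * (1 - proportion) / sample_size)
    # Finite population correction: the margin shrinks as the sample
    # approaches the size of the whole population
    correction = math.sqrt((population_size - sample_size) / (population_size - 1))
    return z * standard_error * correction


# Hypothetical event: 600 participants, 60 questionnaire responses
moe = margin_of_error(population_size=600, sample_size=60)
print(f"±{moe:.1%} at a 95% confidence level")
```

Under these assumptions, 60 responses from 600 participants give a margin of error of roughly ±12%, while 1,000 responses from 60,000 visitors give roughly ±3%.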


Types of sampling

  1. Simple random sampling - randomly generate a list of the people who you will survey out of a bigger population.

  2. Stratified sampling - the population is divided into separate groups or ‘strata’ whose members share the same characteristics (e.g. the target population). Then each of these groups is randomly sampled.

  3. Cluster sampling - the whole population is broken up into a number of clusters, a random selection of those clusters is surveyed, and the results are compared.

  4. Systematic sampling - sampling at a regular interval.

  1. Maximum variation - surveying a diversity of the population

  2. Theory-based - defining who you want to sample as new theories emerge from your research

  3. Criterion - selecting people based on particular criteria relevant to the research

  4. Snowball - those who you survey recommend others
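Three of the random approaches listed above (simple random, stratified and systematic sampling) can be sketched with Python's built-in `random` module. This is a minimal illustration, not a recommended tool: the function names, the membership list and the 10% share are all hypothetical, and cluster sampling and the purposive approaches are left out because they depend on judgements that code cannot make for you.

```python
import random


def simple_random_sample(people, size, seed=None):
    """Pick `size` people completely at random, without repeats."""
    return random.Random(seed).sample(people, size)


def systematic_sample(people, interval, seed=None):
    """Pick every `interval`-th person, starting from a random position."""
    start = random.Random(seed).randrange(interval)
    return people[start::interval]


def stratified_sample(people, group_of, share, seed=None):
    """Randomly sample the same share of each group (stratum)."""
    rng = random.Random(seed)
    strata = {}
    for person in people:
        strata.setdefault(group_of(person), []).append(person)
    sample = []
    for members_of_group in strata.values():
        # At least one person per group, so that small groups are not lost
        count = max(1, round(len(members_of_group) * share))
        sample.extend(rng.sample(members_of_group, count))
    return sample


# Hypothetical membership list of (name, role) pairs: 50 educators, 150 others
members = [(f"member{i}", "educator" if i % 4 == 0 else "other") for i in range(200)]

print(len(simple_random_sample(members, 20, seed=1)))
print(len(systematic_sample(members, 10, seed=1)))
print(len(stratified_sample(members, lambda m: m[1], 0.10, seed=1)))
```

Passing a `seed` makes the selection repeatable, which is useful when you write up your methodology and want to show how the sample was drawn.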


Next step