# Sampling, representativeness and validity

## Before you choose the method that you’ll use to collect data, you need to think about the size of the population you are studying (e.g. the group of visitors to your museum or online exhibition), the characteristics of that population and the amount of data (the sample) that will be representative enough for you to report what you learn with confidence.

## Intended learning outcomes

This page is designed to help you:

- Understand the principles of sampling, representativeness and validity for the type of data collection that you are likely to be doing.
- Apply the principles to your data collection plan so that you can work towards getting a good sample.
- Feel confident when reporting your results.

# Where do I start?

While this might seem like a challenging part of Phase two - and it’s not easy - it can be broken into several key components.

**Tip.**

Start writing up your methodology right away.

## 1. Define the population.

*Population = the whole group at the centre of the research question.*

Who are you trying to learn from? This is your *population*. You’ve already done this in Phase one - you should know who your key stakeholder(s) are. Ask yourself: can - or should - you survey the whole population?

If you are interested in feedback from the general public who *might* visit your website, this is a very big population. A population can also be more tightly defined. For example, we can include all Europeana Network Association (ENA) members in our population, but not the wider heritage workforce in Europe who are not members.

However, if we’re only interested in a segment of the Network membership - educators - this becomes our *target population*. This is a much smaller population.

What are the *characteristics* of the group? Is your target population homogeneous (similar) or heterogeneous (different)? You can go ahead and ask yourself whether you will collect and segment your data based on characteristics like age, gender or location. If you have a diverse and heterogeneous group, this has implications for your sample.

## 2. Think about validity.

There are two types of validity. They raise different questions that help you determine your sample, choose your method and shape your data collection plan.

**Internal validity** - to what extent are the outcomes you are observing actually the result of your activity, rather than of other external influences (variables)? You can think about things like causality and attribution. We can increase our internal validity by controlling other variables. For example, if we were to survey the Europeana Network Association (ENA) membership, we might consider focussing our study on those who are not involved in European projects where Europeana is a partner. Involvement in such a project might itself explain why a person counts themselves as ‘active’ in the Network, though there may be other factors too.

**External validity** - to what extent can your findings be applied to the real world and to other settings (e.g. if you were to repeat the study)? This is the issue of generalisability, or ‘transferability’ in qualitative studies. You can normally generalise results to other groups (or samples) who share the same characteristics. For example, when surveying ENA members, we might compare results across different clusters to see to what extent the results are generalisable (i.e. that they are true for all groups and not just one specific sample).

See more at https://www.scribbr.com/methodology/experimental-design/.

**Tip.**

It is useful to ask those that you are surveying if they attribute any positive impact to your intervention. You should be careful how you ask this, as some people may feel uncomfortable saying ‘no’, e.g. because you are the organiser and you are asking the question, or because culturally, this is seen as very rude. An external pair of hands in data collection is sometimes valuable for this reason.

## 3. Agree your confidence interval and confidence level.

**Confidence intervals, confidence levels and margins of error: how do we use them?**

- Primarily quantitative research (numbers)
- Share your perceptions about the representativeness of the sample and robustness of the findings
- Help others to interpret the data
- Guide you to use a tool to determine the sample for your research

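**Confidence means probability.** If you were to repeat the same study, what scale of difference would there be if you compared the results? For example, would they be within five percentage points above or below your original result? Ten? This range is your *confidence interval*. The *confidence level* is how sure you are about that range: at a 95% confidence level, if you repeated the study many times, about 95 in 100 of the resulting intervals would contain the true value for the whole population. The two are set separately, so a range of five above or below does not by itself indicate a 95% confidence level, nor does a range of ten indicate a 90% confidence level.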

**What level of risk or uncertainty am I willing to accept?**

You should ask yourself what confidence interval you would expect and/or find acceptable. You can use this both to report on your findings and, even more importantly, to plan your sampling approach. If you are happy that the results are not easily generalisable, then you can proceed with a lower confidence level. 95% is a fairly common and accepted confidence level: you are 95% sure that if you repeated the survey, the results would lie within these parameters.

A higher confidence level, or a narrower confidence interval, demands a bigger sample. Resource limitations and some degree of uncertainty mean that a lower confidence level may be appropriate.
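
To make the link between confidence level, margin of error and sample size more concrete, here is a minimal Python sketch. It assumes a commonly used formula for estimating a proportion (Cochran's formula with a finite population correction), which this page does not prescribe. The function name `sample_size`, the example population of 2,000 and the default assumed proportion of 0.5 (the most cautious choice) are all illustrative assumptions.

```python
import math
from statistics import NormalDist


def sample_size(population, confidence_level=0.95, margin_of_error=0.05, proportion=0.5):
    """Estimate the number of responses needed (hypothetical helper, not an official tool)."""
    # z-score for a two-sided confidence level, e.g. about 1.96 for 95%
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    # Cochran's formula: the sample needed if the population were very large
    n0 = (z ** 2) * proportion * (1 - proportion) / margin_of_error ** 2
    # Finite population correction: smaller populations need fewer responses
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


# Assumed example: a target population of 2,000 people
print(sample_size(2000, confidence_level=0.95, margin_of_error=0.05))  # 323
print(sample_size(2000, confidence_level=0.90, margin_of_error=0.10))  # 66
```

Under these assumptions, a stricter confidence level and a tighter margin of error push the required sample up sharply. Online sample size calculators may round slightly differently.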

**What is a margin of error?**

- Helps you ascertain the sample size
- Looks similar to the confidence level *but different*
- Expressed as:
  - ⧮ (or similar)
  - Percentage (e.g. 9%)
  - ± 9
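
If you already know how many responses you collected, you can work backwards to the margin of error to report alongside your results. The sketch below is one way to do this, assuming the standard formula for a proportion with a finite population correction. The function name `margin_of_error` and the figures (100 responses from a population of 2,000) are hypothetical and only there for illustration.

```python
import math
from statistics import NormalDist


def margin_of_error(sample, population, confidence_level=0.95, proportion=0.5):
    """Approximate margin of error for a proportion (hypothetical helper)."""
    # z-score for a two-sided confidence level, e.g. about 1.96 for 95%
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    # Standard error of a proportion; 0.5 is the most cautious assumption
    standard_error = math.sqrt(proportion * (1 - proportion) / sample)
    # Finite population correction: shrinks the error when the sample
    # is a sizeable share of the whole population
    correction = math.sqrt((population - sample) / (population - 1))
    return z * standard_error * correction


# Assumed example: 100 responses from a population of 2,000
moe = margin_of_error(sample=100, population=2000)
print(f"± {moe * 100:.1f}%")  # ± 9.6%
```

In this assumed case you could report a result such as "60% of respondents agreed (± 9.6%, 95% confidence level)", which helps others judge how much weight to put on the figure.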

## 4. Calculate your sample.

**Sample** = a small part of the whole population intended to show what the group is like or experiences

**Sampling** = gathering information or data from a subset of a larger population, rather than from everyone

You now need to work out how you will *sample* the (target) population/stakeholder. This is based on an understanding that you can’t hear from everyone, so you have to try to get a *sample* that is representative enough so that you can report confidently on what you have learned. How can the sample be representative of the whole population?

A common rule of thumb is that a sample of 10% is good enough, as long as the population is homogeneous (that is to say, a group made up of people with similar backgrounds or experience, for example). This is likely to be the case if you focus on a target population, but less likely if you focus on a general population. With more heterogeneous groups you need a bigger sample size.

The sample you need will define what method you use, and each method has different considerations for agreeing your sample. Below we think about how you can work out the sample you need based on two of the most commonly used data collection methods.