Before you choose the method that you’ll use to collect data, you need to think about the population you are studying (e.g. the group of visitors to your museum or online exhibition) and what amount of data (the sample) will be representative enough so you can report what you learn with confidence.
Intended learning outcomes
This page is designed to help you:
Understand the principles of sampling, representativeness and validity for the type of data collection that you are likely to be doing.
Apply the principles to your data collection plan so that you can work towards getting a good sample.
Feel confident when reporting your results.
Where do I start?
While this might seem like a challenging part of Phase two - and it’s not easy - it can be broken into several key components.
Tip.
Start writing up your methodology right away.
1. Define the population.
Who are you trying to learn from? This is your population. You’ve already done this in Phase one - you should know who your key stakeholder(s) are.
If you are interested in feedback from the general public who might visit your website, this is a very big population. For example, we can include all Europeana Network Association (ENA) members in our population, but we don't include the wider heritage workforce in Europe who are not members.
However, we’re only interested in a segment of the Network membership - educators - this becomes our target population. This is a much smaller population.
Is your target population homogenous (similar) or heterogenous (different)? You can go ahead and ask yourself whether you will collect and segment your data based on characteristics like age, gender or location. If you have a diverse, heterogenous group, this has implications for your sample.
2. Think about validity.
There are two types of validity. They raise different questions that help you determine your sample, choose your method and shape your data collection plan.
Internal validity - to what extent are the outcomes you are observing actually the result of your activity, rather than other external influences (variables)? You can think about things like causality and attribution. We can increase our internal validity by controlling other variables. For example, if we were to survey the Europeana Network Association (ENA) membership, we might consider focussing our study on those who are not involved in European projects where Europeana is a partner, since project involvement might otherwise explain why a person counts themselves as 'active' in the Network, though there may be other factors too.
External validity - to what extent can your findings be applied to the real world and to other settings (e.g. if you were to repeat the study)? This is the issue of generalisability, or 'transferability' in qualitative studies. You can normally generalise results to other groups (or samples) who share the same characteristics. For example, when surveying ENA members, we might try to compare results across different clusters to see to what extent the results are generalisable (i.e. that it is true for all groups and not just one specific sample).
See more in this guide to experimental design: https://www.scribbr.com/methodology/experimental-design/.
Tip.
It is useful to ask those that you are surveying if they attribute any positive impact to your intervention. You should be careful how you ask this, as some people may feel uncomfortable in saying ‘no’, e.g. because you are the organiser and you are asking the question, or because culturally, this is seen as very rude. An external pair of hands in data collection is sometimes valuable for this reason.
3. Agree your confidence interval and confidence level.
Looking back at external validity above, it's important now to think about your confidence interval. If you were to repeat the same study, how much could the results differ from the original? For example, would they fall within five points above or below? Ten? This range is your confidence interval (also called the margin of error). Your confidence level, by contrast, is how certain you can be that repeated results would fall within that range: a ±5 interval at a 95% confidence level means that in 95 out of 100 repeats, the results would land within five points of the original.
You should ask yourself what confidence interval you would expect and/or find acceptable. You can use this both to report on your findings and, even more importantly, to plan your sampling approach. If you can accept that the results are not easily generalisable, then you can proceed with a lower confidence level.
95% is a fairly common and accepted confidence level.
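To make the relationship between sample size, confidence interval and confidence level concrete, here is a minimal Python sketch of the margin of error for a proportion measured from a simple random sample. It uses the standard normal approximation; this formula is an assumption on our part, as the section above does not prescribe one.

```python
import math

def margin_of_error(sample_size, z=1.96, proportion=0.5):
    """Approximate margin of error (as a fraction) for a proportion
    estimated from a simple random sample, using the normal
    approximation. z = 1.96 corresponds to a 95% confidence level;
    proportion = 0.5 is the most conservative assumption."""
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)

# With 100 responses the 95% interval is roughly +/- 10 points;
# with 400 responses it narrows to roughly +/- 5 points.
print(round(margin_of_error(100) * 100, 1))  # 9.8
print(round(margin_of_error(400) * 100, 1))  # 4.9
```

Note how quadrupling the sample only halves the interval: precision gets more expensive as you go.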
4. Calculate your sample.
You now need to work out how you will sample the (target) population/stakeholder. This is based on an understanding that you can’t hear from everyone, so you have to try to get a sample that is representative enough so that you can report confidently on what you have learned.
Normally people say that 10% is a good sample, as long as the 10% is drawn from a homogenous population (that is to say, a group made up of people with similar backgrounds, experiences or ages, for example). This is likely to be the case if you focus on a target population, but less likely if you focus on a general population. With more heterogenous groups you need a bigger sample size.
The sample you need will define what method you use, and each method has different considerations for agreeing your sample. Below we think about how you can work out the sample you need based on two of the most commonly used data collection methods.
Questionnaires
10% is the minimum you should aim for to get a representative survey sample. This rule applies until you reach 1,000 responses. For example, if you only have 600 visitors, try to collect at least 60 responses.
After you collect 1,000 responses, no matter how big your population size, you should normally have a good sample. For example, if you have 60,000 visitors, you don't need to collect 6,000 responses - 1,000 should normally give you a representative perspective (see more in https://tools4dev.org/resources/how-to-choose-a-sample-size/).
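As a minimal sketch, the rule of thumb above (10% of the population, capped at 1,000 responses) could be written as a small helper. The percentage and the cap come from the guidance cited above, not from a statistical formula.

```python
def target_responses(population_size, fraction=0.10, cap=1000):
    """Rough target number of questionnaire responses: 10% of the
    population, capped at 1,000 (a rule of thumb, not a strict
    statistical requirement)."""
    return min(round(population_size * fraction), cap)

print(target_responses(600))    # 60 - e.g. 600 museum visitors
print(target_responses(60000))  # 1000 - the cap kicks in
```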
Interviews
Interviews collect rich, qualitative data. Numbers of responses, or people that you interview, matter less in this context.
For a small population size, you might interview everyone involved. This will give you a very complete insight into the perspectives of the population.
For a big population size, it’s unlikely that you will have time or the money to interview 10% of the population. Rather, you should aim for a smaller sample that is representative of the overall group. If your overall group is very diverse, think about whose perspectives you need most (e.g. educators) and how many you would need to interview to get their perspectives. You should also think about how many people it would take to interview before you start seeing the same patterns or trends - this is called saturation. After this point, there is less value in interviewing more people because you do not learn anything new.
The sample therefore depends on how homogenous (how much it is the same) or heterogenous (how much it is different) your population is. If you have a very heterogenous or diverse population, you need to interview more people. If your population is homogenous, you might only need to interview a small group. There is therefore no fixed right or wrong with interview sample sizes, though some sources suggest that between 10 and 30 interviews are usually sufficient.
The sample will also depend of course on how much time you have available to conduct interviews. You have to be realistic and consider what you want to know and what you can do to get this perspective. If you need more data than you think you can get through interviews, consider combining them with another method, like a survey.
Tips:
Be clear on the diversity of the population and your target population.
Define clear criteria for who should be interviewed.
Be realistic about the time it takes to schedule, prepare for, deliver and transcribe an interview. This might have to inform your sample size.
Tip.
Look at different sample size calculators to help you determine what sample size will work for you.
Such tools ask you to consider:
The population size (e.g. everyone who participated in an event);
The sample size that you were able to survey (in terms of numbers or the percentage of respondents); and
Your confidence level, namely how confident you are (up to 100%) that the sample you surveyed holds the same attitudes or perspectives as the overall population.
The calculator then works out your margin of error, which you should ideally report with your findings. Based on observation in the non-academic cultural sector, such margins of error are rarely reported.
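Such a calculator can be sketched in a few lines of Python. This version uses Cochran's formula with a finite population correction and the conservative 50% proportion that most online calculators default to; the specific numbers are illustrative, not taken from the text above.

```python
import math

def sample_size(population_size, margin_of_error=0.05, z=1.96, proportion=0.5):
    """Sample size needed for a given margin of error, via Cochran's
    formula with a finite population correction. z = 1.96 corresponds
    to a 95% confidence level; proportion = 0.5 is the conservative
    default used by most online sample size calculators."""
    n0 = (z ** 2) * proportion * (1 - proportion) / (margin_of_error ** 2)
    n = n0 / (1 + (n0 - 1) / population_size)  # finite population correction
    return math.ceil(n)

# For a hypothetical population of 4,000 members, a +/-5 point
# margin of error at 95% confidence:
print(sample_size(4000))  # 351
```

Running the same calculation at a ±10 margin instead of ±5 shrinks the required sample considerably, which is why agreeing your acceptable margin of error early (step 3 above) matters for planning.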
Examples
Types of sampling
There are four main types of probability sampling used in quantitative research.
Simple random sampling
Randomly generate a list of the people who you will survey out of a bigger population. For example, out of 4,000 Europeana Network Association (ENA) members, we might use a random sampling tool to decide who we will survey instead of surveying the whole membership.
Stratified sampling
The population is divided into separate groups, or 'strata', whose members share characteristics (e.g. your target populations). These strata are then randomly sampled. For example, we add all educators in ENA into one group and then randomly select members to survey, and do the same for researchers.
Cluster sampling
Similar to stratified sampling, but the whole population is divided into clusters. A random selection of these clusters is then sampled, in full or in part, and you can compare the results. For example, we group all ENA members into ten clusters, and we randomly sample members in five of these ten groups.
Systematic sampling
Select members of the population at a regular interval that you agree in advance. For example, we would survey every 10th person in a list of ENA members arranged alphabetically.
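The four sampling types above can be illustrated with Python's standard random module. The membership list here is hypothetical (100 invented members tagged with example groups), not real ENA data.

```python
import random

random.seed(42)  # fixed seed so the illustration is reproducible

# Hypothetical membership list: (member_id, group) pairs.
groups = ["educator", "researcher", "developer"]
members = [(i, groups[i % 3]) for i in range(1, 101)]

# Simple random sampling: pick members at random from the whole list.
simple = random.sample(members, 10)

# Stratified sampling: split by a shared characteristic ('strata'),
# then sample randomly within each stratum.
educators = [m for m in members if m[1] == "educator"]
researchers = [m for m in members if m[1] == "researcher"]
stratified = random.sample(educators, 5) + random.sample(researchers, 5)

# Cluster sampling: divide the population into clusters, randomly
# choose some clusters, then sample within the chosen ones.
clusters = [members[i:i + 10] for i in range(0, 100, 10)]  # ten clusters
chosen = random.sample(clusters, 5)
cluster_sample = [m for c in chosen for m in random.sample(c, 2)]

# Systematic sampling: every 10th member from an ordered list.
systematic = members[::10]

print(len(simple), len(stratified), len(cluster_sample), len(systematic))
# prints: 10 10 10 10
```

Each strategy ends up with a sample of ten here, but they differ in who can appear in it, which is exactly the trade-off the descriptions above are getting at.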