I often get asked “What sample size do I need to get a representative sample?” The problem is that this question is not formulated correctly.
Sample size and representativeness are two related but different issues. The sheer size of a sample does not guarantee that it accurately represents a target population. A large unrepresentative sample can perform as badly as a small unrepresentative one.
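To make this concrete, here is a minimal simulation sketch. Everything in it is invented for illustration: a hypothetical population of 100,000 people, 30% of whom are hard to reach, and a yes/no question that the two groups answer differently (20% "yes" among the hard to reach, 60% among everyone else). It compares a small random sample of the whole population with small and large samples drawn only from the easy-to-reach group.

```python
import random

def proportion(sample):
    """Share of people in the sample who answered 'yes'."""
    return sum(answer for _, answer in sample) / len(sample)

def run_simulation(seed=0):
    rng = random.Random(seed)

    # Hypothetical population: each person is (hard_to_reach, says_yes).
    # All of these rates are assumptions chosen for illustration.
    population = []
    for _ in range(100_000):
        hard_to_reach = rng.random() < 0.30
        p_yes = 0.20 if hard_to_reach else 0.60
        population.append((hard_to_reach, rng.random() < p_yes))

    # A sampling frame that leaves out the hard-to-reach group entirely.
    frame = [person for person in population if not person[0]]

    return {
        "true_value": proportion(population),
        "small_random": proportion(rng.sample(population, 500)),
        "small_biased": proportion(rng.sample(frame, 500)),
        "large_biased": proportion(rng.sample(frame, 20_000)),
    }

results = run_simulation()
for name, value in results.items():
    print(f"{name:>13}: {value:.3f}")
```

Under these assumptions, the population value should sit near 48%, while both samples drawn from the incomplete frame should land near 60%, whether they contain 500 or 20,000 people. Making the biased sample larger only makes the wrong answer more precise.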
A survey sample’s ability to represent a population has to do with the sampling frame, that is, the list from which the sample is selected. When some parts of the target population are not included in the sampled population, we are faced with selection bias, which prevents us from claiming that the sample is representative of the target population.
Selection bias can occur in different ways:
- Convenience sample: This includes respondents who are easier to select or who are most likely to respond. This sample will not be representative of harder-to-select individuals. Samples from online panels are a good example of convenience samples. These panels are composed of individuals who have expressed interest in participating in surveys, leaving out individuals who may be part of the target population but are not available for interviewing through the panel.
- Undercoverage: This happens when we fail to include all the target population in the sampling frame. Many online panels work hard at avoiding undercoverage bias, but the fact remains that certain demographics are underrepresented. For example, it is difficult to field online studies targeted at the total Hispanic population in the US without using a hybrid data collection approach that allows us to reach unacculturated Hispanics, who are usually underrepresented in most online panels. Coverage bias is also found in phone surveys that use telephone list sampling frames that exclude households without landline access. As more households substitute cell phones for their landlines, obtaining representative samples of certain demographic groups will soon be difficult without including cell phone lists in the sampling frame.
- Nonresponse: Selection bias also takes place when we fail to obtain responses from all respondents in the selected sample. Nonrespondents tend to differ from respondents, so their absence in the final sample makes it difficult to generalize the results to the overall target population. This is why the design of a survey is far more important than the absolute sample size to get a representative sample of the target population.
- Judgment sample: This is a sample selected using “representative” criteria that rest on prior knowledge of the topic or target population. An example would be a study that needs a sample of teenagers and tries to intercept them at an intersection near a high school.
- Misspecification of target population: This happens when we use, intentionally or unintentionally, screening criteria that leave out important subgroups of the population.
- Poor data collection quality: This can introduce selection bias when quality controls are too weak to ensure that we interview the designated members of the sample. An example is allowing whoever is available in the household to take the survey instead of the member who was selected through the screening criteria.
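For undercoverage and nonresponse in particular, the size of the resulting bias follows from simple arithmetic: the mean of the people we can reach differs from the mean of the whole population by the share of people left out multiplied by the gap between the two groups. The sketch below shows that identity. The function name and the numbers in the usage example are hypothetical, chosen only for illustration.

```python
def exclusion_bias(share_excluded, mean_included, mean_excluded):
    """Bias of the included group's mean relative to the full population mean.

    share_excluded: fraction of the target population missing from the
                    frame (undercoverage) or not responding (nonresponse).
    mean_included:  average answer among the people we can actually reach.
    mean_excluded:  average answer among the people we miss.
    """
    return share_excluded * (mean_included - mean_excluded)

# Hypothetical landline-only phone survey: 40% of households have no
# landline, 55% of landline households answer "yes", and 35% of the
# excluded households would have answered "yes".
bias = exclusion_bias(share_excluded=0.40, mean_included=0.55, mean_excluded=0.35)

# Cross-check against the population mean computed directly.
population_mean = 0.60 * 0.55 + 0.40 * 0.35
print(round(bias, 4), round(0.55 - population_mean, 4))
```

Sample size does not appear anywhere in this calculation. The bias shrinks only when fewer people are left out or when the people left out resemble the people included, which is a matter of sample source and survey design.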
So when it comes to getting a representative sample, sample source is more important than sample size. If you want a representative sample of a particular population, you need to ensure that:
- The sample source includes the whole target population
- The selected data collection method (online, phone, paper, in person) can reach individuals with characteristics typical of the population of interest
- The screening criteria truly reflect the target population
- You can minimize nonresponse bias with good survey design, incentives and the appropriate contact method
- There are quality controls in place during the data collection process to guarantee that designated members of the sample are reached
For help with sample size calculation, use our Sample Size and Margin of Error Calculators.