Why a Large Sample Doesn’t Guarantee a Representative Sample

Summary: The size of a sample is not a guarantee of its ability to accurately represent a target population. Large unrepresentative samples can perform as badly as small unrepresentative samples.

4 minutes to read. By author Michaela Mora on August 14, 2019
Topics: Analysis Techniques, Market Research Cartoons, Sample Size

Let me say it plainly: a large sample doesn’t guarantee a representative sample.

I often get asked, “What sample size do I need to get a representative sample?” The problem is that this question is not formulated correctly. 

Sample size and representativeness are two related, but different issues. The sheer size of a sample is not a guarantee of its ability to accurately represent a target population. Large unrepresentative samples can perform as badly as small unrepresentative samples.
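To make this concrete, here is a minimal simulation sketch. The population, its size, and the "yes" rates are hypothetical numbers chosen only for illustration. It compares a small and a large sample drawn from an incomplete sampling frame with a small random sample drawn from the whole population.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population of 100,000 people answering a yes/no question.
# 60% are covered by the sampling frame; 70% of them say "yes".
# 40% are missing from the frame; only 30% of them say "yes".
covered = [1] * 42_000 + [0] * 18_000
excluded = [1] * 12_000 + [0] * 28_000
population = covered + excluded
true_rate = sum(population) / len(population)  # 0.54


def estimate(frame, n):
    """Share of "yes" answers in a simple random sample of size n from frame."""
    sample = rng.sample(frame, n)
    return sum(sample) / n


biased_small = estimate(covered, 200)      # small sample, incomplete frame
biased_large = estimate(covered, 20_000)   # large sample, incomplete frame
random_small = estimate(population, 200)   # small sample, complete frame

print(f"true rate:                  {true_rate:.3f}")
print(f"incomplete frame, n=200:    {biased_small:.3f}")
print(f"incomplete frame, n=20,000: {biased_large:.3f}")
print(f"complete frame, n=200:      {random_small:.3f}")
```

Under these assumptions, both samples from the incomplete frame settle near 70%, far from the true 54%. Increasing the sample size only makes the wrong answer more precise, while the small random sample from the complete frame typically lands much closer to the truth.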

A survey sample’s ability to represent a population depends on the sampling frame, that is, the list from which the sample is selected.

When we exclude some groups from the sampled population, we are faced with selection bias, which prevents us from claiming that the sample is representative of the target population.

Sample Selection Bias

Selection bias can occur in different ways:

Convenience sample

This includes respondents who are easier to select or who are most likely to respond. This sample will not be representative of harder-to-select individuals. Samples from online panels are a good example of convenience samples.

These panels include individuals who have expressed interest in participating in surveys, leaving out individuals who may be part of the target population but cannot be reached through the panel. Cost is often the main reason for using a convenience sample.

Under-coverage

This happens when we fail to include the entire target population in the sampling frame. Many online panels work hard at avoiding under-coverage bias, but the fact remains that certain demographics are underrepresented.

For example, it is difficult to field online studies targeted at the total Hispanic population in the US without using a hybrid data collection approach that allows us to reach unacculturated Hispanics. This group is usually underrepresented in most online panels.

We also see coverage bias in phone surveys that use telephone list sampling frames that exclude households without landline access. As more households substitute cell phones for their landlines, obtaining representative samples of certain demographic groups will soon be difficult without including cell phone lists in the sampling frame.

Nonresponse

Selection bias also takes place when we fail to obtain responses from everyone in the selected sample. Non-respondents tend to differ from respondents, so their absence from the final sample makes it difficult to generalize the results to the overall target population. This is why the design of a survey is far more important than the absolute sample size to get a representative sample of the target population.
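The effect of nonresponse can be sketched with simple arithmetic. If we treat people as either respondents or non-respondents, the bias of the respondent average equals the nonresponse rate multiplied by the difference between the two groups. The function name and the figures below are hypothetical and used only for illustration.

```python
def nonresponse_bias(response_rate, mean_respondents, mean_nonrespondents):
    """Bias of the respondent mean relative to the full-sample mean.

    Assumes each sampled person is either a respondent or a non-respondent,
    so the full-sample mean is a weighted average of the two group means.
    """
    return (1 - response_rate) * (mean_respondents - mean_nonrespondents)


# Hypothetical survey: 25% respond, and 60% of respondents are satisfied,
# while only 40% of non-respondents would have said they are satisfied.
response_rate = 0.25
satisfied_respondents = 0.60
satisfied_nonrespondents = 0.40

bias = nonresponse_bias(response_rate, satisfied_respondents, satisfied_nonrespondents)
full_sample_rate = (response_rate * satisfied_respondents
                    + (1 - response_rate) * satisfied_nonrespondents)

print(f"reported satisfaction:    {satisfied_respondents:.2f}")
print(f"full-sample satisfaction: {full_sample_rate:.2f}")
print(f"nonresponse bias:         {bias:+.2f}")
```

Note that the sample size does not appear anywhere in this calculation. In this sketch, the bias shrinks only if the response rate rises or if respondents and non-respondents become more alike.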

Judgment Sample

This is a sample we select based on “representative” criteria drawn from prior knowledge of the topic or target population. An example would be a study that needs a sample of teenagers and tries to intercept them at an intersection near a high school.

Misspecification of Target Population

This happens when we use, intentionally or unintentionally, screening criteria that leave out important subgroups of the population.

Poor Data Collection Quality

This can introduce selection bias when quality controls are too weak to ensure that we interview the designated members of the sample. An example is allowing whoever is available in the household to take the survey instead of the member who meets the screening criteria.

Guidelines for a Representative Sample

So when it comes to getting a representative sample, the sample source is more important than the sample size. If you want a representative sample of a particular population, you need to ensure that:

  • The sample source includes the whole target population
  • The selected data collection method (online, phone, paper, in-person) can reach individuals with characteristics typical of the population of interest
  • The screening criteria truly reflect the target population
  • You minimize nonresponse bias with good survey design, incentives, and the appropriate contact method
  • There are quality controls in place during the data collection process to ensure that you reach the designated members of the sample
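One practical way to spot coverage problems is to compare the composition of the final sample with known benchmarks for the target population. The sketch below assumes we have such benchmarks. The age groups, shares, and sample counts are hypothetical, and composition_gaps is an illustrative helper, not part of any library.

```python
from collections import Counter


def composition_gaps(sample_groups, population_shares):
    """Return sample share minus population share for each benchmark group."""
    counts = Counter(sample_groups)
    n = len(sample_groups)
    return {
        group: counts.get(group, 0) / n - share
        for group, share in population_shares.items()
    }


# Hypothetical benchmark shares and a hypothetical sample of 1,000 respondents.
population_shares = {"18-34": 0.30, "35-54": 0.35, "55+": 0.35}
sample_groups = ["18-34"] * 150 + ["35-54"] * 400 + ["55+"] * 450

gaps = composition_gaps(sample_groups, population_shares)
for group, gap in gaps.items():
    print(f"{group}: {gap:+.2f}")
```

A large negative gap, such as the one for the youngest group here, suggests that the frame or the data collection method is not reaching that group. A sample that matches the benchmarks can still be unrepresentative on characteristics we did not measure, so treat this as a warning light rather than proof.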

For help on sample size calculation use our Sample Size and Margin of Error Calculators.

(An earlier version of this article was originally published on May 13, 2010. The article was last updated and revised on August 14, 2019.)