Why a Large Sample Doesn’t Guarantee a Representative Sample


Let me say it plainly: a large sample doesn’t guarantee a representative sample.

I often get asked, “What sample size do I need to get a representative sample?” The problem is that this question is not formulated correctly. 

Sample size and representativeness are two related but different issues. The sheer size of a sample is not a guarantee of its ability to accurately represent a target population. Large unrepresentative samples can perform as badly as small unrepresentative samples.
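A small simulation makes this concrete. The sketch below assumes a made-up population of 100,000 people in which only 30% can be reached through the sampling frame, and the reachable group holds some opinion more often (70%) than everyone else (40%). All of these numbers are hypothetical and chosen only for illustration.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 1 = holds the opinion, 0 = does not.
reachable = [1] * 21_000 + [0] * 9_000      # 30,000 people, 70% hold the opinion
unreachable = [1] * 28_000 + [0] * 42_000   # 70,000 people, 40% hold the opinion
population = reachable + unreachable


def share(sample):
    """Proportion of the sample that holds the opinion."""
    return sum(sample) / len(sample)


true_share = share(population)  # 49,000 / 100,000 = 0.49

# Two samples drawn only from the reachable group (a restricted frame),
# and one simple random sample drawn from the whole population.
small_biased = share(rng.sample(reachable, 500))
large_biased = share(rng.sample(reachable, 20_000))
small_random = share(rng.sample(population, 500))

for label, estimate in [
    ("restricted frame, n=500", small_biased),
    ("restricted frame, n=20,000", large_biased),
    ("random sample, n=500", small_random),
]:
    error = abs(estimate - true_share)
    print(f"{label}: estimate={estimate:.3f}, error={error:.3f}")
```

Under these assumptions, both restricted-frame samples should land near 70% while the true share is 49%. Going from 500 to 20,000 interviews only makes the wrong answer more precise, whereas the random sample of 500 should fall much closer to the truth.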

A survey sample’s ability to represent a population depends on the sampling frame, that is, the list from which the sample is selected.

When we exclude some groups from the sampled population, we are faced with selection bias, which prevents us from claiming that the sample is representative of the target population.

Sample Selection Bias

Selection bias can occur in different ways:

Convenience sample

This includes respondents who are easier to select or who are most likely to respond. This sample will not be representative of harder-to-select individuals. Samples from online panels are a good example of convenience samples.

These panels include individuals who have expressed interest in participating in surveys, leaving out individuals who may be part of the target population but are not available for interviews through the panel. Cost is often the main reason for using a convenience sample.

Under-coverage

This happens when we fail to include the entire target population in the sampling frame. Many online panels work hard at avoiding under-coverage bias, but the fact remains that certain demographics are underrepresented.

For example, it is difficult to field online studies targeted at the total Hispanic population in the US without using a hybrid data collection approach that allows us to reach unacculturated Hispanics. This group is usually underrepresented in most online panels.

We also see coverage bias in phone surveys that use telephone list sampling frames that exclude households without landline access. As more households substitute cell phones for their landlines, obtaining representative samples of certain demographic groups will soon be difficult without including cell phone lists in the sampling frame.

Nonresponse

Selection bias also takes place when we fail to obtain responses from everyone in the selected sample. Nonrespondents tend to differ from respondents, so their absence in the final sample makes it difficult to generalize the results to the overall target population. This is why the design of a survey is far more important than the absolute sample size to get a representative sample of the target population.
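The effect of nonresponse can be sketched the same way. The example below assumes a hypothetical customer base that is split evenly between satisfied and dissatisfied customers, and assumes that satisfied customers answer the survey three times as often. The response rates are invented for illustration, not taken from any real study.

```python
import random

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Hypothetical customer base: 1 = satisfied, 0 = dissatisfied (50/50 split).
population = [1] * 50_000 + [0] * 50_000

# Step 1: a proper simple random sample of 5,000 invitations.
invited = rng.sample(population, 5_000)

# Step 2: assumed response rates that depend on the answer itself.
RESPONSE_RATE = {1: 0.60, 0: 0.20}
respondents = [x for x in invited if rng.random() < RESPONSE_RATE[x]]

invited_share = sum(invited) / len(invited)
respondent_share = sum(respondents) / len(respondents)

print(f"invited:     n={len(invited)}, satisfied={invited_share:.3f}")
print(f"respondents: n={len(respondents)}, satisfied={respondent_share:.3f}")
```

Even though the invitations are drawn at random, the completed interviews should show roughly 75% satisfaction (0.5 × 0.6 divided by 0.5 × 0.6 + 0.5 × 0.2) against a true value of 50%. Inviting more people would not close that gap; only changing who responds would.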

Judgment Sample

This is a sample we select based on “representative” criteria from prior knowledge of the topic or target population. An example would be a study that needs a sample of teenagers and tries to intercept them at an intersection near a high school.

Misspecification of Target Population

This happens when we use, intentionally or unintentionally, screening criteria that leave out important subgroups of the population.

Poor Data Collection Quality

This can introduce selection bias when quality controls are too weak to ensure that we interview the designated members of the sample. An example is allowing whoever is available in the household to take the survey instead of the member designated by the screening criteria.

Guidelines for a Representative Sample

So when it comes to getting a representative sample, the sample source is more important than the sample size. If you want a representative sample of a particular population, you need to ensure that:

  • The sample source includes the whole target population
  • The selected data collection method (online, phone, paper, in-person) can reach individuals with characteristics typical of the population of interest
  • The screening criteria truly reflect the target population
  • You can minimize nonresponse bias with good survey design, incentives, and the appropriate contact method
  • There are quality controls in place during the data collection process to ensure that you reach the designated members of the sample

For help on sample size calculation use our Sample Size and Margin of Error Calculators.
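For reference, here is a minimal sketch of the textbook margin-of-error and sample-size formulas for a proportion. It assumes simple random sampling, a 95% confidence level (z = 1.96), and the conservative p = 0.5. The function names are hypothetical, and this is not necessarily how the calculators mentioned above work.

```python
import math

Z_95 = 1.96  # z-score for a 95% confidence level


def margin_of_error(n, p=0.5, z=Z_95):
    """Margin of error for a proportion p estimated from n random interviews."""
    return z * math.sqrt(p * (1 - p) / n)


def required_sample_size(moe, p=0.5, z=Z_95):
    """Smallest n whose margin of error does not exceed moe."""
    return math.ceil(z ** 2 * p * (1 - p) / moe ** 2)


print(f"n=400:    +/-{margin_of_error(400):.3f}")     # +/-0.049
print(f"n=10,000: +/-{margin_of_error(10_000):.4f}")  # +/-0.0098
print(f"n needed for +/-5%: {required_sample_size(0.05)}")  # 385
```

These formulas describe random sampling error only. They say nothing about selection bias, so a narrow margin of error on a sample drawn from a poor source is not evidence of representativeness.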

(An earlier version of this article was originally published on May 13, 2010. The article was last updated and revised on August 14, 2019.)

Comments

Sample Questionnaires Posted: June 7, 2010

A convenience sample describes people who are interested in participating in a survey. What do we call the others?

Michaela Mora Posted: June 7, 2010

When we select a sample from a sample frame where we know the probability of someone being chosen, we call that a random sample. They are given a chance to participate regardless of whether they want to participate or not.

CJ Posted: June 23, 2010

Actually, a convenience sample refers to how people are chosen to participate (e.g., everyone who shows up at a clinic on a certain day that is chosen for data collection), not whether they opted to participate.

Michaela Mora Posted: June 23, 2010

There is an opt-in component when we talk about online panels, not necessarily particular studies. When we use online panels, we get samples from people who happen to be in the panel, the same way you get those who happen to show up at the clinic. Panel samples are non-probability samples. Not all members of the population have a chance of being chosen; only those who are part of the panel do.

Example Questionnaire Posted: August 3, 2011

Indeed, it’s not so hard to calculate a representative sample; the hard part is conducting the survey on this sample, because there are always people who don’t want to take part in a survey.
