Testing For Significant Differences In Convenience Samples – What Is The Point?


Most commercial survey research done these days uses convenience samples due to cost and feasibility issues.

Convenience Samples

AAPOR defines convenience sampling as “a form of non-probability sampling in which the ease with which potential participants can be located or recruited is the primary consideration.”

Convenience samples include online panels, mall intercepts, river samples, snowballing samples, and observational studies.

Why does this matter? I meet many clients who worry about sample size. They want samples large enough to find statistically significant differences and make inferences to a larger population.

However, they often don’t know that these statistical tests were meant to work within the probability sampling theory framework.

Probability samples

In probability sampling, each possible respondent from the target population has a known, nonzero probability of being chosen. Probability sampling helps us avoid some of the selection biases that can make a sample unrepresentative of the target population.

For more on this read Does A Large Sample Size Guarantee A Representative Sample?

A single probability sample is not guaranteed to be representative of the target population. However, we can quantify how often samples will meet some criterion of representativeness. This is the notion behind confidence intervals. The probability sampling procedure guarantees that each unit in the population of interest has a chance of appearing in the sample.

By taking into account all possible random samples we could draw from a population, we can quantify how confident we can be that the method we use captures the true value of an estimate within a specific range of values.
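To make the idea of "all possible random samples" concrete, here is a minimal simulation sketch. It assumes a hypothetical population in which exactly 50% of people favor something, draws many simple random samples, and counts how often the usual 95% interval contains the true value. The function name, sample size, number of repeats, and seed are illustrative assumptions, not part of any real study.

```python
import math
import random

def coverage_of_intervals(true_share=0.5, n=1000, repeats=2000, z=1.96, seed=1):
    """Share of simulated 95% intervals that contain the true population share.

    All parameters are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(repeats):
        # Draw one simple random sample of n respondents.
        favorable = sum(rng.random() < true_share for _ in range(n))
        p_hat = favorable / n
        # Normal-approximation interval around the sample estimate.
        margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= true_share <= p_hat + margin:
            hits += 1
    return hits / repeats

coverage = coverage_of_intervals()
print(f"Share of intervals containing the true value: {coverage:.3f}")
```

With random sampling the share should land close to 0.95. That guarantee comes from the sampling procedure, not from the formula itself.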

When an opinion poll indicates that 50% of people are in favor of a political decision with a +/-3% margin of error at a 95% confidence level, it is really saying that if we were to repeat the poll many times, about 95 out of 100 of the resulting ranges (here, 47% to 53%) would contain the true share of people in favor. When we test for significant differences, we are looking to see if a value falls outside that range.
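As a rough sketch of where a +/-3% figure comes from, the snippet below applies the standard normal-approximation formula for the margin of error of a proportion under simple random sampling. The sample size of 1,067 is an assumption chosen for illustration; the poll in the example above does not state one.

```python
import math
from statistics import NormalDist

def margin_of_error(p, n, confidence=0.95):
    """Margin of error for a proportion p from a simple random sample of size n."""
    # Two-sided critical value from the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical poll: 50% in favor, 1,067 respondents (assumed sample size).
moe = margin_of_error(0.5, 1067)
print(f"Margin of error: +/-{moe:.1%}")
print(f"Interval: {0.5 - moe:.1%} to {0.5 + moe:.1%}")
```

The formula only means what it claims when respondents were selected at random with known probabilities, which is exactly the assumption convenience samples tend to violate.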

Unfortunately, taking a probability sample is hard and costly. For most consumer research studies and social behavior studies, we really don’t know the size of the actual population of consumers behaving a certain way. Drawing a sample from a large unknown universe can make the research prohibitively expensive.

Online Panels

Due to cost constraints, we often have to settle for convenience samples like the ones offered by online panels. They still can provide valuable insights if designed with care.

However, without some modeling assumptions, testing for statistical differences and making statistical inferences with convenience samples may be pointless since the assumptions about probability sampling are likely to be violated due to:

  • Exclusion bias:  The majority of the target population is not exposed to recruitment and is likely to have no chance of inclusion in the sample.
  • Self-selection bias:  Those who volunteer to be part of a panel and take the survey determine the probabilities of participation. These participation probabilities are effectively unknown to the researcher.
  • Non-response bias: Opt-in panels tend to have very low participation rates, often in the single digits. 

Notwithstanding, online panels are here to stay, and they will continue to be a source for affordable samples for market research.

Research using convenience samples is often better than no research at all if we pay attention to survey design and use screening criteria to define the target population.

A more appropriate case for testing statistically significant differences is taking random samples from a customer database. This is essentially a population frame in which we can count all members and calculate the probability of choosing each of them.
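As a minimal sketch of this case, the snippet below draws a simple random sample from a made-up list of customer IDs standing in for the database. The IDs, the frame size of 20,000, and the sample size of 500 are all hypothetical.

```python
import random

# Hypothetical sampling frame: every customer in the database, listed once.
customer_ids = [f"C{i:05d}" for i in range(1, 20001)]

sample_size = 500
rng = random.Random(42)  # fixed seed so the draw can be reproduced

# Simple random sample without replacement.
sample = rng.sample(customer_ids, sample_size)

# Because the frame is complete, each customer's chance of selection is known.
inclusion_probability = sample_size / len(customer_ids)
print(f"Each customer had a {inclusion_probability:.1%} chance of being selected")
```

Note that the known selection probability covers only who is invited. Non-response among the invited customers can still introduce bias.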

How To Deal With Convenience Samples

However, if you don’t have a customer database or can only afford a convenience sample, don’t fret about testing for significant differences. If you still want to test, you need to make adjustments that support the assumptions underlying the model of what you are researching.

The recommended approaches to deal with convenience samples include:

  • Sample Matching: Match the sample with the target population on key relevant variables to reduce bias. The most common way to do this is with quota sampling.
  • Panel Sample Strategies: Inquire how the panel sample is sourced and what procedures are followed to create a representation of the general population. Still, you should position it as a sample from a panel population.
  • Weighting: Although not always adequate or accurate, weighting the sample by key variables may provide some corrections to sample skewness.
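The weighting idea can be sketched with simple post-stratification on a single variable. All numbers below (age groups, sample counts, population shares, and answers) are invented for illustration, and real projects often weight on several variables at once using other techniques.

```python
def poststratification_weights(sample_counts, population_shares):
    """Weight per group = population share / sample share."""
    total = sum(sample_counts.values())
    return {
        group: population_shares[group] / (count / total)
        for group, count in sample_counts.items()
    }

# Hypothetical panel sample that over-represents older respondents.
sample_counts = {"18-34": 200, "35-54": 300, "55+": 500}
# Hypothetical known shares of each group in the target population.
population_shares = {"18-34": 0.30, "35-54": 0.35, "55+": 0.35}
# Hypothetical number of respondents in each group who answered "yes".
favorable_counts = {"18-34": 120, "35-54": 150, "55+": 200}

weights = poststratification_weights(sample_counts, population_shares)

unweighted = sum(favorable_counts.values()) / sum(sample_counts.values())
weighted = (
    sum(weights[g] * favorable_counts[g] for g in sample_counts)
    / sum(weights[g] * sample_counts[g] for g in sample_counts)
)
print(f"Unweighted: {unweighted:.1%}, weighted: {weighted:.1%}")
```

Weighting corrects only for the variables you weight on. If panel members differ from the target population in ways those variables do not capture, the bias remains.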

You may feel more confident if you are able to replicate the results in repeated surveys. However, always be cautious about inferences made from convenience samples, since systematic bias in the data may still exist.

Whenever you use convenience samples, consider the following when analyzing the results:

  1. Who is the sample excluding in a systematic way?
  2. What groups are over- or underrepresented in the sample?
  3. Can you replicate the results with different samples and data collection methods?

If testing for significant differences gives you peace of mind, even when using convenience samples, do it to confirm the “direction” of the data. However, refrain from making inferences to a larger population.
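For readers who want to run such a directional check, here is a minimal sketch of a two-proportion z-test, one common way to test a difference between two groups. The percentages and group sizes are made up, and the p-values are only meaningful under random-sampling assumptions that a convenience sample may not meet.

```python
import math
from statistics import NormalDist

def two_proportion_p_value(p1, n1, p2, n2):
    """Two-sided p-value for the difference between two sample proportions."""
    # Pooled proportion under the hypothesis of no difference.
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical results: 50% vs. 55% favorable in two groups.
p_large = two_proportion_p_value(0.50, 1000, 0.55, 1000)
p_small = two_proportion_p_value(0.50, 100, 0.55, 100)
print(f"1,000 per group: p = {p_large:.3f}")
print(f"  100 per group: p = {p_small:.3f}")
```

The same five-point gap clears the conventional 0.05 threshold with 1,000 respondents per group but not with 100, so a significant result says as much about sample size as about the size of the difference.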

Comments

Ray Poynter Posted: January 12, 2011

I’d like to nuance your answer about sig testing convenience samples. As a measure of validity, sig testing a convenience sample is pointless, but it can (and I think should) have a role in reliability.

When we use a convenience sample the population is the collection of people who could have been sampled, for example all the members of an online panel who meet the demographic screener. Let us say we interview 1000 people from a large panel, and assume we are interested in a value that comes out at 50%. Sig testing will suggest to us that the true value for the entire, relevant, convenience sample is between 47% and 53%. (We do not know what the true population is, but we do have an estimate for the panel).

If we conduct sig tests on a convenience sample, we know that any changes that are too small to be significant are probably not reliable. So, we can use sig testing as a method of rejecting findings from our panel as being too small, leaving us to work with the results that are at least big enough to be reliable, and then use something else, perhaps triangulation from other sources, to assess the trustworthiness of the findings.

Michaela Mora Posted: January 12, 2011

Thanks for your comment, Ray. I understand your point, which is fairly common practice since most clients can’t afford a truly random sample. The question is, what constitutes “too small”? For a large sample of 1000, you will find significant differences with small changes, which won’t show up with smaller samples.
