Testing For Significant Differences In Convenience Samples – What Is The Point?

Thursday, May 20th, 2010
by Michaela Mora


Testing for Statistically Significant Differences

I meet many clients who worry about sample size, trying to ensure they get a large enough sample that statistically significant differences can be found and inferences can be made to a larger population. What they often don’t know is that these statistical tests were designed to work within the framework of probability sampling theory.

Since the advent of online panels and the growth of online surveys using panel-provided samples, testing for significant differences with standard parametric tests has become largely pointless in many research studies.

Nowadays many online surveys use samples provided by online panels, but these are mostly convenience (non-probability) samples. Online panels consist of people who are willing to participate in studies; they exclude people who are unwilling to join a panel, even though those people may belong to the target population we are after.

In probability sampling, each possible respondent from the target population has a known probability of being chosen. Probability sampling helps us avoid some of the selection biases that can make a sample unrepresentative of the target population. For more on this, read Does A Large Sample Size Guarantee A Representative Sample?

A single probability sample is not guaranteed to be representative of a target population, but we can quantify how often samples will meet some criterion of representativeness. This is the notion behind confidence intervals. The probability sampling procedure guarantees that every unit in the population of interest has a chance of appearing in the sample.

By taking into account all the possible random samples that could be drawn from a population, we can estimate how often an interval built from a sample can be expected to contain the true population value. So, when we talk about a 95% confidence interval, we mean that if we repeated the sampling procedure 100 times, about 95 of the resulting intervals would contain the true value of the variable. When an opinion poll indicates that 50% of people are in favor of a political decision with a +/-3% margin of error at the 95% confidence level, it is saying that the interval from 47% to 53% was produced by a procedure that, over repeated polls, captures the true share of supporters about 95 out of 100 times. When we test for significant differences, we are checking whether a value falls outside the range that sampling variability alone would be expected to produce.
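The repeated-sampling idea can be checked with a small simulation. This is a minimal sketch, not a standard tool: it assumes a hypothetical large population in which exactly 50% favor the decision, draws many independent random samples of 1,068 people (roughly the size that gives +/-3% at 95% confidence), builds a normal-approximation (Wald) interval from each, and counts how often the interval contains the true value.

```python
import math
import random

def wald_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p_hat = successes / n
    moe = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - moe, p_hat + moe

def coverage(true_p: float, n: int, trials: int, seed: int = 1) -> float:
    """Share of repeated random samples whose interval contains true_p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Hypothetical: each respondent favors the decision with probability true_p.
        successes = sum(rng.random() < true_p for _ in range(n))
        low, high = wald_interval(successes, n)
        if low <= true_p <= high:
            hits += 1
    return hits / trials

rate = coverage(true_p=0.5, n=1068, trials=2000)
print(rate)  # should land near 0.95
```

Any single interval either contains the true value or it does not; the 95% describes how often the procedure succeeds across repetitions.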

Unfortunately, taking a probability sample is hard and costly. For most consumer research and social behavior studies, we don’t know the size of the actual population of consumers who behave in certain ways or consume certain products, and trying to find out would make the research prohibitively expensive. This is why we often have to settle for convenience samples like the ones offered by online panels. These samples can still offer valuable insights if the study is designed with care, but statistical testing on a convenience sample is pointless because the assumptions of probability sampling are violated.

Online panels are here to stay, and they will continue to be a source of affordable samples for market research. Research using convenience samples is often better than no research at all, provided the survey is well designed and screening criteria are used to define the target population.

Random samples drawn from a customer database are a more appropriate case for testing statistically significant differences, since the database is essentially a population frame in which we can count all members and know each member’s probability of being chosen.
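As a sketch of why a database frame helps, here is a simple random sample drawn without replacement from a hypothetical list of 20,000 customer IDs (the IDs and sizes are made up). Because every customer is equally likely to be drawn, each one's inclusion probability is simply n/N, which is exactly the known probability that significance tests rely on.

```python
import random

def simple_random_sample(customer_ids: list[str], n: int, seed: int = 0) -> tuple[list[str], float]:
    """Draw n customers without replacement; each has inclusion probability n/N."""
    if not 0 < n <= len(customer_ids):
        raise ValueError("n must be between 1 and the size of the database")
    rng = random.Random(seed)
    sample = rng.sample(customer_ids, n)
    inclusion_probability = n / len(customer_ids)
    return sample, inclusion_probability

# Hypothetical customer database of 20,000 IDs.
customers = [f"C{i:05d}" for i in range(20_000)]
sample, prob = simple_random_sample(customers, 500, seed=3)
print(len(sample), prob)        # 500 0.025
print("design weight:", 1 / prob)  # each respondent stands for 40 customers
```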

However, if you don’t have a customer database or are interested in surveying non-customers, use a convenience sample if that is what your research budget can afford or if there is no other way to reach the actual population frame (the list to pull the sample from), but don’t fret about testing for significant differences. You may feel more confident if you are able to replicate the results in repeated surveys, but always be cautious about inferences made from convenience samples, since there could be hidden systematic bias in the data.

Whenever you use convenience samples, consider the following when analyzing the results:

         1. Who is systematically excluded from the sample?

         2. What groups are over- or underrepresented in the sample?

         3. Have the results been replicated with different samples and data collection methods?
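Questions 1 and 2 can be partly checked by comparing the sample's composition with an outside benchmark for the target population. This is a minimal sketch: the age groups, benchmark shares, and sample counts are hypothetical, and the 0.8 and 1.25 cut-offs are arbitrary choices, not an industry standard. A ratio of 0 flags a group that never appears, which may point to systematic exclusion.

```python
from collections import Counter

def representation_ratios(sample_groups: list[str],
                          benchmark_shares: dict[str, float]) -> dict[str, float]:
    """Ratio of each group's share in the sample to its share in the benchmark.

    Above 1: over-represented; below 1: under-represented; 0: never observed.
    """
    counts = Counter(sample_groups)
    n = len(sample_groups)
    return {group: (counts[group] / n) / share for group, share in benchmark_shares.items()}

# Hypothetical benchmark (e.g. census figures) and hypothetical panel sample of 1,000.
benchmark = {"18-34": 0.30, "35-54": 0.35, "55+": 0.35}
sample = ["18-34"] * 450 + ["35-54"] * 400 + ["55+"] * 150

for group, ratio in representation_ratios(sample, benchmark).items():
    flag = "over" if ratio > 1.25 else "under" if ratio < 0.8 else "ok"  # arbitrary cut-offs
    print(f"{group}: {ratio:.2f} ({flag})")
```

A check like this only catches imbalance on variables you have benchmarks for; bias on unmeasured characteristics stays invisible.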

If testing for significant differences gives you peace of mind, even when using convenience samples, use it to confirm the “direction” of the data, but refrain from making inferences to a larger population.
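If you do run a test only to check direction, a pooled two-proportion z-test is one common choice for comparing two subgroups. This sketch uses hypothetical counts. Under a convenience sample, the p-value does not carry its usual population meaning, because it assumes random sampling; read the sign of z as the direction and treat the rest as descriptive.

```python
import math

def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the standard normal
    return z, p_value

# Hypothetical panel results: 120 of 300 in segment A vs 90 of 300 in segment B.
z, p_value = two_proportion_z_test(120, 300, 90, 300)
direction = "A higher" if z > 0 else "B higher"
print(f"z = {z:.2f}, p = {p_value:.3f}, direction: {direction}")
```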


OTHER RELATED ARTICLES


To learn more about our consumer data service visit Consumer Shopping Behavior Insights. To request consumer shopping behavior data and insights don’t hesitate to contact us.

What Is The Right Sample Size For A Survey?

Thursday, May 6th, 2010
by Michaela Mora


Sample Size Trade-offs

Determining the sample size is one of the early steps in planning a survey. Unfortunately, there is no magic formula that will tell us what the perfect sample size is, since there are several factors we need to think about:

  • ANALYTICAL PLAN: The research objectives and planned analytical approach should be the first factor to consider when deciding on sample size. For instance, some statistical procedures (e.g. regression analysis) require a certain number of observations per variable. Moreover, if comparisons between subgroups in the sample are planned, the sample size should be large enough to identify statistically significant differences between the groups.
  • POPULATION VARIABILITY: This refers to the target population’s diversity. If the target population exhibits large variability in the behaviors and attitudes being researched, a large sample is needed. If 20% or 80% of the population behaves in a certain way, this indicates less variability than if 50% did. To be conservative, it is standard practice to use 50% (0.5) as the event probability in sample size calculations, since it represents the highest variability that can be expected in the population.
  • LEVEL OF CONFIDENCE: This reflects the level of risk we are willing to tolerate, usually expressed as a percentage (e.g. 95% confidence level). Although survey results are reported as point estimates (e.g. 75% of respondents like this product), we are working with a sample of the target population, so we can only be confident that the true value in that population falls within a particular range, called the confidence interval. The level of confidence indicates how often, over repeated samples, intervals constructed this way would contain the true value. How confident can you be? As confident as your tolerance for risk allows, knowing that for a given sample size, confidence and precision pull in opposite directions. The more confident you want to be, the wider the confidence interval (the larger the margin of error), which means lower precision.
  • MARGIN OF ERROR: Also known as sampling error, this indicates the desired level of precision of the estimate. You have probably seen poll results quoted in the media saying that the margin of error was plus or minus a particular percentage (e.g. +/-3%). This percentage defines the lower and upper bounds of the confidence interval around the estimate, which is likely to include the true population value, and it is a measure of the estimate’s precision. The larger the sample, the smaller the margin of error and the greater the precision of the estimate.
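The factors above meet in the usual margin-of-error formula for a proportion from a simple random sample of a large population, z·sqrt(p(1-p)/n). This sketch (assuming that textbook formula, with n = 400 chosen for illustration) shows that p = 0.2 or 0.8 gives a smaller margin than p = 0.5, and that a higher confidence level widens the margin at a fixed sample size.

```python
import math
from statistics import NormalDist

def z_value(confidence: float) -> float:
    """Two-sided critical value of the standard normal, e.g. 0.95 -> about 1.96."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def margin_of_error(n: int, p: float = 0.5, confidence: float = 0.95) -> float:
    """Margin of error for a proportion (simple random sample, large population)."""
    return z_value(confidence) * math.sqrt(p * (1 - p) / n)

n = 400
for p in (0.5, 0.2, 0.8):
    print(f"p={p}: +/-{margin_of_error(n, p):.1%}")
for conf in (0.90, 0.95, 0.99):
    print(f"{conf:.0%} confidence: +/-{margin_of_error(n, confidence=conf):.1%}")
```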

Below is a table illustrating how the margin of error and level of confidence interact with sample size. To get the same level of precision (e.g. +/-3.2%), larger samples are needed as the confidence level increases. For example, to achieve a margin of error of +/-3.2% at the 95% confidence level, we need a sample of about 950.

[Table: margin of error by sample size and confidence level]

For more help on calculating sample size and margin of error, use our Sample Size and Margin of Error Calculators.
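The relationship in the table can be reproduced by inverting the margin-of-error formula, n = z²·p(1-p)/e², assuming a large population and p = 0.5. This sketch computes the smallest sample giving +/-3.2% at three confidence levels. At 95% it returns about 938; the figure of 950 quoted above is consistent with this, since 950 respondents also give a margin of roughly +/-3.2%.

```python
import math
from statistics import NormalDist

def required_sample_size(moe: float, confidence: float = 0.95, p: float = 0.5) -> int:
    """Smallest n whose margin of error for a proportion is at most moe (large population)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / moe ** 2)

for conf in (0.90, 0.95, 0.99):
    print(f"{conf:.0%}: n = {required_sample_size(0.032, conf)}")
```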

  • COST: Sample cost is often one of the largest items in the budget for market research studies, especially if the target sample includes low-incidence segments or the response rate is low. Many times, our clients have to make a trade-off between statistical accuracy and research cost. Recently, I received a call from a client who wanted to conduct an online survey with a sample of 1,000 respondents, which would give a statistical accuracy of +/-3.1% at the 95% confidence level but would cost $8,000 based on certain screening criteria. A sample of 400 respondents, by contrast, would give a statistical accuracy of +/-4.9% and cost $3,400. In this case, a 135% increase in sample cost would yield only about a 60% gain in statistical accuracy (the margin of error shrinks from +/-4.9% to +/-3.1%). The client decided to conduct the study with the smaller sample.
  • POPULATION SIZE: Most of the time, the size of the total target population is unknown and is assumed to be large (>100,000), but in studies where the sample is a large fraction of the population of interest, adjustments such as the finite population correction may be needed.
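The cost example can be checked with a few lines of arithmetic. This sketch uses the figures from the client example and assumes "gain in statistical accuracy" means the ratio of the two margins of error (4.9% / 3.1%); under that reading the gain is about 58%, close to the 60% quoted. Because the margin shrinks with the square root of n, 2.5 times the respondents buys only sqrt(2.5), about 1.58 times, the precision.

```python
import math

Z_95 = 1.96  # critical value for 95% confidence

def margin_of_error(n: int, p: float = 0.5) -> float:
    """Margin of error for a proportion (simple random sample, large population)."""
    return Z_95 * math.sqrt(p * (1 - p) / n)

small = {"n": 400, "cost": 3_400}
large = {"n": 1_000, "cost": 8_000}

cost_increase = large["cost"] / small["cost"] - 1
moe_small = margin_of_error(small["n"])
moe_large = margin_of_error(large["n"])
precision_gain = moe_small / moe_large - 1  # assumed definition of "accuracy gain"

print(f"MoE: +/-{moe_small:.1%} vs +/-{moe_large:.1%}")
print(f"Cost increase: {cost_increase:.0%}; precision gain: {precision_gain:.0%}")
print(f"Cost per respondent: ${small['cost'] / small['n']:.2f} vs ${large['cost'] / large['n']:.2f}")
```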

SAMPLE SIZE CALCULATION CHECK LIST

As a summary, to determine the sample size needed in a survey, we need to answer the following questions:

  • What type of data analysis will be conducted? Will subgroups be compared?
  • What is the probability of the event occurring? If no previous data exists, use 50% for a conservative sample size estimate.
  • How much error is tolerable (margin of error)? How much precision do we need?
  • How confident do we need to be that the true population value falls within the confidence interval?
  • What is the research budget? Can we afford the desired sample?
  • What is the population size? Large? Small/finite? If unknown, assume it to be large (>100,000).
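The checklist inputs can be combined into one calculation. This is a minimal sketch assuming the common textbook approach for a proportion: start from n0 = z²·p(1-p)/e² and, when the population size N is known and small, apply the finite population correction n = n0 / (1 + (n0 - 1)/N). The cost per complete and the budget are hypothetical. If subgroups will be compared, one approach is to run the calculation for each subgroup separately.

```python
import math
from statistics import NormalDist
from typing import Optional

def sample_size(moe: float, confidence: float = 0.95, p: float = 0.5,
                population: Optional[int] = None) -> int:
    """Sample size for estimating a proportion.

    moe: tolerable margin of error (e.g. 0.05 for +/-5%)
    confidence: confidence level (e.g. 0.95)
    p: expected event probability; 0.5 is the conservative choice
    population: population size N, or None if large/unknown
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n0 = z ** 2 * p * (1 - p) / moe ** 2
    if population is not None:
        # Finite population correction: matters when n0 is a sizable fraction of N.
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)

needed = sample_size(0.05, population=2_000)
cost_per_complete = 8.0  # hypothetical
budget = 3_000           # hypothetical
print(needed, sample_size(0.05), needed * cost_per_complete <= budget)
```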

So the answer to the question “What is the right sample size for a survey?” is: It depends. I hope I gave you some guidance in choosing sample size, but the final decision is up to you. To calculate sample size and margin of error, use our Sample Size and Margin of Error Calculators.

Have you wondered what sample size is needed to get a representative sample? Read Does A Large Sample Size Guarantee A Representative Sample?

 
