The Rise Of The Wireless-Only Household [Infographic]

by Michaela Mora

In the most recent National Health Interview Survey (July–December 2011) from the CDC, the data confirm an ever-increasing trend in the US of households that only have wireless phones. In the second half of 2011, a third of households were wireless-only, a 7.5% increase over the first half of the year.

Hispanics and young adults (25–29 years old) are over-represented in these households, information that should be considered when designing samples for phone surveys to minimize coverage error. Unfortunately, including cell phones in phone survey samples is still an expensive undertaking. If you can’t afford it, at least be aware of the potential coverage error you will have by not including cell phones in your sample.

Check out this infographic with some of the key demographics of wireless-only households.

Wireless-Only Households Infographic

What Is Statistical Significance?

by Michaela Mora

I hear questions related to statistical significance on a daily basis. It is usually some variation of “How much sample do we need to be significant?” which often reflects some confusion about the term.

Statistical significance is a concern when we are interested in detecting differences not due to chance between two or more groups (people, objects, ads etc.) being compared.

As sample size increases, the margin of error around a percentage or a mean gets smaller, and we get not only more precise estimates but also more sensitivity to detect differences that are not due to chance. In a large sample, a difference of 1 or 2 percentage points may be significant, while in a smaller sample, where there is more variation, we may need to see a difference of more than 10 percentage points before we can call it significant.

In survey research, we often talk as if the results were exact point estimates when in fact we should be talking in ranges, since there is always a margin of error around any estimate. So if the margin of error is +/-3% and we get a value of 50% for a variable, it means that the true value of the variable should be between 47% and 53%.

Now, if we measure the same variable in another group with a sample size where the margin of error is +/-5% and we get a value of 57%, it means that the true value is expected to be between 52% and 62%. Despite the 7-percentage-point difference, which seems large, we can’t say that it is statistically significant because there is some overlap between the margin of error ranges of the two groups (47%–53% and 52%–62%): the true value in the second group could be 52% or 53%, values that are also included in the first group’s range.
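
To make the overlap concrete, here is a minimal sketch in Python of the example above. The sample sizes (roughly 1,067 and 385) are assumptions chosen only because they produce margins of error close to the +/-3% and +/-5% used in the text; z = 1.96 corresponds to the usual 95% confidence level.

```python
import math

def margin_of_error(p_hat, n, z=1.96):
    """Half-width of an approximate 95% confidence interval for a proportion."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# Assumed sample sizes that roughly yield the +/-3% and +/-5% margins of error
n1, p1 = 1067, 0.50
n2, p2 = 385, 0.57

ci1 = (p1 - margin_of_error(p1, n1), p1 + margin_of_error(p1, n1))  # ~47%..53%
ci2 = (p2 - margin_of_error(p2, n2), p2 + margin_of_error(p2, n2))  # ~52%..62%

# Overlapping intervals: the 7-point difference is not clearly significant
overlap = ci1[1] >= ci2[0] and ci2[1] >= ci1[0]
print(ci1, ci2, "overlap:", overlap)
```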

How confident are we about this? We often say 95% confident, which means that if we repeat the study 100 times, we can expect similar results 95 times and be wrong 5 times. This is called the confidence level, and the margin of error range is called the confidence interval. In short, we want to make sure the true value falls within the same range every time we repeat the study. Unfortunately, statistical confidence has an inverse relationship with estimate precision: if you want to be 99% confident, you have to allow for a wider confidence interval to include the true value.

If there is no comparative analysis involved, it doesn’t make any sense to talk in terms of statistical significance. However, we are still concerned about the precision of the estimates for the total sample. We want our margin of error to be as small as our budget and tolerance for risk allow. To get greater precision, we need a larger sample, which in turn costs more money; to be more certain, we sacrifice some precision. There is always a trade-off to make.
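
As a rough illustration of that trade-off, the sketch below computes the conservative margin of error (p = 0.5) at a 95% confidence level for a few sample sizes. The specific sample sizes are arbitrary and only meant to show how precision improves, with diminishing returns, as the sample (and the cost) grows.

```python
import math

def conservative_moe(n, z=1.96, p=0.5):
    """Margin of error at ~95% confidence, assuming p = 0.5 (worst case)."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (100, 400, 1000, 2000):
    print(f"n = {n:4d}  ->  +/-{conservative_moe(n):.1%}")
# n =  100  ->  +/-9.8%
# n =  400  ->  +/-4.9%
# n = 1000  ->  +/-3.1%
# n = 2000  ->  +/-2.2%
```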

Next time you are considering the sample size for a survey, get ready to answer these questions:

  • What is the desired precision (margin of error)?
  • How confident do you want to be?
  • Can your budget accommodate the required sample size for the desired precision? If not, what are you willing to settle for?
  • Are you doing any comparisons between groups? If so, how many?
  • Can your budget accommodate the required sample size by group to make meaningful comparisons?

Unfortunately, the difference between the sample you want and the one you can afford is often significant (pun intended), so budget questions are always in the mix. For more help on calculating sample size and margin of error, use our Sample Size and Margin of Error Calculators.

How To Determine Sample Size for Segments?

by Michaela Mora

Sample Size for Segments

I recently got a request for advice via Twitter with this question: What % of a segment needs to be interviewed to gain reliable insight for product optimization?

Reliability has to do with the consistency of results across data collection instruments and across the points in time when the data is collected. I see this question as being more about validity and representativeness, which are related to population heterogeneity and sample source.

To determine the sample size of a segment we need to ask:

  • How homogeneous is the so-called segment? Are there any sub-segments that need to be represented? Usually, the larger the segment, the more heterogeneous it tends to be. As heterogeneity increases, the need for a larger sample increases as well, so that all subgroups are represented.
  • What is the sample source? Representativeness has to do more with where the sample comes from than with sample size. If you get it from the appropriate source with the right screening criteria you are a step closer to more valid results, although there are other factors that affect validity.
  • Is the segment going to be compared to other segment(s)? We should avoid samples that are too small if we are going to make comparisons, since smaller samples have larger margins of error. This means that the range in which the true value of a parameter is expected to fall for a segment is wide and may overlap with the range for the true value in the segment we are comparing it to. The half-width of each range is what we call the margin of error. If we compare two small samples and can’t detect any significant difference, it may be due to overlapping margins of error, not to an actual lack of differences.
  • What is the level of risk we are willing to take? As we increase sample size, the margins of error get tighter and precision improves. But how confident do we want to be that the true value is indeed within the margin of error? Here we need to consider the confidence level. The most commonly used is 95%. This simply says that if we repeat the study 100 times, we should get similar results 95 times and can expect to be wrong in 5 of the 100.
  • How certain and precise do we need to be? The thing is that confidence and precision go in opposite directions. If we want to increase our certainty that the true value falls within a range of values, we have to widen the range (margin of error), but this leads to a loss in precision.

Depending on budget and timeline constraints you could use two approaches to sampling for segments:

  1. Create quotas by segment. These act as independent groups, like smaller “total samples.” These quotas can be proportional to their size in the population or can all be the same size. In the latter case, you would need to weight the segments if you decide to merge the quotas into a total sample (see the sketch after this list); otherwise some segments will be overrepresented and others underrepresented.

  2. Let the segments fall naturally in the total sample. This approach can be more expensive since you will need a larger total sample if you need large enough samples by segment to be able to do comparative analyses. If no comparisons will be carried out, then this is a more desirable approach to get all segments represented in the average values.
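
Here is a minimal sketch of the weighting step mentioned in option 1, with made-up segment quotas and population shares purely for illustration: each respondent’s weight is the segment’s population share divided by its share of the merged sample.

```python
# Hypothetical example: three segments surveyed with equal quotas (200 completes each)
# even though they have different shares of the population. Weights restore the
# population proportions when the quotas are merged into one total sample.
quota_size = {"A": 200, "B": 200, "C": 200}
population_share = {"A": 0.50, "B": 0.30, "C": 0.20}   # assumed shares, for illustration

total_n = sum(quota_size.values())
sample_share = {s: n / total_n for s, n in quota_size.items()}

# Weight = population share / sample share: segment A counts for more,
# segment C for less, so no segment is over- or underrepresented in the total.
weights = {s: round(population_share[s] / sample_share[s], 2) for s in quota_size}
print(weights)   # {'A': 1.5, 'B': 0.9, 'C': 0.6}
```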

As you can see, estimating the sample size for a segment is not different from estimating the size for the total sample and there is no magical % to determine how large the sample size should be. Sorry.

Sample Size Matters

by Michaela Mora

What Sample Size Do We Need?

The first question I always get from clients interested in conducting a survey is about sample size. Many confuse sample size with representativeness. They are related, but not the same, particularly if convenience samples are used.

In random samples, as we increase sample size the chance each member of the target population has of being selected increases and consequently more segments of the population are likely to be represented. This is based on the assumption that we have a list with all the population members (population frame) and know their probability of being chosen. This could be the case of a customer database/list, if that’s our population of interest. 

In convenience samples, the population frame becomes the pool of individuals in the sample source (e.g. online panels), which may not include all segments in the target population or only have a few members of certain segments, depending on how the sample source is built. In this case sample quotas, weighting schemes, and mixed mode data collection methods (online/phone/intercepts) are often used in an effort to reach representativeness. 

Assuming that we are able to pull a representative sample of the target population by whatever affordable means are available to us, we need to give serious consideration to sample size. This is a case where size matters (pun intended). Why?

It is all about precision, tolerance for risk and cost. For samples smaller than 1000, we always have to think about how confident we want to be that estimates are within a particular range (level of confidence and risk), and how small we want that range to be (level of precision). Unfortunately, they go in opposite directions. Higher levels of confidence require wider ranges (margins of error) for small sample sizes.

For instance, we can be 95% confident that the true value for a variable in a sample of 400 is within +/-4.9%. However, if we want a smaller margin of error with the same sample size, in an attempt to gain more precision, we have to sacrifice certainty and may need to accept a 90% confidence level to get a +/-4.1% margin of error. At the 95% confidence level you are more certain but less precise, as you expand the range to make sure the true value falls in it; at 90%, you are more precise but less certain.
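
The sketch below reproduces these figures, assuming the conservative p = 0.5 and using SciPy’s normal quantile function to get the z value for any confidence level.

```python
import math
from scipy.stats import norm   # assumes SciPy is available

def margin_of_error(n, confidence=0.95, p=0.5):
    """Conservative margin of error (p = 0.5) at the given confidence level."""
    z = norm.ppf(1 - (1 - confidence) / 2)   # 1.96 for 95%, 1.645 for 90%
    return z * math.sqrt(p * (1 - p) / n)

n = 400
print(f"95% confidence: +/-{margin_of_error(n, 0.95):.1%}")   # +/-4.9%
print(f"90% confidence: +/-{margin_of_error(n, 0.90):.1%}")   # +/-4.1%
```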

If you want more precise estimates without sacrificing certainty in the results, then you have to increase the sample size, which in turn increases research costs. As the table below shows, as the sample grows, the differences in margin of error across the different confidence levels become smaller.

[Table: Margin of error by sample size and confidence level]

At the end of the day, when it comes to sample size, you need to decide what is more important to you, certainty or precision, and what your tolerance for risk is, especially if your market research budget is small.

For more help on calculating sample size and margin of error, use our Sample Size and Margin of Error Calculators.

 

How To Improve Online Survey Response Rates

by Michaela Mora

I recently got an inquiry from a SurveyGizmo user asking about what response rate he could expect from using this online survey tool. Fortunately for any online survey tool, including SurveyGizmo, response rates to online surveys don’t depend on the survey tool you use.

First, let’s distinguish between response rates, incidence rates, completion rates and non-response. They are related, but not the same, and some clients use these concepts interchangeably, which leads to confusion in sample size and cost estimations.

Response rates are usually calculated based on the number of respondents who attempt to participate in a survey, even if they are disqualified after being screened with certain questions. If we send a survey invitation to a sample of 100 people and only 5 attempt to take the survey, the response rate is 5%. Response rates have been used for years as indicators of data accuracy; however, recent research indicates that lower response rates don’t necessarily mean lower-quality data.

Response rates are affected by:

  • Survey topic relevancy: People will not dedicate time to participate in surveys that are perceived as irrelevant.
  • Incentives: Sometimes an incentive is needed to motivate respondents, but careful consideration needs to be given to this. Incentives are a tricky subject since we may attract only certain types of respondents and insert selection bias in the sample.
  • Survey invitation: Survey invitations should be personalized and provide compelling reasons to participate in the survey. A poorly written invitation can drive respondents away or fail to catch their attention. Use appealing subject lines and make the invitation short, clear and persuasive.
  • Type of relationship with target survey audience: Depending on the level of relationship respondents have with the brand, organization or company sponsoring the project they will be more or less motivated to participate. For example, customer surveys tend to have higher response rates than those targeted at non-customers. For more on this, check Survey Response Rate Directly Proportional to Strength of Relationship by Jeffrey Henning.
  • Privacy protection concerns: People are not comfortable sharing information if they don’t know how it is going to be used. Communication about privacy policy and data security should be clear.
  • Reminders: These may be needed to reach busy people or those not available within the time frame when the first invitation is sent out.

Incidence rates are based on the number of respondents that qualify for a study based on certain screening criteria. For example, if we need a sample of females in the general population without any other requirements, the incidence rate is expected to be 50% since half of the population are women. Incidence rates will vary depending on who we are targeting with the study.

Response rates are often used to indicate the number of completed surveys, but I think it is worth making the distinction between response rates and completion rates since this has methodological and cost implications (e.g. when we need to purchase sample from online panel providers).

Completion rates indicate how many people who qualified for the study completed the survey. If they enter the survey, answer some questions and then abandon the survey, they will be counted as incompletes and are usually excluded from the final data. The number of incompletes increases when:

  1. The survey is too long
  2. Survey flow is confusing
  3. There are skip logic errors that show irrelevant questions to respondents who can’t answer them
  4. Questions are poorly worded and instructions are unclear
  5. Questions are complex and require a lot of mental effort from the respondent
  6. The respondent is not rewarded accordingly based on survey length and amount of effort required
  7. The topic and survey format can’t hold the respondent’s interest
  8. Privacy protection is unclear or lacking

Non-response occurs when we fail to obtain responses from everyone in the selected sample, either because respondents refuse to participate in the survey or because they start but never complete it. If non-responses follow a pattern that systematically excludes a particular segment of the sample, they introduce what is called selection bias, which will prevent us from getting a representative sample of opinions in the population of interest. Nonrespondents are often different from respondents, so their absence in the final sample can make it difficult to generalize the results to the overall target population.
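
Putting these concepts together, here is a minimal sketch with made-up counts that shows how response, incidence and completion rates are calculated as defined above; the numbers are assumptions for illustration only.

```python
# Hypothetical counts for one survey wave, just to show how the three rates differ.
invited    = 2000   # invitations sent
attempted  = 300    # clicked the link and answered at least the screener
qualified  = 150    # passed the screening criteria
completed  = 120    # reached the end of the questionnaire

response_rate   = attempted / invited      # 0.15 -> 15%
incidence_rate  = qualified / attempted    # 0.50 -> 50%
completion_rate = completed / qualified    # 0.80 -> 80%

print(f"Response rate:   {response_rate:.0%}")
print(f"Incidence rate:  {incidence_rate:.0%}")
print(f"Completion rate: {completion_rate:.0%}")
```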

In short, regardless of the survey tool you use, you can improve response rates and completion rates if you avoid most of the problems mentioned above.




Testing For Significant Differences In Convenience Samples – What Is The Point?

by Michaela Mora

Testing for Statistically Significant Differences

I meet many clients who worry about sample size, trying to ensure they get a large enough sample so that statistically significant differences can be found and inferences to a larger population can be made, but they often don’t know that these statistical tests were meant to work within the framework of probability sampling theory.

Since the advent of online panels and the increase of online surveys using panel-provided samples, the issue of testing for significant differences using standard parametric tests has become a moot point in many research studies.

Nowadays many of the surveys conducted online use samples provided by online panels, but these are mostly convenience samples (non-probability samples). The population of an online panel includes respondents who are willing to participate in studies, excluding those who are unwilling to join the panel but who may still be members of the target population we are after.

In probability sampling, each possible respondent from the target population has a known probability of being chosen. Probability sampling helps us avoid some of the selection biases that can make a sample unrepresentative of the target population. For more on this, read Does A Large Sample Size Guarantee A Representative Sample?

A single probability sample is not guaranteed to be representative of the target population, but we can quantify how often samples drawn this way will meet some criterion of representativeness. This is the notion behind confidence intervals. The probability sampling procedure guarantees that each unit in the population of interest could appear in the sample.

By taking into account all possible random samples that can be drawn from a population, we can estimate how often the true value of an estimate can be expected to fall within a specific range of values. So when we talk about a 95% confidence interval, this really means that the true value of a particular variable is expected to fall within an interval of values 95 out of 100 times we repeat the procedure. When an opinion poll indicates that 50% of people are in favor of a political decision with a +/-3% margin of error at a 95% confidence level, it is really saying that we can expect between 47% and 53% of people to be in favor of the decision 95 out of 100 times, if we were to repeat the poll. When we test for significant differences, we are looking to see whether another value falls outside that range.
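
For readers who want to see what such a test looks like, below is a minimal sketch of a two-proportion z-test, one of the standard parametric tests referred to in this post. The proportions and sample sizes are made up, and the result is only meaningful under the probability sampling assumptions discussed here.

```python
import math

def two_proportion_z_test(p1, n1, p2, n2):
    """Two-sided z-test for the difference between two independent proportions."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Illustrative values only: 50% vs. 57% measured in two samples of 500 each
z, p = two_proportion_z_test(0.50, 500, 0.57, 500)
print(f"z = {z:.2f}, p = {p:.3f}")   # roughly z = -2.22, p = 0.026 -> significant at 95%
```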

Unfortunately, taking a probability sample is hard and costly. For most consumer research studies and social behavior studies, we really don’t know the size of the actual population of consumers behaving in certain ways or consuming certain products, and trying to find out would make the research prohibitively expensive. This is why we often have to settle for convenience samples like the ones offered by online panels. They still can offer valuable insights if designed with care, but again doing statistical testing in a convenience sample is pointless since the assumptions about probability sampling are violated.

Online panels are here to stay, and they will continue to be a source of affordable sample for market research. Research using a convenience sample is often better than no research at all if the survey is well designed and screening criteria are used to define the target population.

A more appropriate case for testing statistically significant differences is random samples taken from a customer database, since this is essentially a population frame in which we can count all members and estimate their probability of being chosen.

However, if you don’t have a customer database or are interested in surveying non-customers, then use a convenience sample if that is what your research budget can afford or there is no other way to get to the actual population frame (the list to pull the sample from), but don’t fret about testing for significant differences. You may feel more confident if you are able to replicate the results in repeated surveys, but always be cautious about inferences made from convenience samples since there could be a hidden systematic bias in the data.

Whenever you use convenience samples, it is always important to consider the following when analyzing the results:

  1. Who is systematically excluded from the sample?
  2. What groups are over- or underrepresented in the sample?
  3. Have the results been replicated with different samples and data collection methods?

If testing for significant differences gives you peace of mind, even when using convenience samples, do it to confirm the “direction” of the data, but refrain from making inferences to a larger population.



Does A Large Sample Size Guarantee A Representative Sample?

by Michaela Mora

Posted on May 13, 2010

Representative Sample

I often get asked “What sample size do I need to get a representative sample?” The problem is that this question is not formulated correctly. 

Sample size and representativeness are two related, but different issues. The sheer size of a sample is not a guarantee of its ability to accurately represent a target population. Large unrepresentative samples can perform as badly as small unrepresentative samples.

A survey sample’s ability to represent a population has to do with the sampling frame, that is, the list from which the sample is selected. When some parts of the target population are not included in the sampled population, we are faced with selection bias, which prevents us from claiming that the sample is representative of the target population. Selection bias can occur in different ways:

  • Convenience sample: This includes respondents who are easier to select or who are most likely to respond. This sample will not be representative of harder-to-select individuals. Samples from online panels are a good example of convenience samples. These panels are composed of individuals who have expressed interest in participating in surveys, leaving out individuals who may be part of the target population but are not available for interviewing through the panel.
  • Undercoverage: This happens when we fail to include all the target population in the sampling frame. Many online panels work hard at avoiding undercoverage bias, but the fact remains that certain demographics are underrepresented. For example, it is difficult to field online studies targeted at the total Hispanic population in the US without using a hybrid data collection approach that allows us to reach unacculturated Hispanics, who are usually underrepresented in most online panels. Coverage bias is also found in phone surveys that use telephone list sampling frames that exclude households without landline access. As more households substitute cell phones for their landlines, obtaining representative samples of certain demographic groups will soon be difficult without including cell phone lists in the sampling frame.
  • Nonresponse: Selection bias also takes place when we fail to obtain responses from all respondents in the selected sample. Nonrespondents tend to differ from respondents, so their absence in the final sample makes it difficult to generalize the results to the overall target population. This is why the design of a survey is far more important than the absolute sample size to get a representative sample of the target population.
  • Judgment sample: This is a sample selected based on “representative” criteria derived from prior knowledge of the topic or target population. An example would be a study looking for a sample of teenagers and trying to intercept them at an intersection near a high school.
  • Misspecification of target population: This happens when we intentionally or unintentionally use screening criteria that leave out important subgroups of the population.
  • Poor data collection quality: This can introduce selection bias when there are poor quality controls to ensure that we interview the designated members of the sample. An example of this is allowing whoever is available in the household to take the survey instead of the member intended based on certain screening criteria.

So when it comes to getting a representative sample, sample source is more important than sample size. If you want a representative sample of a particular population, you need to ensure that:

  1. The sample source includes the whole target population
  2. The selected data collection method (online, phone, paper, in person) can reach individuals with characteristics typical of the population of interest
  3. The screening criteria truly reflect the target population
  4. You can minimize nonresponse bias with good survey design, incentives and the appropriate contact method
  5. There are quality controls in place during the data collection process to guarantee that designated members of the sample are reached.

For help on sample size calculation use our Sample Size and Margin of Error Calculators.

What Is The Right Sample Size For A Survey?

by Michaela Mora

Sample Size Trade-offs

Determining the sample size is one of the early steps that must be taken in the planning of a survey. Unfortunately, there is no magic formula that will tell us what the perfect sample is since there are several factors we need to think about:

  • ANALYTICAL PLAN: The research objectives and planned analytical approach should be the first factors to consider when deciding on sample size. For instance, there are statistical procedures (e.g. regression analysis) that require a certain number of observations per variable. Moreover, if comparative analysis between subgroups in the sample is expected, the sample size should be adjusted so that statistically significant differences between the groups can be detected.
  • POPULATION VARIABILITY: This refers to the target population’s diversity. If the target population exhibits large variability in the behaviors and attitudes being researched, a larger sample is needed. If 20% or 80% of the population behaves in a certain way, this indicates less variability than if 50% did so. To be conservative, it is standard practice to use 50% (0.5) as the event probability in sample size calculations, since it represents the highest variability that can be expected in the population.
  • LEVEL OF CONFIDENCE: This is the level of risk we are willing to tolerate, usually expressed as a percentage (e.g. a 95% confidence level). Although survey results are reported as point estimates (e.g. 75% of respondents like this product), the fact is that since we are working with a sample of the target population, we can only be confident that the true value of the estimate in that population falls within a particular range, or what is called the confidence interval. The level of confidence indicates the probability that the true value of the estimate will in fact fall within the boundaries of the confidence interval. How confident can you be? As confident as your tolerance for risk allows you to be, knowing that confidence and precision pull in opposite directions: the more confident you want to be, the wider the confidence interval needed, which leads to lower levels of precision.
  • MARGIN OF ERROR: Also known as sampling error, this indicates the desired level of precision of the estimate. You have probably seen poll results quoted in the media saying that the margin of error was plus or minus a particular percentage (e.g. +/-3%). This percentage defines the lower and upper bounds of the confidence interval likely to include the parameter estimate, and it is a measure of its reliability. The larger the sample, the smaller the margin of error and the greater the estimate precision.

Below is a table illustrating how the margin of error and level of confidence interact with sample size. To get the same level of precision (e.g. +/-3.2%), larger samples are needed as the confidence level increases. For example, if we want to be confident that in 95 out of 100 repetitions of the survey the estimate will be within +/-3.2%, we need a sample of 950.

[Table: Margin of error by sample size and confidence level]

For more help on calculating sample size and margin of error, use our Sample Size and Margin of Error Calculators.

  • COST: Sample cost is often one of the largest items in the budget for market research studies, especially if the target sample includes low-incidence segments or the response rate is low. Many times, our clients have to make a trade-off between statistical accuracy and research cost. Recently, I received a call from a client who wanted to conduct an online survey with a sample of 1,000 respondents, which would give a statistical accuracy of +/-3.1% at the 95% confidence level but would cost $8,000 based on certain screening criteria. At the same time, a sample of 400 respondents would give a statistical accuracy of +/-4.9% and cost $3,400. In this case, a 135% increase in sample cost would only yield a 60% gain in statistical accuracy. The client decided to conduct the study with the smaller sample.
  • POPULATION SIZE: Most of the time, the size of the total target population is unknown, and it is assumed to be large ( >100,000), but in studies where the sample is a large fraction of the population of interest, some adjustments may be needed.

SAMPLE SIZE CALCULATION CHECK LIST

As a summary, to determine the sample size needed in a survey, we need to answer the following questions:

  • What type of data analysis will be conducted? Will subgroups be compared?
  • What is the probability of the event occurring? If no previous data exists, use 50% for a conservative sample size estimate.
  • How much error is tolerable (margin of error)? How much precision do we need?
  • How confident do we need to be that the true population value falls within the confidence interval?
  • What is the research budget? Can we afford the desired sample?
  • What is the population size? Large? Small/Finite? If unknown, assume it to be large ( >100,000)
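
As a rough illustration of how these answers combine, here is a minimal sketch of the standard sample size formula for a proportion, with an optional finite population correction for small populations. The margins of error mirror the examples used earlier in this post; the population figure of 2,000 is an assumption for illustration.

```python
import math

def required_sample_size(moe, z=1.96, p=0.5, population=None):
    """Sample size for a proportion, with an optional finite population correction."""
    n = (z ** 2) * p * (1 - p) / (moe ** 2)
    if population is not None:
        n = n / (1 + (n - 1) / population)   # finite population correction
    return math.ceil(n)

print(required_sample_size(0.031))                    # ~1,000 completes for +/-3.1% at 95%
print(required_sample_size(0.049))                    # ~400 completes for +/-4.9% at 95%
print(required_sample_size(0.049, population=2000))   # fewer needed for a small population
```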

So the answer to the question “What is the right sample size for a survey?” is: It depends. I hope I gave you some guidance in choosing sample size, but the final decision is up to you. To calculate sample size and margin of error, use our Sample Size and Margin of Error Calculators.

If you have wondered what sample size is needed to get a representative sample, read Does A Large Sample Size Guarantee A Representative Sample?


