If you are interested in comparing segments, you need to consider the sample size for segments.
Not long ago, someone asked: What is the sample size of different segments needed to get reliable insight?
Reliability refers to the consistency of results across data collection instruments and across the points in time when the data is collected. I see this question as being more about validity and representativeness, which are related to population heterogeneity and sample source.
To determine the sample size for segments we need to consider a series of factors.
Are there any sub-segments that need to be represented? Usually, the larger the segments, the more heterogeneous they tend to be. As heterogeneity increases, so does the need for a larger sample that can represent all subgroups.
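To see how heterogeneity drives sample size, here is a minimal sketch using the textbook formula for estimating a mean from a simple random sample, n = (z × σ / e)². The function name, the standard deviations and the target margin of error are made-up values for illustration, not figures from any study.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(std_dev, margin, confidence=0.95):
    """Respondents needed to estimate a mean within +/- margin (simple random sample)."""
    # Two-sided z value for the chosen confidence level (about 1.96 for 95%).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * std_dev / margin) ** 2)

# Hypothetical ratings on the same scale; only the spread of answers differs.
homogeneous = sample_size_for_mean(std_dev=1.0, margin=0.2)
heterogeneous = sample_size_for_mean(std_dev=2.0, margin=0.2)
print(homogeneous, heterogeneous)
```

Under these assumptions, doubling the spread of answers roughly quadruples the sample needed for the same precision.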
What Is the Sample Source?
Representativeness has more to do with where the sample comes from than with sample size. If you get it from an appropriate source with the right screening criteria, you are a step closer to more valid results. Even so, other factors also affect validity.
We should avoid samples that are too small if we are going to make comparisons, since smaller samples have larger margins of error.
This means that the ranges of plausible values for the true value of a parameter may overlap across the segments in the comparison. The margin of error is the distance from the sample estimate to either endpoint of that range.
If we compare two small samples and can’t detect any significant difference, it may be because the samples are too small to separate the segments, not because the segments don’t actually differ.
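A rough sketch of this effect, assuming a simple random sample and the normal approximation for a proportion. The segment results (40% and 50%) and the sample sizes are hypothetical, and checking whether two intervals overlap is only a conservative rule of thumb, not a formal significance test.

```python
import math
from statistics import NormalDist

def margin_of_error(p, n, confidence=0.95):
    """Half-width of the normal-approximation interval for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * math.sqrt(p * (1 - p) / n)

def interval(p, n, confidence=0.95):
    """Lower and upper endpoints of the range around the estimate p."""
    moe = margin_of_error(p, n, confidence)
    return p - moe, p + moe

def intervals_overlap(a, b):
    """True when two (low, high) ranges share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]

# Hypothetical segments: 40% vs 50% agreement, at two sample sizes per segment.
for n in (50, 1000):
    seg_a, seg_b = interval(0.40, n), interval(0.50, n)
    print(n, round(margin_of_error(0.50, n), 3), intervals_overlap(seg_a, seg_b))
```

With 50 respondents per segment the two ranges overlap, so the 10-point gap can’t be told apart from noise. With 1,000 per segment they no longer do.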
What Level of Risk Are You Willing to Take?
As we increase sample size, the margins of error get tighter and precision improves. However, how confident do we want to be that the true value is indeed within the margins of error? Here we need to consider the confidence level of the Confidence Interval (C.I.). The most commonly used is 95%. This says that if we repeated the study 100 times and computed an interval each time, we would expect about 95 of those intervals to contain the true value, and about 5 to miss it.
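The repeated-study idea can be checked with a small simulation. This is only a sketch: the true proportion (30%), the sample size (400), the number of simulated studies and the function name are all assumptions chosen for illustration.

```python
import math
import random
from statistics import NormalDist

def coverage(true_p, n, confidence=0.95, studies=2000, seed=1):
    """Share of simulated studies whose interval contains the true proportion."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    hits = 0
    for _ in range(studies):
        # Draw one study: n respondents, each agreeing with probability true_p.
        successes = sum(rng.random() < true_p for _ in range(n))
        p_hat = successes / n
        moe = z * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - moe <= true_p <= p_hat + moe:
            hits += 1
    return hits / studies

print(coverage(true_p=0.30, n=400))
```

The printed share should land close to 0.95, though not exactly on it, since the simulation itself is subject to sampling noise.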
How Certain and Precise Do You Need to Be?
The thing is that, for a given sample size, Confidence and Precision go in opposite directions. If we want to increase our certainty that the true value will fall within a range of values, we have to widen the range (margins of error). However, this leads to a loss in precision. The only way to gain on both fronts is to increase the sample size.
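The trade-off can be put into numbers with a short sketch. It assumes a simple random sample, the normal approximation for a proportion, and the most conservative case of p = 0.5. The sample size of 400 and the ±5-point target are hypothetical.

```python
import math
from statistics import NormalDist

def z_score(confidence):
    """Two-sided z value for a confidence level, e.g. about 1.96 for 0.95."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def margin_of_error(n, confidence, p=0.5):
    """Margin of error for a proportion at a fixed sample size."""
    return z_score(confidence) * math.sqrt(p * (1 - p) / n)

def sample_size(margin, confidence, p=0.5):
    """Respondents needed to hold the margin of error at the target value."""
    return math.ceil(z_score(confidence) ** 2 * p * (1 - p) / margin ** 2)

for confidence in (0.90, 0.95, 0.99):
    moe = margin_of_error(400, confidence)
    needed = sample_size(0.05, confidence)
    print(f"{confidence:.0%}: +/-{moe:.1%} at n=400, n={needed} for +/-5%")
```

At a fixed n of 400, the range widens as the confidence level goes up. To hold the range at ±5 points instead, the required sample grows with the confidence level.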
Depending on budget and timeline constraints, you could use one of two approaches to sampling for segments:
1. Create quotas by segment. These act as independent groups, like smaller “total samples.” The quotas can be proportional to the segments’ sizes in the population, or they can all be the same size. In the latter case, you would need to weight the segments if you decide to merge the quotas into a total sample; otherwise, some segments will be overrepresented and others underrepresented.
2. Let the segments fall out naturally in the total sample. This approach can be more expensive for comparative analysis, because you will need a total sample large enough to yield enough respondents in even the smallest segment. If we don’t make segment comparisons, this is the more desirable approach, since all segments are represented proportionally in the average values.
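The two approaches can be compared with a minimal sketch. The segment names, population shares, per-segment minimum and segment results below are all hypothetical. The natural-fallout total is an expected value only, since actual segment counts will vary from sample to sample.

```python
import math
from fractions import Fraction

# Hypothetical population shares (in percent) and minimum respondents per segment.
population_pct = {"A": 60, "B": 30, "C": 10}
min_per_segment = 100

# Approach 1: equal quotas. Each segment gets the same number of respondents.
quota_total = min_per_segment * len(population_pct)
sample_share = Fraction(1, len(population_pct))
# Weight = population share / sample share, used when merging quotas into a total.
weights = {seg: float(Fraction(pct, 100) / sample_share)
           for seg, pct in population_pct.items()}

# Approach 2: natural fallout. The total must be large enough for the
# smallest segment to reach the minimum on average.
smallest_pct = min(population_pct.values())
natural_total = math.ceil(Fraction(min_per_segment * 100, smallest_pct))

# Effect of weighting on a merged estimate (hypothetical segment results).
segment_result = {"A": 0.50, "B": 0.40, "C": 0.20}
unweighted = sum(segment_result.values()) / len(segment_result)
weighted = (sum(segment_result[s] * weights[s] for s in segment_result)
            / sum(weights.values()))

print(quota_total, natural_total)
print(weights)
print(round(unweighted, 3), round(weighted, 3))
```

In this made-up case, equal quotas need 300 respondents while natural fallout needs about 1,000 to give the smallest segment the same base. Merging the equal quotas without weights would understate the total result, because the small, low-scoring segment would count as much as the largest one.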
As you can see, estimating the sample size for segments is not different from estimating the size for the total sample. There is no magical number to determine how large the sample size should be. Sorry.
(An earlier version of this article was originally published on August 26, 2011. The article was last updated and revised on August 27, 2019.)