Clients often ask me to review surveys or analyze data collected via surveys they developed themselves. More often than not I find rating scales (aka Likert scales) of different sizes and directions within the same survey. When I ask why, I get answers such as “This is the one we have always used.”
It seems this question type is often chosen based on preference or habit (e.g., legacy surveys). This is not surprising, since there is no consensus on which scales work best. They all yield different results, which is disheartening in a way.
What The Research Says
A lot of research has been dedicated to this subject. Unfortunately, there is no simple answer to the question of which rating scales we should use.
How to Avoid or Handle Rating Scales
This extensive body of research shows that different rating scales are bound to yield different results because we are mainly dealing with human perception. Scales mean different things to different people, and the values, words, and order in which we present them affect how they are interpreted. What to do?
- Whenever possible, favor question formats other than rating scales. For example, MaxDiff has been shown to discriminate better in preference and importance measurements.
- If you still have to use rating scales, strive for consistency and use them with full knowledge of the bias they introduce in the data, particularly if you want to analyze data from different rating scales or from different surveys. This is especially relevant in tracking studies: a change in rating scale from one wave to another may show artificially significant differences, driven mainly by the measurement error the change in scale introduces.
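To make the wave-comparison pitfall concrete, here is a minimal sketch (hypothetical data, not from any real study) of linearly rescaling two waves that used different scale sizes onto a common 0-1 range before comparing means. Note that this aligns the numeric ranges only; it does not remove the perceptual bias discussed above.

```python
def rescale(value, old_min, old_max, new_min=0.0, new_max=1.0):
    """Linearly map a rating from one scale's range onto another."""
    return new_min + (value - old_min) * (new_max - new_min) / (old_max - old_min)

# Hypothetical tracking data: wave 1 used a 1-5 scale, wave 2 a 1-7 scale.
wave1 = [4, 5, 3, 4, 2]   # 1-5 scale
wave2 = [6, 7, 4, 5, 3]   # 1-7 scale

# Raw means are not comparable across the two scales.
raw_gap = sum(wave2) / len(wave2) - sum(wave1) / len(wave1)

# Rescale both waves to 0-1 before comparing.
wave1_norm = [rescale(v, 1, 5) for v in wave1]
wave2_norm = [rescale(v, 1, 7) for v in wave2]
norm_gap = sum(wave2_norm) / len(wave2_norm) - sum(wave1_norm) / len(wave1_norm)
```

Here the raw means differ by more than a full point simply because the scales have different ranges; after normalization the apparent gap shrinks, illustrating how much of a wave-over-wave "change" can be an artifact of the scale switch rather than a real shift in opinion.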
- Above all, triangulate the results with other data sources to understand how different scale points correlate with actual behavior, and ask respondents why they gave a particular rating. If possible, use a text analytics tool to get at the heart of what the scale really means for a respondent. The example below says it all.