2 minutes to read. By Michaela Mora on June 20, 2022. Topics: Analysis Techniques, Market Research, Survey Design
Clients often ask us to review surveys or analyze data collected via surveys they developed themselves. More often than not, I find rating scales of different sizes and directions within the same survey. When I ask why, I get answers such as "This is the one we have always used."
It seems this question type is often chosen out of preference or habit (e.g., legacy surveys). This is not surprising, since there is no consensus on which scales work best. Different scales yield different results, which is disheartening in a way. Consequently, reliability becomes a priority in the use of rating scales.
Reliability refers to the extent to which a scale produces consistent results when repeated measurements are made, within the same study or across studies. Systematic errors that affect measurements in a constant way do not necessarily reduce reliability.
We assess reliability by determining the proportion of systematic variation in a scale and computing the correlation between the scores obtained from different administrations of the scale. If the correlation is high, the scale yields consistent results and is therefore considered reliable.
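The test-retest logic above can be sketched in a few lines of Python. The data below is purely illustrative (not from the article): the same respondents rate the same item in two administrations, and a high Pearson correlation between the two waves supports reliability.

```python
# Test-retest reliability: correlate scores from two administrations
# of the same scale. Respondent data here is invented for illustration.

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# The same 8 respondents rating the same item two weeks apart (1-5 scale).
wave_1 = [4, 5, 3, 2, 4, 5, 1, 3]
wave_2 = [4, 4, 3, 2, 5, 5, 2, 3]

r = pearson_r(wave_1, wave_2)
print(f"test-retest reliability r = {r:.2f}")  # high r -> consistent scale
```

In practice you would also look at internal-consistency measures such as Cronbach's alpha across the items of a multi-item scale, not just a single item's test-retest correlation.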
The validity of a scale refers to the extent to which differences in observed scale scores reflect true differences in the characteristic being measured, rather than systematic or random error. There are several categories of validity.
Content validity, sometimes called "face validity," refers to a subjective evaluation of how well the content of the items included in rating questions covers the domain we are studying. It concerns the statements used to represent the phenomenon under study and supports a common-sense interpretation of the scale scores.
Criterion validity evaluates whether the measured items perform as expected in relation to other meaningful criteria included in the study (e.g., behavioral measures, attitudinal measures, demographics).
In construct validity, we address the question of what characteristic we are actually measuring. This is connected to the underlying theory we use to develop the items measured with the rating questions. In practice, we often don’t know what items should be included to describe the phenomenon we are trying to study (e.g., drivers of user experience, customer satisfaction, brand attitudes, barriers to purchases, etc.), so conducting exploratory qualitative research to develop relevant items and support construct validity is highly recommended. Otherwise, we are just guessing or working from biased assumptions that may miss key aspects of what we are trying to study.
Convergent validity is the extent to which the items used in rating questions correlate positively with other measures of the same construct, even measures that do not use rating scales.
Reliability and validity are related in ways that sometimes sound counter-intuitive to those not familiar with measurement scales and statistics.
In addition to issues concerning the reliability and validity of the items included in a question, we need to consider:
The most commonly used scaling technique in market research surveys is the itemized rating scale: a measurement scale with numbers and/or labels associated with each scale point, presented in a particular order.
The most popular types are the Likert scale and the Semantic Differential scale. Variations of both abound: they have been adapted to different topics and extended to different numbers of scale points and labels. The debate over bipolar versus unipolar scales, and the research seeking definitive answers, continues to this day.
A lot of research has been dedicated to this subject. Unfortunately, there is no simple answer to the question of which rating scales we should use.
Rating questions are a familiar question format to internal stakeholders, researchers, and participants.
This extensive body of research shows that different rating scales are bound to yield different results, since we are dealing mainly with human perception. Scales mean different things to different people, and the values, words, and order in which we present them affect how they are interpreted. What to do?