# 6.3: Levels of Measurement


The first decision in operationalizing a construct is to choose the intended level of measurement. Levels of measurement, also called rating scales, refer to the values that an indicator can take (but say nothing about the indicator itself). For example, male and female (or M and F, or 1 and 2) are two levels of the indicator “gender.” In his seminal article titled “On the theory of scales of measurement,” published in Science, psychologist Stanley Smith Stevens (1946) defined four generic types of rating scales for scientific measurements: nominal, ordinal, interval, and ratio scales. The statistical properties of these scales are shown in Table 6.1.

| Scale | Central Tendency | Statistics | Transformations |
| --- | --- | --- | --- |
| Nominal | Mode | Chi-square | One-to-one (equality) |
| Ordinal | Median | Percentile, non-parametric statistics | Monotonic increasing (order) |
| Interval | Arithmetic mean, range, standard deviation | Correlation, regression, analysis of variance | Positive linear (affine) |
| Ratio | Geometric mean, harmonic mean | Coefficient of variation | Positive similarities (multiplicative, logarithmic) |

Note: All higher-order scales can use any of the statistics for lower-order scales.

Table 6.1. Statistical properties of rating scales

Nominal scales, also called categorical scales, measure categorical data. These scales are used for variables or indicators that have mutually exclusive attributes. Examples include gender (two values: male or female), industry type (manufacturing, financial, agriculture, etc.), and religious affiliation (Christian, Muslim, Jew, etc.). Even if we assign unique numbers to each value, for instance 1 for male and 2 for female, the numbers don’t really mean anything (i.e., 1 is not less than or half of 2) and could just as easily have been represented non-numerically, such as M for male and F for female. Nominal scales merely offer names or labels for different attribute values. The appropriate measure of central tendency of a nominal scale is the mode; neither the mean nor the median can be defined. Permissible statistics are chi-square and frequency distribution, and only a one-to-one (equality) transformation is allowed (e.g., 1=Male, 2=Female).
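
As a minimal sketch of what this means in practice, the Python snippet below computes a frequency distribution and a mode for nominal data, and then checks that a one-to-one recoding leaves the result unchanged. The sample responses and the numeric codes are hypothetical and serve only as an illustration.

```python
from collections import Counter
from statistics import mode

# Hypothetical sample of industry-type responses (nominal data).
industries = ["manufacturing", "financial", "agriculture",
              "financial", "manufacturing", "financial"]

# A frequency distribution and the mode are meaningful for nominal data.
frequencies = Counter(industries)
most_common_industry = mode(industries)

# A one-to-one recoding (hypothetical codes) changes the labels only.
codes = {"manufacturing": 1, "financial": 2, "agriculture": 3}
coded = [codes[label] for label in industries]

# Decoding the mode of the coded data recovers the same category.
decode = {number: label for label, number in codes.items()}
mode_after_recoding = decode[mode(coded)]

print(frequencies)
print(most_common_industry, mode_after_recoding)
```

Averaging the numeric codes would run without error, but the result would depend entirely on the arbitrary choice of codes, which is why the mean is not defined for nominal data.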

Ordinal scales are those that measure rank-ordered data, such as the ranking of students in a class as first, second, third, and so forth, based on their grade point average or test scores. However, the actual or relative values of attributes or differences in attribute values cannot be assessed. For instance, the ranking of students in a class says nothing about the actual GPA or test scores of the students, or how well they performed relative to one another. A classic example in the natural sciences is the Mohs scale of mineral hardness, which characterizes the hardness of various minerals by their ability to scratch other minerals. For instance, diamonds can scratch all other naturally occurring minerals on earth, and hence diamond is the “hardest” mineral. However, the scale does not indicate the actual hardness of these minerals or even provide a relative assessment of their hardness. Ordinal scales can also use attribute labels (anchors) such as “bad”, “medium”, and “good”, or “strongly dissatisfied”, “somewhat dissatisfied”, “neutral”, “somewhat satisfied”, and “strongly satisfied”. In the latter case, we can say that respondents who are “somewhat satisfied” are less satisfied than those who are “strongly satisfied”, but we cannot quantify their satisfaction levels. The central tendency measure of an ordinal scale can be its median or mode; means are uninterpretable. Hence, statistical analyses may involve percentiles and non-parametric analysis, but more sophisticated techniques such as correlation, regression, and analysis of variance are not appropriate. Any monotonically increasing transformation (which retains the ranking) is allowed.
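
The following Python sketch illustrates the median of ordinal data and the effect of a monotonically increasing transformation. The responses are hypothetical, and coding the anchors as ranks 1 to 5 is an assumption made for illustration; only the order of those ranks carries meaning.

```python
from statistics import median_low

# Satisfaction anchors from the text, ordered from lowest to highest.
anchors = ["strongly dissatisfied", "somewhat dissatisfied", "neutral",
           "somewhat satisfied", "strongly satisfied"]
rank = {label: position for position, label in enumerate(anchors, start=1)}

# Hypothetical responses from five respondents.
responses = ["neutral", "strongly satisfied", "somewhat satisfied",
             "somewhat dissatisfied", "somewhat satisfied"]
ranks = [rank[response] for response in responses]

# median_low always returns an observed value, so it maps back to an anchor.
median_rank = median_low(ranks)
median_label = anchors[median_rank - 1]

# Squaring positive ranks is monotonically increasing, so the ordering
# (and therefore the median category) is preserved.
squared = [value ** 2 for value in ranks]
median_after_transform = median_low(squared)

print(median_label)
print(median_after_transform == median_rank ** 2)
```

The mean of the ranks, by contrast, would change in a way that depends on the particular recoding, which is one way to see why means are not interpretable for ordinal data.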

Interval scales are those where the values measured are not only rank-ordered, but are also equidistant from adjacent attributes. An example is the temperature scale (in Fahrenheit or Celsius), where the difference between 30 and 40 degrees Fahrenheit is the same as that between 80 and 90 degrees Fahrenheit. Likewise, if you have a scale that asks respondents’ annual income using the following attributes (ranges): $0 to $10,000, $10,000 to $20,000, $20,000 to $30,000, and so forth, this is also an interval scale, because the mid-points of the ranges (i.e., $5,000, $15,000, $25,000, etc.) are equidistant from each other. The intelligence quotient (IQ) scale is also an interval scale, because the scale is designed such that the difference between IQ scores 100 and 110 is supposed to be the same as that between 110 and 120 (although we do not really know whether that is truly the case). Interval scales allow us to examine “how much more” one attribute is when compared to another, which is not possible with nominal or ordinal scales. Allowed central tendency measures include mean, median, and mode, and allowed measures of dispersion include range and standard deviation. Permissible statistical analyses include all of those allowed for nominal and ordinal scales, plus correlation, regression, analysis of variance, and so on. Allowed scale transformations are positive linear. Note that the satisfaction scale discussed earlier is not strictly an interval scale, because we cannot say whether the difference between “strongly satisfied” and “somewhat satisfied” is the same as that between “neutral” and “somewhat satisfied” or between “somewhat dissatisfied” and “strongly dissatisfied”. However, social science researchers often “pretend” (incorrectly) that these differences are equal so that interval-scale statistical techniques can be applied to ordinal-scaled data.
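
A short Python sketch can make the “positive linear” transformation concrete. It uses the standard Fahrenheit-to-Celsius conversion to show that equal differences stay equal under such a transformation, while ratios of values do not survive it. The function name is chosen here for illustration.

```python
def fahrenheit_to_celsius(degrees_f):
    """Positive linear (affine) transformation between two interval scales."""
    return (degrees_f - 32) * 5 / 9

# The two 10-degree Fahrenheit differences from the text.
diff_low_f = 40 - 30
diff_high_f = 90 - 80

# The same two differences expressed in Celsius are still equal to each other.
diff_low_c = fahrenheit_to_celsius(40) - fahrenheit_to_celsius(30)
diff_high_c = fahrenheit_to_celsius(90) - fahrenheit_to_celsius(80)

# Ratios of values are not preserved, because the zero point is arbitrary.
ratio_f = 80 / 40
ratio_c = fahrenheit_to_celsius(80) / fahrenheit_to_celsius(40)

print(diff_low_f == diff_high_f)
print(abs(diff_low_c - diff_high_c) < 1e-9)
print(ratio_f, ratio_c)
```

Because the ratio of 80 to 40 changes when the unit changes, saying that 80 degrees Fahrenheit is “twice as hot” as 40 degrees Fahrenheit is not meaningful on an interval scale.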

Ratio scales are those that have all the qualities of nominal, ordinal, and interval scales, and in addition, also have a “true zero” point (where the value zero implies lack or nonavailability of the underlying construct). Most measurements in the natural sciences and engineering, such as mass, incline of a plane, and electric charge, employ ratio scales, as do some social science variables such as age, tenure in an organization, and firm size (measured as employee count or gross revenues). For example, a firm of size zero means that it has no employees or revenues. The Kelvin temperature scale is also a ratio scale, in contrast to the Fahrenheit or Celsius scales, because the zero point on this scale (equaling -273.15 degrees Celsius) is not an arbitrary value but represents a state where the particles of matter have zero kinetic energy. These scales are called “ratio” scales because the ratio of two points on these measures is meaningful and interpretable. For example, a firm with 10 employees is double the size of a firm with 5 employees, and the same can be said for a firm of 10,000 employees relative to a different firm of 5,000 employees. All measures of central tendency, including geometric and harmonic means, are allowed for ratio scales, as are ratio measures, such as studentized range or coefficient of variation. All statistical methods are allowed. Sophisticated transformations such as positive similarity transformations (e.g., multiplicative or logarithmic) are also allowed.
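
The sketch below, in Python, illustrates why ratios and the coefficient of variation are meaningful on a ratio scale: both are unchanged by a multiplicative change of unit. The tenure values are hypothetical, and the helper function is defined here only for illustration.

```python
from statistics import geometric_mean, mean, pstdev

# Hypothetical tenure of three employees, in years (a ratio-scaled variable).
tenure_years = [2, 4, 8]

# A multiplicative rescaling: the same tenures expressed in months.
tenure_months = [12 * years for years in tenure_years]

def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    return pstdev(values) / mean(values)

# The ratio between two observations does not depend on the unit.
ratio_years = tenure_years[2] / tenure_years[0]
ratio_months = tenure_months[2] / tenure_months[0]

# Neither does the coefficient of variation.
cv_years = coefficient_of_variation(tenure_years)
cv_months = coefficient_of_variation(tenure_months)

# The geometric mean is defined because all values are positive.
gm_years = geometric_mean(tenure_years)

print(ratio_years, ratio_months)
print(cv_years, cv_months)
print(gm_years)
```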

Based on the four generic types of scales discussed above, we can create specific rating scales for social science research. Common rating scales include binary, Likert, semantic differential, and Guttman scales. Other less common scales are not discussed here.

Binary scales. Binary scales are nominal scales consisting of binary items that assume one of two possible values, such as yes or no, true or false, and so on. For example, a typical binary scale for the “political activism” construct may consist of the six binary items shown in Table 6.2. Each item in this scale is a binary item, and the total number of “yes” responses given by a respondent (a value from 0 to 6) can be used as an overall measure of that person’s political activism. To understand how these items were derived, refer to the “Scaling” section later in this chapter. Binary scales can also employ other values, such as male or female for gender, full-time or part-time for employment status, and so forth. If an employment status item is modified to allow for more than two possible values (e.g., unemployed, full-time, part-time, and retired), it is no longer binary, but still remains a nominal scaled item.

| Item | Response |
| --- | --- |
| Have you ever written a letter to a public official? | Yes / No |
| Have you ever signed a political petition? | Yes / No |
| Have you ever donated money to a political cause? | Yes / No |
| Have you ever donated money to a candidate running for public office? | Yes / No |
| Have you ever written a political letter to the editor of a newspaper or magazine? | Yes / No |
| Have you ever persuaded someone to change his/her voting plans? | Yes / No |

Table 6.2. A six-item binary scale for measuring political activism
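
As a minimal illustration of scoring this scale, the Python sketch below counts the “yes” responses of one respondent across the six items of Table 6.2. The respondent’s answers are hypothetical.

```python
# Hypothetical answers of one respondent to the six items of Table 6.2,
# in table order; True stands for "Yes" and False for "No".
answers = [True, False, True, False, False, True]

# The overall political activism score is the number of "Yes" answers,
# which can range from 0 to 6.
activism_score = sum(answers)

print(activism_score)
```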

Likert scale. Designed by Rensis Likert, this is a very popular rating scale for measuring ordinal data in social science research. This scale includes Likert items, which are simply worded statements to which respondents can indicate their extent of agreement or disagreement on a five- or seven-point scale ranging from “strongly disagree” to “strongly agree”. A typical example of a six-item Likert scale for the “employment self-esteem” construct is shown in Table 6.3. Likert scales are summated scales; that is, the overall scale score may be a summation of the attribute values of each item as selected by a respondent.

| | Strongly Disagree | Somewhat Disagree | Neutral | Somewhat Agree | Strongly Agree |
| --- | --- | --- | --- | --- | --- |
| I feel good about my job | 1 | 2 | 3 | 4 | 5 |
| I get along well with others at work | 1 | 2 | 3 | 4 | 5 |
| I’m proud of my relationship with my supervisor at work | 1 | 2 | 3 | 4 | 5 |
| I can tell that other people at work are glad to have me there | 1 | 2 | 3 | 4 | 5 |
| I can tell that my coworkers respect me | 1 | 2 | 3 | 4 | 5 |
| I feel like I make a useful contribution at work | 1 | 2 | 3 | 4 | 5 |

Table 6.3. A six-item Likert scale for measuring employment self-esteem
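
The summated score described above can be sketched in Python as follows. The respondent’s item values are hypothetical, and the function name and the range check are illustrative choices rather than part of any standard procedure.

```python
LIKERT_MIN, LIKERT_MAX = 1, 5  # five-point anchors, as in Table 6.3

def summated_score(item_values):
    """Sum the item values after checking that each lies on the scale."""
    for value in item_values:
        if not LIKERT_MIN <= value <= LIKERT_MAX:
            raise ValueError(f"response {value} is outside the scale")
    return sum(item_values)

# Hypothetical responses of one person to the six items, in table order.
responses = [4, 5, 3, 4, 4, 5]
self_esteem_score = summated_score(responses)

print(self_esteem_score)
```

With six items on a five-point scale, the summated score can range from 6 to 30.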

Likert items allow for more granularity (more finely tuned responses) than binary items, including whether respondents are neutral to the statement. Three or nine values (often called “anchors”) may also be used, but it is important to use an odd number of values to allow for a “neutral” (or “neither agree nor disagree”) anchor. Some studies have used a “forced choice approach” to force respondents to agree or disagree with the Likert statement by dropping the neutral mid-point and using an even number of values, but this is not a good strategy because some people may indeed be neutral to a given statement and the forced choice approach does not give them the opportunity to record their neutral stance. A key characteristic of a Likert scale is that even though the statements vary across items or indicators, the anchors (“strongly disagree” to “strongly agree”) remain the same. Likert scales are ordinal scales because the anchors are not necessarily equidistant, even though sometimes we treat them like interval scales.

How would you rate your opinions on national health insurance?

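| | Very much | Somewhat | Neither | Somewhat | Very much | |
| --- | --- | --- | --- | --- | --- | --- |
| Good | □ | □ | □ | □ | □ | Bad |
| Useful | □ | □ | □ | □ | □ | Useless |
| Caring | □ | □ | □ | □ | □ | Uncaring |
| Interesting | □ | □ | □ | □ | □ | Boring |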

Table 6.4. A semantic differential scale for measuring attitude toward national health insurance

Semantic differential scale. This is a composite (multi-item) scale where respondents are asked to indicate their opinions or feelings toward a single statement using different pairs of adjectives framed as polar opposites. For instance, the construct “attitude toward national health insurance” can be measured using the four items shown in Table 6.4. As in the Likert scale, the overall scale score may be a summation of individual item scores. Notice that in Likert scales, the statement changes but the anchors remain the same across items. In semantic differential scales, however, the statement remains constant, while the anchors (adjective pairs) change across items. Semantic differential is believed to be an excellent technique for measuring people’s attitudes or feelings toward objects, events, or behaviors.

Guttman scale. Designed by Louis Guttman, this composite scale uses a series of items arranged in increasing order of intensity of the construct of interest, from least intense to most intense. As an example, the construct “attitude toward immigrants” can be measured using the five items shown in Table 6.5. Each item in this Guttman scale has a weight (not shown in the table) that varies with the intensity of that item, and the weighted combination of the responses is used as an aggregate measure of an observation.

How will you rate your opinions on the following statements about immigrants?

| Item | Response |
| --- | --- |
| Do you mind immigrants being citizens of your country? | Yes / No |
| Do you mind immigrants living in your own neighborhood? | Yes / No |
| Would you mind living next door to an immigrant? | Yes / No |
| Would you mind having an immigrant as your close friend? | Yes / No |
| Would you mind if someone in your family married an immigrant? | Yes / No |

Table 6.5. A five-item Guttman scale for measuring attitude toward immigrants
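
The weighted combination mentioned for the Guttman scale can be sketched in Python as below. The text does not give the item weights, so the weights 1 to 5 used here are purely hypothetical, as are the respondent’s answers and the choice to add a weight whenever the respondent answers “No” (does not mind).

```python
# Hypothetical weights that increase with item intensity, in the order of
# Table 6.5. The actual weights are not given in the text.
weights = [1, 2, 3, 4, 5]

# Hypothetical respondent: True means the answer was "No" (does not mind).
does_not_mind = [True, True, True, False, False]

# Weighted combination: add the weight of every item the respondent accepts.
acceptance_score = sum(
    weight for weight, accepted in zip(weights, does_not_mind) if accepted
)

print(acceptance_score)
```

Under these assumed weights, a higher score indicates acceptance of more intense items; a different weighting scheme would change the numbers but not the idea.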

This page titled 6.3: Levels of Measurement is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by Anol Bhattacherjee (Global Text Project).