4.5: Psychological Measurement (Summary)

    Key Takeaways

    • Measurement is the assignment of scores to individuals so that the scores represent some characteristic of the individuals. Psychological measurement can be achieved in a wide variety of ways, including self-report, behavioral, and physiological measures.
    • Psychological constructs such as intelligence, self-esteem, and depression are variables that are not directly observable because they represent behavioral tendencies or complex patterns of behavior and internal processes. An important goal of scientific research is to conceptually define psychological constructs in ways that accurately describe them.
    • For any conceptual definition of a construct, there will be many different operational definitions or ways of measuring it. The use of multiple operational definitions, or converging operations, is a common strategy in psychological research.
    • Variables can be measured at four different levels—nominal, ordinal, interval, and ratio—that communicate increasing amounts of quantitative information. The level of measurement affects the kinds of statistics you can use and conclusions you can draw from your data.
    • Psychological researchers do not simply assume that their measures work. Instead, they conduct research to show that they work. If they cannot show that they work, they stop using them.
    • There are two distinct criteria by which researchers evaluate their measures: reliability and validity. Reliability is consistency across time (test-retest reliability), across items (internal consistency), and across researchers (interrater reliability). Validity is the extent to which the scores actually represent the variable they are intended to.
    • Validity is a judgment based on various types of evidence. The relevant evidence includes the measure’s reliability, whether it covers the construct of interest, and whether the scores it produces are correlated with other variables they are expected to be correlated with and not correlated with variables that are conceptually distinct.
    • Good measurement begins with a clear conceptual definition of the construct to be measured. This is accomplished both by clear and detailed thinking and by a review of the research literature.
    • You often have the option of using an existing measure or creating a new measure. You should make this decision based on the availability of existing measures and their adequacy for your purposes.
    • Several simple steps can be taken in creating new measures and in implementing both existing and new measures that can help maximize reliability and validity.
    • Once you have used a measure, you should reevaluate its reliability and validity based on your new data. Remember that the assessment of reliability and validity is an ongoing process.
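The takeaway that the level of measurement limits which statistics make sense can be illustrated with a small sketch. The pairing used here (mode for nominal, median for ordinal, mean for interval and ratio) is the conventional guidance associated with Stevens's framework. The function name `central_tendency` and the sample data are illustrative assumptions, not part of the original text.

```python
from statistics import mean, median, mode

# The four level names come from the text; everything else here is hypothetical.
LEVELS = ("nominal", "ordinal", "interval", "ratio")

def central_tendency(values, level):
    """Return a measure of central tendency that is meaningful at the given level."""
    if level not in LEVELS:
        raise ValueError(f"unknown level of measurement: {level!r}")
    if level == "nominal":
        return mode(values)    # categories only: report the most frequent one
    if level == "ordinal":
        return median(values)  # ranks: middle value, no equal-interval assumption
    return mean(values)        # interval and ratio: the arithmetic mean is meaningful

favorite_color = ["blue", "red", "blue", "green"]   # nominal
satisfaction_rank = [1, 2, 2, 4, 5]                 # ordinal
reaction_time_ms = [400, 500, 600]                  # ratio

print(central_tendency(favorite_color, "nominal"))
print(central_tendency(satisfaction_rank, "ordinal"))
print(central_tendency(reaction_time_ms, "ratio"))
```

The sketch is deliberately strict. In practice researchers sometimes compute means for ordinal rating scales, which is a judgment call the code does not model.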

    Key Terms and Concepts

    MEASUREMENT

    The assignment of scores to individuals to represent some characteristic.

    PSYCHOMETRICS

    The science of measuring psychological constructs.

    CONSTRUCTS

    Variables that are not directly observable but inferred from behavior, such as intelligence or anxiety.

    CONCEPTUAL DEFINITION

    An abstract, theoretical description of what a construct means.

    OPERATIONAL DEFINITION

    A precise specification of how a construct will be measured or manipulated in a study.

    SELF-REPORT MEASURES

    Measures in which participants report their own thoughts, feelings, or behaviors.

    BEHAVIORAL MEASURES

    Measures based on direct observation of behavior.

    PHYSIOLOGICAL MEASURES

    Measures of bodily functions such as heart rate, brain activity, or hormone levels.

    CONVERGING OPERATIONS

    Using multiple operational definitions to measure the same construct.

    NOMINAL LEVEL

    A level of measurement used for categorical variables.

    ORDINAL LEVEL

    A scale where data is categorized and ranked in a natural order, but the differences between the ranks are not uniform or quantifiable.

    INTERVAL LEVEL

    A level of measurement with equal distances between values, but without a true zero point.

    RATIO LEVEL

    A level of measurement with equal distances between values and a true zero point.

    RELIABILITY

    The consistency or repeatability of measurement.

    TEST-RETEST RELIABILITY

    The consistency of scores from the same test given at different times.

    INTERNAL CONSISTENCY

    The extent to which items on a test measure the same thing.

    SPLIT-HALF CORRELATION

    A reliability estimate based on correlating two halves of a test.

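    CRONBACH’S ALPHA (α)

    A statistic that summarizes internal consistency; conceptually, the mean of all possible split-half correlations for a set of items.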

    INTER-RATER RELIABILITY

    The degree of agreement between different observers or raters.

    VALIDITY

    The extent to which a measure actually measures what it claims to measure.

    FACE VALIDITY

    The degree to which a measure appears to measure what it is supposed to measure.

    CONTENT VALIDITY

    The extent to which test items adequately represent the construct domain.

    CRITERION VALIDITY

    The extent to which scores correlate with relevant external criteria.

    CONCURRENT VALIDITY

    Correlation between test scores and a criterion measured at the same time.

    PREDICTIVE VALIDITY

    Correlation between test scores and a criterion measured at a later time.

    CONVERGENT VALIDITY

    The extent to which a measure correlates with other measures of the same construct.

    DISCRIMINANT VALIDITY

    The extent to which a measure does not correlate with measures of different constructs.

    SOCIALLY DESIRABLE RESPONDING

    The tendency to give responses that are socially acceptable rather than truthful.

    DEMAND CHARACTERISTICS

    Cues in a study that reveal its purpose and influence participants' responses.
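The internal-consistency terms defined above (split-half correlation and Cronbach's α) can be computed by hand on a small data set. The following is a minimal sketch, assuming each row holds one respondent's scores on the items of a single scale. The data and the function names `pearson_r`, `split_half_r`, and `cronbach_alpha` are hypothetical. The α formula used is the common variance-based one, α = k/(k − 1) × (1 − Σ item variances / variance of total scores).

```python
from math import sqrt
from statistics import mean, variance

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def split_half_r(rows):
    """Correlate the sum of odd-numbered items with the sum of even-numbered items."""
    odd_half = [sum(row[0::2]) for row in rows]   # items 1, 3, 5, ...
    even_half = [sum(row[1::2]) for row in rows]  # items 2, 4, 6, ...
    return pearson_r(odd_half, even_half)

def cronbach_alpha(rows):
    """Cronbach's alpha from item variances and the variance of total scores."""
    k = len(rows[0])
    item_variances = [variance(column) for column in zip(*rows)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

# Hypothetical responses: 5 respondents x 4 items, each scored 0-3.
responses = [
    [3, 3, 2, 3],
    [1, 1, 1, 0],
    [2, 3, 2, 2],
    [0, 1, 0, 1],
    [3, 2, 3, 3],
]

print(round(split_half_r(responses), 3))
print(round(cronbach_alpha(responses), 4))  # 0.9375
```

With only five respondents these values are purely illustrative. Real reliability estimates need much larger samples.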

    Test Your Knowledge (answers at end of section)

    1. What is the primary difference between a psychological construct and the operational definition used to measure it?

    A. There is no difference; they are the same thing

    B. A construct is an abstract concept while an operational definition specifies how the construct will be measured

    C. An operational definition is more theoretical than a construct

    D. Constructs can be measured directly but operational definitions cannot

    2. In Bandura's Bobo doll study, aggression was operationally defined as the number of specific acts (hitting with a mallet, punching, kicking) a child performed in 20 minutes. This is an example of:

    A. A self-report measure

    B. A behavioral measure

    C. A physiological measure

    D. An ordinal level measure

    3. A researcher measures temperature using the Kelvin scale, which has an absolute zero point representing the complete absence of kinetic energy. This is what level of measurement?

    A. Nominal

    B. Ordinal

    C. Interval

    D. Ratio

    4. A researcher develops a new stress measure that produces very similar scores when the same person takes it multiple times over a short period. However, the measure doesn't actually correlate with physiological indicators of stress. This measure has:

    A. High reliability but questionable validity

    B. High validity but low reliability

    C. Both high reliability and high validity

    D. Neither reliability nor validity

    5. A measure of mood that produces a low test-retest correlation over a month would:

    A. Always indicate the measure is unreliable and should not be used

    B. Not necessarily be a concern because mood is expected to change over time

    C. Indicate the measure has poor internal consistency

    D. Mean the measure lacks face validity

    6. The Minnesota Multiphasic Personality Inventory (MMPI-2) includes items like 'I enjoy detective or mystery stories' to measure aggression suppression, even though these items have no obvious connection to aggression. This demonstrates that:

    A. The MMPI-2 lacks validity because it has poor face validity

    B. Face validity is necessary for a measure to be useful

    C. Measures can work well despite lacking face validity

    D. The measure has low internal consistency

    7. When selecting an existing psychological measure for a research study, which of the following is the MOST important consideration?

    A. Whether the measure is published in a prestigious journal

    B. Whether the measure has good reliability and validity evidence for your specific population and purpose

    C. Whether the measure is the shortest one available

    D. Whether the measure was developed recently

    8. A researcher creates a measure with multiple items rather than a single item primarily because:

    A. Multiple items make the study appear more rigorous to reviewers

    B. Multiple items improve both content validity and reliability by covering the construct better and reducing random error

    C. Participants prefer longer questionnaires

    D. Single items always produce ceiling effects

    Answer Key with Explanations

    1. B - A construct is an abstract concept while an operational definition specifies how the construct will be measured

    A psychological construct is an abstract, theoretical concept (like intelligence or anxiety) that cannot be directly observed. An operational definition specifies the concrete procedures and measures used to assess that construct in a particular study.

    2. B - A behavioral measure

    This question tests understanding of the three broad categories of operational definitions. Bandura's Bobo doll study is a good example of a behavioral measure. Behavioral measures observe and record participants' behavior, in contrast with self-report measures (participants report their own thoughts or feelings) and physiological measures (recording physiological processes such as heart rate).

    3. D - Ratio

    This question tests understanding of Stevens's levels of measurement, specifically the distinction between interval and ratio scales. The Fahrenheit scale is interval level because zero degrees Fahrenheit does not represent the complete absence of temperature, so ratios of Fahrenheit values are not meaningful: 80 degrees is not "twice as hot" as 40 degrees. Zero on the Kelvin scale, however, is absolute zero, which makes the Kelvin scale a ratio scale. The defining feature of a ratio scale is a true zero point representing the complete absence of the quantity, which allows meaningful ratio comparisons.

    4. A - High reliability but questionable validity

    Reliability refers to consistency of measurement - getting similar scores on repeated testing. Validity refers to whether the measure actually assesses what it claims to measure. A measure can be reliable (consistent) without being valid (accurate). In this case, the measure is consistent but doesn't correlate with what it should if it were truly measuring stress.

    5. B - Not necessarily be a concern because mood is expected to change over time

    High test-retest correlations make sense when the construct being measured is assumed to be consistent over time, which is the case for intelligence, self-esteem, and the Big Five personality dimensions. But other constructs are not assumed to be stable over time. The very nature of mood, for example, is that it changes. So a measure of mood that produced a low test-retest correlation over a period of a month would not be a cause for concern. This demonstrates that the appropriateness of test-retest reliability depends on the theoretical nature of the construct being measured.

    6. C - Measures can work well despite lacking face validity

    Many established measures in psychology work quite well despite lacking face validity. The Minnesota Multiphasic Personality Inventory-2 (MMPI-2) measures many personality characteristics and disorders by having people decide whether each of 567 different statements applies to them, and many of those statements have no obvious relationship to the construct they measure. For example, the items "I enjoy detective or mystery stories" and "The sight of blood doesn't frighten me or make me sick" both measure the suppression of aggression. Face validity is at best a very weak kind of evidence because it is based on intuitions that can be wrong.

    7. B - Whether the measure has good reliability and validity evidence for your specific population and purpose

    The most critical factor when selecting a measure is whether it has demonstrated good psychometric properties (reliability and validity) specifically for the population and purpose you intend to use it for. A measure that works well for one population or context may not be appropriate for another.

    8. B - Multiple items improve both content validity and reliability by covering the construct better and reducing random error

    Multiple items are often required to cover a construct adequately. In addition, responses to single items can be influenced by irrelevant factors—misunderstanding the particular item, a momentary distraction, or a simple error such as checking the wrong response option. But when several responses are summed or averaged, the effects of these irrelevant factors tend to cancel each other out to produce more reliable scores. This demonstrates how multiple items address both validity (better coverage of construct) and reliability (reduced random error).
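The explanation for question 8 says that random influences on individual items tend to cancel out when responses are averaged. A small simulation can make that visible. This is a sketch under simplifying assumptions: each response is modeled as a true score plus independent, normally distributed error. The function names, the error model, and all parameter values are hypothetical.

```python
import random
from statistics import pstdev

def observed_score(true_score, n_items, error_sd, rng):
    """Average of n_items responses, each equal to the true score plus random error."""
    responses = [true_score + rng.gauss(0, error_sd) for _ in range(n_items)]
    return sum(responses) / n_items

def error_spread(n_items, n_people=2000, error_sd=1.0, seed=0):
    """Standard deviation of (observed - true) across simulated respondents."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    errors = []
    for _ in range(n_people):
        true_score = rng.uniform(1, 5)
        errors.append(observed_score(true_score, n_items, error_sd, rng) - true_score)
    return pstdev(errors)

# Measurement error shrinks as more items are averaged.
for k in (1, 4, 10):
    print(k, "items -> error SD", round(error_spread(k), 3))
```

Under this model the error in the averaged score shrinks roughly in proportion to one over the square root of the number of items, which is why even a handful of items is noticeably more reliable than one.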

    References

    Amir, N., Freshman, M., & Foa, E. (2002). Enhanced Stroop interference for threat in social phobia. Journal of Anxiety Disorders, 16, 1–9.

    Bandura, A., Ross, D., & Ross, S. A. (1961). Transmission of aggression through imitation of aggressive models. Journal of Abnormal and Social Psychology, 63, 575–582.

    Cacioppo, J. T., & Petty, R. E. (1982). The need for cognition. Journal of Personality and Social Psychology, 42, 116–131.

    Cohen, S., Kamarck, T., & Mermelstein, R. (1983). A global measure of perceived stress. Journal of Health and Social Behavior, 24, 386–396.

    Costa, P. T., Jr., & McCrae, R. R. (1992). Normal personality assessment in clinical practice: The NEO Personality Inventory. Psychological Assessment, 4, 5–13.

    DeLongis, A., Coyne, J. C., Dakof, G., Folkman, S., & Lazarus, R. S. (1982). Relationships of daily hassles, uplifts, and major life events to health status. Health Psychology, 1(2), 119–136.

    Gosling, S. D., Rentfrow, P. J., & Swann, W. B., Jr. (2003). A very brief measure of the Big Five personality domains. Journal of Research in Personality, 37, 504–528.

    Holmes, T. H., & Rahe, R. H. (1967). The Social Readjustment Rating Scale. Journal of Psychosomatic Research, 11(2), 213–218.

    Levels of Measurement. (2016, August 26). Retrieved from http://wikieducator.org/Introduction_to_Research_Methods_In_Psychology/Theories_and_Measurement/Levels_of_Measurement

    MacDonald, T. K., & Martineau, A. M. (2002). Self-esteem, mood, and intentions to use condoms: When does low self-esteem lead to risky health behaviors? Journal of Experimental Social Psychology, 38, 299–306.

    Petty, R. E., Briñol, P., Loersch, C., & McCaslin, M. J. (2009). The need for cognition. In M. R. Leary & R. H. Hoyle (Eds.), Handbook of individual differences in social behavior (pp. 318–329). New York, NY: Guilford Press.

    Rosenberg, M. (1965). Society and the adolescent self-image. Princeton, NJ: Princeton University Press.

    Rosenberg, M. (1989). Society and the adolescent self-image (rev. ed.). Middletown, CT: Wesleyan University Press.

    Segerstrom, S. E., & Miller, G. E. (2004). Psychological stress and the human immune system: A meta-analytic study of 30 years of inquiry. Psychological Bulletin, 130, 601–630.

    Stevens, S. S. (1946). On the theory of scales of measurement. Science, 103, 677–680.

    Stroop, J. R. (1935). Studies of interference in serial verbal reactions. Journal of Experimental Psychology, 18, 643–662.

    Exercises
    • Practice: Complete the Rosenberg Self-Esteem Scale and compute your overall score.
    • Practice: Think of three operational definitions for sexual jealousy, decisiveness, and social anxiety. Consider the possibility of self-report, behavioral, and physiological measures. Be as precise as you can.
    • Practice: For each of the following variables, decide which level of measurement is being used.
      • A university instructor measures the time it takes her students to finish an exam by looking through the stack of exams at the end. She assigns the one on the bottom a score of 1, the one on top of that a 2, and so on.
      • A researcher accesses her participants’ medical records and counts the number of times they have seen a doctor in the past year.
      • Participants in a research study are asked whether they are right-handed or left-handed.
    • Practice: Ask several friends to complete the Rosenberg Self-Esteem Scale. Then assess its internal consistency by making a scatterplot to show the split-half correlation (even- vs. odd-numbered items). Compute the correlation coefficient too if you know how.
    • Discussion: Think back to the last college exam you took and think of the exam as a psychological measure. What construct do you think it was intended to measure? Comment on its face and content validity. What data could you collect to assess its reliability and criterion validity?
    • Practice: Write your own conceptual definition of self-confidence, irritability, and athleticism.
    • Practice: Choose a construct (sexual jealousy, self-confidence, etc.) and find two measures of that construct in the research literature. If you were conducting your own study, which one (if either) would you use and why?
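For the exercises that involve scoring the Rosenberg Self-Esteem Scale, the arithmetic can be sketched as follows. The scoring convention here is an assumption to check against the version of the scale you actually use: responses are coded 0 (strongly disagree) to 3 (strongly agree), and items 2, 5, 6, 8, and 9 are negatively worded and therefore reverse-scored. The function name and the example responses are hypothetical.

```python
# Assumed convention: 10 items coded 0-3, with items 2, 5, 6, 8, 9 reverse-scored.
# Item order differs between versions of the scale, so verify before relying on this.
REVERSE_SCORED = {2, 5, 6, 8, 9}
MAX_RESPONSE = 3

def rosenberg_total(responses):
    """Sum ten item responses, flipping the reverse-scored items first."""
    if len(responses) != 10:
        raise ValueError("expected exactly 10 responses")
    total = 0
    for item_number, response in enumerate(responses, start=1):
        if not 0 <= response <= MAX_RESPONSE:
            raise ValueError(f"item {item_number}: response must be 0-{MAX_RESPONSE}")
        if item_number in REVERSE_SCORED:
            response = MAX_RESPONSE - response  # 0 <-> 3, 1 <-> 2
        total += response
    return total

my_responses = [3, 0, 2, 3, 1, 0, 3, 1, 0, 2]  # hypothetical answers to items 1-10
print(rosenberg_total(my_responses))  # 26
```

Under this convention totals range from 0 to 30, with higher totals indicating higher self-esteem.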

    This page titled 4.5: Psychological Measurement (Summary) is shared under a CC BY-NC-SA 4.0 license and was authored, remixed, and/or curated by Rajiv S. Jhangiani, I-Chant A. Chiang, Carrie Cuttler, & Dana C. Leighton via source content that was edited to the style and standards of the LibreTexts platform.