4.5: Psychological Measurement (Summary)
Key Takeaways
Key Terms and Concepts
MEASUREMENT
The assignment of scores to individuals to represent some characteristic.
PSYCHOMETRICS
The science of measuring psychological constructs.
CONSTRUCTS
Variables that are not directly observable but inferred from behavior, such as intelligence or anxiety.
CONCEPTUAL DEFINITION
An abstract, theoretical description of what a construct means.
OPERATIONAL DEFINITION
A precise specification of how a construct will be measured or manipulated in a study.
SELF-REPORT MEASURES
Measures in which participants report their own thoughts, feelings, or behaviors.
BEHAVIORAL MEASURES
Measures based on direct observation of behavior.
PHYSIOLOGICAL MEASURES
Measures of bodily functions such as heart rate, brain activity, or hormone levels.
CONVERGING OPERATIONS
Using multiple operational definitions to measure the same construct.
NOMINAL LEVEL
A level of measurement used for categorical variables, in which values name categories but imply no order.
ORDINAL LEVEL
A level of measurement in which values are categorized and ranked in a natural order, but the differences between ranks are not necessarily equal or quantifiable.
INTERVAL LEVEL
A level of measurement with equal distances between values, but without a true zero point.
RATIO LEVEL
A level of measurement with equal distances between values and a true zero point.
RELIABILITY
The consistency or repeatability of measurement.
TEST-RETEST RELIABILITY
The consistency of scores from the same test given at different times.
INTERNAL CONSISTENCY
The extent to which items on a test measure the same thing.
SPLIT-HALF CORRELATION
A reliability estimate based on correlating two halves of a test.
CRONBACH’S α
A statistic measuring internal consistency; conceptually, it is the mean of all possible split-half correlations for a set of items.
INTER-RATER RELIABILITY
The degree of agreement between different observers or raters.
VALIDITY
The extent to which a measure actually measures what it claims to measure.
FACE VALIDITY
The degree to which a measure appears to measure what it is supposed to measure.
CONTENT VALIDITY
The extent to which test items adequately represent the construct domain.
CRITERION VALIDITY
The extent to which scores correlate with relevant external criteria.
CONCURRENT VALIDITY
Correlation between test scores and a criterion measured at the same time.
PREDICTIVE VALIDITY
Correlation between test scores and a criterion measured at a later time.
CONVERGENT VALIDITY
The extent to which a measure correlates with other measures of the same construct.
DISCRIMINANT VALIDITY
The extent to which a measure does not correlate with measures of different constructs.
SOCIALLY DESIRABLE RESPONDING
The tendency to give responses that are socially acceptable rather than truthful.
DEMAND CHARACTERISTICS
Cues in a study that reveal its purpose and influence participants' responses.
Test Your Knowledge (answers at end of section)
1. What is the primary difference between a psychological construct and the operational definition used to measure it?
A. There is no difference; they are the same thing
B. A construct is an abstract concept while an operational definition specifies how the construct will be measured
C. An operational definition is more theoretical than a construct
D. Constructs can be measured directly but operational definitions cannot
2. In Bandura's Bobo doll study, aggression was operationally defined as the number of specific acts (hitting with mallet, punching, kicking) a child performed in 20 minutes. This is an example of:
A. A self-report measure
B. A behavioral measure
C. A physiological measure
D. An ordinal level measure
3. A researcher measures temperature using the Kelvin scale, which has an absolute zero point representing the complete absence of kinetic energy. This is what level of measurement?
A. Nominal
B. Ordinal
C. Interval
D. Ratio
4. A researcher develops a new stress measure that produces very similar scores when the same person takes it multiple times over a short period. However, the measure doesn't actually correlate with physiological indicators of stress. This measure has:
A. High reliability but questionable validity
B. High validity but low reliability
C. Both high reliability and high validity
D. Neither reliability nor validity
5. A measure of mood that produces a low test-retest correlation over a month would:
A. Always indicate the measure is unreliable and should not be used
B. Not necessarily be a concern because mood is expected to change over time
C. Indicate the measure has poor internal consistency
D. Mean the measure lacks face validity
6. The Minnesota Multiphasic Personality Inventory (MMPI-2) includes items like 'I enjoy detective or mystery stories' to measure aggression suppression, even though these items have no obvious connection to aggression. This demonstrates that:
A. The MMPI-2 lacks validity because it has poor face validity
B. Face validity is necessary for a measure to be useful
C. Measures can work well despite lacking face validity
D. The measure has low internal consistency
7. When selecting an existing psychological measure for a research study, which of the following is the MOST important consideration?
A. Whether the measure is published in a prestigious journal
B. Whether the measure has good reliability and validity evidence for your specific population and purpose
C. Whether the measure is the shortest one available
D. Whether the measure was developed recently
8. A researcher creates a measure with multiple items rather than a single item primarily because:
A. Multiple items make the study appear more rigorous to reviewers
B. Multiple items improve both content validity and reliability by covering the construct better and reducing random error
C. Participants prefer longer questionnaires
D. Single items always produce ceiling effects
Answer Key with Explanations
1. B - A construct is an abstract concept while an operational definition specifies how the construct will be measured
A psychological construct is an abstract, theoretical concept (like intelligence or anxiety) that cannot be directly observed. An operational definition specifies the concrete procedures and measures used to assess that construct in a particular study.
2. B - A behavioral measure
This question tests understanding of the three broad categories of operational definitions. Bandura's Bobo doll study is a good example of a behavioral measure. Behavioral measures observe and record participants' behavior, in contrast to self-report measures (participants report their own thoughts and feelings) and physiological measures (which record physiological processes such as heart rate).
3. D - Ratio
This question tests understanding of Stevens's levels of measurement, specifically the distinction between interval and ratio scales. The Fahrenheit scale is interval level because zero degrees Fahrenheit does not represent the complete absence of temperature, so ratio statements such as "80°F is twice as hot as 40°F" are meaningless. Zero on the Kelvin scale, by contrast, is absolute zero, the complete absence of kinetic energy, which makes the Kelvin scale a ratio scale. The defining feature of ratio scales is a true zero point representing the complete absence of the quantity, which allows meaningful ratio comparisons.
4. A - High reliability but questionable validity
Reliability refers to consistency of measurement: getting similar scores on repeated testing. Validity refers to whether the measure actually assesses what it claims to measure. A measure can be reliable (consistent) without being valid (accurate). In this case, the measure is consistent, but it does not correlate with physiological indicators of stress, as it should if it were truly measuring stress.
5. B - Not necessarily be a concern because mood is expected to change over time
High test-retest correlations make sense when the construct being measured is assumed to be consistent over time, which is the case for intelligence, self-esteem, and the Big Five personality dimensions. But other constructs are not assumed to be stable over time. The very nature of mood, for example, is that it changes. So a measure of mood that produced a low test-retest correlation over a period of a month would not be a cause for concern. This demonstrates that the appropriateness of test-retest reliability depends on the theoretical nature of the construct being measured.
6. C - Measures can work well despite lacking face validity
Many established measures in psychology work quite well despite lacking face validity. The Minnesota Multiphasic Personality Inventory-2 (MMPI-2) measures many personality characteristics and disorders by having people decide whether each of 567 different statements applies to them, and many of the statements have no obvious relationship to the construct they measure. For example, the items "I enjoy detective or mystery stories" and "The sight of blood doesn't frighten me or make me sick" both measure the suppression of aggression. Face validity is at best a very weak kind of evidence because it is based on people's intuitions, which can be wrong.
7. B - Whether the measure has good reliability and validity evidence for your specific population and purpose
The most critical factor when selecting a measure is whether it has demonstrated good psychometric properties (reliability and validity) specifically for the population and purpose you intend to use it for. A measure that works well for one population or context may not be appropriate for another.
8. B - Multiple items improve both content validity and reliability by covering the construct better and reducing random error
Multiple items are often required to cover a construct adequately. In addition, responses to single items can be influenced by irrelevant factors—misunderstanding the particular item, a momentary distraction, or a simple error such as checking the wrong response option. But when several responses are summed or averaged, the effects of these irrelevant factors tend to cancel each other out to produce more reliable scores. This demonstrates how multiple items address both validity (better coverage of construct) and reliability (reduced random error).


