Social Sci LibreTexts

3.1: What is developmental measurement equivalence?


    As developmentalists, whatever our target phenomena may be, we are interested in comparing people of different ages (cross-sectional) or following people across different ages (longitudinal). So our goal with developmental measurement equivalence is to validly measure those target phenomena at different developmental periods or ages. It sounds simple: As shown in Figure 20.1, we just want to measure a construct in a way that is both (1) valid at each age and (2) comparable across ages. The most straightforward way to maximize comparability seems to be to use the same measure at different ages. But the problem is that when you use exactly the same measure, you may actually be measuring different constructs at different ages.

    [Insert Figure 2 here]

    How is that possible?

    Let’s think of an example. In early cross-sectional studies of intellectual functioning, researchers used standardized tests of intelligence, kind of like the SAT or GRE exams with which most students are familiar. Researchers gave the exact same tests to people from age 20 to age 80. So you would expect the tests, since they were exactly the same, to measure just what the items on the test were designed to capture, namely, intellectual functioning. But even a few moments’ reflection suggests that the 20-year-olds and the 80-year-olds are likely to experience such tests differently. For example, let’s consider recent practice effects. When was the last time that the typical 20-year-old took a test like our intelligence test, namely, one that uses a Scantron-type form where the answers are bubbled in? The answer might range from “yesterday” (if they are in college) to “a few years ago” (while they were in high school). But in any case, the format of the test looks familiar. Now, let’s think about a typical 80-year-old. The last time they took such a test would likely be “never.” Even if they got a Ph.D., their last exam would have been completed 50 years ago.

    So how well participants perform on our intelligence test also depends on things like familiarity and other performance factors, which turn out to be very different between age groups. For example, performance includes a variety of “test-taking skills.” Older people tend to be more cautious, so if they don't know the answer, they are less likely to guess than younger people, and so miss out on the items they would have gotten right by chance. Older people also care more about their performance, and so tend to be more worried, which may impair performance. One very big difference with age is just plain speed, that is, speed of processing. So any assessments that are timed put older people at a disadvantage.

    Taken all together, this makes us rethink the connection between our target construct (i.e., intellectual functioning) and the measure we are using to map it (namely, participants’ answers to the multiple-choice questions on our intelligence test). Performance on that test taps intelligence all right, but it turns out that it also taps familiarity-caution-anxiety-and-speed, all of which are factors that weigh differentially on people of different ages. And very clever intervention research by Paul Baltes and his team demonstrated just how much (Baltes & Lindenberger, 1988). They showed that (at least for the young-old) it is possible to systematically peel back those performance factors: researchers can allow older people to become familiar with how the tests look and work, give them practice using the Scantrons, add instructions that encourage them to guess on items even if they are not sure, help reduce worry by underplaying the importance of what the tests measure, and allow them to take the tests under “power” (untimed) conditions. And if researchers do this, lo and behold, those older people get smarter and smarter and smarter. Their performance improves markedly. So exactly the same test measures different constructs (or combinations of constructs) depending on the age of the participants.
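
    To make this logic concrete, here is a minimal sketch in Python. It is a toy illustration and not a model from the research described above: the function name, the ability scores, and the penalty sizes are all hypothetical, and it assumes for simplicity that an observed score is just true ability minus a sum of performance penalties. Both age groups are given identical true ability, so any gap in observed scores comes entirely from the performance factors.

```python
from statistics import mean


def observed_score(ability, penalties):
    """Toy assumption: observed score = true ability minus summed penalties."""
    return ability - sum(penalties.values())


# Hypothetical true ability scores, identical for both age groups.
true_ability = [95, 100, 105, 110]

# Hypothetical penalties in score points; the values are invented for illustration.
young_penalties = {"unfamiliarity": 0, "caution": 0, "anxiety": 1, "speed": 0}
old_penalties = {"unfamiliarity": 4, "caution": 3, "anxiety": 3, "speed": 5}

young_mean = mean(observed_score(a, young_penalties) for a in true_ability)
old_mean = mean(observed_score(a, old_penalties) for a in true_ability)

# "Peeling back" the performance factors: practice, encouragement to guess,
# reduced worry, and untimed testing are modeled here as removing every penalty.
old_mean_after = mean(observed_score(a, {}) for a in true_ability)

print("young, standard test:", young_mean)
print("old, standard test:", old_mean)
print("old, after intervention:", old_mean_after)
```

    In this toy setup the older group scores lower on the standard test even though the two groups do not differ in ability, and the gap disappears once the penalties are removed. That is the sense in which the same test can measure different combinations of constructs at different ages.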

    Is this a problem for non-developmental researchers, too?

    Yes. Just like sampling equivalence, measurement equivalence is an issue everywhere. It matters for any researcher who wants to make comparisons between groups. It is very common to think about cross-cultural equivalence of measures, or gender equivalence, or racial and ethnic equivalence. The issue of equivalence is at the heart of discussions about tests that are “biased” against people who were not part of the group on which most tests are created (i.e., the dominant group, typically white, middle-class, native-English-speaking groups). These discussions are especially important for tests with consequences, like ones that decide whether students will get into and out of special services, advanced placement courses, college, or graduate school. A “biased test” is, by definition, one that does not measure the same thing across groups, and almost always this means a test that inadvertently privileges the dominant group.