9.4: Threats to Internal Validity

    Learning Objectives
    1. Identify the threats to internal validity associated with quasi-experimental designs.
    2. Identify the threats to internal validity associated with experimental designs.

    We've touched on the fact that quasi-experimental designs have issues with showing that the treatment, and nothing other than the treatment, could have caused changes in the outcome. This causal interpretation is about internal validity. As a reminder, internal validity of a study is the degree to which we can confidently infer a causal relationship between variables.

    Researchers have identified the most common alternative explanations that make causal interpretations difficult. These are called threats to validity. We will start by discussing threats to internal validity that are common in quasi-experimental designs.

    Threats to Validity in Quasi-Experimental Designs

    In Table \(\PageIndex{1}\), you can see descriptions of some common threats to internal validity in quasi-experimental designs. The Example entry for each threat describes what it could look like in a quasi-experimental study comparing classes that use OER (openly licensed educational resources, like the textbook you are reading) to classes that do not use OER, with student learning as the outcome. For these examples, the study is assumed to be a pretest-posttest nonequivalent groups design, with chapter quizzes as the measure of student learning.

    Table \(\PageIndex{1}\): Common Threats to Internal Validity for Quasi-Experimental Designs
    Name: History
    Description: The treatment group experiences events (other than the IV) that the control group does not experience.
    Example: Many instructors started using OER during the pandemic because it was easier for students to access than commercial textbooks. However, if we compared this group of students to students in a previous semester, then we would have the threat of history, since the previous students (the comparison group) did not have to attend school during a pandemic.

    Name: (Re-)Testing
    Description: Repeated exposure to the DV. [This threat to internal validity is more of a concern when there is no comparison group. When there is a comparison group, we could see that both groups improve, but test whether the intervention group improved more.]
    Example: Students may improve their test-taking through the practice of weekly chapter quizzes.

    Name: Instrumentation
    Description: The instrument changes through use. This includes observer bias.
    Example: Many instructors use quizzes developed by the publisher, so the DV would automatically change for those instructors any time a different textbook is used. This could still be a threat for instructors who develop their own quizzes if they change some questions in their quiz between the pretest and posttest.

    Name: Regression to the Mean
    Description: Just due to chance, extreme scores tend to regress towards the mean in later measurements.
    Example: Some students, some sections, and some semesters just go better than others. If, by chance, an unusually strong group of students, section, or semester ended up in the comparison (commercial textbook) group at the pretest, then their inordinately high scores could decrease (regress) to more typical levels (the mean) on the posttest. This would make it look like the commercial textbook hurt student learning, even though the drop was an artifact of statistics and probability.

    Name: Attrition/Mortality
    Description: Participants drop out (further reducing initial equivalence).
    Example: Because of the high cost of textbooks, students who have to buy a textbook are more likely than students in an OER section to drop out of the class because they need to work more.

    Name: Selection
    Description: There are relevant differences between the groups before the experiment begins.
    Example: An instructor may choose to use the OER in their online section and a traditional textbook in their face-to-face section. However, there are lots of reasons why students would take an online course (or take a face-to-face course) that could also affect their scores on the chapter quizzes. Online students tend to work more hours or have more family responsibilities than students in on-campus sections. These work and family responsibilities may limit how much time the student can study for the course, so poor scores on the chapter quizzes would be due to time spent on studying rather than the type of textbook. The History example above, which compares students from before the pandemic to students during the pandemic, is also an example of selection.
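
    Regression to the mean can be hard to picture, so here is a minimal simulation sketch. It assumes that each quiz score is a stable "true" level plus random noise; the function name, score scale, and all numbers are hypothetical and are not taken from any study. Nothing changes between the simulated pretest and posttest, yet the students who scored highest on the pretest score lower, on average, on the posttest.

```python
import random
import statistics


def simulate_regression_to_mean(n_students=2000, top_fraction=0.1, seed=0):
    """Return (pretest mean of top scorers, their posttest mean, overall pretest mean).

    Assumption (hypothetical): score = stable true level + random noise,
    with no treatment and no real change between the two quizzes.
    """
    rng = random.Random(seed)
    true_level = [rng.gauss(70, 8) for _ in range(n_students)]
    pretest = [level + rng.gauss(0, 8) for level in true_level]
    posttest = [level + rng.gauss(0, 8) for level in true_level]

    # Pick the students with the most extreme (highest) pretest scores.
    n_top = int(n_students * top_fraction)
    ranked = sorted(range(n_students), key=lambda i: pretest[i], reverse=True)
    top = ranked[:n_top]

    top_pre_mean = statistics.mean([pretest[i] for i in top])
    top_post_mean = statistics.mean([posttest[i] for i in top])
    overall_mean = statistics.mean(pretest)
    return top_pre_mean, top_post_mean, overall_mean


if __name__ == "__main__":
    pre, post, overall = simulate_regression_to_mean()
    print(f"Top scorers, pretest mean:  {pre:.1f}")
    print(f"Top scorers, posttest mean: {post:.1f}")
    print(f"All students, pretest mean: {overall:.1f}")
```

    In this sketch the top scorers' posttest mean falls between their pretest mean and the overall mean: part of their extreme pretest score was luck, and luck does not repeat.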

    One less common threat is maturation. During longer studies, like research that lasts years rather than months, participants may naturally change between the pretest and the posttest in ways that they were going to anyway because they are growing and learning. If it were a year-long program, participants might learn how to study better or how inefficient it is to try to multitask. It could be that natural development, not the intervention, leads to better scores. Like testing (sometimes called re-testing), this threat to internal validity is more of a concern when there is no comparison group.

    A threat to internal validity related to regression to the mean is spontaneous remission. This is the tendency for many medical and psychological problems to improve over time without any form of treatment. The common cold is a good example. If one were to measure symptom severity in 100 common cold sufferers today, give them a bowl of chicken soup every day, and then measure their symptom severity again in a week, they would probably be much improved. This does not mean that the chicken soup was responsible for the improvement, however, because they would have been much improved without any treatment at all. The same is true of many psychological problems. A group of severely depressed people today is likely to be less depressed on average in 6 months. In reviewing the results of several studies of treatments for depression, researchers Michael Posternak and Ivan Miller found that participants in waitlist control conditions improved an average of 10 to 15% before they received any treatment at all (Posternak & Miller, 2001). Thus one must generally be very cautious about inferring causality from pretest-posttest designs.
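
    The reasoning above can be made concrete with a short sketch: subtract the average change in an untreated comparison group from the average change in the treated group. The function names and all scores below are made up for illustration (a hypothetical "no soup" group is added to the chicken soup example); they are not data from any study.

```python
from statistics import mean


def mean_change(pre, post):
    """Average pretest-to-posttest change for one group."""
    return mean(post) - mean(pre)


def change_beyond_comparison(treated_pre, treated_post,
                             comparison_pre, comparison_post):
    """Change in the treated group minus change in the comparison group."""
    return (mean_change(treated_pre, treated_post)
            - mean_change(comparison_pre, comparison_post))


# Hypothetical symptom-severity ratings (higher = worse), one week apart.
soup_pre = [8, 7, 9, 8]
soup_post = [3, 2, 4, 3]
no_soup_pre = [8, 8, 7, 9]
no_soup_post = [3, 3, 2, 4]

if __name__ == "__main__":
    print("Change with soup:", mean_change(soup_pre, soup_post))
    print("Change beyond comparison group:",
          change_beyond_comparison(soup_pre, soup_post,
                                   no_soup_pre, no_soup_post))
```

    With these made-up numbers the soup group improves a lot, but no more than the group that received nothing, so the improvement cannot be credited to the soup. This subtraction only accounts for changes that affect both groups equally; with nonequivalent groups, threats such as selection still remain.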

    What do you think?

    Do you think that spontaneous remission could happen in our study on student learning comparing different types of textbooks?

    Threats to Internal Validity in Between Groups Designs

    With quasi-experimental designs, we know that something else may have caused the effect because we didn't have initial equivalence between the groups (or may have only had one group). However, between groups experimental designs also have threats to internal validity. Quasi-experimental designs share the threats to internal validity shown in Table \(\PageIndex{2}\) with between groups experimental designs. Again, the study in these examples is a pretest-posttest nonequivalent groups design comparing sections that use an OER to sections that use a commercial textbook. The measure of student learning is still chapter quizzes.

    Table \(\PageIndex{2}\): Common Threats to Internal Validity for Experimental and Quasi-Experimental Designs
    Name: Contamination (3 similar kinds)
    Description:
    • Rivalry: Control group is creative or motivated to out-perform treatment group.
    • Diffusion of Treatment: Control group affected by the treatment.
    • Equalization: Control group receives similar treatment through alternate means.
    Note that in all three types, the control group is doing something to overcome their disadvantage (compared to the treatment group).
    Example:
    • Rivalry: The group with the traditional textbook might learn about the group with a free, online textbook. Thinking that the online textbook makes the course materials easier to access, the traditional textbook group may set more reminders to bring their textbook with them to class and study sessions.
    • Diffusion of Treatment: The group with free, online access to their textbook may share the link with their friends in a section of the course that only had the traditional textbook.
    • Equalization: The group with the traditional textbook might search online for learning materials for the course, or they may spend more time on the learning materials posted in their course shell.

    Name: Experimenter Expectancy Effects
    Description: Experimenter’s beliefs may affect their behavior towards the participants, or their measurements. This can also include observer bias.
    Example: In this case, the experimenter is probably the instructor, who would likely expect higher scores for the OER group. They may unconsciously treat the students in the OER section better, or grade their quizzes with more generosity.

    Name: Novelty Effects
    Description: New activities or additional attention (Hawthorne effect) can change participants’ behaviors.
    Example: Students who haven't previously enrolled in a class with an OER might find having a free, online textbook very novel and interesting, which may lead to them spending more time reviewing the textbook. This could affect quiz scores, even if nothing else about having OER affects the students' learning.

    Thinking back to Hatzenbuehler et al. (2012), what threats to internal validity might be relevant?

    What do you think?

    What threats to internal validity for quasi-experimental designs seem to apply to the comparison of medical care visits, medical care costs, mental health care visits, and mental health care costs for gay and bisexual men in Massachusetts between the year before Massachusetts became the first state to legally recognize same-gender marriage and the year after (Hatzenbuehler et al., 2012)? Why those?

    • History
    • (Re-)Testing
    • Instrumentation
    • Regression to the Mean
    • Attrition/ Mortality
    • Selection
    • Maturation
    • Spontaneous Remission

    What threats to internal validity for any experimental design seem to apply to Hatzenbuehler et al. (2012)? Why those?

    • Contamination: Rivalry
    • Contamination: Diffusion of Treatment
    • Contamination: Equalization
    • Experimenter Expectancy Effects
    • Novelty Effects

    References

    Hatzenbuehler, M. L., O'Cleirigh, C., Grasso, C., Mayer, K., Safren, S., & Bradford, J. (2012). Effect of same-sex marriage laws on health care use and expenditures in sexual minority men: A quasi-natural experiment. American Journal of Public Health, 102(2), 285–291.

    Posternak, M. A., & Miller, I. (2001). Untreated short-term course of major depression: A meta-analysis of outcomes from studies using wait-list control groups. Journal of Affective Disorders, 66, 139–146.

    This page titled 9.4: Threats to Internal Validity is shared under a CC BY-NC-SA 4.0 license and was authored, remixed, and/or curated by Rajiv S. Jhangiani, I-Chant A. Chiang, Carrie Cuttler, & Dana C. Leighton via source content that was edited to the style and standards of the LibreTexts platform.