9.4: Threats to Internal Validity
Researchers have identified a set of common alternative explanations that make it hard to draw causal conclusions from a study. These are called threats to validity. We will start by discussing the threats to internal validity that are common in quasi-experimental designs.
Threats to Validity in Quasi-Experimental Designs
In Table \(\PageIndex{1}\), you can see descriptions of some common threats to internal validity in quasi-experimental designs. The right-most column describes what each threat could look like in a quasi-experimental study comparing classes that use OER (openly licensed educational resources, like the textbook you are reading) to classes that do not use OER, with student learning as the outcome. In these examples, the study is assumed to be a pretest-posttest nonequivalent groups design, with student learning measured by chapter quizzes.
| Name | Description | Example |
|---|---|---|
| History | The treatment group experiences events (other than the IV) that the control group does not experience. | Many instructors started using OER during the pandemic because it was easier for students to access than commercial textbooks. However, if we compared these students to students from previous semesters, we would face the threat of history, because the previous students (the comparison group) did not have to attend school during a pandemic. |
| (Re-)Testing | Repeated exposure to the DV measure. [This threat to internal validity is more of a concern when there is no comparison group. When there is a comparison group, we may see that both groups improve, but we can test whether the intervention group improved more.] | Students may improve their test-taking skills through the practice of weekly chapter quizzes. |
| Instrumentation | The instrument changes through use. This includes observer bias. | Many instructors use quizzes developed by the publisher, so for those instructors the DV automatically changes whenever a different textbook is used. This could still be a threat for instructors who write their own quizzes if they change some of the questions between the pretest and posttest. |
| Regression to the Mean | Due to chance alone, extreme scores tend to regress toward the mean on later measurements. | Some students, some sections, and some semesters simply go better than others. If, by chance, an unusually strong group of students, section, or semester using a commercial textbook was in the comparison group at pretest, then their inordinately high scores could decrease (regress) to more typical levels (the mean) at posttest. This would make it look like the commercial textbook hurt student learning, even though the drop was only an artifact of statistics and probability. |
| Attrition/Mortality | Participants drop out (further reducing initial equivalence). | Because textbooks are expensive, it is more likely that students who have to buy a textbook, compared with students in an OER section, would drop the class because they need to work more. |
| Selection | There are relevant differences between the groups before the experiment begins. | For example, an instructor may choose to use OER in their online section and a traditional textbook in their face-to-face section. However, there are many reasons why students would take an online (or face-to-face) course that could also affect their scores on the chapter quizzes. Online students tend to work more hours or have more family responsibilities than students in on-campus sections. These responsibilities may limit how much time they can spend studying for the course, so lower chapter quiz scores could be due to less study time rather than the type of textbook. The History example above, which compares students from before the pandemic to students during the pandemic, is also an example of selection. |
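To see why regression to the mean can appear even when nothing changes between the pretest and posttest, here is a minimal Python simulation under stated assumptions: each quiz score is modeled as a stable "true ability" plus independent random noise, and no intervention happens between the two quizzes. The function name `simulate_regression_to_mean` and every parameter value (an average ability of 75, standard deviations of 8 and 10, a top-10% cutoff) are hypothetical choices for illustration, not data from any study.

```python
import random
import statistics


def simulate_regression_to_mean(n_students=10_000, noise_sd=10.0, seed=1):
    """Hypothetical simulation: quiz score = stable true ability + random noise.

    Nothing happens between pretest and posttest, so any change in a
    group's average is purely a statistical artifact.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible

    # Assumed true abilities (mean 75, SD 8); these never change.
    abilities = [rng.gauss(75, 8) for _ in range(n_students)]

    # Each quiz adds fresh, independent measurement noise to the same ability.
    pretest = [a + rng.gauss(0, noise_sd) for a in abilities]
    posttest = [a + rng.gauss(0, noise_sd) for a in abilities]

    # Select an "extreme" group: students in the top 10% of pretest scores.
    cutoff = sorted(pretest)[int(0.9 * n_students)]
    top = [i for i in range(n_students) if pretest[i] >= cutoff]

    pre_mean = statistics.mean(pretest[i] for i in top)
    post_mean = statistics.mean(posttest[i] for i in top)
    return pre_mean, post_mean


pre, post = simulate_regression_to_mean()
print(f"Top group pretest mean:  {pre:.1f}")
print(f"Top group posttest mean: {post:.1f}")  # lower, although nothing changed
```

Because the top group was selected partly for lucky pretest noise, its posttest average falls back toward the overall mean while staying above it. If a group like this ended up in the comparison condition, the drop could be mistaken for a harmful effect of the commercial textbook.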
Threats to Internal Validity in Between Groups Designs
With quasi-experimental designs, we know that something else may have caused the effect because we did not have initial equivalence between the groups (or may have had only one group). However, between groups experimental designs also have threats to validity. Quasi-experimental designs share the threats to internal validity shown in Table \(\PageIndex{2}\) with between groups experimental designs. Again, the study in these examples is a pretest-posttest nonequivalent groups design comparing sections that use OER to sections that use a commercial textbook. Student learning is still measured with chapter quizzes.
| Name | Description | Example |
|---|---|---|
| Contamination (3 similar kinds: rivalry, diffusion of treatment, and equalization) | Note that in all three types, the control group is doing something to overcome its disadvantage (compared to the treatment group). | |
| Experimenter Expectancy Effects | The experimenter's beliefs may affect their behavior toward the participants or their measurements. This can also include observer bias. | In this case, the experimenter is probably the instructor, so they would expect higher scores for the OER group. They may unconsciously treat the students in the OER section better or grade their quizzes more generously. |
| Novelty Effects | New activities or additional attention (the Hawthorne effect) can change participants' behaviors. | Students who have not previously taken a class with OER might find a free online textbook novel and interesting, which may lead them to spend more time reviewing it. This could raise quiz scores even if nothing else about using OER affects students' learning. |
Thinking back to Hatzenbuehler et al. (2012), what threats to internal validity might be relevant?
What threats to internal validity for quasi-experimental designs seem to apply to Hatzenbuehler et al.'s (2012) comparison of medical care visits, medical care costs, mental health care visits, and mental health care costs among gay and bisexual men in Massachusetts, from the year before Massachusetts became the first state to legally recognize same-gender marriage to the year after? Why those?
- History
- (Re-)Testing
- Instrumentation
- Regression to the Mean
- Attrition/Mortality
- Selection
- Maturation
- Spontaneous Remission
What threats to internal validity for any experimental design seem to apply to Hatzenbuehler et al. (2012)? Why those?
- Contamination: Rivalry
- Contamination: Diffusion of Treatment
- Contamination: Equalization
- Experimenter Expectancy Effects
- Novelty Effects
References
Hatzenbuehler, M. L., O'Cleirigh, C., Grasso, C., Mayer, K., Safren, S., & Bradford, J. (2012). Effect of same-sex marriage laws on health care use and expenditures in sexual minority men: A quasi-natural experiment. American Journal of Public Health, 102(2), 285-291.


