6.5: Comparing Means

    Learning Objectives
    1. Interpret one-sample, dependent-samples, and independent-samples t-tests.
    2. Interpret the results of one-way and repeated measures ANOVAs.
    3. Identify when a design would need a factorial ANOVA.

    The emphasis here is on providing enough information to remember what you learned in your statistics course and to interpret common statistical analyses that compare means. For details on how to calculate these analyses, and when to use each one, review one of the openly licensed statistics textbooks on LibreTexts; textbooks for some disciplines within the social sciences might be most helpful: https://stats.libretexts.org/Bookshe...ied_Statistics You can find the formulas for many of these in statistics textbooks, but you can also check out the Common Formulas page in the Back Matter of this book.

    The t-Test

    Many studies in the social sciences focus on the difference between two means. The most common statistical analysis to compare a quantitative DV for two different groups is the t-test. There are three different kinds of t-tests for three different kinds of groups.

    One-Sample t-Test

    The one-sample t-test is used to compare a sample mean (M) with a hypothetical population mean (µ, mu) that provides some interesting standard of comparison. The null hypothesis is that the mean of the sample (M) is statistically similar to the mean for the population (µ); in symbols, this is represented as: M = µ. There could be three different research hypotheses:

    1. The mean of the sample (M) is statistically different from the mean for the population (µ): M ≠ µ
    2. The mean of the sample (M) is statistically smaller than the mean for the population (µ): M < µ
    3. The mean of the sample (M) is statistically bigger than the mean for the population (µ): M > µ

    The first research hypothesis is non-directional, and the last two research hypotheses are directional.

    The reason the t-statistic (or any test statistic) is useful is that we know how it is distributed when the null hypothesis is true. As shown in Figure \(\PageIndex{1}\), the distribution of t, or the t-distribution, is unimodal and symmetrical, and it has a mean of 0. Its precise shape depends on a statistical concept called the degrees of freedom, which for a one-sample t-test is N − 1. (There are 24 degrees of freedom for the distribution shown in Figure \(\PageIndex{1}\).) The important point is that knowing this distribution makes it possible to find the p-value for any t-score, or the probability of obtaining this result if the null hypothesis were true (that the means were similar). Consider, for example, a t score of 1.50 based on a sample of 25. The probability of a t-score at least this extreme is given by the proportion of t-scores in the distribution that are at least this extreme. For now, let us define extreme as being far from zero in either direction. Thus the p-value is the proportion of t-scores that are 1.50 or above or that are −1.50 or below—a value that turns out to be .14.

    Figure \(\PageIndex{1}\): Distribution of t Scores (With 24 Degrees of Freedom) When the Null Hypothesis Is True. The red vertical lines represent the two-tailed critical values, and the green vertical lines the one-tailed critical values when α = .05.

    If p is equal to or less than .05 (p ≤ .05), we reject the null hypothesis and conclude that the population mean differs from the hypothetical mean of interest. If p is greater than .05 (p > .05), we retain the null hypothesis and conclude that there is not enough evidence to say that the population mean differs from the hypothetical mean of interest. (Again, technically, we conclude only that we do not have enough evidence to conclude that it does differ.)
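As a quick check, the p-value in the worked example above (t = 1.50, df = 24) can be computed directly from the t distribution. A minimal sketch, assuming Python with the SciPy library (any statistics package would do the same job):

```python
from scipy import stats

# Worked example from the text: t = 1.50 with 24 degrees of freedom.
t_score, df = 1.50, 24

# Two-tailed p-value: the proportion of t-scores at or beyond +/-1.50.
# stats.t.sf is the survival function, 1 - CDF.
p_two_tailed = 2 * stats.t.sf(t_score, df)

print(p_two_tailed)  # close to the .14 reported in the text
```

Because this p-value is greater than .05, we would retain the null hypothesis for that example.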

    A two-tailed test is when we reject the null hypothesis if the t-score for the sample is extreme in either direction. This test makes sense when we believe that the sample mean might differ from the population mean but we do not have good reason to expect the difference to go in a particular direction. This follows our non-directional research hypothesis (M≠µ). But it is also possible to do a one-tailed test, where we reject the null hypothesis only if the t-score for the sample is extreme in one direction that we specify before collecting the data; our directional research hypotheses are examples of these. This test makes sense when we have good reason to expect the sample mean will differ from the hypothetical population mean in a particular direction.

    Here is how it works. Each one-tailed critical value in Table \(\PageIndex{1}\) can again be interpreted as a pair of values: one positive and one negative. A t-score below the lower critical value is in the lowest 5% of the distribution, and a t-score above the upper critical value is in the highest 5% of the distribution. For 24 degrees of freedom, these values are −1.711 and 1.711. (These are represented by the green vertical lines in Figure \(\PageIndex{1}\).) However, for a one-tailed test, we must decide before collecting data whether we expect the sample mean to be lower than the population mean (Research Hypothesis #2: M<µ), in which case we would use only the lower critical value, or we expect the sample mean to be greater than the population mean (Research Hypothesis #3: M>µ), in which case we would use only the upper critical value. Notice that we still reject the null hypothesis when the t-score for our sample is in the most extreme 5% of the t-scores we would expect if the null hypothesis were true, so our probability remains at .05. We have simply redefined extreme to refer only to one tail of the distribution. The advantage of the one-tailed test is that critical values are less extreme. If the sample mean differs from the population mean in the expected direction, then we have a better chance of rejecting the null hypothesis. The disadvantage is that if the sample mean differs from the population mean in the unexpected direction, then there is no chance at all of rejecting the null hypothesis.
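The one- and two-tailed critical values quoted above can be recovered from the inverse CDF of the t distribution. A minimal sketch, again assuming SciPy:

```python
from scipy import stats

df = 24
# One-tailed critical value at alpha = .05: the cutoff for the top 5%
# of the distribution (about 1.711, as in the text).
one_tailed = stats.t.ppf(0.95, df)

# Two-tailed critical values split alpha across both tails (2.5% each),
# giving a more extreme cutoff (about 2.064).
two_tailed = stats.t.ppf(0.975, df)
```

This makes the trade-off concrete: 1.711 is easier to exceed than 2.064, but only in the predicted direction.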

    The Dependent-Samples t-Test

    The dependent-samples t-test (sometimes called the paired-samples t-test) is used to compare two means for the same sample tested at two different times or under two different conditions. This comparison is appropriate for pretest-posttest designs or repeated measures experiments. The null hypothesis is that the means at the two times or under the two conditions are the same in the population. Again, there could be three different research hypotheses:

    1. The mean of one sample (M1) is statistically different from the mean for the other sample (M2): M1 ≠ M2
    2. The mean of one sample (M1) is statistically smaller than the mean for the other sample (M2): M1 < M2
    3. The mean of one sample (M1) is statistically bigger than the mean for the other sample (M2): M1 > M2

    Again, the first research hypothesis is non-directional, and the last two research hypotheses are directional, so this test can also be either two-tailed (non-directional) or one-tailed (directional); choose a directional research hypothesis if there's good reason to expect the difference goes in a particular direction.
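In practice this test is usually run by software. A minimal sketch with SciPy and hypothetical pretest/posttest scores (the data are made up purely for illustration):

```python
from scipy import stats

# Hypothetical scores for the same five participants at two times.
pretest  = [12, 15,  9, 14, 11]
posttest = [14, 18, 10, 17, 12]

# Two-tailed (non-directional) dependent-samples t-test.
t, p = stats.ttest_rel(pretest, posttest)
# A negative t here means the pretest mean is below the posttest mean.
```

With these made-up data the posttest mean is reliably higher, so p falls below .05 and we would reject the null hypothesis.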

    The Independent-Samples t-Test

    The independent-samples t-test is used to compare the means of two separate samples (MA and MB). The two samples might have been tested under different conditions in a between-subjects experiment, or they could be pre-existing groups in a cross-sectional design (e.g., women and men, extraverts and introverts). The null hypothesis is that the means of the two populations are the same: MA = MB. Again, there could be three different research hypotheses:

    1. The mean of one sample (MA) is statistically different from the mean for the other sample (MB): MA ≠ MB
    2. The mean of one sample (MA) is statistically smaller than the mean for the other sample (MB): MA < MB
    3. The mean of one sample (MA) is statistically bigger than the mean for the other sample (MB): MA > MB

    Again, the first research hypothesis is non-directional, and the last two research hypotheses are directional, so this test can also be either two-tailed (non-directional) or one-tailed (directional).
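The software call differs only in the function used, since the two samples are unpaired. A sketch with hypothetical scores for two separate groups:

```python
from scipy import stats

# Hypothetical calorie estimates from two separate groups.
group_a = [220, 240, 210, 250, 230, 225]
group_b = [250, 270, 240, 280, 260, 255]

# Two-tailed independent-samples t-test
# (equal population variances are assumed by default).
t, p = stats.ttest_ind(group_a, group_b)
```

Unlike `ttest_rel`, nothing pairs a score in one list with a score in the other, so the groups need not even be the same size.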

    But what happens if you want to compare the means of more than two groups? Then you use an ANOVA.

    ANOVA: The Analysis of Variance

    Research designs comparing one quantitative DV between two different groups are very common. However, when you want to compare the means of more than two groups or conditions, the most common null hypothesis test is the analysis of variance (ANOVA). In this section, we look at the one-way ANOVA, within-subjects ANOVA, and factorial ANOVA. We will spend more time on factorial research designs later in this textbook.

    One-Way ANOVA

    The one-way ANOVA is used to compare the means of more than two samples (MA, MB, and so on) in a between-subjects design. The null hypothesis is that all the means are equal in the population: µA = µB = …. For research hypotheses, there is the non-directional version that not all the means in the population are equal (at least one mean differs from at least one other mean). For these more complex designs, the directional research hypotheses could be based on theory or prior research results. The easiest way to think about this is that every group's mean would be compared with every other group's mean. When there are three groups, this might look like:

    1. MA<MB
    2. MA<MC
    3. MB>MC

    This gets even more complicated when there are more than three groups, since every pair of groups should have a research hypothesis that predicts a direction!

    Exercise \(\PageIndex{1}\)

    What would a directional research hypothesis look like if you compared four IV groups?

    Answer
    1. MA<MB
    2. MA<MC
    3. MA<MD
    4. MB<MC
    5. MB<MD
    6. MC<MD

    In this example, the group on the left was always hypothesized to be smaller than the group on the right, but, as long as the pairings make sense, this doesn't have to be the case. Again, follow theory or prior research to determine your predictions.

    What's also interesting is that as long as there is one pair of groups that are hypothesized to be statistically different, the other pairs can be hypothesized to be similar. This might look like:

    1. MA<MB
    2. MA=MC
    3. MB=MC

    For a t-test, the test statistic is a t. The test statistic for the ANOVA is called F, for Ronald Fisher, the statistician who is credited with developing the ANOVA (Salsburg, 2002). It is a ratio of two estimates of the population variance based on the sample data. One estimate is based on the variability between groups (the differences among the group means), which is the only variability the t-test examined. The other estimate is based on the variability within groups (the differences among the scores within each group).
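The two variance estimates behind F can be computed by hand. A sketch with NumPy and three small hypothetical groups:

```python
import numpy as np

# Three small, hypothetical groups of scores.
groups = [np.array([4., 5., 6., 5.]),
          np.array([7., 8., 6., 7.]),
          np.array([5., 6., 5., 4.])]
k = len(groups)                        # number of groups
n = sum(len(g) for g in groups)        # total sample size
grand_mean = np.concatenate(groups).mean()

# Between-groups estimate: how far each group mean sits from the grand mean.
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ms_between = ss_between / (k - 1)

# Within-groups estimate: spread of scores around their own group mean.
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
ms_within = ss_within / (n - k)

F = ms_between / ms_within  # F comes out to 8.0 for these data
```

When the null hypothesis is true, both estimates target the same population variance and F hovers around 1; a large F suggests the group means differ more than chance alone would produce.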

    Again, the reason that F is useful is that we know how it is distributed when the null hypothesis is true. As shown in Figure \(\PageIndex{2}\), this distribution is unimodal and positively skewed with values that cluster around 1. The precise shape of the distribution depends on both the number of groups and the sample size, and there are degrees of freedom values associated with each of these. Again, knowing the distribution of F when the null hypothesis is true allows us to find the p-value.

    Figure \(\PageIndex{2}\): Distribution of the F Ratio With 2 and 37 Degrees of Freedom When the Null Hypothesis Is True. The red vertical line represents the critical value when α is .05.

    As always, if p is equal to or less than .05 (p ≤ .05), then we reject the null hypothesis and conclude that there are differences among the group means in the population. If p is greater than .05 (p > .05), then we retain the null hypothesis (or fail to reject the null) and conclude that there is not enough evidence to say that there are differences.
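In software the whole calculation is a single call. A sketch with SciPy and hypothetical calorie estimates for three separate groups:

```python
from scipy import stats

# Hypothetical calorie estimates from three separate groups.
psych_majors     = [230, 250, 220, 260, 240]
nutrition_majors = [210, 220, 200, 230, 215]
dieticians       = [180, 190, 175, 185, 195]

# One-way between-subjects ANOVA.
F, p = stats.f_oneway(psych_majors, nutrition_majors, dieticians)

if p <= .05:
    print("Reject the null: the group means are not all equal.")
```

The function internally forms the same between-/within-groups variance ratio described above and looks up p from the F distribution.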

    Post Hoc Comparisons

    You may have noticed that rejecting the null hypothesis in a one-way ANOVA doesn't really tell us much on its own; we conclude only that the group means are not all the same in the population. But this can indicate different things. With three groups, it can indicate that all three means are significantly different from each other. Or it can indicate that one of the means is significantly different from the other two, but the other two are not significantly different from each other. It could be, for example, that the mean calorie estimates of psychology majors, nutrition majors, and dieticians are all significantly different from each other. Or it could be that the mean for dieticians is significantly different from the means for psychology and nutrition majors, but the means for psychology and nutrition majors are not significantly different from each other. For this reason, statistically significant one-way ANOVA results are typically followed up with a series of post hoc comparisons of selected pairs of group means to determine which are different from which others.

    One approach to post hoc comparisons would be to conduct a series of independent-samples t-tests comparing each group mean to each of the other group means. But there is a problem with this approach. In general, if we conduct a t-test when the null hypothesis is true, we have a 5% chance of mistakenly rejecting the null hypothesis for each comparison. If we conduct several t-tests when the null hypothesis is true, the chance of mistakenly rejecting at least one null hypothesis increases with each test we conduct. Researchers do not usually make post hoc comparisons using standard t-tests because there is too great a chance that they will mistakenly reject at least one null hypothesis. Instead, they use one of several modified t-test procedures, among them the Bonferroni procedure, Fisher's least significant difference (LSD) test, and Tukey's honestly significant difference (HSD) test. The details of these approaches are beyond the scope of this book, but it is important to understand their purpose: to test for differences between pairs of means while keeping the risk of mistakenly rejecting a true null hypothesis at an acceptable level (close to 5%).
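The simplest of these corrections, the Bonferroni procedure, just divides α by the number of comparisons. A sketch with the same hypothetical calorie-estimate groups used above:

```python
from itertools import combinations
from scipy import stats

# Hypothetical data for three groups (for illustration only).
groups = {
    "psych":     [230, 250, 220, 260, 240],
    "nutrition": [210, 220, 200, 230, 215],
    "dietician": [180, 190, 175, 185, 195],
}

pairs = list(combinations(groups, 2))   # the 3 pairwise comparisons
alpha_adjusted = 0.05 / len(pairs)      # Bonferroni: 0.05 / 3

for name_a, name_b in pairs:
    t, p = stats.ttest_ind(groups[name_a], groups[name_b])
    print(name_a, "vs", name_b, "significant:", p <= alpha_adjusted)
```

Requiring each pairwise p to clear the stricter 0.05/3 threshold keeps the overall chance of a false rejection near 5%. Tukey's HSD and the other procedures pursue the same goal with less conservative math.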

    Repeated-Measures ANOVA

    Recall that the one-way ANOVA is appropriate for between-subjects designs in which the means being compared come from separate groups of participants. It is not appropriate for within-subjects designs in which the means being compared come from the same participants tested under different conditions or at different times. This requires a slightly different approach, called the repeated-measures ANOVA. The basics of the repeated-measures ANOVA are the same as for the one-way ANOVA. The main difference is that measuring the dependent variable multiple times for each participant allows for a more refined measure of the variation within each group. Imagine, for example, that the dependent variable in a study is a measure of reaction time. Some participants will be faster or slower than others because of stable individual differences in their nervous systems, muscles, and other factors. In a between-subjects design, these stable individual differences would simply add to the variability within the groups and increase the value of the within-groups variability (which would, in turn, decrease the value of F). In a within-subjects design, however, these stable individual differences can be measured and subtracted, lowering the value of the within-groups variation; this will result in a higher value of F and a more sensitive test.
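The logic of "measuring and subtracting" stable individual differences can be seen in a small NumPy sketch with hypothetical reaction times (rows are participants, columns are conditions):

```python
import numpy as np

# Hypothetical reaction times (ms): 5 participants x 3 conditions.
rt = np.array([[350., 370., 390.],
               [520., 530., 560.],
               [430., 450., 470.],
               [610., 640., 650.],
               [280., 300., 330.]])

# Stable individual differences: some participants are just slower overall.
subject_means = rt.mean(axis=1, keepdims=True)

# Subtract each participant's own mean (adding the grand mean back
# keeps scores on the original scale and leaves condition means intact).
centered = rt - subject_means + rt.mean()

# Variability within each condition, before and after removing
# stable individual differences:
raw_within      = rt.var(axis=0).mean()
centered_within = centered.var(axis=0).mean()
# centered_within is far smaller, which is why the test is more sensitive.
```

The condition means are unchanged by this centering; only the noise around them shrinks, which is exactly what drives F upward in the repeated-measures ANOVA.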

    Factorial ANOVA

    Finally, when a design includes more than one independent variable, not just one IV comparing different groups, we have a factorial design. The appropriate approach is the factorial ANOVA. Again, the basics of the factorial ANOVA are the same as for the one-way and repeated-measures ANOVAs. The main difference is that it produces an F ratio and p value for each IV (a main effect) and for each combination of IVs (an interaction). This will be detailed more in a later chapter, but returning to our calorie estimation example: Imagine that a health psychologist tests the effect of participant major (psychology vs. nutrition) and timing (morning vs. evening) in a factorial design. A factorial ANOVA would produce separate F ratios and p values for the main effect of major, the main effect of timing, and the interaction between major and timing. Appropriate modifications must be made depending on whether the design is between-subjects, within-subjects, or mixed.
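The main effects and interaction in this 2 × 2 example can be read directly off a table of cell means. A numeric sketch (the means are hypothetical, chosen only to illustrate the arithmetic):

```python
import numpy as np

# Hypothetical mean calorie estimates for a 2 x 2 factorial design:
#                       morning  evening
cell_means = np.array([[240.,    260.],    # psychology majors
                       [210.,    250.]])   # nutrition majors

# Main effect of major: average each row over timing.
major_means = cell_means.mean(axis=1)      # psych 250, nutrition 230

# Main effect of timing: average each column over major.
timing_means = cell_means.mean(axis=0)     # morning 225, evening 255

# Interaction: is the timing effect the same for both majors?
timing_effect = cell_means[:, 1] - cell_means[:, 0]           # 20 vs 40
interaction_difference = timing_effect[1] - timing_effect[0]  # 20.0
# A nonzero difference of differences signals a possible interaction;
# the factorial ANOVA's F ratios test whether each effect is significant.
```

Here the evening bump is twice as large for nutrition majors as for psychology majors, which is the pattern the interaction F ratio would evaluate.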


    References

    Salsburg, D. (2002). The lady tasting tea: How statistics revolutionized science in the twentieth century. Macmillan.


    This page titled 6.5: Comparing Means is shared under a CC BY-NC-SA 4.0 license and was authored, remixed, and/or curated by Rajiv S. Jhangiani, I-Chant A. Chiang, Carrie Cuttler, & Dana C. Leighton via source content that was edited to the style and standards of the LibreTexts platform.