15.3: Two-Group Comparison
One of the simplest inferential analyses is comparing the posttest outcomes of treatment and control group subjects in a randomised posttest-only control group design, such as whether students enrolled in a special program in mathematics perform better than those in a traditional math curriculum. In this case, the predictor variable is a dummy variable (1 = treatment group, 0 = control group), and the outcome variable, performance, is ratio-scaled (e.g., score on a math test following the special program). The analytic technique for this simple design is a one-way ANOVA (one-way because it involves only one predictor variable), and the statistical test used is called Student’s \(t\)-test (or simply the \(t\)-test).
The \(t\)-test was introduced in 1908 by William Sealy Gosset, a chemist working for the Guinness Brewery in Dublin, Ireland, to monitor the quality of stout, a dark beer popular with nineteenth-century porters in London. Because his employer did not want to reveal that it was using statistics for quality control, Gosset published the test in Biometrika under the pen name ‘Student’. The test involved calculating the value of \(t\), a letter that Sir Ronald Fisher later popularised for this statistic. Hence the name Student’s \(t\)-test, although Student’s identity was known to fellow statisticians.
The \(t\)-test examines whether the means of two groups are statistically different from one another (non-directional or two-tailed test), or whether one group has a statistically larger (or smaller) mean than the other (directional or one-tailed test). In our example, if we wish to examine whether students in the special math curriculum perform better than those in the traditional curriculum, we have a one-tailed test. This hypothesis can be stated as:
\begin{eqnarray*}
H_0: & \mu_{1} \le \mu_{2} & \qquad \mbox{(null hypothesis)} \\
H_1: & \mu_{1} > \mu_{2} & \qquad \mbox{(alternative hypothesis)}
\end{eqnarray*}
where \(\mu_1\) represents the mean population performance of students exposed to the special curriculum (treatment group) and \(\mu_2\) is the mean population performance of students with the traditional curriculum (control group). Note that the null hypothesis is always the one with the ‘equal’ sign, and the goal of all statistical significance tests is to reject the null hypothesis.
How can we infer the difference in population means using data from samples drawn from each population? In the hypothetical frequency distributions of the treatment and control group scores in Figure 15.2, the control group appears to have a bell-shaped (normal) distribution with a mean score of 45 (on a 0–100 scale), while the treatment group appears to have a mean score of 65. These means look different, but they are really sample means (\(\bar{X}\)), which may differ from their corresponding population means (\(\mu\)) due to sampling error. Sample means are probabilistic estimates of population means within a certain confidence interval: the 95% CI is the sample mean \(\pm\) two standard errors, where the standard error is the standard deviation of the distribution of sample means taken from infinitely many samples of the population. Hence, statistical significance of the difference in population means depends not only on the sample mean scores, but also on the standard error, or the degree of spread in the frequency distribution of the sample means. If the spread is large (i.e., the two bell-shaped curves have a lot of overlap), then the 95% CIs of the two means may also overlap, and we cannot conclude with high probability (\(p < 0.05\)) that their corresponding population means are significantly different. However, if the curves have narrower spreads (i.e., they overlap less), then the CIs of the two means may not overlap, and we reject the null hypothesis and say that the population means of the two groups are significantly different at \(p < 0.05\).
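The confidence-interval reasoning above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original text: the score lists are invented so that the sample means match the 45 and 65 mentioned for Figure 15.2, the helper name `mean_ci` is hypothetical, and the multiplier of 2 is the "two standard errors" rule of thumb rather than an exact critical value.

```python
import math
import statistics


def mean_ci(scores, multiplier=2.0):
    """Return (mean, standard error, lower bound, upper bound) for one sample.

    multiplier=2.0 follows the 'sample mean +/- two standard errors' rule.
    """
    n = len(scores)
    mean = statistics.fmean(scores)
    # Standard error of the mean: sample standard deviation / sqrt(n)
    se = statistics.stdev(scores) / math.sqrt(n)
    return mean, se, mean - multiplier * se, mean + multiplier * se


# Hypothetical scores, invented for illustration only
control = [38, 42, 45, 47, 44, 50, 41, 46, 49, 48]
treatment = [58, 62, 66, 70, 64, 61, 68, 67, 69, 65]

c_mean, c_se, c_lo, c_hi = mean_ci(control)
t_mean, t_se, t_lo, t_hi = mean_ci(treatment)

# Two intervals overlap when each one starts before the other one ends
intervals_overlap = c_lo <= t_hi and t_lo <= c_hi

print(f"control:   mean={c_mean:.1f}, approx. 95% CI=({c_lo:.2f}, {c_hi:.2f})")
print(f"treatment: mean={t_mean:.1f}, approx. 95% CI=({t_lo:.2f}, {t_hi:.2f})")
print("intervals overlap:", intervals_overlap)
```

With these invented scores the two intervals do not overlap, which corresponds to the "narrower spreads" case described above.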
To conduct the \(t\)-test, we must first compute a \(t\)-statistic of the difference in sample means between the two groups. This statistic is the ratio of the difference in sample means to the variability of that difference (its standard error):

\[ t = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}} \]
where the numerator is the difference in sample means between the treatment group (Group 1) and the control group (Group 2), and the denominator is the standard error of the difference between the two groups, which in turn can be estimated as:

\[ s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \]
where \(s^2\) is the variance and \(n\) is the sample size of each group. The \(t\)-statistic will be positive if the treatment mean is greater than the control mean. To examine whether this \(t\)-statistic is larger than what chance alone could produce, we must look up the probability, or \(p\)-value, associated with our computed \(t\)-statistic. It can be found in the statistical tables in standard statistics textbooks or on the Internet, or computed by statistical software programs such as SAS and SPSS. This value is a function of the \(t\)-statistic, whether the \(t\)-test is one-tailed or two-tailed, and the degrees of freedom (\(df\)), that is, the number of values that can vary freely in the calculation of the statistic, usually a function of the sample size and the type of test being performed. The degrees of freedom of the \(t\)-statistic are computed as:

\[ df = \frac{\left( \dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2} \right)^2}{\dfrac{\left( s_1^2 / n_1 \right)^2}{n_1 - 1} + \dfrac{\left( s_2^2 / n_2 \right)^2}{n_2 - 1}} \]
which often approximates to \(n_1 + n_2 - 2\). If this \(p\)-value is smaller than a desired significance level (say \(\alpha = 0.05\)), which is the highest level of risk (probability) we are willing to take of concluding that there is a treatment effect when in fact there is none (Type I error), then we can reject the null hypothesis.
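The whole procedure (standard error, \(t\)-statistic, degrees of freedom, one-tailed \(p\)-value) can be sketched with the Python standard library alone. Several things here are assumptions rather than part of the original text: the score lists are invented, the function names are hypothetical, and the \(p\)-value is obtained by numerically integrating the \(t\) density with Simpson's rule as a stand-in for the table lookup or statistical software mentioned above.

```python
import math
import statistics


def t_statistic_and_df(group1, group2):
    """Return (t, df) using the unequal-variance standard error and df formula."""
    n1, n2 = len(group1), len(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    # Standard error of the difference between the two sample means
    se_diff = math.sqrt(v1 / n1 + v2 / n2)
    t = (statistics.fmean(group1) - statistics.fmean(group2)) / se_diff
    df = (v1 / n1 + v2 / n2) ** 2 / (
        (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1)
    )
    return t, df


def t_pdf(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    log_const = (
        math.lgamma((df + 1) / 2)
        - math.lgamma(df / 2)
        - 0.5 * math.log(df * math.pi)
    )
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def one_tailed_p(t, df, steps=10_000):
    """Approximate P(T >= t) with Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3            # P(0 <= T <= |t|)
    upper = max(0.5 - area, 0.0)    # P(T >= |t|), clamped against rounding error
    return upper if t >= 0 else 1.0 - upper


# Hypothetical scores, invented for illustration only
treatment = [58, 62, 66, 70, 64, 61, 68, 67, 69, 65]   # Group 1
control = [38, 42, 45, 47, 44, 50, 41, 46, 49, 48]     # Group 2

t_stat, df = t_statistic_and_df(treatment, control)
p_value = one_tailed_p(t_stat, df)
alpha = 0.05
reject_null = p_value < alpha

print(f"t = {t_stat:.2f}, df = {df:.1f}, reject H0 at alpha = {alpha}: {reject_null}")
```

Because the two invented groups have equal sizes and equal variances, the degrees of freedom here come out to exactly \(n_1 + n_2 - 2 = 18\). In practice a statistics package would normally replace the hand-written integration; for instance, SciPy's `scipy.stats.ttest_ind` with `equal_var=False` performs an unequal-variance test of this kind.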
After determining whether the treatment group has a significantly higher mean than the control group, the next question usually is: what is the effect size (ES), or the magnitude of the treatment effect relative to the control group? We can estimate the ES by conducting regression analysis with performance scores as the outcome variable (\(y\)) and a dummy-coded treatment variable as the predictor variable (\(x\)) in a two-variable GLM. The regression coefficient of the treatment variable (\(\beta_1\)), which is also the slope of the regression line (\(\beta_1 = \Delta y / \Delta x\)), is an estimate of the effect size. In the above example, since \(x\) is a dummy variable with two values (0 and 1), \(\Delta x = 1 - 0 = 1\), and hence the effect size \(\beta_1\) is simply the difference between the treatment and control means (\(\Delta y = \bar{y}_1 - \bar{y}_2\)).
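The claim that the regression slope on a dummy variable equals the difference in group means can be checked numerically. The sketch below is illustrative only: the scores are invented, the helper `ols_slope_intercept` is a hypothetical name, and the fit is the ordinary least-squares formula for a single predictor written out by hand rather than any particular package's GLM routine.

```python
import statistics


def ols_slope_intercept(x, y):
    """Least-squares slope and intercept for one predictor x and outcome y."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical scores, invented for illustration only
control = [38, 42, 45, 47, 44, 50, 41, 46, 49, 48]
treatment = [58, 62, 66, 70, 64, 61, 68, 67, 69, 65]

# Dummy coding: 0 = control group, 1 = treatment group
x = [0] * len(control) + [1] * len(treatment)
y = control + treatment

beta1, beta0 = ols_slope_intercept(x, y)
mean_diff = statistics.fmean(treatment) - statistics.fmean(control)

print(f"slope (effect size) = {beta1:.2f}")
print(f"difference in means = {mean_diff:.2f}")
print(f"intercept (control mean) = {beta0:.2f}")
```

With a 0/1 predictor, the intercept is the control-group mean and the slope is the treatment mean minus the control mean, so the two printed values agree.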