12.6: Inferential Statistics (Summary)
Key Takeaways
Key Terms and Concepts
STATISTICS
Numerical characteristics of samples.
PARAMETERS
Numerical characteristics of populations.
SAMPLING ERROR
Random variation in statistics from sample to sample.
NULL HYPOTHESIS TESTING
Using sample data to test whether an effect exists in the population.
ALTERNATIVE HYPOTHESIS
The claim that an effect or relationship does exist in the population.
REJECT THE NULL HYPOTHESIS
Concluding that an effect exists based on low p-value.
RETAIN THE NULL HYPOTHESIS
Failing to conclude that an effect exists because the p-value is not below alpha.
ALPHA
The significance level; the criterion for rejecting the null hypothesis (typically .05).
STATISTICALLY SIGNIFICANT
A result with p-value less than alpha, typically p < .05.
PRACTICAL SIGNIFICANCE
Usefulness of research in real-world context.
TEST STATISTIC
A value computed from sample data (e.g., t, F, or chi-square) used to evaluate the null hypothesis.
CRITICAL VALUES TABLE
Used to interpret the significance of a test statistic.
ONE-SAMPLE t-TEST
Comparing a sample mean to a hypothetical population mean.
ONE-TAILED TEST
Testing for an effect in one specific direction.
TWO-TAILED TEST
Testing for an effect in either direction.
DEPENDENT-SAMPLES t-TEST
Compares two means from the same sample measured at two different times or under two different conditions.
DIFFERENCE SCORE
The change score for each participant in a repeated measures design.
INDEPENDENT-SAMPLES t-TEST
Used to compare the means of two separate samples.
ANALYSIS OF VARIANCE (ANOVA)
A test of the null hypothesis when there are more than two groups or conditions to compare.
ONE-WAY ANOVA
ANOVA used when there is one independent variable with more than two levels (groups).
MEAN SQUARES BETWEEN GROUPS
An estimate of population variance based on differences among sample means.
MEAN SQUARES WITHIN GROUPS
An estimate of population variance based on variability within each group.
POST HOC COMPARISONS
Follow-up tests conducted after a significant ANOVA to determine which pairs of group means differ significantly.
REPEATED-MEASURES ANOVA
Similar to a one-way ANOVA, but the dependent variable is measured multiple times for each participant.
FACTORIAL ANOVA
An ANOVA used when the design includes more than one independent variable.
CHI-SQUARE TEST
A test for categorical data comparing observed frequencies to expected frequencies.
TYPE I ERROR
Researcher incorrectly rejects the null hypothesis.
TYPE II ERROR
Researcher fails to reject a false null hypothesis.
FILE DRAWER PROBLEM
Publication bias where non-significant results remain unpublished.
p-HACKING
Making decisions during the research process (e.g., when to stop collecting data, which analyses to report) that inflate the chance of a statistically significant result, producing excess Type I errors.
STATISTICAL POWER
The probability of detecting a real effect when it exists.
CONFIDENCE INTERVALS
A range of values likely to contain the population parameter.
BAYESIAN STATISTICS
An alternative statistical framework using prior probabilities and updating beliefs.
REPLICABILITY CRISIS
Inability of researchers to replicate earlier findings.
HARKING
Hypothesizing after results are known.
OPEN SCIENCE PRACTICES
Mechanisms designed to increase transparency and openness in scientific research.
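Several of the terms above (test statistic, one-sample t-test, degrees of freedom) can be made concrete with a small calculation. Below is a minimal sketch of a one-sample t-test computed from first principles using only the Python standard library; the data and the comparison value of 200 are hypothetical, chosen only for illustration.

```python
# One-sample t test computed from first principles (hypothetical data).
# The test statistic is t = (M - mu0) / (SD / sqrt(N)), with df = N - 1.
import math
import statistics

def one_sample_t(sample, mu0):
    """Return the t statistic and degrees of freedom for a one-sample t test."""
    n = len(sample)
    m = statistics.mean(sample)
    sd = statistics.stdev(sample)      # sample SD (n - 1 in the denominator)
    se = sd / math.sqrt(n)             # standard error of the mean
    return (m - mu0) / se, n - 1

# Hypothetical calorie estimates compared against a claimed mean of 200:
t, df = one_sample_t([180, 220, 250, 190, 210, 240, 230, 200], 200)
print(round(t, 2), df)                 # → 1.73 7
```

The resulting t would then be compared against a critical value for df = 7 at the chosen alpha to decide whether to reject or retain the null hypothesis.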
Test Your Knowledge (answers at end of section)
1. What is the purpose of null hypothesis testing in psychological research?
a) To prove that a hypothesis is true
b) To help researchers decide whether a sample relationship reflects a population relationship or sampling error
c) To eliminate all Type I errors from research
d) To determine the exact probability that the alternative hypothesis is correct
2. A researcher finds a correlation of +0.25 in a sample of 100 participants. According to the principles discussed in the chapter, what two factors primarily determine whether this result is statistically significant?
a) The strength of the relationship and the size of the sample
b) The researcher's expectations and the research budget
c) The type of statistical test used and the participant demographics
d) The journal's publication standards and the research institution
3. A health psychologist wants to compare calorie estimates between three groups: psychology majors, nutrition majors, and dieticians. Which statistical test is most appropriate?
a) Independent-samples t-test
b) Chi-square test
c) Dependent-samples t-test
d) One-way ANOVA
4. In a one-sample t-test with 24 degrees of freedom, a researcher calculates a t-value of 2.10 for a two-tailed test. Using α = .05 and knowing that the critical value is ±2.064, what should the researcher conclude?
a) Retain the null hypothesis because the t-value is too close to the critical value
b) Reject the null hypothesis because the t-value exceeds the critical value
c) Conduct additional research because the result is inconclusive
d) Accept the null hypothesis as true
5. What is a Type I error in null hypothesis testing?
a) Rejecting the null hypothesis when it is true
b) Retaining the null hypothesis when it is false
c) Using the wrong statistical test for the research design
d) Failing to calculate the effect size
6. The 'file drawer problem' refers to which of the following issues in psychological research?
a) Researchers losing track of their data files
b) The tendency for non-significant results to go unpublished, leading to an overestimation of effect sizes in published literature
c) Insufficient storage space for research materials
d) The practice of deleting outliers from datasets
7. According to the Reproducibility Project discussed in the chapter, what percentage of the 100 replicated studies found statistically significant effects, compared to the original studies?
a) 97% in both original and replication studies
b) 97% in original studies, but only 36% in replications
c) 60% in original studies, 36% in replications
d) 50% in original studies, 25% in replications
8. Which of the following is an example of 'p-hacking' as described in the chapter?
a) Pre-registering hypotheses before data collection
b) Sharing raw data with other researchers
c) Checking if results are significant before deciding whether to collect more data
d) Publishing null results in open-access journals
Answer Key
1. B - To help researchers decide whether a sample relationship reflects a population relationship or sampling error
The purpose of null hypothesis testing is to help researchers decide between two interpretations of a statistical relationship in a sample: whether it reflects a real relationship in the population or is simply due to sampling error. It does not prove hypotheses true, eliminate errors completely, or determine exact probabilities.
2. A - The strength of the relationship and the size of the sample
The p value depends on two considerations: the strength of the relationship and the size of the sample. Specifically, the stronger the sample relationship and the larger the sample, the lower the p value.
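The interplay of these two factors can be checked with the standard formula for testing a correlation, t = r√(n − 2) / √(1 − r²) with df = n − 2. This is a sketch using the values from the question (r = +0.25, n = 100), not part of the original chapter:

```python
# How relationship strength (r) and sample size (n) jointly determine
# significance: t = r * sqrt(n - 2) / sqrt(1 - r^2), with df = n - 2.
import math

def t_for_correlation(r, n):
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

t = t_for_correlation(0.25, 100)   # r and n from the question above
print(round(t, 2))                  # → 2.56
```

A larger r or a larger n each increase t, which is exactly why both the strength of the relationship and the sample size push the p value down.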
3. D - One-way ANOVA
When comparing more than two means in a between-subjects design, the most common null hypothesis test is the one-way ANOVA. A t-test is only appropriate for comparing two means, and this scenario involves three groups.
4. B - Reject the null hypothesis because the t-value exceeds the critical value
Since the calculated t-value of 2.10 exceeds the critical value of ±2.064 (specifically, 2.10 > 2.064), the researcher should reject the null hypothesis.
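The two-tailed decision rule used here compares the absolute value of t against the critical value; a minimal sketch (the function name is illustrative, not from the chapter):

```python
# Two-tailed decision rule: reject H0 when |t| exceeds the critical
# value for the chosen alpha and degrees of freedom.
def decide(t, critical):
    return "reject H0" if abs(t) > critical else "retain H0"

print(decide(2.10, 2.064))    # the scenario in question 4 → reject H0
print(decide(-1.50, 2.064))   # → retain H0
```

Note that taking the absolute value is what makes the test two-tailed: a t of −2.10 would lead to the same conclusion as +2.10.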
5. A - Rejecting the null hypothesis when it is true
A Type I error occurs when we reject the null hypothesis when it is actually true. This means concluding that there is a relationship in the population when in fact there is not. Retaining a false null hypothesis is a Type II error.
6. B - The tendency for non-significant results to go unpublished, leading to an overestimation of effect sizes in published literature
The file drawer problem refers to the tendency for researchers to not submit non-significant results for publication (or for journals not to accept them), resulting in these studies being 'filed away.' This leads to the published literature containing a higher proportion of Type I errors and overstating the strength of relationships.
7. B - 97% in original studies, but only 36% in replications
The Reproducibility Project found that 97 of the original 100 studies found statistically significant effects, and only 36 of the replications did. This dramatic difference highlights the challenges in replicating psychological research findings.
8. C - Checking if results are significant before deciding whether to collect more data
P-hacking involves making various decisions in the research process to increase the chance of a statistically significant result, such as checking if results are significant before deciding whether to recruit additional participants.


