
8.3: Introduction to Statistical Inference and Hypothesis Testing

    Learning Objectives
    • Explain the properties of the normal distribution
    • Explain the concept of a z-score and calculate it
    • Conduct hypothesis testing (difference of means test)
    • Differentiate between Type-I and Type-II errors

    Statistical inference is the process of analyzing data generated by a sample in order to determine some characteristic of the larger population. Remember, survey analyses are the bread and butter of quantitative political science. Since we are most likely unable to survey everyone in a population, such as all registered voters in the U.S., we instead generate a sample that allows us to draw inferences, or conclusions, about the population being studied. Samples are useful because they allow scholars to test relationships between variables without having to spend the millions of dollars needed to research the entire population.

    Before we discuss the concepts of statistical inference and the means of testing relationships, let us begin by revisiting Figure 8.1 at the end of the previous section (Section 8.2). You will notice that the curve is bell-shaped, with the exam scores peaking in the middle. This curve is called a normal distribution: the mean, median, and mode all have the same value, and data near the mean occur more frequently than data far from it. It is safe to say that most variables political scientists are interested in can be assumed to be normally distributed. But what does this curve represent? The height of the line represents the density of a particular observation.

    Do you notice that the peak of the curve is located at the middle of the distribution? It means that in a normally distributed variable there are many more observations at or near the mean than at any other value. In other words, as you move away (or deviate) from the mean, you will see fewer observations. This may make more intuitive sense using the test score example from the previous section. The mean test score of 85 signifies that a large proportion of students scored something close to 85. Recall the idea of standard deviation? Approximately 68% of the scores will fall within one standard deviation of the mean. In the above example, we noted that 68% of students fall between the scores of 80 and 90.

    Another thing you might notice about the normal distribution curve is that it is symmetrical: half of the observations fall above the mean and the other half fall below it. Again, a normal distribution has the same value for the mean, median, and mode, meaning that the value of the mean occurs most often and is also the middle value. Given this, the normal distribution is often written as \(N(\mu, \sigma^2)\).
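    To see these properties concretely, here is a minimal Python sketch (our own illustration, not part of the original text, assuming the NumPy library is available). It simulates exam scores with the mean of 85 and standard deviation of 5 from the example above and checks the share of observations falling within one and two standard deviations of the mean.

```python
# Simulate normally distributed exam scores and check the 68/95 rule.
import numpy as np

rng = np.random.default_rng(seed=42)
mu, sigma = 85, 5                      # exam example: mean 85, SD 5
scores = rng.normal(mu, sigma, size=100_000)

within_1sd = np.mean(np.abs(scores - mu) <= sigma)      # scores in [80, 90]
within_2sd = np.mean(np.abs(scores - mu) <= 2 * sigma)  # scores in [75, 95]
print(f"Within 1 SD: {within_1sd:.3f}")   # ~0.683
print(f"Within 2 SD: {within_2sd:.3f}")   # ~0.954
```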

    Sometimes, you may be interested in comparing values from different measures that are designed to capture similar concepts. Let us take the SAT and ACT for this example (adapted from OpenIntro Statistics). High school students who are interested in applying to four-year colleges and universities are required to complete at least one of these aptitude examinations. Universities and colleges then use the SAT or ACT score, along with a combination of other inputs, such as GPA and community service, to determine whether a student’s application is accepted. It is important to note that the SAT is scored out of 1600 and the ACT is scored out of 36. For example, say Carlos took the SAT and scored a 1300, and Tomoko took the ACT and scored a 24. How can you compare the two and determine who performed better? One way is to standardize the scores, provided certain statistics are available: the mean and the standard deviation. With the mean and standard deviation, along with the values of interest (in this case the test scores of Carlos and Tomoko), we can calculate the z-score, which tells us the number of standard deviations that a particular observation falls above or below the mean.

    \( Z = \dfrac{x - \mu}{\sigma} \)  (8.6)

    In Equation 8.6, x represents the observation you are interested in, μ represents the mean, and σ denotes the standard deviation. So, in order to compare the scores of Carlos and Tomoko, we first calculate z-scores for both and then compare them. We also need the information below to accomplish this task.

    Statistic                  SAT     ACT
    Mean (μ)                   1100    21
    Standard Deviation (σ)     200     6

    Carlos took the SAT and scored 1300, so his z-score is:

    \( Z = \dfrac{1300 - 1100}{200} = 1 \)

    Tomoko took the ACT and scored 24, so her z-score is:

    \( Z = \dfrac{24 - 21}{6} = 0.5 \)

    These statistics mean that Carlos’s score was 1 standard deviation above the mean, whereas Tomoko’s score was 0.5 standard deviations above the mean. So, who performed better on the standardized test? The answer is Carlos, as 1 standard deviation above the mean is better than 0.5 standard deviations above the mean. Keep in mind that a z-score can also be negative; this simply means that the observation falls below the mean by that distance. Z-scores likewise allow researchers to compare scores on the same exam taken in different class sections, provided that the mean and the standard deviation for both classes are available.
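    If you prefer to verify the arithmetic with software, here is a minimal Python sketch of the comparison above. The helper function z_score is our own illustration of Equation 8.6, not something from the original text.

```python
# Standardize Carlos's SAT score and Tomoko's ACT score with Equation 8.6.
def z_score(x, mu, sigma):
    """Number of standard deviations x falls above (+) or below (-) the mean."""
    return (x - mu) / sigma

carlos = z_score(1300, mu=1100, sigma=200)   # SAT: mean 1100, SD 200
tomoko = z_score(24, mu=21, sigma=6)         # ACT: mean 21, SD 6
print(carlos, tomoko)                        # 1.0 0.5
print("Carlos" if carlos > tomoko else "Tomoko", "performed better")
```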

    Once we establish techniques for comparing data, such as scores on the SAT and ACT, researchers can start developing statistical hypotheses. Statistical hypotheses are statements about some characteristic of a variable or a collection of variables. There are two types of hypotheses used in statistical hypothesis testing. A null hypothesis (Ho) is a working statement that posits the absence of a statistical relationship between two or more variables. In statistics, we seek to determine whether this working statement can be shown to be false. Related to the null hypothesis is the alternative hypothesis (Ha). Also known as the research hypothesis, it is simply an alternative working statement to the null hypothesis. Essentially, it is the claim a researcher is making when testing the relationship between variables. To best illustrate null and alternative hypotheses, let us consider the following data and go through the process of hypothesis testing.

    The Department of Political Science at San Diego City College wanted to see whether extra study sessions have any effect on students’ performance on the midterm exam, so students were randomly selected to attend extra study sessions. The mean midterm score for the American politics class (population mean) was 75, with a standard deviation of 7 among 200 students. The mean score of the students who attended the extra study sessions (sample mean) was 82, and 50 students attended. Can we determine whether the extra study sessions, on average, had any effect on student performance?

    In order to conduct this test, we have to decide on a couple more things. First, we have to determine the probability we are comfortable with of mistakenly accepting the alternative hypothesis. This is called statistical significance, or the alpha level: the probability of rejecting the null hypothesis when it is in fact true. For example, an alpha of 0.05 means that we want to be 95% confident, and this is the level most political scientists would agree is acceptable. For this example, let us use an alpha of 0.05 (95% confidence). This decision leads us to another critical element needed for hypothesis testing: the critical z-score, the threshold that tells us whether or not to reject the null hypothesis. Since we have decided that the alpha is 0.05, the critical z-score is 1.96. You can find this number in a z-score probability table, often located at the back of an introductory statistics textbook. We also need to decide whether to conduct a one-tailed or a two-tailed test. Since the distinction is beyond the scope of this textbook, we will use a two-tailed test for this example. A summary of the information we have for this example can be found in the table below.
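    As an alternative to a printed z-table, the critical value can be recovered in software. A small sketch, assuming the SciPy library is available:

```python
# Critical z-score for a two-tailed test at alpha = 0.05.
from scipy.stats import norm

alpha = 0.05
z_critical = norm.ppf(1 - alpha / 2)   # inverse CDF of the standard normal
print(round(z_critical, 2))            # 1.96
```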

    Statistic                            Value
    Population Mean (μ)                  75
    Population Standard Deviation (σ)    7
    Sample Mean (\(\bar{Y}\))            82
    Sample Size (n)                      50
    Alpha Level                          0.05
    Critical z-score                     1.96
    Null Hypothesis (Ho)                 \(\bar{Y} = \mu\)
    Alternative Hypothesis (Ha)          \(\bar{Y} \neq \mu\)

    Now that we have all the necessary information, we can conduct the hypothesis test. Ultimately, hypothesis testing involves examining the observed test statistic relative to the threshold you have determined (the critical z-score). If the observed test statistic goes beyond the critical value, we can safely say that the research claim may be correct. We calculate the observed test statistic (in this case, the z-score for the sample) using the equation below.

    \( Z_{obs} = \dfrac{|\bar{Y} - \mu|}{\sigma / \sqrt{n}} \)

    \( Z_{obs} = \dfrac{|82 - 75|}{7 / \sqrt{50}} = 7.07 \)

    Now compare the observed z-score and the critical z-score:

    \( Z_{obs} = 7.07 > 1.96 = Z_{critical} \)

    In this case, since the observed z-score is larger than the threshold of 1.96, the claim that \(\bar{Y} = \mu\) can be rejected. Conversely, if the observed z-score had been smaller than 1.96, we would say that we failed to reject the null hypothesis. It is important to note that we never accept the null hypothesis. So, what does this mean ultimately? According to the test result, we can safely say that the higher average score of those who received extra support, relative to the population average, was not the result of chance. In other words, we can conjecture that the extra support may have contributed to the higher average of the sample (extra support) group. While our example compared means using z-scores, the same concept applies to comparison of means tests with the t-test and to comparisons of proportions as well.
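    For readers who want to reproduce the whole test in software, here is a short Python sketch of the calculation above, assuming SciPy is available for the critical value (you could also simply hard-code 1.96).

```python
# Difference-of-means z-test for the study-session example.
from math import sqrt
from scipy.stats import norm

mu, sigma = 75, 7      # population mean and standard deviation
y_bar, n = 82, 50      # sample mean and sample size
alpha = 0.05

z_obs = abs(y_bar - mu) / (sigma / sqrt(n))
z_critical = norm.ppf(1 - alpha / 2)

print(f"z_obs = {z_obs:.2f}, z_critical = {z_critical:.2f}")  # 7.07 vs 1.96
if z_obs > z_critical:
    print("Reject the null hypothesis.")
else:
    print("Fail to reject the null hypothesis.")
```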

    When conducting hypothesis testing to make a statistical inference, it is possible that your decision about whether or not to reject the null hypothesis was incorrect. You may mistakenly reject a null hypothesis that is true. This is called a Type-I error, the case of a “false-positive” conclusion. When a researcher fails to reject a null hypothesis that is false, the researcher has committed a Type-II error (a “false-negative” conclusion). We can try to safeguard against these errors. The significance level discussed above (the alpha level) is the probability that you will commit a Type-I error, so by lowering the alpha level (say, from 0.05 to 0.01) you reduce the chance of committing this type of error. As for a Type-II error, the probability of committing it relates to the concept of the “power” of a test. Simply put, the larger the sample included in the test, the less likely the study will suffer from a Type-II error.
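    To see what the alpha level means in practice, here is a small simulation sketch (our own illustration, assuming NumPy is available). When the null hypothesis is true by construction, a test with alpha = 0.05 should reject it in roughly 5% of repeated samples; that rejection rate is the Type-I error rate.

```python
# Simulate repeated samples under a true null hypothesis and count rejections.
import numpy as np
from math import sqrt

rng = np.random.default_rng(seed=0)
mu, sigma, n, alpha = 75, 7, 50, 0.05
z_critical = 1.96
trials = 10_000

rejections = 0
for _ in range(trials):
    sample = rng.normal(mu, sigma, size=n)   # H0 is true by construction
    z_obs = abs(sample.mean() - mu) / (sigma / sqrt(n))
    if z_obs > z_critical:
        rejections += 1

print(rejections / trials)   # ~0.05, the Type-I error rate
```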

    In this section, we have introduced foundational knowledge you can build on to advance your quantitative methods skills. What you were exposed to here is the small tip of a huge statistical iceberg. If you are interested in quantitative political research, we highly encourage you to enroll in an introductory statistics course, preferably in political science (if your school offers one) or in another social and behavioral sciences department.


    This page titled 8.3: Introduction to Statistical Inference and Hypothesis Testing is shared under a CC BY-NC 4.0 license and was authored, remixed, and/or curated by Josue Franco, Charlotte Lee, Kau Vue, Dino Bozonelos, Masahiro Omae, & Steven Cauchon (ASCCC Open Educational Resources Initiative (OERI)) via source content that was edited to the style and standards of the LibreTexts platform.