
11.5: Descriptive Statistics (Summary)


    Key Takeaways

    • Every variable has a distribution—a way that the scores are distributed across the levels. The distribution can be described using a frequency table and histogram. It can also be described in words in terms of its shape, including whether it is unimodal or bimodal, and whether it is symmetrical or skewed.
    • The central tendency, or middle, of a distribution can be described precisely using three statistics—the mean, median, and mode. The mean is the sum of the scores divided by the number of scores, the median is the middle score, and the mode is the most common score.
    • The variability, or spread, of a distribution can be described precisely using the range and standard deviation. The range is the difference between the highest and lowest scores, and the standard deviation is the average amount by which the scores differ from the mean.
    • The location of a score within its distribution can be described using percentile ranks or z scores. The percentile rank of a score is the percentage of scores below that score, and the z score is the difference between the score and the mean divided by the standard deviation.
    • Differences between groups or conditions are typically described in terms of the means and standard deviations of the groups or conditions or in terms of Cohen’s d and are presented in bar graphs.
    • Cohen’s d is a measure of relationship strength (or effect size) for differences between two group or condition means. It is the difference of the means divided by the standard deviation. In general, values of ±0.20, ±0.50, and ±0.80 can be considered small, medium, and large, respectively.
    • Correlations between quantitative variables are typically described in terms of Pearson’s r and presented in line graphs or scatterplots.
    • Pearson’s r is a measure of relationship strength (or effect size) for relationships between quantitative variables. It is the mean cross-product of the two sets of z scores. In general, values of ±.10, ±.30, and ±.50 can be considered small, medium, and large, respectively.
    • In an APA-style article, simple results are most efficiently presented in the text, while more complex results are most efficiently presented in graphs or tables.
    • APA style includes several rules for presenting numerical results in the text. These include using words only for numbers less than 10 that do not represent precise statistical results, rounding results to two decimal places, and using words (e.g., “mean”) in the text and symbols (e.g., “M”) in parentheses.
    • APA style includes several rules for presenting results in graphs and tables. Graphs and tables should add information rather than repeating information, be as simple as possible, and be interpretable on their own with a descriptive caption (for graphs) or a descriptive title (for tables).
    • Raw data must be prepared for analysis by examining them for possible errors, organizing them, and entering them into a spreadsheet program.
    • Preliminary analyses on any data set include checking the reliability of measures, evaluating the effectiveness of any manipulations, examining the distributions of individual variables, and identifying outliers.
    • Outliers that appear to be the result of an error, a misunderstanding, or a lack of effort can be excluded from the analyses. The criteria for excluded responses or participants should be applied in the same way to all the data and described when you present your results. Excluded data should be set aside rather than destroyed or deleted in case they are needed later.
    • Descriptive statistics tell the story of what happened in a study. Although inferential statistics are also important, it is essential to understand the descriptive statistics first.
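The statistics named in the takeaways above can be computed in a few lines. A minimal Python sketch, using a hypothetical set of eight quiz scores (the data are made up purely for illustration):

```python
from statistics import mean, median, mode, pstdev

# Hypothetical scores on a 10-point quiz
scores = [4, 6, 6, 7, 7, 7, 8, 9]

m = mean(scores)                 # sum of scores / number of scores
med = median(scores)             # middle score
mo = mode(scores)                # most common score
rng = max(scores) - min(scores)  # highest minus lowest
sd = pstdev(scores)              # population standard deviation

# Location of a single score within the distribution
x = 8
percentile_rank = 100 * sum(s < x for s in scores) / len(scores)
z = (x - m) / sd                 # score minus mean, divided by SD

print(m, med, mo, rng)           # 6.75 7.0 7 5
print(percentile_rank)           # 75.0
```

Each line maps directly onto one of the definitions above, which makes it easy to check hand calculations against the computed values.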

    Key Terms and Concepts

    DESCRIPTIVE STATISTICS

    Techniques for summarizing and displaying data.

    DISTRIBUTION

    The set of scores on a variable for a group of individuals.

    HISTOGRAM

    A bar graph showing the frequency of different scores.

    SYMMETRICAL

    A distribution where both sides mirror each other.

    SKEWED

    A distribution with a tail extending more on one side.

    OUTLIER

    An extreme score that is very different from the others.

    CENTRAL TENDENCY

    A typical or average score that represents the distribution.

    MEAN

    The arithmetic average of all scores.

    MEDIAN

    The middle score when all scores are arranged in order.

    MODE

    The most frequently occurring score.

    VARIABILITY

    How spread out or dispersed the scores are.

    RANGE

    The difference between the highest and lowest scores.

    STANDARD DEVIATION

    The square root of variance; average distance from the mean.

    VARIANCE

    The average of squared deviations from the mean.

    PERCENTILE RANK

    The percentage of scores in the distribution that are below a given score.

    Z SCORE

    The difference between an individual score and the mean of the distribution, divided by the standard deviation.

    EFFECT SIZE

    A measure of the magnitude of a relationship or difference.

    COHEN’S d

    A measure of effect size for the difference between two group or condition means: the difference between the means divided by the standard deviation.

    LINEAR RELATIONSHIPS

    Relationships that form a straight line on a scatterplot.

    NONLINEAR RELATIONSHIPS

    Relationships that form a curved pattern.

    RESTRICTION OF RANGE

    When one or more variables have a limited range in the sample, relative to the population.

    BAR GRAPHS

    Graphs using bars to compare groups or categories.

    ERROR BARS

    Visual representations of variability of each group or condition on graphs.

    STANDARD ERROR

    The standard deviation of a sampling distribution.

    LINE GRAPHS

    Graphs showing trends with connected points.

    SCATTERPLOTS

    Graphs displaying individual data points for two variables.

    CORRELATION MATRIX

    A table displaying correlations among multiple variables.

    RAW DATA

    Original, unprocessed measurements as collected.

    DATA FILE

    An organized dataset with variables as columns and participants as rows.

    PLANNED ANALYSIS

    Statistical analyses decided upon before data collection.

    EXPLORATORY ANALYSIS

    Examining data for unexpected patterns not specified in advance.
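Several of the terms above are tightly linked: the standard error is the standard deviation of the sampling distribution of the mean, and it shrinks as sample size grows (approximately the standard deviation divided by the square root of n). A small simulation with made-up normal data (the values mu = 100 and sigma = 15 are arbitrary choices, not from the chapter) illustrates this:

```python
import random
import statistics

random.seed(1)

def sampling_sd_of_mean(n, reps=2000, mu=100, sigma=15):
    """SD of `reps` sample means, each computed from a sample of size n."""
    means = [statistics.mean(random.gauss(mu, sigma) for _ in range(n))
             for _ in range(reps)]
    return statistics.pstdev(means)

# The simulated standard error tracks sigma / sqrt(n)
for n in (4, 16, 64):
    print(n, round(sampling_sd_of_mean(n), 2), round(15 / n ** 0.5, 2))
```

Quadrupling the sample size roughly halves the standard error, which is why larger samples yield more precise estimates of the population mean.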

    Test Your Knowledge (answers at end of section)

    1. What does a correlation coefficient (r) of -0.85 indicate?

    A) A weak negative relationship

    B) A strong negative relationship - as one variable increases, the other tends to decrease

    C) No relationship

    D) A positive relationship

    2. Ollendick and colleagues found that children receiving exposure treatment had a mean phobia rating of 3.47 (SD = 1.77) while those receiving education treatment had a mean of 4.83 (SD = 1.52). Cohen's d was 0.82. According to Cohen's guidelines, what does this effect size tell us that the means alone do not?

    A) It shows this is a large effect - the groups differ by 0.82 standard deviations, making results comparable across different measures and studies

    B) It proves the difference is statistically significant

    C) It means the treatment caused the change

    D) It indicates the study should be repeated

    3. When reporting statistical results in APA style, what information must be included for a correlation?

    A) Only the correlation coefficient

    B) The correlation coefficient (r), sample size (n), and p-value

    C) Only the p-value

    D) Just the variables being correlated

    4. What is the primary purpose of using statistical software (like SPSS, R, or Excel) in data analysis?

    A) To make graphs look prettier

    B) Only for very large datasets

    C) To avoid learning statistics

    D) To accurately and efficiently compute statistics, reducing calculation errors and allowing for complex analyses

    5. A researcher finds that one participant has a z-score of +4.2 (reaction time 4.2 standard deviations above the mean). The chapter notes outliers are sometimes defined as scores beyond ±3.00. The chapter gave an example where adding a 5,000 ms reaction time to scores of 200-280 ms raised the mean from 245 ms to 1,445 ms. What does this example teach about handling outliers?

    A) Always automatically remove any score with z > ±3.00

    B) Never remove outliers because all data are valid

    C) Outliers can drastically distort statistics (especially the mean) so they require investigation

    D) Replace outliers with the group mean

    Answer Key

    1. B - A strong negative relationship - as one variable increases, the other tends to decrease

    A correlation coefficient of -0.85 indicates a strong negative relationship between two variables. The negative sign indicates the direction: as one variable increases, the other tends to decrease. The magnitude (0.85) indicates the strength: values closer to -1.0 or +1.0 indicate stronger relationships. A correlation of -0.85 is considered strong because it's close to the maximum value of -1.0. For example, hours spent watching TV and GPA might have a correlation of -0.85, meaning students who watch more TV tend to have lower GPAs.
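Pearson's r can be computed directly from the definition given in the Key Takeaways: the mean cross-product of the two sets of z scores. A sketch using hypothetical TV-and-GPA data in the spirit of the example above (the numbers are invented, so the resulting r is merely illustrative):

```python
from statistics import mean, pstdev

# Hypothetical paired scores: hours of TV per week and GPA
tv  = [2, 5, 1, 8, 10, 4]
gpa = [3.8, 3.0, 3.9, 2.5, 2.2, 3.2]

def zscores(xs):
    m, sd = mean(xs), pstdev(xs)
    return [(x - m) / sd for x in xs]

# Pearson's r: the mean cross-product of the two sets of z scores
r = mean(zx * zy for zx, zy in zip(zscores(tv), zscores(gpa)))
print(round(r, 2))  # strongly negative: more TV, lower GPA
```

A strongly negative r like this one means high z scores on one variable pair with low z scores on the other, so the cross-products are mostly negative.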

    2. A - It shows this is a large effect - the groups differ by 0.82 standard deviations, making results comparable across different measures and studies

    Cohen's d = 0.82 represents a large effect size (Cohen's guidelines: ~0.20 = small, ~0.50 = medium, ~0.80 = large).
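The reported d can be reproduced from the means and standard deviations in question 2. This sketch assumes the pooled standard deviation (the square root of the average of the two variances) as the divisor, which recovers the reported value:

```python
# Means and SDs reported for the two treatment conditions
m_exposure, sd_exposure = 3.47, 1.77
m_education, sd_education = 4.83, 1.52

# Pooled SD: square root of the average of the two variances (an assumption;
# the chapter says only "the standard deviation")
sd_pooled = ((sd_exposure ** 2 + sd_education ** 2) / 2) ** 0.5

d = (m_education - m_exposure) / sd_pooled
print(round(d, 2))  # 0.82, a large effect by Cohen's guidelines
```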

    3. B - The correlation coefficient (r), sample size (n), and p-value

    APA style requires reporting the correlation coefficient (r), the sample size (n), and the p-value when presenting correlation results. Example: 'There was a significant positive correlation between study time and exam scores, r(48) = .67, p < .001.' The format r(48) indicates a correlation with 48 degrees of freedom (n - 2 for correlation). The coefficient shows the strength and direction, while the p-value indicates statistical significance.
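Strings in this format can also be assembled programmatically. The helper below is hypothetical (not an official APA tool); it computes df = n - 2 and drops the leading zero from r and p, as APA style specifies for statistics that cannot exceed 1:

```python
def apa_correlation(r, n, p):
    """Format a correlation as r(df) = .xx, p = .xxx with df = n - 2."""
    def strip0(x, digits):
        # Drop the leading zero: 0.67 -> .67, -0.85 -> -.85
        return f"{x:.{digits}f}".replace("0.", ".", 1)
    p_part = "p < .001" if p < .001 else f"p = {strip0(p, 3)}"
    return f"r({n - 2}) = {strip0(r, 2)}, {p_part}"

print(apa_correlation(0.67, 50, 0.0005))  # r(48) = .67, p < .001
```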

    4. D - To accurately and efficiently compute statistics, reducing calculation errors and allowing for complex analyses

    Statistical software serves several important purposes: (1) Accuracy - eliminates manual calculation errors, especially for complex statistics. (2) Efficiency - computes statistics instantly that would take hours by hand. (3) Handles complex analyses - enables sophisticated statistical techniques.

    5. C - Outliers can drastically distort statistics (especially the mean) so they require investigation

    The chapter's 5,000 ms example illustrates how a single outlier can inflate the mean (to 1,445 ms) until it exceeds more than 80% of the scores in the distribution and no longer represents anyone's behavior well. Outliers require investigation, not automatic removal or retention.
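The distortion is easy to reproduce. This sketch uses an illustrative set of eight reaction times between 200 and 280 ms, not the chapter's exact data, so the resulting means differ from the 245 ms and 1,445 ms figures quoted above:

```python
from statistics import mean, median

# Illustrative reaction times in ms, all between 200 and 280
times = [210, 220, 230, 240, 250, 260, 270, 280]

print(mean(times), median(times))   # mean 245, median 245.0

# One lapse of attention or recording error adds a 5,000 ms outlier
times_with_outlier = times + [5000]
print(mean(times_with_outlier))     # ~773: larger than every genuine score
print(median(times_with_outlier))   # 250: barely moved
```

Note that the median barely moves while the mean jumps past every genuine score, which is why the median is often preferred for heavily skewed distributions.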

    References

    Ollendick, T. H., Öst, L.-G., Reuterskiöld, L., Costa, N., Cederlund, R., Sirbu, C.,…Jarrett, M. A. (2009). One-session treatments of specific phobias in youth: A randomized clinical trial in the United States and Sweden. Journal of Consulting and Clinical Psychology, 77, 504–516.

    Cohen, J. (1992). A power primer. Psychological Bulletin, 112, 155–159.

    Hyde, J. S. (2007). New directions in the study of gender similarities and differences. Current Directions in Psychological Science, 16, 259–263.

    Carlson, K. A., & Conard, J. M. (2011). The last name effect: How last name influences acquisition timing. Journal of Consumer Research, 38(2), 300–307. doi:10.1086/658470

    MacDonald, T. K., & Martineau, A. M. (2002). Self-esteem, mood, and intentions to use condoms: When does low self-esteem lead to risky health behaviors? Journal of Experimental Social Psychology, 38, 299–306.

    McCabe, D. P., Roediger, H. L., McDaniel, M. A., Balota, D. A., & Hambrick, D. Z. (2010). The relationship between working memory capacity and executive functioning. Neuropsychology, 24(2), 222–243. doi:10.1037/a0017619

    Brown, N. R., & Sinclair, R. C. (1999). Estimating number of lifetime sexual partners: Men and women do it differently. The Journal of Sex Research, 36, 292–297.

    Bem, D. J. (2003). Writing the empirical journal article. In J. M. Darley, M. P. Zanna, & H. L. Roediger III (Eds.), The complete academic: A career guide (2nd ed., pp. 185–219). Washington, DC: American Psychological Association.

    Schmitt, D. P., & Allik, J. (2005). Simultaneous administration of the Rosenberg Self-Esteem Scale in 53 nations: Exploring the universal and culture-specific features of global self-esteem. Journal of Personality and Social Psychology, 89, 623–642.

    Buss, D. M., & Schmitt, D. P. (1993). Sexual strategies theory: A contextual evolutionary analysis of human mating. Psychological Review, 100, 204–232.

    Exercises
    • Practice: Make a frequency table and histogram for the following data. Then write a short description of the shape of the distribution in words.
      • 11, 8, 9, 12, 9, 10, 12, 13, 11, 13, 12, 6, 10, 17, 13, 11, 12, 12, 14, 14
    • Practice: For the data in Exercise 1, compute the mean, median, mode, standard deviation, and range.
    • Practice: Using the data in Exercises 1 and 2, find
      • the percentile ranks for scores of 9 and 14
      • the z scores for scores of 8 and 12.
    • Practice: The following data represent scores on the Rosenberg Self-Esteem Scale for a sample of 10 Japanese university students and 10 American university students. (Although hypothetical, these data are consistent with empirical findings [Schmitt & Allik, 2005].) Compute the means and standard deviations of the two groups, make a bar graph, compute Cohen’s d, and describe the strength of the relationship in words.
    Japan United States
    25 27
    20 30
    24 34
    28 37
    30 26
    32 24
    21 28
    24 35
    20 33
    26 36
    • Practice: The hypothetical data that follow are extraversion scores and the number of Facebook friends for 15 university students. Make a scatterplot for these data, compute Pearson’s r, and describe the relationship in words.
    Extraversion Facebook Friends
    8 75
    10 315
    4 28
    6 214
    12 176
    14 95
    10 120
    11 150
    4 32
    13 250
    5 99
    7 136
    8 185
    11 88
    10 144
    • Practice: In a classic study, men and women rated the importance of physical attractiveness in both a short-term mate and a long-term mate (Buss & Schmitt, 1993). The means and standard deviations are as follows. Men / Short Term: M = 5.67, SD = 2.34; Men / Long Term: M = 4.43, SD = 2.11; Women / Short Term: M = 5.67, SD = 2.48; Women / Long Term: M = 4.22, SD = 1.98. Present these results
      • in writing
      • in a figure
      • in a table
    • Discussion: What are at least two reasonable ways to deal with each of the following outliers based on the discussion in this chapter? (a) A participant estimating ordinary people’s heights estimates one woman’s height to be “84 inches” tall. (b) In a study of memory for ordinary objects, one participant scores 0 out of 15. (c) In response to a question about how many “close friends” she has, one participant writes “32.”
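A starting-point sketch for the practice exercises above. The data here are placeholders (replace them with the exercise data), so the printed values are not the exercise answers:

```python
from statistics import mean, median, mode, pstdev

group_a = [1, 2, 3, 4, 5]   # placeholder: substitute the Japan scores
group_b = [2, 3, 4, 5, 6]   # placeholder: substitute the United States scores

# Exercise 2: central tendency and variability
print(mean(group_a), median(group_a), mode(group_a),
      pstdev(group_a), max(group_a) - min(group_a))

# Exercise 3: percentile rank and z score of a score x
x = 3
print(100 * sum(s < x for s in group_a) / len(group_a))
print((x - mean(group_a)) / pstdev(group_a))

# Exercise 4: Cohen's d from the two group means and a pooled SD
# (pooling by averaging the two variances is an assumption of this sketch)
sd_pooled = ((pstdev(group_a) ** 2 + pstdev(group_b) ** 2) / 2) ** 0.5
d = (mean(group_b) - mean(group_a)) / sd_pooled
print(d)
```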


    This page titled 11.5: Descriptive Statistics (Summary) is shared under a CC BY-NC-SA 4.0 license and was authored, remixed, and/or curated by Rajiv S. Jhangiani, I-Chant A. Chiang, Carrie Cuttler, & Dana C. Leighton via source content that was edited to the style and standards of the LibreTexts platform.