Social Sci LibreTexts

Glossary

    ANOVA, one-way: A statistical test used to evaluate claims about whether three or more population means are different.
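
As a rough illustration of what a one-way ANOVA computes, the sketch below calculates the F statistic (between-group variance divided by within-group variance) for three small groups. The data and the function name `one_way_f` are made up for this example, and it stops at the F statistic rather than looking up a p-value.

```python
def one_way_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group sum of squares: distance of each group mean from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-group sum of squares: distance of each case from its own group mean.
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]  # hypothetical scores
f_stat = one_way_f(groups)
```

A larger F means the group means differ more relative to the spread within the groups.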

    Beta (or standardized) slope: Slopes with standard deviations as their unit.

    Causality / Causation: There is a causal relationship when one variable has an impact or effect on variation of another variable. This is evaluated through significance, substantiveness, causal order, theory, and nonspuriousness.

    Central Limit Theorem: Given random samples with enough cases, the distribution of sample means will be approximately normally distributed.

    Cronbach's alpha (𝜶): A common measure of index reliability (from 0 unreliable to 1 perfect reliability, with 0.7 or higher being reliable enough).
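
One common way to compute Cronbach's alpha is from the item variances and the variance of the summed index. The sketch below assumes that formula and uses made-up responses; `cronbach_alpha` is a hypothetical helper name, not part of any library.

```python
from statistics import variance


def cronbach_alpha(items):
    """items: one list of scores per index item, same respondents in the same order."""
    k = len(items)
    # Each respondent's total score across all items.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(column) for column in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


# Hypothetical data: three survey items answered by five respondents.
items = [
    [3, 4, 2, 5, 4],
    [2, 4, 3, 5, 5],
    [3, 5, 2, 4, 4],
]
alpha = cronbach_alpha(items)  # about 0.89, above the 0.7 rule of thumb
```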

    Codebook: A document that tells you the codes for each variable in your dataset and what attributes they represent.

    Codes: Numbers that represent variable attributes.

    Coefficient of determination (r²): The proportion of the dependent variable's variance explained by the independent variable.

    Confidence interval: The values within which we are claiming a population mean or proportion falls, at a particular confidence level, based on a representative sample of that population.
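
A minimal sketch of a confidence interval for a mean follows. It assumes the large-sample normal (z) approximation, where the interval is the sample mean plus or minus a critical value times the standard error; with small samples a t critical value is normally used instead. The sample values and the function name are made up.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, confidence=0.95):
    """Return (low, high) for the population mean, using a z critical value."""
    standard_error = stdev(sample) / sqrt(len(sample))
    # For 95% confidence this critical value is about 1.96.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    center = mean(sample)
    return center - z * standard_error, center + z * standard_error


sample = [4, 8, 6, 5, 7, 6, 9, 3, 6, 6]  # hypothetical data
low, high = mean_confidence_interval(sample)
```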

    Control variable: An additional variable added to a model and held constant in order to improve our evaluation of relationships with the dependent variable.

    Correlation: A measure of the strength and direction of linear relationships.
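
The sketch below computes a Pearson correlation (r) by hand and squares it to get the coefficient of determination. The x and y values are invented for illustration, and `pearson_r` is a hypothetical helper name.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # How x and y vary together, and how each varies on its own.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]  # hypothetical independent variable
y = [2, 4, 5, 4, 5]  # hypothetical dependent variable
r = pearson_r(x, y)
r_squared = r ** 2
```

Here r is positive (about 0.77), so the relationship is positive, and r squared is 0.6, meaning 60% of the variance in y is explained by x.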

    Crosstabulation: A crosstab is a table that shows subgroup distribution of data, enabling examination of potential relationships between variables.

    Partial (or elaborated) crosstabulation: Crosstabs that add a control variable to the original bivariate crosstab, producing separate bivariate crosstabs for each subgroup.
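
A bivariate crosstab is essentially a count of cases in each combination of two variables' categories. The sketch below shows one simple way to tally those counts with made-up data; the variable names are hypothetical.

```python
from collections import Counter

# Hypothetical data: one entry per respondent, in the same order in both lists.
gender = ["F", "M", "F", "F", "M"]
vote = ["yes", "no", "yes", "no", "no"]

# Count how many cases fall into each (gender, vote) cell.
crosstab = Counter(zip(gender, vote))

for cell in sorted(crosstab):
    print(cell, crosstab[cell])
```

A partial crosstab could be built the same way by tallying the cells separately for each category of a control variable.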

    Data: Information. In statistics, usually the values for variable(s).
    Data is a plural word. The singular is datum.

    Dataset: A collection of data organized for analysis.

    Histogram: A frequency distribution graph for numeric data.

    Hypothesis: An inferential claim about a population.

    Alternative hypothesis (Hₐ): There is a difference/relationship (among the target population).

    Null hypothesis (H₀): There is no difference/relationship (among the target population).

    Hypothesis test: A statistical procedure that tests an inferential claim about a population using representative sample data. Also known as "statistical test."

    Index: A single measure created by combining multiple variables into one construct.

    Inferential statistics: Statistical procedures where we make claims about a population based on corresponding sample data.

    Interaction: A statistical interaction occurs when the effect of one predictor variable varies based on another predictor variable.

    Interquartile range (IQR): The distance spanning the middle 50% of data, the distance between the lower and upper quartile values.

    Level of measurement: A variable characteristic classifying the relationship among a variable's values.

    Nominal: Nominal variables have values that are qualitative categories. They do not have any meaningful rank or order.

    Ordinal: Ordinal variables have values that are qualitative categories. They do have a meaningful rank or order.

    Ratio: Ratio-level variables have values that are numeric. They have a true zero point, meaningful ratios, consistent intervals between them, and the numbers function arithmetically.

    Interval: Interval-level variables have equal intervals between values, but do not meet all the criteria to be classified as ratio-level variables. Some definitions also require interval-level variables to have numeric values.

    Indicator: Indicator variables, also called dummy variables, are dichotomous/binary, meaning they only have two categories, and these two values are coded as 0 and 1.

    Linear relationship: A relationship between two variables that takes the form of a line when graphed.

    Mean: The average value; the value each case would have if all cases' values were redistributed equally among all the cases.

    Median: The middle number in a group of ordered numbers.

    Mode: The value that appears most frequently.
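
The mean, median, and mode can be compared on one small set of numbers. The sketch below uses Python's standard `statistics` module with invented values.

```python
from statistics import mean, median, mode

values = [2, 3, 3, 5, 7, 10]  # hypothetical data, already in order

average = mean(values)       # total of 30 shared equally among 6 cases
middle = median(values)      # halfway between the two middle values, 3 and 5
most_common = mode(values)   # 3 appears twice, more than any other value
```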

    Multivariate regression: Regression that uses multiple independent variables.

    Negative relationship: As x (independent variable) increases, y (dependent variable) decreases.

    Normal distribution: A bell-shaped symmetrical histogram with data concentrated in the middle and evenly tapering off to both sides.

    One-sample t-test: A statistical test used to evaluate claims about whether a population mean is different from a particular hypothesized value.
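
The core of a one-sample t-test is a t statistic: the gap between the sample mean and the hypothesized value, measured in standard errors. The sketch below computes only that statistic and the degrees of freedom; the data, the hypothesized mean of 5, and the function name are assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, hypothesized_mean):
    """t statistic: (sample mean - hypothesized mean) / standard error."""
    standard_error = stdev(sample) / sqrt(len(sample))
    return (mean(sample) - hypothesized_mean) / standard_error


sample = [4, 8, 6, 5, 7, 6, 9, 3, 6, 6]  # hypothetical data
t_stat = one_sample_t(sample, hypothesized_mean=5)
degrees_of_freedom = len(sample) - 1
```

The t statistic and degrees of freedom would then be compared against a t-distribution to obtain a p-value.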

    Outlier: An extreme value; a case with a value that substantively differs from most other values.

    Percentile: The proportion of data that falls below a given value. The nth percentile is the value below which n% of observations fall.

    Population: All individuals, groups, objects, or cases of interest.

    Positive relationship: As x (independent variable) increases, y (dependent variable) increases.

    Q1 (lower quartile): The median of the lower half of the data. It is the 25th percentile, greater than 1/4 or 25% of the data (and less than 3/4 or 75% of the data).

    Q3 (upper quartile): The median of the upper half of the data. It is the 75th percentile, greater than 3/4 or 75% of the data (and less than 1/4 or 25% of the data).
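
The quartile definitions above, together with the interquartile range, can be sketched using the "median of each half" approach. One assumption to flag: when there is an odd number of cases, this version leaves the overall median out of both halves, which is one of several conventions in use. It needs at least two values, and the data are made up.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    data = sorted(values)
    half = len(data) // 2
    lower_half = data[:half]
    upper_half = data[-half:]  # skips the middle value when the count is odd
    return median(lower_half), median(upper_half)


values = [1, 3, 4, 6, 7, 9, 10, 12]  # hypothetical data
q1, q3 = quartiles(values)
iqr = q3 - q1  # distance spanning the middle 50% of the data
```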

    Recoding: The process of changing a variable's number-attribute assignments so that it has new/different coding.
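
As a small illustration of recoding, the sketch below collapses a five-point agreement scale into three categories. The original and new codes are hypothetical and not taken from any particular codebook.

```python
# Hypothetical original codes: 1 = strongly disagree ... 5 = strongly agree.
# New codes: 1 = disagree, 2 = neutral, 3 = agree.
recode_map = {1: 1, 2: 1, 3: 2, 4: 3, 5: 3}

original = [1, 5, 3, 2, 4, 4]
recoded = [recode_map[code] for code in original]
```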

    Range: The distance between the minimum and maximum values.

    Reference group set: A group of indicator variables for one construct that as a collection represent its various categories. Each category, other than the comparison reference group, has an indicator variable where it is coded as 1.
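
The sketch below builds a reference group set from one categorical variable: every category except the chosen reference group gets its own 0/1 indicator variable. The variable, its categories, and the helper name `indicator_set` are invented for this example.

```python
def indicator_set(values, reference):
    """Return {category: list of 0/1 codes} for every non-reference category."""
    categories = sorted(set(values) - {reference})
    return {
        category: [1 if value == category else 0 for value in values]
        for category in categories
    }


region = ["north", "south", "west", "south", "north"]  # hypothetical data
dummies = indicator_set(region, reference="north")
```

Cases in the reference group ("north" here) are coded 0 on every indicator in the set.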

    Regression line (or least squares line, or line of best fit): The line that, when drawn, is closest to the observed data. This is measured by minimizing the sum of the squared distance of each point's y-coordinate from a potential line's corresponding y-coordinate.
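
For a single independent variable, the least squares line has a closed-form slope and intercept. The sketch below computes them for invented data; `least_squares_line` is a hypothetical helper name.

```python
def least_squares_line(x, y):
    """Return (slope, intercept) of the line minimizing squared vertical distances."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    # The least squares line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


x = [1, 2, 3, 4, 5]  # hypothetical independent variable
y = [2, 4, 5, 4, 5]  # hypothetical dependent variable
slope, intercept = least_squares_line(x, y)
predicted_at_6 = intercept + slope * 6
```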

    P-value: The probability of getting a sample statistic as extreme as (or more extreme than) the one we observed, assuming the null hypothesis is true, based on what we would expect from the sampling distribution.
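
A minimal sketch of turning a test statistic into a two-sided p-value follows. It assumes the sampling distribution is standard normal, so the statistic is a z-score; a t-test would use a t-distribution instead. The function name is hypothetical.

```python
from statistics import NormalDist


def two_sided_p_from_z(z):
    """Probability of a z statistic at least this far from 0 in either direction."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


p_value = two_sided_p_from_z(1.96)  # close to 0.05
```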

    Sample: A subset of cases from the target population.

    Sampling distribution: A distribution of sample means from the same population.

    Sampling error: The difference between sample statistics and population parameters resulting from samples not being perfectly representative of the populations they are drawn from.

    Scatterplot: A graph that shows pairs of quantitative data using a rectangular coordinate system.

    Significant difference: If we are over 95% confident that two proportions or means are different, we label them as significantly different.

    Standard deviation: An estimate of the average distance of cases' data values from their overall mean.

    Standard error: The standard deviation for a sampling distribution. This is the average distance of sample means from the population mean.
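
The sketch below computes a sample standard deviation and then estimates the standard error of the mean by dividing it by the square root of the sample size. The values are made up.

```python
from math import sqrt
from statistics import stdev

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample

sd = stdev(values)                  # sample standard deviation (divides by n - 1)
se = sd / sqrt(len(values))         # estimated standard error of the mean
```

Because the standard error divides by the square root of the sample size, larger samples give smaller standard errors.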

    Standard normal distribution: Also known as a z-distribution, this is a special normal distribution standardized by z-scores.

    Statistical test: A statistical procedure that tests an inferential claim about a population using representative sample data. Also known as "hypothesis test."

    Statistics: Data; also, the practice of using data for research.

    Substantive difference: A difference that is big, sizable, notable.

    Skewed distribution: An asymmetrical histogram.

    Strength (for linear relationships): The extent to which the points converge into a common pattern, into a particular linear relationship.

    T-score: How many standard errors a value (sample mean) is away from the population mean.

    Two-sample t-test: A statistical test used to evaluate claims about whether two population means are different.
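
The sketch below computes a two-sample t statistic using the unequal-variance (Welch) form, which is one common variant and is chosen here as an assumption. The two groups and the function name are invented, and the code stops at the statistic rather than producing a p-value.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """t statistic: difference in means divided by its standard error."""
    standard_error = sqrt(variance(group_a) / len(group_a)
                          + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / standard_error


group_a = [5, 7, 6, 8, 9]  # hypothetical scores
group_b = [3, 4, 5, 4, 4]  # hypothetical scores
t_stat = welch_t(group_a, group_b)
```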

    Unit of analysis: Who or what you are analyzing.

    Variable: Anything that can vary (differ), as opposed to something that must be constant (the same).

    Other words you will come across that have a form of the word variable in them (variate) include:
    univariate: having to do with one (uni) variable
    bivariate: having to do with two (bi) variables
    multivariate: having to do with multiple (multi) variables 

    Antecedent variables: Variables that precede (come before) both the independent and dependent variables.

    Dependent variable: The outcome; the variable potentially being impacted by the independent variable.

    Independent variable: The determinant; the variable potentially impacting the dependent variable.

    Intervening variables: Variables that come in between the independent and dependent variables.

    Weighting: A process of adjusting statistical calculations to compensate for sampling error.

    Z-score: The number of standard deviations a value is from its mean.
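
A z-score is a value's distance from the mean divided by the standard deviation. The sketch below uses the population standard deviation (`pstdev`) as an assumption, with made-up values; a sample standard deviation could be substituted.

```python
from statistics import mean, pstdev


def z_score(value, values):
    """Number of standard deviations `value` lies from the mean of `values`."""
    return (value - mean(values)) / pstdev(values)


values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data: mean 5, standard deviation 2
z = z_score(9, values)             # 9 is two standard deviations above the mean
```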

    Z-distribution: See "standard normal distribution."