1.7: Data analysis

“Let the data speak for itself” is a frequently invoked dictum that is both grammatically incorrect and impossible. Data, having been recorded, do not then speak for themselves. Data have no meaning apart from how we interpret them. Data analysis is the task of finding meaningful patterns in our data. It’s how we make sense of our data, how we derive meaning from them.

It is accurate enough to say that quantitative data analysis helps us make sense of numeric data and qualitative data analysis helps us make sense of textual data, but that does oversimplify the distinction a bit. Imagine conducting direct observations of presidential primary campaign stump speeches. Each time we observe a speech, we would probably want to record the approximate number of people in attendance. Clearly, that will yield numeric data, and we would use quantitative data analysis techniques to find patterns in them, such as calculating the mean, median, and standard deviation to summarize the central tendency and variation of crowd sizes at the speeches. We would probably also record the speeches themselves and later transcribe them so that we have a verbatim written record of each speech. This time, we will clearly have textual data, and we will use qualitative data analysis tools to identify underlying themes that emerge from the data. However, we would also record whether each speech was delivered by a Republican primary candidate or a Democratic primary candidate, probably by checking a box on our direct observation tool. In this case, the data we record are, in a sense, qualitative; they are text, Republican or Democrat. When we analyze these data, though, we will most likely use quantitative data analysis tools, in this case probably just to count the frequency of each value of the variable, political party. The choice between qualitative and quantitative data analysis tools, then, isn’t entirely about the type of data; it’s also determined by what we’re going to do with those data. If we’re performing numeric calculations, we use quantitative data analysis tools, and if we’re deriving and attributing meaning from and to words, we use qualitative data analysis tools. (Even that oversimplifies a little because of gray areas like content analysis, which is a very quantitative approach to qualitative data analysis, but we’ll leave it there.)
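To make the quantitative side of the stump speech example concrete, here is a minimal sketch in Python. The observation records are invented purely for illustration (they are not real crowd counts), and the names `observations` and `summarize` are hypothetical. It summarizes the numeric variable with the mean, median, and sample standard deviation, and it handles the text-valued party variable by simply counting how often each value occurs.

```python
from collections import Counter
from statistics import mean, median, stdev

# Hypothetical observation records, invented for illustration only:
# each tuple is (approximate crowd size, candidate's party).
observations = [
    (120, "Republican"),
    (300, "Democrat"),
    (250, "Republican"),
    (90, "Democrat"),
    (240, "Democrat"),
]

crowd_sizes = [size for size, _ in observations]
parties = [party for _, party in observations]


def summarize(values):
    """Return the mean, median, and sample standard deviation of values."""
    return {
        "mean": mean(values),
        "median": median(values),
        "sd": stdev(values),  # sample standard deviation (n - 1 denominator)
    }


# Numeric data: summarize central tendency and variation.
crowd_summary = summarize(crowd_sizes)

# Text-valued data analyzed quantitatively: count each party's frequency.
party_counts = Counter(parties)

print(crowd_summary)
print(dict(party_counts))
```

Note that the party variable is recorded as text, yet the only operation applied to it is counting, which is why it ends up on the quantitative side of the analysis.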

The processes of qualitative data analysis and quantitative data analysis differ as well. When we undertake quantitative data analysis, the concepts we’re measuring are almost always predetermined. We first decide to measure a concept like political literacy, then operationalize the concept by writing a list of quiz items, then collect our data, and, finally, tally our respondents’ scores—that is, conduct our quantitative data analysis—as an indicator of their political literacy. Conceptualization came first, analysis second. When we’re doing qualitative data analysis, though, this isn’t necessarily the case. If we want to conduct interviews to understand (in the verstehen sense, recall) what respondents believe it means to be politically literate, we may not know what concepts we’ll end up identifying—that’s why we’re doing the research. Certainly, we have some starting point—a formal theory, a model, a hunch, whatever we’ve learned from previous research—or we wouldn’t know what to ask questions about. It is during the course of data analysis, though, that important concepts emerge as we find patterns in our interview data. Thus, conceptualization and analysis are pursued iteratively; concepts are a starting point for data collection, consistent with our model of the research process, but concepts are also the product of qualitative data analysis.

Much more of the quantitative data analysis process is a settled matter than the qualitative data analysis process. There is only one way to calculate the sample standard deviation, and if you want to compare the means of two groups, there are nearly universally agreed upon rules to help you choose the appropriate statistical test. If you want to identify underlying themes in a political speech, though, there is not one right way to go about your analysis. There are many different qualitative data analysis camps, some complementary and some competing, and even within one camp, there is no expectation that qualitative data analysis would lead you and another researcher to precisely the same findings.
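As a reminder of what “settled” looks like, the sample standard deviation is conventionally written as follows, where the symbols are the usual textbook notation rather than anything defined earlier in this chapter: \(x_1, \dots, x_n\) are the observed values, \(n\) is the sample size, and \(\bar{x}\) is the sample mean.

\[ s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} \left( x_i - \bar{x} \right)^2}, \qquad \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \]

Any two researchers who apply this formula to the same sample will get the same number, which is not something we expect of two researchers coding the same speech for themes.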

We’re not going to cover the “how to” of data analysis here. For that, I refer you to your introductory statistics and qualitative data analysis courses and textbooks. Most students reading this will also take an introductory statistics course. I think we do aspiring social science researchers a disservice by not also requiring a course in qualitative data analysis. Students find one final distinction appealing. The frank truth is that students can accomplish little high-caliber research, by professional standards, using the quantitative data analysis tools learned in an introductory statistics course. There are exceptions, but the type of quantitative research that could be published in a social science journal generally requires more statistics training. In contrast, students can conduct excellent research using basic qualitative data analysis techniques; a lot of good work is done with the basic tools. You shouldn’t choose your data analysis methods based on this, of course, but you should be encouraged to know that qualitative data analysis skills are very accessible to students and can enable students to conduct strong research. A great starting point is David Thomas’s (2006) “A General Inductive Approach for Analyzing Qualitative Evaluation Data,” American Journal of Evaluation, 27(2), 237-246.

I find that students often show up in my research methods courses still just a little uncertain about inferential statistics, even if they’re fresh out of a statistics course. That’s not a criticism of the students or their statistics courses (sometimes it’s my own course!)—it’s a hard idea to grasp at first. If you’re one of those uncertain students, I offer a quick review of this data analysis approach in Appendix C.

One final note about data analysis: Incorporating control variables into data analysis often trips students up. Appendix D presents one way of approaching this called elaboration modeling. I like to introduce students to this strategy because its logic can be applied across a wide range of quantitative and qualitative data analysis scenarios, and it helps students better learn the concept of control as well.
