Social Sci LibreTexts

2.7: Emerging Wave- Big Data and Machine Learning


    Learning Objectives

    By the end of this section, you will be able to:

    • Define big data and machine learning
    • Explain how big data and machine learning are being used in political science

    Political science is a dynamic discipline because it is willing to borrow from other disciplines to improve its study of political actors, institutions, and processes. Several emerging waves are changing the nature of scientific inquiry in general and of political science in particular. The two we highlight here are big data and machine learning.

    The human mind is not capable of sifting, sorting, and analyzing ever-growing datasets, but computers are. It is useful to note that up until the late 1980s and early 1990s, many researchers still calculated descriptive statistics and linear regressions by hand or with calculators. Over the last 20 years, however, computing hardware has become widely available and access to statistical software has increased. With both hardware and software in the hands of more political scientists, the range of exploration and knowledge generation that comes with analyzing political phenomena expands.

    Big data is defined as the mountain of information, measured in petabytes and exabytes, that is being stored on computers and servers around the world. As computers proliferate, and as our personal, organizational, corporate, and governmental use of them grows exponentially, the amount of information we generate as a human society is exploding by leaps and bounds every single day. There are concerns about what this means for society (Brady 2019). With growing mountains of data, some questions arise: How can we study it? How can we uncover patterns in the data? How can we derive new meanings and understandings from these data?

    Big data is “big” because of the amount of space it takes up on a computer hard drive, but the techniques for analyzing it are available in software that political scientists have used for years to statistically analyze large datasets. SPSS, Stata, R, and Python are all staples of statistical data analysis in the discipline.
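
As a rough illustration of the routine analysis such packages perform, the Python sketch below computes descriptive statistics and a simple least-squares regression line using only the standard library. The variable names, the helper functions `describe` and `simple_ols`, and the tiny dataset are all invented for illustration; they do not come from any real study or from the authors.

```python
import statistics

# Hypothetical, made-up data: years of education and a 0-10 political interest score.
education_years = [10, 12, 12, 14, 16, 16, 18, 20]
interest_score = [3, 4, 5, 5, 6, 7, 8, 9]


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


def simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    mean_x = statistics.mean(x)
    mean_y = statistics.mean(y)
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


summary = describe(education_years)
slope, intercept = simple_ols(education_years, interest_score)
print(summary)
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

The same arithmetic could be done by hand for eight observations; software matters because the identical code runs unchanged on millions of rows.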

    But within the last decade, two major changes have begun revolutionizing the study of everything, from politics and economics to biology and chemistry. First, we have seen significant advances in computer hardware. Specifically, advances in graphics processing units, also known as GPUs, have fundamentally changed our ability to analyze mountains of data. The long and short of it is that central processing units, or CPUs, have shrunk in size but grown in computational power. Why do you think you can hold a computer in the palm of your hand? GPUs, working independently and in conjunction with CPUs, have tremendous computational power.

    Second, computer scientists have been developing new programming languages and mechanisms for programming collaboration, and pushing the boundaries of artificial intelligence. This is where our second wave, machine learning, starts to emerge. Given the advancements in CPUs and GPUs, computer science is pushing the boundaries of what software can do with respect to inputting, analyzing, and learning from data in the world around us. Machine learning is the ability of a computer program to start with an initial model, analyze actual data, learn from this analysis, and automatically update that initial model to incorporate the findings from its analysis. This does not happen just once and then stop: the cycle can repeat iteratively, thereby allowing the software to uncover categories, patterns, and meanings.
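
The cycle described above (start with an initial model, compare it with data, update, repeat) can be sketched in a few lines of Python. This is a minimal illustration only, under the assumption that the "model" is a straight line and that the update rule is gradient descent; the text does not name a specific technique, and real machine learning systems use far more elaborate models. The data points and the function names `mean_squared_error` and `train` are made up for this example.

```python
# Hypothetical, made-up observations (x, y) that roughly follow y = 2x + 1.
data = [(0.0, 1.1), (1.0, 2.9), (2.0, 5.2), (3.0, 6.8), (4.0, 9.1)]


def mean_squared_error(slope, intercept, points):
    """Average squared gap between the model's predictions and the data."""
    return sum((intercept + slope * x - y) ** 2 for x, y in points) / len(points)


def train(points, learning_rate=0.05, rounds=2000):
    """Repeatedly adjust a line y = intercept + slope * x to fit the points."""
    slope, intercept = 0.0, 0.0  # initial model: a flat line at zero
    history = []
    n = len(points)
    for _ in range(rounds):
        # Analyze: measure how the error changes with each model parameter.
        grad_slope = sum(2 * (intercept + slope * x - y) * x for x, y in points) / n
        grad_intercept = sum(2 * (intercept + slope * x - y) for x, y in points) / n
        # Learn and update: nudge the model in the direction that reduces error.
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
        history.append(mean_squared_error(slope, intercept, points))
    return slope, intercept, history


slope, intercept, history = train(data)
print(f"learned slope={slope:.2f}, intercept={intercept:.2f}")
print(f"error after first round={history[0]:.3f}, after last round={history[-1]:.3f}")
```

Each pass through the loop is one turn of the analyze, learn, and update cycle; the recorded error shrinks as the model moves closer to the pattern in the data.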

    What does this all mean for political science? Honestly, we don’t have an answer to that question. What we do know is that the next generation of political scientists will be leading efforts to utilize big data and machine learning to explain political behaviors, institutions, and processes. It’s an exciting time to be entering the field, and the experiences you have, the questions that intrigue you, and the research you will conduct will help build our knowledge of politics.


    This page titled 2.7: Emerging Wave- Big Data and Machine Learning is shared under a CC BY-NC 4.0 license and was authored, remixed, and/or curated by Josue Franco, Charlotte Lee, Kau Vue, Dino Bozonelos, Masahiro Omae, & Steven Cauchon (ASCCC Open Educational Resources Initiative (OERI)) via source content that was edited to the style and standards of the LibreTexts platform; a detailed edit history is available upon request.