2.3: Case Selection (Or, How to Use Cases in Your Comparative Analysis)

Dino Bozonelos, Julia Wendt, Charlotte Lee, Jessica Scarffe, Masahiro Omae, Josh Franco, Byran Martin, & Stefan Veldhuis
Victor Valley College, Berkeley City College, Allan Hancock College, San Diego City College, Cuyamaca College, Houston Community College, and Long Beach City College via ASCCC Open Educational Resources Initiative (OERI)

Learning Objectives

By the end of this section, you will be able to:

Discuss the importance of case selection in case studies.
Consider the implications of poor case selection.

Introduction

Case selection is an important part of any research design. Deciding how many cases, and which cases, to include will shape the results we obtain. If we select a high number of cases, we often say that we are conducting large-N research. Large-N research is research in which the number of observations or cases is large enough that we need mathematical, usually statistical, techniques to discover and interpret any correlations or causal relationships. For a large-N analysis to yield relevant findings, a number of conventions need to be observed. First, the sample needs to be representative of the studied population. Thus, if we wanted to understand the long-term effects of COVID, we would need to know the approximate characteristics of those who contracted the virus. Once we know the parameters of the population, we can then determine a sample that represents the larger population. For example, women make up 55% of all long-term COVID survivors. Thus, any sample we generate should be roughly 55% women.

Second, some kind of randomization technique needs to be involved in large-N research. Not only must your sample be representative, but the people within it must also be randomly selected. In other words, we must have a large pool of people who fit the population criteria, and then randomly select from that pool. Randomization helps to reduce bias in the study. Also, when cases (people with long-term COVID) are randomly chosen, the sample tends to give a fairer representation of the studied population. Third, your sample needs to be large enough, hence the large-N designation, for any conclusions to have external validity. Generally speaking, the larger the number of observations/cases in the sample, the more confidence we can have in the study's findings. There is no magic number, but using the above example, our sample of long-term COVID patients should include at least 750 people, with an aim of around 1,200 to 1,500 people.
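The three conventions above (representativeness, randomization, and adequate sample size) can be sketched in a few lines of Python. This is only a minimal illustration, not a prescribed procedure: the pool of 10,000 patients, the field names, and the function stratified_sample are hypothetical, and the 55% share is the figure used in the example above.

```python
import random


def stratified_sample(pool, key, shares, n, seed=0):
    """Randomly draw n people so that each group matches its population share."""
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    sample = []
    for group, share in shares.items():
        members = [person for person in pool if person[key] == group]
        quota = round(n * share)  # how many people this group contributes
        sample.extend(rng.sample(members, quota))  # random draw within the group
    rng.shuffle(sample)
    return sample


# Hypothetical pool of 10,000 long-term COVID patients, 55% of them women.
pool = [{"id": i, "sex": "F" if i % 20 < 11 else "M"} for i in range(10_000)]

sample = stratified_sample(pool, "sex", {"F": 0.55, "M": 0.45}, n=1200)
women = sum(person["sex"] == "F" for person in sample)
print(len(sample), women)
```

Drawing randomly within each group, rather than hand-picking people, is what keeps the researcher's own preferences out of the sample.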

When it comes to comparative politics, we rarely reach the numbers typically used in large-N research. There are about 200 fully recognized countries, with about a dozen partially recognized countries, and even fewer areas or regions of study, such as Europe or Latin America. Given this, what is the strategy when one case, or a few cases, are being studied? What happens if we only want to know about the COVID-19 response in the United States, and not the rest of the world? How do we randomize this to ensure our results are not biased and are representative? These and other questions are legitimate issues that many comparativist scholars face when completing research. Does randomization work with case studies? Gerring suggests that it does not, as “any given sample may be wildly unrepresentative” (pg. 87). Thus, random sampling is not a reliable approach when it comes to case studies. And even if the randomized sample is representative, there is no guarantee that the gathered evidence would be reliable.

One can make the argument that case selection may not be as important in large-N studies as it is in small-N studies. In large-N research, potential errors and/or biases may be ameliorated, especially if the sample is large enough. This is not always what happens; errors and biases most certainly can exist in large-N research. However, incorrect or biased inferences are less of a worry when we have 1,500 cases versus 15 cases. In small-N research, case selection simply matters much more.
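A small simulation can make the contrast between 15 and 1,500 cases concrete. The sketch below is an illustration under stated assumptions: the true share of 55% is borrowed from the earlier example, the function share_estimates is hypothetical, and the simulation only captures random sampling error, not systematic bias in how cases are chosen.

```python
import random
import statistics


def share_estimates(true_share, n, repetitions=500, seed=1):
    """Repeatedly draw n random cases and record the estimated share each time."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(repetitions):
        hits = sum(rng.random() < true_share for _ in range(n))
        estimates.append(hits / n)
    return estimates


small = share_estimates(0.55, n=15)     # small-N: 15 cases per study
large = share_estimates(0.55, n=1500)   # large-N: 1,500 cases per study

# The spread of the estimates shows how far a single study can land from 0.55.
print("spread with 15 cases:   ", round(statistics.pstdev(small), 3))
print("spread with 1,500 cases:", round(statistics.pstdev(large), 3))
```

The estimates from 15 cases scatter far more widely around the true share than those from 1,500 cases, which is one reason why each individual choice of case carries so much more weight in small-N work.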

This is why Blatter and Haverland (2012) write that “case studies are ‘case-centered’, whereas large-N studies are ‘variable-centered’”. In large-N studies we are more concerned with the conceptualization and operationalization of variables. Thus, we want to focus on which data to include in the analysis of long-term COVID patients. If we wanted to survey them, we would want to make sure we construct questions in appropriate ways. For almost all survey-based large-N research, the question responses themselves become the coded variables used in the statistical analysis.

Case selection can be driven by a number of factors in comparative politics, with the first two approaches being the more traditional. First, it can derive from the interests of the researcher(s). For example, if the researcher lives in Germany, they may want to research the spread of COVID-19 within the country, possibly using a subnational approach in which they compare infection rates among German states. Second, case selection may be driven by area studies. This is still based on the interests of the researcher because, generally speaking, scholars pick areas of study due to their personal interests. For example, the same researcher may research COVID-19 infection rates among European Union member-states. Finally, the selection of cases may be driven by the type of case study that is utilized. In this approach, cases are selected because they allow researchers to compare their similarities or their differences. Alternatively, a case might be selected because it is typical of most cases or, in contrast, because it deviates from the norm. We discuss types of case studies and their impact on case selection below.

Types of Case Studies: Descriptive vs. Causal

There are a number of different ways to categorize case studies. One of the most recent comes from John Gerring. In the second edition of his book on case study research (2017), he posits that the central question posed by the researcher will dictate the aim of the case study. Is the study meant to be descriptive? If so, what is the researcher looking to describe? How many cases (countries, incidents, events) are there? Or is the study meant to be causal, where the researcher is looking for a cause and effect? Given this, Gerring categorizes case studies into two types: descriptive and causal.

Descriptive case studies are “not organized around a central, overarching causal hypothesis or theory” (pg. 56). Most case studies are descriptive in nature, where the researchers simply seek to describe what they observe. They are useful for transmitting information regarding the studied political phenomenon. For a descriptive case study, a scholar might choose a case that is considered typical of the population. An example could involve researching the effects of the pandemic on a medium-sized city in the US. The city chosen would have to exhibit the tendencies of medium-sized cities throughout the entire country. First, we would have to conceptualize what we mean by ‘a medium-sized city’. Second, we would then have to establish the characteristics of medium-sized US cities, so that our case selection is appropriate. Alternatively, cases could be chosen for their diversity. In keeping with our example, maybe we want to look at the effects of the pandemic on a range of US cities, from small, rural towns, to medium-sized suburban cities, to large urban areas.

Causal case studies are “organized around a central hypothesis about how X affects Y” (pg. 63). In causal case studies, the context around a specific political phenomenon or phenomena is important, as it allows researchers to identify the aspects that set up the conditions, the mechanisms, for that outcome to occur. Scholars refer to this as the causal mechanism, which is defined by Falleti & Lynch (2009) as “portable concepts that explain how and why a hypothesized cause, in a given context, contributes to a particular outcome”. Remember, causality is when a change in one variable verifiably causes an effect or change in another variable. For causal case studies that employ causal mechanisms, Gerring divides them into exploratory case-selection, estimating case-selection, and diagnostic case-selection. The differences revolve around how the central hypothesis is utilized in the study.

Exploratory case studies are used to identify a potential causal hypothesis. Researchers will single out the independent variables that seem to affect the outcome, or dependent variable, the most. The goal is to build up to what the causal mechanism might be by providing the context. This is also referred to as hypothesis generating as opposed to hypothesis testing. Case selection can vary widely depending on the goal of the researcher. For example, if the scholar is looking to develop an ‘ideal-type’, they might seek out an extreme case. An ideal-type is defined as a “conception or a standard of something in its highest perfection” (New Webster Dictionary). Thus, if we want to understand the ideal-type capitalist system, we want to investigate a country that practices a pure or ‘extreme’ form of the economic system.

Estimating case studies start with a hypothesis already in place. The goal is to test the hypothesis through collected data/evidence. Researchers seek to estimate the ‘causal effect’. This involves determining if the relationship between the independent and dependent variables is positive, negative, or ultimately if no relationship exists at all. Finally, diagnostic case studies are important as they help to “confirm, disconfirm, or refine a hypothesis” (Gerring 2017). Case selection can also vary in diagnostic case studies. For example, scholars can choose a least-likely case, or a case where the hypothesis is confirmed even though the context would suggest otherwise. A good example would be looking at Indian democracy, which has existed for over 70 years. India has a high level of ethnolinguistic diversity, is relatively underdeveloped economically, and has a low level of modernization through large swaths of the country. All of these factors strongly suggest that India should not have democratized, or should have failed to stay a democracy in the long term, or should have disintegrated as a country.

Most Similar/Most Different Systems Approach

The discussion in the previous subsection tends to focus on case selection when it comes to a single case. Single case studies are valuable as they provide an opportunity for in-depth research on a topic that requires it. However, in comparative politics, our approach is to compare. Given this, we are required to select more than one case. This presents a different set of challenges. First, how many cases do we pick? This is a tricky question we addressed earlier. Second, how do we apply the previously mentioned case selection techniques, descriptive vs. causal? Do we pick two extreme cases if we used an exploratory approach, or two least-likely cases if choosing a diagnostic case approach?

Thankfully, an English scholar by the name of John Stuart Mill provided some insight on how we should proceed. He developed several approaches to comparison with the explicit goal of isolating a cause within a complex environment. Two of these methods, the 'method of agreement' and the 'method of difference', have influenced comparative politics. In the 'method of agreement', two or more cases are compared for their commonalities. The scholar looks to isolate the characteristic, or variable, they have in common, which is then established as the cause for their similarities. In the 'method of difference', two or more cases are compared for their differences. The scholar looks to isolate the characteristic, or variable, they do not have in common, which is then identified as the cause for their differences. From these two methods, comparativists have developed two approaches.

Figure 1: Book cover of A System of Logic, Ratiocinative and Inductive. John Stuart Mill developed several approaches to comparison, including the “method of agreement” and the “method of difference”. (Source: Mill, J.S. (1843). A System of Logic, Ratiocinative and Inductive. University of Toronto Press.)

What Is the Most Similar Systems Design (MSSD)?

This approach is derived from Mill’s ‘method of difference’. In a Most Similar Systems Design, the cases selected for comparison are similar to each other, but their outcomes differ. In this approach we are interested in keeping as many of the variables as possible the same across the selected cases, which for comparative politics often involves countries. Remember, the independent variable is the factor that doesn’t depend on changes in other variables. It is potentially the ‘cause’ in the cause and effect model. The dependent variable is the variable that is affected by, or dependent on, the presence of the independent variable. It is the ‘effect’. In a most similar systems approach, the background variables should remain the same across the cases, so that the variable that does differ can be linked to the difference in outcomes.

A good example involves the lack of a national healthcare system in the US. Other countries, such as New Zealand, Australia, Ireland, the UK and Canada, all have robust, publicly accessible national health systems. However, the US does not. These countries all have similar systems: English heritage and language use, liberal market economies, strong democratic institutions, and high levels of wealth and education. Yet, despite these similarities, the end results vary. The US does not look like its peer countries. In other words, why do we have similar systems producing different outcomes?
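The logic of a most similar systems comparison can be sketched as a simple sorting exercise: variables that are identical across all cases are treated as controls, and any variable that divides the cases exactly as the outcome does becomes a candidate explanation. The sketch below is an illustration only. The function names and the yes/no coding of each country are assumptions made for the example, and hypothetical_factor_x is a placeholder for whatever explanatory variable a researcher proposes, not a finding.

```python
def partition(cases, key):
    """Group case names by their value on `key` and return the set of groups."""
    groups = {}
    for name, attrs in cases.items():
        groups.setdefault(attrs[key], set()).add(name)
    return {frozenset(group) for group in groups.values()}


def mssd_compare(cases, outcome):
    """Return (controls, candidates) for a most similar systems comparison."""
    variables = [v for v in next(iter(cases.values())) if v != outcome]
    outcome_split = partition(cases, outcome)
    controls, candidates = [], []
    for var in variables:
        split = partition(cases, var)
        if len(split) == 1:            # same value in every case: a control
            controls.append(var)
        elif split == outcome_split:   # varies exactly as the outcome varies
            candidates.append(var)
    return controls, candidates


# Shared characteristics taken from the example above (illustrative coding).
shared = {
    "english_heritage": True,
    "liberal_market_economy": True,
    "strong_democratic_institutions": True,
    "high_wealth_and_education": True,
}
countries = ["United States", "Canada", "United Kingdom", "Australia"]
cases = {
    name: {
        **shared,
        "hypothetical_factor_x": name != "United States",   # placeholder variable
        "national_health_system": name != "United States",  # the outcome
    }
    for name in countries
}

controls, candidates = mssd_compare(cases, "national_health_system")
print("controls:", controls)
print("candidates:", candidates)
```

A variable surviving as a candidate does not prove it is the cause; it only identifies what remains to be investigated once the shared characteristics have been ruled out.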

What Is the Most Different Systems Design (MDSD)?

This approach is derived from Mill’s ‘method of agreement’. In a Most Different Systems Design, the cases selected are quite different from one another, yet arrive at the same outcome. Thus, the dependent variable is the same. Different independent variables exist between the cases, such as democratic v. authoritarian regime, or liberal market economy v. non-liberal market economy. Or it could include other variables such as societal homogeneity (uniformity) vs. societal heterogeneity (diversity), where a country may find itself unified ethnically/religiously/racially, or fragmented along those same lines.

A good example involves the countries that are classified as economically liberal. The Heritage Foundation lists countries such as Singapore, Taiwan, Estonia, Australia, New Zealand, as well as Switzerland, Chile and Malaysia as either free or mostly free. These countries differ greatly from one another. Singapore and Malaysia are considered flawed or illiberal democracies (see chapter 5 for more discussion), whereas Estonia is still classified as a developing country. Australia and New Zealand are wealthy, whereas Malaysia is not. Chile and Taiwan became economically free countries under authoritarian military regimes, which is not the case for Switzerland. In other words, why do we have different systems producing the same outcome?
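A most different systems comparison runs the opposite check: confirm that every case shares the outcome, then look for the few variables the cases still have in common despite their differences. The sketch below uses stylized cases rather than real countries. The case labels, the variable values, the function mdsd_shared_factors, and hypothetical_factor_z are all assumptions made for illustration.

```python
def mdsd_shared_factors(cases, outcome):
    """Return the variables that take the same value in every case."""
    outcomes = {attrs[outcome] for attrs in cases.values()}
    if len(outcomes) != 1:
        raise ValueError("a most different systems design needs a shared outcome")
    variables = [v for v in next(iter(cases.values())) if v != outcome]
    return [
        var for var in variables
        if len({attrs[var] for attrs in cases.values()}) == 1
    ]


# Stylized cases: different regimes, societies and wealth levels, same outcome.
cases = {
    "Case A": {"regime": "democratic", "society": "homogeneous", "wealth": "high",
               "hypothetical_factor_z": True, "economically_free": True},
    "Case B": {"regime": "authoritarian", "society": "heterogeneous", "wealth": "high",
               "hypothetical_factor_z": True, "economically_free": True},
    "Case C": {"regime": "democratic", "society": "heterogeneous", "wealth": "low",
               "hypothetical_factor_z": True, "economically_free": True},
}

print(mdsd_shared_factors(cases, "economically_free"))
```

Variables that differ across the cases (regime, society, wealth) drop out as explanations, since the outcome occurred with or without them; what remains in common is where the researcher would look next.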