Chapter 15 will teach you about interactions with linear regression.
Are you familiar with the term intersectionality? Intersectionality is about how our multiple identities interact to impact our unique experience in the social world. Our intersecting identities qualitatively change our experience, so it is necessary to take them into account when analyzing the empirical world. It's also not just an additive change where you can put one identity and another together. They can interact in ways that produce qualitatively different outcomes.
While the idea predates the term, intersectionality was coined by Kimberlé Crenshaw. An example Crenshaw uses to explain the term is the 1970s court case DeGraffenreid v. General Motors. Historically women could work at General Motors as office staff and men could work at General Motors in manufacturing. However, only white women were allowed as office staff. When this policy changed, black women entered the workforce. However, when layoffs came around, black women were the first to be laid off, and when it came time for promotions, they were the last to be promoted. General Motors claimed this was because of their lack of seniority. Five black women sued, noting that they lacked seniority because they had not previously been allowed to work at General Motors due to their status as black women. They believed this constituted discrimination against them in both promotion and layoff policies. The courts ruled in favor of General Motors, saying they would not create a super-remedy and that the plaintiffs had failed to prove gender discrimination, since (white) women had previously been able to work in the front office, or racial discrimination, since (black) men had previously been able to work in manufacturing. Crenshaw’s point in sharing this case is that it was not race or gender alone that impacted these employees; it was the intersection of race and gender, the status of being a black woman, that meant they could not gain earlier employment.
Earlier we looked at gender and race pay gaps for full-time civilian workers, using U.S. Census data (ACS data for 2017 to 2021).
Here are the coefficients output and a graph of a multivariate regression with both gender and race as IVs.
(The p-values were also all significant at p<0.001 with weighting off.)
Altogether, gender and race explain 4.7% of the variance in income among these full-time civilian worker respondents. We can see that gender has a particular effect (on average, women's income was about $21,700 less) while holding race constant, that is, within any particular racial group. Race also has a particular effect (on average, compared to NH White workers, NH Black workers earned $26K less, NH other/multi $12K less, Hispanic $30K less, and NH API $12K more) while holding gender constant, that is, for men and for women alike.
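To see why an additive model forces the gender gap to be identical in every racial group, here is a minimal Python sketch that plugs in the rounded slopes reported above. The intercept is not given in the text, so `HYPOTHETICAL_INTERCEPT` is a placeholder; it cancels out of every gap, which is exactly the point. The function and variable names are illustrative assumptions, not part of any SPSS output.

```python
# Rounded slopes from the additive model described above (in dollars).
WOMAN_SLOPE = -21_700
RACE_SLOPES = {
    "NH White": 0,          # reference group
    "NH Black": -26_000,
    "NH other/multi": -12_000,
    "Hispanic": -30_000,
    "NH API": 12_000,
}

# Placeholder only: the real intercept is not reported in the text.
HYPOTHETICAL_INTERCEPT = 80_000


def predicted_income(race, is_woman, intercept=HYPOTHETICAL_INTERCEPT):
    """Additive prediction: intercept + race slope + gender slope."""
    return intercept + RACE_SLOPES[race] + (WOMAN_SLOPE if is_woman else 0)


# Women's predicted income minus men's predicted income within each race.
gender_gaps = {
    race: predicted_income(race, True) - predicted_income(race, False)
    for race in RACE_SLOPES
}

for race, gap in gender_gaps.items():
    print(f"{race}: predicted gender gap = {gap}")
```

Whatever intercept you substitute, the additive model predicts the same gender gap for every racial group, because nothing in the equation lets the gender slope change by race.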
However, does gender have the same effect across racial groups? Does race have the same effect for men and women? Exploring their intersection can help us answer that.
Here are some results from a new regression that can answer that, this time including "interactions" between race and gender. While we did not add any new demographics, race, gender, and their interaction now explain 5.0% of the variance in income among our respondents.
(The p-values were also all significant at p<0.001 with weighting off.)
Before, gender had a uniform gap for each racial group, and race had a uniform gap for each gender, but now we can see in the graph above that subgroups are not uniform. While on average a NH White Woman earns 71 cents for every dollar earned by a NH White Man, a NH Black Woman earns 88 cents for every dollar earned by a NH Black Man (and 54 cents for every dollar earned by a NH White Man). There is not simply a gender pay gap and a race pay gap. You can't take the two gaps and add them together, or just control for the other like we did in the multivariate model that did not include the interaction. Gender and race intersect to create different experiences and outcomes for different subgroups.
Here are a couple of graphs of intersectional wage gaps from the AAUW, using median income. You can read more about their causes at the link as well.
Traditionally intersectionality is used to look at social identities to help us understand systems of oppression. But in statistics, we can broaden this concept to look at the intersection of any variables and how that intersection may influence our dependent variable. Region and age might each impact cereal preferences while controlling for the other, but the intersection of region and age could give us improved information. Perhaps older people from a particular region have a particular cereal preference that younger people in that region don't have and that older people in other regions don't have.
In statistics, the variables we create showing the intersection of two variables are called interaction variables and the outcomes are interaction effects.
A statistical interaction occurs when the effect of one predictor variable varies based on another predictor variable.
Interaction variables are computed by multiplying the two independent variables together. The two independent variables, such as race and gender, are the parent variables, or main effect variables, and their product is the interaction variable(s). Generally you will not see codes with values listed for interactions. You can also create three-way interactions (e.g., race*gender*sexuality), but these get more complicated, and depending on your variables you have to be careful about the sample size of your subgroups. In this chapter we are sticking to two-way interactions (interactions with two parent variables).
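As a rough illustration of that multiplication, here is a small Python sketch. It assumes 0/1 dummy coding for the categorical parents (the names `woman` and `nh_black` are hypothetical) and uses made-up respondents for the scale example; `interaction` is just an illustrative helper.

```python
def interaction(a, b):
    """Product of two parent-variable values for one respondent."""
    return a * b


# Dummy-coded parents (assumed coding): (woman, nh_black), each 0 or 1.
dummy_rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
dummy_products = [interaction(woman, nh_black) for woman, nh_black in dummy_rows]

# Scale parents: (party code from -3 to 3, income in thousands).
# These four respondents are made up for illustration.
scale_rows = [(-3, 50), (0, 50), (3, 50), (3, 120)]
scale_products = [interaction(party, income_k) for party, income_k in scale_rows]

print(dummy_products)  # only the respondent coded 1 on both parents gets a 1
print(scale_products)
```

With dummy-coded parents, the product is 1 only for the subgroup that is coded 1 on both parents. With scale parents, the product has no meaningful labels of its own, which is one reason you will not see value codes listed for interaction variables.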
SPSS: Run multivariate regressions with interactions the same way you ran multivariate regressions in Chapter 14. This time, just make sure to include the parent variables and the interaction variable as your independent variables. This isolates the effect of the interaction variable so you can see if it makes a difference. If you don't include the parent variables, the interaction will also be showing you some of the main effects of the parent variables, so you won't know if the interaction matters or what its independent effect might be.
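Outside of SPSS, the same point can be sketched in plain Python. The example below is a sketch under stated assumptions: the respondents are synthetic and noise-free, the "true" coefficients are simply placeholders chosen for the demonstration, and `solve` and `ols` are hand-rolled helpers rather than any library's API. It fits one model with both parents plus the interaction, and one model with the interaction alone.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Synthetic respondents: every party code crossed with a few incomes (in $K).
data = [(p, inc) for p in range(-3, 4) for inc in (10, 40, 80, 150)]

# Placeholder "true" model used to generate a noise-free outcome.
TRUE = (0.67, 0.49, -0.004, 0.001)  # intercept, party, income, party*income
y = [TRUE[0] + TRUE[1] * p + TRUE[2] * inc + TRUE[3] * p * inc for p, inc in data]

# Full model: constant, both parents, and their product.
full = ols([[1.0, p, inc, p * inc] for p, inc in data], y)

# Misspecified model: constant and the product only (parents left out).
interaction_only = ols([[1.0, p * inc] for p, inc in data], y)

print("full model slopes:        ", [round(b, 4) for b in full])
print("interaction-only slopes:  ", [round(b, 4) for b in interaction_only])
```

In the full model the interaction slope matches the value used to generate the data. When the parents are left out, the interaction slope comes out several times larger, because it is also soaking up the main effect of party.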
When we test for interactions, we are testing to see if the effect of one independent variable varies among subgroups categorized by a second independent variable. While you can use your SPSS output to figure out if interactions are significant, beyond that, interpreting interactions from your SPSS output is not intuitive. You can't just look at the interaction slopes without also considering the slopes of the main effects, and the coding of multiplied variables is complicated. If you start to calculate predicted values, you can start to observe what the interaction's story is, but the easiest way to figure out its pattern is to graph the interaction. Once you've graphed an interaction, you can figure out its story.
Don't try to tell a story about your parent/main effect or your interaction variable slopes without also including the others simultaneously.
Remember that the overall model significance in the ANOVA output is about whether collectively all of your IVs improve your predictions of your DV. The p-value in the ANOVA indicates whether the overall model is significant, not whether any specific variables, including the interaction, are significant. To figure out if the interaction is significant, you have to go to the coefficients output and look at the p-value on the row with the interaction variable. (And this will only mean there is a significant interaction if you also remembered to include your parent/main effect variables in the regression.)
Using the 2022 General Social Survey (representative of U.S. adults), let's take a look at how political party and income influence perspectives on the need for government action regarding income inequality.
Our dependent variable asks:
Some people think that the government in Washington ought to reduce the income differences between the rich and the poor, perhaps by raising the taxes of wealthy families or by giving income assistance to the poor. Others think that the government should not concern itself with reducing this income difference between the rich and the poor. Here is a card with a scale from 1 to 7. Think of a score of 1 as meaning that the government ought to reduce the income differences between rich and poor, and a score of 7 meaning that the government should not concern itself with reducing income differences. What score between 1 and 7 comes closest to the way you feel?
I re-coded this so that -3 is "the government should not concern itself with reducing income differences," 0 is neutral, and +3 is "the government should reduce income differences."
Political party asks, "Generally speaking, do you usually think of yourself as a Republican, Democrat, Independent, or what?" I re-coded this:
- -3 Strong Republican
- -2 Not very strong Republican
- -1 Independent, close to Republican
- 0 Independent (neither, no response); Other party
- 1 Independent, close to Democrat
- 2 Not very strong Democrat
- 3 Strong Democrat
For income, I used the respondent's estimated individual income, coded in thousands of dollars.
Here is the syntax I used with SPSS. Note how it's the same as for multivariate regression, just with one additional independent variable (the interaction variable):
/DESCRIPTIVES MEAN STDDEV CORR SIG N
/STATISTICS COEFF OUTS CI(95) R ANOVA
/METHOD=ENTER PARTYID RIncomeK PartyIDRIncomeKInteraction.
Here is the coefficients output:
If you were to write a linear regression equation, it would be:
y = 0.67 + 0.49P - 0.004I + 0.001(interaction), where P is party and I is income in thousands of dollars
The interaction is party multiplied by income, so if you wanted to make predictions of the dependent variable based on your predictors, you could also write it: y = 0.67 + 0.49(party) - 0.004(income) + 0.001(party)(income)
Notice that party and income are in the equation twice. That's why you can't interpret one of the slopes or variables without considering the others.
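To make predictions from this equation, you can plug in values for party and income. The Python sketch below does that for a few combinations. The coefficients are the rounded ones from the equation above, so the predictions are approximate, and the function name and the chosen income levels are illustrative assumptions.

```python
def predict_support(party, income_k):
    """Predicted support for government action (-3 to +3 scale).

    party: -3 (strong Republican) to 3 (strong Democrat)
    income_k: individual income in thousands of dollars
    """
    return 0.67 + 0.49 * party - 0.004 * income_k + 0.001 * party * income_k


# Illustrative combinations: three party codes at three income levels.
parties = {"Strong Republican": -3, "Independent": 0, "Strong Democrat": 3}
incomes = (0, 100, 200)

predictions = {
    (label, income): round(predict_support(code, income), 2)
    for label, code in parties.items()
    for income in incomes
}

for (label, income), value in predictions.items():
    print(f"{label:>17} at ${income}K: {value:+.2f}")
```

Because party and income each appear in two terms, changing either one moves the prediction through both its own slope and the interaction term.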
Here's a summary display table with bivariate models, a multivariate model with both IVs, and a multivariate model that also includes the interaction. The interaction just gets its own row as its own independent variable.
Here is a graph of the multivariate additive model (just the two IVs without the interaction). Notice how the lines are all parallel. The effect of income is the same across party, and the differences for party are the same across income levels:
Here's a graph of the interaction model. What do you notice? What's the story?
Democrats are supportive of the government tackling income inequality. Independents start out slightly supportive and by high income levels are neutral to a smidgen unsupportive. Republicans are not supportive, but low-income Republicans are only slightly unsupportive whereas high-income Republicans are very unsupportive.
You can see that, as income increases, support for government reducing income inequality decreases among independents and among Republicans. However, among strong Democrats, support does not change much regardless of income.
The template for making the interaction graph looks pretty similar to the last one (though the embedded formulas are a little different). There is just one more row, for the slope of your interaction variable.
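One way to check this story against the equation is to work out the slope of income separately for each party group. Collecting the income terms in y = 0.67 + 0.49(party) - 0.004(income) + 0.001(party)(income) gives an income slope of -0.004 + 0.001(party). The Python sketch below computes it using the rounded coefficients, so treat the results as approximate; the function name is an illustrative assumption.

```python
INCOME_SLOPE = -0.004       # slope on income (per $1K)
INTERACTION_SLOPE = 0.001   # slope on party * income


def income_slope_for(party):
    """Change in predicted support per extra $1K of income, for one party code."""
    return INCOME_SLOPE + INTERACTION_SLOPE * party


slopes = {
    "Strong Republican": round(income_slope_for(-3), 3),
    "Independent": round(income_slope_for(0), 3),
    "Strong Democrat": round(income_slope_for(3), 3),
}

for label, slope in slopes.items():
    # Multiply by 100 to express the change per extra $100K of income.
    print(f"{label:>17}: {slope:+.3f} per $1K ({slope * 100:+.1f} per $100K)")
```

The line for strong Democrats is nearly flat, while the line for strong Republicans is the steepest, which matches what the graph shows.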
Interactions can have varying effects. If you graph two lines, you might see:
- Two parallel lines (no interaction effect)
- Two lines that both have positive (or negative) slopes, but one line has a steeper slope than another.
- One line with a negative slope and one line with a positive slope.
- One line has a positive or negative slope (there is a relationship with the DV), but the other line is horizontal (with a slope of zero, no relationship).
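The four patterns above can be sketched as a small Python helper that labels a pair of line slopes. This is only a descriptive aid: the function name and the tolerance `tol` are assumptions made for illustration, and in practice whether two slopes really differ is judged by the significance of the interaction term, not by a fixed cutoff.

```python
def describe_pattern(slope_a, slope_b, tol=1e-9):
    """Label the visual pattern made by two regression lines."""
    if abs(slope_a - slope_b) <= tol:
        return "parallel lines (no interaction effect)"
    if abs(slope_a) <= tol or abs(slope_b) <= tol:
        return "one sloped line and one horizontal line"
    if slope_a * slope_b < 0:
        return "one positive slope and one negative slope"
    return "same direction, but one line is steeper"


# Illustrative slope pairs, one for each pattern in the list.
examples = [(0.5, 0.5), (-0.007, -0.001), (0.3, -0.2), (0.4, 0.0)]
labels = [describe_pattern(a, b) for a, b in examples]

for (a, b), label in zip(examples, labels):
    print(f"slopes {a:+} and {b:+}: {label}")
```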
Last chapter we looked at U.S. eligible voters' evaluations of Trump's handling of immigration based on their policy positions on immigration and their feelings towards Trump, using ANES 2020 pre-election data.
Here it is again, this time with an interaction between policy positions on immigration and feelings towards Trump.
If you look at the coefficients output, you can see that the interaction is not significant (or is marginally significant, p=0.075, so <0.1 but not <0.05).
We are not confident that there is a significant interaction among all U.S. eligible voters. We know that both predictors matter, each while controlling for the other, but we are not confident that their intersection matters. In other words, the main effects do not appear to differ among subgroups: on average, among U.S. eligible voters leading up to the 2020 election, immigration policy views have about the same effect among people with different feelings towards Trump, and feelings towards Trump have about the same effect among people with different policy positions. We don't need to consider the interaction for this model and could just use the multivariate model from Chapter 14 without losing explanatory/predictive power.
Here's the graph for respondents. You can see that the lines are almost parallel. The graph looks very similar to the graph from Chapter 14 when there was no interaction term included.