8.4: Background: The General Approach


    The general artifact detection/rejection procedure is pretty straightforward. For each participant, you apply an artifact detection algorithm to the epoched EEG data. That algorithm determines which epochs contain artifacts, and those epochs are “marked” or “flagged”. When you compute averaged ERPs, those epochs are simply excluded from the averages.
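
    To make the flag-then-exclude bookkeeping concrete, here is a minimal sketch in Python using plain NumPy. The simulated data, the array shapes, and the 100 µV absolute-voltage threshold are placeholders rather than recommendations; real toolboxes (e.g., ERPLAB or MNE-Python) supply the detection algorithms for you, but the logic is the same: mark the bad epochs, then leave them out of the average.

```python
import numpy as np

# Hypothetical epoched data: (n_epochs, n_channels, n_samples), in microvolts.
rng = np.random.default_rng(0)
epochs = rng.normal(0, 10, size=(200, 32, 256))

# Flag any epoch whose absolute voltage exceeds the threshold on any channel.
threshold_uv = 100.0
flagged = np.any(np.abs(epochs) > threshold_uv, axis=(1, 2))  # (n_epochs,)

# Flagged epochs are not deleted; they are simply excluded from the average.
erp = epochs[~flagged].mean(axis=0)  # (n_channels, n_samples)
print(f"Rejected {flagged.sum()} of {len(epochs)} epochs")
```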

    There are two main classes of approaches to determining which epochs should be flagged. The approach I prefer involves knowing as much as possible about the nature of the artifacts (e.g., the typical waveshape and scalp distribution of a blink) and designing algorithms that are tailored to those artifacts. The other approach involves asking which epochs are extreme or unusual in a statistical sense. I don’t like this statistical approach as much because it’s not clear that “weird” epochs are necessarily problematic. How many movies have you seen about high school students in which the “popular” students rejected kids who seemed “weird” but were actually quite delightful? I just don’t like the idea of rejecting trials that seem “weird” but might actually be delightful.
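
    If it helps to see the contrast in code, here is a rough sketch of both approaches, assuming the same kind of NumPy epochs array as above. The moving-window peak-to-peak measure on a frontal channel is tailored to the size and shape of a blink, whereas the variance z-score simply flags epochs that are statistically "weird," whatever the cause. The channel index and both cutoffs are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
epochs = rng.normal(0, 10, size=(200, 32, 256))  # hypothetical data in microvolts
veog = epochs[:, 0, :]                           # pretend channel 0 is a frontal/VEOG site

# Approach 1: artifact-specific test -- moving-window peak-to-peak amplitude
# on the frontal channel, designed around the waveshape of a blink.
def moving_peak_to_peak(signal, win=100, step=25):
    starts = range(0, signal.shape[-1] - win + 1, step)
    windows = np.stack([signal[..., s:s + win] for s in starts], axis=-2)
    return (windows.max(axis=-1) - windows.min(axis=-1)).max(axis=-1)

blink_flagged = moving_peak_to_peak(veog) > 100.0  # illustrative 100 µV cutoff

# Approach 2: statistical test -- flag epochs whose overall variance is an
# outlier relative to the other epochs, regardless of what caused it.
epoch_var = epochs.var(axis=(1, 2))
z = (epoch_var - epoch_var.mean()) / epoch_var.std()
weird_flagged = np.abs(z) > 3.0

print(blink_flagged.sum(), weird_flagged.sum(), "epochs flagged by each approach")
```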

    I suspect that the two approaches actually end up flagging mostly the same epochs for rejection, so it may not matter much which one you use in the end. The most important thing when deciding which approach to take is to have a clear understanding of the ultimate goal of artifact rejection. As described in the previous chapter, that goal is to accurately answer the scientific question the experiment was designed to address. So, go ahead and use statistical approaches to flagging epochs for rejection if they lead you to this goal. Also, every area of research is different, so you should feel free to ignore any of my specific pieces of advice if you have a better way of accurately answering your scientific questions.

    As described in detail in Chapter 6 of Luck (2014), I advocate setting the artifact detection parameters individually for each participant. In the present chapter, I will show you how to select appropriate parameters manually. There are also completely automated approaches to selecting the parameters (e.g., Jas et al., 2017; Nolan et al., 2010). I haven’t used those approaches myself, but they seem fairly reasonable. However, many people who use these approaches on a regular basis recommend verifying that the parameters are working well and not just accepting them blindly. So, these approaches end up not being fully automatic. An ERP Boot Camp participant, Charisse Pickron, suggested another excellent use for the automated algorithms: When you’re first learning to set artifact detection parameters, you can check your parameters against the automated parameters so that you have more confidence in the parameters that you’ve set.
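
    For example, if your data are in MNE-Python, the autoreject package (Jas et al., 2017) can estimate a data-driven rejection threshold that you can compare against the threshold you set by hand, in the spirit of Charisse Pickron's suggestion. The sketch below assumes you already have an mne.Epochs file for one participant; the file name and the 100 µV manual threshold are made up for illustration.

```python
import mne
from autoreject import get_rejection_threshold  # Jas et al., 2017

# Hypothetical epoch file for one participant.
epochs = mne.read_epochs("sub-01-epo.fif")

# Threshold you chose by hand for this participant (in volts; 100 µV here).
manual_reject = dict(eeg=100e-6)

# Data-driven threshold estimated by autoreject, used here only as a cross-check.
auto_reject = get_rejection_threshold(epochs)
print("manual:", manual_reject, "automated:", auto_reject)

# Apply the manual threshold and see how many epochs survive.
kept = epochs.copy().drop_bad(reject=manual_reject)
print(f"{len(kept)} of {len(epochs)} epochs retained")
```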

    Some participants have so many artifacts that an insufficient number of trials remains to create clean averaged ERP waveforms. The standard procedure is to exclude those participants from the final analyses. However, you must have an objective, a priori criterion for exclusion. Otherwise, you will likely bias your results (as explained in the text box below). In my lab’s basic science research, we always exclude participants if more than 25% of trials are rejected because of artifacts (aggregated across conditions). In our research on schizophrenia, where the data are noisier and the participants are much more difficult and expensive to recruit, we exclude participants if more than 50% of trials are rejected. We apply these criteria rigidly in every study, without fail. A different criterion might make sense in your research. Just make sure that the criterion is objective and determined before you see the data.
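
    Applying such a criterion in code is trivial; the hard part is committing to it in advance. Here is a sketch using a made-up set of artifact flags and the 25% cutoff described above.

```python
import numpy as np

# Boolean artifact flags for one participant, aggregated across all conditions
# (made-up numbers; in practice these come from your artifact detection step).
flagged = np.array([True] * 60 + [False] * 140)

rejection_rate = flagged.mean()

# A priori criterion, fixed before seeing any data (25%, as in the text above).
MAX_REJECTION_RATE = 0.25
if rejection_rate > MAX_REJECTION_RATE:
    print(f"Exclude participant: {rejection_rate:.0%} of trials rejected")
else:
    print(f"Keep participant: {rejection_rate:.0%} of trials rejected")
```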

    Although this chapter focuses on detecting and rejecting artifacts, I would like to encourage you to start thinking about artifacts before you record the EEG. This advice follows from something I call Hansen’s Axiom: “There is no substitute for clean data” (see Luck, 2014). It’s much better to minimize artifacts during the recording instead of trying to reject or correct them afterward. Strategies for minimizing artifacts are described in Chapter 6 of Luck (2014).

    Excluding Participants is Dangerous!

    Imagine that you run an experiment, and your key statistical analysis yields a p value of .06 (the most hated number in science!). You spent two years running the study, and the effect is going in the predicted direction, but you know you can’t publish it if the effect isn’t statistically significant. Given the millions of steps involved in an ERP experiment, you might go back through your data to make sure there wasn’t an error in the analysis. And imagine you find that 80% of the trials were rejected for one of the participants, leading to incredibly noisy data. You would (very reasonably) conclude that this participant should not have been included in the final analyses. So, you repeat the analyses without this participant, and now the p value is .03. Hallelujah! You can now publish this important study.

    Now imagine that the p value was originally .03, so you have no reason to go back through all the data. And imagine that your final sample included a participant with 80% of trials rejected and very noisy data. And further imagine that excluding this participant would lead to a p value of .06. But because the effect was significant in the initial analysis, you had no reason to go back through the data, so you wouldn’t notice that this participant should have been excluded. And even if you did, would you really have the intestinal fortitude to exclude the participant, even though this means that your p value is now .06?

    This example shows why you need an a priori criterion for excluding participants. If you decide whom to exclude after you've seen the results of the experiment, you're more likely to notice and exclude participants when doing so makes your p value better (because you went looking only when p was > .05) than when it makes your p value worse (because you never go looking for problem participants when p < .05). This asymmetry creates a bias toward finding p < .05 even when there is no true effect. So, you should develop an a priori criterion for excluding participants before you see the results.
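
    If you want to convince yourself of this, here is a small simulation sketch under deliberately simplified assumptions: null data (no true effect), one extra-noisy participant per experiment, and a researcher who goes looking for (and excludes) that participant only when the first-pass p value is not significant. The resulting false positive rate comes out above the nominal .05.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_experiments, n_subjects = 10_000, 24
false_positives = 0

for _ in range(n_experiments):
    # Null data: no true effect; the last participant is extra noisy.
    effects = rng.normal(0, 1, n_subjects)
    effects[-1] *= 5

    p_all = stats.ttest_1samp(effects, 0).pvalue
    if p_all < .05:
        # Significant on the first pass: nobody goes looking for noisy data.
        final_p = p_all
    else:
        # Not significant: the noisy participant is "discovered" and excluded.
        final_p = stats.ttest_1samp(effects[:-1], 0).pvalue

    false_positives += int(final_p < .05)

print(f"False positive rate: {false_positives / n_experiments:.3f} (nominal .05)")
```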

     


