3.4: Exercise- “Bad” Data
So far, we’ve been looking at really clean data. However, the reality of ERP research (and most areas of human neuroscience) is that you often get some participants with really noisy data. And in some areas, noisy data are the norm, and large numbers of participants are needed to make up for it. For example, imagine trying to record the EEG from wiggly 2-year-olds. You’d get all kinds of movement artifacts, and they won’t sit through an hour of data collection the way a paid adult will. But you’ll also see some noisy data in studies of calm, compliant adults. So, no matter what kind of ERP research you’re interested in, you’ll probably need to learn to deal with noisy data.
In this exercise, we’ll look at one of the 40 participants in the full N400 study whose data were problematic (Subject 30). This participant wasn’t horrible—all of our participants were college students who were pretty compliant with our instructions, and we know a lot of tricks for optimizing the data quality in EEG recordings (see Farrens et al., 2019 for a detailed description of our EEG recording protocol). However, the data from this participant were problematic in a way that we often see in our college student population.
You can find this participant’s data in the folder named Bad_Subject inside the Chapter_3 folder. I’ve already preprocessed the EEG and made the averaged ERPs, so you don’t need to go through those steps. The folder contains the original EEG dataset file, the EEG dataset file after all preprocessing steps (including artifact detection), and the averaged ERPset file.
Start by loading the averaged ERP data from this participant (EEGLAB > ERPLAB > Load existing ERPset) and plotting Bins 3 and 4 (EEGLAB > ERPLAB > Plot ERP > Plot ERP waveforms). You should see a very noisy waveform for the related target words, but the waveform for the unrelated target words is missing. If you look at the aSME data quality metric (EEGLAB > ERPLAB > Data Quality options > Show Data Quality measures in a table), you’ll find an aSME value of 0 for every channel at every time point for Bin 3 (related targets), and a value of NaN for Bin 4 (unrelated targets). NaN is an abbreviation for not a number, and it’s what Matlab uses when something can’t be computed (e.g., when computing it would require dividing by zero).
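To see why one trial yields an aSME of exactly 0 and zero trials yield NaN, it helps to remember that the aSME is a bootstrapped standard error: the variability of the bin average across resampled sets of trials. ERPLAB's actual computation is done in Matlab and is more elaborate, but the following toy Python sketch (the function name and details are my own, purely for illustration) shows the logic:

```python
import math
import random

def asme_like(trial_values, n_boot=1000, seed=0):
    """Toy bootstrapped SME: the SD of the mean across bootstrap
    resamples of single-trial values (one channel, one time point)."""
    n = len(trial_values)
    if n == 0:
        return float("nan")  # nothing to resample -> not a number
    rng = random.Random(seed)
    boot_means = []
    for _ in range(n_boot):
        # resample n trials with replacement and average them
        sample = [rng.choice(trial_values) for _ in range(n)]
        boot_means.append(sum(sample) / n)
    grand = sum(boot_means) / n_boot
    return math.sqrt(sum((m - grand) ** 2 for m in boot_means) / n_boot)

print(asme_like([5.0]))                  # one trial: every resample is identical -> 0.0
print(asme_like([]))                     # zero trials -> nan
print(asme_like([2.0, 4.0, 6.0, 8.0]))  # several trials -> some positive value
```

With a single trial, every bootstrap resample is that same trial, so the bootstrapped means don't vary at all and the result is 0; with no trials, there is nothing to resample and the result is NaN. That's exactly the pattern you see for Bins 3 and 4.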
Now plot the ERP waveforms for the prime words (Bins 1 and 2) and look at the aSME values for these words. The waveforms are noisy, and the aSME values are higher than those for the 10 participants you processed in the previous exercises. But at least it looks like these bins contain valid data.
Your job now is to figure out what has gone wrong with Bins 3 and 4 for this participant. In Chapter 2, I made a point of describing several checks that you should perform while processing a participant’s data (see the summary of steps in Section 2.12). Section 3.4 of the present chapter describes some additional checks. Go through these checks to figure out what went wrong with this participant. Once you’ve done that, you can read the text box below to make sure your answer was correct (but no peeking until you’ve figured it out for yourself!).
I hope you’ve now figured out the problem with Subject 30. I included this example to drive home a point that I made in Chapter 2, namely that you really need to pay close attention when you’re initially processing each participant’s data. Don’t just run a script and hope for the best. Look at the number of event codes, the number of accepted and rejected trials, the continuous EEG, and the epochs that were marked by the artifact detection process. If you don’t, your data will be filled with C.R.A.P. (which is an acronym for Commonly Recorded Artifactual Potentials , but also refers to a variety of other problems, such as incorrect event codes). And as they say: garbage in, garbage out . So, if you want your experiments to yield robust, statistically significant, and accurate results, pay close attention to the data!
What’s wrong with Subject 30?
If you load the ERPset for Subject 30 and look at ERP.ntrials, you’ll see that there was only one accepted trial in Bin 3 and there were zero accepted trials in Bin 4. And if you load one of the EEG dataset files and look at the EEG, you’ll see that this participant blinked a lot. In particular, the participant blinked right around the time of the buttonpress response (event code 201) on almost every trial. As a result, the ERP waveform for Bin 3 was based on an “average” of only one trial, and the aSME value was zero. Bin 4 had no trials, so no ERP waveform could be plotted for that bin, and the aSME value was not a number (NaN). Well over half the trials were also rejected in Bins 1 and 2, and the data were just generally noisy for this participant. That’s why the aSME values were bad even for Bins 1 and 2.
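The averaging step makes the problem concrete: the averaged waveform for a bin is computed only from the trials that survived artifact detection, so a bin with one surviving trial produces an "average" that is just that single noisy trial, and a bin with zero surviving trials produces no waveform at all. Here is a minimal Python sketch of that logic (this is my own illustration, not ERPLAB's Matlab code):

```python
def bin_average(epochs, rejected):
    """Average the epochs that survived artifact detection.
    epochs: list of trials, each a list of voltage samples.
    rejected: parallel list of True/False artifact flags."""
    kept = [e for e, r in zip(epochs, rejected) if not r]
    if not kept:                 # zero accepted trials (like Bin 4)
        return None, 0           # no waveform can be computed
    n = len(kept)
    # average across trials at each time point
    avg = [sum(samples) / n for samples in zip(*kept)]
    return avg, n

# Three trials, two rejected: the "average" is just the surviving trial
epochs = [[1.0, 2.0], [50.0, 60.0], [3.0, 4.0]]
avg, n = bin_average(epochs, [True, False, True])
print(n, avg)    # 1 [50.0, 60.0]

# All trials rejected: nothing to average (like Bin 4)
avg, n = bin_average(epochs, [True, True, True])
print(n, avg)    # 0 None
```

This is why checking ERP.ntrials matters: the waveform for a 1-trial bin looks superficially like any other waveform, but it has none of the noise reduction that averaging is supposed to provide.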
When you loaded the ERPset for Subject 30 into ERPLAB, the fact that there were no trials in Bin 4 led to a warning message printed in red text in the command window (WARNING: bin #4 has flatlined ERPs). You probably didn’t notice it at the time, because subsequent output pushed it off the screen, but you can still see it if you scroll up. When you run into a problem (like a bin that doesn’t appear to plot properly), you should look at the command window (scrolling up if necessary) to see whether any warning or error messages were printed. That can help you find problems like this.
So, what can we do about this subject? In the published version of the N400 ERP CORE experiment, we used artifact correction instead of artifact rejection to deal with blinks. That is, we used a procedure called independent component analysis to estimate and remove the part of the signal that was caused by blinking. We rejected trials with blinks only if the blinks happened near time zero, indicating that the eyes were closed when the word was presented (which was rare). Consequently, we were able to include almost all the trials from every participant in our averaged ERP waveforms.
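The core idea behind correction rather than rejection is that a blink adds a known waveform to each channel, scaled by how strongly the blink projects to that channel, so its contribution can be subtracted out while keeping the trial. The sketch below illustrates only that subtraction step in Python, with the blink time course and projection weights given directly; the hard part of real independent component analysis, which is not shown here, is estimating those quantities from the data:

```python
def remove_blink(data, blink, weights):
    """Subtract the blink component's contribution from each channel.
    data[ch][t]: recorded EEG; blink[t]: blink time course;
    weights[ch]: how strongly the blink projects to each channel.
    (In real ICA these are estimated from the data; here they are
    simply given, to keep the sketch short.)"""
    return [[data[ch][t] - weights[ch] * blink[t]
             for t in range(len(blink))]
            for ch in range(len(data))]

neural = [1.0, 2.0, 3.0]      # the signal we want to recover
blink = [0.0, 10.0, 0.0]      # blink time course
weights = [1.0, 0.5]          # blink projection onto 2 channels
recorded = [[neural[t] + w * blink[t] for t in range(3)] for w in weights]

cleaned = remove_blink(recorded, blink, weights)
print(cleaned)   # both channels recover the neural signal [1.0, 2.0, 3.0]
```

Because the trial is cleaned rather than discarded, nearly all trials can contribute to the averaged waveforms, which is why the published ERP CORE analysis could keep almost every trial from every participant.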