
11.16: Exercise- Building an Entire EEG Processing Pipeline


    You’ve now learned the basics of scripting, so we’re ready to build an entire EEG preprocessing pipeline. This pipeline will execute all the steps prior to averaging. It closely matches the example pipeline described in Appendix 3. Here, we’ve divided the pipeline into five separate scripts:

    • Step1_pre_ICA_processing.m - Performs the initial preprocessing steps prior to ICA-based artifact correction
    • Step2_ICA_phase1.m - Creates an optimized dataset for the ICA decomposition
    • Step3_ICA_phase2.m - Performs the ICA decomposition
    • Step4_ICA_phase3.m - Transfers the ICA weights from the optimized dataset to the original dataset and reconstructs the data without the artifactual ICs
    • Step5_post_ICA_processing.m - Performs the steps following artifact correction that are necessary prior to averaging

    I encourage you to divide your EEG and ERP processing into multiple scripts in this manner. A general principle of good programming is to divide a complex job into a set of small, independent modules. This makes each part of the job simpler and less error-prone. It also makes it easier for you to find and fix problems. And it makes it easier for you to reuse your code for future experiments. Ideally, none of your scripts should be more than about 200 lines long (including comments). If you find a script is getting a little long, try to figure out how to break it up into a sequence of smaller scripts.

    Step1_pre_ICA_processing.m is similar to Script5.m from the previous exercise, but with two major changes. First, interpolation has been removed from this script and moved to a later stage, following artifact correction. Second, the new script references the data to O2 rather than to the average of all sites. This is because using the average of all sites as a reference makes ICA complicated (see the chapter on artifact correction). After the ICA step, we’ll re-reference to the average of all sites. By the way, I could have used any site as the reference at this initial stage. See the following text box for more information.
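
    Here's a minimal sketch of what that referencing step might look like, assuming the dataset contains a channel labeled O2 and using EEGLAB's pop_reref routine (the actual script may implement this step with ERPLAB's Channel Operations instead):

        % Sketch: reference the continuous data to O2 prior to ICA
        % Assumes the dataset is loaded into EEG and contains a channel labeled 'O2'
        O2_index = find(strcmp({EEG.chanlocs.labels}, 'O2'));   % locate the O2 channel
        EEG = pop_reref(EEG, O2_index, 'keepref', 'on');        % single-site reference, keeping O2 in the data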

    Recognizing a Conceptual Error

    When I first started writing these scripts, I referenced the data to the average of all sites at the beginning of the pipeline (because this is the standard reference for the N170). I had gotten through the stage of performing the ICA decomposition, and I started going through the data to determine which ICs should be removed (using the process described in the chapter on artifact correction). The ICs from the first participant looked okay but not great. The time course of one of the top ICs had a lot of weird high-frequency noise that I wasn’t seeing in the EEG. The ICs from the second participant were even worse, with the top two ICs showing lots of high-frequency noise, and the eye movements distributed across three ICs. I was starting to get suspicious. When I looked at the ICs for the third participant, the blinks were spread across the top four ICs, and two of them again had a ton of weird high-frequency noise.

    I then asked myself what I was doing differently from before, and then I realized that I was now referencing to the average of all sites prior to the decomposition. I then changed the scripts to use O2 as the reference, re-ran the ICA decomposition, and then everything worked better.

    The moral of the story is that you may get occasional participants for whom the ICs don’t look great, but if you see more than one or two, you need to think through your process and figure out what’s going wrong. The reference is one possible problem. Another common problem is an insufficient recording duration (especially if you have >64 channels). A third common problem is huge C.R.A.P. that hasn’t been eliminated prior to the decomposition.

    Step2_ICA_phase1.m implements the procedures described in Chapter 9 for creating datasets that are optimized for the ICA decomposition, including downsampling to 100 Hz, eliminating breaks and other periods of huge C.R.A.P., and implementing an aggressive high-pass filter. It’s similar to one of the example scripts at the end of Chapter 9 (MMN_artifact_correction_phase3.m). Open Step2_ICA_phase1.m and take a look at it.
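
    The core "optimize for ICA" steps look something like the following sketch. The 100 Hz sampling rate comes from the description above; the 1 Hz half-amplitude cutoff and filter order are illustrative assumptions, so check the actual script for the exact values:

        % Sketch of the core optimization steps: downsample, then high-pass filter aggressively
        % The 1 Hz cutoff and filter order are illustrative assumptions
        EEG = pop_resample(EEG, 100);                                 % downsample to 100 Hz
        EEG = pop_basicfilter(EEG, 1:EEG.nbchan, 'Filter', 'highpass', ...
              'Design', 'butter', 'Cutoff', 1, 'Order', 4, 'RemoveDC', 'on');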

    One very important element of this script is that it assumes you’ve already gone through the EEG to determine the parameters that you will use for finding huge C.R.A.P. with the Artifact rejection (continuous EEG) routine and stored these parameters in a spreadsheet named ICA_Continuous_AR.xlsx. I’ve already done this for you. To determine these parameters, I first commented out the part of the script that performs the continuous artifact rejection, and then I ran the code at the end of the script for loading the datasets into the EEGLAB GUI. These datasets have been downsampled and aggressively filtered, and I wanted to see what the artifacts looked like in these datasets because they will be used for the continuous artifact rejection. Two of the subjects had some large C.R.A.P., and I ran the Artifact rejection (continuous EEG) routine from the GUI for these subjects to figure out the best rejection criteria. As discussed in the chapter on artifact correction, you really need to look carefully at the data when setting these parameters if you want ICA to work well. The spreadsheet also indicates which channels to include. I’ve left out the EOG channels and Fp1/Fp2 so that ordinary ocular artifacts don’t get deleted. I’ve also left out any channels that will be left out of the ICA decomposition and interpolated later. Crazy periods in these channels won’t influence the decomposition, so we don’t need to delete them.

    If you look at the code for reading these parameters from the spreadsheet, you’ll see that it’s much like the code for reading the parameters for interpolating bad channels in the previous exercise, except that we have different column labels in the spreadsheet.
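
    In case it helps to see the pattern in one place, here's a rough sketch of reading per-participant parameters with readtable and passing them to ERPLAB's continuous artifact rejection routine. The column names (SubID, Threshold, WindowSize, StepSize, Channels) and the exact pop_continuousartdet parameter names are assumptions, so the actual script and spreadsheet may differ:

        % Sketch: read this participant's continuous AR parameters from the spreadsheet
        % Column names are assumptions; SUB{i} is the current subject ID
        AR_params = readtable('ICA_Continuous_AR.xlsx');
        row       = find(strcmp(AR_params.SubID, SUB{i}));
        threshold = AR_params.Threshold(row);            % rejection threshold in microvolts
        win_ms    = AR_params.WindowSize(row);           % moving window width (ms)
        step_ms   = AR_params.StepSize(row);             % moving window step (ms)
        chans     = str2num(AR_params.Channels{row});    % channel list stored as text, e.g. '1:26 30:31'

        % Delete segments of huge C.R.A.P. (parameter names may differ across ERPLAB versions)
        EEG = pop_continuousartdet(EEG, 'ampth', threshold, 'winms', win_ms, ...
              'stepms', step_ms, 'chanArray', chans, 'review', 'off');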

    You’ll also see that the script calls the pop_erplabDeleteTimeSegments routine to delete the periods of time during the breaks, which also helps get rid of large C.R.A.P. Note that the parameters that control this routine are defined as variables at the top of the script, following good programming practice.
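
    That part of the script follows a pattern roughly like this sketch. The buffer values are made up, and the parameter names follow one version of ERPLAB's pop_erplabDeleteTimeSegments, so check the actual script for the exact call:

        % Sketch: break-deletion parameters defined near the top of the script
        time_threshold_ms = 2000;   % delete event-free periods longer than this
        before_buffer_ms  = 500;    % data to keep before the next event code
        after_buffer_ms   = 500;    % data to keep after the previous event code

        % ...and used later inside the subject loop (parameter names may differ by ERPLAB version)
        EEG = pop_erplabDeleteTimeSegments(EEG, ...
              'timeThresholdMS',         time_threshold_ms, ...
              'beforeEventcodeBufferMS', before_buffer_ms, ...
              'afterEventcodeBufferMS',  after_buffer_ms, ...
              'displayEEG',              false);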

    Once you’ve looked through the script to see how it works, go ahead and run it. You’ll see that it creates a new dataset file for each participant with _optimized at the end of the filename. You can also load the new datasets into the EEGLAB GUI by running the bit of code at the end of the script. This allows you to see what the optimized datasets look like and make sure everything worked properly.
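
    That GUI-loading code typically follows a pattern like this sketch, assuming SUB is a cell array of subject IDs, DIR is the data folder, and the filenames end in _optimized.set (the exact filename stem is an assumption):

        % Sketch: load the optimized datasets into the EEGLAB GUI for inspection
        [ALLEEG, EEG, CURRENTSET] = eeglab;   % launch (or reset) EEGLAB
        for i = 1:length(SUB)
            EEG = pop_loadset('filename', [SUB{i} '_optimized.set'], ...
                  'filepath', [DIR filesep SUB{i}]);
            [ALLEEG, EEG, CURRENTSET] = eeg_store(ALLEEG, EEG, 0);
        end
        eeglab redraw;   % refresh the GUI so the datasets appear in the Datasets menu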

    Step3_ICA_phase2.m runs the ICA decomposition process on the dataset created by Step2_ICA_phase1.m. It assumes that an Excel spreadsheet named interpolate.xlsx has already been created to indicate which channels will be interpolated after correction has been performed and should therefore be excluded from the ICA decomposition. I’ve already created this file for you. Note that the decomposition process is quite slow, so this script takes a long time to run.
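
    The key step is to exclude the to-be-interpolated channels from the decomposition and then call pop_runica, roughly as in this sketch (the spreadsheet column names are assumptions):

        % Sketch: exclude channels that will be interpolated later, then run ICA
        interp_table = readtable('interpolate.xlsx');
        row          = find(strcmp(interp_table.SubID, SUB{i}));
        bad_chans    = str2num(interp_table.BadChannels{row});   % column names are assumptions
        good_chans   = setdiff(1:EEG.nbchan, bad_chans);         % channels to include in ICA

        EEG = pop_runica(EEG, 'icatype', 'runica', 'extended', 1, ...
              'chanind', good_chans);   % the slow part: the actual decomposition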

    Step4_ICA_phase3.m takes the ICA weights in the optimized dataset and transfers them back to the pre-optimization dataset. It then removes the artifactual ICs, which are listed in a file named ICs_to_Remove.xlsx. I had to determine which ICs were artifactual by looking at the IC scalp maps and by comparing the IC time courses with the EEG/EOG time courses (as described in the chapter on artifact correction). To make this easier, I used the bit of code at the end of Step3_ICA_phase2.m to load the datasets into the EEGLAB GUI.
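
    Transferring the weights amounts to copying a few fields from the optimized dataset to the original dataset and then removing the listed ICs, roughly as in this sketch (the EEG_opt variable and the spreadsheet column names are assumptions):

        % Sketch: copy the decomposition from the optimized dataset (EEG_opt)
        % to the pre-optimization dataset (EEG)
        EEG.icaweights  = EEG_opt.icaweights;
        EEG.icasphere   = EEG_opt.icasphere;
        EEG.icachansind = EEG_opt.icachansind;
        EEG = eeg_checkset(EEG, 'ica');        % recompute the inverse weights, etc.

        % Remove the artifactual ICs listed in ICs_to_Remove.xlsx (column names are assumptions)
        IC_table = readtable('ICs_to_Remove.xlsx');
        row      = find(strcmp(IC_table.SubID, SUB{i}));
        bad_ICs  = str2num(IC_table.ICs{row});   % IC list stored as text, e.g. '1 3'
        EEG = pop_subcomp(EEG, bad_ICs, 0);      % reconstruct the data without those ICs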

    I didn’t spend a lot of time making careful decisions about which ICs to remove (and making sure that the ICA decomposition was truly optimal). For example, Subject 1 still has some blink activity remaining in F4 after the correction. It’s really boring to spend many hours in a row getting the ICA perfect for a large set of participants! This is one more reason why you should do the initial preprocessing of each participant within 48 hours of data collection. It’s a lot easier to spend the time required to optimize the ICA when you’re only doing it for one participant at a time and don’t have to spend an entire day processing the data from 20 participants.

    The last script is Step5_post_ICA_processing.m, which performs the steps following artifact correction that must be executed prior to averaging. This includes re-referencing to the average of all sites (and putting the channels into a more useful order), performing interpolation for any bad channels, adding an EventList, assigning events to bins with BINLISTER, epoching the data, and performing artifact detection. The script also prints a summary of the overall proportion of trials marked for rejection in each participant and creates an Excel file with this information broken down by bin.

    Open the script and take a look. You’ll see that the first part of the script (prior to the loop) defines and opens a set of files with the artifact detection parameters. We’ve already corrected for blinks, so we only want to flag epochs with blinks that occurred at a time that might interfere with the perception of the stimulus. Eye movements aren’t typically an issue in this paradigm because the stimuli are presented briefly in the center of the display, but we flag trials with eye movements that might interfere with the perception of the stimulus (which were quite rare). We use the uncorrected bipolar channels for the blink and eye movement detection. We also flag trials with large C.R.A.P. in any of the EEG channels (using both an absolute voltage threshold and a moving window peak-to-peak amplitude algorithm). Each of these artifact detection routines uses a different flag so that we can keep track of the number of trials flagged for each type of artifact.

    The top portion of the script also pre-allocates a set of arrays that will be used to store the number of accepted and rejected trials for each participant. This pre-allocation isn’t strictly necessary, but it’s good programming practice—it makes it clear what the dimensions of the arrays are.

    The body of the subject loop loads the dataset created by the previous script, interpolates any bad channels, re-references the data, creates the EventList, runs BINLISTER, and epochs the data. These steps are pretty straightforward.
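
    In outline, those steps look something like this sketch; the bin descriptor filename, epoch window, and baseline period are assumptions, so check the actual script for the real values:

        % Sketch of the straightforward steps in the Step 5 subject loop
        EEG = pop_interp(EEG, bad_chans, 'spherical');        % interpolate bad channels (from interpolate.xlsx)
        EEG = pop_reref(EEG, []);                             % re-reference to the average of all sites
        EEG = pop_creabasiceventlist(EEG, 'AlphanumericCleaning', 'on', ...
              'BoundaryNumeric', {-99}, 'BoundaryString', {'boundary'});
        EEG = pop_binlister(EEG, 'BDF', 'N170_bins.txt', ...  % bin descriptor filename is an assumption
              'IndexEL', 1, 'SendEL2', 'EEG');
        EEG = pop_epochbin(EEG, [-200 800], 'pre');           % epoch and baseline-correct (window is an assumption)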

    The next set of lines reads the parameters for blink detection from a spreadsheet and then runs the step function algorithm to find blinks that occurred between -200 and +300 ms. These lines use the same “magic code” that I used to grab parameters from spreadsheets in the previous scripts. I didn’t spend a lot of time setting the artifact detection parameters—you could do a better job if you spent some time using the strategies described in the chapter on artifact rejection. Once the parameters have been extracted from the spreadsheet, the artifact detection routine is called. We then repeat this process for the eye movement and C.R.A.P. artifacts.
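
    Here's a rough sketch of that pattern for the blink test, assuming the parameters live in a spreadsheet with SubID and Threshold columns and that VEOG_chan holds the index of the bipolar VEOG channel (the filename, window size, and step size are also assumptions):

        % Sketch: per-participant blink detection with the step function
        blink_table  = readtable('AR_Blink_Step.xlsx');       % hypothetical filename
        row          = find(strcmp(blink_table.SubID, SUB{i}));
        blink_thresh = blink_table.Threshold(row);

        EEG = pop_artstep(EEG, 'Channel', VEOG_chan, ...      % bipolar VEOG channel index (assumption)
              'Flag', [1 2], ...                              % flag 1 = any artifact, flag 2 = blink
              'Threshold', blink_thresh, ...
              'Twindow', [-200 300], ...                      % blinks that could obscure the stimulus
              'Windowsize', 200, 'Windowstep', 10);           % window/step values are assumptions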

    Each of these artifact detection steps adds to the flags set by the previous step. At the end, we save the dataset to the hard drive. It’s now ready for averaging!

    After the end of the loop, I added some code to grab information about the number of trials that were flagged for each participant. The code demonstrates how to save this information in Matlab’s special Table format, which then makes it easy to save the information as a spreadsheet.
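
    The table-plus-spreadsheet pattern looks roughly like this, assuming SUB, num_accepted, and num_rejected hold the per-participant values (the variable names and output filename are assumptions):

        % Sketch: collect the per-participant counts into a table and save it as a spreadsheet
        rejection_table = table(SUB(:), num_accepted(:), num_rejected(:), ...
              'VariableNames', {'Subject', 'Accepted', 'Rejected'});
        writetable(rejection_table, 'artifact_detection_summary.xlsx');   % filename is an assumption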

    As usual, the end of the script has some code after the return statement that you can use to load the datasets into the EEGLAB GUI.

    Whew! That’s a lot of code. But I hope it shows you how to break a complex sequence of processing steps into a set of relatively simple modules. And I hope it also demonstrates the process of going back and forth between the GUI (to set various participant-specific parameters and make sure the data look okay) and scripts (which are much faster, especially when you need to reprocess the data multiple times).

    One last thing: Once you’ve created a set of scripts that perform the different processing stages for your experiment, you can create a “master script” that simply calls the scripts for each stage. Then you can execute all the stages by calling this one script.
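
    Such a master script can be as simple as the following sketch, assuming all five scripts are on the Matlab path (the master script's name is hypothetical):

        % Master_Script.m: run the whole pipeline, stage by stage
        Step1_pre_ICA_processing
        Step2_ICA_phase1
        Step3_ICA_phase2
        Step4_ICA_phase3
        Step5_post_ICA_processing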


    This page titled 11.16: Exercise- Building an Entire EEG Processing Pipeline is shared under a CC BY 4.0 license and was authored, remixed, and/or curated by Steven J Luck directly on the LibreTexts platform.