11.15: Exercise- Preprocessing the EEG and Using a Spreadsheet to Store Subject-Specific Information
As described in earlier chapters, I strongly recommend that you go through each step of EEG processing manually, looking at the data from each step, before running a script. Actually, I don’t merely recommend it, I insist on it! This will allow you to catch problems that would otherwise contaminate your data. It will also let you determine some subject-specific parameters, such as which channels should be interpolated and what artifact detection parameters should be used. But if some parameters differ across subjects, how can you use one script that processes the data from all the subjects?
The answer is to store the subject-specific parameters in a spreadsheet, and then have your script read the parameters from that spreadsheet. This exercise is designed to show you how this works in the context of interpolating bad channels and other preprocessing steps. The interpolation process won’t work properly if there are big voltage offsets in the data, so we’ll apply a high-pass filter that eliminates these offsets prior to interpolation. Also, we need to figure out the 3-dimensional locations of the electrodes, because this information is needed by the interpolation routine (which uses the distance between the to-be-interpolated electrode and the other electrodes to compute sensible, distance-weighted values).
I’m not going to go through the process of running these routines in the GUI and looking at the history to see the corresponding Matlab code. I assume you now understand that process, so I’ll go directly to the code. You can look up the help information for these routines if you want a better understanding of the available options.
Start by quitting EEGLAB, typing clear all , and loading Script5.m . Let’s take a look at the script before running it. The first thing to notice is a variable named Interpolation_filename that holds the name of an Excel file, interpolate.xlsx . This file contains information about which channels are bad and should be interpolated. Take a look at the file in Excel (or import it into Google Sheets or some other spreadsheet program). Here’s what you should see:
| ID | Bad_Channels | Ignored_Channels | Channel_Names |
|---|---|---|---|
| 1 | [6 13] | [31 32 33] | C5, Oz |
| 2 | | [31 32 33] | |
| 3 | [25] | [31 32 33] | P8 |
| 4 | | [31 32 33] | |
| 6 | | [31 32 33] | |
| 7 | | [31 32 33] | |
| 8 | | [31 32 33] | |
| 9 | | [31 32 33] | |
| 10 | | [31 32 33] | |
The first column contains the Subject ID values (without Subject 5, whom we’re still excluding). The second column indicates the bad channels (if any) corresponding to the Subject ID values. They’re specified using square brackets so that they will be interpreted as arrays by Matlab. You’ll see why that’s important in a bit. The next column indicates which channels should be excluded when we compute the interpolated values (the EOG channels). Those are the same for each participant, so they could be listed in the script, but I found it more convenient to put them into the spreadsheet. The last column shows the names of the bad channels. This column isn’t used by the script, but it’s nice to have that information when you’re looking at the spreadsheet.
If you look closely at the contents of the cells (e.g., by looking at the Formula Bar), you’ll see that the value for each cell begins with a single quote (except for the ID values). This tells Excel that the contents of that cell should be treated as a text string and never interpreted as a number. I’ve found that this can avoid problems when we read the values into Matlab, because we want every cell in a column to be the same data type. You should also note that the column labels don’t have any spaces or special characters in them (except for the underscore character). The Matlab routine we’ll use to read the spreadsheet will use the column labels to create new variables, so the labels need to be legal Matlab variable names.
Now look at the script again and find the line with the readtable command. This is a Matlab function that reads data from a file into a special Table data structure. It’s a very powerful function that can read from many different file types. It uses the filename to determine what kind of file is being read (e.g., the .xlsx filename extension is used to indicate that it’s an Excel XML file). It creates a Table from the data file, and we’ve told it to store this Table in a variable named interpolation_parameters .
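If you’d like to try this outside the script, the call itself is simple. Here’s a minimal sketch (the script builds the full path using its DIR variable, which I’ve omitted here):

```matlab
% Read the Excel file into a MATLAB Table. readtable infers the file type
% from the .xlsx extension and uses the first row as column (variable) names.
interpolation_parameters = readtable('interpolate.xlsx');

% Columns are then accessed by the labels from the header row, e.g.:
interpolation_parameters.ID            % numeric array of subject IDs
interpolation_parameters.Bad_Channels  % cell array of character strings
```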
To see how this works, run the first part of the script, starting with the DIR = pwd line and going through the interpolation_parameters = readtable(Interpolation_filename) line. Then double-click on the interpolation_parameters variable in the Workspace pane so that you can see the contents of this variable. You’ll see that it contains the same rows and columns that were in the spreadsheet. Now type interpolation_parameters.Bad_Channels on the command line. You’ll see a list of the bad channels for each subject:
```
9×1 cell array
    {'[6 13]'}
    {0×0 char}
    {'[25]'  }
    {0×0 char}
    {0×0 char}
    {0×0 char}
    {0×0 char}
    {0×0 char}
    {0×0 char}
```
You can see that this list is in a special Matlab-specific format called a cell array . Cell arrays are a little difficult to understand and tricky to use correctly. This is especially true for beginners, but I still often make mistakes when I try to use them. At some point you’ll need to learn about them, because they’re very useful, but for now you can rely on code that I wrote that extracts the contents of the interpolation_parameters table into a set of simple numeric arrays.
This code is embedded within the loop in Script5.m . Let’s execute the code, but without actually going through the whole loop. To do this, first type subject = 1 on the command line so that the looping variable has the correct value for the first subject. Then execute the ID = num2str(SUB(subject)) line in the body of the loop, because we’re going to need this variable. Now execute the three lines of code beginning with table_row = . The first of these lines determines which row of the interpolation_parameters table contains the values for this subject. The second line gets the array of bad channels for this subject and stores it in a variable named bad_channels (which has zero elements if there are no bad channels). The third line gets the array of to-be-ignored channels and stores it in a variable named ignored_channels . If you’re not already an experienced Matlab programmer, the code on those lines probably looks like hieroglyphics—like I said, cell arrays are a little complicated. Once you’re more familiar with Matlab coding, and you’ve wrapped your brain around cell arrays, you should come back to this code and figure out how it works. But for now, you can treat it like a bit of magic that gets you the information you need.
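If you’re curious, the extraction boils down to something like the following sketch (not the exact lines from Script5.m, but the same idea):

```matlab
% Find the row of the Table whose ID matches the current subject
table_row = find(interpolation_parameters.ID == str2num(ID));

% Each cell holds a string like '[6 13]'; str2num converts it to a numeric
% array. An empty cell yields an empty array, meaning no bad channels.
bad_channels     = str2num(char(interpolation_parameters.Bad_Channels(table_row)));
ignored_channels = str2num(char(interpolation_parameters.Ignored_Channels(table_row)));
```

This is why the square brackets in the spreadsheet matter: the string '[6 13]' converts directly into a two-element numeric array.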
Now you should inspect the contents of bad_channels and ignored_channels , either by typing the variable names on the command line or looking at them in the Workspace. You’ll see that bad_channels is an array with the values 6 and 13 (the two bad channels for Subject 1), and ignored_channels is an array with the values 31 through 33 (the three channels we will be ignoring when computing interpolated values for the bad channels).
The next line of code does the interpolation using the pop_erplabInterpolateElectrodes function. You can see that we send the bad_channels and ignored_channels variables to this function. But don’t run this line of code yet, because we haven’t run all of the preceding lines in the body of the loop. Let’s take a look at those lines before we run them.
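For orientation, the call looks something like this (the option names come from ERPLAB’s pop_erplabInterpolateElectrodes; check the script and the function’s help for the exact values used):

```matlab
% Interpolate the bad channels, excluding the EOG channels from the
% distance-weighted computation. 'spherical' is one of the available methods.
EEG = pop_erplabInterpolateElectrodes(EEG, ...
    'replaceChannels',     bad_channels, ...
    'ignoredChannels',     ignored_channels, ...
    'interpolationMethod', 'spherical');
```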
The body of the loop begins by setting some variables and loading the dataset, just as in the previous script. Then it runs a routine called pop_erplabShiftEventCodes , which shifts all the stimulus event codes to be 26 ms later. This is necessary because there is a fairly substantial delay between when an LCD display receives an image from the computer’s video card and when it actually displays that image. We measured that delay using a photosensor and found that it was 26 ms. We therefore shift the event codes so that they occur at the actual time that the stimulus appeared instead of at the time when the image was sent to the display. If you’re using an LCD display, you must do this. If you don’t know how, contact the manufacturer of your EEG recording system.
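In the script, the shift is implemented with a call along these lines (the parameter names and units here are my reading of the ERPLAB options, so verify them with help pop_erplabShiftEventCodes before adapting this):

```matlab
% Shift all stimulus event codes later in time to compensate for the
% measured 26-ms LCD display delay. The shift value is assumed to be in
% seconds here; check the function help for the units your version expects.
EEG = pop_erplabShiftEventCodes(EEG, ...
    'Eventcodes', 1:255, ...  % which event codes to shift
    'Timeshift',  0.026);     % +26 ms
```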
The next step uses the pop_basicfilter function to apply a bandpass filter with a passband of 0.1 to 30 Hz and a roll-off of 12 dB/octave (which corresponds to a filter order of 2 in the code). Filtering out the low frequencies is essential prior to interpolation, because otherwise the random voltage offsets in each channel will produce a bizarre scalp distribution and the interpolation algorithm (which assumes a smooth scalp distribution) will produce bizarre results. Note that it is a good idea to remove the DC offset before filtering continuous EEG data, and this is implemented by specifying 'RemoveDC', 'on' when we call the pop_basicfilter function.
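The filtering call looks roughly like this (a sketch based on ERPLAB’s pop_basicfilter options; the script has the authoritative values):

```matlab
% Bandpass filter all channels from 0.1 to 30 Hz with a 2nd-order
% (12 dB/octave) Butterworth design, removing the DC offset first and
% respecting boundary events so the filter doesn't run across breaks.
EEG = pop_basicfilter(EEG, 1:EEG.nbchan, ...
    'Filter',   'bandpass', ...
    'Design',   'butter', ...
    'Cutoff',   [0.1 30], ...
    'Order',    2, ...
    'RemoveDC', 'on', ...
    'Boundary', 'boundary');
```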
The next step runs the pop_chanedit function to add information about the 3D location of each electrode site based on the electrode names. The function uses a file named standard-10-5-cap385.elp that is provided by EEGLAB. It contains a list of standard electrode names (e.g., CPz ) and their idealized locations on a spherical head. This doesn’t give you the true location for each participant, which would require using a 3D digitization system, but it’s a good enough approximation for the interpolation process.
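The lookup step is essentially a one-liner (EEGLAB resolves the .elp file from its own installation; if you adapt this, you may need to supply the full path on your system):

```matlab
% Fill in idealized 3-D scalp coordinates for each channel by matching
% channel names against EEGLAB's standard electrode location file.
EEG = pop_chanedit(EEG, 'lookup', 'standard-10-5-cap385.elp');
```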
The next few lines extract the information from the interpolation_parameters table and run the interpolation routine.
Finally, we reference the data and save the dataset to the hard drive (just as in the previous exercise).
Note that, for each of these processing operations, we send the dataset to the routine in the EEG variable, and then the routine returns a modified dataset that we store in the EEG variable. In other words, the new dataset overwrites the old dataset in the EEG variable. That’s much more efficient than storing the result of each new operation in ALLEEG . But note that keeping the individual datasets makes sense when you’re processing the data in the GUI, because you want the flexibility of going back to a previous dataset.
Now that we’ve looked at the code, let’s run the script, but only for the first subject. To limit it to the first subject, make sure that the SUB = [ 1 ] line near the top isn’t commented out. Then run the script. You can now see that the script has created a new dataset file named 1_N170_shift_filt_chanlocs_interp_ref.set in the Chapter_11 > N170_Data > 1 folder.
However, this dataset isn’t visible in the Datasets menu. Script5.m differs from the previous example scripts in that it doesn’t make the datasets available in the EEGLAB GUI. It just does the processing and saves the results for each participant in a dataset file. As you’ll see later in the chapter, you’ll often have a series of scripts for processing a given experiment (e.g., one for the initial preprocessing, another for ICA, another for post-ICA EEG processing, and another for dealing with averages). Each of these scripts will read in the files created by the previous script, so there’s often no need to make the datasets available in the GUI.
If you do want to look at the results of a given script, you’ll want a convenient way of reading in the files that were just created. Script5.m accomplishes this by including code at the end for loading the new dataset files into ALLEEG . This makes it possible for you to access these datasets from the EEGLAB GUI. This code is preceded by a return command, which causes the script to terminate, so the code at the end won’t ordinarily execute when you run the script. But after you run the script, you can just select the code at the end and run it manually (e.g., by clicking the Run Section button at the top of the script editor window). Give it a try, and then verify that you can see the new dataset in the Datasets menu.
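The code after the return statement does something along these lines (a sketch with an assumed folder layout; the actual script defines the exact paths):

```matlab
% Load each subject's newly saved dataset and register it in ALLEEG so
% that it becomes available in the EEGLAB Datasets menu.
for subject = 1:length(SUB)
    ID = num2str(SUB(subject));
    EEG = pop_loadset('filename', [ID '_N170_shift_filt_chanlocs_interp_ref.set'], ...
                      'filepath', [DIR filesep ID filesep]);
    [ALLEEG, EEG, CURRENTSET] = eeg_store(ALLEEG, EEG);
end
eeglab redraw  % refresh the GUI so the new datasets appear
```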
Plot this new dataset with EEGLAB > Plot > Channel data (scroll) . Then load the original dataset ( 1_N170.set ) for this participant from the Chapter_11 > N170_Data > 1 folder and plot it as well. You’ll see that the new dataset is much smoother and doesn’t have large DC offsets (because it has been bandpass filtered). And you can see some huge artifacts in the C5 and Oz channels in the original data that are gone in the new data (because the script interpolated those channels). You should also notice that the channels are in a more convenient order in the new dataset than in the original dataset as a result of the EEG Channel Operations step near the end of the loop.
Now comment out the SUB = [ 1 ] line near the top of the script and run the script again to process the data from all 9 participants. You can then select and execute the code at the end of the script for loading the new datasets into the EEGLAB GUI.
At this point, I’d like to remind you of some advice I gave you at the beginning of the chapter: Play! The best way to understand how these scripts actually work is to play around with the code. If you’re not sure you understand one of the steps, try changing it to see what happens. But don’t just do this randomly. Come up with hypotheses and test them by means of experimental manipulations (i.e., by changing the code and seeing if the results confirm or disconfirm your hypotheses). You can also try modifying the scripts to process your own data. Unless you’re a very experienced programmer, you probably won’t actually understand the key points from this chapter unless you engage in this kind of active exploration. And now is a great time to play around with the code, because the rest of the chapter assumes that you fully understand the basics and are ready to put them together into a complete data processing pipeline.