
9.4: Background: A Quick Conceptual Overview of ICA


    The online supplement to Chapter 6 in Luck (2014) provides a conceptual overview of how ICA works in general and how it is applied to artifact correction. Here I’ll provide a quick summary. Several different algorithms are available for performing the ICA decomposition. For the vast majority of cases in which ICA is used for artifact correction, there aren’t big differences among the algorithms, so here I’ll focus on EEGLAB’s default ICA algorithm (Infomax, implemented with the runica routine).

    The first thing you should know is that ICA is purely a statistical technique, and it was not developed for neural data per se. It knows nothing about brains or electricity. It doesn’t know that the data are coming from electrodes or where the electrodes are located. Most ICA algorithms don’t even know or care about the order of time points. They just see each time point as a set of N abstract variables, one for each of the N channels. Infomax uses a machine learning algorithm (much like a neural network) that learns a set of N independent components (ICs) that are maximally independent when applied to the data.

    By maximally independent, I mean that the activation level of one IC provides no information about the activation levels of the other ICs at that time point. For example, if the blink-related voltage level at each time point does not predict other sources of activity at the same time point, blinks will likely be extracted as a separate IC. However, it’s not a problem if blink activity at one time point predicts activity from other sources at earlier or later time points.

    ICA learns an unmixing matrix, which converts the EEG data at a given time point to the activation level of each IC. The inverse of the unmixing matrix is the mixing matrix, which is just the scalp distribution of each IC. You can also think of the scalp distribution of an IC as a set of weights. The voltage produced by an IC in a given channel at a given time point is the activation level of the IC at that time point multiplied by the weight for that channel. Some randomness is applied to the learning algorithm, so you won’t end up with exactly the same set of ICs if you repeat the decomposition multiple times.
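    To make the matrix terminology concrete, here is a minimal Python/NumPy sketch (the numbers and the matrix W are made-up placeholders, not something produced by EEGLAB) showing how the unmixing matrix converts channel data into IC activations, how its inverse (the mixing matrix) holds each IC’s scalp distribution, and how one IC’s contribution to a channel is simply its activation multiplied by that channel’s weight.

        import numpy as np

        rng = np.random.default_rng(0)
        n_channels, n_times = 4, 1000

        eeg = rng.standard_normal((n_channels, n_times))   # channels x time points
        W = rng.standard_normal((n_channels, n_channels))  # unmixing matrix (learned by ICA in practice)

        activations = W @ eeg      # one row of activations per IC
        A = np.linalg.inv(W)       # mixing matrix: column k is the scalp distribution (weights) of IC k

        # Voltage contributed by IC k to channel c at time t is activation x weight
        k, c, t = 2, 0, 500
        contribution = A[c, k] * activations[k, t]

        # Summing the contributions of all the ICs reproduces the recorded voltage
        assert np.isclose(eeg[c, t], A[c, :] @ activations[:, t])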

    An important practical consideration is that the machine learning routine needs a lot of data to adequately learn the ICs. The EEGLAB team has provided an informal rule for this, which is that the number of time points in the dataset must be at least 20 × (number of channels)². It’s probably the number of minutes of data that matters rather than the number of time points, but the key thing to note is that the number of channels is squared. This means that doubling the number of channels requires four times as much data. For example, you would need four times as many minutes of data for a 64-channel recording as for a 32-channel recording (and sixteen times as much data for a 128-channel recording as for a 32-channel recording).
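    As a quick worked example of this rule (the 250 Hz sampling rate is just an assumption for illustration, and the rule itself is only an informal guideline):

        # Informal rule: at least 20 x (number of channels)^2 time points
        sampling_rate = 250  # Hz; assumed here for illustration

        for n_channels in (32, 64, 128):
            min_points = 20 * n_channels ** 2
            minutes = min_points / sampling_rate / 60
            print(f"{n_channels} channels: >= {min_points} time points "
                  f"(about {minutes:.1f} min at {sampling_rate} Hz)")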

    ICA is somewhat like principal component analysis (PCA). However, whereas PCA tries to lump as much variance as possible into the smallest number of components, ICA tries to make the components maximally independent. ICA is also like PCA insofar as it just takes the dataset and represents it along a different set of axes. You can go from the original data to the ICA decomposition with the unmixing matrix, and then you can apply the mixing matrix to the ICA decomposition and perfectly recover the original data.
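    Here is a small sketch of this “different axes, perfect recovery” idea, using scikit-learn’s FastICA purely as a stand-in for whatever ICA algorithm you actually run (Infomax in EEGLAB); the two synthetic sources and the mixing matrix are hypothetical.

        import numpy as np
        from sklearn.decomposition import FastICA

        # Two made-up sources mixed into two "channels" (scikit-learn expects samples x channels)
        t = np.linspace(0, 10, 2000)
        sources = np.c_[np.sin(3 * t), np.sign(np.sin(7 * t))]
        mixing = np.array([[1.0, 0.5],
                           [0.4, 1.2]])
        data = sources @ mixing.T

        # Decompose into as many ICs as there are channels, then remix
        ica = FastICA(n_components=2, random_state=0)
        ic_activations = ica.fit_transform(data)        # the data expressed along the IC "axes"
        reconstructed = ica.inverse_transform(ic_activations)

        # Remixing all of the ICs recovers the original data (up to numerical precision)
        assert np.allclose(data, reconstructed)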

    This decomposition-and-recovery sequence is how ICA corrects for artifacts. After running the ICA decomposition to get the ICs, you simply set one or more of the ICs to have an activation of zero at each time point and then use the mixing matrix to recover the original data (but without the artifactual ICs). This means that ICA influences your data at every single time point. When you remove a blink IC, ICA doesn’t just find time periods with blinks and correct the data during those time periods. It reconstructs your EEG data at every time point, but with the artifactual ICs set to zero. There will be some nonzero activity in the blink IC at each time point, so zeroing this IC at each time point means that the data will be changed at least slightly at every time point. This is actually good, because there may be quite a lot of EOG activity between blinks as a result of small changes in eye rotation or eyelid position, and ICA will remove this non-neural activity when you remove the IC corresponding to blinks.
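    Here is a minimal NumPy sketch of that zero-and-remix logic. It assumes you already have an unmixing matrix W from the ICA decomposition and have decided which IC index corresponds to blinks; both are hypothetical placeholders here, and this is just the conceptual math, not EEGLAB’s actual implementation.

        import numpy as np

        def remove_ics(eeg, W, bad_ics):
            """Reconstruct the channel data with the listed ICs zeroed out.

            eeg     : channels x time points array
            W       : unmixing matrix from the ICA decomposition (channels x channels)
            bad_ics : indices of the artifactual ICs (e.g., the blink IC)
            """
            A = np.linalg.inv(W)            # mixing matrix (scalp distributions of the ICs)
            activations = W @ eeg           # IC activations at every time point
            activations[bad_ics, :] = 0.0   # zero the artifactual ICs everywhere, not just during blinks
            return A @ activations          # remix: the same data minus the removed ICs

        # Hypothetical usage: remove IC 3 (say, the blink IC) from a 32-channel recording
        rng = np.random.default_rng(0)
        eeg = rng.standard_normal((32, 5000))
        W = rng.standard_normal((32, 32))
        cleaned = remove_ics(eeg, W, bad_ics=[3])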

    ICA makes several important assumptions (see Luck, 2014), but two are particularly important to know about. The first is that the scalp distribution of a given source of activity must remain constant over the entire session. For example, we can assume that the locations of the eyes relative to the electrode sites will not change over the course of a session (unless there is some kind of catastrophe), so blinks and eye movements meet this criterion. Similarly, the location of the heart relative to the electrodes doesn’t change over time, so the EKG artifact also meets this criterion. However, the scalp distribution produced by skin potentials will depend on which sweat pores are activated, which may change over time, so skin potentials do not meet this assumption. By the way, this assumption means that you must perform ICA separately for each participant (because the scalp distributions will differ at least slightly across participants).

    There is dispute in the literature about whether ICA works well with EMG. The argument against using ICA with EMG is that different muscle fibers may contract at different time points, changing the scalp distribution. The argument for using ICA is that the scalp distribution does not actually change very much over time. To be on the safe side, my lab doesn’t use ICA for EMG. We minimize EMG by having participants relax during the EEG recording, and we can filter out the remaining EMG so that it has minimal impact on our results. However, if you cannot avoid having a lot of EMG in your data, and you can’t filter it out without creating other problems (e.g., because you’re looking at high-frequency ERP activity), you can read the literature and decide for yourself whether the benefits of using ICA for EMG outweigh the costs.

    A second key assumption of ICA is that the number of true sources of activity is equal to the number of channels. This is related to the fact that the number of ICs must be equal to the number of channels in order for the math to work.

    Exceptions Make the Rule

    There are occasional exceptions to the rule that the number of ICs is equal to the number of channels, particularly when you are using the average of all sites as the reference. See Makoto’s Preprocessing Pipeline or EEGLAB’s ICA documentation for details.

    As I mentioned earlier, the fact that the number of ICs must equal the number of channels means that ICA is an imperfect method. You don’t change the number of sources of activity when you add or subtract electrodes! Also, there will always be more sources of activity in the EEG signal than there are channels (because each synapse in the brain is a potential source of activity). As a result, ICA will lump multiple true components into the same IC. In addition, a single true source may also be split among multiple ICs. So, you will definitely have lumping of true components, and you will likely have some splitting as well.

    Given the failure of EEG data to meet this second assumption, you may wonder whether it is valid to use ICA for artifact correction. As famously noted by the statistician George Box, all statistical models are wrong, and the question is not whether they are correct but whether they are useful (Box, 1976). In practice, ICA is useful for correcting some kinds of artifacts despite the invalid assumptions. The saving grace of ICA is that the lumping and splitting problems are minimal for components that account for a lot of variance (e.g., components that are both large and frequently occurring). Most participants blink a lot, and blinks are very large, so ICA typically works very well for blinks. Depending on the experiment and the participant, eye movements can be large or small and they can be frequent or rare. In my experience, ICA works only modestly well for eye movements, and it can’t correct for the change in sensory input produced by the change in gaze position, so we only use ICA to correct for eye movements when necessary. However, I recently came across a nice paper by Dimigen (2020) showing that ICA can work quite well for large and frequent eye movements when the right preprocessing steps are applied prior to the ICA decomposition (as I’ll discuss in more detail later). Drisdelle et al. (2017) also provide evidence that ICA can work well for eye movements in certain types of paradigms.

    ICA can be applied to either continuous or epoched EEG. When my lab first started using ICA many years ago, I emailed Scott Makeig and Arnaud Delorme to get their advice, and they recommended applying it to the continuous EEG. They still give this advice today in the EEGLAB documentation. You can apply ICA to epoched data if necessary, but the epochs must be at least 3 seconds long (e.g., -1000 to +2000 ms). Adjacent epochs cannot contain the same data points, so this means that you must have relatively long trials for this approach to work. If you get your pipeline set up properly (see Chapter 11 and Appendix 3), there isn’t any reason why you’d need to apply ICA to epoched data, so my view is that the safest thing to do is to apply it to the continuous data. As described in the text box below, there may also be a practical advantage.

    A Practical Advantage

    Over the years, we’ve found a significant practical advantage to doing ICA at the earliest possible stage of EEG preprocessing (which means applying it to the continuous EEG, because epoching is a relatively late stage). Specifically, ICA is a time-consuming process that you don’t want to repeat if you can possibly avoid it. If you need to change some of your processing steps after you’ve already analyzed your data once, putting ICA at the earliest possible stage minimizes the likelihood that this change will require repeating the ICA.

    The ICA decomposition process typically takes somewhere between 2 minutes and 2 hours depending on the nature of your data and your computer. If you need to process data from 30 participants, this is now between 60 minutes and 60 hours. That can be done overnight while you’re asleep, but another 2-20 minutes of human effort are required for each participant to make sure that the decomposition has worked properly and to determine which ICs should be removed. That’s 60 to 600 minutes of your precious time.

    What’s the likelihood that you will need to re-process your data? In my experience, the likelihood is close to 100%! Reviewers always seem to want some change (or some secondary analysis). And when you’re new to ERP analysis, you’re likely to do something that is less than optimal and will require a re-analysis. But if you’ve done the artifact correction at the earliest possible point in your processing pipeline, chances are good that you won’t need to repeat this time-consuming part of your pipeline.

    A key step in ICA-based artifact correction is to determine which ICs correspond to artifacts and should be removed. There are automated algorithms for this, but I recommend doing it manually for the vast majority of studies. As you will see, you need to carefully determine whether a given IC should be removed, which requires taking into account the three underlying goals of artifact rejection and correction, and this often goes beyond what an algorithm can do.

    ICA-based artifact correction massively changes your data, and we know we are violating at least one of its assumptions, so I recommend being conservative in using it. We almost always use it for blinks, and we sometimes use it for eye movements, but we don’t ordinarily use it for other kinds of artifacts. If we frequently encountered large EKG artifacts, we’d probably use ICA for those as well. Some labs use ICA for anything that looks “weird”, but I personally don’t like that approach. There are other ways of dealing with these other types of artifacts, and I just don’t trust an algorithm to solve every problem in my data.

    Finally, don’t forget Hansen’s Axiom: There’s no substitute for good data. Do everything you can to minimize artifacts during the recording, and then you won’t end up getting an ulcer from worrying about how to deal with a ton of artifacts during the analysis.


