
8.6: Exercise: Adjusting the Threshold


    In this exercise, we’ll see how adjusting the threshold changes which epochs are flagged for rejection. Let’s start by seeing if we can detect some of the blinks that were missed with our ±100 µV threshold. Make 1_MMN_preprocessed_interp_be the active dataset, and then select EEGLAB > ERPLAB > Artifact detection in epoched data > Simple voltage thresholds. Change the voltage limits to -50 50 to indicate that an epoch should be flagged for rejection if the voltage is more negative than -50 µV or more positive than +50 µV at any time in the VEOG-bipolar channel. Click ACCEPT to run the artifact detection routine.
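
    If you prefer to script this step, the GUI routine corresponds to ERPLAB’s pop_artextval function. The following is a minimal sketch, not the exact settings used here: the channel index (33) and test window (-200 to 798 ms) are placeholders that you would need to replace with the index of your VEOG-bipolar channel and your actual epoch limits.

        % Flag any epoch in which the VEOG-bipolar voltage goes beyond +/-50 uV.
        % Channel index and time window are assumptions; match them to your data.
        EEG = pop_artextval(EEG, 'Channel', 33, 'Flag', 1, ...
            'Threshold', [-50 50], 'Twindow', [-200 798]);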

    The first thing you should look at is the proportion of rejected trials, which is shown in the MATLAB command window. Whereas 17.8% of epochs were flagged when our threshold was ±100 µV, now 42.9% have been flagged. If we were to use this ±50 µV threshold, this participant would need to be excluded from the final analyses (because my lab excludes participants if more than 25% of trials were rejected). Obviously, you don’t want to exclude participants if you don’t have to, so let’s see if we really want to use this threshold.
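
    If you are scripting, you can also compute this percentage directly rather than reading it from the command window. The sketch below assumes that the artifact flags have been mirrored into the standard EEGLAB field EEG.reject.rejmanual, which is how ERPLAB normally synchronizes its flags; verify this on your own dataset.

        % Percentage of epochs currently flagged for rejection (assumes the flags
        % are stored in EEG.reject.rejmanual, a 1 x n_epochs logical vector).
        pct_flagged = 100 * mean(EEG.reject.rejmanual);
        fprintf('Flagged epochs: %.1f%%\n', pct_flagged);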

    If you scroll through the epochs in the plotting window that appeared, you’ll see that the blinks in Epochs 104 and 170 have been detected with this threshold. That’s the good news. But if you keep scrolling, you’ll see the bad news: Many epochs without a clear blink are now flagged for rejection (e.g., Epochs 408, 424, 432, and 435-437). In general, decreasing the threshold for rejection increases our hit rate (the proportion of blinks that were detected) but also increases our false alarm rate (the proportion of non-blink epochs that are flagged for rejection).

    Now let’s try increasing our threshold to avoid flagging Epochs 463, 525, and 526, which were unnecessarily flagged for rejection with our original threshold of ±100 µV. Close the plotting window and the window for saving the dataset, make sure that 1_MMN_preprocessed_interp_be is still the active dataset, and run the artifact detection routine with the voltage limits set to -150 150.
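
    In script form, this would be the same hedged pop_artextval call sketched earlier, with the limits widened to ±150 µV:

        % Rerun detection with a more lenient threshold; channel index and
        % time window are still placeholders to be adapted to your data.
        EEG = pop_artextval(EEG, 'Channel', 33, 'Flag', 1, ...
            'Threshold', [-150 150], 'Twindow', [-200 798]);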

    The percentage of flagged trials has now dropped to 11.3%. That’s good insofar as increasing the number of accepted trials will increase our signal-to-noise ratio. But it might be bad if a lot of blinks have escaped detection.

    If you scroll through the data, you’ll see that Epochs 463, 525, and 526 are no longer flagged for rejection, which is good. However, several clear blinks have been missed (e.g., Epochs 103, 191, 201). In general, increasing the threshold for rejection decreases the hit rate but also decreases the false alarm rate.

    The take-home message of this exercise is that adjusting the threshold impacts both the hit rate and the false alarm rate, making one better and the other worse. You’ll need to choose a threshold that balances the hit rate and false alarm rate in a way that best helps you achieve the fundamental goal, which is to accurately answer the scientific question that the experiment is designed to address. Is that goal best met by ensuring that all epochs with blinks are rejected, even if this means rejecting some perfectly fine epochs? Or is the goal best met by optimizing the number of included epochs, even if a few blinks escape rejection?

    The answer will depend on the nature of your scientific question, the details of your experimental design, and the nature of the artifacts in your data. In particular, if blinks differ systematically across bins (especially in the time range of the ERP components of interest), then you will usually need to make sure that the vast majority of blinks are rejected to avoid confounds. And if you have a reasonably large number of trials, throwing out a few trials without blinks won’t really change your signal-to-noise ratio very much (see the text box below). So, in most cases, I recommend erring on the side of throwing out too many trials rather than allowing some large artifacts to remain in the data.

    Also, as you’ll see in some of the later exercises, you can both increase your hit rate and decrease your false alarm rate by choosing a better algorithm for determining which epochs contain artifacts. The simple voltage threshold we’ve used in this example is a poor way of detecting blinks, and I’m always amazed that many software packages don’t provide better algorithms.

    Don’t Stress About Rejecting a Few Trials

    It’s easy to get stressed out about excluding 20% or 50% of trials because of artifacts. Is this going to cause a 20% or 50% reduction in your data quality? It turns out that excluding trials has a smaller impact on data quality than you might expect.

    This is because the signal-to-noise ratio (SNR) increases as a function of the square root of the number of trials. This square root rule is really annoying when you’re designing your experiment, because doubling the number of trials only increases your SNR by 41% (because sqrt(2) = 1.41). But the same rule means that you don’t lose very much SNR when you have to exclude some trials.

    As an example, imagine that your single-trial SNR is 1:2 or 0.5 (i.e., your signal is half as big as your noise in the raw EEG epochs). If you average together 100 trials, the resulting SNR is 0.5 x sqrt(100) = 5. Now imagine that you have to exclude 20 trials because of artifacts. Now your SNR is 0.5 x sqrt(80) = 4.47. That is, you’ve decreased the number of trials by 20%, but your SNR has dropped by only about 10%.

    Now imagine that you have to exclude 50 trials. The resulting SNR is 0.5 x sqrt(50) = 3.54. Even though you’ve decreased the number of trials by 50%, your SNR has dropped by only about 30%.
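
    If you’d like to play with these numbers yourself, the square root rule is easy to compute. The short MATLAB sketch below simply reproduces the arithmetic from the two examples above.

        % SNR of the average = single-trial SNR * sqrt(number of trials)
        snr_single = 0.5;                 % assumed single-trial SNR
        n_trials   = [100 80 50];         % all trials, 20% rejected, 50% rejected
        snr_avg    = snr_single * sqrt(n_trials);
        pct_loss   = 100 * (1 - snr_avg / snr_avg(1));
        fprintf('N = %3d   SNR = %.2f   SNR loss = %4.1f%%\n', ...
            [n_trials; snr_avg; pct_loss]);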

    As mentioned earlier, you should have an a priori threshold for excluding participants on the basis of the percentage of rejected trials, and the square root rule will help you decide what percentage to use as your threshold. Ask yourself: which reduces your statistical power more, excluding a participant entirely or including that participant with a reduced SNR? Usually, your power is reduced more by excluding the participant, unless so many trials were rejected that the SNR is truly awful.

    However, this assumes that the artifacts are random, and the only difference between participants with lots of artifacts and participants with few artifacts is the number of trials available for averaging. In my experience, this assumption is false. Participants with a large number of artifacts tend to be less compliant with instructions, may be more sleep-deprived, and often have poorer EEG signals even on the trials without artifacts. Our threshold for excluding participants (25% in basic science studies, 50% in schizophrenia studies) is lower than would be necessary if we solely considered the square root rule.

    In the future, we may switch to a rule based on the standardized measurement error (SME), a direct measure of data quality, rather than on the percentage of rejected trials. This might make it possible to avoid excluding participants whose averaged ERPs are quite clean even though they had a lot of rejected trials, and to exclude participants who had few rejected trials but noisy averages nonetheless. This approach could be particularly valuable in research participants for whom it is difficult to obtain a large number of trials (e.g., infants and small children).


    This page titled 8.6: Exercise: Adjusting the Threshold is shared under a CC BY 4.0 license and was authored, remixed, and/or curated by Steven J Luck directly on the LibreTexts platform.
