
6.7: Exercise - The Signal-to-Noise Ratio


    In this exercise, we’re going to quantify the impact of the high-frequency noise in F8 on our data quality, and we’ll also see how data quality depends on the number of trials averaged together.

    Take a look at the analytic standardized measurement error (aSME) values for the ERPset you just created (EEGLAB > ERPLAB > Data Quality options > Show Data Quality measures in a table). There’s a ton of information in the table, and I find that it helps to select the Color heatmap option. Interestingly, the aSME values from the F8 electrode are not particularly high. The worst values are at PO4.

    This disconnect between the aSME values and the high-frequency noise you can see in the waveforms occurs because these aSME values tell you how precisely the mean voltage can be measured over 100-ms periods. High-frequency noise has very little impact on the mean voltage over a 100-ms period, because the upward and downward noise deflections cancel out. However, low-frequency noise has a big impact on the mean voltage over a given time period, because a slow drift shifts the entire window up or down rather than canceling within it. If you look closely at the EEG for this participant, you’ll see that the voltage tends to drift around a bit more at PO4 than at the other sites. That’s why PO4 has the worst (largest) aSME value. If we were to quantify P3b amplitude as the mean voltage between 300 and 500 ms (as recommended by Kappenman et al., 2021), the noise in PO4 would mean that our P3b amplitude score could be quite far from the participant’s true score (i.e., the score we would obtain with an infinite number of trials). By contrast, the high-frequency noise in F8 wouldn’t have much impact.
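
    To see this cancellation in action, here is a minimal simulation sketch in plain MATLAB (not ERPLAB code; the sampling rate, trial count, and noise amplitudes are all made-up values for illustration). It compares the trial-to-trial variability of a 100-ms window mean under high-frequency noise, which fluctuates from sample to sample, versus a low-frequency drift that is essentially constant across the window:

        % Illustrative sketch (not ERPLAB code): how high- vs low-frequency
        % noise affects the mean voltage over a 100-ms window.
        rng(1);                          % for reproducibility
        fs = 250;                        % assumed sampling rate in Hz
        nSamp = round(0.100 * fs);       % samples in a 100-ms window
        nTrials = 1000;

        % High-frequency noise: independent sample-to-sample fluctuations
        hfNoise = randn(nTrials, nSamp);

        % Low-frequency drift: a random offset, roughly constant across the window
        lfNoise = repmat(randn(nTrials, 1), 1, nSamp);

        % Mean voltage over the window for each simulated trial
        hfMeans = mean(hfNoise, 2);
        lfMeans = mean(lfNoise, 2);

        % The high-frequency means cluster tightly near zero; the drift means do not
        fprintf('SD of window means, high-frequency noise: %.3f\n', std(hfMeans));
        fprintf('SD of window means, low-frequency drift:  %.3f\n', std(lfMeans));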

    If we instead quantified P3b amplitude as the peak voltage between 300 and 500 ms, the high-frequency noise would be a bigger problem. Computing the standardized measurement error for peak amplitude is more complicated, so we’re not going to look at it now, but if we did I’m sure that the high-frequency noise in F8 would produce a large SME value. The take-home message is that the effect of noise on your ability to precisely measure the amplitude or latency of a given ERP component depends on both the nature of the noise (e.g., high-frequency versus low-frequency) and the nature of the method used to quantify the amplitude or latency (e.g., mean amplitude versus peak amplitude).
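
    If you’d like to see why, here is a continuation of the sketch above (again plain MATLAB with made-up values, not ERPLAB code) comparing the variability of mean-amplitude and peak-amplitude scores computed from the same high-frequency noise:

        % Compare mean-amplitude and peak-amplitude scores under
        % high-frequency noise alone (hypothetical values).
        rng(2);
        fs = 250;
        nSamp = round(0.200 * fs);       % a 200-ms measurement window
        nTrials = 1000;
        noise = randn(nTrials, nSamp);   % high-frequency noise only

        meanScores = mean(noise, 2);     % mean amplitude per trial
        peakScores = max(noise, [], 2);  % peak (maximum) amplitude per trial

        % Peak scores are biased upward and far more variable than mean scores
        fprintf('Mean amplitude: M = %.3f, SD = %.3f\n', mean(meanScores), std(meanScores));
        fprintf('Peak amplitude: M = %.3f, SD = %.3f\n', mean(peakScores), std(peakScores));

    Because the peak operation always selects the most extreme deflection in the window, the peak scores end up both biased upward and much more variable, which is why high-frequency noise is far more damaging for peak measures than for mean measures.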

    Now let’s look at how the data quality differs between the Rare and Frequent averages. A standard idea in the ERP literature is that the signal-to-noise ratio of an averaged ERP increases according to the square root of the number of trials (all else being equal). I have to admit that I didn’t understand exactly what was meant by “noise” in the signal-to-noise ratio until a few years ago, when we started developing the SME metric of data quality. The “signal” part of the signal-to-noise ratio is the “true” amplitude of the averaged ERP waveform at a given moment in time (i.e., the amplitude we would obtain with an infinite number of trials). But how do we define the noise? 

    It turns out that the noise is quantified as the standard error of the voltage at this time point in the averaged ERP waveform. The voltage at a given time point in an averaged ERP waveform is simply the mean across the epochs being averaged together, and the standard error of this mean can be estimated using the standard analytic formula for the standard error of the mean: SD ÷ sqrt(N). That is, we take the standard deviation (SD) of the single-trial voltages at this time point and divide by the square root of the number of trials. Because the denominator is sqrt(N), this standard error shrinks in proportion to the square root of the number of trials. And because this standard error is the noise term in the denominator of the signal-to-noise ratio, the overall signal-to-noise ratio must increase according to the square root of the number of trials.
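
    As a concrete illustration, here is a minimal MATLAB sketch (with hypothetical numbers, not real data) of this formula applied to the single-trial voltages at one time point:

        % Minimal sketch of the analytic standard error at one time point of
        % an averaged ERP. epochVoltages stands in for the single-trial
        % voltages at that time point (hypothetical values).
        rng(3);
        N = 80;                              % number of trials in the average
        epochVoltages = 5 + 8*randn(N, 1);   % true value 5 uV, trial-to-trial SD 8 uV

        avgVoltage = mean(epochVoltages);    % the point in the averaged waveform
        sem = std(epochVoltages) / sqrt(N);  % SD / sqrt(N)

        fprintf('Average = %.2f uV, standard error = %.2f uV\n', avgVoltage, sem);
        % Quadrupling N halves the standard error, because sqrt(4*N) = 2*sqrt(N)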

    In our P3b oddball experiment, 20% of the trials were oddballs, so there were 4 times as many Frequent trials as Rare trials. This means that sqrt(N) was twice as great for the Frequent condition as for the Rare condition (because sqrt(4) = 2). And this implies that the standard error should be half as large for the Frequent condition as for the Rare condition. The SME value is a generalized metric of the standard error; it gives you the standard error for any amplitude or latency measure that is obtained from an averaged ERP waveform (see the box below for more details).

    Take a look at the aSME values for the Rare and Frequent conditions. You should see that the values are approximately half as large for the Frequent condition (Bin 2) as for the Rare condition (Bin 1). That is, the noise (quantified as the SME) is about half as big in the condition with four times as many trials. This is exactly what we would expect from the idea that the signal-to-noise ratio varies according to the square root of the number of trials.

    The SME is just an estimate, so we wouldn’t expect it to be perfectly predicted by the number of trials in a finite data set. To get a more robust estimate, I averaged the aSME values across all the time ranges and channels, which yielded a mean of 0.617 for the Frequent condition and 1.372 for the Rare condition. This isn’t quite a 1:2 ratio. But there’s a good explanation: although there were 4 times as many Frequent trials as Rare trials in the experiment, we ended up excluding more Rare trials from the averages because of incorrect responses. Take a look at the actual number of trials in Bins 1 and 2. Does the difference between the average aSME values for the Rare and Frequent conditions make sense given the actual sqrt(N) for each condition?
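
    Here is a quick way to run that check (the trial counts below are hypothetical placeholders, not the real counts; substitute the actual numbers of accepted trials shown for Bins 1 and 2 in your ERPset):

        % Hypothetical check: the predicted ratio of the Rare to Frequent
        % aSME values is sqrt(nFrequent / nRare). Replace these placeholder
        % counts with the accepted trial counts from your own ERPset.
        nRare = 35;        % hypothetical number of accepted Rare trials
        nFrequent = 155;   % hypothetical number of accepted Frequent trials

        predictedRatio = sqrt(nFrequent / nRare);
        observedRatio  = 1.372 / 0.617;    % mean aSME values reported above

        fprintf('Predicted Rare/Frequent SME ratio: %.2f\n', predictedRatio);
        fprintf('Observed Rare/Frequent SME ratio:  %.2f\n', observedRatio);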

    Some details about the SME

    With ERPLAB’s default settings, the SME values indicate the standard error of the amplitude scores that you’d get by quantifying the amplitude as the mean value in a set of consecutive 100-ms time periods. You can easily change the parameters (in the averaging step) to select other time intervals. Ordinarily, we would measure the P3b as the mean amplitude between 300 and 500 ms. If you’re interested, you can have ERPLAB estimate the SME values for this time range by re-averaging the data and selecting custom parameters in the Data Quality Quantification section of the averaging routine.
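
    To make that concrete, here is a by-hand MATLAB sketch (using hypothetical random data, not the ERPLAB routine) of what an aSME value for a 300–500 ms mean-amplitude score represents:

        % By-hand sketch of the aSME for a 300-500 ms mean-amplitude score.
        % epochs is a hypothetical trials x samples matrix of single-trial EEG
        % from one channel; window indices assume a 250 Hz sampling rate and
        % an epoch that starts at 0 ms.
        rng(4);
        fs = 250;
        nTrials = 60;
        nSamp = 200;                                    % hypothetical epoch length
        epochs = randn(nTrials, nSamp);

        winIdx = round(0.300*fs)+1 : round(0.500*fs);   % samples spanning 300-500 ms
        scores = mean(epochs(:, winIdx), 2);            % mean amplitude per trial

        aSME = std(scores) / sqrt(nTrials);             % SD / sqrt(N) of the scores
        fprintf('aSME for the 300-500 ms mean amplitude: %.3f uV\n', aSME);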

    ERPLAB’s default SME values are estimated using the analytic formula for the standard error of the mean [SD ÷ sqrt(N)], so we call these analytic SME (aSME) values. This formula can’t be used to estimate the SME for other amplitude or latency scores (e.g., peak amplitude or peak latency), so a technique called bootstrapping is used instead. Bootstrapping is more complicated and currently requires scripting. It is described in our original paper on the SME (Luck et al., 2021), and we have provided example scripts for computing bootstrapped SME values (https://doi.org/10.18115/D58G91).
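
    For readers who are curious, here is a minimal sketch of the bootstrapping logic in plain MATLAB (an illustration with hypothetical random data, not our published script): trials are resampled with replacement, each resample is averaged, the score of interest is measured from each resampled average, and the SD of those scores serves as the bootstrapped SME.

        % Minimal bootstrap sketch: estimate the SME for a peak-amplitude
        % score by resampling trials with replacement, averaging each
        % resample, and measuring the peak of each resampled average.
        rng(5);
        nTrials = 60;
        nSamp = 200;
        epochs = randn(nTrials, nSamp);       % hypothetical epoched data
        nBoot = 1000;                         % number of bootstrap samples

        bootPeaks = zeros(nBoot, 1);
        for b = 1:nBoot
            idx = randi(nTrials, nTrials, 1); % resample trials with replacement
            bootAvg = mean(epochs(idx, :), 1);% averaged waveform for this resample
            bootPeaks(b) = max(bootAvg);      % peak amplitude of that average
        end

        bSME = std(bootPeaks);                % SD of the scores = bootstrapped SME
        fprintf('Bootstrapped SME for peak amplitude: %.3f uV\n', bSME);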


    This page titled 6.7: Exercise - The Signal-to-Noise Ratio is shared under a CC BY 4.0 license and was authored, remixed, and/or curated by Steven J Luck directly on the LibreTexts platform.