In Section 13.2, we examined some evidence that the part of the brain that processes auditory information is sensitive to phonological categories. Critically, in the study by Phillips and colleagues (2000), the participants were English speakers who have separate /t/ and /d/ phonemes as a part of their mental grammar. We could expect that for a speaker of a language that doesn’t have that distinction, the pattern of brain reactions could be quite different.
Researchers have studied how a person’s native language can influence their processing of vocal language. For example, Lahiri and Marslen-Wilson (1991) asked whether Bengali speakers and English speakers would process nasal and non-nasal vowels differently. Both English and Bengali have nasal vowels, but the nasal/oral distinction is phonemic (in other words, it can create a contrast in meaning) only in Bengali.
For example, the English word ban is typically pronounced with a nasal vowel ([bæ̃n]) because of a phonological process called nasalization. The vowel becomes nasal because of the influence of the upcoming nasal consonant /n/. The vowel in bad is not nasalized because /d/ is an oral consonant. So in English, nasal vowels are predictable based on the phonological environment they are in: before a nasal consonant, the vowel is nasalized, and elsewhere the vowel is oral. Therefore, [æ̃] and [æ] are variants — allophones — of one phoneme.
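The predictability of English nasal vowels can be illustrated with a minimal sketch. The segment inventories and the function name `nasalize` below are assumptions made for illustration only; they are a toy fragment, not a full description of English phonology.

```python
# Toy model of English vowel nasalization (an allophonic rule):
# a vowel surfaces as nasal when the next segment is a nasal consonant.
# The inventories below are a small, hypothetical subset of English.
NASAL_CONSONANTS = {"m", "n", "ŋ"}
VOWELS = {"æ", "i", "ɛ", "u", "ɑ"}
NASAL_MARK = "\u0303"  # combining tilde, the IPA diacritic for nasalization


def nasalize(phonemes):
    """Map a list of phonemes to surface segments."""
    surface = []
    for i, seg in enumerate(phonemes):
        nxt = phonemes[i + 1] if i + 1 < len(phonemes) else None
        if seg in VOWELS and nxt in NASAL_CONSONANTS:
            surface.append(seg + NASAL_MARK)  # vowel before a nasal: nasalized
        else:
            surface.append(seg)  # elsewhere: unchanged
    return surface


print("".join(nasalize(["b", "æ", "n"])))  # 'ban': the vowel carries a tilde
print("".join(nasalize(["b", "æ", "d"])))  # 'bad': the vowel stays oral
```

Because the rule alone decides whether the vowel is nasal, the phonemic input never needs to specify nasality, which is what it means for the two vowels to be allophones of one phoneme.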
Bengali also has a rule nasalizing vowels before nasal consonants, but is different from English in that having a nasal versus an oral vowel is not completely predictable based on the phonological environment. For example, Bengali has the minimal pair /bãd/ (which means ‘dam’) and /bad/ (which means ‘difference’) that differ only in the nasal/oral status of the vowel /a/ and yet have different meanings. This means that in Bengali, /a/ and /ã/ are separate phonemes.
Lahiri and Marslen-Wilson showed that this difference in the phonemic status of nasal and oral vowels between English and Bengali has an influence on how speakers of these languages recognize spoken words. Before we get to their experiment, let us introduce some background about spoken word recognition and their experimental method, the gating task.
Spoken words unfold over time. The human mind doesn’t wait for a word to be over before recognizing it, but rather activates potential matches from the very beginning of hearing the word. Upon hearing the first sound of a word, there will be a large number of potential matches. This number will get smaller and smaller as more of the word is heard, because potential candidates will be ruled out. For example, imagine that a listener hears the word report (/ɹipoʊɹt/). The first phoneme, /ɹ/, is compatible with lots of words: report, red, reach, robot, etc. Once /i/ is heard, then red and robot would be ruled out because they are no longer compatible with the input. The set of all potential matches that overlap with the beginning of a word up to a given point is called an onset cohort. One influential model of spoken word perception, the Cohort Model (see Marslen-Wilson & Tyler, 1980) claims that members of the onset cohort of a word become active during the hearing of the word, but that the activation for a potential match drops off once the evidence is no longer compatible with that word.
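The narrowing of the onset cohort can be sketched in a few lines of code. The four-word lexicon and its transcriptions are hypothetical, and the sketch only tracks which candidates remain compatible with the input; it does not model graded activation as the Cohort Model itself does.

```python
# Hypothetical mini-lexicon: each word is a tuple of phonemes.
LEXICON = {
    "report": ("ɹ", "i", "p", "oʊ", "ɹ", "t"),
    "red": ("ɹ", "ɛ", "d"),
    "reach": ("ɹ", "i", "tʃ"),
    "robot": ("ɹ", "oʊ", "b", "ɑ", "t"),
}


def onset_cohorts(heard, lexicon):
    """Yield (prefix, cohort) after each successive phoneme is heard."""
    cohort = list(lexicon)
    for i in range(1, len(heard) + 1):
        prefix = heard[:i]
        # Keep only the candidates whose first i phonemes match the input.
        cohort = [w for w in cohort if lexicon[w][:i] == prefix]
        yield prefix, cohort


for prefix, cohort in onset_cohorts(LEXICON["report"], LEXICON):
    print("".join(prefix), cohort)
```

With this toy lexicon, all four words are in the cohort after /ɹ/, only report and reach remain after /ɹi/, and report is the sole candidate once /p/ is heard.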
At a certain point in each spoken word, listeners (on average) will be able to correctly identify what the word will be. This is called the recognition point for that word. One way to determine a word’s recognition point is through an experimental method called a gating task. In the gating task, a recording of a word is presented to experimental participants in progressively bigger fragments. After hearing a fragment of the recording, participants are asked to guess what the word is, perhaps by writing down their guess. As you might imagine, these guesses become more accurate as the fragments become longer. Eventually a particular fragment length will provide enough information to reach a threshold where most people correctly identify the word, so the end of that fragment can be said to be the word’s recognition point.
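As a rough illustration, a recognition point could be computed from gating responses as follows. The fragment lengths, the guesses, and the 80% threshold are all invented for this sketch; actual studies choose their own criteria, and some also require that the guess stay correct at every later gate.

```python
def recognition_point(gates, target, threshold=0.8):
    """Return the end time (ms) of the first fragment at which the
    proportion of correct guesses reaches the threshold, or None.

    gates: list of (fragment_end_ms, list_of_guesses), shortest first.
    threshold: assumed criterion for 'most people'; hypothetical value.
    """
    for end_ms, guesses in gates:
        if not guesses:
            continue
        correct = sum(guess == target for guess in guesses)
        if correct / len(guesses) >= threshold:
            return end_ms
    return None


# Invented responses from five participants at four gates.
gates = [
    (50, ["red", "reach", "rope", "read", "robot"]),
    (100, ["reach", "report", "read", "reach", "repeat"]),
    (150, ["report", "report", "repeat", "report", "report"]),
    (200, ["report", "report", "report", "report", "report"]),
]

print(recognition_point(gates, "report"))  # 150
```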
Lahiri and Marslen-Wilson asked whether a listener’s knowledge of the phonology of their native language would influence their ability to recognize words as they unfold. They found that English listeners could identify whether a word was ban or bad before they heard the last consonant, because the nasal or oral quality of the vowel helped them predict what the upcoming consonant would be. Bengali listeners, on the other hand, needed more information before identifying a word with a nasal vowel, leading to a later recognition point for those words. This is presumably because, in Bengali, a word with a nasal vowel could end in a nasal consonant, like /n/, or an oral consonant, like /d/. Bengali speakers do not use the nasal or oral quality of the vowel to predict the upcoming consonant because, in their mental grammars, nasal and oral vowels are separate phonemes.
Further evidence of language-specific phonological knowledge comes from studies using ERPs and, again, the Mismatch Negativity. Dehaene-Lambertz and colleagues (2000) asked whether sequences of syllables would be processed similarly by speakers of languages with different phonotactic constraints. Remember from Chapter 4, Section 4.2 that languages have restrictions on what syllables they allow. In Japanese, for example, nasals are the only consonants allowed at the end of a syllable; in other words, oral consonants cannot be syllable codas. English and French, however, allow a variety of consonants in coda position. So what happens when a Japanese speaker listens to sequences of syllables that have an illegal coda consonant?
Following up on earlier work by Dupoux and colleagues (1999), Dehaene-Lambertz et al. presented French native speakers and Japanese native speakers with fake words like igumo and igmo. The first one, igumo, is possible with either Japanese or French phonotactics because it can be split i.gu.mo (here I have used ‘.’ to indicate a syllable boundary). The second one only fits the phonotactics of French. The sequence /gm/ is not a good syllable onset in either language, so the only potential syllabification is ig.mo. This is a possible word of French but not of Japanese, because it has a /g/ as a coda consonant.
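The coda restriction can be made concrete with a small check over syllabified forms. This is a deliberately simplified sketch: the vowel and nasal sets, the function names, and the stand-in rule for French (at most one coda consonant of any kind) are assumptions for illustration, not full accounts of either language's phonotactics.

```python
VOWELS = set("aiueo")
NASALS = set("mn")


def coda(syllable):
    """Return the consonants after the last vowel of a syllable.
    Assumes every syllable contains at least one vowel."""
    last_vowel = max(i for i, seg in enumerate(syllable) if seg in VOWELS)
    return syllable[last_vowel + 1:]


def japanese_coda_ok(c):
    # As described in the text: only a nasal may close a syllable.
    return c == "" or (len(c) == 1 and c in NASALS)


def french_coda_ok(c):
    # Toy stand-in for 'a variety of consonants': any single consonant.
    return len(c) <= 1


def is_legal(word, coda_ok):
    """Check every syllable of a '.'-delimited word against a coda rule."""
    return all(coda_ok(coda(syll)) for syll in word.split("."))


for word in ["i.gu.mo", "ig.mo"]:
    print(word, is_legal(word, japanese_coda_ok), is_legal(word, french_coda_ok))
```

Under these assumptions, i.gu.mo passes both checks, while ig.mo passes only the French one because of its /g/ coda.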
In their experiment, participants listened to sequences of fake words while the electrical signal from the surface of the scalp was recorded (EEG). The participants heard one word several times, which was then followed by either the same word again, a word that differed only in the presence or absence of /u/, or a completely different word, /igimo/. For the cases that differed only in the presence of /u/, French speakers indicated that the last word in the sequence was different from the rest, whereas Japanese speakers largely judged the words to be the same. The brain’s response echoed these judgments: the French speakers showed a response that can be interpreted as a Mismatch Negativity for the ‘deviant’ items, but the Japanese speakers did not. So why would the Japanese speakers not notice a difference between /igmo/ and /igumo/? One interpretation of this finding is that because /igmo/ doesn’t fit with the phonotactic constraints of their language, Japanese speakers mentally insert a vowel to correct the illegal coda. In other words, Japanese speakers “hear” /igumo/ rather than /igmo/. So our mental grammar can influence the way we perceive speech.
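The vowel-insertion interpretation can be sketched as a simple repair procedure. Everything here is a toy assumption made for illustration (the segment sets, the function name `perceive_japanese`, and the choice to always insert /u/); it is not a model proposed by the authors.

```python
VOWELS = set("aiueo")
NASALS = set("mn")


def perceive_japanese(segments, epenthetic="u"):
    """Insert a vowel after any oral consonant that is not followed by a
    vowel, so that no oral consonant is left in coda position."""
    out = []
    for i, seg in enumerate(segments):
        out.append(seg)
        nxt = segments[i + 1] if i + 1 < len(segments) else None
        is_oral_consonant = seg not in VOWELS and seg not in NASALS
        if is_oral_consonant and nxt not in VOWELS:
            out.append(epenthetic)  # repair the illegal coda
    return "".join(out)


print(perceive_japanese("igmo"))   # igumo
print(perceive_japanese("igumo"))  # igumo
print(perceive_japanese("igimo"))  # igimo
```

In this sketch /igmo/ and /igumo/ map onto the same percept, so no mismatch would be expected between them, whereas /igimo/ remains distinct.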
This experiment is part of a body of evidence demonstrating that our knowledge of the phonology of our native language, as a part of mental grammar, has an influence on how our brains process language.
Dehaene-Lambertz, G., Dupoux, E., & Gout, A. (2000). Electrophysiological Correlates of Phonological Processing: A Cross-Linguistic Study. Journal of Cognitive Neuroscience, 12(4), 635–647.
Dupoux, E., Kakehi, K., Hirose, Y., Pallier, C., & Mehler, J. (1999). Epenthetic vowels in Japanese: A perceptual illusion? Journal of Experimental Psychology: Human Perception and Performance, 25(6), 1568–1578. https://doi.org/10.1037/0096-1523.25.6.1568
Lahiri, A., & Marslen-Wilson, W. (1991). The mental representation of lexical form: A phonological approach to the recognition lexicon. Cognition, 38(3), 245–294. https://doi.org/10.1016/0010-0277(91)90008-R
Marslen-Wilson, W., & Tyler, L. K. (1980). The temporal structure of spoken language understanding. Cognition, 8(1), 1–71. https://doi.org/10.1016/0010-0277(80)90015-3
Phillips, C., Pellathy, T., Marantz, A., Yellin, E., Wexler, K., Poeppel, D., McGinnis, M., & Roberts, T. (2000). Auditory Cortex Accesses Phonological Categories: An MEG Mismatch Study. Journal of Cognitive Neuroscience, 12(6), 1038–1055.