Social Sci LibreTexts

2.6: All About Vowels


    2.6.1: From 2.8 Diphthongs, in Anderson's Essentials of Linguistics


    Video Script

    The last unit talked about simple vowels, where the tongue position stays pretty constant throughout the duration of the vowel. In addition to simple vowels, many languages include diphthongs, where we move our articulators while producing the vowel. This gives the sound a different shape at the end from how it begins. The word diphthong comes from the Greek word for “two sounds”.

    There are three major diphthongs in English that have quite a noticeable change in the quality of the vowel sound.

    Say these English words out loud: fly, tie, ride, smile. Now make the vowel sound [aɪ] again but hold it at the beginning [aaa]. The first part of the sound is the low front [a], but then the tongue moves up quickly at the end of the sound, ending at [ɪ]. So the [aɪ] sound is a diphthong, and it gets transcribed with two consecutive symbols: [aɪ].

    In the words now, loud, brown, the tongue again starts low and front [a], and then it moves high and to the back of the mouth, and the lips get rounded too! The second part of this diphthong is the high back rounded [ʊ]. The diphthong is transcribed like this: [aʊ].

    The third major diphthong in English occurs in words like toy, boil, coin. It starts with the tongue at the back of the mouth and lips rounded [ɔ], then moves to the front with lips unrounded. It is transcribed like this: [ɔɪ].

    Some linguists also consider the vowel sound in cue and few to be a diphthong. In this case, the vowel sound starts with the glide [j] and then moves into the vowel [u].

    In addition to these major English diphthongs, speakers of Canadian English also have a tendency to turn the mid tense vowels into diphthongs.

    For example, let’s look at the pair of vowels [e] and [ɛ] from the words gate and get. They’re both mid, front, unrounded vowels, but [e] is tense – it’s made with greater tension in the muscles of the vocal tract than [ɛ]. Canadian English speakers pronounce the lax vowel in get as a simple vowel [ɡɛt], but for the tense vowel, we tend to move the tongue up at the end: [ɡeɪt]. We do it so systematically that it’s very hard for us to hear it, but it’s always there.

    We do the analogous thing for the mid-back vowel [o] like in show and toe: at the end of the [o] vowel, the tongue moves up a little bit so we produce the vowel as [oʊ]. Notice that the lips are rounded for both parts of this diphthong.

    To sum up, a diphthong is a vowel sound that involves movement of the tongue from one position to another. Nearly all dialects of English include the three major diphthongs [aɪ], [aʊ], and [ɔɪ]. These ones are called the major diphthongs because they involve large movements of the tongue.

    In Canadian English, speakers also regularly produce diphthongs for the tense vowels, [eɪ] and [oʊ], but not all English dialects do this. Some linguists consider these ones to be minor diphthongs.

    Check Yourself

    Exercise 1

    What is the diphthong sound in the word proud?

    • [ɔɪ]
    • [oʊ]
    • [aʊ]
    • [aɪ]
    • [eɪ]

    "[aʊ]" (also known as [aw])

    Hint: If you break down the vowel sounds in that diphthong, the first one is [a], and the second one is [ʊ]. It may sound more like the glide [w] for many Mainstream American English speakers.

    Exercise 2

    What is the diphthong sound in the word rain?

    • [ɔɪ]
    • [eɪ]
    • [oʊ]
    • [aɪ]
    • [aʊ]

    "[eɪ]" (also known as [ej])

    Hint: If you break down the vowel sounds in that diphthong, the first one is [e], and the second one is [ɪ]. It may sound more like the glide [j] for many Mainstream American English speakers.

    Exercise 3

    What is the diphthong sound in the word sigh?

    • [aɪ]
    • [aʊ]
    • [oʊ]
    • [eɪ]
    • [ɔɪ]

    "[aɪ]" (also known as [aj])

    Hint: If you break down the vowel sounds in that diphthong, the first one is [a], and the second one is [ɪ]. It may sound more like the glide [j] for many Mainstream American English speakers.

    2.6.2: From 3.3 Stress and Suprasegmentals, in Anderson's Essentials of Linguistics


    Video Script

    So far all the sounds we’ve been considering are segments: the individual speech sounds that we represent with IPA symbols. But when we speak, we also include sounds that are above or beyond the level of the segments. This sound information is called prosody, or suprasegmental information, and it makes up the rhythm, timing, meter, and stress of the words and sentences that we speak. The primary pieces of suprasegmental information are the pitch of sounds, the loudness, and the length.

    The pitch of a sound is how high or low it is. We produce high pitched sounds when our vocal folds have a high-frequency vibration, and when our vocal folds vibrate more slowly, the resulting sound is lower in pitch.

    Some languages use pitch information to signal changes in word meaning. If a language uses pitch this way, the pitch information is called tone. These example words are from Yoruba, a language spoken in Nigeria. If you look just at the segmental level, these words all seem to be transcribed the same. But speakers of Yoruba vary their pitch when they speak these words so that the meaning of the word changes depending on whether the second syllable has a high tone, a mid tone, or a low tone. Probably the best-known tone language is Mandarin, which has five different tones. Looking at these five words, you can see that they contain the same segments, but it’s the tones that distinguish their meaning.

    Languages also use pitch in another way, not to change word meaning, but to signal information at the level of the discourse, or to signal a speaker’s emotion or attitude. When pitch is used this way, it’s called intonation rather than tone. English uses pitch for intonation — let’s look at some examples.

    Sam got an A in Calculus.
    Sam got an A in Calculus!
    Sam got an A in Calculus?
    Sam? got an A? in Calculus?

    All of these sentences contain the same words (and the same segments) but if we vary the intonation, we convey something different about the speaker’s attitude towards the sentence’s meaning. Notice that we sometimes use punctuation in our writing to give some clues about a sentence’s prosody.

    Another component of suprasegmental information is the length of sounds. Some sounds are longer than others. Listen carefully to these two words in English: beat, bead. The vowel sound in both words is the high front tense vowel [i]. But in bead, the vowel is a little longer. This is a predictable process in English: vowels get longer when there’s a voiced sound in the coda of the syllable. The diacritic to indicate that a segment is long looks a bit like a colon [iː].
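
    The lengthening rule just described can be written as a small procedure. The following is only a rough sketch in Python: the function name and the two sound inventories are assumptions made for illustration, it treats every vowel as a single symbol (so diphthongs are not handled), and it only makes sense for one-syllable words, where any consonant after the vowel is in the coda.

```python
# Illustrative inventories only (assumed, not a full description of English).
VOWELS = set("iɪeɛæaʌəuʊoɔ")
VOICED_CONSONANTS = set("bdɡvzðʒmnŋlɹ")

LENGTH_MARK = "\u02D0"  # the IPA length diacritic, written ː


def lengthen_before_voiced_coda(segments):
    """Return a copy of a one-syllable word (a list of IPA symbols) in which
    a vowel directly followed by a voiced consonant carries the length mark."""
    result = list(segments)
    for i in range(len(segments) - 1):
        if segments[i] in VOWELS and segments[i + 1] in VOICED_CONSONANTS:
            result[i] = segments[i] + LENGTH_MARK
    return result


beat = lengthen_before_voiced_coda(["b", "i", "t"])  # voiceless coda: vowel unchanged
bead = lengthen_before_voiced_coda(["b", "i", "d"])  # voiced coda: vowel is lengthened
print("".join(beat), "".join(bead))
```

    Because the rule is predictable from the following consonant, the length mark never has to be memorized word by word; it can always be computed from the segments.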

    So a sound can change in length as the result of a predictable articulatory process, or, like intonation, length can signal discourse-level information about an utterance. Consider the difference between, That test was easy, and, That test was eeeeeeeeeeeeeaaasyyyyyyyy. Some languages use length contrastively, that is, to change the meaning of a word. In these words in Yapese, a language of the Western Pacific region, you can see that making a vowel long leads to a completely different word with a new meaning. In these words from Italian, consonant length can change the meaning of a word, so fato means fate, but fatto means fact.

    In English, pitch, loudness and length also contribute to the stress pattern in words. English words that are longer than one syllable usually alternate between stressed and unstressed syllables. Stressed syllables are more prominent than unstressed syllables, and what makes them prominent is that they’re louder, longer, and higher in pitch than unstressed syllables. Here are some examples.

    The words happy, music, sweater have primary stress on the first syllable, while the words beside, around, descend are stressed on the second syllable. If you’re having a hard time hearing the stress difference, try humming the words to hear the difference in pitch. Stress on the first syllable sounds like this [humming] and stress on the second syllable sounds like this [humming].

    Being able to identify stressed syllables is important when we’re learning to do phonetic transcription, because in English, stressed syllables usually get pronounced with a full vowel, while the vowel in unstressed syllables gets reduced. What does it mean to be reduced? That short mid-central vowel that has the name schwa and the symbol [ə] like an upside-down “e” is the most neutral vowel in English. So the “uh” sound in the first syllable of banana gets transcribed with a schwa because it’s unstressed, but the “uh” in bunny gets a full vowel because it’s in a stressed syllable. We’ll see later in this chapter that stress makes a difference to alveolar stops and to aspirated consonants as well!

    To sum up, suprasegmental information, also known as prosody, is that sound information that’s above the level of the segment. It consists of pitch, loudness, and length. Many languages use prosody to provide discourse-level information, and some languages also use prosody to change word meanings.

    Check Yourself

    Exercise 4

    Young children’s voices are usually recognizably different from adults’ voices. Which factor is likeliest to be different between children’s speech and adults’ speech?

    • Word length.
    • Tone.
    • Pitch.


    Hint: Think about when children speak, and how the pitch of their speech is less varied than what adults produce.

    Exercise 5

    In English, yes-no questions often conclude with rising pitch, whereas wh-questions often have a falling pitch on the final words. Is this pitch difference a difference in tone or in intonation?

    • Tone.
    • Intonation.
    • Pitch.


    Hint: "Intonation" is more associated with syntactic elements, like question formation. Tone and pitch are more lexical, typically. Also: English is not a tone or pitch language.

    Exercise 6

    English uses pitch as one factor in syllable stress. There are many English pairs of words like record (noun) and record (verb), which are spelled the same but differ in their stress patterns. Which of the following is true for this pair of words?

    • The first syllable has higher pitch than the second in the noun record.
    • The second syllable has higher pitch than the first in the noun record.

    "The first syllable has higher pitch than the second in the noun record."

    Hint: Think about which syllable is a bit higher or louder when you say the noun record.

    2.6.3: From 3.4 Syllable Structure, in Anderson's Essentials of Linguistics


    Video Script

    In a previous unit we saw that a syllable is a peak of sonority surrounded by less sonorous sounds. We know that sonority is acoustic energy, and now that we understand how speech is produced, we know that the most sonorous sounds, the ones that have the most acoustic energy, are the sounds that are produced with the vocal tract unobstructed. The most sonorous sounds are vowels. Consonants, on the other hand, have an obstruction in the vocal tract so they’re less sonorous. So we might also think of a syllable as a vowel surrounded by some consonants. That’s a good beginning definition, but it’s a little more complex than that, as we’ll see in this unit and the next. Our mental grammar doesn’t just organize words into syllables, but it also structures what’s inside a syllable. Let’s take a look. The name for the most sonorous part of a syllable is the nucleus. In a typical syllable, the nucleus will be a vowel, produced with an unobstructed vocal tract. The segments that come before the nucleus are called the onset, and if there are any segments after the nucleus they’re called the coda. The nucleus and coda together form a unit that we call the rhyme, and linguists like to use the Greek letter sigma (σ) to label the entire syllable.

    Let’s look at how this works in some English words. When we say a word is “monosyllabic” that just means that it has one syllable. We’ll start with a nice simple word like big [bɪɡ]. The nucleus is the most sonorous part, so in this word, the vowel [ɪ] is the nucleus. The consonant that comes after the vowel nucleus [ɡ] is the coda, and the consonant that comes before [b] is the onset. The only part of a syllable that always has to be there is the nucleus. Some syllables have an onset but no coda, like the word day [deɪ], and some syllables have a coda but no onset, like the word eat [it]. And the occasional syllable has neither an onset nor a coda, just a nucleus, like the word I [aɪ]!

    What about a single-syllable word that has more consonants in it? Let’s look at screens. Again, the vowel [i] is the nucleus of this syllable, and the consonants [nz] that come after the nucleus form the coda. There are three consonants [skɹ] before the nucleus, and they form the onset. When there’s a group of consonants in the onset or coda we call them a cluster.
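
    One way to make the onset, nucleus, coda and rhyme labels concrete is to compute them for the monosyllabic examples above. This is a minimal sketch in Python under stated assumptions: the function name and the short vowel list are invented for illustration, each diphthong is treated as one segment, and the word is assumed to contain exactly one vowel.

```python
# Illustrative vowel inventory (assumed); diphthongs count as single segments.
VOWELS = {"i", "ɪ", "ɛ", "æ", "ʌ", "ə", "u", "ʊ", "ɔ",
          "eɪ", "oʊ", "aɪ", "aʊ", "ɔɪ"}


def parse_monosyllable(segments):
    """Split a one-syllable word (a list of IPA segments) into its parts."""
    vowel_positions = [i for i, seg in enumerate(segments) if seg in VOWELS]
    if len(vowel_positions) != 1:
        raise ValueError("expected exactly one vowel nucleus")
    n = vowel_positions[0]
    return {
        "onset": segments[:n],       # everything before the nucleus (may be empty)
        "nucleus": segments[n],      # the sonority peak
        "coda": segments[n + 1:],    # everything after the nucleus (may be empty)
        "rhyme": segments[n:],       # nucleus and coda together
    }


for word in (["b", "ɪ", "ɡ"], ["d", "eɪ"], ["i", "t"], ["aɪ"],
             ["s", "k", "ɹ", "i", "n", "z"]):
    print(parse_monosyllable(word))
```

    Only the nucleus is required: for day the coda comes back empty, for eat the onset does, and for I both do.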

    Monosyllabic words are pretty straightforward. How does it work with words that have more than one syllable, like raptor? It’s got two syllables, so it has two nuclei [æ] [ə]. The consonant at the beginning of the word [ɹ] is the onset of the first syllable, and the consonant at the end of the word [ɹ] is obviously the coda of the second syllable. What about these two consonants in the middle? In the word raptor, the [p] is the coda of the first syllable and [t] is the onset of the second syllable, but there are other logical possibilities. We could just as easily say that the first syllable has a coda cluster [pt], or that the second syllable has an onset cluster [pt]. How does the mental grammar organize consonants in the middle of a multi-syllabic word?

    Well, it’s not random, and the mental grammar doesn’t just try to distribute consonants evenly. There’s a systematic principle that operates in the mental grammar, which is that onsets are greedy. To see what that means, let’s look at a word that has a bunch of consonants in the middle, like emblem. There are three consonants [mbl] in the middle of this word, so there are four logical possibilities for how they could be organized. It could be that all the consonants go in the onset of the second syllable. It could be that they all go in the coda of the first syllable, or they could be divided up between the coda of the first and the onset of the second, with a couple of possible permutations. What does the mental grammar do with these consonants?

    The principle that onsets are greedy means that an onset will take as many consonants as it can. So this first option here has the greediest onset: it has the greatest number of consonants in an onset position. But it looks pretty weird, doesn’t it, to have a syllable start with [mbl]? A greedy onset takes as many consonants as it can within the grammar of that language. It’s a principle of English grammar that words don’t begin with a cluster like [mbl], and neither do syllables. Of these four options, the one that has the greediest onset that is possible within English is this one: the [m] is the coda of the first syllable, and the consonant cluster [bl] is the onset of the second syllable.

    Let’s look at one more example to illustrate this idea that onsets are greedy. Consider the word ugly. The two vowels [ʌ] [i] form the two nuclei of the syllables; there’s no onset for the first syllable, and no coda for the second syllable. So there are three logical possibilities for these middle consonants [ɡl] — they could both be the coda; they could both be the onset; or they could split the difference. Which does the mental grammar do? The onset is greedy, so it wants to take as many consonants as it can. We know that [ɡl] is a possible onset in English, because there are lots of words that start with [ɡl], like glue, glass, glamour. So because [ɡl] is a possible, grammatical onset cluster in English, the onset of the second syllable takes all of it, and leaves no consonants in the coda of the first syllable.
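
    The "onsets are greedy" principle can be sketched as a short procedure: for each group of consonants between two nuclei, try the largest possible onset first and shrink it from the left until it is an onset the language allows. The Python below is an illustration only. The function name is made up, and the list of legal onsets is a tiny assumed sample that covers just the example words, not a real inventory of English onsets.

```python
# Illustrative vowel set and a deliberately tiny sample of legal English onsets.
VOWELS = {"ɛ", "ə", "ʌ", "i", "æ", "ɪ"}
LEGAL_ONSETS = {
    ("b",), ("l",), ("m",), ("p",), ("t",), ("ɡ",), ("ɹ",),
    ("b", "l"), ("ɡ", "l"),
}


def syllabify(segments):
    """Split a word (a list of IPA segments) into syllables, giving each
    non-initial syllable the longest onset found in LEGAL_ONSETS."""
    nuclei = [i for i, seg in enumerate(segments) if seg in VOWELS]
    if not nuclei:
        raise ValueError("no vowel nucleus found")
    starts = [0]  # index at which each syllable begins
    for prev, nxt in zip(nuclei, nuclei[1:]):
        start = nxt  # fallback: the syllable has no onset at all
        # Greediest candidate first: every consonant between the two nuclei.
        for k in range(prev + 1, nxt):
            if tuple(segments[k:nxt]) in LEGAL_ONSETS:
                start = k
                break
        starts.append(start)
    ends = starts[1:] + [len(segments)]
    return [segments[a:b] for a, b in zip(starts, ends)]


for word in (["ɛ", "m", "b", "l", "ə", "m"],    # emblem
             ["ʌ", "ɡ", "l", "i"],              # ugly
             ["ɹ", "æ", "p", "t", "ə", "ɹ"]):   # raptor
    print(".".join("".join(syll) for syll in syllabify(word)))
```

    With this sample list, emblem splits as [ɛm.bləm] because [mbl] is rejected and [bl] is accepted, ugly as [ʌ.ɡli], and raptor as [ɹæp.təɹ]. A different language would need a different list of legal onsets, and the same procedure would then give different boundaries.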

    Let’s sum up. Syllables are units within words, and they also have an inner structure of their own. Every syllable has a nucleus, which is the most sonorous part of the syllable: a vowel or another sonorous sound. If there are consonants, which are less sonorous, they make up the onset and coda of the syllable. And in the middle of a word, onsets are greedy: they’ll take as many consonants as they can, within the constraints of the grammar of the language.

    Check Yourself

    Exercise 7

    In the following transcriptions, a dot [.] represents a potential syllable boundary. Which one shows the syllable boundary in the correct location for the word dispute?

    • [ dɪsp . jut ]
    • [ dɪ . spjut ]
    • [ dɪspj . ut ]
    • [ dɪs . pjut ]

    "[ dɪs . pjut ]"

    Hint: Think about where a 'natural break' is in that word, and that is where the syllable boundary will be. It is connected to the morphology--more on that in the next chapter.

    Exercise 8

    In the following transcriptions, a dot [.] represents a potential syllable boundary. Which one shows the syllable boundary in the correct location for the word melting?

    • [ mɛ . ltɪŋ ]
    • [ mɛl . tɪŋ ]
    • [ mɛlt . ɪŋ ]

    "[ mɛlt . ɪŋ ]"

    Hint: Think about where a 'natural break' is in that word, and that is where the syllable boundary will be. It is connected to the morphology--more on that in the next chapter.

    Exercise 9

    In the following transcriptions, a dot [.] represents a potential syllable boundary. Which one shows the syllable boundary in the correct location for the word access?

    • [ æ . ksɛs ]
    • [ æk . sɛs ]
    • [ æks . ɛs ]

    "[ æk . sɛs ]"

    Hint: Think about where a 'natural break' is in that word, and that is where the syllable boundary will be. It is connected to the morphology--more on that in the next chapter.

    2.6.4: From 3.5 Syllabic Consonants, in Anderson's Essentials of Linguistics


    Video Script

    Do you remember our definition of a syllable from a couple of units ago? We said that a syllable has a nucleus: the peak of sonority, which is surrounded by less sonorous sounds. We already know that vowels are the most sonorous sounds, so most syllables have a vowel as the nucleus. We know that glides are also fairly sonorous, but they’re too short to serve as the nucleus of a syllable. Thinking about all the consonant sounds we know, some of them are more sonorous than others. Stops are not very sonorous because they have so little airflow because the vocal tract is completely obstructed. And fricatives also aren’t very sonorous because of the obstruction in the vocal tract. But nasal consonants are quite sonorous because the airflow resonates through the nasal cavity even when the oral cavity is stopped. And the liquids, [l] and [ɹ], are also quite sonorous because air is allowed to flow around the tongue.

    These sonorous consonants can sometimes serve as the nucleus of a syllable in their own right. In other words, there are some syllables that don’t have a vowel at all, just a sonorous consonant. Let’s look at some examples.

    In the word rhythm, the second syllable is unstressed, and it’s pretty short. Most of the time, in ordinary rapid speech, that second syllable doesn’t have a vowel in it at all. Our articulators go right from the [ð] sound at the end of the first syllable into the [m] sound. The [m] itself becomes the nucleus of the syllable. It is said to be a syllabic consonant, and we use a special notation to transcribe it: [ɹɪðm̩]. Look at that little vertical line below the [m] symbol — that’s called a diacritic. Diacritics are special additional notations we add to IPA symbols to give extra information about the sounds. That vertical line is the diacritic for a syllabic consonant.

    Here’s an example of a liquid consonant becoming syllabic. When we speak the word funnel, we don’t produce a vowel in the second, unstressed syllable. Instead, we pronounce the [l] as a syllabic [l̩], so that it is the nucleus of the syllable. The notation is the same, with the diacritic for the syllabic [l̩]: [fʌnl̩].
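
    The idea that a nucleus is a peak of sonority, whether it is a vowel or a sonorous consonant, can be illustrated by ranking sounds and looking for local peaks. This Python sketch rests on several assumptions: the numeric ranks are invented for illustration (stops lowest, then fricatives, nasals, liquids, glides, and vowels highest), the function names are hypothetical, and the method is known to go wrong on onsets like [sk], where [s] is more sonorous than the stop that follows it.

```python
# Assumed sonority classes, from least to most sonorous.
CLASSES = ["ptkbdɡ", "fvθðszʃʒh", "mnŋ", "lɹ", "jw", "iɪeɛæaʌəuʊoɔ"]
SONORITY = {sound: rank for rank, sounds in enumerate(CLASSES) for sound in sounds}
VOWEL_RANK = len(CLASSES) - 1

SYLLABIC_MARK = "\u0329"  # the small vertical line written below a syllabic consonant


def sonority_peaks(segments):
    """Return the positions of segments more sonorous than both neighbours."""
    peaks = []
    for i, seg in enumerate(segments):
        left = SONORITY[segments[i - 1]] if i > 0 else -1
        right = SONORITY[segments[i + 1]] if i + 1 < len(segments) else -1
        if SONORITY[seg] > left and SONORITY[seg] > right:
            peaks.append(i)
    return peaks


def mark_syllabic_consonants(segments):
    """Add the syllabic diacritic to every consonant that is a sonority peak."""
    result = list(segments)
    for i in sonority_peaks(segments):
        if SONORITY[segments[i]] < VOWEL_RANK:
            result[i] = segments[i] + SYLLABIC_MARK
    return result


print("".join(mark_syllabic_consonants(["ɹ", "ɪ", "ð", "m"])))  # rhythm
print("".join(mark_syllabic_consonants(["f", "ʌ", "n", "l"])))  # funnel
```

    In both words the final consonant is more sonorous than the sound before it and nothing follows it, so it counts as a second peak and receives the syllabic diacritic.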

    Check Yourself

    Exercise 10

    The video indicated that the word funnel can be transcribed to indicate that the second syllable consists of a syllabic [l̩]. The word elbow is also spelled with the letters ‘el’. Say the two words to yourself several times. Which is the correct transcription for elbow?

    • [l̩boʊ].
    • [ɛlboʊ].


    Hint: There is a true vowel sound before the [l], so it's not a syllabic sound.

    Exercise 11

    The words human and manager both contain a syllable that is spelled with the letters ‘man’. In which word does that syllable contain a syllabic [n̩]?

    • Human.
    • Manager.


    Hint: The stressed syllable in human is the first one, making the second syllable unstressed with a sonorant as the nucleus, [n]. The same is not true for manager, as the [n] is an onset of a syllable.

    Exercise 12

    In the word umbrella, is the [m] syllabic?

    • Yes.
    • No.


    Hint: There is a true vowel sound before the [m], so it's not a syllabic sound.

    2.6.5: All About Vowels, from Sarah Harmon


    Video Script

    Catherine Anderson's explanation of how to break down the syllable really mirrors what I do with mine, so realistically I’m not going to say much more. The only thing that is important to understand is that different languages will have different constraints as to what can be an onset, what can be a coda, and even what can be a nucleus. Certainly, the vowel is always a candidate as far as the nucleus is concerned, but having syllabic consonants will also factor in, depending on the language, maybe even the dialect.

    What is also important to bring up is nasalization and lengthening of vowels, because different languages do different things. Lengthening a vowel means literally that you are holding it for an extra half to full beat; if you are a musician, you know what that means. Think of how you say 'sat' versus 'sad'. Really the only difference is in the vowel, whether it is long or short: 'sat', very short; 'sad', very long. Same thing with 'bit' versus 'bid': the difference is one of length. In English there are other reasons why those vowels are long. It is not a phonemic issue, meaning it is not something that is going to matter if we say 'sat' versus 'saat'; they still both mean the same thing. But in other languages it does matter, and Breton is a really great example. It's a Celtic language spoken in Brittany, that part of northern France. You can see that there are plenty of cases where vowel length is a factor: everything is exactly the same, except that the vowel is longer in one word than in the other. The difference between the term for ‘white’ and the term for ‘bees’ is just how long that vowel is. The same thing is true with the term for ‘every’ or ‘all’ versus the term for ‘owl’. Breton does this even though it has borrowed a significant number of terms from French in particular; you can probably recognize the word for ‘every’ or ‘all’ if you know French or any other Romance language. But this lengthening of vowels is a very Celtic thing to do. Most Celtic languages do have some kind of vowel lengthening phenomenon going on, so this is pretty cool; we'll come back to more cases like this pretty soon.

    Nasalization is again something that we see in French, in Breton, in Portuguese, and in a number of other languages. In these languages, the difference between an oral vowel and a nasal vowel can distinguish two totally different words. In French, the only difference between the word for ‘handsome’ and the word for ‘good’ is whether that vowel is nasalized or not: [bo] versus [bõ]. The same goes for the word for ‘low’ versus the word for a wedding announcement: [ba] versus [bã]. I do my nasalization a little bit stronger because I speak some Portuguese, but in French it's right there, and in Breton we see the same thing. Breton, by the way, probably got this from French.

    We will see more of both of these phenomena, lengthening and nasalization of vowels, as we go through a lot of different phonological patterns, as well as when we start talking about the history of these dialects and these languages.


    2.6: All About Vowels is shared under a CC BY-NC 4.0 license and was authored, remixed, and/or curated by LibreTexts.