2.3: Speech articulators

    • Catherine Anderson, Bronwyn Bjorkman, Derek Denis, Julianne Doner, Margaret Grant, Nathan Sanders, and Ai Taniguchi
    • eCampusOntario

    Overview of the vocal tract

    Spoken language is articulated by manipulating parts of the body inside the vocal tract, such as the lips, tongue, and other parts of the mouth and throat. The vocal tract is often depicted in a midsagittal diagram, a special kind of diagram that represents the inside of the head as if it were split down the middle between the eyes. Midsagittal diagrams are conventionally oriented as in Figure \(\PageIndex{1}\), with the nostrils and lips on the left and the back of the head on the right, so that we are viewing the inside of the human head from its left side. The main regions and individual articulators of the vocal tract labelled in Figure \(\PageIndex{1}\) are defined and described in more detail in the rest of this section and the following sections.

    Midsagittal view of the vocal tract, facing left, with various body parts labelled.
    Figure \(\PageIndex{1}\): Midsagittal diagram of the human vocal tract.

    Open spaces in the vocal tract

    There are three main open regions of the vocal tract. The oral cavity is the main interior of the mouth, taking up space horizontally from the lips backward. The pharynx is behind the oral cavity and tongue, forming the upper part of what we normally think of as the throat. Finally, the nasal cavity is the open interior of the head above the oral cavity and pharynx, from the nostrils backward and down to the pharynx.

    The bottom of the pharynx splits into two tubes: the trachea (also known as the windpipe), which leads down to the lungs, and the esophagus, which leads down to the stomach. The esophagus is not normally relevant for phonetics, but the trachea is important, since the vast majority of spoken language is articulated with air coming from the lungs, and as discussed later in Section 3.3, there are ways we can manipulate that airflow when it passes from the trachea to the pharynx.

    Phones as a basic unit of speech

    The pieces of the vocal tract can be articulated in various ways to create and manipulate a wide range of sounds. In the phonetics of spoken languages, we are primarily interested in studying units of speech called phones or speech sounds. It is difficult to provide a precise definition of what a phone is, either in general or for a specific spoken language, but roughly speaking, a phone in a spoken language is a linguistically significant sound, which means that it can be used as part of an ordinary word in that language. For example, the ordinary English words spill, slip, lisp, and lips each contain four phones; in fact, these words have the same four phones, just in different orders (with some slight variation in how they are pronounced; see Chapter 4 for more information).
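The claim about spill, slip, lisp, and lips can be checked with a small sketch. The transcriptions below are simplified broad transcriptions supplied here as an assumption (they ignore the slight pronunciation differences mentioned above), and the names `WORDS` and `same_phones` are hypothetical.

```python
# Simplified broad transcriptions, one list element per phone.
# These transcriptions are an illustrative assumption, not taken from the text.
WORDS = {
    "spill": ["s", "p", "ɪ", "l"],
    "slip": ["s", "l", "ɪ", "p"],
    "lisp": ["l", "ɪ", "s", "p"],
    "lips": ["l", "ɪ", "p", "s"],
}


def same_phones(words):
    """Return True if every word uses the same phones, ignoring their order."""
    inventories = [sorted(phones) for phones in words.values()]
    return all(inventory == inventories[0] for inventory in inventories)


if __name__ == "__main__":
    for word, phones in WORDS.items():
        print(word, len(phones), " ".join(phones))
    print("Same phones in every word:", same_phones(WORDS))
```

Sorting each list discards the order of the phones, so the comparison only asks which phones occur, which is the sense in which the four words share the same phones.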

    There are many other sounds we can produce with the vocal tract or even with other body parts, such as burps, snorts, and finger snaps. These are not typically studied in phonetics, because they are not known to be phones in any spoken language. However, even though they do not occur in ordinary words, they may still be used to express non-linguistic meaning. For example, in some cultures, snapping fingers can indicate quickness or a desire for attention.

    Note that spoken languages may differ in how they use phones and whether they even use the same phones at all. For example, English speakers may use clicking sounds to express disapproval (the soft teeth-sucking tsk-tsk click) or to urge a horse to go faster (the loud popping giddyup click), but they are not phones in English, because they are not used within ordinary words. However, these same sounds do occur as phones in some other languages, such as Hadza (a language isolate spoken in Tanzania; Sands et al. 1996) and isiZulu (a.k.a. Zulu, a Southern Bantu language of the Niger-Congo family, spoken in southern Africa; Poulos and Msimang 1998).

    We have to be careful about what kinds of words we look at to determine the phones of a language, because there are some marginal word-like expressions that can be used while speaking, but which may contain sounds that are not phones in the language. For example, the English word ugh is often pronounced with a rough gravelly sound that is otherwise not used in English, and we can say things like Kaoru noticed their car was making a glzzk-glzzk-glzzk sound, where glzzk is some impromptu sound produced to mimic the noise made by a vehicle in desperate need of repair.

    One of the most fundamental distinctions between phones is whether they are consonants or vowels. The next three sections address how consonants and vowels are articulated and how they are described and categorized in meaningful ways by linguists.


    Poulos, George, and Christian T. Msimang. 1998. A linguistic analysis of Zulu. Pretoria: Via Afrika.

    Sands, Bonny, Ian Maddieson, and Peter Ladefoged. 1996. The phonetic structures of Hadza. Studies in African Linguistics 25(2): 171–204.

    How Humans Produce Language, in Anderson's Essentials of Linguistics


    Video Script

    The field of phonetics studies the sounds of human speech. When we study speech sounds we can consider them from two angles. Acoustic phonetics, in addition to being part of linguistics, is also a branch of physics. It’s concerned with the physical, acoustic properties of the sound waves that we produce. We’ll talk some about the acoustics of speech sounds, but we’re primarily interested in articulatory phonetics, that is, how we humans use our bodies to produce speech sounds. Producing speech needs three mechanisms.

    The first is a source of energy. Anything that makes a sound needs a source of energy. For human speech sounds, the air flowing from our lungs provides energy.

    The second is a source of the sound: air flowing from the lungs arrives at the larynx. Put your hand on the front of your throat and gently feel the bony part under your skin. That’s the front of your larynx. It’s not actually made of bone; it’s cartilage and muscle. This picture shows what the larynx looks like from the front.

    Larynx external
    By Olek Remesz (wiki-pl: Orem, commons: Orem) [CC BY-SA 2.5-2.0-1.0], via Wikimedia Commons

    This next picture is a view down a person’s throat.

    Cartilages of the Larynx
    By OpenStax College [CC BY 3.0], via Wikimedia Commons

    What you see here is that the opening of the larynx can be covered by two triangle-shaped pieces of skin. These are often called “vocal cords” but they’re not really like cords or strings. A better name for them is vocal folds.

    The opening between the vocal folds is called the glottis.

    We can control our vocal folds to make a sound. I want you to try this out so take a moment and close your door or make sure there’s no one around that you might disturb.

    First I want you to say the word “uh-oh”. Now say it again, but stop half-way through, “Uh-”. When you do that, you’ve closed your vocal folds by bringing them together. This stops the air flowing through your vocal tract. That little silence in the middle of “uh-oh” is called a glottal stop because the air is stopped completely when the vocal folds close off the glottis.

    Now I want you to open your mouth and breathe out quietly, “haaaaaaah”. When you do this, your vocal folds are open and the air is passing freely through the glottis.

    Now breathe out again and say “aaah”, as if the doctor is looking down your throat. To make that “aaaah” sound, you’re holding your vocal folds close together and vibrating them rapidly.

    When we speak, we make some sounds with vocal folds open, and some with vocal folds vibrating. Put your hand on the front of your larynx again and make a long “SSSSS” sound. Now switch and make a “ZZZZZ” sound. You can feel your larynx vibrate on “ZZZZZ” but not on “SSSSS”. That’s because [s] is a voiceless sound, made with the vocal folds held open, and [z] is a voiced sound, where we vibrate the vocal folds. Do it again and feel the difference between voiced and voiceless.

    Now take your hand off your larynx and plug your ears and make the two sounds again with your ears plugged. You can hear the difference between voiceless and voiced sounds inside your head.
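The voiced/voiceless distinction can be written down as a simple lookup table. This is only a sketch: [s] and [z] come from the passage above, while the other entries are standard voicing values added here for illustration, and the names `VOICING`, `is_voiced`, and `voicing_pairs` are hypothetical.

```python
# Voicing of a few English consonants, keyed by IPA symbol.
# [s] and [z] are the examples from the text; the rest are standard
# values added for illustration (an assumption of this sketch).
VOICING = {
    "s": "voiceless", "z": "voiced",
    "f": "voiceless", "v": "voiced",
    "p": "voiceless", "b": "voiced",
    "m": "voiced",
}


def is_voiced(phone):
    """Return True if the vocal folds vibrate during this phone."""
    return VOICING[phone] == "voiced"


def voicing_pairs():
    """Pairs made at the same place in the same way, differing only in voicing."""
    return [("s", "z"), ("f", "v"), ("p", "b")]


if __name__ == "__main__":
    for voiceless, voiced in voicing_pairs():
        print(f"[{voiceless}] is {VOICING[voiceless]}, [{voiced}] is {VOICING[voiced]}")
```

The hand-on-larynx test described above is the physical counterpart of `is_voiced`: vibration under your fingers corresponds to `True`.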

    I said at the beginning that there are three crucial mechanisms involved in producing speech, and so far we’ve looked at only two:

    • Energy comes from the air supplied by the lungs.
    • The vocal folds produce sound at the larynx.
    • The sound is then filtered, or shaped, by the articulators.

    The oral cavity is the space in your mouth. The nasal cavity, obviously, is the space inside and behind your nose. And of course, we use our tongues, lips, teeth and jaws to articulate speech as well. In the next unit, we’ll look in more detail at how we use our articulators.

    So to sum up, the three mechanisms that we use to produce speech are:

    • respiration at the lungs,
    • phonation at the larynx, and
    • articulation in the mouth.

    Check Yourself

    Exercise \(\PageIndex{1}\)

    What is the voicing of the last sound in the word soup?

    • Voiced.
    • Voiceless.


    The reason: [p] is a voiceless sound. When we pronounce it on its own, our vocal cords don't vibrate.

    Exercise \(\PageIndex{2}\)

    What is the voicing of the last sound in the word life?

    • Voiced.
    • Voiceless.


    The reason: The last sound is [f], which is a voiceless sound. When we pronounce it on its own, our vocal cords don't vibrate.

    Exercise \(\PageIndex{3}\)

    What is the voicing of the last sound in the word seem?

    • Voiced.
    • Voiceless.


    The reason: The last sound is [m], which is a voiced sound. When we pronounce it on its own, our vocal cords vibrate.

    Articulators, in Anderson's Essentials of Linguistics


    Video Script

    We know that humans produce speech by bringing air from the lungs through the larynx, where the vocal folds might or might not vibrate. That airflow is then shaped by the articulators.

    Places of articulation
    Created by User:ish shwar (original .png deleted), .svg by Rohieb [GFDL, CC-BY-SA-3.0], via Wikimedia Commons

    This image is called a sagittal section. It depicts the inside of your head as if we sliced right between your eyes and down the middle of your nose and mouth. This angle gives us a good view of the parts of the vocal tract that are involved in filtering airflow to produce speech sounds.

    Let’s start at the front of your mouth, with your lips. If you make the sound “aaaaa” then round your lips, the sound of the vowel changes. We can also use our lips to block the flow of air completely, like in the consonants [b] and [p].

    We also use our teeth to shape airflow. They don’t do much on their own, but we can place the tip of the tongue between the teeth, for sounds like [θ] and [ð]. Or we can bring the top teeth down against the bottom lip for [f] and [v].

    If you put your finger in your mouth and tap the roof of your mouth, you’ll find that it’s bony. That is the hard palate. English doesn’t have very many palatal sounds, but we do raise the tongue towards the palate for the glide [j].

    Now from where you have your finger on the roof of your mouth, slide it forward towards your top teeth. Before you get to the teeth, you’ll find a ridge, which is called the alveolar ridge. If you use the tip of the tongue to block airflow at the alveolar ridge, you get the sounds [t] and [d]. We also produce [l] and [n] at the alveolar ridge, and some people also produce the sounds [s] and [z] with the tongue at the alveolar ridge (though there are other ways of making the [s] sound.)

    When we block airflow in the mouth but allow air to circulate through the nasal cavity, we get the nasal sounds [m], [n], and [ŋ].

    Some languages also have nasal vowels. Make an “aaaaa” vowel again, then make it nasal. [aaaaa] [ããããã]

    The articulator that you move to allow air into the nasal cavity is called the velum. You might also know it as the soft palate. For sounds made in the mouth, the velum rests against the back of the throat. But we can pull the velum away from the back of the throat and allow air into the nose. We can also block airflow by moving the body of the tongue up against the velum, to make the sounds [k] and [ɡ].

    Farther back than the velum are the uvula and the pharynx, but English doesn’t use these articulators in its set of speech sounds.

    Every different configuration of the articulators leads to a different acoustic output.
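The articulator walkthrough above can be summarized as a lookup from each sound to the articulators that shape it. This is a rough sketch: the grouping labels are shorthand chosen here, the placements of [m] (lips) and [ŋ] (velum) are standard values not spelled out in the script, and the names `PLACE`, `NASALS`, and `describe` are hypothetical.

```python
# Articulators involved in the English consonants named in the script,
# keyed by IPA symbol. The labels are shorthand chosen for this sketch.
PLACE = {
    "p": "lips", "b": "lips", "m": "lips",
    "f": "lips and teeth", "v": "lips and teeth",
    "θ": "tongue and teeth", "ð": "tongue and teeth",
    "t": "tongue and alveolar ridge", "d": "tongue and alveolar ridge",
    "l": "tongue and alveolar ridge", "n": "tongue and alveolar ridge",
    "s": "tongue and alveolar ridge", "z": "tongue and alveolar ridge",
    "j": "tongue and palate",
    "k": "tongue and velum", "ɡ": "tongue and velum", "ŋ": "tongue and velum",
}

# Sounds made with the velum lowered so that air passes through the nose.
NASALS = {"m", "n", "ŋ"}


def describe(phone):
    """Return a one-line description of where a phone is articulated."""
    airflow = "nasal" if phone in NASALS else "oral"
    return f"[{phone}]: {PLACE[phone]}, {airflow}"


if __name__ == "__main__":
    for phone in ["m", "f", "t", "j", "k"]:
        print(describe(phone))
```

Keeping nasality in a separate set from place reflects the script's point that the velum controls airflow into the nose independently of where the mouth is blocked.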

    Check Yourself

    Exercise \(\PageIndex{4}\)

    Which articulators are responsible for the first sound in the word minor?

    • Lips.
    • Lips and teeth.
    • Tongue and teeth.
    • Tongue and alveolar ridge.
    • Tongue and palate.
    • Tongue and velum.


    The reason: The first sound is [m], which is a bilabial sound. Only the lips are involved in that sound.

    Exercise \(\PageIndex{5}\)

    Which articulators are responsible for the first sound in the word wit?

    • Lips.
    • Lips and teeth.
    • Tongue and teeth.
    • Tongue and alveolar ridge.
    • Tongue and palate.
    • Tongue and velum.


    The reason: The first sound is [w], which is a bilabial sound. Only the lips are involved.

    Note: It's actually a labio-velar sound, but more on that soon.

    Exercise \(\PageIndex{6}\)

    Which articulators are responsible for the first sound in the word photography?

    • Lips.
    • Lips and teeth.
    • Tongue and teeth.
    • Tongue and alveolar ridge.
    • Tongue and palate.
    • Tongue and velum.

    "Lips and teeth"

    The reason: The first sound is [f], which is a labiodental sound. Both the bottom lip and the top teeth are involved.

    Articulators and Airstream Mechanisms, from Sarah Harmon


    Video Script

    There's only a little bit more that I’m going to add to Catherine Anderson's discussion of articulators and airstream mechanisms. It's not much more, because what she puts up there is really good and really detailed, and I can't do much better, so I’ll let her take that wheel. There are only a couple of things I want to add.

    When we talk about the air stream mechanism, yes, most all sounds that are used for human language are pulmonic, meaning that the air comes from the lungs and then goes out. It is there, and it is pushed out from the lungs all the way. There are, however, some languages that do include sounds that are not pulmonic; they can be either glottalic or velaric. Let me explain what those are.

    Glottalic sounds are heard in a number of languages that are indigenous to South America. Quechua is one of them; an early version of it was spoken in the Inca Empire, and it continues to be spoken to this day. It has, for example, an egressive [t’]. Take a regular [t] sound: with that pulmonic [t], the air is pushed all the way from the lungs. If you instead also stop the air at the glottis and eject it out forcefully, that is a glottalic sound. It's egressive, meaning the airflow is coming out from the glottis. With the egressive, the air starts from the lungs, but it is also stopped at the glottis and then ejected out, as in [t’]. It sounds like a [t], but it's got more push behind it. In theory, you could have an ingressive glottalic, but that doesn't really happen.

    Velaric sounds are made at the velum, meaning the air is stopped at the velum, at the back of the mouth, and then pushed out. There are two different kinds: ingressive and egressive. With the egressive, air goes out, and the famous one is blowing a raspberry. That is a bilabial, egressive, fricative velaric. With the ingressive velaric, we're talking about the very famous clicks that we hear in the Bantu languages. The Bantu languages are spoken in central and especially southern Africa. Zulu, for example, is a Bantu language; !Xhosa is a Bantu language; there are a few others. When I said that second language name, !Xhosa, and you heard that [click] at the front, that's a click. We use clicks all the time, but maybe not to communicate within a human language. Certainly, a lot of us in Europe and the Americas, for example, make that little kissy sound when we want to call our pet. That's a click: you're pulling in air, that pull comes from the back of the mouth, and it's a bilabial ingressive velaric. Some of us make a different kind of velaric sound when we are disapproving of something: [tsk tsk]. The tip of your tongue is up against the back of your teeth, and you're sucking in, pulling from the back of your mouth at the velum. Some of these clicks are also used when we're trying to communicate with our pack animals, like horses, oxen, or donkeys. Those are all clicks.
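The airstream mechanisms described so far can be sketched as two features: where the air movement is started, and which way the air moves. The class `Airstream`, the table `EXAMPLES`, and the helper `sounds_with` are hypothetical names for this illustration, and the example sounds are the ones mentioned in the script.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Airstream:
    initiator: str  # where the air is set in motion: "pulmonic", "glottalic", or "velaric"
    direction: str  # "egressive" (air moves out) or "ingressive" (air moves in)


# Example sounds from the script, classified by airstream mechanism.
EXAMPLES = {
    "[t]": Airstream("pulmonic", "egressive"),
    "[t’]": Airstream("glottalic", "egressive"),
    "tsk-tsk click": Airstream("velaric", "ingressive"),
    "kissy sound": Airstream("velaric", "ingressive"),
}


def sounds_with(initiator):
    """List the example sounds that use the given initiator."""
    return [name for name, airstream in EXAMPLES.items() if airstream.initiator == initiator]


if __name__ == "__main__":
    for name, airstream in EXAMPLES.items():
        print(f"{name}: {airstream.initiator} {airstream.direction}")
```

Treating initiator and direction as separate fields mirrors the way the script names each sound, for example "bilabial ingressive velaric" for the kissy sound.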

    When we're talking about human language, there are some languages, and the Bantu family is one group, that use clicks. It's really cool to see and hear. In the video below this one, you will see a song on YouTube. It's being sung by Miriam Makeba; she was a very famous South African singer. She is among a wide group of Black artists from Africa, mostly South Africa. They are all speakers of a Bantu language, and they frequently sing in those languages. A lot of it came out of the jazz movements in the 60s and 70s; Miriam Makeba certainly was part of that. I love her music; it is amazing and inspiring. Even if I cannot understand it most of the time, it's just the way she conveys sound. But what is really cool is that she recorded a very common Bantu cultural song. I can't remember exactly what it's called in !Xhosa, but it is translated into French, English, etc., as “The Click Song.” It's a song about a beetle; she's going to explain more in this video. What is really cool, though, is that she sings it in !Xhosa, and the spelling of the !Xhosa words is there, with the English under it. With the captioning you can watch, read, and listen all at the same time.

    Click languages are really awesome; they're amazing, and they are found almost exclusively in Africa: in some Bantu languages such as Zulu and !Xhosa, and in other languages such as Hadza. So where did they come from? That's the mystery, and a discussion for another time.

    "Qongqothwane" by Miriam Makeba, with captions in English and !Xhosa

    Unfortunately, the video that I originally had in this text was taken off of YouTube; there are other videos of the same performance, but the captions are not there. Fortunately, I found a different performance of the same song with the captions in both !Xhosa and English. However, the video quality is quite poor. 

    This page titled 2.3: Speech articulators is shared under a CC BY-NC-SA 4.0 license and was authored, remixed, and/or curated by Catherine Anderson, Bronwyn Bjorkman, Derek Denis, Julianne Doner, Margaret Grant, Nathan Sanders, and Ai Taniguchi (eCampusOntario) via source content that was edited to the style and standards of the LibreTexts platform; a detailed edit history is available upon request.