Social Sci LibreTexts

8.9: Language Families

    8.8.1 Language Families, from Sarah Harmon

    Video Script

    Let's talk a little bit about those language families. When we talked about typology and historical linguistics, even throughout the entire course, I kept referring to these language families. So, what are they and what kinds of things do we know about what they do? It's really interesting to see how all of this plays out.

    This is a map of the world, and you will notice how many different colors there are; all of those colors refer to major language families.

    A map of the different language families around the world.

    In a couple of cases, specifically the ones that you see listed out over here on the left (Indo-European, Sino-Tibetan, Afro-Asiatic, and Altaic), we've broken them down a little bit more into specific languages. Even if you consider all of this list on the right side, look at the sheer number of colors that you have. It is unreal when we think about human migration. Recall the stat that I ended the previous section with: about 40% of the human population speaks Mandarin, Spanish, English, Bengali, Arabic, Portuguese, Russian, and Japanese. You see how widespread these languages are; this is an amazing graphic. It really speaks to the human existence, that we go all over the place, and as we go and create more isolation, those initial dialects that we speak eventually evolve into different languages. Then those languages are part of a language family; those language families have certain characteristics.

    Alright, let's get started.


    The language family I have probably brought up the most, and the one you're probably most familiar with, is the Indo-European language family.

    Indo-European Family Tree

    This map shows you where modern English is, and I'm just highlighting Spanish right here, but you get the point. This is pretty much most of the Indo-European languages, at least the major players. We believe that Proto-Indo-European was spoken around 3500 to 3000 BCE. It might be a little older than that; we honestly are starting to lean more towards 4000 BCE. Even still, it is most probably the case that it took about 2000 years of migration to get to these major language families within Indo-European. We have Indian or Indic, Armenian, Iranian or Vedic, Germanic, Balto-Slavic, Albanian, Celtic, Hellenic, and Italic. Those are the main language families within Indo-European. It is interesting to note a few things:

    • Albanian and Armenian are linguistic isolates within the Indo-European family. To our best understanding, they seem to be independent migrations that just self-isolated, and we don't know why.
    • Everything else shows a period of migration. For example, in the Indic group, we have Sanskrit as the main ancient language that you probably have heard of. When we talk about the modern languages of Hindi, Bengali, Gujarati, and Urdu, these are all Indic languages of some kind, and they have all come about, certainly in the Common Era.
    • Iranian or Vedic is also part of this subfamily. This is also called Aryan. We have Avestan, which is a sibling language to Old Persian/Old Farsi.
    • Balto-Slavic seems to have split up very early. I love it when Slavic speakers of some kind say, “Lithuanian and Latvian look totally different and sound totally different; they can't be in the same family as Slavic.” Keep in mind that it split off very early; we do know that Latvian and Lithuanian in particular are very similar but have some very big differences.
    • Celtic seems to have been split up between Insular Celtic and Continental Celtic. Breton is part of Continental Celtic and Welsh seems to be pretty closely related to that, although it tends to split the difference. Irish and Scots Gaelic seem to have more in common with each other.
    • From the Hellenic branch, we only have Greek now, although there were other languages in this group.
    • The Italic group is what I brought up earlier with respect to Latin. Latin had sibling languages; Oscan and Umbrian are two of them. This goes back into the history of early Roman society, before the Empire and before the Republic; we're talking about the Roman kings, and even before that.
    • Of course, the Germanic branch has a wide range as well.
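    As a minimal sketch, the branches named in the list above can be written down as a simple lookup table. The variable and function names here are hypothetical, the table contains only languages mentioned in this script (with English placed under Germanic as an assumption drawn from general knowledge), and it is a flat mapping rather than a real family tree with dates or sub-branches.

```python
# Branch membership as named in this script; a flat sketch, not a full tree.
# INDO_EUROPEAN, branch_of, and siblings are illustrative names only.
INDO_EUROPEAN = {
    "Indic": ["Sanskrit", "Hindi", "Bengali", "Gujarati", "Urdu"],
    "Iranian": ["Avestan", "Old Persian"],
    "Balto-Slavic": ["Lithuanian", "Latvian"],
    "Celtic": ["Breton", "Welsh", "Irish", "Scots Gaelic"],
    "Hellenic": ["Greek"],
    "Italic": ["Latin", "Oscan", "Umbrian"],
    "Albanian": ["Albanian"],
    "Armenian": ["Armenian"],
    "Germanic": ["English"],  # assumption: not listed in the bullets above
}


def branch_of(language):
    """Return the branch that lists `language`, or None if it is not in the sketch."""
    for branch, languages in INDO_EUROPEAN.items():
        if language in languages:
            return branch
    return None


def siblings(language):
    """Return the other languages listed in the same branch as `language`."""
    branch = branch_of(language)
    if branch is None:
        return []
    return [other for other in INDO_EUROPEAN[branch] if other != language]


print(branch_of("Welsh"))
print(siblings("Latin"))
```

    Asking for the siblings of Latin returns Oscan and Umbrian, which matches the point above that Latin was one of several Italic languages.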


    We cannot talk only about Indo-European languages. One of the other big groups that I’ve talked about quite often is the Sino-Tibetan family. Again, we're talking about an enormous group. The bulk of the attention goes to Mandarin and the other languages of China. But there are so many, as you can see.

    Sino-Tibetan Language Family

    One of my absolute favorite things to point out is that if you go from Old Chinese, you have four major branches: Middle Chinese, Old Wu-Min, Old Xia, and Old Chu. This explains so much of what modern languages are in this part of the world. Take Hakka, which is a language that is spoken throughout a lot of places, including Taiwan, parts of southern China, and going into the diaspora in Indonesia, Singapore, Thailand, and Burma. That is a radically different language than even Mandarin or Cantonese. Cantonese, it should be pointed out, is very different than Mandarin, and Mandarin has had multiple versions; not just the dialect of Beijing, but you have Mandarin as spoken in Taiwan, which is influenced by indigenous Taiwanese, and you have the Mandarin spoken in the diaspora as well. From Old Wu-Min, you have languages like Teochew, Hokkien, and Hainanese; these are very well represented in, again, Indonesia, Singapore, and throughout Southeast Asia, as various peoples have left what we consider to be China and gone south or east to Taiwan. Hakka is over here, and Huizhou is over here as well.

    Don’t forget the Tibeto-Burman side; in fact, it is hypothesized that this language family got its start in the area known as Tibet, not what we consider to be the rest of China. It is actually in the mountains of Tibet that we believe this language family to have started. Of course, Tibetan is part of this branch, although you probably have never heard of most of the languages that you see here. Tibetan and Classical Tibetan are here. Burmese is right there; Arakanese is right there as well. You have a number of languages from Southeast Asia, but you'll notice that there are no languages from Vietnam, Thailand, Laos and these areas; those are coming up.

    Languages of the African Continent

    We talked a little bit earlier about the various African languages.

    African Languages Map

    We see the Afro-Asiatic family in the northern Saharan area. In Sub-Saharan Africa we see the Niger-Congo family, for the most part, with Nilo-Saharan kind of in the middle. Whether Khoisan is a part of Niger-Congo or not is something that's still being debated; I'm not going to go into the debate, but suffice it to say that it seems to act very differently. The Bantu languages are part of this area; they are connected to the Khoisan languages, but they're different. They're Niger-Congo languages, but they're different, and this area is still being worked on.


    I also brought up the Afro-Asiatic family many times.

    Afro-Asiatic Language Family Tree

    One of the very big hallmarks of this family is those consonantal roots. For example, [ktb] is the root for 'to write' in both Arabic and Hebrew, and I believe Amharic also, and the vowels get stuck in as part of the morphology; the derivation and the inflection are done through the vowels. This family also includes ancient Egyptian; it's important to understand that it was an Afro-Asiatic language. For the Cushitic languages, now we have Oromo, but we had others as well, and they are historic languages of that North-East African quadrant. When we talk about the languages of the southern part of the Sahara, Hausa in particular, this is a Northwest African Saharan language. We have the Semitic languages of Akkadian, Hebrew, and Aramaic; if you are into a number of Old Jewish or Old Testament readings, you know that Aramaic is what was frequently spoken. Of course, Arabic is one of them, and it should be noted, there is an optional lesson just on Arabic languages (yes, I said Arabic languages, I didn't say Arabic, and there's a reason for that).
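    To make the root-plus-vowels idea concrete, here is a minimal sketch of how one three-consonant root can be slotted into different vowel patterns. The function name and the numbered-slot template notation are hypothetical conveniences, and the Arabic forms are standard transliterations that are not taken from this script; real Semitic morphology involves many more patterns and sound changes than this toy shows.

```python
def interdigitate(root, template):
    """Fill slots 1, 2, 3 in `template` with the three consonants of `root`.

    `root` is written with hyphens, e.g. "k-t-b"; every other character
    in `template` (vowels, prefixes) is copied through unchanged.
    """
    consonants = root.split("-")
    if len(consonants) != 3:
        raise ValueError("this sketch only handles three-consonant roots")
    pieces = []
    for ch in template:
        if ch in "123":
            pieces.append(consonants[int(ch) - 1])
        else:
            pieces.append(ch)
    return "".join(pieces)


# Illustrative Arabic patterns built on the root k-t-b 'write'.
TEMPLATES = {
    "1a2a3a": "he wrote",
    "1i2ā3": "book",
    "1ā2i3": "writer",
    "ma12a3": "office, desk",
    "ma12ū3": "written",
}

for template, gloss in TEMPLATES.items():
    print(interdigitate("k-t-b", template), "=", gloss)
```

    The consonants stay fixed in order while the vowels (and sometimes a prefix) change around them, which is the pattern described above.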


    I brought up many times Austronesian or Australio-Pacific; those terms are used interchangeably.

    Austronesian Language Family Map

    It is a very fascinating language family; this is the language family that has infixation and tons of reduplication. It has so many amazing aspects to it. The only one that you do not see here is the language of the Rapa Nui, which is the indigenous name for Easter Island and that area kind of off the coast of Chile, although, to be honest, it’s well into the Pacific. What is absolutely fascinating is that this language family starts in Taiwan. When we think of Taiwan, we think of Mandarin for the most part. But the indigenous languages are Austronesian or Australio-Pacific. It is from there, from that little island, that we get nearly all of the cultures and languages throughout the Pacific; there are very few places that are not home to an Austronesian language. This fascinating group liked migrating, which they did spectacularly, and we know this from the similarity within each of these regions. For example, with Western Malayo-Polynesian, we're talking about the languages of the Philippines and Indonesia, and it includes Malagasy in Madagascar. You have a lot of similarity here within the Oceanic languages, and then within the Indonesian and Papua New Guinea regions; you have a couple of other smaller groups, as well.

    Trans New Guinea/Trans-Guinean

    New Guinea, though, is not home just to Austronesian languages; it is also home to Trans New Guinea or the Trans-Guinean languages.

    Trans New Guinea Language Family Map

    At this point, we think it is a separate language family, that it is neither Austronesian nor Australian; it is something in between. We do know that the languages of the northern portion of Australia are very similar to these languages. It is absolutely mind-boggling to think that an entire archipelago has this many different languages, but it is the case. If you know a little bit about the geography of this region, you know how incredibly remote these places are. Not surprisingly, a group of folks would go to a region and be completely cut off from everybody else; it's the story of human existence in many ways.


    I’ve talked frequently about the Finno-Ugric languages, but really they are part of a larger family called Uralic.

    Uralic Language Family Tree

    The bulk of the Uralic languages, though, are the Finno-Ugric group. When we're talking about Hungarian, Estonian, and Finnish, this is the Finno-Ugric family. We also have to include the Samoyed languages, which are spoken in the Arctic regions of both Scandinavia and Russia. This is a series of indigenous languages that are very endangered. Estonian is technically considered endangered, although it is increasing in native speakers. Finnish and Hungarian are not endangered at this point, but pretty much everything else you see here, if it is still in existence, and that's a big if, is highly endangered.


    As I said, there are some language families that are controversial, and Macro-Altaic is one of them.

    Macro Altaic Family Tree

    Altaic itself is not controversial; we can pretty clearly trace the history of the Turkic languages. Turkish, of course, is one of them, along with a number of the languages that are spoken in the western part of Asia. For the Mongolic languages, Mongolian is one of them, of course, and then there are others, like some of the languages of Western China; these are all related. We have them pretty well documented for at least 2000 years, if not longer in the case of Mongolian. They seem to have taken up the old Chinese writing systems fairly early on, modified them to what they wanted, and then gone from there. In some cases, when Islam came into those areas, they started writing the languages down in the Arabic writing system. As such, we have a lot of old documentation.

    As I said, Altaic is not the problem; the problem is over here on the right, when we're talking about the native languages of Japan. Ainu is one of them; it's about the only one that is still in existence. These are the languages of the peoples who lived in Japan before a group of Koreans decided to hop over the Korean Sea. Japanese and Korean are very closely related, and it all has to do with a group who got kicked out of the Korean court about 1200 years ago. They hopped over to what we now call Japan and became the group known as the Yamato people; they pretty much conquered the rest of the islands. So there are three groups: the Korean group (there are a couple of other Korean languages, historically); the Japonic group (that's Traditional Japanese, as well as Modern Japanese and Okinawan); and the Ainuic languages (Ainu and the other indigenous languages of Japan). We aren't sure of their origins. This was an area that Joseph Greenberg was interested in; he never really got to explore it much, not as much as the languages of Africa, the languages of the Americas, and Basque. Many of us have been trying to figure out where these three groups come from. We know that the Japanese and Korean languages are pretty much the same group; we know that part. We suspect that the indigenous languages of Japan might have come out of Siberia, but we aren't sure. Was there a connection with the Uralic languages? Maybe. The closest seems to be with this Altaic group.

    But there are things that the Altaic languages do that these other groups don't, and so we don't really know yet. More work is being done on genetic analysis, as in DNA analysis, and combining it with historic analysis of these languages. In the case of Japanese and Korean, we have a long written history of over 2000 years; in fact, the Old Korean writing system is based off of an earlier Chinese writing system, so we have a pretty good length of documentation. But it's really difficult to decipher this one. If you were to look up Japanese or Korean or Ainu in a reputable linguistic database to find out what language family they are in, we don't have a good answer, and no two databases will be necessarily alike. As an historical linguist, I tend to say they are Macro-Altaic, knowing the controversy that is involved with that and understanding that I have many times also called them isolates. I will use both terms, because we are right now really not sure.

    Linguistic Supergroups

    Before we leave this topic of language families, it's incredibly important to talk about supergroups. This is where we have mass migration over several thousands of years, all mixed up such that we see certain trends that exist in those areas and nowhere else. In fact, we've already talked about two of these groups: Southeast Asia and the Bantu languages.

    Southeast Asia Supergroup Map

    When we talked about Southeast Asia, we talked specifically about the fact that they have contour tones and that nowhere else in the world do we see anything similar. Contour tones, we believe, may have originated in the Sino-Tibetan languages (remember we're talking about Tibet, which is kind of over here), and the Sino-Tibetan languages seem to have come out of that area and spread over what we now call China and what we call Burma, or Myanmar. We notice that the various language families that we see in Southeast Asia (there are at least two, maybe three, depending on how you classify them) all have these contour tones. So where did this come from? Did this come from Sino-Tibetan, or did it come from these other languages and get pulled into Sino-Tibetan? We're pretty sure it comes from Sino-Tibetan initially, and then moved south and east. But we aren't entirely sure. All of these languages are highly isolating; there is little to no inflection. Certainly, Sino-Tibetan languages have no inflection, and the same holds for the languages of Southeast Asia; we're talking about the various languages spoken in Vietnam, Cambodia, and Laos, as well as northern Myanmar/northern Burma. There are some languages that have a couple of inflections, but for the most part there are none. Where did that come from? What is more, they are so isolating compared to any other language family on the planet. They also use sentence particles and numerical classifiers; we saw an example of classifiers in semantics with Mandarin. The extent to which those elements are used is staggering in this region.

    Bantoid Supergroup Map

    We talked about the Bantu languages, including some of the Khoisan languages along with some of the Niger-Congo languages. We call them Bantoid for now because we're not entirely sure if they are a separate group. In some ways, they are similar to Niger-Congo; in some ways, they are different. This is where we have those register tones, those high-low tones. We have clicks in these areas. The Bantu area or Bantoid languages all have clicks, but the languages of Western Africa or the Benue area do not have clicks. Why is that the case? You have a high level of reduplication in these Bantu languages that you do not see in the other Niger-Congo languages or some of the other Khoisan languages. You see nominal classes (we saw that with Swahili, in particular) that we do not see in other Niger-Congo languages, and the sheer number of them is staggering at times. This is very clearly the Bantu area, a very clear linguistic supergroup that covers multiple language families.

    Balkansprache Supergroup Map

    There's a third supergroup, one that is a little closer to my world as an historical Romance linguist, and it's the Balkan area. Frequently, we use the German term, Balkansprach, so ‘Balkan speak’; it includes Hungarian, Romanian, Bulgarian, Greek, Serbo-Croatian (yes, I'm going to combine them), Bosnian, Slovenian, Slovakian, Czech, Austrian-German (not all German, but Austrian-German), and even a bit of Turkish, although Turkish is an Altaic language and certainly has its own thing. There are certain aspects to Turkish that you don't see in the other Altaic languages, and they are aspects that are very similar to this Balkan area. In these cases, we have a reduced case system; they all have a case system, but less so compared to the other languages in their language families. For example, Austrian-German has fewer cases than Standard German or Swiss German. Romanian actually has case, which is different from the other Romance languages, but the Slavic languages in this area have way fewer cases than, say, Russian, Ukrainian, or Polish. Even when we compare Hungarian to Finnish or Estonian, there are way fewer case markings. They have a postposed article; instead of saying ‘the cat’, the article is an inflection, a suffix: ‘cat-the’. They are also highly analytical languages; they're not very synthetic. We see more isolation; there is still some synthesis, some inflection, but there's just less of it. There are loanwords that you find in this region but don't find elsewhere; Romanian is a great example of that, as there are a ton of Slavic and Turkic words that were brought into that language that you don't see in any other Romance language. As I said, most of this has to do with migration, conquering, wars, and the like. One of the classic statements we say in Romance linguistics is, we really need that time machine. So, one of you, please, set up that time machine, so that we can find out what happened to Romanian. We have an almost 800-year period where there's no documentation whatsoever, from the time that the Romans left, which is in the third century CE, until about 1400; we have about 1000 years where we have very little documentation. We don't really know how Romanian came to be versus the other Romance languages. The same is true for all of these languages; we have documentation, but it's later in their history. If you look into the history of that region, the sheer number of wars, conquering, language suppression, all of it that happened in that region, well, it makes sense as to why this is a supergroup. The same can be said for the Bantoid area and for Southeast Asia.

    Therefore, when we talk about languages and language families, there are always patterns. It's just a matter of finding them.

    8.9: Language Families is shared under a CC BY-NC-SA 4.0 license and was authored, remixed, and/or curated by LibreTexts.
