13: Study Notes

    Study notes to Chapter 1


    1. The British National Corpus (BNC) is available for download free of charge from the Oxford Text Archive at
    2. The Corpus of Contemporary American English (COCA) is commercially available from Mark Davies at Brigham Young University, who also provides a free web interface at

    Further reading

    Although it may seem somewhat dated, one of the best discussions of what exactly “language” is or can be is Lyons (1981).

    Study notes to Chapter 2


    1. The Lancaster-Oslo-Bergen Corpus of Modern English (LOB) is available free of charge from the Oxford Text Archive at 0167.
    2. The British National Corpus, Baby edition (BNC-BABY) is available for download free of charge from the Oxford Text Archive at uk/ota/2553.
    3. The London-Lund Corpus of Spoken English is available free of charge from the Oxford Text Archive at
    4. The Susanne Corpus is available with some restrictions from the Oxford Text Archive at
    5. Parts of the Santa Barbara Corpus of Spoken American English (SBCSAE) are available for download and through a web interface at https: //
    6. The International Corpus of English, British Component is commercially available from; the components for some other varieties (Canada, East Africa, Hong Kong, India, Ireland, Jamaica, Philippines, Singapore and USA) can be downloaded at that URL after written registration.
    7. The Brown University Standard Corpus of Present-Day American English (BROWN), the Freiburg-Brown Corpus of American English (FROWN), the Freiburg–LOB Corpus of British English (FLOB) and the WELLINGTON corpus are available to institutions participating in the CLARIN project at
    8. A version of the BROWN corpus can also be downloaded at http://www., but note that this is not the original version, and some texts are partially missing.

    Further reading

    Wynne (2005) is a brief but essential freely available introduction to all aspects of corpus development, including issues of annotation; Xiao (2008) is a compact overview of well-known English corpora.

    Study notes to Chapter 3


    Further reading

    A readable exposition of Popper’s ideas about falsification is his essay “Science as falsification”, included in the collection Conjectures and Refutations (Popper 1963). A discussion of the role of operationalization in the context of corpus-based semantics is found in Stefanowitsch (2010). Wulff (2003) is a study of adjective order in English that operationalizes a variety of linguistic constructs in an exemplary and very transparent way. Zaenen et al. (2004) is an example of a detailed and extensive coding scheme for animacy.

    Study notes to Chapter 4


    1. The ICE-GB sample corpus is available at
    2. The IMS Open Corpus Work Bench (CWB) is available for download free of charge at; it can be installed under all Unix-like operating systems (including Linux and Mac OS X).
    3. The NoSketch Engine is available for download free of charge at https: // for Linux.
    4. The Tree Tagger is available for download at for Linux, Mac OS X and Windows.

    Further reading

    No matter what corpora and concordancing software you work with, you will need regular expressions at some point. Information is easy to find online; I recommend the Wikipedia page as a starting point (Wikipedia contributors 2018). An excellent introduction to issues involved in annotating corpora is found in Geoffrey Leech’s contribution “Adding linguistic annotation” in Wynne (2005). An insightful case study on working with texts in non-standardized orthographies is found in Barnbrook (1996), which is by now seriously dated in many respects but still a worthwhile read.
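    To give a sense of why regular expressions matter for corpus queries, here is a minimal sketch of a keyword-in-context search using Python's standard `re` module. The function name `concordance`, the sample sentence and the pattern are illustrative assumptions and do not come from any of the tools listed above; dedicated concordancers use their own query syntax.

```python
import re

def concordance(text, pattern, width=20):
    """Return (left context, match, right context) for every regex match.

    Hypothetical helper for illustration; `width` is the number of
    characters of context kept on each side of the match.
    """
    hits = []
    for match in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        hits.append((left, match.group(0), right))
    return hits

# Invented sample text, not taken from any corpus.
sample = "She walked home. He walks to work and is walking now. They talked."

# \b marks a word boundary; (s|ed|ing)? makes the inflectional suffix optional,
# so the pattern finds several forms of the verb "walk" but not "talked".
hits = concordance(sample, r"\bwalk(s|ed|ing)?\b")
for left, keyword, right in hits:
    print(f"{left:>20} [{keyword}] {right}")
```

    The same pattern would miss irregular forms or spelling variants, which is one reason why queries over raw text usually need to be checked manually.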

    Study notes to Chapter 5

    (see Study notes to Chapter 6)

    Study notes to Chapter 6


    1. A comprehensive and well-maintained statistical software package is R, available for download free of charge from for Linux, Mac OS X, Windows.
    2. Especially if you are using Linux or Windows, I also recommend you download RStudio (also free of charge), which provides an advanced user interface to R,

    Further reading

    Anyone serious about using statistics in their research should start with a basic introduction to statistics, and then proceed to an introduction to more advanced methods, preferably one that introduces a statistical software package at the same time. For the first step, I recommend Butler (1985), a very solid introduction to statistical concepts and pitfalls specifically aimed at linguists. It is out of print, but the author made it available for free at 1748/ For the second step, I recommend Gries (2013) as a package deal geared specifically towards linguistic research questions, but I also encourage you to explore the wide range of free or commercially available books introducing statistics with R.

    Study notes to Chapter 7

    If you want to learn more about association measures, Evert (2005) and the companion website at are very comprehensive and relatively accessible places to start. Stefanowitsch & Flach (2016) discuss corpus-based association measures in the context of psycholinguistics.
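    As a rough orientation before turning to those sources, the sketch below computes two widely used quantities from a two-by-two co-occurrence table: the chi-square statistic and pointwise mutual information. The function name, the cell labels and the example counts are assumptions chosen for illustration; the sketch is not taken from Evert (2005), and it assumes that the top-left cell and all expected frequencies are greater than zero.

```python
import math

def association(o11, o12, o21, o22):
    """Return (expected o11, chi-square, pointwise mutual information).

    o11: both words co-occur; o12: first word without the second;
    o21: second word without the first; o22: neither word.
    """
    n = o11 + o12 + o21 + o22
    row1, row2 = o11 + o12, o21 + o22
    col1, col2 = o11 + o21, o12 + o22
    observed = [o11, o12, o21, o22]
    # Expected frequency of a cell = row total * column total / table total.
    expected = [row1 * col1 / n, row1 * col2 / n,
                row2 * col1 / n, row2 * col2 / n]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # PMI compares the observed co-occurrence count with its expected value.
    pmi = math.log2(o11 / expected[0])
    return expected[0], chi2, pmi

# Invented counts: the pair occurs 20 times where chance predicts 10.
e11, chi2, pmi = association(20, 80, 80, 820)
print(e11, round(chi2, 3), pmi)
```

    Different measures can rank the same word pairs quite differently, which is why the choice of measure should be motivated by the research question.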

    Study notes to Chapter 8


    The Corpus of Late Modern English Texts (CLMET v3.1) is available for download free of charge at https://fedora.clarin-d.uni-saarland...met/clmet.html.

    Further reading

    Grammar is a complex phenomenon investigated from very different perspectives. This makes general suggestions for further reading difficult. It may be best to start with collections focusing on the corpus-based analysis of grammar, such as Rohdenburg & Mondorf (2003), Gries & Stefanowitsch (2006), Rohdenburg & Schlüter (2009) or Lindquist & Mair (2004).

    Study notes to Chapter 9

    Further reading

    For a different proposal of how to evaluate TTRs statistically, see Baayen (2008: Section 6.5); for a very interesting method of comparison for TTRs and HTRs based on permutation testing instead of classical inferential statistics, see Säily & Suomela (2009).
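    To illustrate the general idea of a permutation test applied to type-token ratios, here is a deliberately simplified sketch. It is not the procedure of Säily & Suomela (2009), which should be consulted for the actual method. The function names, the toy data and the choice to shuffle whole texts between the two groups are assumptions made for illustration, and the sketch ignores the well-known dependence of TTR on sample size.

```python
import random

def ttr(tokens):
    """Type-token ratio: number of distinct types divided by number of tokens."""
    return len(set(tokens)) / len(tokens)

def flatten(texts):
    """Concatenate a list of tokenized texts into one token list."""
    return [token for text in texts for token in text]

def permutation_test_ttr(texts_a, texts_b, n_permutations=2000, seed=1):
    """Return (observed absolute TTR difference, approximate p-value).

    Whole texts are reassigned at random to two groups of the original
    sizes; the p-value is the share of reassignments whose TTR difference
    is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(ttr(flatten(texts_a)) - ttr(flatten(texts_b)))
    pooled = list(texts_a) + list(texts_b)
    n_a = len(texts_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(ttr(flatten(pooled[:n_a])) - ttr(flatten(pooled[n_a:])))
        if diff >= observed:
            extreme += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return observed, (extreme + 1) / (n_permutations + 1)

# Invented toy data: each inner list stands for one tokenized text.
group_a = [["a", "b", "c"], ["d", "e", "f"]]
group_b = [["a", "a", "a"], ["a", "a", "b"]]
observed, p_value = permutation_test_ttr(group_a, group_b, n_permutations=200)
print(observed, p_value)
```

    With only four texts there are very few distinct reassignments, so the p-value here is not informative; the example only shows the mechanics.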

    Study notes to Chapter 10


    1. The Corpus of Historical American English (COHA) is commercially available from Mark Davies at Brigham Young University, who also provides a free web interface at
    2. The \(n\)-gram data from the Google Books archive is available for download free of charge at datasetsv2.html (note that the files are extremely large).

    Further reading

    This chapter has focused on very simple aspects of variation across text types and a very simple notion of “text type”. Biber (1988) and Biber (1989) are good starting points for a more comprehensive corpus-based perspective on text types. As seen in some of the case studies in this chapter, text is frequently a proxy for demographic properties of the speakers who have produced it, making corpus linguistics a variant of sociolinguistics; see further Baker (2010a).

    Study notes to Chapter 11

    Further reading

    Deignan (2005) is a comprehensive attempt to apply corpus-linguistic methods to a range of theoretically informed research questions concerning metaphor. The contributions in Stefanowitsch & Gries (2006) demonstrate a range of methodological approaches by many leading researchers applying corpus methods to the investigation of metaphor.

      This page titled 13: Study Notes is shared under a CC BY-SA license and was authored, remixed, and/or curated by Anatol Stefanowitsch (Language Science Press).
