9: Morphology


We saw in Chapter 8 that the wordform-centeredness of most corpora and corpus-access tools requires a certain degree of ingenuity when studying structures larger than the word. It does not pose particular problems for corpus-based morphology, which studies structures smaller than the word. Corpus morphology is mostly concerned with the distribution of affixes, and retrieving all occurrences of an affix plausibly starts with retrieving all strings that potentially contain this affix. We could retrieve all occurrences of -ness, for example, with a query like \(\langle\text{[word=".+ness(es)?"%c]}\rangle\). The recall of this query will be close to 100 percent, as all words containing the suffix -ness end in the string ness, optionally followed by the string es in the case of plurals. Depending on the tokenization of the corpus, this query might miss cases where the word containing the suffix -ness is the first part of a hyphenated compound, such as usefulness-rating or consciousness-altering; we could alter the query to something like \(\langle\text{[word=".+ness(es)?(-.+)?"%c]}\rangle\) if we believe that including these cases in our sample is crucial. The precision of such a query will not usually be 100 percent, as it will also retrieve words that merely happen to end with the string specified in our query; in the case of -ness, these would be words like witness and governess, or place names like Inverness.
The degree of precision will depend on how unique the string in our query is for the affix in question. For -ness and -ity it is fairly high, as only a few words share the same string accidentally (examples like those just mentioned for -ness, and words like city and pity for -ity). For a suffix like -ess (‘female animate entity’) it is quite low, as a query like \(\langle\text{[word=".+ess(es)?"%c]}\rangle\) will also retrieve all words with the suffixes -ness and -less, as well as many words whose stem ends in ess, like process, success, press, access, address, dress, guess and many more.
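
The retrieval step and the notion of precision can be illustrated with a minimal Python sketch. It is not the corpus query language used in the text: the regular expressions are only rough analogues of the queries above, with `re.IGNORECASE` standing in for the `%c` flag. The token list and the stop list of false positives are invented for illustration, and the names `retrieve`, `NESS` and `NESS_COMPOUND` are hypothetical.

```python
import re

# Rough Python analogue of the query [word=".+ness(es)?"%c];
# re.IGNORECASE plays the role of the %c (ignore case) flag.
NESS = re.compile(r".+ness(es)?", re.IGNORECASE)

# Variant that also accepts a hyphenated second element,
# as in usefulness-rating.
NESS_COMPOUND = re.compile(r".+ness(es)?(-.+)?", re.IGNORECASE)

# Hypothetical token list, invented for illustration only.
tokens = ["Happiness", "weaknesses", "witness", "governess",
          "Inverness", "usefulness-rating", "process", "kind"]

# Hypothetical stop list standing in for manual clean-up of the hits.
FALSE_POSITIVES = {"witness", "governess", "inverness"}


def retrieve(pattern, words):
    """Return the words that the pattern matches in full."""
    return [w for w in words if pattern.fullmatch(w)]


hits = retrieve(NESS, tokens)
true_hits = [w for w in hits if w.lower() not in FALSE_POSITIVES]

# Precision: share of retrieved words that really contain the suffix.
precision = len(true_hits) / len(hits)

print(hits)
print(precision)
print(retrieve(NESS_COMPOUND, tokens))
```

In this toy sample, the first pattern misses usefulness-rating, while the second retrieves it; both also retrieve witness, governess and Inverness, which is why some manual or list-based clean-up of the hits is needed before counting.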

However, once we have extracted and, if necessary, manually cleaned up our data set, we are faced with a problem that does not present itself when studying lexis or grammar: affixes do not occur independently but always as parts of words. Some of these words (like wordform-centeredness in the first sentence of this chapter) have been created productively on the fly for a specific purpose, while others (like ingenuity in the same sentence) are conventionalized lexical items that are listed in dictionaries, even though they are theoretically the result of attaching an affix to a known stem (like ingen-, also found in ingenious and, confusingly, its almost-antonym ingenuous). We have to keep the difference between these two kinds of words in mind when constructing morphological research designs; since the two kinds are not always clearly distinguishable, this is more difficult than it sounds. Also, the fact that affixes always occur as parts of words has consequences for the way we can, and should, count them; in quantitative corpus linguistics, this is a crucial point, so I will discuss it in quite some detail before we turn to our case studies.