13: Study Notes
Although it may seem somewhat dated, one of the best discussions of what exactly “language” is or can be is Lyons (1981).
Wynne (2005) is a brief but essential freely available introduction to all aspects of corpus development, including issues of annotation; Xiao (2008) is a compact overview of well-known English corpora.
A readable exposition of Popper’s ideas about falsification is his essay “Science as falsification”, included in the collection Conjectures and Refutations (Popper 1963). A discussion of the role of operationalization in the context of corpus-based semantics is found in Stefanowitsch (2010). Wulff (2003) is a study of adjective order in English that operationalizes a variety of linguistic constructs in an exemplary and very transparent way. Zaenen et al. (2004) is an example of a detailed and extensive coding scheme for animacy.
No matter what corpora and concordancing software you work with, you will need regular expressions at some point. Information is easy to find online; I recommend the Wikipedia page as a starting point (Wikipedia contributors 2018). An excellent introduction to issues involved in annotating corpora is found in Geoffrey Leech’s contribution “Adding linguistic annotation” in Wynne (2005). An insightful case study on working with texts in non-standardized orthographies is found in Barnbrook (1996), which is by now seriously dated in many respects but still a worthwhile read.
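As a minimal illustration of why regular expressions matter for corpus queries, the following Python sketch retrieves several inflected forms of one verb with a single pattern. The example sentence and the pattern are made up for illustration and are not taken from any of the works cited here; real concordancers may use a slightly different regular-expression dialect.

```python
import re

# Hypothetical example sentence; not taken from any corpus.
text = "She walked home, he walks to work, and they were walking together."

# \b marks a word boundary; (?:s|ed|ing)? optionally matches one
# inflectional suffix, so the pattern covers walk, walks, walked, walking.
pattern = re.compile(r"\bwalk(?:s|ed|ing)?\b", re.IGNORECASE)

# findall returns the matched word forms in order of occurrence.
hits = pattern.findall(text)
print(hits)
```

The word boundaries keep the pattern from matching inside longer words such as "sidewalk" or "walker", which is a typical source of false hits in corpus queries.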
(see Study notes to Chapter 6)
Anyone serious about using statistics in their research should start with a basic introduction to statistics, and then proceed to an introduction to more advanced methods, preferably one that introduces a statistical software package at the same time. For the first step, I recommend Butler (1985), a very solid introduction to statistical concepts and pitfalls specifically aimed at linguists. It is out of print, but the author made it available for free at https://web.archive.org/web/20060523061748/http://uwe.ac.uk/hlss/llas/statistic.../bkindex.shtml. For the second step, I recommend Gries (2013) as a package deal geared specifically towards linguistic research questions, but I also encourage you to explore the wide range of free or commercially available books introducing statistics with R.
If you want to learn more about association measures, Evert (2005) and the companion website at http://www.collocations.de/AM/ are very comprehensive and relatively accessible places to start. Stefanowitsch & Flach (2016) discuss corpus-based association measures in the context of psycholinguistics.
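To give a concrete sense of what an association measure computes, here is a small Python sketch of one widely used measure, the log-likelihood ratio G², for a 2-by-2 contingency table of observed frequencies. The function names and the frequencies in the usage example are hypothetical and chosen only for illustration; this is a sketch of the standard formula, not code from any of the works cited.

```python
import math

def expected_frequencies(o11, o12, o21, o22):
    """Expected cell frequencies under independence (row total * column total / N)."""
    n = o11 + o12 + o21 + o22
    row1, row2 = o11 + o12, o21 + o22
    col1, col2 = o11 + o21, o12 + o22
    return (row1 * col1 / n, row1 * col2 / n, row2 * col1 / n, row2 * col2 / n)

def log_likelihood_g2(o11, o12, o21, o22):
    """G2 = 2 * sum(O * ln(O / E)); cells with O = 0 contribute nothing."""
    observed = (o11, o12, o21, o22)
    expected = expected_frequencies(o11, o12, o21, o22)
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)

# Hypothetical counts: word pair together, word 1 without word 2,
# word 2 without word 1, neither word.
g2 = log_likelihood_g2(30, 70, 120, 9780)
print(round(g2, 2))
```

A value of zero means the observed frequencies match what independence predicts exactly; larger values indicate a stronger departure from independence, in either direction.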
The Corpus of Late Modern English Texts (CLMET v3.1) is available for download free of charge at https://fedora.clarin-d.uni-saarland...met/clmet.html.
Grammar is a complex phenomenon investigated from very different perspectives. This makes general suggestions for further reading difficult. It may be best to start with collections focusing on the corpus-based analysis of grammar, such as Rohdenburg & Mondorf (2003), Gries & Stefanowitsch (2006), Rohdenburg & Schlüter (2009) or Lindquist & Mair (2004).
For a different proposal of how to evaluate TTRs statistically, see Baayen (2008: Section 6.5); for a very interesting method of comparison for TTRs and HTRs based on permutation testing instead of classical inferential statistics, see Säily & Suomela (2009).
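The general idea of a permutation test can be sketched in Python as follows. This is a generic illustration under simplifying assumptions, not the procedure of Säily & Suomela (2009): it shuffles individual tokens between two samples, whereas tokens within a text are not independent of each other, so a serious application would need a more careful design. All function names and parameters are hypothetical.

```python
import random

def ttr(tokens):
    """Type-token ratio: number of distinct types divided by number of tokens."""
    return len(set(tokens)) / len(tokens)

def permutation_test_ttr(sample_a, sample_b, n_permutations=2000, seed=0):
    """Estimate how often a random reassignment of tokens to two samples
    yields a TTR difference at least as large as the observed one."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(ttr(sample_a) - ttr(sample_b))
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)  # keep the original sample sizes, since TTR depends on size
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(ttr(pooled[:n_a]) - ttr(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)

# Hypothetical toy samples, far too small for a real study.
a = "the cat sat on the mat and the dog sat on the rug".split()
b = "a quick brown fox jumps over one lazy dog near some tall green trees".split()
print(ttr(a), ttr(b), permutation_test_ttr(a, b))
```

Keeping the two sample sizes fixed in every permutation matters because type-token ratios fall as samples grow, so differently sized samples are not directly comparable.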
This chapter has focused on very simple aspects of variation across text types and a very simple notion of “text type”. Biber (1988) and Biber (1989) are good starting points for a more comprehensive corpus-based perspective on text types. As seen in some of the case studies in this chapter, text is frequently a proxy for demographic properties of the speakers who have produced it, making corpus linguistics a variant of sociolinguistics; see further Baker (2010a).
Deignan (2005) is a comprehensive attempt to apply corpus-linguistic methods to a range of theoretically informed research questions concerning metaphor. The contributions in Stefanowitsch & Gries (2006) demonstrate a range of methodological approaches by many leading researchers applying corpus methods to the investigation of metaphor.