6.3: The Classical Approach to Musical Cognition

In Chapter 8 on seeing and visualizing, we will see that classical theories take the purpose of visual perception to be the construction of mental models of the external, visual world. To do so, these theories must deal with the problem of underdetermination: information in the world is not sufficient, on its own, to completely determine visual experience.

Classical solutions to the problem of underdetermination (Bruner, 1973; Gregory, 1970, 1978; Rock, 1983) propose that knowledge of the world—the contents of mental representations—is also used to determine visual experience. In other words, classical theories of perception describe visual experience as arising from the interaction of stimulus information with internal representations. Seeing is a kind of thinking.

Auditory perception has also been the subject of classical theorization. Classical theories of auditory perception parallel classical theories of visual perception in two general respects. First, since the earliest psychophysical studies of audition (Helmholtz & Ellis, 1954), hearing has been viewed as a process for building internal representations of the external world.

We have to investigate the various modes in which the nerves themselves are excited, giving rise to their various sensations, and finally the laws according to which these sensations result in mental images of determinate external objects, that is, in perceptions. (Helmholtz & Ellis, 1954, p. 4)

Second, in classical theories of hearing, physical stimulation does not by itself determine the nature of auditory percepts. Auditory stimuli are actively organized, being grouped into distinct auditory streams, according to psychological principles of organization (Bregman, 1990). “When listeners create a mental representation of auditory input, they too must employ rules about what goes with what” (p. 11).

The existence of classical theories of auditory perception, combined with the links between classical music and classical cognitive science discussed in the previous section, should make it quite unsurprising that classical theories of music perception and cognition are well represented in the literature (Deutsch, 1999; Francès, 1988; Howell, Cross, & West, 1985; Krumhansl, 1990; Lerdahl, 2001; Lerdahl & Jackendoff, 1983; Sloboda, 1985; Snyder, 2000; Temperley, 2001). This section provides some brief examples of the classical approach to musical cognition. These examples illustrate that the previously described links between classical music and cognitive science are reflected in the manner in which musical cognition is studied.

The classical approach to musical cognition assumes that listeners construct mental representations of music. Sloboda (1985) argued that,

a person may understand the music he hears without being moved by it. If he is moved by it then he must have passed through the cognitive stage, which involves forming an abstract or symbolic internal representation of the music. (Sloboda, 1985, p. 3)

Similarly, “a piece of music is a mentally constructed entity, of which scores and performances are partial representations by which the piece is transmitted” (Lerdahl & Jackendoff, 1983, p. 2). A classical theory must provide an account of such mentally constructed entities. How are they represented? What processes are required to create and manipulate them?

There is a long history of attempting to use geometric relations to map the relationships between musical pitches, so that similar pitches are nearer to one another in the map (Krumhansl, 2005). Krumhansl (1990) has shown how simple judgments about tones can be used to derive a spatial, cognitive representation of musical elements.

Krumhansl’s general paradigm is called the probe tone method (Krumhansl & Shepard, 1979). In this paradigm, a musical context is established, for instance by playing a partial scale or a chord. A probe note is then played, and subjects rate how well this probe note fits into the context. For instance, subjects might rate how well the probe note serves to complete a partial scale. The relatedness between pairs of tones within a musical context can also be measured using variations of this paradigm.

Extensive use of the probe tone method has revealed a hierarchical organization of musical notes. Within a given musical context, that is, a particular musical key, the most stable tone is the tonic, the root of the key. For example, in the musical key of C major, the note C is the most stable. The next most stable tones are those in the third and fifth positions of the key’s scale. In the key of C major, these are the notes E and G. Less stable than these two notes are the remaining notes that belong to the context’s scale. In the context of C major, these are the notes D, F, A, and B. Finally, the least stable tones are the five notes that do not belong to the context’s scale. For C major, these are the notes C#, D#, F#, G#, and A#.
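The four levels of this hierarchy can be written out as a small lookup. The following Python sketch is only an illustration of the ordering described above: the function name `stability_tiers` and the tier numbers (4 for the most stable level down to 1 for the least stable) are assumptions introduced here, and they are ordinal ranks rather than the ratings that subjects actually produced.

```python
# Pitch classes numbered 0-11 upward from C, using the sharp spellings in the text.
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Semitone offsets of the seven major-scale degrees above the tonic.
MAJOR_SCALE_STEPS = [0, 2, 4, 5, 7, 9, 11]


def stability_tiers(tonic: str) -> dict:
    """Map each of the 12 notes to an ordinal stability tier in a major key.

    Tier values are illustrative ranks (4 = most stable), not measured ratings.
    """
    root = NOTE_NAMES.index(tonic)
    tiers = {}
    for offset in range(12):
        name = NOTE_NAMES[(root + offset) % 12]
        if offset == 0:
            tiers[name] = 4  # the tonic
        elif offset in (4, 7):
            tiers[name] = 3  # third and fifth positions of the scale
        elif offset in MAJOR_SCALE_STEPS:
            tiers[name] = 2  # remaining notes of the scale
        else:
            tiers[name] = 1  # notes outside the scale
    return tiers


c_major = stability_tiers("C")
for tier in (4, 3, 2, 1):
    print(tier, [name for name in NOTE_NAMES if c_major[name] == tier])
```

Calling `stability_tiers` with a different tonic shifts the same pattern to that key, which is the sense in which the hierarchy belongs to the context rather than to any individual note.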

This hierarchical pattern of stabilities is revealed using different kinds of contexts (e.g., partial scales, chords), and is found in subjects with widely varying degrees of musical expertise (Krumhansl, 1990). It can also be used to account for judgments about the consonance or dissonance of tones, which is one of the oldest topics in the psychology of music (Helmholtz & Ellis, 1954).

Hierarchical tonal stability relationships can also be used to quantify relationships between different musical keys. If two different keys are similar to one another, then their tonal hierarchies should be similar as well. The correlations between tonal hierarchies were calculated for every possible pair of the 12 different major and 12 different minor musical keys, and then multidimensional scaling was performed on the resulting similarity data (Krumhansl & Kessler, 1982). A four-dimensional solution was found to provide the best fit for the data. This solution arranged the tonic notes along a spiral that wrapped itself around a toroidal surface. The spiral represents two circles of fifths, one for the 12 major scales and the other for the 12 minor scales.
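The correlation step can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the published analysis: the profile `TIER_PROFILE` uses ordinal levels (4, 3, 2, 1) as a stand-in for the empirical probe tone ratings, only major keys are covered, and the multidimensional scaling step is omitted. The helper names `key_profile`, `pearson`, and `key_similarity` are hypothetical.

```python
from math import sqrt

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Ordinal stand-in for a major-key tonal hierarchy, indexed by semitones above
# the tonic (assumption: 4 = tonic, 3 = third and fifth, 2 = other scale
# tones, 1 = non-scale tones). These are not measured ratings.
TIER_PROFILE = [4, 1, 2, 1, 3, 2, 1, 3, 1, 2, 1, 2]


def key_profile(tonic):
    """Return the 12-element profile of a major key, indexed from C upward."""
    root = NOTE_NAMES.index(tonic)
    return [TIER_PROFILE[(pc - root) % 12] for pc in range(12)]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def key_similarity(tonic_a, tonic_b):
    """Correlate the tonal hierarchies of two major keys."""
    return pearson(key_profile(tonic_a), key_profile(tonic_b))


for other in ("G", "F", "F#"):
    print("C major vs", other, "major:", round(key_similarity("C", other), 3))
```

Even with this crude profile, C major correlates positively with G major and F major and negatively with F# major, which is the kind of similarity structure that the scaling solution then arranges spatially.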

The spiral arrangement of notes around the torus reflects elegant spatial relationships among tonic notes (Krumhansl, 1990; Krumhansl & Kessler, 1982). For any key, the nearest neighbours moving around from the inside to the outside of the torus are the neighbouring keys in the circle of fifths. For instance, the nearest neighbours to C in this direction are the notes F and G, which are on either side of C in the circle of fifths.
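The neighbour relation in the circle of fifths is easy to compute, because each step around the circle raises the tonic by a perfect fifth (seven semitones). The Python sketch below illustrates only that relation, not the toroidal solution itself; the function names `circle_of_fifths` and `fifth_neighbours` are introduced here for illustration.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def circle_of_fifths(start="C"):
    """List the 12 tonic notes in circle-of-fifths order, beginning at start."""
    root = NOTE_NAMES.index(start)
    # A perfect fifth spans 7 semitones; 12 such steps visit every note once.
    return [NOTE_NAMES[(root + 7 * k) % 12] for k in range(12)]


def fifth_neighbours(tonic):
    """Return the two notes on either side of tonic in the circle of fifths."""
    circle = circle_of_fifths()
    i = circle.index(tonic)
    return circle[i - 1], circle[(i + 1) % 12]


print(circle_of_fifths())
print(fifth_neighbours("C"))  # ('F', 'G')
```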

In addition, the nearest neighbour to a note in the direction along the torus (i.e., orthogonal to the direction that captures the circles of fifths) reflects relationships between major and minor keys. Every major key has a complementary minor key, and vice versa; complementary keys have the same key signature, and are musically very similar. Complementary keys are close together on the torus. For example, the key of C major has the key of A minor as its complement; the tonic notes for these two scales are also close together on the toroidal map.
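The claim that such paired keys share a key signature can be checked directly. In the Python sketch below, the minor key is taken to be the natural minor scale whose tonic lies nine semitones above the major tonic (this pairing is usually called the relative minor in music theory); the function names are hypothetical.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Semitone offsets above the tonic for the two scale types.
MAJOR_STEPS = [0, 2, 4, 5, 7, 9, 11]
NATURAL_MINOR_STEPS = [0, 2, 3, 5, 7, 8, 10]


def scale_notes(tonic, steps):
    """Return the set of note names in the scale built on tonic."""
    root = NOTE_NAMES.index(tonic)
    return {NOTE_NAMES[(root + step) % 12] for step in steps}


def paired_minor(major_tonic):
    """Tonic of the minor key that shares the major key's notes."""
    root = NOTE_NAMES.index(major_tonic)
    return NOTE_NAMES[(root + 9) % 12]  # nine semitones up, i.e. three down


minor_tonic = paired_minor("C")
print(minor_tonic)  # A
print(scale_notes("C", MAJOR_STEPS) == scale_notes(minor_tonic, NATURAL_MINOR_STEPS))
```

Because the two scales contain exactly the same notes, they require the same sharps or flats, which is why the two keys are written with the same key signature.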

Krumhansl’s (1990) tonal hierarchy is a classical representation in two senses. First, the toroidal map derived from tonal hierarchies provides one of the many examples of spatial representations that have been used to model regularities in perception (Shepard, 1984a), reasoning (Sternberg, 1977), and language (Tourangeau & Sternberg, 1981, 1982). Second, a tonal hierarchy is not a musical property per se, but instead is a psychologically imposed organization of musical elements. “The experience of music goes beyond registering the acoustic parameters of tone frequency, amplitude, duration, and timbre. Presumably, these are recoded, organized, and stored in memory in a form different from sensory codes” (Krumhansl, 1990, p. 281). The tonal hierarchy is one such mental organization of musical tones.

In music, tones are not the only elements that appear to be organized by psychological hierarchies. “When hearing a piece, the listener naturally organizes the sound signals into units such as motives, themes, phrases, periods, theme-groups, and the piece itself” (Lerdahl & Jackendoff, 1983, p. 12). In their classic work A Generative Theory of Tonal Music, Lerdahl and Jackendoff (1983) developed a classical model of how such a hierarchical organization is derived.

Lerdahl and Jackendoff’s (1983) research program was inspired by Leonard Bernstein’s (1976) Charles Eliot Norton lectures at Harvard, in which Bernstein called for the methods of Chomskyan linguistics to be applied to music. “All musical thinkers agree that there is such a thing as a musical syntax, comparable to a descriptive grammar of speech” (p. 56). There are indeed important parallels between language and music that support developing a generative grammar of music (Jackendoff, 2009). In particular, systems for both language and music must be capable of dealing with novel stimuli, which classical researchers argue requires the use of recursive rules. However, there are important differences too. Most notable for Jackendoff (2009) is that language conveys propositional thought, while music does not. This means that while a linguistic analysis can ultimately be evaluated as being true or false, the same cannot be said for a musical analysis, which has important implications for a grammatical model of music.

Lerdahl and Jackendoff’s (1983) generative theory of tonal music correspondingly has components that are closely analogous to a generative grammar for language and other components that are not. The linguistic analogs assign structural descriptions to a musical piece. These structural descriptions involve four different, but interrelated, hierarchies.

The first is grouping structure, which hierarchically organizes a piece into motives, phrases, and sections. The second is metrical structure, which relates the events of a piece to hierarchically organized alternations of strong and weak beats. The third is time-span reduction, which assigns pitches to a hierarchy of structural importance that is related to grouping and metrical structures. The fourth is prolongational reduction, which is a hierarchy that “expresses harmonic and melodic tension and relaxation, continuity and progression” (Lerdahl & Jackendoff, 1983, p. 9). Prolongational reduction was inspired by Schenkerian musical analysis (Schenker, 1979), and is represented in a fashion that is very similar to a phrase marker. As a result, it is the component of the generative theory of tonal music that is most closely related to a generative syntax of language (Jackendoff, 2009).
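Of these four hierarchies, metrical structure is perhaps the simplest to picture as a data structure. The Python sketch below is a deliberately simplified illustration and not the theory's own formalism: it assumes that beats at every level are evenly spaced and aligned at the first position, and it measures the strength of a position as the number of metrical levels on which that position is a beat. The names `metrical_grid` and `is_nested` are hypothetical.

```python
def is_nested(periods):
    """Check that each level's beats fall on beats of the level below it.

    periods lists the spacing of beats at each level, smallest first,
    counted in units of the shortest position.
    """
    return all(upper % lower == 0 for lower, upper in zip(periods, periods[1:]))


def metrical_grid(n_positions, periods):
    """Return the strength of each position in a simple metrical hierarchy.

    Strength is the number of levels with a beat at that position, so
    stronger beats are those that persist to higher levels.
    """
    if not is_nested(periods):
        raise ValueError("each level must be a multiple of the level below")
    return [sum(1 for p in periods if i % p == 0) for i in range(n_positions)]


# Eight positions with beats every 1, 2, 4, and 8 positions.
strengths = metrical_grid(8, [1, 2, 4, 8])
print(strengths)  # [4, 1, 2, 1, 3, 1, 2, 1]
```

Reading the result from left to right gives the alternation of strong and weak beats: the first position is strongest, the fifth is next, and the positions between them are progressively weaker.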

Each of the four hierarchies is associated with a set of well-formedness rules (Lerdahl & Jackendoff, 1983). These rules describe how the different hierarchies are constructed, and they also impose constraints that prevent certain structures from being created. Importantly, the well-formedness rules provide psychological principles for organizing musical stimuli, as one would expect in a classical theory. The rules “define a class of grouping structures that can be associated with a sequence of pitch-events, but which are not specified in any direct way by the physical signal (as pitches and durations are)” (p. 39). Lerdahl and Jackendoff take care to express these rules in plain English so as not to obscure their theory. However, they presume that the well-formedness rules could be translated into a more formal notation, and indeed computer implementations of their theory are possible (Hamanaka, Hirata, & Tojo, 2006).

Lerdahl and Jackendoff’s (1983) well-formedness rules are not sufficient to deliver a unique “parsing” of a musical piece. One reason for this is that, unlike a linguistic parsing, a musical parsing cannot be deemed to be correct; it can only be described as having a certain degree of coherence or preferredness. Lerdahl and Jackendoff therefore supplement their well-formedness rules with a set of preference rules. For instance, one preference rule for grouping structure indicates that symmetric groups are to be preferred over asymmetric ones. Once again, there is a different set of preference rules for each of the four hierarchies of musical structure.
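The division of labour between the two kinds of rules can be illustrated with a toy example. In the Python sketch below, a single constraint (a group must be a contiguous run of events) stands in for the well-formedness rules and generates the candidate groupings, while a single symmetry score stands in for the preference rules and ranks them. The function names and the scoring scheme are assumptions made for illustration; the theory's actual rules are far richer and interact with one another.

```python
def candidate_groupings(n_events):
    """List every split of n_events consecutive events into two contiguous groups.

    Each candidate is a pair (size of first group, size of second group).
    Contiguity is the only constraint applied here.
    """
    return [(k, n_events - k) for k in range(1, n_events)]


def asymmetry(grouping):
    """Score a grouping by how unequal its two groups are (0 = symmetric)."""
    first, second = grouping
    return abs(first - second)


def rank_by_preference(n_events):
    """Order the candidates from most to least preferred."""
    return sorted(candidate_groupings(n_events), key=asymmetry)


ranking = rank_by_preference(8)
print(ranking[0])   # (4, 4)
print(ranking[-1])  # the most lopsided split
```

Note that every candidate remains permissible; the preference score only orders them, which mirrors the idea that a musical parsing is more or less preferred rather than correct or incorrect.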

The hierarchical structures defined by the generative theory of tonal music (Lerdahl & Jackendoff, 1983) describe the properties of a particular musical event. In contrast, the hierarchical arrangement of musical tones (Krumhansl, 1990) is a general organizational principle that applies to musical pitches in general, not to an event. Interestingly, the two types of hierarchies are not mutually exclusive. The generative theory of tonal music has been extended (Lerdahl, 2001) to include tonal pitch spaces, which are spatial representations of tones and chords in which the distance between two entities in the space reflects the cognitive distance between them. Lerdahl has shown that the properties of tonal pitch space can be used to aid in the construction of the time-span reduction and the prolongational reduction, increasing the power of the original generative theory. The theory can be used to predict listeners’ judgments about the attraction and tension between tones in a musical selection (Lerdahl & Krumhansl, 2007).

Lerdahl and Jackendoff’s (1983) generative theory of tonal music shares another characteristic with the linguistic theories that inspired it: it provides an account of musical competence, and it is less concerned with algorithmic accounts of music perception. The goal of their theory is to provide a “formal description of the musical intuitions of a listener who is experienced in a musical idiom” (p. 1). Musical intuition is the largely unconscious knowledge that a listener uses to organize, identify, and comprehend musical stimuli. Because characterizing such knowledge is the goal of the theory, other processing is ignored.

Instead of describing the listener’s real-time mental processes, we will be concerned only with the final state of his understanding. In our view it would be fruitless to theorize about mental processing before understanding the organization to which the processing leads. (Lerdahl & Jackendoff, 1983, pp. 3–4)

One consequence of ignoring mental processing is that the generative theory of tonal music is generally not applied to psychologically plausible representations. For instance, in spite of being a theory about an experienced listener, the various incarnations of the theory are not applied to auditory stimuli, but are instead applied to musical scores (Hamanaka, Hirata, & Tojo, 2006; Lerdahl, 2001; Lerdahl & Jackendoff, 1983).

Of course, this is not a principled limitation of the generative theory of tonal music. This theory has inspired researchers to develop models that have a more algorithmic emphasis and operate on representations that take steps towards psychological plausibility (Temperley, 2001).

Temperley’s (2001) theory can be described as a variant of the original generative theory of tonal music (Lerdahl & Jackendoff, 1983). One key difference between the two is the input representation. Temperley employs a piano-roll representation, which can be described as being a two-dimensional graph of musical input. The vertical axis, or pitch axis, is a discrete representation of different musical notes. That is, each row in the vertical axis can be associated with its own piano key. The horizontal axis is a continuous representation of time. When a note is played, a horizontal line is drawn on the piano-roll representation; the height of the line indicates which note is being played. The beginning of the line represents the note’s onset, the length of the line represents the note’s duration, and the end of the line represents the note’s offset. Temperley assumes the psychological reality of the piano-roll representation, although he admits that the evidence for this strong assumption is inconclusive.
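The piano-roll representation described above maps naturally onto a small data structure. The Python sketch below is one possible encoding and not Temperley's own implementation: the class `Note`, the use of MIDI note numbers for the pitch axis (60 for middle C), the time unit of seconds, and the example notes are all assumptions. Time is stored as a continuous value; the `render` function samples it at fixed steps only to draw a text picture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Note:
    pitch: int       # row on the discrete pitch axis (assumed MIDI numbering)
    onset: float     # start of the horizontal line, in seconds
    duration: float  # length of the horizontal line, in seconds

    @property
    def offset(self):
        """End of the line: the moment the note stops sounding."""
        return self.onset + self.duration


def sounding_at(notes, time):
    """Return the pitches whose lines cover the given moment."""
    return sorted(n.pitch for n in notes if n.onset <= time < n.offset)


def render(notes, step=0.25):
    """Draw the piano roll as text, one row per pitch, highest pitch on top."""
    n_cols = round(max(n.offset for n in notes) / step)
    rows = []
    for pitch in sorted({n.pitch for n in notes}, reverse=True):
        cells = ""
        for col in range(n_cols):
            t = col * step
            on = any(n.pitch == pitch and n.onset <= t < n.offset for n in notes)
            cells += "#" if on else "."
        rows.append(f"{pitch:3d} {cells}")
    return "\n".join(rows)


# Invented example: C, then E, then C and G sounding together.
notes = [Note(60, 0.0, 0.5), Note(64, 0.5, 0.5), Note(67, 1.0, 1.0), Note(60, 1.0, 1.0)]
print(render(notes))
print(sounding_at(notes, 1.5))  # [60, 67]
```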

Temperley’s (2001) model applies a variety of preference rules to accomplish the hierarchical organization of different aspects of a musical piece presented as a piano-roll representation. He provides different preference rule systems for assigning metrical structure, melodic phrase structure, contrapuntal structure, pitch class representation, harmonic structure, and key structure. In many respects, these preference rule systems represent an evolution of the well-formedness and preference rules in Lerdahl and Jackendoff’s (1983) theory.

For example, one of Temperley’s (2001) preference rule systems assigns metrical structure (i.e., hierarchically organized sets of beats) to a musical piece. Lerdahl and Jackendoff (1983) accomplished this by applying four different well-formedness rules and ten different preference rules. Temperley accepts two of Lerdahl and Jackendoff’s well-formedness rules for metre (albeit in revised form, as preference rules) and rejects two others because they do not apply to the more realistic representation that Temperley adopts. Temperley adds three other preference rules. This system of five preference rules derives metric structure to a high degree of accuracy (i.e., corresponding to a degree of 86 percent or better with Temperley’s metric intuitions).

One further difference between Temperley’s (2001) algorithmic emphasis and Lerdahl and Jackendoff’s (1983) emphasis on competence is reflected in how the theory is refined. Because Temperley’s model is realized as a working computer model, he could easily examine its performance on a variety of input pieces and therefore identify its potential weaknesses. For example, he took advantage of this ability to propose an additional set of four preference rules for metre, extending the applicability of his algorithm to a broader range of input materials.

To this point, the brief examples provided in this section have been used to illustrate two of the key assumptions made by classical researchers of musical cognition. First, mental representations are used to impose an organization on music that is not physically present in musical stimuli. Second, these representations are classical in nature: they involve different kinds of rules (e.g., preference rules, well-formedness rules) that can be applied to symbolic media that have musical contents (e.g., spatial maps, musical scores, piano-roll representations). A third characteristic is also frequently present in classical theories of musical cognition: the notion that the musical knowledge reflected in these representations is acquired, or can be modified, by experience.

The plasticity of musical knowledge is neither a new idea nor a concept that is exclusively classical. We saw earlier that composers wished to inform their audience about compositional conventions so the latter could better appreciate performances (Copland, 1939). More modern examples of this approach argue that ear training, specialized to deal with some of the complexities of modern music to be introduced later in this chapter, can help to bridge the gaps between composers, performers, and audiences (Friedmann, 1990). Individual differences in musical ability were thought to be a combination of innate and learned information long before the cognitive revolution occurred (Seashore, 1967): “The ear, like the eye, is an instrument, and mental development in music consists in the acquisition of skills and the enrichment of experience through this channel” (p. 3).

The classical approach views the acquisition of musical skills in terms of changes in mental representations. “We learn the structures that we use to represent music” (Sloboda, 1985, p. 6). Krumhansl (1990, p. 286) noted that the robust hierarchies of tonal stability revealed in her research reflect stylistic regularities in Western tonal music. From this she suggests that “it seems probable, then, that abstract tonal and harmonic relations are learned through internalizing distributional properties characteristic of the style.” This view is analogous to those classical theories of perception that propose that the structure of internal representations imposes constraints on visual transformations that mirror the constraints imposed by the physics of the external world (Shepard, 1984b).

Krumhansl’s (1990) internalization hypothesis is one of many classical accounts that have descended from Leonard Meyer’s account of musical meaning arising from emotions manipulated by expectation (Meyer, 1956). “Styles in music are basically complex systems of probability relationships” (p. 54). Indeed, a tremendous variety of musical characteristics can be captured by applying Bayesian models, including rhythm and metre, pitch and melody, and musical style (Temperley, 2007). A great deal of evidence also suggests that expectations about what is to come next are critical determinants of human music perception (Huron, 2006). Temperley argues that classical models of music perception (Lerdahl, 2001; Lerdahl & Jackendoff, 1983; Temperley, 2001) make explicit these probabilistic relationships. “Listeners’ generative models are tuned to reflect the statistical properties of the music that they encounter” (Temperley, 2007, p. 207).
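The idea that a listener's knowledge is tuned to the statistics of the music encountered can be illustrated with a very small model. The Python sketch below is far simpler than the Bayesian models cited above: it merely counts how often each note occurs in a set of melodies, converts the counts to probabilities with add-one smoothing, and measures how unexpected a note is as its surprisal in bits. The three melodies are invented for illustration and are not data from any study; the function names are hypothetical.

```python
from collections import Counter
from math import log2

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def learn_distribution(melodies, alphabet=NOTE_NAMES, smoothing=1.0):
    """Estimate note probabilities from exposure to a set of melodies.

    Add-one smoothing keeps notes that were never heard from having
    zero probability.
    """
    counts = Counter(note for melody in melodies for note in melody)
    total = sum(counts[n] for n in alphabet) + smoothing * len(alphabet)
    return {n: (counts[n] + smoothing) / total for n in alphabet}


def surprisal(note, distribution):
    """Return how unexpected a note is, in bits, under the learned model."""
    return -log2(distribution[note])


# Invented toy melodies that stay within the C major scale.
toy_melodies = [
    ["C", "E", "G", "C"],
    ["C", "D", "E", "C"],
    ["G", "F", "E", "D", "C"],
]

model = learn_distribution(toy_melodies)
print(round(surprisal("C", model), 2), round(surprisal("F#", model), 2))
```

In this toy model the frequently heard tonic is the least surprising note and a note from outside the scale is among the most surprising, which parallels the suggestion that tonal hierarchies reflect internalized distributional properties of a style.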

It was earlier argued that there are distinct parallels between Austro-German classical music and the classical approach to cognitive science. One of the most compelling is that both appeal to abstract, formal structures. It would appear that the classical approach to musical cognition takes this parallel very literally. That is, the representational systems proposed by classical researchers of musical cognition internalize the formal properties of music, and in turn they impose this formal structure on sounds during the perception of music.