6.5: The Connectionist Approach to Musical Cognition
Connectionist research on musical cognition is perhaps not as established as classical research, but it has nonetheless produced a substantial and growing literature (Bharucha, 1999; Fiske, 2004; Griffith & Todd, 1999; Todd & Loy, 1991). The purpose of this section is to provide a very brief orientation to this research. As the section develops, the relationship of connectionist musical cognition to certain aspects of musical Romanticism is illustrated.
By the late 1980s, New Connectionism had begun to influence research on musical cognition. The effects of this spreading influence have been documented in two collections of research papers (Griffith & Todd, 1999; Todd & Loy, 1991). Connectionist musical cognition has been studied with a wide variety of network architectures, and covers a broad range of topics, most notably classifying pitch and tonality, assigning rhythm and metre, classifying and completing melodic structure, and composing new musical pieces (Griffith & Todd, 1999).
Why use neural networks to study musical cognition? Bharucha (1999) provided five reasons. First, artificial neural networks can account for the learning of musical patterns via environmental exposure. Second, the type of learning that they describe is biologically plausible. Third, they provide a natural and biologically plausible account of contextual effects and pattern completion during perception. Fourth, they are particularly well suited to modeling similarity-based regularities that are important in theories of musical cognition. Fifth, they can discover regularities (e.g., in musical styles) that can elude more formal analyses.
To begin our survey of connectionist musical cognition, let us consider the artificial neural network classifications of pitch, tonality, and harmony (Griffith & Todd, 1999; Purwins et al., 2008). A wide variety of such tasks have been successfully explored: artificial neural networks have been trained to classify chords (Laden & Keefe, 1989; Yaremchuk & Dawson, 2005; Yaremchuk & Dawson, 2008), assign notes to tonal schema similar to the structures proposed by Krumhansl (1990) (Leman, 1991; Scarborough, Miller, & Jones, 1989), model the effects of expectation on pitch perception and other aspects of musical perception (Bharucha, 1987; Bharucha & Todd, 1989), add harmony to melodies (Shibata, 1991), determine the musical key of a melody (Griffith, 1995), and detect the chord patterns in a composition (Gjerdingen, 1992).
Artificial neural networks are well suited for this wide range of pitch-related tasks because of their ability to exploit contextual information, which in turn permits them to deal with noisy inputs. For example, networks are capable of pattern completion: they can replace information that is missing from imperfect input patterns. In musical cognition, one example of pattern completion is virtual pitch (Terhardt, Stoll, & Seewann, 1982a, 1982b), the perception of the pitch of tones that are missing their fundamental frequency.
Consider a sine wave whose frequency is \(f\). When we hear a musical sound, its pitch (i.e., its tonal height, or the note that we experience) is typically associated with this fundamental frequency (Helmholtz & Ellis, 1954; Seashore, 1967). The harmonics of this sine wave are other sine waves whose frequencies are integer multiples of \(f\) (i.e., \(2f\), \(3f\), \(4f\) and so on). The timbre of the sound (whether we can identify a tone as coming from, for example, a piano versus a clarinet) is a function of the amplitudes of the various harmonics that are also audible (Seashore, 1967).
Interestingly, when a complex sound is filtered so that its fundamental frequency is removed, our perception of its pitch is not affected (Fletcher, 1924). It is as if the presence of the other harmonics provides enough information for the auditory system to fill in the missing fundamental, so that the correct pitch is heard—a phenomenon Schumann exploited in Humoreske. Co-operative interactions amongst neurons that detect the remaining harmonics are likely responsible for this effect (Cedolin & Delgutte, 2010; Smith et al., 1978; Zatorre, 2005).
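One property of the stimulus itself helps show why the remaining harmonics carry enough information: a sum of harmonics \(2f\), \(3f\), \(4f\) still repeats once every \(1/f\) seconds, even though no component at \(f\) is present. The following minimal Python sketch checks this numerically. The function names, the 200 Hz fundamental, and the equal amplitudes are assumptions chosen for illustration; the sketch describes the signal only and is not a model of the neural mechanism.

```python
import math

def complex_tone(t, f0, harmonics):
    """Sum of equal-amplitude sine waves at the given integer multiples of f0."""
    return sum(math.sin(2 * math.pi * h * f0 * t) for h in harmonics)

def repeats_with_period(f0, harmonics, period, n_samples=1000, tol=1e-9):
    """Check numerically whether the tone repeats after `period` seconds."""
    for i in range(n_samples):
        t = i / (n_samples * f0)  # sample times spread across one cycle of f0
        now = complex_tone(t, f0, harmonics)
        later = complex_tone(t + period, f0, harmonics)
        if abs(now - later) > tol:
            return False
    return True

f0 = 200.0                      # hypothetical fundamental, in Hz
missing_fundamental = [2, 3, 4]  # harmonics 2f, 3f, 4f; no energy at f itself

print(repeats_with_period(f0, missing_fundamental, 1 / f0))        # True
print(repeats_with_period(f0, missing_fundamental, 1 / (2 * f0)))  # False
```

The second check fails because the third harmonic does not repeat every \(1/(2f)\) seconds, so the waveform as a whole has the period of the absent fundamental.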
Artificial neural networks can easily model such co-operative processing and complete the missing fundamental. For instance, one important connectionist system is called a Hopfield network (Hopfield, 1982, 1984). It is an autoassociative network that has only one set of processing units, which are all interconnected. When a pattern of activity is presented to this type of network, signals spread rapidly to all of the processors, producing dynamic interactions that cause the network’s units to turn on or off over time. Eventually the network will stabilize in a low-energy state (a local minimum of its energy function), at which point dynamic changes in processor activities come to a halt.
Hopfield networks can be used to model virtual pitch, because they complete the missing fundamental (Benuskova, 1994). In this network, each processor represents a sine wave of a particular frequency; if the processor is on, then this represents that the sine wave is present. If a subset of processors is activated to represent a stimulus that is a set of harmonics with a missing fundamental, then when the network stabilizes, the processor representing the missing fundamental will also be activated. Other kinds of self-organizing networks are also capable of completing the missing fundamental (Sano & Jenkins, 1989).
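The idea of pattern completion in a Hopfield network can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a reconstruction of the published model: the twelve units (standing for 100 Hz, 200 Hz, ..., 1200 Hz), the two stored tones, the Hebbian outer-product storage rule, and the +1/-1 coding of "on" and "off" are all choices made here for the example.

```python
def tone(components, n_units=12):
    """Bipolar pattern: unit k (standing for k * 100 Hz) is +1 if present, else -1."""
    return [1 if k in components else -1 for k in range(1, n_units + 1)]

def train_hopfield(patterns):
    """Hebbian (outer-product) weights with no self-connections."""
    n = len(patterns[0])
    w = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j]
    return w

def recall(w, state, max_sweeps=20):
    """Update units one at a time until no unit changes (a stable state)."""
    state = list(state)
    n = len(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            net = sum(w[i][j] * state[j] for j in range(n))
            new = state[i] if net == 0 else (1 if net > 0 else -1)
            if new != state[i]:
                state[i] = new
                changed = True
        if not changed:
            break
    return state

# Two hypothetical stored tones: fundamentals at 100 Hz and 300 Hz, four harmonics each.
tone_a = tone({1, 2, 3, 4})
tone_b = tone({3, 6, 9, 12})
weights = train_hopfield([tone_a, tone_b])

# Present tone A with its fundamental (unit 1) switched off.
probe = tone({2, 3, 4})
completed = recall(weights, probe)
print(completed == tone_a)  # True: the unit for the missing fundamental turns on
```

In this small example the probe differs from the stored tone at a single unit, so the stored tone is the nearest stable state and the network settles into it after one sweep.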
An artificial neural network’s ability to deal with noisy inputs allows it to cope with other domains of musical cognition as well, such as assigning rhythm and metre (Desain & Honing, 1989; Griffith & Todd, 1999). Classical models of this type of processing hierarchically assign a structure of beats to different levels of a piece, employing rules that take advantage of the fact that musical rhythm and metre are associated with integer values (e.g., as defined by time signatures, or in the definition of note durations such as whole notes, quarter notes, and so on) (Lerdahl & Jackendoff, 1983; Temperley, 2001). However, in the actual performance of a piece, beats will be noisy or imperfect, such that perfect integer ratios of beats will not occur (Gasser, Eck, & Port, 1999). Connectionist models can correct for this problem, much as networks can restore absent information such as the missing fundamental.
For example, one network for assigning rhythm and metre uses a system of oscillating processors, units that fire at a set frequency (Large & Kolen, 1994). One can imagine having available a large number of such oscillators, each representing a different frequency. While an oscillator’s frequency of activity is constant, its phase of activity can be shifted (e.g., to permit an oscillator to align itself with external beats of the same frequency). If the phases of these processors can also be affected by co-operative and competitive interactions between the processors themselves, then the phases of the various components of the system can become entrained. This permits the network to represent the metrical structure of a musical input, even if the actual input is noisy or imperfect. This notion can be elaborated in a self-organizing network that permits preferences for, or expectancies of, certain rhythmic patterns to determine the final representation that the network converges to (Gasser, Eck, & Port, 1999).
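The phase-shifting idea can be illustrated with a single oscillator. The sketch below is a deliberately simplified stand-in, not the published model: it assumes one oscillator with a fixed period, a hypothetical `coupling` parameter that sets how far the phase moves toward each onset, and a hand-written list of timing jitters in place of a real performance. Interactions between multiple oscillators are omitted.

```python
def wrapped_error(onset, beat, period):
    """Signed distance from an onset to the nearest predicted beat."""
    err = (onset - beat) % period
    if err > period / 2:
        err -= period
    return err

def entrain(onsets, period, start=0.0, coupling=0.5):
    """Shift the oscillator's phase toward each onset; the period never changes."""
    beat = start  # time of one predicted beat; this defines the oscillator's phase
    errors = []
    for onset in onsets:
        err = wrapped_error(onset, beat, period)
        errors.append(err)
        beat += coupling * err  # move part of the way toward the onset
    return beat % period, errors

# Hypothetical performance: beats every 0.5 s starting at 0.2 s, played imperfectly.
period = 0.5
jitter = [0.01, -0.02, 0.015, -0.005, 0.0, 0.01, -0.01, 0.005]
onsets = [0.2 + period * k + j for k, j in enumerate(jitter)]

phase, errors = entrain(onsets, period)
print(round(phase, 3))                    # approximately 0.2
print(abs(errors[-1]) < abs(errors[0]))   # True: the oscillator has moved into alignment
```

Although no onset falls exactly on the underlying beat, the oscillator's phase settles close to 0.2 s, the phase of the idealized beat sequence.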
The artificial neural network examples provided above illustrate another of Bharucha’s (1999) advantages of such models: biological plausibility. Many neural network models are attempts to simulate some aspects of neural accounts of auditory and musical perception. For instance, place theory is the proposal that musical pitch is represented by places of activity along the basilar membrane in the cochlea (Helmholtz & Ellis, 1954; von Bekesy, 1928). The implications of place theory can be explored by using it to inspire spatial representations of musical inputs to connectionist networks (Sano & Jenkins, 1989).
The link between connectionist accounts and biological accounts of musical cognition is not accidental, because both reflect reactions against common criticisms. Classical cognitive scientist Steven Pinker is a noted critic of connectionist cognitive science (Pinker, 2002; Pinker & Prince, 1988). Pinker (1997) has also been a leading proponent of massive modularity, which ascribes neural modules to most cognitive faculties—except for music. Pinker excluded music because he could not see any adaptive value for its natural selection: “As far as biological cause and effect are concerned, music is useless. It shows no signs of design for attaining a goal such as long life, grandchildren, or accurate perception and prediction of the world” (p. 528). The rise of modern research in the cognitive neuroscience of music (Cedolin & Delgutte, 2010; Peretz & Coltheart, 2003; Peretz & Zatorre, 2003; Purwins et al., 2008; Stewart et al., 2006; Warren, 2008) is a reaction against this classical position, and finds a natural ally in musical connectionism.
In the analogy laid out in the previous section, connectionism’s appeal to the brain was presented as an example of its Romanticism. Connectionist research on musical cognition reveals other Romanticist parallels. Like musical Romanticism, connectionism is positioned to capture regularities that are difficult to express in language or by using formal rules (Loy, 1991).
For example, human subjects can accurately classify short musical selections into different genres or styles in a remarkably short period of time, within a quarter of a second (Gjerdingen & Perrott, 2008). But it is difficult to see how one could provide a classical account of this ability because of the difficulty in formally defining a genre or style for a classical model. “It is not likely that musical styles can be isolated successfully by simple heuristics and introspection, nor can they be readily modeled as a rule-solving problem” (Loy, 1991, p. 31).
However, many different artificial neural networks have been developed to classify music using categories that seem to defy precise, formal definitions. These include networks that can classify musical patterns as belonging to the early works of Mozart (Gjerdingen, 1990); classify selections as belonging to different genres of Western music (Mostafa & Billor, 2009); detect patterns of movement between notes in segments of music (Gjerdingen, 1994) in a fashion similar to a model of apparent motion perception (Grossberg & Rudd, 1989, 1992); evaluate the affective aesthetics of a melody (Coutinho & Cangelosi, 2009; Katz, 1995); and even predict whether a particular song has “hit potential” (Monterola et al., 2009).
Categories such as genre or hit potential are obviously vague. However, even what it means for a stimulus to be a particular song or melody may be difficult to define formally. This is because a melody can be transposed into different keys, performed by different instruments or voices, or even embellished by adding improvisational flourishes.
Again, melody recognition can be accomplished by artificial neural networks that map, for instance, transposed versions of the same musical segment onto a single output representation (Benuskova, 1995; Bharucha & Todd, 1989; Page, 1994; Stevens & Latimer, 1992). Neural network melody recognition has implications for other aspects of musical cognition, such as the representational format for musical memories. For instance, self-organizing networks can represent the hierarchical structure of a musical piece in an abstract enough fashion so that only the “gist” is encoded, permitting the same memory to be linked to multiple auditory variations (Large, Palmer, & Pollack, 1995). Auditory processing organizes information into separate streams (Bregman, 1990); neural networks can accomplish this for musical inputs by processing relationships amongst pitches (Grossberg, 1999).
The insights into musical representation that are being provided by artificial neural networks have important implications beyond musical cognition. There is now wide availability of music and multimedia materials in digital format. How can such material be classified and searched? Artificial neural networks are proving to be useful in addressing this problem, as well as for providing adaptive systems for selecting music, or generating musical playlists, based on a user’s mood or past preferences (Bugatti, Flammini, & Migliorati, 2002; Jun, Rho, & Hwang, 2010; Liu, Hsieh, & Tsai, 2010; Muñoz-Expósito et al., 2007).
Musical styles, or individual musical pieces, are difficult to precisely define, and therefore are problematic to incorporate into classical theories. “The fact that even mature theories of music are informal is strong evidence that the performer, the listener, and the composer do not operate principally as rule-based problem solvers” (Loy, 1991, p. 31). That artificial neural networks are capable of classifying music in terms of such vague categories indicates that “perhaps connectionism can show the way to techniques that do not have the liabilities of strictly formal systems” (p. 31). In other words, the flexibility and informality of connectionist systems allow them to cope with situations that may be beyond the capacity of classical models. Might not this advantage also apply to another aspect of musical cognition, composition?
Composition has in fact been one of the most successful applications of musical connectionism. A wide variety of composing networks have been developed. These include networks designed to compose single-voiced melodies on the basis of learned musical structure (Mozer, 1991; Todd, 1989); to compose harmonized melodies or multiple-voice pieces (Adiloglu & Alpaslan, 2007; Bellgard & Tsang, 1994; Hoover & Stanley, 2009; Mozer, 1994); to learn jazz melodies and harmonies, and then to use this information to generate new melodies when presented with novel harmonies (Franklin, 2006); and to improvise by composing variations on learned melodies (Nagashima & Kawashima, 1997). The logic of network composition is that the relationship between successive notes in a melody, or between different notes played at the same time in a harmonized or multiple-voice piece, is not random, but is instead governed by stylistic, melodic, and acoustic constraints (Kohonen et al., 1991; Lewis, 1991; Mozer, 1991, 1994). Networks are capable of learning such constraints and using them to predict, for example, what the next note should be in a new composition.
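The logic of next-note prediction can be made concrete with a very small network. The sketch below is far simpler than the composing networks cited above, which typically use recurrent architectures and richer note representations. It assumes a single layer of weights, a one-hot code for the previous note, a softmax output trained with the delta rule, and a short made-up melody; the function names and learning parameters are likewise illustrative.

```python
import math

def softmax(logits):
    """Convert output activations into a probability for each candidate note."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train_next_note(melody, epochs=200, lr=0.2):
    """Learn weights from the previous note (one-hot input) to the next note."""
    notes = sorted(set(melody))
    index = {n: i for i, n in enumerate(notes)}
    k = len(notes)
    weights = [[0.0] * k for _ in range(k)]
    for _ in range(epochs):
        for prev, nxt in zip(melody, melody[1:]):
            row = weights[index[prev]]  # one-hot input selects a single weight row
            probs = softmax(row)
            for j in range(k):
                target = 1.0 if j == index[nxt] else 0.0
                row[j] += lr * (target - probs[j])  # delta rule: error times input
    return notes, index, weights

def predict(model, prev):
    """Return the note the trained network considers most likely to follow `prev`."""
    notes, index, weights = model
    row = weights[index[prev]]
    return notes[row.index(max(row))]

# Hypothetical training melody: D is followed by E twice and by G once.
melody = ["C", "D", "E", "C", "D", "G", "C", "D", "E"]
model = train_next_note(melody)
print(predict(model, "C"))  # D
print(predict(model, "D"))  # E
```

The network is never given a rule such as "D usually leads to E"; the preference emerges in the weights from exposure to the melody, which is the sense in which such constraints are learned rather than stated.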
In keeping with musical Romanticism, however, composing networks are presumed to have internalized constraints that are difficult to formalize or to express in ordinary language. “Nonconnectionist algorithmic approaches in the computer arts have often met with the difficulty that ‘laws’ of art are characteristically fuzzy and ill-suited for algorithmic description” (Lewis, 1991, p. 212). Furthermore these “laws” are unlikely to be gleaned from analyzing the internal structure of a network, “since the hidden units typically compute some complicated, often uninterpretable function of their inputs” (Todd, 1989, p. 31). It is too early to label a composing network as an isolated genius, but it would appear that these networks are exploiting regularities that are in some sense sublime!
This particular parallel between musical Romanticism and connectionism, that both capture regularities that cannot be formalized, is apparent in another interesting characteristic of musical connectionism. The most popular algorithm for training artificial neural networks is the generalized delta rule (i.e., error backpropagation) (Chauvin & Rumelhart, 1995; Widrow & Lehr, 1990), and networks trained with this kind of supervised learning rule are the most likely to be found in the cognitive science literature. While self-organizing networks are present in this literature and have made important contributions to it (Amit, 1989; Carpenter & Grossberg, 1992; Grossberg, 1988; Kohonen, 1984, 2001), they are much less popular. However, this does not seem to be the case in musical connectionism.
For example, in the two collections that document advances in artificial neural network applications to musical cognition (Griffith & Todd, 1999; Todd & Loy, 1991), 23 papers describe new neural networks. Of these contributions, 9 involve supervised learning, while 14 describe unsupervised, self-organizing networks. This indicates a marked preference for unsupervised networks in this particular connectionist literature.
This preference is likely due to the view that supervised learning is not practical for musical cognition, either because many musical regularities can be acquired without feedback or supervision (Bharucha, 1991) or because for higher-level musical tasks the definition of the required feedback is impossible to formalize (Gjerdingen, 1989). “One wonders, for example, if anyone would be comfortable in claiming that one interpretation of a musical phrase is only 69 percent [as] true as another” (p. 67). This suggests that the musical Romanticism of connectionism is even reflected in its choice of network architectures.