The notion of representation in classical cognitive science is tightly linked to the structure/process distinction that is itself inspired by the digital computer. An explicit set of rules is proposed to operate on a set of symbols that permits its components to be identified, digitally, as tokens that belong to particular symbol types.
In contrast, artificial neural networks dispense (at first glance) with the sharp distinction between structure and process that characterizes classical cognitive science. Instead, networks themselves take the form of dynamic symbols that represent information at the same time as they transform it. The dynamic, distributed nature of artificial neural networks appears to make them more likely to be explained using statistical mechanics than using propositional logic.
One of the putative advantages of connectionist cognitive science is that it can inspire alternative notions of representation. The blurring of the structure/process distinction, together with the seemingly amorphous internal structure that characterizes many multilayer networks, leads to one such proposal, called coarse coding.
A coarse code is one in which an individual unit is very broadly tuned, sensitive to either a wide range of features or at least to a wide range of values for an individual feature (Churchland & Sejnowski, 1992; Hinton, McClelland, & Rumelhart, 1986). In other words, individual processors are themselves very inaccurate devices for measuring or detecting a feature. The accurate representation of a feature can become possible, though, by pooling or combining the responses of many such inaccurate detectors, particularly if their perspectives are slightly different (e.g., if they are sensitive to different ranges of features, or if they detect features from different input locations).
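The pooling idea can be illustrated with a minimal sketch. It is not taken from any of the cited models: the Gaussian tuning curves, the six preferred values in CENTRES, the tuning width, and the template-matching decoder are all assumptions chosen for illustration. Each unit alone is ambiguous, because two different feature values can produce the same activity, yet the pattern of activity across all of the units identifies the value.

```python
import math

# Hypothetical preferred values for six broadly tuned units (illustrative only).
CENTRES = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
WIDTH = 2.0  # assumed tuning width; as large as the spacing, so the units overlap


def unit_response(x, centre, width=WIDTH):
    """Gaussian tuning curve: the unit responds to a wide range of x."""
    return math.exp(-((x - centre) ** 2) / (2 * width ** 2))


def population_code(x):
    """Activities of all units for the feature value x."""
    return [unit_response(x, c) for c in CENTRES]


def decode(activities):
    """Return the candidate value (0.0 to 10.0 in steps of 0.1) whose
    population code is closest, in squared error, to the given activities."""
    grid = [i / 10 for i in range(101)]

    def mismatch(candidate):
        return sum((a - b) ** 2
                   for a, b in zip(activities, population_code(candidate)))

    return min(grid, key=mismatch)


# The unit centred on 4.0 cannot tell 3.0 from 5.0 ...
same_for_one_unit = unit_response(3.0, 4.0) == unit_response(5.0, 4.0)
# ... but the pooled pattern of activity differs for the two values.
differs_for_population = population_code(3.0) != population_code(5.0)
estimate = decode(population_code(4.3))
```

In this sketch the decoder is a simple lookup over candidate values; a network would instead learn weights that combine the unit activities, but the logic of pooling overlapping, inaccurate detectors is the same.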
A familiar example of coarse coding is provided by the nineteenth-century trichromatic theory of colour perception (Helmholtz, 1968; Wasserman, 1978). According to this theory, colour perception is mediated by three types of retinal cone receptors. One is maximally sensitive to short (blue) wavelengths of light, another is maximally sensitive to medium (green) wavelengths, and the third is maximally sensitive to long (red) wavelengths. Thus none of these receptor types is capable, by itself, of representing the rich rainbow of perceptible hues.
However, these receptors are broadly tuned and have overlapping sensitivities. As a result, most light will activate all three channels simultaneously, but to different degrees.
Actual colored light does not produce sensations of absolutely pure color; that red, for instance, even when completely freed from all admixture of white light, still does not excite those nervous fibers which alone are sensitive to impressions of red, but also, to a very slight degree, those which are sensitive to green, and perhaps to a still smaller extent those which are sensitive to violet rays. (Helmholtz, 1968, p. 97)
The pooling of different activities of the three channels permits a much greater variety of colours to be represented and perceived.
We have already seen examples of coarse coding in some of the network analyses that were presented earlier in this chapter. For instance, consider the chord recognition network. It was shown in Table 4.10.2 that none of its hidden units were accurate chord detectors. Hidden Units 1 and 2 did not achieve maximum activity when presented with any chord. When Hidden Unit 3 achieved maximum activity, this did not distinguish a 6th chord from a major 7th chord. However, when patterns were represented as points in a three-dimensional space, where the coordinates of each point were defined by a pattern’s activity in each of the three hidden units (Figures 4.10.6 and 4.10.7), perfect chord classification was possible.
Other connectionist examples of coarse coding are found in studies of networks trained to accomplish navigational tasks, such as making judgments about the distance or direction between pairs of cities on a map (Dawson & Boechler, 2007; Dawson, Boechler, & Orsten, 2005; Dawson, Boechler, & Valsangkar-Smyth, 2000). For instance, Dawson and Boechler (2007) trained a network to judge the heading from one city on a map of Alberta to another. Seven hidden value units were required to accomplish this task. Each of these hidden units could be described as being sensitive to heading. However, this sensitivity was extremely coarse—some hidden units could resolve directions only to the nearest 180°. Nevertheless, a linear combination of the activities of all seven hidden units represented the desired direction between cities with a high degree of accuracy.
Similarly, Dawson, Boechler, and Valsangkar-Smyth (2000) trained a network of value units to make distance judgments between all possible pairs of 13 Albertan cities. This network required six hidden units to accomplish this task. Again, these units provided a coarse coding solution to the problem. Each hidden unit could be described as occupying a location on the map of Alberta through which a line was drawn at a particular orientation. This oriented line provided a one-dimensional map of the cities: connection weights encoded the projections of the cities from the two-dimensional map onto each hidden unit’s one-dimensional representation. However, because the hidden units provided maps of reduced dimensionality, they were wildly inaccurate. Depending on the position of the oriented line, two cities that were far apart in the actual map could lie close together on a hidden unit’s representation. Fortunately, because each of these inaccurate hidden unit maps encoded projections from different perspectives, the combination of their activities was able to represent the actual distance between all city pairs with a high degree of accuracy.
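The geometry behind this result can be sketched in a few lines. The sketch is not a reconstruction of the trained network, which used value units and learned connection weights. It assumes six hypothetical lines through the origin at evenly spaced orientations, and the coordinates are invented rather than real city locations. It relies on a standard identity: for K evenly spaced orientations (K at least 2), the squared projected separations of two points sum to K/2 times their squared Euclidean distance.

```python
import math

N_UNITS = 6  # assumed number of oriented lines, one per hypothetical unit
ANGLES = [k * math.pi / N_UNITS for k in range(N_UNITS)]  # evenly spaced


def project(point, angle):
    """Position of a 2-D point along a line through the origin at this angle."""
    x, y = point
    return x * math.cos(angle) + y * math.sin(angle)


def coarse_distance(p, q, angle):
    """Separation of p and q on a single one-dimensional map."""
    return abs(project(p, angle) - project(q, angle))


def pooled_distance(p, q):
    """Combine all of the one-dimensional separations into one estimate."""
    total = sum(coarse_distance(p, q, a) ** 2 for a in ANGLES)
    return math.sqrt(total / (N_UNITS / 2))


# Invented coordinates: two places that are 100 units apart.
place_a = (0.0, 0.0)
place_b = (0.0, 100.0)

single_map = coarse_distance(place_a, place_b, ANGLES[0])  # horizontal line
pooled = pooled_distance(place_a, place_b)
```

On the horizontal line the two places project to the same position, so that map alone treats distant places as identical. Pooling across all six orientations recovers the true separation, which is the sense in which individually inaccurate maps can jointly encode a metric space.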
The discovery of coarse coding in navigational networks has important theoretical implications. Since the discovery of place cells in the hippocampus (O’Keefe & Dostrovsky, 1971), it has been thought that one function of the hippocampus is to instantiate a cognitive map (O’Keefe & Nadel, 1978). One analogy used to explain cognitive maps is that they are like graphical maps (Kitchin, 1994). From this, one might predict that the cognitive map is a metric, topographically organized, two-dimensional array in which each location in the map (i.e., each place in the external world) is associated with the firing of a particular place cell, and neighboring place cells represent neighboring places in the external world.
However, this prediction is not supported by anatomical evidence. First, place cells do not appear to be topographically organized (Burgess, Recce, & O’Keefe, 1995; McNaughton et al., 1996). Second, the receptive fields of place cells are at best locally metric: a lack of receptive field overlap means that one cannot measure the distance between points that are more than about a dozen body lengths apart (Touretzky, Wan, & Redish, 1994). Some researchers now propose that the cognitive map does not really exist, but that map-like properties emerge when place cells are coordinated with other types of cells, such as head direction cells, which fire when an animal’s head is pointed in a particular direction, regardless of the animal’s location in space (McNaughton et al., 1996; Redish, 1999; Redish & Touretzky, 1999; Touretzky, Wan, & Redish, 1994).
Dawson et al. (2000) observed that their navigational network is subject to the same criticisms that have been leveled against the notion of a topographically organized cognitive map. The hidden units did not exhibit topographic organization, and their inaccurate responses suggest that they are at best locally metric.
Nevertheless, the behavior of the Dawson et al. (2000) network indicated that it represented information about a metric space. That such behavior can be supported by the type of coarse coding discovered in this network suggests that metric, spatial information can be encoded in a representational scheme that is not isomorphic to a graphical map. This raises the possibility that place cells represent spatial information using a coarse code which, when its individual components are inspected, is not very map-like at all. O’Keefe and Nadel (1978, p. 78) were explicitly aware of this kind of possibility: “The cognitive map is not a picture or image which ‘looks like’ what it represents; rather, it is an information structure from which map-like images can be reconstructed and from which behavior dependent upon place information can be generated.”
What are the implications, for the practice of connectionist cognitive science, of the ability to interpret the internal structure of artificial neural networks?
When New Connectionism arose in the 1980s, interest in it was fuelled by two complementary perspectives (Medler, 1998). First, there was growing dissatisfaction with the progress being made in classical cognitive science and symbolic artificial intelligence (Dreyfus, 1992; Dreyfus & Dreyfus, 1988). Second, seminal introductions to artificial neural networks (McClelland & Rumelhart, 1986; Rumelhart & McClelland, 1986c) gave the sense that the connectionist architecture was a radical alternative to its classical counterpart (Schneider, 1987).
The apparent differences between artificial neural networks and classical models led to an early period of research in which networks were trained to accomplish tasks that had typically been viewed as prototypical examples of classical cognitive science (Bechtel, 1994; Rumelhart & McClelland, 1986a; Seidenberg & McClelland, 1989; Sejnowski & Rosenberg, 1988). These networks were then used as “existence proofs” to support the claim that non-classical models of classical phenomena are possible. However, detailed analyses of these networks were not provided, which meant that, apart from intuitions that connectionism is not classical, there was no evidence to support claims about the non-classical nature of the networks’ solutions to the classical problems. Because of this, this research perspective has been called gee whiz connectionism (Dawson, 2004, 2009).
Of course, at around the same time, prominent classical researchers were criticizing the computational power of connectionist networks (Fodor & Pylyshyn, 1988), arguing that connectionism was a throwback to less powerful notions of associationism that classical cognitive science had already vanquished (Bever, Fodor, & Garrett, 1968; Chomsky, 1957, 1959b, 1965). Thus gee whiz connectionism served an important purpose: providing empirical demonstrations that connectionism might be a plausible medium in which cognitive science can be fruitfully pursued.
However, it was noted earlier that there exists a great deal of research on the computational power of artificial neural networks (Girosi & Poggio, 1990; Hartman, Keeler, & Kowalski, 1989; Lippmann, 1989; McCulloch & Pitts, 1943; Moody & Darken, 1989; Poggio & Girosi, 1990; Renals, 1989; Siegelmann, 1999; Siegelmann & Sontag, 1991); the conclusion from this research is that multilayered networks have the same in-principle power as any universal machine. This leads, though, to the demise of gee whiz connectionism, because if connectionist systems belong to the class of universal machines, “it is neither interesting nor surprising to demonstrate that a network can learn a task of interest” (Dawson, 2004, p. 118). If a network’s ability to learn to perform a task is not of interest, then what is?
It can be extremely interesting, surprising, and informative to determine what regularities the network exploits. What kinds of regularities in the input patterns has the network discovered? How does it represent these regularities? How are these regularities combined to govern the response of the network? (Dawson, 2004, p. 118)
By uncovering the properties of representations that networks have discovered for mediating an input-output relationship, connectionist cognitive scientists can discover new properties of cognitive phenomena.