4.7: Beyond the Terminal Meta-postulate
- Page ID
Connectionist networks are associationist devices that map inputs to outputs, systems that convert stimuli into responses. However, we saw in Chapter 3 that classical cognitive scientists had established that the stimulus-response theories of behaviourist psychology could not adequately deal with the recursive structure of natural language (Chomsky, 1957, 1959b, 1965, 1966). In the terminal meta-postulate argument (Bever, Fodor, and Garrett, 1968), it was noted that the rules of associative theory defined a “terminal vocabulary of a theory, i.e., over the vocabulary in which behavior is described” (p. 583). Bever, Fodor, and Garrett then proceeded to prove that the terminal vocabulary of associationism is not powerful enough to accept or reject languages that have recursive clausal structure.
If connectionist cognitive science is another instance of associative or behaviourist theory, then it stands to reason that it too is subject to these same problems and therefore lacks the computational power required of cognitive theory. One of the most influential criticisms of connectionism has essentially made this point, arguing against the computational power of artificial neural networks because they lack the componentiality and systematicity associated with recursive rules that operate on components of symbolic expressions (Fodor & Pylyshyn, 1988). If artificial neural networks do not belong to the class of universal machines, then they cannot compete against the physical symbol systems that define classical cognitive science (Newell, 1980; Newell & Simon, 1976).
What tasks can artificial neural networks perform, and how well can they perform them? To begin, let us consider the most frequent kind of problem that artificial neural networks are used to solve: pattern recognition (Pao, 1989; Ripley, 1996). Pattern recognition is a process by which varying input patterns, defined by sets of features which may have continuous values, are assigned to discrete categories in an all-or-none fashion (Harnad, 1987). In other words, it requires that a system perform a mapping from continuous inputs to discrete outputs. Artificial neural networks are clearly capable of performing this kind of mapping, provided either that their output units use a binary activation function like the Heaviside, or that their continuous output is extreme enough to be given a binary interpretation. In this context, the pattern of “on” and “off ” responses in a set of output units represents the digital name of the class to which an input pattern has been assigned.
We saw earlier that pattern recognition problems can be represented using pattern spaces (Figure 4-2). To classify patterns, a system carves a pattern space into decision regions that separate all of the patterns belonging to one class from the patterns that belong to others. An arbitrary pattern classifier would be a system that could, in principle, solve any pattern recognition problem with which it was faced. In order to have such ability, such a system must have complete flexibility in carving a pattern space into decision regions: it must be able to slice the space into regions of any required shape or number.
Artificial neural networks can categorize patterns. How well can they do so? It has been shown that a multilayer perceptron with three layers of connections—two layers of hidden units intervening between the input and output layers—is indeed an arbitrary pattern classifier (Lippmann, 1987, 1989). This is because the two layers of hidden units provided the required flexibility in carving pattern spaces into decision regions, assuming that the hidden units use a sigmoid-shaped activation function such as the logistic. “No more than three layers are required in perceptronlike feed-forward nets” (Lippmann, 1987, p. 16).
When output unit activity is interpreted digitally—as delivering “true” or “false” judgments—artificial neural networks can be interpreted as performing one kind of task, pattern classification. However, modern networks use continuous activation functions that do not need to be interpreted digitally. If one applies an analog interpretation to output unit activity, then networks can be interpreted as performing a second kind of input-output mapping task, function approximation.
In function approximation, an input is a set of numbers that represents the values of variables passed into a function, i.e., the values of the set x1 , x2, x3, . . . xN. The output is a single value y that is the result of computing some function of those variables, i.e., y = f(x1 , x2, x3, . . . xN). Many artificial neural networks have been trained to approximate functions (Girosi & Poggio, 1990; Hartman, Keeler, & Kowalski, 1989; Moody & Darken, 1989; Poggio & Girosi, 1990; Renals, 1989). In these networks, the value of each input variable is represented by the activity of an input unit, and the continuous value of an output unit’s activity represents the computed value of the function of those input variables.
A system that is most powerful at approximating functions is called a universal function approximator. Consider taking any continuous function and examining a region of this function from a particular starting point (e.g., one set of input values) to a particular ending point (e.g., a different set of input values). A universal function approximator is capable of approximating the shape of the function between these bounds to an arbitrary degree of accuracy.
Artificial neural networks can approximate functions. How well can they do so? A number of proofs have shown that a multilayer perceptron with two layers of connections—in other words, a single layer of hidden units intervening between the input and output layers—is capable of universal function approximation (Cotter, 1990; Cybenko, 1989; Funahashi, 1989; Hartman, Keeler, & Kowalski, 1989; Hornik, Stinchcombe, &White, 1989). “If we have the right connections from the input units to a large enough set of hidden units, we can always find a representation that will perform any mapping from input to output” (Rumelhart, Hinton, &Williams, 1986a, p. 319).
That multilayered networks have the in-principle power to be arbitrary pattern classifiers or universal function approximators suggests that they belong to the class “universal machine,” the same class to which physical symbol systems belong (Newell, 1980). Newell (1980) proved that physical symbol systems belonged to this class by showing how a universal Turing machine could be simulated by a physical symbol system. Similar proofs exist for artificial neural networks, firmly establishing their computational power.
The Turing equivalence of connectionist networks has long been established. McCulloch and Pitts (1943) proved that a network of McCulloch-Pitts neurons could be used to build the machine head of a universal Turing machine; universal power was then achieved by providing this system with an external memory. “To psychology, however defined, specification of the net would contribute all that could be achieved in that field” (p. 131). More modern results have used the analog nature of modern processors to internalize the memory, indicating that an artificial neural network can simulate the entire Turing machine (Siegelmann, 1999; Siegelmann & Sontag, 1991, 1995).
Modern associationist psychologists have been concerned about the implications of the terminal meta-postulate and have argued against it in an attempt to free their theories from its computational shackles (Anderson & Bower, 1973; Paivio, 1986). The hidden units of modern artificial neural networks break these shackles by capturing higher-order associations—associations between associations—that are not defined in a vocabulary restricted to input and output activities. The presence of hidden units provides enough power to modern networks to firmly plant them in the class “universal machine” and to make them viable alternatives to classical simulations.