4.2: Nurture versus Nature
The second chapter of John Locke’s (1977) An Essay Concerning Human Understanding, originally published in 1706, begins as follows:
It is an established opinion among some men that there are in the understanding certain innate principles; some primary notions, characters, as it were, stamped upon the mind of man, which the soul receives in its very first being, and brings into the world with it. (Locke, 1977, p. 17)
Locke’s most famous work was a reaction against this view; of the “some men” being referred to, the most prominent was Descartes himself (Thilly, 1900).
Locke’s Essay criticized Cartesian philosophy, questioning its fundamental teachings, its core principles and their necessary implications, and its arguments for innate ideas, not to mention all scholars who maintained the existence of innate ideas (Thilly, 1900). Locke’s goal was to replace Cartesian rationalism with empiricism, the view that the source of ideas was experience. Locke (1977) aimed to show “how men, barely by the use of their natural faculties, may attain to all of the knowledge they have without the help of any innate impressions” (p. 17). Locke argued for experience over innateness, for nurture over nature.
The empiricism of Locke and his descendants provided a viable and popular alternative to Cartesian philosophy (Aune, 1970). It was also a primary influence on some of the psychological theories that appeared in the late nineteenth and early twentieth centuries (Warren, 1921). Thus it should be no surprise that empiricism is reflected in a different form of cognitive science, connectionism. Furthermore, just as empiricism challenged most of the key ideas of rationalism, connectionist cognitive science can be seen as challenging many of the elements of classical cognitive science.
Surprisingly, the primary concern of connectionist cognitive science is not classical cognitive science’s nativism. It is instead the classical approach’s excessive functionalism, due largely to its acceptance of the multiple realization argument. Logic gates, the core element of digital computers, are hardware independent because different physical mechanisms could be used to bring the two-valued logic into being (Hillis, 1998). The notion of a universal machine is an abstract, logical one (Newell, 1980), which is why physical symbol systems, computers, or universal machines can be physically realized using LEGO (Agulló et al., 2003), electric train sets (Stewart, 1994), gears (Swade, 1993), hydraulic valves (Hillis, 1998) or silicon chips (Reid, 2001). Physical constraints on computation do not seem to play an important role in classical cognitive science.
To connectionist cognitive science, the multiple realization argument is flawed because connectionists believe that the information processing responsible for human cognition depends critically on the properties of particular hardware, the brain. The characteristics of the brain place constraints on the kinds of computations that it can perform and on the manner in which they are performed (Bechtel & Abrahamsen, 2002; Churchland, Koch, & Sejnowski, 1990; Churchland & Sejnowski, 1992; Clark, 1989, 1993; Feldman & Ballard, 1982).
Brains have long been viewed as being different kinds of information processors than electronic computers because of differences in componentry (von Neumann, 1958). While electronic computers use a small number of fast components, the brain consists of a large number of very slow components, that is, neurons. As a result, the brain must be a parallel processing device that “will tend to pick up as many logical (or informational) items as possible simultaneously, and process them simultaneously” (von Neumann, 1958, p. 51).
Von Neumann (1958) argued that neural information processing would be far less precise, in terms of decimal point precision, than electronic information processing. However, this low level of neural precision would be complemented by a comparatively high level of reliability, where noise or missing information would have far less effect than it would for electronic computers. Given that the basic architecture of the brain involves many connections amongst many elementary components, and that these connections serve as a memory, the brain’s memory capacity should also far exceed that of digital computers.
The differences between electronic and brain-like information processing are at the root of connectionist cognitive science’s reaction against classical cognitive science. The classical approach has a long history of grand futuristic predictions that fail to materialize (Dreyfus, 1992, p. 85): “Despite predictions, press releases, films, and warnings, artificial intelligence is a promise and not an accomplished fact.” Connectionist cognitive science argues that this pattern of failure arises because the fundamental assumptions of the classical approach fail to capture the basic principles of human cognition.
Connectionists propose a very different theory of information processing, a potential paradigm shift (Schneider, 1987), to remedy this situation. Even staunch critics of artificial intelligence research have indicated a certain sympathy with the connectionist view of information processing (Dreyfus & Dreyfus, 1988; Searle, 1992). “The fan club includes the most unlikely collection of people. . . . Almost everyone who is discontent with contemporary cognitive psychology and current ‘information processing’ models of the mind has rushed to embrace the ‘connectionist alternative’” (Fodor & Pylyshyn, 1988, p. 4).
What are the key problems that connectionists see in classical models? Classical models invoke serial processes, which make them far too slow to run on sluggish componentry (Feldman & Ballard, 1982). They involve explicit, local, and digital representations of both rules and symbols, making these models too brittle. “If in a digital system of notations a single pulse is missing, absolute perversion of meaning, i.e., nonsense, may result” (von Neumann, 1958, p. 78). Because of this brittleness, the behaviour of classical models does not degrade gracefully when presented with noisy inputs, and such models are not damage resistant. All of these issues arise from one underlying theme: classical algorithms reflect the kind of information processing carried out by electronic computers, not the kind that characterizes the brain. In short, classical theories are not biologically plausible.
Connectionist cognitive science “offers a radically different conception of the basic processing system of the mind-brain, one inspired by our knowledge of the nervous system” (Bechtel & Abrahamsen, 2002, p. 2). The basic medium of connectionism is a type of model called an artificial neural network, or a parallel distributed processing (PDP) network (McClelland & Rumelhart, 1986; Rumelhart & McClelland, 1986c). Artificial neural networks consist of a number of simple processors that perform basic calculations and communicate the results to other processors by sending signals through weighted connections. The processors operate in parallel, permitting fast computing even when slow componentry is involved. They exploit implicit, distributed, and redundant representations, making these networks not brittle. Because networks are not brittle, their behaviour degrades gracefully when presented with noisy inputs, and such models are damage resistant. These advantages accrue because artificial neural networks are intentionally biologically plausible or neuronally inspired.
Classical cognitive science develops models that are purely symbolic and which can be described as asserting propositions or performing logic. In contrast, connectionist cognitive science develops models that are subsymbolic (Smolensky, 1988) and which can be described as statistical pattern recognizers. Networks use representations (Dawson, 2004; Horgan & Tienson, 1996), but these representations do not have the syntactic structure of those found in classical models (Waskan & Bechtel, 1997). Let us take a moment to describe in a bit more detail the basic properties of artificial neural networks.
An artificial neural network is a computer simulation of a “brain-like” system of interconnected processing units (see Figures 4-1 and 4-5 later in this chapter). In general, such a network can be viewed as a multiple-layer system that generates a desired response to an input stimulus. That is, like the devices described by cybernetics (Ashby, 1956, 1960), an artificial neural network is a machine that computes a mapping between inputs and outputs.
A network’s stimulus or input pattern is provided by the environment and is encoded as a pattern of activity (i.e., a vector of numbers) in a set of input units. The response of the system, its output pattern, is represented as a pattern of activity in the network’s output units. In modern connectionism, sometimes called New Connectionism, there will be one or more intervening layers of processors in the network, called hidden units. Hidden units detect higher-order features in the input pattern, allowing the network to make a correct or appropriate response.
The behaviour of a processor in an artificial neural network, which is analogous to a neuron, can be characterized as follows. First, the processor computes the total signal (its net input) being sent to it by other processors in the network. Second, the unit uses an activation function to convert its net input into internal activity (usually a continuous number between 0 and 1) on the basis of this computed signal. Third, the unit converts its internal activity into an output signal, and sends this signal on to other processors. A network uses parallel processing because many, if not all, of its units will perform their operations simultaneously.
The signal sent by one processor to another is a number that is transmitted through a weighted connection, which is analogous to a synapse. The connection serves as a communication channel that amplifies or attenuates signals being sent through it, because these signals are multiplied by the weight associated with the connection. The weight is a number that defines the nature and strength of the connection. For example, inhibitory connections have negative weights, and excitatory connections have positive weights. Strong connections have strong weights (i.e., the absolute value of the weight is large), while weak connections have near-zero weights.
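The three steps carried out by a single processor, together with the role of connection weights, can be sketched in a few lines of Python. This is only a minimal illustration: the function names are hypothetical, the logistic function is assumed as one common choice of activation function, and the output signal is assumed to equal the internal activity.

```python
import math


def net_input(signals, weights):
    """Step 1: total incoming signal, each signal multiplied by its weight."""
    return sum(s * w for s, w in zip(signals, weights))


def logistic(net):
    """Step 2: convert net input into internal activity between 0 and 1.

    The logistic function is an assumed example of an activation function.
    """
    return 1.0 / (1.0 + math.exp(-net))


def unit_output(signals, weights):
    """Step 3: pass the activity on as the unit's output signal.

    Assumption: the output signal is simply the internal activity.
    """
    return logistic(net_input(signals, weights))


# An excitatory (positive) and an inhibitory (negative) connection of equal
# strength cancel, leaving a net input of zero.
balanced = unit_output([1.0, 1.0], [2.0, -2.0])

# A strong excitatory connection drives activity toward 1; a strong
# inhibitory connection drives it toward 0.
excited = unit_output([1.0], [5.0])
inhibited = unit_output([1.0], [-5.0])
```

In this sketch, a weight near zero would leave the receiving unit almost unaffected by the sending unit, which is the sense in which such a connection is weak.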
The pattern of connectivity in a PDP network (i.e., the network’s entire set of connection weights) defines how signals flow between the processors. As a result, a network’s connection weights are analogous to a program in a conventional computer (Smolensky, 1988). However, a network’s “program” is not of the same type that defines a classical model. A network’s program does not reflect the classical structure/process distinction, because networks do not employ either explicit symbols or rules. Instead, a network’s program is a set of causal or associative links from signaling processors to receiving processors. The activity that is produced in the receiving units is literally caused by having an input pattern of activity modulated by an array of connection weights between units. In this sense, connectionist models seem markedly associationist in nature (Bechtel, 1985); they can be comfortably related to the old associationist psychology (Warren, 1921).
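One way to see why connection weights behave like a program is to run the same propagation code with different sets of weights. The sketch below is illustrative only: the weights are hand-picked for the example rather than taken from any published model, the logistic activation function is assumed, and each unit is given a bias term (an extra constant added to its net input), which is a common convention not described in the text above.

```python
import math


def logistic(net):
    """Assumed activation function: maps net input to the range 0..1."""
    return 1.0 / (1.0 + math.exp(-net))


def forward(pattern, layers):
    """Propagate an input pattern through successive layers of units.

    `layers` is a list of (weights, biases) pairs, where weights[j][i] is
    the connection from unit i in the previous layer to unit j in this one.
    """
    activity = list(pattern)
    for weights, biases in layers:
        activity = [
            logistic(sum(w * a for w, a in zip(row, activity)) + b)
            for row, b in zip(weights, biases)
        ]
    return activity


# Hypothetical hand-picked weights. Two hidden units detect "at least one
# input is on" and "both inputs are on"; the output unit is excited by the
# first hidden unit and inhibited by the second, giving exclusive-or.
XOR_NET = [
    ([[10.0, 10.0], [10.0, 10.0]], [-5.0, -15.0]),  # hidden layer
    ([[10.0, -10.0]], [-5.0]),                      # output layer
]

# Same code, different weights: a single layer that computes AND instead.
AND_NET = [([[10.0, 10.0]], [-15.0])]
```

Nothing in `forward` mentions exclusive-or or conjunction. The mapping that gets computed is determined entirely by the numbers stored in the connections.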
Artificial neural networks are not necessarily embodiments of empiricist philosophy. Indeed, the earliest artificial neural networks did not learn from experience; they were nativist in the sense that they had to have their connection weights “hand wired” by a designer (McCulloch & Pitts, 1943). However, their associationist characteristics resulted in a natural tendency for artificial neural networks to become the face of modern empiricism. This is because associationism has always been strongly linked to empiricism; empiricist philosophers invoked various laws of association to explain how complex ideas could be constructed from the knowledge provided by experience (Warren, 1921). By the late 1950s, when computers were being used to bring networks to life, networks were explicitly linked to empiricism (Rosenblatt, 1958). Rosenblatt’s artificial neural networks were not hand wired. Instead, they learned from experience to set the values of their connection weights.
What does it mean to say that artificial neural networks are empiricist? A famous passage from Locke (1977, p. 54) highlights two key elements: “Let us then suppose the mind to be, as we say, white paper, void of all characters, without any idea, how comes it to be furnished? . . . To this I answer, in one word, from experience.”
The first element in the above quote is the “white paper,” often described as the tabula rasa, or the blank slate: the notion of a mind being blank in the absence of experience. Modern connectionist networks can be described as endorsing the notion of the blank slate (Pinker, 2002). This is because prior to learning, the pattern of connections in modern networks has no pre-existing structure. The networks either start literally as blank slates, with all connection weights being equal to zero (Anderson et al., 1977; Eich, 1982; Hinton & Anderson, 1981), or they start with all connection weights being assigned small, randomly selected values (Rumelhart, Hinton, & Williams, 1986a, 1986b).
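The two kinds of blank slate described above can be illustrated as follows. This is a sketch under assumptions: the function names, the uniform distribution, and the range of 0.1 are choices made for the example, not values taken from the cited models.

```python
import random


def zero_weights(n_inputs, n_outputs):
    """A literal blank slate: every connection weight starts at zero."""
    return [[0.0] * n_inputs for _ in range(n_outputs)]


def small_random_weights(n_inputs, n_outputs, scale=0.1, seed=None):
    """A blank slate with no structure: small random starting weights.

    Assumption: weights are drawn uniformly from [-scale, +scale].
    """
    rng = random.Random(seed)
    return [
        [rng.uniform(-scale, scale) for _ in range(n_inputs)]
        for _ in range(n_outputs)
    ]


blank = zero_weights(3, 2)
noisy = small_random_weights(3, 2, scale=0.1, seed=0)
```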
The second element in Locke’s quote is that the source of ideas or knowledge or structure is experience. Connectionist learning rules provide a modern embodiment of this notion. Artificial neural networks are exposed to environmental stimulation (activation of their input units), which results in changes to connection weights. These changes furnish a network’s blank slate, resulting in a pattern of connectivity that represents knowledge and implements a particular input-output mapping.
In some systems, called self-organizing networks, experience shapes connectivity via unsupervised learning (Carpenter & Grossberg, 1992; Grossberg, 1980, 1987, 1988; Kohonen, 1977, 1984). When learning is unsupervised, networks are only provided with input patterns. They are not presented with desired outputs that are paired with each input pattern. In unsupervised learning, each presented pattern causes activity in output units; this activity is often further refined by a winner-take-all competition in which one output unit wins the competition to be paired with the current input pattern. Once the output unit is selected via internal network dynamics, its connection weights, and possibly the weights of neighbouring output units, are updated via a learning rule.
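A winner-take-all update of this general kind might look like the following sketch. The specific learning rule used here, which moves the winning unit's weights a fraction of the way toward the input pattern, is an assumption chosen for illustration; the networks cited above differ in their details.

```python
def winner(pattern, weights):
    """Return the index of the output unit with the largest net input."""
    nets = [sum(x * w for x, w in zip(pattern, row)) for row in weights]
    return max(range(len(nets)), key=nets.__getitem__)


def competitive_step(pattern, weights, rate=0.5):
    """Present one input pattern; only the winning unit's weights change.

    No desired output is supplied: the winner emerges from the network's
    own response to the pattern. (Assumed rule: move toward the input.)
    """
    j = winner(pattern, weights)
    weights[j] = [w + rate * (x - w) for x, w in zip(pattern, weights[j])]
    return j


# Two output units, each initially tuned to a different input direction.
weights = [[1.0, 0.0], [0.0, 1.0]]
chosen = competitive_step([0.9, 0.1], weights)
```

After this single presentation, the first output unit has moved closer to the pattern it won, while the second unit's weights are untouched.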
Networks whose connection weights are modified via unsupervised learning develop sensitivity to statistical regularities in the inputs and organize their output units to reflect these regularities. For instance, in a famous kind of self-organizing network called a Kohonen network (Kohonen, 1984), output units are arranged in a two-dimensional grid. Unsupervised learning causes the grid to organize itself into a map that reveals the discovered structure of the inputs, where related patterns produce neighbouring activity in the output map. For example, when such networks are presented with musical inputs, they often produce output maps that are organized according to the musical circle of fifths (Griffith & Todd, 1999; Todd & Loy, 1991).
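The map-forming behaviour depends on updating the winner's grid neighbours along with the winner. A single update step can be sketched as below. This is a simplified illustration rather than Kohonen's full algorithm: the fixed learning rate, the square neighbourhood, and the function name are all assumptions, and practical implementations typically shrink the rate and the neighbourhood as training proceeds.

```python
def som_step(pattern, grid, rate=0.5, radius=1):
    """One unsupervised update of a Kohonen-style map.

    `grid` maps (row, col) positions to weight vectors. The winner is the
    unit whose weights lie closest to the pattern; it and every unit within
    `radius` grid steps of it are moved toward the pattern.
    """
    def sq_dist(weights):
        return sum((x - w) ** 2 for x, w in zip(pattern, weights))

    win = min(grid, key=lambda pos: sq_dist(grid[pos]))
    for (r, c), weights in grid.items():
        if max(abs(r - win[0]), abs(c - win[1])) <= radius:
            grid[(r, c)] = [
                w + rate * (x - w) for x, w in zip(pattern, weights)
            ]
    return win


# A 3 x 3 grid of output units; one corner unit already matches the input.
grid = {(r, c): [0.0, 0.0] for r in range(3) for c in range(3)}
grid[(0, 0)] = [1.0, 1.0]
win = som_step([1.0, 1.0], grid)
```

Because neighbouring units are dragged toward the same patterns, nearby positions on the grid come to respond to similar inputs, which is what lets the trained grid be read as a map.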
In cognitive science, most networks reported in the literature are not self-organizing and are not structured via unsupervised learning. Instead, they are networks that are instructed to mediate a desired input-output mapping. This is accomplished via supervised learning. In supervised learning, it is assumed that the network has an external teacher. The network is presented with an input pattern and produces a response to it. The teacher compares the response generated by the network to the desired response, usually by calculating the amount of error associated with each output unit. The teacher then provides the error as feedback to the network. A learning rule uses feedback about error to modify weights in such a way that the next time this pattern is presented to the network, the amount of error that it produces will be smaller.
A variety of learning rules, including the delta rule (Rosenblatt, 1958, 1962; Stone, 1986; Widrow, 1962; Widrow & Hoff, 1960) and the generalized delta rule (Rumelhart, Hinton, & Williams, 1986b), are supervised learning rules that work by correcting network errors. (The generalized delta rule is perhaps the most popular learning rule in modern connectionism, and is discussed in more detail in Section 4.9.) This kind of learning involves the repeated presentation of a number of input-output pattern pairs, called a training set. Ideally, with enough presentations of a training set, the amount of error produced for each member of the training set will be negligible, and it can be said that the network has learned the desired input-output mapping. Because these techniques require many presentations of a set of patterns for learning to be completed, they have sometimes been criticized as being examples of “slow learning” (Carpenter, 1989).
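Error-correcting learning over repeated presentations of a training set can be sketched with the delta rule applied to a single output unit. The example makes several assumptions for the sake of brevity: the unit is linear (its response is just its net input), the learning rate and number of epochs are arbitrary, and the three-pattern training set is a hypothetical toy problem.

```python
def train_delta_rule(training_set, n_inputs, rate=0.1, epochs=200):
    """Train one linear output unit with the delta (Widrow-Hoff) rule.

    Returns the learned weights and the summed squared error per epoch.
    """
    weights = [0.0] * n_inputs  # blank-slate starting point
    history = []
    for _ in range(epochs):
        total_error = 0.0
        for pattern, target in training_set:
            response = sum(w * x for w, x in zip(weights, pattern))
            error = target - response  # feedback supplied by the "teacher"
            # Change each weight in proportion to the error and to the
            # input signal that travelled through that connection.
            weights = [
                w + rate * error * x for w, x in zip(weights, pattern)
            ]
            total_error += error ** 2
        history.append(total_error)
    return weights, history


# Hypothetical toy training set: the desired mapping is 2*x1 - x2.
TRAINING_SET = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)]
weights, history = train_delta_rule(TRAINING_SET, n_inputs=2)
```

Each pass through the training set reduces the total error a little further, so the weights approach the values 2 and -1 only after many presentations. This gradual convergence is the behaviour that the "slow learning" criticism refers to.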
Connectionism’s empiricist and associationist nature casts it close to the very position that classical cognitivism reacted against: psychological behaviourism (Miller, 2003). Modern classical arguments against connectionist cognitive science (Fodor & Pylyshyn, 1988) cover much of the same ground as arguments against behaviourist and associationist accounts of language (Bever, Fodor, & Garrett, 1968; Chomsky, 1957, 1959a, 1959b, 1965). That is, classical cognitive scientists argue that artificial neural networks, like their associationist cousins, do not have the computational power to capture the kind of regularities modelled with recursive rule systems.
However, these arguments against connectionism are flawed. We see in later sections that computational analyses of artificial neural networks have proven that they too belong to the class “universal machine.” As a result, the kinds of input-output mappings that have been realized in artificial neural networks are both vast and diverse. One can find connectionist models in every research domain that has also been explored by classical cognitive scientists. Even critics of connectionism admit that “the study of connectionist machines has led to a number of striking and unanticipated findings; it’s surprising how much computing can be done with a uniform network of simple interconnected elements” (Fodor & Pylyshyn, 1988, p. 6).
That connectionist models can produce unanticipated results is a direct result of their empiricist nature. Unlike their classical counterparts, connectionist researchers do not require a fully specified theory of how a task is accomplished before modelling begins (Hillis, 1988). Instead, they can let a learning rule discover how to mediate a desired input-output mapping. Connectionist learning rules serve as powerful methods for developing new algorithms of interest to cognitive science. Hillis (1988, p. 176) has noted that artificial neural networks allow “for the possibility of constructing intelligence without first understanding it.”
One problem with connectionist cognitive science is that the algorithms that learning rules discover are extremely difficult to retrieve from a trained network (Dawson, 1998, 2004, 2009; Dawson & Shamanski, 1994; McCloskey, 1991; Mozer & Smolensky, 1989; Seidenberg, 1993). This is because these algorithms involve distributed, parallel interactions amongst highly nonlinear elements. “One thing that connectionist networks have in common with brains is that if you open them up and peer inside, all you can see is a big pile of goo” (Mozer & Smolensky, 1989, p. 3).
In the early days of modern connectionist cognitive science, this was not a concern. This was a period of what has been called “gee whiz” connectionism (Dawson, 2009), in which connectionists modelled phenomena that were typically described in terms of rule-governed symbol manipulation. In the mid-1980s it was sufficiently interesting to show that such phenomena might be accounted for by parallel distributed processing systems that did not propose explicit rules or symbols. However, as connectionism matured, it was necessary for its researchers to spell out the details of the alternative algorithms embodied in their networks (Dawson, 2004). If these algorithms could not be extracted from networks, then “connectionist networks should not be viewed as theories of human cognitive functions, or as simulations of theories, or even as demonstrations of specific theoretical points” (McCloskey, 1991, p. 387). In response to such criticisms, connectionist cognitive scientists have developed a number of techniques for recovering algorithms from their networks (Berkeley et al., 1995; Dawson, 2004, 2005; Gallant, 1993; Hanson & Burr, 1990; Hinton, 1986; Moorhead, Haig, & Clement, 1989; Omlin & Giles, 1996).
What are the elements of connectionism, and how do they relate to cognitive science in general and to classical cognitive science in particular? The purpose of the remainder of this chapter is to explore the ideas of connectionist cognitive science in more detail.