7.6: Local versus Distributed Representations

Classical and connectionist cognitive scientists agree that theories of cognition must appeal to internal representations (Fodor & Pylyshyn, 1988). However, they appear to have strong disagreements about the nature of such representations. In particular, connectionist cognitive scientists propose that their networks exploit distributed representations, which provide many advantages over the local representations that they argue characterize the classical approach (Bowers, 2009). That is, distributed representations are often taken to be a mark of the connectionist, and local representations are taken to be a mark of the classical.

There is general, intuitive agreement about the differences between distributed and local representations. In a connectionist distributed representation, “knowledge is coded as a pattern of activation across many processing units, with each unit contributing to multiple, different representations. As a consequence, there is no one unit devoted to coding a given word, object, or person” (Bowers, 2009, p. 220). In contrast, in a classical local representation, “individual words, objects, simple concepts, and the like are coded distinctly, with their own dedicated representation” (p. 220).

However, when the definition of distributed representation is examined more carefully (van Gelder, 1991), two facts become clear. First, this term is used by different connectionists in different ways. Second, some of the uses of this term do not appear to differentiate connectionist from classical representations.

Van Gelder (1991) noted, for instance, that one common sense of distributed representation is that it is extended: a distributed representation uses many units to represent each item, while local representations do not. “To claim that a node is distributed is presumably to claim that its states of activation correspond to patterns of neural activity—to aggregates of neural ‘units’—rather than to activations of single neurons” (Fodor & Pylyshyn, 1988, p. 19). It is this sense of an extended or distributed representation that produces connectionist advantages such as damage resistance, because the loss of one of the many processors used to represent a concept will not produce catastrophic loss of represented information.
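The damage-resistance claim can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the item names, the four-unit local code, the eight-unit extended code (mutually orthogonal patterns of +1 and -1), and the best-match decoding rule are hypothetical and are not taken from any of the cited models.

```python
# Toy codebooks; the item names and activation patterns are hypothetical.
LOCAL = {
    "dog":  [1, 0, 0, 0],
    "cat":  [0, 1, 0, 0],
    "cup":  [0, 0, 1, 0],
    "tree": [0, 0, 0, 1],
}
EXTENDED = {
    "dog":  [1,  1,  1,  1, 1,  1,  1,  1],
    "cat":  [1, -1,  1, -1, 1, -1,  1, -1],
    "cup":  [1,  1, -1, -1, 1,  1, -1, -1],
    "tree": [1, -1, -1,  1, 1, -1, -1,  1],
}


def lesion(pattern, unit):
    """Return a copy of the pattern with one unit silenced (set to 0)."""
    damaged = list(pattern)
    damaged[unit] = 0
    return damaged


def decode(pattern, codebook):
    """Return the item whose stored pattern matches best, or None on a tie."""
    scores = {
        item: sum(p * s for p, s in zip(pattern, stored))
        for item, stored in codebook.items()
    }
    best = max(scores.values())
    winners = [item for item, score in scores.items() if score == best]
    return winners[0] if len(winners) == 1 else None


def survivors(codebook, unit):
    """Count the items still decoded correctly after one unit is lesioned."""
    return sum(
        decode(lesion(pattern, unit), codebook) == item
        for item, pattern in codebook.items()
    )


for unit in range(4):
    print("local code, lesion unit", unit, "->", survivors(LOCAL, unit), "of 4 items")
for unit in range(8):
    print("extended code, lesion unit", unit, "->", survivors(EXTENDED, unit), "of 4 items")
```

In this toy setting, silencing any single unit of the local code removes one item entirely, whereas the extended code still decodes every item because the remaining seven units carry enough of each pattern.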

However, the use of extended to define distributed does not segregate connectionist representations from their classical counterparts. For example, the mental image is an important example of a classical representation (Kosslyn, 1980; Kosslyn, Thompson, & Ganis, 2006; Paivio, 1971, 1986). It would be odd to think of a mental image as being distributed, particularly in the context of the connectionist use of this term. However, proponents of mental imagery would argue that mental images are extended, both functionally, in being extended over space, and physically, in being extended over aggregates of neurons in topographically organized areas of the cortex (Kosslyn, 1994; Kosslyn, Ganis, & Thompson, 2003; Kosslyn et al., 1995). “There is good evidence that the brain depicts representations literally, using space on the cortex to represent space in the world” (Kosslyn, Thompson, & Ganis, 2006, p. 15).

Another notion of distributed representation considered by van Gelder (1991) was the coarse code (Feldman & Ballard, 1982; Hinton, McClelland, & Rumelhart, 1986). Again, a coarse code is typically presented as distinguishing connectionist networks from classical models. A coarse code is extended in the sense that multiple processors are required to do the representing. These processors have two properties. First, their receptive fields are wide—that is, they are very broadly tuned, so that a variety of circumstances will lead to activation in a processor. Second, the receptive fields of different processors overlap. In this kind of representation, a high degree of accuracy is possible by pooling the responses of a number of broadly tuned (i.e., coarse) processors (Dawson, Boechler, & Orsten, 2005; Dawson, Boechler, & Valsangkar-Smyth, 2000).
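A minimal sketch of coarse coding along a single spatial dimension may make the two properties concrete. The layout is an assumption chosen for illustration (positions 0 to 100, receptive fields of radius 10 centred every 5 positions), and intersecting the fields of the active processors is only one simple way of pooling their responses; it is not the procedure used in the cited studies.

```python
def receptive_field(center, radius):
    """Set of positions that activate a processor centred at `center`."""
    return set(range(center - radius, center + radius + 1))


# Hypothetical layout: wide fields (radius 10) whose centres are only 5 apart,
# so that neighbouring fields overlap heavily.
FIELDS = [receptive_field(c, 10) for c in range(0, 101, 5)]


def active_fields(target):
    """Fields of every processor that responds to the target position."""
    return [field for field in FIELDS if target in field]


def localize(target):
    """Pool the active processors by intersecting their receptive fields."""
    return set.intersection(*active_fields(target))


region = localize(42)
print("width of one field:", len(FIELDS[0]))
print("pooled region:", sorted(region))
```

Each processor on its own narrows the target down only to a region 21 positions wide, but the four processors that respond to position 42 jointly narrow it to a much smaller region, even though none of them is finely tuned.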

While coarse coding is an important kind of representation in the connectionist literature, once again it is possible to find examples of coarse coding in classical models as well. For example, one way that coarse coding of spatial location is presented by connectionists (Hinton, McClelland, & Rumelhart, 1986) can easily be recast in terms of Venn diagrams. That is, each nonempty set represents the coarse location of a target in a broad spatial area; the intersection of overlapping nonempty sets provides more accurate target localization.

However, classical models of syllogistic reasoning can be cast in a similar fashion, using Euler circles and Venn diagrams (Johnson-Laird, 1983). Indeed, Johnson-Laird’s (1983) more modern notion of mental models can itself be viewed as an extension of these approaches: syllogistic statements are represented as a tableau of different instances; the syllogism is solved by combining (i.e., intersecting) tableaus for different statements and examining the relevant instances that result. In other words, mental models can be considered a classical example of coarse coding, suggesting that this concept does not necessarily distinguish connectionist from classical theories.

After his more detailed analysis of the concept, van Gelder (1991) argued that a stronger notion of distributed is required, and that this can be accomplished by invoking the concept of superposition. Two different concepts are superposed if the same resources are used to provide their representations. “Thus in connectionist networks we can have different items stored as patterns of activity over the same set of units, or multiple different associations encoded in one set of weights” (p. 43).
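The idea of several associations sharing one set of weights can be sketched with a small linear associator trained by a Hebbian outer-product rule. This device is an assumption introduced here for illustration (van Gelder's argument does not depend on it), and the input and output patterns are hypothetical; the two inputs are chosen to be orthogonal so that recall is exact.

```python
# Two hypothetical input -> output associations. The inputs are orthogonal.
X1, Y1 = [1, 1, 1, 1], [1, -1]
X2, Y2 = [1, -1, 1, -1], [1, 1]


def store(pairs):
    """Superpose every association in a single weight matrix (sum of outer products)."""
    n_in, n_out = len(pairs[0][0]), len(pairs[0][1])
    weights = [[0.0] * n_in for _ in range(n_out)]
    for x, y in pairs:
        for r in range(n_out):
            for c in range(n_in):
                weights[r][c] += y[r] * x[c] / n_in
    return weights


def recall(weights, x):
    """Retrieve the output associated with input x (a matrix-vector product)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


W = store([(X1, Y1), (X2, Y2)])
print("shared weights:", W)
print("recall for X1:", recall(W, X1))
print("recall for X2:", recall(W, X2))
```

No weight in `W` belongs to one association alone: each entry is a sum of contributions from both pairs, yet each stored output can still be recovered from its own input.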

Van Gelder (1991) pointed out that one issue with superposition is that it must be defined in degrees. For instance, it may be the case that not all resources are used simultaneously to represent all contents. Furthermore, operationalizing the notion of superposition depends upon how resources are defined and measured. Finally, different degrees of superposition may be reflected in the number of different contents that a given resource can represent. For example, it is well known that one kind of artificial neural network, the Hopfield network (Hopfield, 1982), is of limited capacity: if the network is composed of N processors, it will only be able to represent on the order of 0.18N distinct memories (Abu-Mostafa & St. Jacques, 1985; McEliece et al., 1987).
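A minimal sketch of a Hopfield-style network shows what it means for memories to be superposed in one weight matrix and then retrieved. The details are assumptions for illustration: eight units, two hypothetical orthogonal patterns, Hebbian weights with a zero diagonal, and sequential threshold updates. It demonstrates storage and recall only and does not attempt to measure the capacity limit.

```python
def train(patterns):
    """Hebbian weights: sum of outer products of the patterns, zero diagonal."""
    n = len(patterns[0])
    weights = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    weights[i][j] += p[i] * p[j]
    return weights


def recall(weights, cue, max_sweeps=10):
    """Update units one at a time until a full sweep changes nothing."""
    state = list(cue)
    n = len(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            field = sum(weights[i][j] * state[j] for j in range(n))
            # A net input of exactly zero leaves the unit unchanged.
            new = state[i] if field == 0 else (1 if field > 0 else -1)
            if new != state[i]:
                state[i] = new
                changed = True
        if not changed:
            break
    return state


# Two hypothetical memories stored in the same eight units and the same weights.
P1 = [1, 1, 1, 1, -1, -1, -1, -1]
P2 = [1, -1, 1, -1, 1, -1, 1, -1]
W = train([P1, P2])

noisy_cue = [-1] + P1[1:]  # P1 with its first unit flipped
print("recalled:", recall(W, noisy_cue))
```

With only two patterns in eight units the network is well within its limits, so the corrupted cue settles back to the stored memory; adding many more patterns to the same weights would eventually make recall unreliable, which is the capacity limit described above.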

Nonetheless, van Gelder (1991) expressed confidence that the notion of superposition provides an appropriate characteristic for defining a distributed representation. “It is strong enough that very many kinds of representations do not count as superposed, yet it manages to subsume virtually all paradigm cases of distribution, whether these are drawn from the brain, connectionism, psychology, or optics” (p. 54).

Even if van Gelder’s (1991) definition is correct, it is still the case that the concept of superposition does not universally distinguish connectionist representations from classical ones. One example of this is when concepts are represented as collections of features or microfeatures. For instance, in an influential PDP model called an interactive activation and competition network (McClelland & Rumelhart, 1988), most of the processing units represent the presence of a variety of features. Higher-order concepts are defined as sets of such features. This is an instance of superposition, because the same feature can be involved in the representation of multiple concepts. However, the identical type of representation (that is, superposition of featural elements) is also found in many prototypical classical representations, including semantic networks (Collins & Quillian, 1969, 1970a, 1970b) and feature set representations (Rips, Shoben, & Smith, 1973; Tversky, 1977; Tversky & Gati, 1982).

The discussion up to this point has considered a handful of different notions of distributed representation, and has argued that these different definitions do not appear to uniquely separate connectionist and classical concepts of representation. To wrap up this discussion, let us take a different approach, and consider why in some senses connectionist researchers may still need to appeal to local representations.

One problem of considerable interest within cognitive neuroscience is the issue of assigning specific behavioural functions to specific brain regions; that is, the localization of function. To aid in this endeavour, cognitive neuroscientists find it useful to distinguish between two qualitatively different types of behavioural deficits. A single dissociation consists of a patient performing one task extremely poorly while performing a second task at a normal level, or at least very much better than the first. In contrast, a double dissociation occurs when one patient performs the first task significantly more poorly than the second, and another patient (with a different brain injury) performs the second task significantly more poorly than the first (Shallice, 1988).

Cognitive neuroscientists have argued that double dissociations reflect damage to localized functions (Caramazza, 1986; Shallice, 1988). The view that dissociation data reveal internal structures that are local in nature has been named the locality assumption (Farah, 1994).

However, Farah (1994) hypothesized that the locality assumption may be unwarranted for two reasons. First, its validity depends upon the additional assumption that the brain is organized into a set of functionally distinct modules (Fodor, 1983). Farah argued that the modularity of the brain is an unresolved empirical issue. Second, Farah noted that it is possible for nonlocal or distributed architectures, such as parallel distributed processing (PDP) networks, to produce single or double dissociations when lesioned. As the interactive nature of PDP networks is “directly incompatible with the locality assumption” (p. 46), the locality assumption may not be an indispensable tool for cognitive neuroscientists.

Farah (1994) reviewed three areas in which neuropsychological dissociations had been used previously to make inferences about the underlying local structure. For each she provided an alternative architecture—a PDP network. Each of these networks, when locally damaged, produced (local) behavioural deficits analogous to the neuropsychological dissociations of interest. These results led Farah to conclude that one cannot infer that a specific behavioural deficit is associated with the loss of a local function, because the prevailing view is that PDP networks are, by definition, distributed and therefore nonlocal in structure.

However, one study challenged Farah’s (1994) argument both logically and empirically (Medler, Dawson, & Kingstone, 2005). Medler, Dawson, and Kingstone (2005) noted that Farah’s whole argument was based on the assumption that connectionist networks exhibit universally distributed internal structure. However, this assumption needs to be empirically supported; Medler and colleagues argued that this could only be done by interpreting the internal structure of a network and by relating behavioural deficits to interpretations of ablated components. They noted that it was perfectly possible for PDP networks to adopt internal representations that were more local in nature, and that single and double dissociations in lesioned networks may be the result of damaging local representations.

Medler, Dawson, and Kingstone (2005) supported their position by training a network on a logic problem and interpreting the internal structure of the network, acquiring evidence about how local or how nonlocal the function of each hidden unit was. They then created different versions of the network by lesioning one of its 16 hidden units, assessing behavioural deficits in each lesioned network. They found that the more local a hidden unit was, the more profound and specific was the behavioural deficit that resulted when the unit was lesioned. “For a double dissociation to occur within a computational model, the model must have some form of functional localization” (p. 149).
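The logic of the lesioning method can be sketched with a deliberately tiny, hand-wired network. This is not the trained network or the logic problem used by Medler, Dawson, and Kingstone (2005); the two hidden units, their thresholds, and the two tasks (AND and OR of two binary inputs) are hypothetical and were chosen so that each hidden unit has an obviously local function.

```python
CASES = [(0, 0), (0, 1), (1, 0), (1, 1)]


def step(x):
    """Threshold activation: fire only when the net input is positive."""
    return 1 if x > 0 else 0


def hidden(a, b, lesioned=None):
    """Hand-wired hidden layer: unit 0 detects AND, unit 1 detects OR."""
    h = [step(a + b - 1.5), step(a + b - 0.5)]
    if lesioned is not None:
        h[lesioned] = 0  # a lesioned unit is silenced
    return h


def accuracy(task, lesioned=None):
    """Proportion of input cases answered correctly on one task."""
    target = {"and": lambda a, b: a & b, "or": lambda a, b: a | b}[task]
    readout = {"and": 0, "or": 1}[task]  # each task reads a single hidden unit
    hits = sum(hidden(a, b, lesioned)[readout] == target(a, b) for a, b in CASES)
    return hits / len(CASES)


for unit in (None, 0, 1):
    print("lesion:", unit, "AND:", accuracy("and", unit), "OR:", accuracy("or", unit))
```

Silencing unit 0 impairs the AND task while leaving OR intact, and silencing unit 1 does the reverse. The double dissociation in this toy network arises only because each hidden unit serves a single function, which is the pattern the quotation above describes.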

We saw earlier that one of the key goals of connectionist cognitive science was to develop models that were biologically plausible. Clearly one aspect of this is to produce networks that are capable of reflecting appropriate deficits in behaviour when damaged, such as single or double dissociations. Medler, Dawson, and Kingstone (2005) have shown that the ability to do so, even in PDP networks, requires local representations. This provides another line of evidence against the claim that distributed representations can be used to distinguish connectionist from classical models. In other words, local representations do not appear to be a mark of the classical.
