7.8: Explicit Rules versus Implicit Knowledge
Connectionists have argued that one mark of the classical is its reliance on explicit rules (McClelland, Rumelhart, & Hinton, 1986). For example, it has been claimed that all classical work on knowledge acquisition “shares the assumption that the goal of learning is to formulate explicit rules (propositions, productions, etc.) which capture powerful generalizations in a succinct way” (p. 32).
Explicit rules may serve as a mark of the classical because it has also been argued that they are not characteristic of other approaches in cognitive science, particularly connectionism. Many researchers assume that PDP networks acquire implicit knowledge. For instance, consider this claim about a network that learns to convert verbs from present to past tense:
The model learns to behave in accordance with the rule, not by explicitly noting that most words take -ed in the past tense in English and storing this rule away explicitly, but simply by building up a set of connections in a pattern associator through a long series of simple learning experiences. (McClelland, Rumelhart, & Hinton, 1986, p. 40)
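To make this contrast concrete, consider the following minimal sketch of a pattern associator trained with the delta rule. It is an illustration under toy assumptions, not the Rumelhart and McClelland model itself: the four-element binary vectors are hypothetical stand-ins for their verb-feature encodings. The point is that the regularity ends up implicit in the weight matrix; no rule is ever stored as a symbolic expression.

```python
# Minimal pattern associator sketch (illustrative, not the actual
# Rumelhart & McClelland past-tense model). The toy binary vectors
# below are hypothetical stand-ins for verb-feature encodings.
import numpy as np

rng = np.random.default_rng(0)

# Each "present tense" input vector is paired with a "past tense" target.
inputs = np.array([[1, 0, 1, 0],
                   [0, 1, 1, 0],
                   [1, 1, 0, 1]], dtype=float)
targets = np.array([[1, 0, 1, 1],
                    [0, 1, 1, 1],
                    [1, 1, 0, 1]], dtype=float)

W = rng.normal(0.0, 0.1, size=(4, 4))  # connection weights
rate = 0.1                             # learning rate

# "A long series of simple learning experiences": many small weight
# adjustments. The regularity is never written down as a rule; it
# accumulates implicitly in W.
for _ in range(500):
    for x, t in zip(inputs, targets):
        error = t - W @ x               # mismatch with desired output
        W += rate * np.outer(error, x)  # delta-rule weight update

print(np.round(W @ inputs[0]))  # approximates the first target, [1. 0. 1. 1.]
```

After training, nothing in W reads as “add -ed”; the mapping is carried by a distributed pattern of connection strengths, which is exactly the sense in which such knowledge is said to be implicit.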
One problem that immediately arises in using explicit rules as a mark of the classical is that the notions of explicit rules and implicit knowledge are only vaguely defined or understood (Kirsh, 1992). For instance, Kirsh (1992) notes that the distinction between explicit rules and implicit knowledge is often proposed to parallel the distinction between local and distributed representations. However, this proposal is problematic, because, as we have already seen earlier in this chapter, the distinction between local and distributed representations does not itself serve to separate classical cognitive science from other approaches.
Furthermore, defining explicit rules in terms of locality does not eliminate connectionism’s need for them (Hadley, 1993). Hadley (1993) argued that there is solid evidence of the human ability to instantaneously learn and apply rules.
Some rule-like behavior cannot be the product of ‘neurally-wired’ rules whose structure is embedded in particular networks, for the simple reason that humans can often apply rules (with considerable accuracy) as soon as they are told the rules. (Hadley, 1993, p. 185)
Hadley proceeded to argue that connectionist architectures need to exhibit such (explicit) rule learning. “The foregoing conclusions present the connectionist with a formidable scientific challenge, which is, to show how general purpose rule following mechanisms may be implemented in a connectionist architecture” (p. 199).
Why is it that, on more careful consideration, explicit rules seem not to be a mark of the classical? The assumption that PDP networks acquire implicit knowledge is likely an example of what has been called gee whiz connectionism (Dawson, 2009): connectionists assume that the internal structure of their networks is neither local nor rule-like, but rarely test this assumption by conducting detailed interpretations of network representations. When such interpretations are conducted, they can reveal striking surprises. For instance, the internal structures of networks have been found to encode classical rules of logic (Berkeley et al., 1995) and classical production rules (Dawson et al., 2000).
The discussion in the preceding paragraphs raises the possibility that connectionist networks can acquire explicit rules. A complementary point can also be made to question explicit rules as a mark of the classical: classical models may not themselves require explicit rules. For instance, classical cognitive scientists view an explicit rule as an encoded representation that is part of the algorithmic level. Furthermore, the reason that it is explicitly represented is that it is not part of the architecture (Fodor & Pylyshyn, 1988). In short, classical theories posit a combination of explicit (algorithmic, or stored program) and implicit (architectural) determinants of cognition. As a result, classical debates about the cognitive architecture can be construed as debates about the implicitness or explicitness of knowledge:
Not only is there no reason why Classical models are required to be rule-explicit but—as a matter of fact—arguments over which, if any, rules are explicitly mentally represented have raged for decades within the Classicist camp. (Fodor & Pylyshyn, 1988, p. 60)
To this point, the current section has tacitly adopted one context for the distinction between explicit rules and implicit knowledge: that it parallels the distinction between local and distributed representations. However, other contexts are also plausible. For example, classical models may be characterized as employing explicit rules in the sense that they employ a structure/process distinction. That is, classical systems characteristically separate their symbol-holding memories from the rules that modify stored contents.
For instance, the Turing machine explicitly distinguishes its ticker tape memory structure from the rules that are executed by its machine head (Turing, 1936). Similarly, production systems (Anderson, 1983; Newell, 1973) separate their symbolic structures stored in working memory from the set of productions that scan and manipulate expressions. The von Neumann (1958, 1993) architecture by definition separates its memory organ from the other organs that act on stored contents, such as its logical or arithmetical units.
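The separation can be made vivid with a toy Turing machine. The sketch below is an illustration under hypothetical assumptions (the unary-increment machine and its rule table are mine, not taken from Turing, 1936): the tape is a passive symbol store, while the rule table is a distinct structure consulted by the machine head.

```python
# A toy Turing machine illustrating the structure/process distinction:
# the tape (a passive symbol store) is held separately from the rule
# table consulted by the machine head. The machine itself (a unary
# incrementer) is a hypothetical example, not one from Turing (1936).

# Rules: (state, symbol read) -> (symbol to write, head move, next state)
rules = {
    ("scan", "1"): ("1", +1, "scan"),  # move right past the 1s
    ("scan", "_"): ("1", 0, "halt"),   # append a 1, then halt
}

tape = list("111_")    # structure: the stored symbols
head, state = 0, "scan"

while state != "halt":  # process: repeated rule application
    write, move, state = rules[(state, tape[head])]
    tape[head] = write
    head += move

print("".join(tape))   # "1111": unary three incremented to four
```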
To further establish this alternative context, some researchers have claimed that PDP networks or other connectionist architectures do not exhibit the structure/process distinction. For instance, a network can be considered to be an active data structure that not only stores information, but at the same time manipulates it (Hillis, 1985). From this perspective, the network is both structure and process.
However, it is still the case that the structure/process distinction fails to provide a mark of the classical. The reason for this was detailed in this chapter’s earlier discussion of control processes. That is, almost all PDP networks are controlled by external processes—in particular, learning rules (Dawson & Schopflocher, 1992a; Roy, 2008). This external control takes the form of rules that are as explicit as any to be found in a classical model.
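This point can be seen directly in code. In the hypothetical sketch below, the learning rule is a procedure stated explicitly and applied to the network from outside; whatever is implicit about the trained weights, the rule that produces them is written out as explicitly as any classical rule. The function names are illustrative, not drawn from any particular library.

```python
# The structure/process separation reappears in connectionist practice:
# the network (a weight matrix applied to inputs) is distinct from the
# learning rule, an explicit external procedure that modifies it.
import numpy as np

def forward(W, x):
    """The network itself: maps an input vector to an output vector."""
    return W @ x

def delta_rule(W, x, t, rate=0.1):
    """The external controller: an explicitly stated training rule."""
    return W + rate * np.outer(t - forward(W, x), x)
```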
To bring this discussion to a close, I argue that a third context is possible for distinguishing explicit rules from implicit knowledge: the difference between digital and analog processes. Classical rules may be explicit in the sense that they are digital, consistent with the neural all-or-none law (Levitan & Kaczmarek, 1991; McCulloch & Pitts, 1943): a rule either executes or it does not. In contrast, the continuous values of the activation functions used in connectionist networks permit knowledge to be applied to varying degrees. From this perspective, networks are analog rather than digital.
Again, however, this context does not successfully provide a mark of the classical. First, one consequence of Church’s thesis and the universal machine is that digital and analog devices are functionally equivalent, in the sense that one kind of computer can simulate the other (Rubel, 1989). Second, connectionist models themselves can be interpreted as being either digital or analog in nature, depending upon task demands. For instance, when a network is trained to either respond or not, as in pattern classification (Lippmann, 1989) or in the simulation of animal learning (Dawson, 2008), output unit activation is treated as digital. However, when a problem requires continuous values, as in function approximation (Hornik, Stinchcombe, & White, 1989; Kremer, 1995; Medler & Dawson, 1994) or probability matching (Dawson et al., 2009), the same output unit activation function is treated as analog.
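A small sketch can illustrate this dual reading. One and the same logistic output activation is read digitally for classification but read as a continuous quantity for function approximation or probability matching; the 0.5 threshold assumed here is a conventional choice, not one fixed by the architecture.

```python
# One logistic output activation, two readings: digital (thresholded,
# all-or-none) versus analog (the continuous value itself). The 0.5
# threshold is a conventional choice, not fixed by the architecture.
import math

def logistic(net):
    """Continuous activation of an output unit given its net input."""
    return 1.0 / (1.0 + math.exp(-net))

activation = logistic(0.8)

digital_response = 1 if activation > 0.5 else 0  # e.g., pattern classification
analog_response = activation                     # e.g., function approximation

print(digital_response, round(analog_response, 2))  # 1 0.69
```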
In conclusion, though the notion of explicit rules has been proposed to distinguish classical models from other kinds of architectures, careful consideration shows that it fails to do so. Regardless of how the notion of explicit rules is defined, classical architectures do not use such rules exclusively, and such rules also appear to be required in connectionist models of cognition.