# 3.6: Underdetermination and Innateness

The ability of a device to accept or generate a grammar is central to another computational level analysis of language (Gold, 1967). Gold performed a formal analysis of language learning which revealed a situation that is known as Gold’s paradox (Pinker, 1979). One solution to this paradox is to adopt a position that is characteristic of classical cognitive science, and which we have seen is consistent with its Cartesian roots. This position is that a good deal of the architecture of cognition is innate.

Gold (1967) was interested in the problem of how a system could learn the grammar of a language on the basis of a finite set of example expressions. He considered two different situations in which the learning system could be presented with expressions. In informant learning, the learner is presented with both grammatical and ungrammatical expressions, and is told which are which, i.e., whether or not each belongs to the language generated by the grammar. In text learning, the only expressions presented to the learner are grammatical.

Whether a learner is undergoing informant learning or text learning, Gold (1967) assumed that learning would proceed as a succession of presentations of expressions. After each expression was presented, the language learner would generate a hypothesized grammar. Gold proposed that each hypothesis could be described as being a Turing machine that would either accept the (hypothesized) grammar or generate it. In this formalization, the notion of “learning a language” has become “selecting a Turing machine that represents a grammar” (Osherson, Stob, & Weinstein, 1986).

According to Gold’s (1967) algorithm, a language learner would have a current hypothesized grammar. When a new expression was presented to the learner, a test would be conducted to see if the current grammar could deal with the new expression. If the current grammar succeeded, then it was retained. If the current grammar failed, then a new grammar—a new Turing machine—would have to be selected.

Under this formalism, when can we say that a grammar has been learned? Gold defined language learning as the identification of the grammar in the limit. When a language is identified in the limit, this means that the current grammar being hypothesized by the learner does not change even as new expressions are encountered. Furthermore, it is expected that this state will occur after a finite number of expressions have been encountered during learning.
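The procedure can be sketched as a short program. The sketch below is a toy illustration, not Gold's formalism: grammars are modelled as Python predicates (standing in for the Turing machine associated with each hypothesis), and the hypothesis space, its enumeration, and the sample texts are all invented for the example.

```python
# Toy sketch of Gold-style learning by enumeration (illustrative only).
# Each "grammar" is a predicate over strings, standing in for the Turing
# machine that Gold associates with each hypothesized grammar.
hypothesis_space = [
    ("a*", lambda s: set(s) <= {"a"}),
    ("a*b*", lambda s: s == "a" * s.count("a") + "b" * s.count("b")),
    ("a^n b^n", lambda s: len(s) % 2 == 0
                and s == "a" * (len(s) // 2) + "b" * (len(s) // 2)),
]

def identify_in_the_limit(text):
    """Keep the current grammar while it accepts each new expression;
    when it fails, select the next grammar in the enumeration.
    (This toy assumes some listed grammar fits the whole text.)"""
    index = 0
    for expression in text:
        while not hypothesis_space[index][1](expression):
            index += 1  # current grammar failed: a new one must be selected
    return hypothesis_space[index][0]

# A text (positive examples only) drawn from the language a*b*:
print(identify_in_the_limit(["b", "ab", "aabbb"]))  # → a*b*
```

Identification in the limit is visible in this sketch: once the learner reaches a grammar consistent with everything it will ever see, the hypothesis never changes again, even as new expressions arrive.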

In the previous section, we considered a computational analysis in which different kinds of computing devices were presented with the same grammar. Gold (1967) adopted an alternative approach: he kept the information processing constant— that is, he always studied the algorithm sketched above—but he varied the complexity of the grammar that was being learned, and he varied the conditions under which the grammar was presented, i.e., informant learning versus text learning.

In computer science, a formal description of any class of languages (human or otherwise) relates its complexity to the complexity of a computing device that could generate or accept it (Hopcroft & Ullman, 1979; Révész, 1983). This has resulted in a classification of grammars known as the Chomsky hierarchy (Chomsky, 1959a). In the Chomsky hierarchy, the simplest grammars are the regular grammars, and they can be accommodated by finite state automata. The next most complicated are the context-free grammars, which can be processed by pushdown automata (finite state automata augmented with a stack memory). Next are the context-sensitive grammars, which are the domain of linear bounded automata (devices like Turing machines, but with a tape whose length is bounded by the length of the input). The most complex are the unrestricted grammars, which can only be dealt with by Turing machines.
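The gap between the first two levels of the hierarchy can be made concrete. In the sketch below (both acceptors are invented for illustration), a finite state automaton suffices for the regular language a*b*, but accepting aⁿbⁿ (equal numbers of a's and b's) requires a pushdown automaton's stack, because no fixed set of states can count arbitrarily high:

```python
def fsa_accepts_a_star_b_star(s):
    """Finite state automaton for the regular language a*b*:
    two states and no memory beyond the current state."""
    state = "reading_as"
    for ch in s:
        if ch == "a" and state == "reading_as":
            continue                      # stay: still reading a's
        if ch == "b":
            state = "reading_bs"          # after this, only b's may follow
            continue
        return False                      # an 'a' after a 'b', or a bad symbol
    return True

def pda_accepts_an_bn(s):
    """Pushdown-style acceptor for a^n b^n: push for each 'a', pop for
    each 'b'. The stack supplies the unbounded memory that no finite
    state automaton can provide."""
    stack = []
    seen_b = False
    for ch in s:
        if ch == "a" and not seen_b:
            stack.append("a")
        elif ch == "b" and stack:
            seen_b = True
            stack.pop()
        else:
            return False
    return not stack                      # every 'a' was matched by a 'b'

print(fsa_accepts_a_star_b_star("aabbb"), pda_accepts_an_bn("aabbb"))  # → True False
```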

Gold (1967) used formal methods to determine the conditions under which each class of grammars could be identified in the limit. He was able to show that text learning sufficed only for the very simplest classes of languages. In contrast, Gold found that informant learning permitted context-free and context-sensitive grammars to be identified in the limit.

Gold’s (1967) research was conducted in a relatively obscure field of theoretical computer science. However, Steven Pinker brought it to the attention of cognitive science more than a decade later (Pinker, 1979), where it sparked a great deal of interest and research. This is because Gold’s computational analysis revealed a paradox of particular interest to researchers who studied how human children acquire language.

Gold’s (1967) proofs indicated that informant learning was powerful enough that a complex grammar could be identified in the limit. Such learning was not possible with text learning. Gold’s paradox emerged because research strongly suggests that children are text learners, not informant learners (Pinker, 1979, 1994, 1999). It is estimated that 99.93 percent of the language to which children are exposed is grammatical (Newport, Gleitman, & Gleitman, 1977). Furthermore, whenever feedback about grammaticality is provided to children, it is not systematic enough to be used to select a grammar (Marcus, 1993).

Gold’s paradox is that while he proved that grammars complex enough to model human language could not be text learned, children learn such grammars—and do so via text learning! How is this possible?

Gold’s paradox is an example of a problem of underdetermination. In a problem of underdetermination, the information available from the environment is not sufficient to support a unique interpretation or inference (Dawson, 1991). For instance, Gold (1967) proved that a finite number of expressions presented during text learning were not sufficient to uniquely determine the grammar from which these expressions were generated, provided that the grammar was more complicated than a regular grammar.
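A toy example makes the underdetermination concrete (both candidate grammars below are invented for illustration): every expression a text learner has drawn from aⁿbⁿ is also consistent with the more general grammar a*b*, so positive examples alone cannot decide between them.

```python
# Two candidate grammars, both consistent with the same finite text.
def in_an_bn(s):
    """Strings of the form a^n b^n (equal counts, a's before b's)."""
    n = len(s) // 2
    return len(s) % 2 == 0 and s == "a" * n + "b" * n

def in_a_star_b_star(s):
    """Any run of a's followed by any run of b's."""
    return s == "a" * s.count("a") + "b" * s.count("b")

text = ["ab", "aabb", "aaabbb"]  # positive examples only

# Both hypotheses accept every expression the learner has seen...
assert all(in_an_bn(s) for s in text)
assert all(in_a_star_b_star(s) for s in text)

# ...yet they disagree on unseen strings: the finite text underdetermines
# the grammar that produced it.
print(in_an_bn("aab"), in_a_star_b_star("aab"))  # → False True
```

No further positive examples can resolve the disagreement, since every string of aⁿbⁿ is also a string of a*b*; only a negative example (an informant labelling "aab" as ungrammatical) could do so.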

There are many approaches available for solving problems of underdetermination. One that is most characteristic of classical cognitive science is to simplify the learning situation by assuming that some of the to-be-learned information is already present because it is innate. For instance, classical cognitive scientists assume that much of the grammar of a human language is innately available before language learning begins.

*The child has an innate theory of potential structural descriptions that is sufficiently rich and fully developed so that he is able to determine, from a real situation in which a signal occurs, which structural descriptions may be appropriate to this signal.* (Chomsky, 1965, p. 32)

If the existence of an innate, universal base grammar—a grammar used to create phrase markers—is assumed, then a generative grammar of the type proposed by Chomsky can be identified in the limit (Wexler & Culicover, 1980). This is because learning the language is simplified to the task of learning the set of transformations that can be applied to phrase markers. More modern theories of transformational grammars have reduced the number of transformations to one, and have described language learning as the setting of a finite number of parameters that determine grammatical structure (Cook & Newson, 1996). Again, these grammars can be identified in the limit on the basis of very simple input expressions (Lightfoot, 1989). Such proofs are critical to cognitive science and to linguistics, because if a theory of language is to be explanatorily adequate, then it must account for how language is acquired (Chomsky, 1965).
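Parameter setting can be sketched in miniature. Everything below is a hypothetical illustration, not a model from the cited literature: the "innate" contribution is the finite space of two invented binary parameters, and learning reduces to finding a setting consistent with the text.

```python
from itertools import product

# Toy principles-and-parameters learner (all details invented for
# illustration). The innate endowment is the finite parameter space;
# learning is reduced to choosing one setting that fits the text.

def grammar(head_first, null_subject):
    """Map two binary parameters onto a toy grammar: a predicate over
    simple word sequences built from 'subj', 'verb', and 'obj'."""
    def accepts(sentence):
        words = sentence.split()
        if not null_subject and words[0] != "subj":
            return False                  # an overt subject is required
        core = words[1:] if words[0] == "subj" else words
        return core == (["verb", "obj"] if head_first else ["obj", "verb"])
    return accepts

def set_parameters(text):
    """Search the finite parameter space for a setting whose grammar
    accepts every expression in the text."""
    for head_first, null_subject in product([True, False], repeat=2):
        if all(grammar(head_first, null_subject)(s) for s in text):
            return {"head_first": head_first, "null_subject": null_subject}
    return None                           # no setting fits this text

print(set_parameters(["subj verb obj", "verb obj"]))
# → {'head_first': True, 'null_subject': True}
```

Because the space of candidate grammars is finite and innately given, a handful of simple expressions suffices to fix the setting, in contrast with the unbounded search that generates Gold's paradox.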

Rationalist philosophers assumed that some human knowledge must be innate. Empiricist philosophers reacted against this view, holding that experience is the only source of knowledge. For the empiricists, the mind was a tabula rasa, waiting to be written upon by the world. Classical cognitive scientists are comfortable with the notion of innate knowledge, and have used problems of underdetermination to argue against the modern tabula rasa assumed by connectionist cognitive scientists (Pinker, 2002, p. 78): “The connectionists, of course, do not believe in a blank slate, but they do believe in the closest mechanistic equivalent, a general-purpose learning device.” The role of innateness is an issue that separates classical cognitive science from connectionism, and will be encountered again when connectionism is explored in Chapter 4.