As new problems are encountered in a scientific discipline, one approach to dealing with them is to explore alternative paradigms (Kuhn, 1970). One consequence of adopting this approach is to produce a clash of cultures, as the new paradigms compete against the old.
The social structure of science is such that individual scientists will justify the claims for a new approach by emphasizing the flaws of the old, as well as the virtues and goodness of the new. Similarly, other scientists will justify the continuation of the traditional method by minimizing its current difficulties and by discounting the powers or even the novelty of the new. (Norman, 1993, p. 3)
In cognitive science, one example of this clash of cultures is illustrated in the rise of connectionism. Prior to the discovery of learning rules for multilayered networks, there was a growing dissatisfaction with the progress of the classical approach (Dreyfus, 1972). When trained multilayered networks appeared in the literature, there was an explosion of interest in connectionism, and its merits—and the potential for solving the problems of classical cognitive science—were described in widely cited publications (McClelland & Rumelhart, 1986, 1988; Rumelhart & McClelland, 1986c; Schneider, 1987; Smolensky, 1988). In response, defenders of classical cognitive science argued against the novelty and computational power of the new connectionist models (Fodor & McLaughlin, 1990; Fodor & Pylyshyn, 1988; Minsky & Papert, 1988; Pinker & Prince, 1988).
A similar clash of cultures, concerning the debate that arose as part of embodied cognitive science’s reaction to the classical tradition, is explored in more detail in this section. One context for this clash is provided by the research of eminent AI researcher Terry Winograd. Winograd’s PhD dissertation involved programming a computer to understand natural language; the result was the SHRDLU system, which operated in a restricted blocks world (Winograd, 1972a, 1972b). SHRDLU would begin with a representation of differently shaped and coloured blocks arranged in a scene. A user would type in a natural language command to which the program would respond, either by answering a query about the scene or by performing an action that changed the scene. For instance, if instructed “Pick up a big red block,” SHRDLU would comprehend this instruction, execute it, and respond with “OK.” If then told “Find a block that is taller than the one you are holding and put it in the box,” SHRDLU had to comprehend the words one and it; it would respond “By it I assume you mean the block which is taller than the one I am holding.”
Winograd’s (1972a) program was a prototypical classical system (Harnish, 2002). It parsed input strings into grammatical representations, and then it took advantage of the constraints of the specialized blocks world to map these grammatical structures onto a semantic interpretation of the scene. SHRDLU showed “that if the database was narrow enough the program could be made deep enough to display human-like interactions” (p. 121).
Winograd’s later research on language continued within the classical tradition. He wrote what served as a bible to those interested in programming computers to understand language, Language As a Cognitive Process, Volume 1: Syntax (Winograd, 1983). This book introduced and reviewed theories of language and syntax, and described how those theories had been incorporated into working computer programs. As the title suggests, a second volume on semantics was planned by Winograd. However, this second volume never appeared.
Instead, Winograd’s next groundbreaking book, Understanding Computers and Cognition, was one of the pioneering works in embodied cognitive science and launched a reaction against the classical approach (Winograd & Flores, 1987b). This book explained why Winograd did not continue with a text on the classical approach to semantics: he had arrived at the opinion that a classical account of language understanding would never be achieved. “Our position, in accord with the preceding chapters, is that computers cannot understand language” (p. 107).
The reason that Winograd and Flores (1987b) adopted this position was their view that computers are restricted to a rationalist notion of meaning that, in accordance with methodological solipsism (Fodor, 1980), must interpret terms independently of external situations or contexts. Winograd and Flores argued instead for an embodied, radically non-rational account of meaning: “Meaning always derives from an interpretation that is rooted in a situation” (Winograd & Flores, 1987b, p. 111). They took their philosophical inspiration from Heidegger instead of from Descartes.
Winograd and Flores’ (1987b) book was impactful and divisive. For example, the journal Artificial Intelligence published a set of four widely divergent reviews of the book (Clancey, 1987; Stefik & Bobrow, 1987; Suchman, 1987; Vellino, 1987), prefaced by an introduction noting that “when new books appear to be controversial, we try to present multiple perspectives on them.” Winograd and Flores (1987a) also published a response to the four reviews. In spite of its contentious reception, the book paved the way for research in situated cognition (Clancey, 1997), and it is one of the earliest examples of what is now well-established embodied cognitive science.
The rise of the embodied reaction is the first part of the clash of cultures in Norman’s (1993) sociology of cognitive science. A second part is the response of classical cognitive science to the embodied movement, a response that typically involves questioning the adequacy and the novelty of the new paradigm. An excellent example of this aspect of the culture clash is provided in a series of papers published in the journal Cognitive Science in 1993.
This series began with a paper entitled “Situated action: A symbolic interpretation” (Vera & Simon, 1993), which provided a detailed classical response to theories of situated action (SA) or situated cognition, approaches that belong to embodied cognitive science. This response was motivated by Vera and Simon’s (1993) observation that SA theories reject central assumptions of classical cognitive science: situated action research “denies that intelligent systems are correctly characterized as physical symbol systems, and especially denies that symbolic processing lies at the heart of intelligence” (pp. 7–8). Vera and Simon argued in favour of a very different conclusion: that situated action research is essentially classical in nature. “We find that there is no such antithesis: SA systems are symbolic systems, and some past and present symbolic systems are SA systems” (p. 8).
Vera and Simon (1993) began their argument by outlining the important characteristics of the two positions that they aimed to integrate. Their view of classical cognitive science is best exemplified by the general properties of physical symbol systems (Newell, 1980) that were discussed in Chapter 3, with prototypical examples being early varieties of production systems (Anderson, 1983; Newell, 1973, 1990; Newell & Simon, 1972).
Vera and Simon (1993) noted three key characteristics of physical symbol systems: perceptual processes are used to establish the presence of various symbols or symbolic structures in memory; reasoning processes are used to manipulate internal symbol strings; and finally, the resulting symbol structures control motor actions on the external world. In other words, sense-think-act processing was explicitly articulated. “Sequences of actions can be executed with constant interchange among (a) receipt of information about the current state of the environment (perception), (b) internal processing of information (thinking), and (c) response to the environment (motor activity)” (p. 10).
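The three-part cycle described above can be made concrete with a minimal sketch. It is only an illustration under stated assumptions: the class `SymbolSystem`, the function `run_cycle`, and the traffic-light environment are hypothetical names invented here, not anything proposed by Vera and Simon, and the "reasoning" is reduced to a single symbol lookup.

```python
from dataclasses import dataclass, field


@dataclass
class SymbolSystem:
    """Toy sense-think-act agent (hypothetical, for illustration only)."""
    memory: set = field(default_factory=set)

    def perceive(self, environment: dict) -> None:
        # (a) Perception: establish symbol tokens in memory that
        # stand for the current state of the environment.
        self.memory = {f"{key}={value}" for key, value in environment.items()}

    def think(self) -> str:
        # (b) Thinking: operate only on the internal symbols,
        # never on the environment itself.
        return "stop" if "light=red" in self.memory else "go"

    def act(self, action: str, environment: dict) -> None:
        # (c) Motor activity: the resulting symbol controls an
        # action that changes the external world.
        environment["moving"] = (action == "go")


def run_cycle(agent: SymbolSystem, environment: dict) -> str:
    """Run one pass of perception, thinking, and motor activity."""
    agent.perceive(environment)
    action = agent.think()
    agent.act(action, environment)
    return action


world = {"light": "red", "moving": True}
print(run_cycle(SymbolSystem(), world), world["moving"])
```

The point of the sketch is the ordering: the action is selected from symbols held in memory, and the environment is touched only at the first and last steps.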
Critical to Vera and Simon’s (1993) attempt to cast situated action in a classical context was their notion of “symbol.” First, symbols were taken to be some sort of pattern, so that pattern recognition processes could assert that some pattern is a token of a particular symbolic type (i.e., symbol recognition). Second, such patterns were defined as true symbols when,
they can designate or denote. An information system can take a symbol token as input and use it to gain access to a referenced object in order to affect it or be affected by it in some way. Symbols may designate other symbols, but they may also designate patterns of sensory stimuli, and they may designate motor actions. (Vera & Simon, 1993, p. 9)
Vera and Simon (1993) noted that situated action or embodied theories are highly variable and therefore difficult to characterize. As a result, they provided a very general account of the core properties of such theories by focusing on a small number of them, including that of Winograd and Flores (1987b). Vera and Simon observed that situated action theories require accounts of behaviour to consider situations or contexts, particularly those involving an agent’s environment. Agents must be able to adapt to ill-posed (i.e., difficult to formalize) situations, and do so via direct and continuously changing interactions with the environment.
Vera and Simon (1993) went on to emphasize six main claims that in their view characterized most of the situated action literature:
- situated action requires no internal representations
- it operates directly with the environment (sense-act rather than sense-think-act)
- it involves direct access to affordances
- it does not use productions
- it exploits a socially defined, not physically defined, environment
- it makes no use of symbols.
With this position, Vera and Simon were situated to critique the claim that the embodied approach is qualitatively different from classical cognitive science. They did so either by arguing against the import of some embodied arguments or by arguing, in essence, for the formal equivalence of classical and SA theories. Both of these approaches are in accord with Norman’s (1993) portrayal of a culture clash.
As an example of the first strategy, consider Vera and Simon’s (1993) treatment of the notion of readiness-to-hand. This idea is related to Heidegger’s (1962) concept of Dasein, or being-in-the-world, which is an agent’s sense of being engaged with its world. Part of this engagement involves using “entities,” which Heidegger called equipment, and which are experienced in terms of what cognitive scientists would describe as affordances or potential actions (Gibson, 1979). “Equipment is essentially ‘something-in-order-to’” (Heidegger, 1962, p. 97).
Heidegger’s (1962) position was that when agents experience the affordances of equipment, other properties—such as the physical nature of equipment—disappear. This is readiness-to-hand. “That with which our everyday dealings proximally dwell is not the tools themselves. On the contrary, that with which we concern ourselves primarily is the work” (p. 99). Another example of readiness-to-hand is the blind person’s cane, which is not experienced as such when it is being used to navigate, but is instead experienced as an extension of the person themselves (Bateson, 1972, p. 465): “The stick is a pathway along which transforms of difference are being transmitted.”
Heidegger’s philosophy played a dominant role in the embodied theory proposed by Winograd and Flores (1987b). They took readiness-to-hand as evidence of direct engagement with the world; we only become aware of equipment itself when the structural coupling between world, equipment, and agent breaks down. Winograd and Flores took the goal of designing equipment, such as human-computer interfaces, to be creating artifacts that are invisible to us when they are used. “A successful word processing device lets a person operate on the words and paragraphs displayed on the screen, without being aware of formulating and giving commands” (Winograd & Flores, 1987b, p. 164). The invisibility of artifacts—the readiness-to-hand of equipment—is frequently characterized as being evidence of good design (Dourish, 2001; Norman, 1998, 2002, 2004).
Importantly, readiness-to-hand was also used by Winograd and Flores (1987b) as evidence for rejecting the need for classical representations, and to counter the claim that tool use is mediated by symbolic thinking or planning (Miller, Galanter, & Pribram, 1960). From the classical perspective, it might be expected that an agent is consciously aware of his or her plans; the absence of such awareness, or readiness-to-hand, must therefore indicate the absence of planning. Thus readiness-to-hand reflects direct, non-symbolic links between sensing and acting.
If we focus on concernful activity instead of on detached contemplation, the status of this representation is called into question. In driving a nail with a hammer (as opposed to thinking about a hammer), I need not make use of any explicit representation of the hammer. (Winograd & Flores, 1987b, p. 33)
Vera and Simon (1993, p. 19) correctly noted, though, that our conscious awareness of entities is mute with respect to either the nature or the existence of representational formats: “Awareness has nothing to do with whether something is represented symbolically, or in some other way, or not at all.” That is, consciousness of contents is not a defining feature of physical symbol systems. This position is a deft dismissal of using readiness-to-hand to support an anti-representational position.
After dealing with the implications of readiness-to-hand, Vera and Simon (1993) considered alternative formulations of the critiques raised by situated action researchers. Perhaps the prime concern of embodied cognitive science is that the classical approach emphasizes internal, symbolic processing to the near total exclusion of sensing and acting. We saw in Chapter 3 that production system pioneers admitted that their earlier efforts ignored sensing and acting (Newell, 1990). (We also saw an attempt to rectify this in more recent production system architectures [Meyer et al., 2001; Meyer & Kieras, 1997a, 1997b].)
Vera and Simon (1993) pointed out that the classical tradition has never disagreed with the claim that theories of cognition cannot succeed by merely providing accounts of internal processing. Action and environment are key elements of pioneering classical accounts (Miller, Galanter, & Pribram, 1960; Simon, 1969). Vera and Simon stressed this by quoting the implications of Simon’s (1969) own parable of the ant:
The proper study of mankind has been said to be man. But . . . man—or at least the intellective component of man—may be relatively simple; . . . most of the complexity of his behavior may be drawn from his environment, from his search for good designs. (Simon, 1969, p. 83)
Modern critics of the embodied notion of the extended mind (Adams & Aizawa, 2008) continue to echo this response: “The orthodox view in cognitive science maintains that minds do interact with their bodies and their environments” (pp. 1–2).
Vera and Simon (1993) emphasized the interactive nature of classical models by briefly discussing various production systems designed to interact with the world. These included the Phoenix project, a system that simulates the fighting of forest fires in Yellowstone National Park (Cohen et al., 1989), as well as the Navlab system for navigating an autonomous robotic vehicle (Pomerleau, 1991; Thorpe, 1990). Vera and Simon also described a production system for solving the Towers of Hanoi problem, but it was highly scaffolded. That is, its memory for intermediate states of the problem was in the external towers and discs themselves; the production system had neither an internal representation of the problem nor a goal stack to plan its solution. Instead, it solved the problem perceptually, with its productions driven by the changing appearance of the problem over time.
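The idea of letting the towers and discs themselves serve as memory can be sketched as follows. This is not a reconstruction of the production system that Vera and Simon described; it is an assumed, simplified stand-in in which a hypothetical function `next_move` selects each move by inspecting only the current arrangement of discs, and nothing is carried over from one move to the next. The names `next_move`, `solve`, and the peg labels are invented for this illustration.

```python
def next_move(pegs, target):
    """Pick a move from the current appearance of the puzzle alone.

    `pegs` maps a peg name to its discs, listed bottom to top, with
    larger numbers standing for larger discs. Returns a tuple
    (disc, source, destination), or None if the puzzle is solved.
    """
    location = {disc: name for name, discs in pegs.items() for disc in discs}
    move = None
    wanted = target
    for disc in sorted(location, reverse=True):  # largest disc first
        here = location[disc]
        if here != wanted:
            move = (disc, here, wanted)
            # Smaller discs have to be out of the way on the third peg.
            wanted = next(name for name in pegs if name not in (here, wanted))
    return move


def solve(pegs, target):
    """Repeatedly look at the pegs, make one move, and look again."""
    moves = []
    while (move := next_move(pegs, target)) is not None:
        _, source, destination = move
        pegs[destination].append(pegs[source].pop())
        moves.append(move)
    return moves


towers = {"A": [3, 2, 1], "B": [], "C": []}
print(len(solve(towers, "C")), towers)
```

No record of earlier moves and no goal stack persists between calls to `next_move`; the only lasting state is the external `pegs` structure. The function does work out temporary sub-destinations within a single inspection, so it is a looser illustration than a strictly perceptual production system would be.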
The above examples were used to argue that at least some production systems are situated action models. Vera and Simon (1993) completed their case with the parallel argument that some notable situated action theories are symbolic because they are instances of production systems. One embodied theory that received this treatment was Rodney Brooks’ behaviour-based robotics (Brooks, 1989, 1991, 1999, 2002), which was introduced in Chapter 5. To the extent that they agreed that Brooks’ robots do not employ representations, Vera and Simon suggested that this limits their capabilities. “It is consequently unclear whether Brooks and his Creatures are on the right track towards fully autonomous systems that can function in a wider variety of environments” (Vera & Simon, 1993, p. 35).
However, Vera and Simon (1993) went on to suggest that even systems such as Brooks’ robots could be cast in a symbolic mold. If a system has a state that is in some way indexed to a property or entity in the world, then that state should be properly called a symbol. As a result, a basic sense-act relationship that was part of the most simplistic subsumption architecture would be an example of a production for Vera and Simon.
Furthermore, Vera and Simon (1993) argued that even if a basic sense-act relationship is wired in, and therefore there is no need to view it as symbolized, it is symbolic nonetheless:
On the condition end, the neural impulse aroused by the encoded incoming stimuli denotes the affordances that produced these stimuli, while the signals to efferent nerves denote the functions of the actions. There is every reason to regard these impulses and signals as symbols: A symbol can as readily consist of the activation of a neuron as it can of the creation of a tiny magnetic field. (Vera & Simon, 1993, p. 42)
Thus any situated action model, including even the most reflexive, anti-representational instances of such models, can be described in a neutral, symbolic language: that of a production system.
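To see what this redescription amounts to, consider a minimal sketch in which wired-in sense-act links are written as condition-action pairs. Everything here is an assumption made for illustration: the `Production` type, the `fire` function, and the sensor and action names are hypothetical, and the ordered rule list is only a crude stand-in for the layered control of a subsumption architecture.

```python
from typing import Callable, NamedTuple, Optional


class Production(NamedTuple):
    """A condition-action pair (hypothetical illustration)."""
    condition: Callable[[dict], bool]
    action: str


def fire(productions: list, sensed: dict) -> Optional[str]:
    """Return the action of the first production whose condition holds.

    There is no working memory and no planning: the sensed state maps
    straight onto an action, as in a sense-act reflex.
    """
    for production in productions:
        if production.condition(sensed):
            return production.action
    return None


# Earlier rules take priority over later ones.
reflexes = [
    Production(lambda s: s["bumper_pressed"], "reverse"),
    Production(lambda s: True, "move_forward"),
]

print(fire(reflexes, {"bumper_pressed": True}))
print(fire(reflexes, {"bumper_pressed": False}))
```

On Vera and Simon's liberal reading, the sensed state `bumper_pressed` counts as a symbol because it is indexed to a state of the world. Whether that label adds anything to the description is the very point that their critics dispute.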
The gist of Vera and Simon’s (1993) argument, then, was that there is no principled difference between classical and embodied theories, because embodied models that interact with the environment are in essence production systems. Not surprisingly, this position attracted a variety of criticisms.
For example, Cognitive Science published a number of articles in response to the original paper by Vera and Simon (Norman, 1993). One theme apparent in some of these papers was that Vera and Simon’s definition of symbol was too vague to be useful (Agre, 1993; Clancey, 1993). Agre, for instance, accused Vera and Simon not of defending a well-articulated theory, but instead of exploiting an indistinct worldview. He argued that they “routinely claim vindication through some ‘symbolic’ gloss of whatever phenomenon is under discussion. The problem is that just about anything can seem ‘symbolic’ if you look at it right” (Agre, 1993, p. 62).
One example of such vagueness was Vera and Simon’s (1993) definition of a symbol as a “designating pattern.” What do they mean by designate? Designation has occurred if “an information system can take a symbol token as input and use it to gain access to a referenced object in order to affect it or to be affected by it in some way” (Vera & Simon, 1993, p. 9). In other words, the mere establishment of a deictic or indexing relationship (Pylyshyn, 1994, 2000, 2001) between the world and some state of an agent is sufficient for Vera and Simon to deem that state “symbolic.”
This very liberal definition of symbolic leads to some very glib characterizations of certain embodied positions. Consider Vera and Simon’s (1993) treatment of affordances as defined in the ecological theory of perception (Gibson, 1979). In Gibson’s theory, affordances—opportunities for action offered by entities in the world—are perceived directly; no intervening symbols or representations are presumed. “When I assert that perception of the environment is direct, I mean that it is not mediated by retinal pictures, neural pictures, or mental pictures” (p. 147). Vera and Simon (1993, p. 20) denied direct perception: “the thing that corresponds to an affordance is a symbol stored in central memory denoting the encoding in functional terms of a complex visual display, the latter produced, in turn, by the actual physical scene that is being viewed.”
Vera and Simon (1993) adopted this representational interpretation of affordances because, by their definition, an affordance designates some worldly state of affairs and must therefore be symbolic. As a result, Vera and Simon redefined the sense-act links of direct perception as indirect sense-think-act processing. To them, affordances were symbols informed by senses, and actions were the consequence of the presence of motor representations. Similar accounts of affordances have been proposed in the more recent literature (Sahin et al., 2007).
While Vera and Simon’s (1993) use of designation to provide a liberal definition of symbol permits a representational account of anti-representational theories, it does so at the expense of neglecting core assumptions of classical models. In particular, other leading classical cognitive scientists adopt a much more stringent definition of symbol that prevents, for instance, direct perception from being viewed as a classical theory. Pylyshyn has argued that cognitive scientists must adopt a cognitive vocabulary in their theories (Pylyshyn, 1984). Such a vocabulary captures regularities by appealing to the contents of representational states, as illustrated in adopting the intentional stance (Dennett, 1987) or in employing theory-theory (Gopnik & Meltzoff, 1997; Gopnik & Wellman, 1992).
Importantly, for Pylyshyn mere designation is not sufficient to define the content of symbols, and therefore is not sufficient to support a classical or cognitive theory. As discussed in detail in Chapter 8, Pylyshyn has developed a theory of vision that requires indexing or designation as a primitive operation (Pylyshyn, 2003c, 2007). However, this theory recognizes that designation occurs without representing the features of indexed entities, and therefore does not establish cognitive content. As a result, indexing is a critical component of Pylyshyn’s theory—but it is also a component that he explicitly labels as being non-representational and non-cognitive.
Vera and Simon’s (1993) vagueness in defining the symbolic has been a central concern in other critiques of their position. It has been claimed that Vera and Simon omit one crucial characteristic in their definition of symbol system: the capability of being a universal computing device (Wells, 1996). Wells (1996) noted in one example that devices such as Brooks’ behaviour-based robots are not capable of universal computation, one of the defining properties of a physical symbol system (Newell & Simon, 1976). Wells argued that if a situated action model is not universal, then it cannot be a physical symbol system, and therefore cannot be an instance of the class of classical or symbolic theories.
The trajectory from Winograd’s (1972a) early classical research to his pioneering articulation of the embodied approach (Winograd & Flores, 1987b), and the route from Winograd and Flores’ book to Vera and Simon’s (1993) classical account of situated action and on to the various responses that this account provoked, raise a number of issues.
First, this sequence of publications nicely illustrates Norman’s (1993) description of culture clashes in cognitive science. Dissatisfied with the perceived limits of the classical approach, Winograd and Flores highlighted its flaws and detailed the potential advances of the embodied approach. In reply, Vera and Simon (1993) discounted the differences between classical and embodied theories, and even pointed out how connectionist networks could be cast in the light of production systems.
Second, the various positions described above highlight a variety of perspectives concerning the relationships between different schools of thought in cognitive science. At one extreme, all of these different schools of thought are considered to be classical in nature, because all are symbolic and all fall under a production system umbrella (Vera & Simon, 1993). At the opposite extreme, there are incompatible differences between the three approaches, and supporters of one approach argue for its adoption and for the dismissal of the others (Chemero, 2009; Fodor & Pylyshyn, 1988; Smolensky, 1988; Winograd & Flores, 1987b).
In between these poles, one can find compromise positions in which hybrid models that call upon multiple schools of thought are endorsed. These include proposals in which different kinds of theories are invoked to solve different sorts of problems, possibly at different stages of processing (Clark, 1997; Pylyshyn, 2003c). These also include proposals in which different kinds of theories are invoked simultaneously to co-operatively achieve a full account of some phenomenon (McNeill, 2005).
Third, the debate between the extreme poles appears to hinge on core definitions used to distinguish one position from another. Is situated cognition classical? As we saw earlier, this depends on the definition of symbolic, a key classical idea that has not been as clearly defined as might be expected (Searle, 1992). It is this third point that is the focus of the remainder of this chapter. What are the key concepts that are presumed to distinguish classical cognitive science from its putative competitors? When one examines these concepts in detail, do they truly distinguish between positions? Or do they instead reveal potential compatibilities between the different approaches to cognitive science?