
3.12: Weak Equivalence and the Turing Test


    Two fundamental claims follow from accepting the physical symbol system hypothesis (Newell, 1980; Newell & Simon, 1976). First, general human intelligence is the product of rule-governed symbol manipulation. Second, because physical symbol systems are universal machines, any particular physical symbol system can be configured to simulate the behaviour of another physical symbol system.

    A consequence of these fundamentals is that digital computers, which are one type of physical symbol system, can simulate another putative member of the same class, human cognition (Newell & Simon, 1961, 1972; Simon, 1969). More than fifty years ago it was predicted “that within ten years most theories in psychology will take the form of computer programs, or of qualitative statements about the characteristics of computer programs” (Simon & Newell, 1958, pp. 7–8). One possible measure of cognitive science’s success is that a leading critic of artificial intelligence has conceded that this particular prediction has been partially fulfilled (Dreyfus, 1992).

    There are a number of advantages to using computer simulations to study cognition (Dawson, 2004; Lewandowsky, 1993). The difficulties in converting a theory into a working simulation can identify assumptions that the theory hides. The formal nature of a computer program provides new tools for studying simulated concepts (e.g., proofs of convergence). Programming a theory forces a researcher to provide rigorous definitions of the theory’s components. “Programming is, again like any form of writing, more often than not experimental. One programs, just as one writes, not because one understands, but in order to come to understand.” (Weizenbaum, 1976, p. 108).

    However, computer simulation research provides great challenges as well. Chief among these is validating the model, particularly because one universal machine can simulate any other. A common criticism of simulation research is that it is possible to model anything, because modelling is unconstrained:

    Just as we may wonder how much the characters in a novel are drawn from real life and how much is artifice, we might ask the same of a model: How much is based on observation and measurement of accessible phenomena, how much is based on informed judgment, and how much is convenience? (Oreskes, Shrader-Frechette, & Belitz, 1994, p. 644)

    Because of similar concerns, mathematical psychologists have argued that computer simulations are impossible to validate in the same way as mathematical models of behaviour (Estes, 1975; Luce, 1989, 1999). Evolutionary biologist John Maynard Smith called simulation research “fact free science” (Mackenzie, 2002).

    Computer simulation researchers are generally puzzled by such criticisms, because their simulations of cognitive phenomena must conform to a variety of challenging constraints (Newell, 1980, 1990; Pylyshyn, 1984). For instance, Newell’s (1980, 1990) production system models aim to meet a number of constraints that range from behavioural (flexible responses to environment, goal-oriented, operate in real time) to biological (realizable as a neural system, develop via embryological growth processes, arise through evolution).

    In validating a computer simulation, classical cognitive science becomes an intrinsically comparative discipline. Model validation requires theoretical analyses and empirical observations that evaluate the relationship between a simulation and the subject being simulated. In adopting the physical symbol system hypothesis, classical cognitive scientists are further committed to the assumption that this relationship is complex, because it can be established (as argued in Chapter 2) at many different levels (Dawson, 1998; Marr, 1982; Pylyshyn, 1984). Pylyshyn has argued that model validation can take advantage of this and proceed by imposing severe empirical constraints. These empirical constraints involve establishing that a model provides an appropriate account of its subject at the computational, algorithmic, and architectural levels of analysis. Let us examine this position in more detail.

    First, consider a relationship between model and subject that is not listed above: a relationship at the implementational level of analysis. Classical cognitive science’s use of computer simulation methodology reflects a tacit assumption that the physical structure of its models does not need to match the physical structure of the subject being modelled.

    The basis for this assumption is the multiple realization argument that we have already encountered. Cognitive scientists describe basic information processes in terms of their functional nature and ignore their underlying physicality, because the same function can be realized in radically different physical media. For instance, AND-gates can be created using hydraulic channels, electronic components, or neural circuits (Hillis, 1998). If hardware or technology were relevant, that is, if the multiple realization argument were false, then computer simulations of cognition would be absurd. Classical cognitive science therefore ignores the physical when models are validated. Let us now turn to the relationships between models and subjects that classical cognitive science cannot and does not ignore.
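    To make the multiple realization point concrete, here is a minimal Python sketch (the function names are hypothetical, chosen only for illustration) in which one and the same logical function, AND, is realized in three internally different ways; a functional description of the device is indifferent to which realization is actually used.

```python
# A minimal sketch of multiple realization: one logical function (AND),
# three different "physical" realizations. Each implementation differs
# internally, yet all compute the same input-output mapping.

def and_via_min(a: bool, b: bool) -> bool:
    # Realization 1: AND as the minimum of two truth values.
    return bool(min(a, b))

def and_via_arithmetic(a: bool, b: bool) -> bool:
    # Realization 2: AND as multiplication of 0/1 values.
    return bool(int(a) * int(b))

def and_via_branching(a: bool, b: bool) -> bool:
    # Realization 3: AND as nested conditionals.
    if a:
        if b:
            return True
    return False

# All three realizations agree on every input, so a purely functional
# description ("this device computes AND") abstracts away from how
# each one happens to be built.
for a in (False, True):
    for b in (False, True):
        assert and_via_min(a, b) == and_via_arithmetic(a, b) == and_via_branching(a, b)
```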

    In the most abstract sense, both a model and a modelled agent can be viewed as opaque devices, black boxes whose inner workings are invisible. From this perspective, both are machines that convert inputs or stimuli into outputs or responses; their behaviour computes an input-output function (Ashby, 1956, 1960). Thus the most basic point of contact between a model and its subject is that the input-output mappings produced by one must be identical to those produced by the other. Establishing this fact is establishing a relationship between model and subject at the computational level.

    To say that a model and subject are computing the same input-output function is to say that they are weakly equivalent. It is a weak equivalence because it is established by ignoring the internal workings of both model and subject. There are an infinite number of different algorithms for computing the same input-output function (Johnson-Laird, 1983). This means that weak equivalence can be established between two different systems that use completely different algorithms.
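    Weak equivalence can be given a similarly concrete, if toy, illustration. In the Python sketch below (a sketch only, with hypothetical function names), two procedures that use completely different internal algorithms compute exactly the same input-output function, so nothing about their observable behaviour can tell them apart.

```python
# A minimal sketch of weak equivalence: two systems compute the same
# input-output function (sorting a list) with entirely different
# internal procedures, so behaviour alone cannot distinguish them.

def sort_by_insertion(items):
    # Algorithm A: insertion sort, building the result one element at a time.
    result = []
    for item in items:
        i = 0
        while i < len(result) and result[i] <= item:
            i += 1
        result.insert(i, item)
    return result

def sort_by_selection(items):
    # Algorithm B: selection sort, repeatedly extracting the minimum.
    remaining = list(items)
    result = []
    while remaining:
        smallest = min(remaining)
        remaining.remove(smallest)
        result.append(smallest)
    return result

# Weak (behavioural) equivalence: identical outputs for the same inputs,
# even though the two procedures take entirely different internal steps.
for test in ([3, 1, 2], [], [5, 5, 1], [9, -2, 0, 7]):
    assert sort_by_insertion(test) == sort_by_selection(test) == sorted(test)
```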

    Weak equivalence is not concerned with the possibility that two systems can produce the right behaviours but do so for the wrong reasons. Weak equivalence is also sometimes known as Turing equivalence. This is because weak equivalence is at the heart of a criterion proposed by computer pioneer Alan Turing, to determine whether a computer program had achieved intelligence (Turing, 1950). This criterion is called the Turing test.

    Turing (1950) believed that a device’s ability to participate in a meaningful conversation was the strongest test of its general intelligence. His test involved a human judge conducting, via teletype, a conversation with an agent. In one instance, the agent was another human. In another, the agent was a computer program. Turing argued that if the judge could not correctly determine which agent was human, then the computer program must be deemed to be intelligent. Descartes (2006) subscribed to a similar logic. Turing and Descartes both believed in the power of language to reveal intelligence; however, Turing believed that machines could attain linguistic power, while Descartes did not.

    A famous example of the application of the Turing test is provided by a model of paranoid schizophrenia, PARRY (Colby et al., 1972). This program interacted with a user by carrying on a conversation; it was a natural language communication program much like the earlier ELIZA program (Weizenbaum, 1966). However, in addition to processing the structure of input sentences, PARRY also computed variables related to paranoia: fear, anger, and mistrust. PARRY’s responses were thus affected not only by the user’s input, but also by its evolving affective states. PARRY’s contributions to a conversation became more paranoid as the interaction was extended over time.
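    The sketch below is not Colby’s actual program; it is an illustrative Python toy, with invented trigger words, increments, and thresholds, of the general idea that affective variables such as fear, anger, and mistrust can accumulate over a conversation and bias which reply a program produces.

```python
# An illustrative toy (not Colby's actual PARRY implementation): affective
# variables -- fear, anger, mistrust -- rise in response to triggering
# input and bias which canned reply is produced.

import re

# Hypothetical trigger words; the real model's conditions were far richer.
THREATENING = {"police", "hospital", "crazy", "mafia"}
PROBING = {"why", "afraid", "suspicious"}

class ParanoidState:
    def __init__(self):
        self.fear = 0.1
        self.anger = 0.1
        self.mistrust = 0.2

    def update(self, user_input: str) -> None:
        # Affect accumulates and never resets, so replies grow more
        # paranoid as the conversation is extended.
        words = set(re.findall(r"[a-z]+", user_input.lower()))
        if words & THREATENING:
            self.fear = min(1.0, self.fear + 0.3)
            self.anger = min(1.0, self.anger + 0.2)
        if words & PROBING:
            self.mistrust = min(1.0, self.mistrust + 0.2)

    def respond(self) -> str:
        # The reply is chosen from the affective state, not from any
        # understanding of what the input means.
        if self.fear > 0.3:
            return "You are one of them. I have nothing more to say."
        if self.mistrust > 0.3:
            return "Why do you want to know that?"
        return "I went to the races a while back."

state = ParanoidState()
for utterance in ["How are you today?",
                  "Why are you so afraid of people?",
                  "Have you talked to the police?"]:
    state.update(utterance)
    print(state.respond())
```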

    A version of the Turing test was used to evaluate PARRY’s performance (Colby et al., 1972). Psychiatrists used teletypes to interview PARRY as well as human paranoids. Forty practising psychiatrists read transcripts of these interviews in order to distinguish the human paranoids from the simulated ones. They were only able to do this at chance levels. PARRY had passed the Turing test: “We can conclude that psychiatrists using teletyped data do not distinguish real patients from our simulation of a paranoid patient” (p. 220).
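    What performance “at chance levels” means can be made explicit with a simple binomial test against 50 percent guessing. The counts in the sketch below are hypothetical, not the figures reported by Colby et al. (1972); the point is only the form of the comparison.

```python
# An exact two-sided binomial test of judges' accuracy against chance.
# The counts are hypothetical, used only to illustrate the test's form.

from math import comb

def two_sided_binomial_p(successes: int, trials: int, chance: float = 0.5) -> float:
    """Sum the probabilities of all outcomes no more likely than the
    observed one under the chance (guessing) hypothesis."""
    def prob(k: int) -> float:
        return comb(trials, k) * chance**k * (1 - chance)**(trials - k)
    observed = prob(successes)
    return min(1.0, sum(prob(k) for k in range(trials + 1)
                        if prob(k) <= observed + 1e-12))

# Hypothetical outcome: 22 of 40 judges correctly identify the human transcript.
p_value = two_sided_binomial_p(successes=22, trials=40)
print(f"p = {p_value:.2f}")  # well above 0.05: indistinguishable from guessing
```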

    The problem with the Turing test, though, is that in some respects it is too easy to pass. This was one of the points made by the pioneering conversational program ELIZA (Weizenbaum, 1966), which was developed to engage in natural language conversations. Its most famous version, DOCTOR, modelled the conversational style of an interview with a humanistic psychotherapist. ELIZA’s conversations were extremely compelling. “ELIZA created the most remarkable illusion of having understood the minds of the many people who conversed with it” (Weizenbaum, 1976, p. 189). Weizenbaum was intrigued by the fact that “some subjects have been very hard to convince that ELIZA is not human. This is a striking form of Turing’s test” (Weizenbaum, 1966, p. 42).

    However, ELIZA’s conversations were not the product of natural language understanding. It merely parsed incoming sentences, and then put fragments of these sentences into templates that were output as responses. Templates were ranked on the basis of keywords that ELIZA was programmed to seek during a conversation; this permitted ELIZA to generate responses rated as being highly appropriate. “A large part of whatever elegance may be credited to ELIZA lies in the fact that ELIZA maintains the illusion of understanding with so little machinery” (Weizenbaum, 1966, p. 43).
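    A minimal sketch of this kind of keyword-and-template processing, with invented rules rather than Weizenbaum’s actual keyword script, shows how little machinery is required to produce seemingly appropriate replies.

```python
# A minimal sketch of ELIZA-style processing (not Weizenbaum's actual
# program): scan the input for ranked keywords, then slot a fragment of
# the user's own sentence into a canned response template.

import re

# Hypothetical (keyword, rank, pattern, template) rules; higher rank wins.
RULES = [
    ("mother",  3, r"my mother (.*)", "Tell me more about your mother, who {0}."),
    ("i am",    2, r"i am (.*)",      "How long have you been {0}?"),
    ("because", 1, r"because (.*)",   "Is that the real reason, that {0}?"),
]
DEFAULT = "Please go on."

def eliza_reply(user_input: str) -> str:
    text = user_input.lower().strip(".!? ")
    best = None
    for keyword, rank, pattern, template in RULES:
        if keyword in text and (best is None or rank > best[0]):
            match = re.search(pattern, text)
            if match:
                best = (rank, template.format(match.group(1)))
    return best[1] if best else DEFAULT

print(eliza_reply("I am unhappy because my mother ignores me."))
# -> "Tell me more about your mother, who ignores me."
print(eliza_reply("It is raining."))
# -> "Please go on."
```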

    Indeed, much of the apparent intelligence of ELIZA is a contribution of the human participant in the conversation, who assumes that ELIZA understands its inputs and that even strange comments made by ELIZA are made for an intelligent reason.

    The ‘sense’ and the continuity the person conversing with ELIZA perceives is supplied largely by the person himself. He assigns meanings and interpretations to what ELIZA ‘says’ that confirm his initial hypothesis that the system does understand, just as he might do with what a fortune-teller says to him. (Weizenbaum, 1976, p. 190)

    Weizenbaum believed that natural language understanding was beyond the capability of computers, and he held that ELIZA illustrated this point. However, ELIZA was received in a fashion that Weizenbaum did not anticipate, and one that was opposite to his intent. He was so dismayed that he wrote a book serving as a scathing critique of artificial intelligence research (Weizenbaum, 1976, p. 2): “My own shock was administered not by any important political figure in establishing his philosophy of science, but by some people who insisted on misinterpreting a piece of work I had done.”

    The ease with which ELIZA was misinterpreted—that is, the ease with which it passed a striking form of Turing’s test—caused Weizenbaum (1976) to question most research on the computer simulation of intelligence. Much of Weizenbaum’s concern was rooted in AI’s adoption of Turing’s (1950) test as a measure of intelligence.

    An entirely too simplistic notion of intelligence has dominated both popular and scientific thought, and this notion is, in part, responsible for permitting artificial intelligence’s perverse grand fantasy to grow. (Weizenbaum, 1976, p. 203)

    However, perhaps a more reasoned response would be to adopt stricter means of evaluating cognitive simulations. While the Turing test has been enormously influential for more than fifty years, researchers are aware of its limitations and have proposed a number of ways to make it more sensitive (French, 2000).

    For instance, the Total Turing Test (French, 2000) removes the teletype and requires that a simulation of cognition be not only conversationally indistinguishable from a human, but also physically indistinguishable. Only a humanoid robot could pass such a test, and it could do so only by both speaking and behaving (in very great detail) in ways indistinguishable from a human. A fictional version of the Total Turing Test is the Voight-Kampff scale described in Dick’s (1968) novel Do Androids Dream of Electric Sheep? This scale used behavioural measures of empathy, including pupil dilation, to distinguish humans from androids.


    This page titled 3.12: Weak Equivalence and the Turing Test is shared under a CC BY-NC-ND license and was authored, remixed, and/or curated by Michael R. W. Dawson (Athabasca University Press).
