The embodied approach has long recognized that an agent’s environment is much more than a static array of stimuli (Gibson, 1979; Neisser, 1976; Scribner & Tobach, 1997; Vygotsky, 1986). “The richest and most elaborate affordances of the environment are provided by other animals and, for us, other people” (Gibson, 1979, p. 135). A social environment is a rich source of complexity and ranges from dynamic interactions with other agents to cognitive scaffolding provided by cultural conventions. “All higher mental processes are primarily social phenomena, made possible by cognitive tools and characteristic situations that have evolved in the course of history” (Neisser, 1976, p. 134).
In the most basic sense of social, multiple agents in a shared world produce a particularly complex source of feedback between each other’s actions. “What the other animal affords the observer is not only behavior but also social interaction. As one moves so does the other, the one sequence of action being suited to the other in a kind of behavioral loop” (Gibson, 1979, p. 42).
Grey Walter (1963) explored such behavioral loops when he placed two Tortoises in the same room. Mounted lights provided particularly complex stimuli in this case, because robot movements would change the position of the two lights, which in turn altered subsequent robot behaviors. In describing a photographic record of one such interaction, Grey Walter called the social dynamics of his machines,
the formation of a cooperative and a competitive society.... When the two creatures are released at the same time in the dark, each is attracted by the other’s headlight but each in being attracted extinguishes the source of attraction to the other. The result is a stately circulating movement of minuet-like character; whenever the creatures touch they become obstacles and withdraw but are attracted again in rhythmic fashion. (Holland, 2003a, p. 2104)
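This attract-and-lose-the-stimulus loop can be sketched as a toy simulation. The agent state, update rule, and all parameters below are illustrative assumptions, not a model of the Tortoises’ actual circuitry; the point is only that mutual attraction to a stimulus that homing itself extinguishes yields the rhythmic approach-and-withdraw dynamics Grey Walter describes.

```python
import math

def step(a, b, speed=0.1, contact=0.3):
    """One synchronous update of two toy "Tortoises."
    Assumed rules: if the other's headlight is on, home toward it,
    which extinguishes the homing agent's own lamp; on contact,
    withdraw; with no stimulus, wander with the lamp relit.
    Each agent is a tuple (x, y, light_on, heading)."""
    def move(me, other):
        x, y, lit, heading = me
        ox, oy, other_lit, _ = other
        dx, dy = ox - x, oy - y
        dist = math.hypot(dx, dy) or 1e-9   # avoid division by zero
        if dist < contact:                  # touched: obstacles, so withdraw
            return (x - speed * dx / dist, y - speed * dy / dist, True, heading)
        if other_lit:                       # attracted: homing darkens own lamp
            return (x + speed * dx / dist, y + speed * dy / dist, False, heading)
        heading += 0.5                      # no light to seek: wander, lamp on
        return (x + speed * math.cos(heading),
                y + speed * math.sin(heading), True, heading)
    return move(a, b), move(b, a)

# Released "in the dark" facing one another with lamps lit, the pair
# alternates between mutual approach (lamps out) and wandering
# (lamps back on): a rhythmic behavioral loop.
a = (0.0, 0.0, True, 0.0)
b = (2.0, 0.0, True, math.pi)
for _ in range(6):
    a, b = step(a, b)
```

Note that neither agent represents the other: the minuet emerges entirely from each machine reacting to a stimulus that the other’s reactions keep switching on and off.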
Similar behavioral loops have been exploited to explain the behavior of larger collections of interdependent agents, such as flocks of flying birds or schools of swimming fish (Nathan & Barbosa, 2008; Reynolds, 1987). Such an aggregate presents itself as another example of a superorganism, because the synchronized movements of flock members give “the strong impression of intentional, centralized control” (Reynolds, 1987, p. 25). However, this impression may be the result of local, stigmergic interactions in which each agent’s environment chiefly consists of the flock mates in its immediate vicinity.
In his pioneering work on simulating the flight of a flock of artificial birds, called boids, Reynolds (1987) created lifelike flocking behavior by having each independently flying boid adapt its trajectory according to three simple rules: avoid collision with nearby flock mates, match the velocity of nearby flock mates, and stay close to nearby flock mates. A related model (Couzin et al., 2005) has been successfully used to predict the movement of human crowds (Dyer et al., 2008; Dyer et al., 2009; Faria et al., 2010).
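Reynolds’ three rules can be sketched in a few lines. The weights, neighborhood radius, and two-dimensional setting below are illustrative assumptions rather than Reynolds’ original parameters; what matters is that each boid steers only by its nearby flock mates, so any flock-level order is emergent rather than centrally controlled.

```python
import math

def boid_step(boids, radius=1.0, dt=0.1,
              w_sep=1.5, w_align=0.5, w_coh=0.5):
    """One synchronous update of Reynolds' (1987) three steering rules,
    in a minimal 2-D sketch. Each boid is ((x, y), (vx, vy))."""
    new = []
    for (p, v) in boids:
        mates = [(q, u) for (q, u) in boids
                 if q != p and math.dist(p, q) < radius]
        ax = ay = 0.0
        if mates:
            # 1. Separation: steer away from flock mates that are too close.
            for (q, _) in mates:
                d = math.dist(p, q) or 1e-9
                ax += w_sep * (p[0] - q[0]) / d**2
                ay += w_sep * (p[1] - q[1]) / d**2
            # 2. Alignment: match the average velocity of nearby mates.
            mvx = sum(u[0] for _, u in mates) / len(mates)
            mvy = sum(u[1] for _, u in mates) / len(mates)
            ax += w_align * (mvx - v[0])
            ay += w_align * (mvy - v[1])
            # 3. Cohesion: steer toward the center of nearby mates.
            cx = sum(q[0] for q, _ in mates) / len(mates)
            cy = sum(q[1] for q, _ in mates) / len(mates)
            ax += w_coh * (cx - p[0])
            ay += w_coh * (cy - p[1])
        nv = (v[0] + dt * ax, v[1] + dt * ay)
        new.append(((p[0] + dt * nv[0], p[1] + dt * nv[1]), nv))
    return new
```

A boid with no neighbors within the radius simply flies straight; when neighbors are present, the three accelerations pull headings and positions into the coordinated motion that, from outside, looks centrally directed.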
However, many human social interactions are likely more involved than the simple behavioral loops that defined the social interactions amongst Grey Walter’s (1963) Tortoises or the flocking behavior of Reynolds’ (1987) boids. These interactions may still be behavioral loops, but they are loops that involve processing special aspects of the social environment, because the human brain appears to devote a great deal of neural circuitry to processing specific kinds of social information.
Social cognition is fundamentally involved with how we understand others (Lieberman, 2007). One key avenue to such understanding is our ability to use and interpret facial expressions (Cole, 1998; Etcoff & Magee, 1992). There is a long history of evidence that indicates that our brains have specialized circuitry for processing faces. Throughout the eighteenth and nineteenth centuries, there were many reports of patients whose brain injuries produced an inability to recognize faces but did not alter the patients’ ability to identify other visual objects. This condition was called prosopagnosia, for “face blindness,” by German neuroscientist Joachim Bodamer in a famous 1947 manuscript (Ellis & Florence, 1990). In the 1980s, recordings from single neurons in the monkey brain revealed cells that appeared to be tailored to respond to specific views of monkey faces (Perrett, Mistlin, & Chitty, 1987; Perrett, Rolls, & Caan, 1982). At that time, though, it was unclear whether analogous neurons for face processing were present in the human brain.
Modern brain imaging techniques now suggest that the human brain has an elaborate hierarchy of co-operating neural systems for processing faces and their expressions (Haxby, Hoffman, & Gobbini, 2000, 2002). Haxby, Hoffman, and Gobbini (2000, 2002) argue for the existence of multiple, bilateral brain regions involved in different face perception functions. Some of these are core systems that are responsible for processing facial invariants, such as relative positions of the eyes, nose, and mouth, which are required for recognizing faces. Others are extended systems that process dynamic aspects of faces in order to interpret, for instance, the meanings of facial expressions. These include subsystems that co-operatively account for lip reading, following gaze direction, and assigning affect to dynamic changes in expression.
Facial expressions are not the only source of social information. Gestures and actions, too, are critical social stimuli. Evidence also suggests that mirror neurons in the human brain (Gallese et al., 1996; Iacoboni, 2008; Rizzolatti & Craighero, 2004; Rizzolatti, Fogassi, & Gallese, 2006) are specialized for both the generation and interpretation of gestures and actions.
Mirror neurons were serendipitously discovered in experiments in which motor neurons in region F5 of the monkey premotor cortex were recorded while monkeys performed various reaching actions (Di Pellegrino et al., 1992). By accident, it was discovered that many of the neurons that were active when a monkey performed an action also responded when similar actions were observed being performed by another:
After the initial recording experiments, we incidentally observed that some experimenter’s actions, such as picking up the food or placing it inside the testing box, activated a relatively large proportion of F5 neurons in the absence of any overt movement of the monkey. (Di Pellegrino et al., 1992, p. 176)
The chance discovery of mirror neurons has led to an explosion of research into their behavior (Iacoboni, 2008). It has been discovered that when the neurons fire, they do so for the entire duration of the observed action, not just at its onset. They are grasp specific: some respond to actions involving precision grips, while others respond to actions involving larger objects. Some are broadly tuned, in the sense that they will be triggered when a variety of actions are observed, while others are narrowly tuned to specific actions. All seem to be tuned to object-oriented action: a mirror neuron will respond to a particular action on an object, but it will fail to respond to the identical action if no object is present.
While most of the results described above were obtained from studies of the monkey brain, there is a steadily growing literature indicating that the human brain also has a mirror system (Buccino et al., 2001; Iacoboni, 2008).
Mirror neurons are not solely concerned with hand and arm movements. For instance, some monkey mirror neurons respond to mouth movements, such as lip smacking (Ferrari et al., 2003). Similarly, the human brain has a mirror system for the act of touching (Keysers et al., 2004). Likewise, another part of the human brain, the insula, may be a mirror system for emotion (Wicker et al., 2003). For example, it generates activity when a subject experiences disgust, and also when a subject observes the facial expressions of someone else having a similar experience.
Two decades after its discovery, extensive research on the mirror neuron system has led some researchers to claim that it provides the neural substrate for social cognition and imitative learning (Gallese & Goldman, 1998; Gallese, Keysers, & Rizzolatti, 2004; Iacoboni, 2008), and that disruptions of this system may be responsible for autism (Williams et al., 2001). The growing understanding of the mirror system and advances in knowledge about the neuroscience of face perception have heralded a new interdisciplinary research program, called social cognitive neuroscience (Blakemore, Winston, & Frith, 2004; Lieberman, 2007; Ochsner & Lieberman, 2001).
It may once have seemed foolhardy to work out connections between fundamental neurophysiological mechanisms and highly complex social behavior, let alone to decide whether the mechanisms are specific to social processes. However... neuroimaging studies have provided some encouraging examples. (Blakemore, Winston, & Frith, 2004, p. 216)
The existence of social cognitive neuroscience is a consequence of humans evolving, embodied and situated, in a social environment that includes other humans and their facial expressions, gestures, and actions. The modern field of sociable robotics (Breazeal, 2002) attempts to develop humanoid robots that are also socially embodied and situated. One purpose of such robots is to provide a medium for studying human social cognition via forward engineering.
A second, applied purpose of sociable robotics is to design robots to work co-operatively with humans by taking advantage of a shared social environment. Breazeal (2002) argued that because the human brain has evolved to be expert in social interaction, “if a technology behaves in a socially competent manner, we evoke our evolved social machinery to interact with it” (p. 15). This is particularly true if a robot’s socially competent behavior is mediated by its humanoid embodiment, permitting it to gesture or to generate facial expressions. “When a robot holds our gaze, the hardwiring of evolution makes us think that the robot is interested in us. When that happens, we feel a possibility for deeper connection” (Turkle, 2011, p. 110). Sociable robotics exploits the human mechanisms that offer this deeper connection so that humans will not require expert training in interacting with sociable robots.
A third purpose of sociable robotics is to explore cognitive scaffolding, which in this literature is often called leverage, in order to extend the capabilities of robots. For instance, many of the famous platforms of sociable robotics—including Cog (Brooks et al., 1999; Scassellati, 2002), Kismet (Breazeal, 2002, 2003, 2004), Domo (Edsinger-Gonzales & Weber, 2004), and Leonardo (Breazeal, Gray, & Berlin, 2009)—are humanoid in form and are social learners—their capabilities advance through imitation and through interacting with human partners. Furthermore, the success of the robot’s contribution to the shared social environment leans heavily on the contributions of the human partner. “Edsinger thinks of it as getting Domo to do more ‘by leveraging the people.’ Domo needs the help. It understands very little about any task as a whole” (Turkle, 2011, p. 157).
The leverage exploited by a sociable robot takes advantage of behavioral loops mediated by the expressions and gestures of both robot and human partner. For example, consider the robot Kismet (Breazeal, 2002). Kismet is a sociable robotic “infant,” a dynamic, mechanized head that participates in social interactions. Kismet has auditory and visual perceptual systems that are designed to perceive social cues provided by a human “caregiver.” Kismet can also deliver such social cues by changing its facial expression, directing its gaze to a location in a shared environment, changing its posture, and vocalizing.
When Kismet is communicating with a human, it uses the interaction to fulfill internal drives or needs (Breazeal, 2002). Kismet has three drives: a social drive to be in the presence of and stimulated by people, a stimulation drive to be stimulated by the environment in general (e.g., by colorful toys), and a fatigue drive that causes the robot to “sleep.” Kismet sends social signals to satisfy these drives. It can manipulate its facial expression, vocalization, and posture to communicate six basic emotions: anger, disgust, fear, joy, sorrow, and surprise. These expressions work to meet the drives by manipulating the social environment in such a way that the environment changes to satisfy Kismet’s needs.
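This style of drive-based control can be given a minimal sketch, assuming a simple homeostatic dynamics: each drive drifts away from its set point when unmet, and the most pressing drive selects the emotional display that recruits the social environment to satisfy it. The class, decay rates, and drive-to-expression mapping below are hypothetical illustrations, not Breazeal’s actual architecture.

```python
class Drive:
    """One homeostatic drive (hypothetical sketch, after Breazeal's
    description of Kismet; names and dynamics are assumptions)."""
    def __init__(self, name, decay):
        self.name = name          # "social", "stimulation", or "fatigue"
        self.level = 0.0          # 0.0 is the homeostatic set point
        self.decay = decay        # how fast the drive grows when unmet

    def urgency(self):
        return abs(self.level)    # distance from the set point

    def update(self, satisfaction=0.0):
        # An unmet drive drifts from its set point each step; relevant
        # stimulation pushes it back toward homeostasis.
        self.level += self.decay - satisfaction

def select_expression(drives):
    """Advertise the most pressing drive as an emotional display that
    nudges the social environment toward satisfying that drive."""
    display = {"social": "sadness (solicit a caregiver's attention)",
               "stimulation": "interest (solicit play with a toy)",
               "fatigue": "sleep (shut out an overstimulating world)"}
    return display[max(drives, key=Drive.urgency).name]

drives = [Drive("social", 0.2), Drive("stimulation", 0.1),
          Drive("fatigue", 0.05)]
for d in drives:
    d.update()                    # time passes with no caregiver present
```

The key design point is that the robot never plans the caregiver’s behavior: it only broadcasts its neediest internal state, and the loop is closed by the human partner’s evolved social responses.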
For example, an unfulfilled social drive causes Kismet to express sadness, which initiates social responses from a caregiver. When Kismet perceives the caregiver’s face, it wiggles its ears in greeting, and initiates a playful dialog to engage the caregiver. Kismet will eventually habituate to these interactions and then seek to fulfill a stimulation drive by coaxing the caregiver to present a colorful toy. However, if this presentation is too stimulating—if the toy is presented too closely or moved too quickly—the fatigue drive will produce changes in Kismet’s behavior that attempt to decrease this stimulation. If the world does not change in the desired way, Kismet will end the interaction by “sleeping.” “But even at its worst, Kismet gives the appearance of trying to relate. At its best, Kismet appears to be in continuous, expressive conversation” (Turkle, 2011, p. 118).
Kismet’s behavior leads to lengthy, dynamic interactions that are realistically social. A young girl interacting with Kismet “becomes increasingly happy and relaxed. Watching girl and robot together, it is easy to see Kismet as increasingly happy and relaxed as well. Child and robot are a happy couple” (Turkle, 2011, p. 121). Similar results occur when adults converse with Kismet. “One moment, Rich plays at a conversation with Kismet, and the next, he is swept up in something that starts to feel real” (p. 154).
Even the designer of a humanoid robot can be “swept up” by their interactions with it. Domo (Edsinger-Gonzales & Weber, 2004) is a limbed humanoid robot that is intended to be a physical helper, performing such actions as placing objects on shelves. It learns to behave by physically interacting with a human teacher. These physical interactions give even sophisticated users—including its designer, Edsinger—a strong sense that Domo is a social creature. Edsinger finds himself vacillating back and forth between viewing Domo as a creature and viewing it as merely a device that he has designed.
For Edsinger, this sequence—experiencing Domo as having desires and then talking himself out of the idea—becomes familiar. For even though he is Domo’s programmer, the robot’s behavior has not become dull or predictable. Working together, Edsinger and Domo appear to be learning from each other. (Turkle, 2011, p. 156)
That sociable robots can generate such strong reactions within humans is potentially concerning. The feeling of the uncanny occurs when the familiar is presented in unfamiliar form (Freud, 1976). The uncanny results when standard categories used to classify the world disappear (Turkle, 2011). Turkle (2011) called one such instance, when a sociable robot is uncritically accepted as a creature, the robotic moment. Edsinger’s reactions to Domo illustrated its occurrence: “And this is where we are in the robotic moment. One of the world’s most sophisticated robot ‘users’ cannot resist the idea that pressure from a robot’s hand implies caring” (p. 160).
At issue in the robotic moment is a radical recasting of the posthuman (Hayles, 1999). “The boundaries between people and things are shifting” (Turkle, 2011, p. 162). The designers of sociable robots scaffold their creations by taking advantage of the expert social abilities of humans. The robotic moment, though, implies a dramatic rethinking of what such human abilities entail. Might human social interactions be reduced to mere sense-act cycles of the sort employed in devices like Kismet? “To the objection that a robot can only seem to care or understand, it has become commonplace to get the reply that people, too, may only seem to care or understand” (p. 151).
In Hayles’ (1999) definition of posthumanism, the body is dispensable, because the essence of humanity is information. But this is an extremely classical view. An alternative, embodied posthumanism is one in which the mind is dispensed with, because what is fundamental to humanity is the body and its engagement with reality. “From its very beginnings, artificial intelligence has worked in this space between a mechanical view of people and a psychological, even spiritual, view of machines” (Turkle, 2011, p. 109). The robotic moment leads Turkle to ask “What will love be? And what will it mean to achieve ever-greater intimacy with our machines? Are we ready to see ourselves in the mirror of the machine and to see love as our performances of love?” (p. 165).