9.4: Lessons from Natural Computation

To sighted human perceivers, visual perception seems easy: we simply look and see. Perhaps this is why pioneers of computer vision took seeing for granted. One student of Marvin Minsky was assigned—as a summer project—the task of programming vision into a computer (Horgan, 1993). Only when such early projects were attempted, and had failed, did researchers realize that the visual system was effortlessly solving astronomically difficult information processing problems.

Visual perception is particularly difficult when one defines its goal as the construction of internal models of the world (Horn, 1986; Marr, 1976, 1982; Ullman, 1979). Such representations, called distal stimuli, must make explicit the three-dimensional structure of the world. However, the information from which the distal stimulus is constructed (the proximal stimulus) is not rich enough to uniquely specify 3-D structure. As discussed in Chapter 8, the poverty of proximal stimuli underdetermines visual representations of the world. A single proximal stimulus is consistent with, in principle, an infinitely large number of different world models. This underdetermination is what makes computer vision so challenging for artificial intelligence researchers: information has to be added to the proximal stimulus in order to choose the correct distal stimulus from the many that are possible.
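The underdetermination described above can be illustrated with a small sketch. It assumes a simple pinhole camera model, which is not part of the original discussion; the function name `project` and the sample coordinates are hypothetical. Every point along a single line of sight lands on the same image location, so the image alone cannot say which 3-D point produced it.

```python
def project(point, focal_length=1.0):
    """Pinhole projection of a 3-D point (x, y, z) onto a 2-D image plane.

    The pinhole model is an assumption made for this illustration.
    """
    x, y, z = point
    return (focal_length * x / z, focal_length * y / z)


# Three different world points that lie on one line of sight.
near = (1.0, 2.0, 4.0)
candidates = [tuple(k * c for c in near) for k in (1.0, 2.0, 3.0)]

# All three candidates produce the same image coordinates.
images = [project(p) for p in candidates]
print(images)
```

Because the scale factor cancels in the division by depth, any positive multiple of `near` would give the same image point, which is one concrete sense in which a proximal stimulus is consistent with infinitely many world models.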

The cognitive revolution in psychology led to one approach for dealing with this problem: the New Look in perception proposed that seeing is a form of problem solving (Bruner, 1957, 1992; Gregory, 1970, 1978; Rock, 1983). General knowledge of the world, as well as beliefs, expectations, and desires, were assumed to contribute to our visual experience of the world, providing information that was missing from proximal stimuli.

The New Look also influenced computer simulations of visual perception. Knowledge was loaded into computer programs to be used to guide the analysis of visual information. For instance, knowledge of the visual appearance of the components of particular objects, such as an air compressor, could be used to guide the segmentation of a raw image of such a device into meaningful parts (Tenenbaum & Barrow, 1977). That is, the computer program could see an air compressor by exploiting its pre-existing knowledge of what it looked like. This general approach, using pre-existing knowledge to guide visual perception, was widespread in the computer science literature of this era (Barrow & Tenenbaum, 1975). Barrow and Tenenbaum’s (1975) review of the state of the art at that time concluded that image segmentation was a low-level interpretation that was guided by knowledge, and they argued that the more knowledge the better.

Barrow and Tenenbaum’s (1975) review described a New Look within computer vision:

Higher levels of perception could involve partitioning the picture into ‘meaningful’ regions, based on models of particular objects, classes of objects, likely events in the world, likely configurations, and even on nonvisual events. Vision might be viewed as a vast, multi-level optimization problem, involving a search for the best interpretation simultaneously over all levels of knowledge. (Barrow & Tenenbaum, 1975, p. 2)

However, around the same time a very different data-driven alternative to computer vision emerged (Waltz, 1975).

Waltz’s (1975) computer vision system was designed to assign labels to regions and line segments in a scene produced by drawing lines and shadows. “These labels describe the edge geometry, the connection or lack of connection between adjacent regions, the orientation of each region in three dimensions, and the nature of the illumination for each region” (p. 21). The goal of the program was to assign one and only one label to each part of a scene that could be labelled, except in cases where a human observer would find ambiguity.

Waltz (1975) found that extensive, general knowledge of the world was not required to assign labels. Instead, all that was required was a propagation of local constraints between neighbouring labels. That is, if two to-be-labelled segments were connected by a line, then the segments had to be assigned consistent labels. Two ends of a line segment could not be labelled in such a way that one end of the line would be given one interpretation and the other end a different interpretation that was incompatible with the first. Waltz found that this approach was very powerful and could be easily applied to novel scenes, because it did not depend on specialized, scene-specific knowledge. Instead, all that was required was a method to determine what labels were possible for any scene location, followed by a method for comparisons between possible labels, in order to choose unique and compatible labels for neighbouring locations.
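A minimal sketch of this kind of constraint propagation is given below. It is not Waltz's program: the junction names, the tiny candidate sets, and the function `waltz_filter` are invented for illustration, and the only constraint enforced is that a line must carry the same label at both of its ends. The symbols "+", "-", and ">" loosely stand for convex, concave, and occluding edges.

```python
def waltz_filter(candidates, lines):
    """Remove junction labellings that no neighbouring junction can agree with.

    candidates: {junction: list of {line: label} dicts} (hypothetical encoding)
    lines:      {line: (junction_a, junction_b)}
    """
    domains = {junction: list(options) for junction, options in candidates.items()}
    changed = True
    while changed:  # keep propagating until nothing more can be removed
        changed = False
        for line, (a, b) in lines.items():
            for here, there in ((a, b), (b, a)):
                # Labels that the other end of this line still allows.
                supported = {option[line] for option in domains[there]}
                kept = [option for option in domains[here] if option[line] in supported]
                if len(kept) != len(domains[here]):
                    domains[here] = kept
                    changed = True
    return domains


# Toy scene: junctions A, B, C joined in a chain by lines "ab" and "bc".
scene_lines = {"ab": ("A", "B"), "bc": ("B", "C")}
scene_candidates = {
    "A": [{"ab": "+"}, {"ab": "-"}],
    "B": [{"ab": "+", "bc": "+"}, {"ab": "-", "bc": ">"}, {"ab": ">", "bc": "-"}],
    "C": [{"bc": "+"}],
}

result = waltz_filter(scene_candidates, scene_lines)
print(result)
```

In this toy scene, junction C permits only a convex label for line "bc". That restriction removes two of B's candidates, which in turn removes one of A's, leaving a single labelling per junction without any appeal to knowledge about what object is being viewed.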

The use of constraints to filter out incompatible labels is called relaxation labelling (Rosenfeld, Hummel, & Zucker, 1976); as constraints propagate through neighbouring locations in a representation, the representation moves into a stable, lower-energy state by removing unnecessary labels. The discussion of solving Sudoku problems in Chapter 7 illustrates an application of relaxation labelling. Relaxation labelling proved to be a viable data-driven approach to dealing with visual underdetermination.

Relaxation labelling was the leading edge of a broad perspective for understanding vision. This was the natural computation approach to vision (Hildreth, 1983; Marr, 1976, 1982; Marr & Hildreth, 1980; Marr & Nishihara, 1978; Marr, Palm, & Poggio, 1978; Marr & Poggio, 1979; Marr & Ullman, 1981; Richards, 1988; Ullman, 1979). Researchers who endorse the natural computation approach to vision use naïve realism to solve problems of underdetermination. They hypothesize that the visual world is intrinsically structured, and that some of this structure is true of any visual scene. They assume that a visual system that has evolved in such a structured world is able to take advantage of these visual properties to solve problems of underdetermination.

The properties of interest to natural computation researchers are called natural constraints. A natural constraint is a property of the visual world that is almost always true of any location in any scene. For example, a great many visual properties of three-dimensional scenes (depth, texture, color, shading, motion) vary smoothly. This means that two locations very near one another in a scene are very likely to have very similar values for any of these properties. Locations that are further apart will not be as likely to have similar values for these properties.

Natural constraints can be used to solve visual problems of underdetermination by imposing restrictions on scene interpretations. Natural constraints are properties that must be true of an interpretation of a visual scene. They can therefore be used to filter out interpretations consistent with the proximal stimulus but not consistent with the natural constraint. For example, an interpretation of a scene that violated the smoothness constraint, because its visual properties did not vary smoothly in the sense described earlier, could be automatically rejected and never experienced.
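The filtering role of a natural constraint can be sketched in a few lines. The example below is an illustration under stated assumptions: "smoothness" is measured as the sum of squared differences between neighbouring depth values, and an interpretation is rejected when that sum exceeds a threshold. Both the measure and the threshold are hypothetical choices, not taken from the natural computation literature.

```python
def roughness(values):
    """Sum of squared differences between neighbouring locations."""
    return sum((b - a) ** 2 for a, b in zip(values, values[1:]))


def filter_by_smoothness(interpretations, max_roughness):
    """Keep only the interpretations that vary smoothly enough."""
    return [v for v in interpretations if roughness(v) <= max_roughness]


# Two candidate depth profiles, both assumed consistent with the same image.
candidate_depths = [
    [2.0, 2.1, 2.2, 2.3],  # depth changes gradually between neighbours
    [2.0, 9.0, 1.0, 8.0],  # depth jumps wildly between neighbours
]

kept = filter_by_smoothness(candidate_depths, max_roughness=1.0)
print(kept)
```

Only the gradually changing profile survives the filter; the erratic one is discarded even though, by assumption, it is equally consistent with the proximal stimulus.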

The natural computation approach triumphed because it was able to identify a number of different natural constraints for solving a variety of visual problems of underdetermination (for many examples, see Marr, 1982). As in the scene labelling approach described above, the use of natural constraints did not require scene-specific knowledge. Natural computation researchers did not appeal to problem solving or inference, in contrast to the knowledge-based models of an earlier generation (Barrow & Tenenbaum, 1975; Tenenbaum & Barrow, 1977). This was because natural constraints could be exploited using data-driven algorithms, such as neural networks. For instance, one can exploit natural constraints for scene labelling by using processing units to represent potential labels and by defining natural constraints between labels using the connection weights between processors (Dawson, 1991). The dynamics of the signals sent through this network will turn on the units for labels consistent with the constraints and turn off all of the other units.
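The idea of representing labels as units and constraints as connection weights can be sketched with a very small network. This is not Dawson's (1991) model: the four units, the weight values, the bias, and the threshold update rule are all assumptions chosen to keep the example short. Two neighbouring locations, A and B, each have two rival labels, and the constraint is that neighbours must share the same label.

```python
from itertools import combinations

# Each unit stands for a (location, label) hypothesis; names are illustrative.
units = [("A", "convex"), ("A", "concave"), ("B", "convex"), ("B", "concave")]


def weight(u, v):
    """Symmetric connection weight that encodes the constraint between two units."""
    (loc_u, lab_u), (loc_v, lab_v) = u, v
    if loc_u == loc_v:
        return -1.0  # rival labels for one location inhibit each other
    return 1.0 if lab_u == lab_v else -1.0  # neighbours should agree


def net_input(state, unit, bias):
    return bias + sum(weight(unit, other) * state[other]
                      for other in units if other != unit)


def energy(state, bias):
    """Hopfield-style energy; lower values mean fewer violated constraints."""
    pair_term = sum(weight(u, v) * state[u] * state[v]
                    for u, v in combinations(units, 2))
    return -pair_term - bias * sum(state.values())


def settle(state, bias=0.5, max_sweeps=20):
    """Update units one at a time until no unit changes its activity."""
    state = dict(state)
    for _ in range(max_sweeps):
        changed = False
        for unit in units:
            new_value = 1 if net_input(state, unit, bias) > 0 else 0
            if new_value != state[unit]:
                state[unit] = new_value
                changed = True
        if not changed:
            break
    return state


# Assume the image data have already ruled out "concave" at location B.
initial = {("A", "convex"): 1, ("A", "concave"): 1,
           ("B", "convex"): 1, ("B", "concave"): 0}
final = settle(initial)
print(final, energy(initial, 0.5), energy(final, 0.5))
```

After settling, only the two "convex" units remain on, and the energy of the final state is lower than that of the initial state. The network reaches a consistent labelling purely through local signals between units.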

In the context of the current discussion of the cognitive sciences, the natural computation approach to vision offers an interesting perspective on how a useful synthesis of divergent perspectives is possible. This is because the natural computation approach appeals to elements of classical, connectionist, and embodied cognitive science. To begin with, the natural computation approach has strong classical characteristics. It views visual perception as a prototypical representational phenomenon, endorsing sense-think-act processing.

The study of vision must therefore include not only the study of how to extract from images the various aspects of the world that are useful to us, but also an inquiry into the nature of the internal representations by which we capture this information and thus make it available as a basis for decisions about our thoughts and actions. (Marr, 1982, p. 3)

Marr’s theory of early vision proposed a series of different kinds of representations of visual information, beginning with the raw primal sketch and ending with the 2½-D sketch that represented the three-dimensional locations of all visible points and surfaces.

However representational it is, though, the natural computation approach is certainly not limited to the study of what Norman (1980) called the pure cognitive system. For instance, unlike New Look theories of human perception, natural computation theories paid serious attention to the structure of the world. Indeed, natural constraints are not psychological properties, but are instead properties of the world. They are not identified by performing perceptual experiments, but are instead discovered by careful mathematical analyses of physical structures and their optical projections onto images. “The major task of Natural Computation is a formal analysis and demonstration of how unique and correct interpretations can be inferred from sensory data by exploiting lawful properties of the natural world” (Richards, 1988, p. 3). The naïve realism of the natural computation approach forced it to pay careful attention to the structure of the world.

In this sense, the natural computation approach resembles a cornerstone of embodied cognitive science, Gibson’s (1966, 1979) ecological theory of perception. Marr (1982) himself saw parallels between his natural computation approach and Gibson’s theory, but felt that natural computation addressed some flaws in ecological theory. Marr’s criticism was that Gibson rejected the need for representation, because Gibson underestimated the complexity of detecting invariants: “Visual information processing is actually very complicated, and Gibson was not the only thinker who was misled by the apparent simplicity of the act of seeing” (p. 30). In Marr’s view, detecting visual invariants required exploiting natural constraints to build representations from which invariants could be detected and used. For instance, detecting the invariants available in a key Gibsonian concept, the optic flow field, requires applying smoothness constraints to local representations of detected motion (Hildreth, 1983; Marr, 1982).

Strong parallels also exist between the natural computation approach and connectionist cognitive science, because natural computation researchers were highly motivated to develop computer simulations that were biologically plausible. That is, the ultimate goal of a natural computation theory was to provide computational, algorithmic, and implementational accounts of a visual process. The requirement that a visual algorithm be biologically implementable results in a preference for parallel, co-operative algorithms that permit local constraints to be propagated through a network. As a result, most natural computation theories can be translated into connectionist networks.

How is it possible for the natural computation approach to endorse elements of each school of thought in cognitive science? In general, this synthesis of ideas is the result of a very pragmatic view of visual processing. Natural computation researchers recognize that “pure” theories of vision will be incomplete. For instance, Marr (1982) argued that vision must be representational in nature. However, he also noted that these representations are impossible to understand without paying serious attention to the structure of the external world.

Similarly, Marr’s (1982) book, Vision, is a testament to the extent of visual interpretation that can be achieved by data-driven processing. However, data-driven processes cannot deliver a complete visual interpretation. At some point—when, for instance, the 2½-D sketch is linked to a semantic category—higher-order cognitive processing must be invoked. This openness to different kinds of processing is why a natural computation researcher such as Shimon Ullman can provide groundbreaking work on an early vision task such as computing motion correspondence matches (1979) and also be a pioneer in the study of higher-order processes of visual cognition (1984, 2000).

The search for biologically plausible algorithms is another example of the pragmatism of the natural computation approach. Classical theories of cognition have been criticized as being developed in a biological vacuum (Clark, 1989). In contrast, natural computation theories make no attempt to exclude low-level biological accounts. Instead, the neuroscience of vision is used to inform natural computation algorithms, and computational accounts of visual processing are used to provide alternative interpretations of the functions of visual neurons. For instance, it was only because of his computational analysis of the requirements of edge detection that Marr (1982) was able to propose that the centre-surround cells of the lateral geniculate nucleus were convolving images with difference-of-Gaussian filters.
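A one-dimensional sketch can show why a difference-of-Gaussian filter is useful for edge detection. The code below is an illustration only: the kernel radius, the step-edge signal, and the helper names are hypothetical, and the surround-to-centre width ratio of 1.6 is an assumption (a value commonly associated with Marr and Hildreth, 1980). The filter subtracts a broad Gaussian from a narrow one, mimicking an excitatory centre with an inhibitory surround.

```python
import math


def gaussian(x, sigma):
    return math.exp(-x * x / (2 * sigma * sigma)) / (sigma * math.sqrt(2 * math.pi))


def dog_kernel(sigma_centre, sigma_surround, radius):
    """Narrow excitatory centre minus broad inhibitory surround."""
    return [gaussian(x, sigma_centre) - gaussian(x, sigma_surround)
            for x in range(-radius, radius + 1)]


def convolve_valid(signal, kernel):
    """Convolve, keeping only positions where the kernel lies fully inside the signal."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[k - 1 - j] for j in range(k))
            for i in range(len(signal) - k + 1)]


# A dark region followed by a bright region: a single step edge.
step_edge = [0.0] * 20 + [1.0] * 20
kernel = dog_kernel(1.0, 1.6, radius=6)
response = convolve_valid(step_edge, kernel)

# An edge shows up as a sign change (zero crossing) in the filtered signal.
zero_crossings = [i for i in range(len(response) - 1)
                  if response[i] * response[i + 1] < 0]
print(zero_crossings)
```

The filtered signal is negative just on the dark side of the step and positive just on the bright side, so it changes sign once, at the location of the edge. Away from the edge the response is zero or very close to zero, which is why zero crossings serve as a compact marker of intensity changes.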

The pragmatic openness of natural computation researchers to elements of the different approaches to cognitive science contrasts markedly with the competition that appears to characterize modern cognitive science (Norman, 1993). One account of this competition might be to view it as a conflict between scientific paradigms (Kuhn, 1970). From this perspective, some antagonism between perspectives is necessary, because newer paradigms are attempting to show how they are capable of replacing the old and of solving problems beyond the grasp of the established framework. Researchers who believe that they are engaged in such an endeavour can be expected to reject, fervently and explicitly, the inclusion of any part of the old paradigm within the new.

According to Kuhn (1970), a new paradigm will not emerge unless a crisis has arisen in the old approach. Some may argue that this is exactly the case for classical cognitive science, whose crises have been identified by its critics (Dreyfus, 1972, 1992) and have led to the new connectionist and embodied paradigms. However, it is more likely premature for paradigms of cognitive science to be battling one another, because cognitive science may very well be pre-paradigmatic, in search of a unifying body of belief that has not yet been achieved.

The position outlined in Chapter 7, that it is difficult to identify a set of core tenets that distinguish classical cognitive science from the connectionist and the embodied approaches, supports this view. Such a view is also supported by the existence of approaches that draw on the different “paradigms” of cognitive science, such as the theory of seeing and visualizing (Pylyshyn, 2003c, 2007) discussed in Chapter 8, and the natural computation theory of vision. If cognitive science were not pre-paradigmatic, then it should be easy to distinguish its different paradigms, and theories that draw from different paradigms should be impossible.

If cognitive science is pre-paradigmatic, then it is in the process of identifying its core research questions, and it is still deciding upon the technical requirements that must be true of its theories. My suspicion is that a mature cognitive science will develop that draws on core elements of all three approaches that have been studied. Cognitive science is still in a position to heed the call of a broadened cognitivism (Miller, Galanter, & Pribram, 1960; Norman, 1980). In order to do so, rather than viewing its current approaches as competing paradigms, it would be better served by adopting the pragmatic approach of natural computation and exploiting the advantages offered by all three approaches to cognitive phenomena.