
6.3: Cognitive and computational approaches


    Again, the primary aim of the present chapter is to synthesize evidence across domains that hints at the penetrability of perception. Rather than permitting a hard line between cognition and perception, evidence continues to mount that perceptual experiences are shaped by temporal and spatial predictions (Rohenkohl, Gould, Pessoa, & Nobre, 2014). The location of this influence, then, is the critical point of debate; it would be uncontroversial to find merely that expectations shape judgements.

    Instead of focusing on the physical regions of the brain that may allow for a particular sequence of neuronal firing, cognitive approaches are rooted in information processing theory. Namely, exogenous stimuli provide information that is transduced and processed by the brain, much as a computer processes input. These approaches to object recognition, like models of instance theory (Hintzman, 1986), back-propagation (Rumelhart, Hinton, & Williams, 1986), and hierarchical models of associative memory (Fukushima, 1984), have illustrated the benefit of representing structures in ways that make mapping and generalizing patterns feasible. Though human object recognition and computer vision are unlikely to converge fully, combining these models with theories of human vision may yield interesting predictions and advances in the understanding of perception.

    Cognitive Approaches

    Before diving into cognitive theories of object recognition, it is first important to situate visual perception within the cognitive domain. Research within this domain focused on characteristics and patterns of visual perception (Gibson, 1950) and dominant processing styles (Navon, 1977). These research enterprises have spurred a great deal of subsequent visual research and continue to inform models today. For instance, Navon (1977) explicated how processing of global features precedes that of local features. This perspective directly maps onto other theories that consider the predictive mind and the quick processing of coarse information through M-pathway channels.

    Early cognitive theories of object recognition were grounded in information processing perspectives. One such perspective came from Biederman (1987), who posited that objects are made up of reducible units called ‘geons’. These units provide the foundation for the visual system to build up and recognize more complex objects and scenes. This view, much like early semantic models (see McClelland & Rumelhart, 1981), relied on basic feature detection as the mechanism that allows for complex combinations and processing of visual information. From this perspective, object attributes like size and location do not appear to be integral to the recognition process. For example, researchers in a priming study found that object representations were not affected by removal of attributes or alteration of left-right orientation, suggesting that identification occurs at the geon level (Biederman & Cooper, 1991; Hummel & Biederman, 1992). Further, controlled cognitive processes (e.g., semantic) cannot account for differences in priming effects; matched exemplars showed no additional recognition advantage (Biederman & Cooper, 1991). While these approaches have yielded important results, limitations remain in the organization of attributes and geons, in variations of perspective and field of vision, and in the concentration on a traditional bottom-up sequence triggered by individual stimulus properties.

    Several other factors contribute to and moderate successful object recognition. Some work has attempted to extend basic-units approaches to recognition by including subliminal priming. These approaches allow for a more specific understanding of how object representations reach the conscious level. For example, without the inclusion of semantic effects, visual subliminal priming facilitated object recognition, even in instances in which the object's location had changed (Bar & Biederman, 1998). Other work has demonstrated the effects of color on object recognition, which highlight some important features. Namely, color did not facilitate the identification and recognition of objects that were manufactured or that did not naturally occur in the presented color scheme (Humphrey, Goodale, Jakobson, & Servos, 1994). These findings underscore two important concepts: first, the cascade of processing that allows for integration of more complex information (including color) over the series, and second, the importance of expectation in the identification process. Colors that did not match their typical or predicted form did not facilitate the recognition process (Humphrey et al., 1994). Evidence in this direction supports the notion that priors and expectancies inform ongoing visual perception. Encountering a mismatch of expectation requires more effortful, controlled processing to make sense of the prediction error.

    Computational Approaches

    Computational models of object recognition include a variety of methods and purposes. Many of these recent models primarily focus on error-reduction or variance mapping as a means to achieve a specific outcome, with little care for cognitive or neurophysiological theories. However, even these models still enable interesting, fruitful tests of object perception and cognitive penetration.

    Attneave (1954) first emphasized that the primary role of visual perception is to process relevant information. From there, he claimed, it becomes clear how repetitive and interdependent the majority of our visual experience is, and how these associations allow perceptual processes to incorporate higher-order information, as it is purely economical to do so. In spite of this perspective, much of the literature still focused on ascending, feed-forward processing. For example, Marr (1982) developed a prominent theoretical approach to object recognition that emphasized computational methods, the complexity of constructing 3-D representations, and the bottom-up nature of processing stages. Additionally, Marr (1982) emphasized the influence of viewpoint on object perception. Indeed, it seems that viewpoint is an important factor for object identification but not for object categorization (for review see Milivojevic, 2012). Again, the bottom-up approach has illuminated a number of important facets of the visual perceptual process, but it fundamentally excludes how top-down processes interact. To understand how prior knowledge is utilized and integrated, descending neural pathways and the information they carry must be included.

    As mentioned above, some computational models have focused less on updating or integrating a cognitive theory of visual perception, instead favoring an outcome-related, engineering approach. Machine learning and computer algorithms have given rise to research devoted to the creation of technological advances in the categorization and decoding of objects. One study found that stimulus-representation cortical patterns can predict the contents of sleep imagery by correlating patterns of hallucinations during sleep with specific patterns of stimulus representations while awake (Horikawa, Tamaki, Miyawaki, & Kamitani, 2013). Another interesting study reconstructed faces by correlating trained faces with patterns of voxel activity (Cowen, Chun, & Kuhl, 2014). However, unlike objects, faces do not contain much variance in their general structure, which makes conservation of integral information after a principal component analysis much more straightforward. These studies highlight how computer-driven methodology can produce meaningful results, but without a theoretical foundation, how these results can inform ongoing debates on human vision is often obscured. How do these patterns of neural activity represent different features of objects or faces? How do viewpoint and an object’s position in space and time affect perception and recognition? These questions are fundamental to the understanding of complex object recognition, and necessary to the question of integration of expectation and prediction.
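    The role of principal component analysis in such reconstruction work can be sketched in a few lines. The toy example below uses randomly generated stand-in data rather than real face images or voxel measurements, and all variable names, dimensions, and noise levels are illustrative assumptions. It shows why data sharing a common low-dimensional structure can be reconstructed well from only a handful of components.

```python
import numpy as np

rng = np.random.default_rng(0)
n_faces, n_pixels, n_components = 100, 64, 5

# Simulate "faces" as combinations of a few shared structural patterns
# plus a little idiosyncratic noise (stand-in data, not real faces).
basis = rng.normal(size=(n_components, n_pixels))
weights = rng.normal(size=(n_faces, n_components))
faces = weights @ basis + 0.01 * rng.normal(size=(n_faces, n_pixels))

# PCA via SVD on the mean-centered data, keeping the top components.
centered = faces - faces.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
components = Vt[:n_components]

# Reconstruct each "face" from its low-dimensional projection alone.
reconstructed = centered @ components.T @ components + faces.mean(axis=0)
error = np.mean((faces - reconstructed) ** 2)  # small: structure is conserved
```

    Because the simulated items share most of their structure, five components recover nearly all of the variance; the same logic underlies why low structural variance across faces helps PCA-based decoding.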

    Other computational models have focused more specifically on the combination of theory and empirical evidence. Such models, like predictive coding models (Friston, 2010; Clark, 2013), utilize Bayesian theory and generative models to propose hierarchical perceptual processing that is integrated with descending connections from higher-order cortical structures. Instead of processing purely in a feed-forward manner, the human brain constantly maintains a representation of the external environment that is informed by past experience, motivations and emotions, memory, and object values. Predictive coding proposes that optimization of perception and action relies on the minimization of prediction errors through recurrent loops (Friston, 2008; Friston, 2010). This idea harkens back to neurophysiological models of the dynamic, iterative brain (Cunningham & Zelazo, 2007). Here, the predictive coding model informs computational theory both by introducing ways in which information is represented (e.g., prediction error) and by describing how those signals are integrated into ongoing processes.
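    The prediction-error-minimization idea can be given a minimal sketch. The code below assumes a single scalar estimate and invented precision and learning-rate values; none of the names or numbers come from a published model, and real predictive coding architectures are hierarchical rather than one-dimensional.

```python
def update_belief(belief, sensory_input, prior_mean,
                  sensory_precision=1.0, prior_precision=0.5,
                  learning_rate=0.1, n_iterations=50):
    """Iteratively revise an internal estimate so that it balances
    bottom-up sensory evidence against a top-down prior expectation."""
    for _ in range(n_iterations):
        # Prediction errors: mismatch with the senses and with the prior.
        sensory_error = sensory_input - belief
        prior_error = prior_mean - belief
        # Nudge the belief to reduce both errors, weighted by precision.
        belief += learning_rate * (sensory_precision * sensory_error
                                   + prior_precision * prior_error)
    return belief

# A prior expectation of 0.0 pulls the estimate away from the raw
# input (1.0) toward a precision-weighted compromise.
estimate = update_belief(belief=0.0, sensory_input=1.0, prior_mean=0.0)
```

    The recurrent loop converges on the value that minimizes the precision-weighted sum of the two prediction errors, which is the sense in which "minimization" is meant above.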

    Predictive coding reveals how the brain may economically reduce the processing power required to manage the massive amounts of incoming sensory information. This information is quickly assessed to allow appropriate responding to momentary events in the environment. By understanding that the brain relies on presuppositions about the organization and probabilities of specific pieces of information, we can begin to make sense of the disparate findings within the psychological study of vision. Whereas bottom-up, hierarchical processes explain how representational units are passed from one area to the next when encountering input, descending pathways carry information about expectations and predictions and incorporate error (Clark, 2013). This approach parallels models of associative learning, whereby bottom-up learning provides necessary cues for encoding, but retrieval is not perfect. Instead, retrieval is related to a number of environmental aspects of the encounter. Research suggests that frequency (Tulving, 1972), emotional value (Carstensen, Fung, & Charles, 2003; Teasdale & Russell, 1983), and many other features (for another example, see Storbeck & Clore, 2005) of both internal and external experience overlap and account for variance in accurate retrieval.
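    The precision-weighted balancing of a top-down expectation against bottom-up evidence can be illustrated with a standard conjugate Gaussian update. The numbers below are invented for illustration, but the formula itself is the usual Bayesian result for combining two Gaussian sources of information.

```python
def gaussian_posterior(prior_mean, prior_var, likelihood_mean, likelihood_var):
    """Conjugate Gaussian update: posterior precision is the sum of the
    precisions; posterior mean is the precision-weighted average."""
    prior_precision = 1.0 / prior_var
    likelihood_precision = 1.0 / likelihood_var
    posterior_var = 1.0 / (prior_precision + likelihood_precision)
    posterior_mean = posterior_var * (prior_precision * prior_mean
                                      + likelihood_precision * likelihood_mean)
    return posterior_mean, posterior_var

# A confident prior (small variance) dominates noisy sensory evidence,
# pulling the percept toward what was expected.
mean, var = gaussian_posterior(prior_mean=0.0, prior_var=0.25,
                               likelihood_mean=2.0, likelihood_var=1.0)
```

    Because the prior here is four times more precise than the evidence, the posterior lands much closer to the expectation than to the raw input, which is one formal way of stating how strong priors can shape what is perceived.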

    Still, evidence has not adequately delineated to what extent and at what level priors and expectations may influence perception and object recognition. Bayesian priors have often been utilized as a means of incorporating high-level information into processing, but the question of where the priors interact remains, as does the question of what kind of information they hold. Moreover, the distinction between types of top-down influences and cognitive penetration may be an important feature to explore (Hohwy, 2017). Hohwy (2017) describes the ways in which the minimization of prediction error, coupled with Bayesian priors, can lead to instances of cognitive penetrability. Indeed, there is both a theoretical and a practical requirement to include error-minimization processes in cases in which expectations of encountering particular stimuli are especially strong. Otherwise, it is difficult to reconcile how information is learned so that it may be integrated into an expectation.

    Further, the iterative nature of processing allows prediction and corroborating (or disconfirming) evidence to occur time and time again. Much as in associative memory models, repetition across contexts can decouple a stimulus from its original environment while retaining the co-occurrence of the object and its surroundings. The goal of perception is to be accurate on a global level, meaning that prediction errors occurring in response to visual illusions should be considered functional for the minimization of error over time (Lupyan, 2015; Purves et al., 2011). Although it is indeed true that low-level sensory signals are necessary input for the process, the synthesizing of visual input is not passive. Instead, perception has been called a “constructive process of turning various forms of energy (mechanical, chemical, electromagnetic) into information useful for guiding behavior” (Lupyan, 2015). Moreover, recent research in the emotion field has suggested that language is a major contributor to the emotional cascade, a sentiment paralleled in other perceptual processes (Nook et al., 2017). Lupyan and Clark (2015) have proposed that language plays a vital role in visual perceptual processes as well. If language is one top-down constraint on perception, a number of mixed findings within cultural psychology can begin to merge. This represents an informative and interesting perspective for the many cognitive processes that require prediction and perception.

    Conclusion

    In sum, cognitive and computational approaches have incorporated findings from neurophysiology to support understanding of the process by which visual object recognition occurs. Cognitive approaches have demonstrated specific units that are represented in the cascade, uncovering the increasingly complex series. Two-dimensional surfaces, patterns, colors, edges, and many other aspects of objects all inform the visual system and aid in the overall identification process. Computational approaches allow for the modelling of unobserved phenomena, extending what was previously understood about the hierarchical nature of the brain. These models have demonstrated how prediction and prediction error can make sense of our complex perceptual system.

    Limits to These Approaches

    However, these approaches are also subject to limitations. Namely, cognitive studies that reduce objects to basic units suffer much the same problem as neurophysiological findings: they are constrained by studying aspects only in isolation. For instance, Biederman (1991) cannot account for a number of naturally occurring visual “environments”. Point of view, field of vision, emotional or motivational state, attentional biases, and more all interact with the most fundamental features of visual object recognition. Computational cognitive approaches run into slightly different issues. Most models require training from a set of stimuli, which can lead to biases. Further, this research has given rise to technologies that have crept into nearly every facet of our daily lives. Facial recognition is used to unlock smartphones, and while this may be an efficient means of accessing a handheld device, there are still a number of implications. Importantly, machine learning training sets may be systematically biased, leading to a biased algorithm and to subsequent codification systems that rely on this type of data synthesis.



    This page titled 6.3: Cognitive and computational approaches is shared under a not declared license and was authored, remixed, and/or curated by Matthew J. C. Crump via source content that was edited to the style and standards of the LibreTexts platform; a detailed edit history is available upon request.
