Skip to main content
Social Sci LibreTexts

5.2.2: Questions About Multimodal Perception

  • Page ID
    224768
  • \( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)

    \( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)

    \( \newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\)

    ( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\)

    \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\)

    \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\)

    \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\)

    \( \newcommand{\Span}{\mathrm{span}}\)

    \( \newcommand{\id}{\mathrm{id}}\)

    \( \newcommand{\Span}{\mathrm{span}}\)

    \( \newcommand{\kernel}{\mathrm{null}\,}\)

    \( \newcommand{\range}{\mathrm{range}\,}\)

    \( \newcommand{\RealPart}{\mathrm{Re}}\)

    \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\)

    \( \newcommand{\Argument}{\mathrm{Arg}}\)

    \( \newcommand{\norm}[1]{\| #1 \|}\)

    \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\)

    \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\AA}{\unicode[.8,0]{x212B}}\)

    \( \newcommand{\vectorA}[1]{\vec{#1}}      % arrow\)

    \( \newcommand{\vectorAt}[1]{\vec{\text{#1}}}      % arrow\)

    \( \newcommand{\vectorB}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)

    \( \newcommand{\vectorC}[1]{\textbf{#1}} \)

    \( \newcommand{\vectorD}[1]{\overrightarrow{#1}} \)

    \( \newcommand{\vectorDt}[1]{\overrightarrow{\text{#1}}} \)

    \( \newcommand{\vectE}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash{\mathbf {#1}}}} \)

    \( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)

    \( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)

    \(\newcommand{\avec}{\mathbf a}\) \(\newcommand{\bvec}{\mathbf b}\) \(\newcommand{\cvec}{\mathbf c}\) \(\newcommand{\dvec}{\mathbf d}\) \(\newcommand{\dtil}{\widetilde{\mathbf d}}\) \(\newcommand{\evec}{\mathbf e}\) \(\newcommand{\fvec}{\mathbf f}\) \(\newcommand{\nvec}{\mathbf n}\) \(\newcommand{\pvec}{\mathbf p}\) \(\newcommand{\qvec}{\mathbf q}\) \(\newcommand{\svec}{\mathbf s}\) \(\newcommand{\tvec}{\mathbf t}\) \(\newcommand{\uvec}{\mathbf u}\) \(\newcommand{\vvec}{\mathbf v}\) \(\newcommand{\wvec}{\mathbf w}\) \(\newcommand{\xvec}{\mathbf x}\) \(\newcommand{\yvec}{\mathbf y}\) \(\newcommand{\zvec}{\mathbf z}\) \(\newcommand{\rvec}{\mathbf r}\) \(\newcommand{\mvec}{\mathbf m}\) \(\newcommand{\zerovec}{\mathbf 0}\) \(\newcommand{\onevec}{\mathbf 1}\) \(\newcommand{\real}{\mathbb R}\) \(\newcommand{\twovec}[2]{\left[\begin{array}{r}#1 \\ #2 \end{array}\right]}\) \(\newcommand{\ctwovec}[2]{\left[\begin{array}{c}#1 \\ #2 \end{array}\right]}\) \(\newcommand{\threevec}[3]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \end{array}\right]}\) \(\newcommand{\cthreevec}[3]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \end{array}\right]}\) \(\newcommand{\fourvec}[4]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \\ #4 \end{array}\right]}\) \(\newcommand{\cfourvec}[4]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \\ #4 \end{array}\right]}\) \(\newcommand{\fivevec}[5]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \\ #4 \\ #5 \\ \end{array}\right]}\) \(\newcommand{\cfivevec}[5]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \\ #4 \\ #5 \\ \end{array}\right]}\) \(\newcommand{\mattwo}[4]{\left[\begin{array}{rr}#1 \amp #2 \\ #3 \amp #4 \\ \end{array}\right]}\) \(\newcommand{\laspan}[1]{\text{Span}\{#1\}}\) \(\newcommand{\bcal}{\cal B}\) \(\newcommand{\ccal}{\cal C}\) \(\newcommand{\scal}{\cal S}\) \(\newcommand{\wcal}{\cal W}\) \(\newcommand{\ecal}{\cal E}\) \(\newcommand{\coords}[2]{\left\{#1\right\}_{#2}}\) \(\newcommand{\gray}[1]{\color{gray}{#1}}\) \(\newcommand{\lgray}[1]{\color{lightgray}{#1}}\) \(\newcommand{\rank}{\operatorname{rank}}\) \(\newcommand{\row}{\text{Row}}\) \(\newcommand{\col}{\text{Col}}\) \(\renewcommand{\row}{\text{Row}}\) \(\newcommand{\nul}{\text{Nul}}\) \(\newcommand{\var}{\text{Var}}\) \(\newcommand{\corr}{\text{corr}}\) \(\newcommand{\len}[1]{\left|#1\right|}\) \(\newcommand{\bbar}{\overline{\bvec}}\) \(\newcommand{\bhat}{\widehat{\bvec}}\) \(\newcommand{\bperp}{\bvec^\perp}\) \(\newcommand{\xhat}{\widehat{\xvec}}\) \(\newcommand{\vhat}{\widehat{\vvec}}\) \(\newcommand{\uhat}{\widehat{\uvec}}\) \(\newcommand{\what}{\widehat{\wvec}}\) \(\newcommand{\Sighat}{\widehat{\Sigma}}\) \(\newcommand{\lt}{<}\) \(\newcommand{\gt}{>}\) \(\newcommand{\amp}{&}\) \(\definecolor{fillinmathshade}{gray}{0.9}\)

    Questions About Multimodal Perception

    Several theoretical problems are raised by multimodal perception. After all, the world is a “blooming, buzzing world of confusion” that constantly bombards our perceptual system with light, sound, heat, pressure, and so forth. To make matters more complicated, these stimuli come from multiple events spread out over both space and time. To return to our example: Let’s say the car crash you observed happened on Main Street in your town. Your perception during the car crash might include a lot of stimulation that was not relevant to the car crash. For example, you might also overhear the conversation of a nearby couple, see a bird flying into a tree, or smell the delicious scent of freshly baked bread from a nearby bakery (or all three!). However, you would most likely not make the mistake of associating any of these stimuli with the car crash. In fact, we rarely combine the auditory stimuli associated with one event with the visual stimuli associated with another (although, under some unique circumstances—such as ventriloquism—we do). How is the brain able to take the information from separate sensory modalities and match it appropriately, so that stimuli that belong together stay together, while stimuli that do not belong together get treated separately? In other words, how does the perceptual system determine which unimodal stimuli must be integrated, and which must not?

    Once unimodal stimuli have been appropriately integrated, we can further ask about the consequences of this integration: What are the effects of multimodal perception that would not be present if perceptual processing were only unimodal? Perhaps the most robust finding in the study of multimodal perception concerns this last question. No matter whether you are looking at the actions of neurons or the behavior of individuals, it has been found that responses to multimodal stimuli are typically greater than the combined response to either modality independently. In other words, if you presented the stimulus in one modality at a time and measured the response to each of these unimodal stimuli, you would find that adding them together would still not equal the response to the multimodal stimulus. This superadditive effect of multisensory integration indicates that there are consequences resulting from the integrated processing of multimodal stimuli.

    The extent of the superadditive effect (sometimes referred to as multisensory enhancement) is determined by the strength of the response to the single stimulus modality with the biggest effect. To understand this concept, imagine someone speaking to you in a noisy environment (such as a crowded party). When discussing this type of multimodal stimulus, it is often useful to describe it in terms of its unimodal components: In this case, there is an auditory component (the sounds generated by the speech of the person speaking to you) and a visual component (the visual form of the face movements as the person speaks to you). In the crowded party, the auditory component of the person’s speech might be difficult to process (because of the surrounding party noise). The potential for visual information about speech—lipreading—to help in understanding the speaker’s message is, in this situation, quite large. However, if you were listening to that same person speak in a quiet library, the auditory portion would probably be sufficient for receiving the message, and the visual portion would help very little, if at all (Sumby & Pollack, 1954). In general, for a stimulus with multimodal components, if the response to each component (on its own) is weak, then the opportunity for multisensory enhancement is very large. However, if one component—by itself—is sufficient to evoke a strong response, then the opportunity for multisensory enhancement is relatively small. This finding is called the Principle of Inverse Effectiveness (Stein & Meredith, 1993) because the effectiveness of multisensory enhancement is inversely related to the unimodal response with the greatest effect.

    Another important theoretical question about multimodal perception concerns the neurobiology that supports it. After all, at some point, the information from each sensory modality is definitely separated (e.g., light comes in through the eyes, and sound comes in through the ears). How does the brain take information from different neural systems (optic, auditory, etc.) and combine it? If our experience of the world is multimodal, then it must be the case that at some point during perceptual processing, the unimodal information coming from separate sensory organs—such as the eyes, ears, skin—is combined. A related question asks where in the brain this integration takes place. We turn to these questions in the next section.

    Biological Bases of Multimodal Perception

    Multisensory Neurons and Neural Convergence

    blue outline of brain .png

    In order for us to perceive the world effectively, neurons from our various senses carry information that is integrated in the brain. [Image: DARPA, https://goo.gl/kat7ws, CC0 Public Domain, https://goo.gl/m25gce]

    A surprisingly large number of brain regions in the midbrain and cerebral cortex are related to multimodal perception. These regions contain neurons that respond to stimuli from not just one, but multiple sensory modalities. For example, a region called the superior temporal sulcus contains single neurons that respond to both the visual and auditory components of speech (Calvert, 2001; Calvert, Hansen, Iversen, & Brammer, 2001). These multisensory convergence zones are interesting, because they are a kind of neural intersection of information coming from the different senses. That is, neurons that are devoted to the processing of one sense at a time—say vision or touch—send their information to the convergence zones, where it is processed together.

    One of the most closely studied multisensory convergence zones is the superior colliculus (Stein & Meredith, 1993), which receives inputs from many different areas of the brain, including regions involved in the unimodal processing of visual and auditory stimuli (Edwards, Ginsburgh, Henkel, & Stein, 1979). Interestingly, the superior colliculus is involved in the “orienting response,” which is the behavior associated with moving one’s eye gaze toward the location of a seen or heard stimulus. Given this function for the superior colliculus, it is hardly surprising that there are multisensory neurons found there (Stein & Stanford, 2008).


    Multi-Modal Perception by Lorin Lachs is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Permissions beyond the scope of this license may be available in our Licensing Agreement.


    This page titled 5.2.2: Questions About Multimodal Perception is shared under a CC BY-NC-SA 4.0 license and was authored, remixed, and/or curated by Michael Miguel.