Our goal in this section is to understand just enough about the biology to get an overall sense of how information flows through the visual system, and the basic facts about how different parts of the system operate. This will serve to situate the models that come later, which provide a much more complete picture of each step of information processing.
Figure \(6.1\): The pathway of early visual processing from the retina through the lateral geniculate nucleus of the thalamus (LGN) to primary visual cortex (V1), showing how information from the different visual fields (left vs. right) is routed to the opposite hemisphere.
Figure \(6.2\): How the retina compresses information by only responding to areas of contrasting illumination, not solid uniform illumination. The response properties of retinal cells can be summarized by these Difference-of-Gaussian (DoG) filters, with a narrow central region and a wider surround (also called center-surround receptive fields). The excitatory and inhibitory components exactly cancel when both are uniformly illuminated, but when light falls more on the center vs. the surround (or vice-versa), they respond, as illustrated with an edge where illumination transitions between darker and lighter.
Figure 6.1 shows the basic optics and transmission pathways of visual signals, which come in through the retina, and progress to the lateral geniculate nucleus of the thalamus (LGN), and then to primary visual cortex (V1). The primary organizing principles at work here, and in other perceptual modalities and perceptual areas more generally, are:
- Transduction of different information -- in the retina, photoreceptors are sensitive to different wavelengths of light (red = long wavelengths, green = medium wavelengths, and blue = short wavelengths), giving us color vision. Retinal signals also differ in their spatial frequency (how coarse or fine a feature they detect -- photoreceptors in the central fovea region support high spatial frequency = fine resolution, while those in the periphery are lower resolution), and in their temporal response (fast vs. slow responding, including differential sensitivity to motion).
- Organization of information in a topographic fashion -- for example, the left vs. right visual fields are organized into the contralateral hemispheres of cortex -- as the figure shows, signals from the left part of visual space are routed to the right hemisphere, and vice-versa. Information within LGN and V1 is also organized topographically in various ways. This organization generally allows similar information to be contrasted, producing an enhanced signal, and also grouped together to simplify processing at higher levels.
- Extracting relevant signals, while filtering irrelevant ones -- Figure 6.2 shows how retinal cells respond only to contrast, not uniform illumination, by using center-surround receptive fields (e.g., on-center, off-surround, or vice-versa). These neurons respond only when one part of the receptive field receives a different amount of light than the rest. Typically this arises with edges of contrast, where illumination transitions between light and dark, as shown in the figure -- these transitions are the most informative aspects of an image, while regions of constant illumination can be safely ignored. Figure 6.3 shows how these center-surround signals (which are present in the LGN as well) can be integrated together in V1 simple cells to detect the orientation of these edges -- these edge detectors form the basic vocabulary for describing images in V1. It should be easy to see how more complex shapes can then be constructed from these basic line/edge elements. V1 also contains complex cells that build upon the simple cell responses (Figure 6.4), providing a somewhat richer basic vocabulary.
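The contrast-only response of a center-surround receptive field can be illustrated with a small one-dimensional sketch. This is not the simulator's implementation: the function names, the kernel radius, and the two Gaussian widths are assumptions chosen purely for illustration. A narrow excitatory Gaussian minus a wider inhibitory Gaussian (each normalized to sum to 1) gives an on-center, off-surround Difference-of-Gaussians kernel whose weights sum to zero, so uniform illumination produces no response while an edge does.

```python
import math

def gaussian_kernel(radius, sigma):
    """Discrete 1D Gaussian sampled on [-radius, radius], normalized to sum to 1."""
    weights = [math.exp(-(x * x) / (2.0 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def dog_kernel(radius=6, sigma_center=1.0, sigma_surround=3.0):
    """On-center, off-surround kernel: narrow excitation minus wide inhibition.

    The radius and sigma values are illustrative assumptions, not measured ones.
    """
    center = gaussian_kernel(radius, sigma_center)
    surround = gaussian_kernel(radius, sigma_surround)
    return [c - s for c, s in zip(center, surround)]

def filter_valid(signal, kernel):
    """Apply the kernel at every position where it fits entirely inside the signal."""
    k = len(kernel)
    return [sum(kernel[j] * signal[i + j] for j in range(k))
            for i in range(len(signal) - k + 1)]

# A 1D "image": a dark region (0.2) followed by a light region (0.8).
image = [0.2] * 20 + [0.8] * 20
response = filter_valid(image, dog_kernel())

# Responses are (numerically) zero wherever the receptive field sees uniform
# illumination, and nonzero only for positions whose field straddles the edge.
for position, value in enumerate(response):
    print(position, round(value, 3))
```

In this sketch a negative value means the on-center unit is net inhibited (its center sits on the dark side of the edge); an off-center unit, modeled by negating the kernel, would respond there instead.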
Figure \(6.3\): A V1 simple cell that detects an oriented edge of contrast in the image, by receiving from a line of LGN on-center cells aligned along the edge. The LGN cells will fire due to the differential excitation vs. inhibition they receive (see previous figure), and then will activate the V1 neuron that receives from them.
Figure \(6.4\): Simple and complex cell types within V1 -- the complex cells integrate over the simple cell properties, including abstracting across the polarity (positions of the on vs. off coding regions), and creating larger receptive fields by integrating over multiple locations as well (the V1-Simple-Max cells are only doing this spatial integration). The end-stop cells are the most complex, detecting any form of contrasting orientation adjacent to a given simple cell. In the simulator, the V1 simple cells are encoded more directly using Gabor filters, which mathematically describe their oriented edge sensitivity.
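To make the Gabor description concrete, here is a minimal sketch of an oriented filter together with toy simple-cell and complex-cell responses. It is not taken from the simulator: the kernel size, wavelength, envelope width, and all function names are assumptions for illustration, and the "complex cell" models only the polarity abstraction (a max over the two contrast polarities), not spatial integration or end stopping.

```python
import math

def gabor_kernel(size=9, wavelength=8.0, sigma=2.5, theta=0.0):
    """Odd-phase Gabor patch: a sine grating under a Gaussian envelope.

    theta is the direction along which the grating varies, so the preferred
    edge runs perpendicular to it (theta = 0 prefers vertical edges).
    All parameter values are illustrative assumptions.
    """
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            along = x * math.cos(theta) + y * math.sin(theta)
            envelope = math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
            row.append(envelope * math.sin(2.0 * math.pi * along / wavelength))
        kernel.append(row)
    return kernel

def dot(kernel, patch):
    """Weighted sum of the image patch under the kernel."""
    return sum(k * p for krow, prow in zip(kernel, patch)
               for k, p in zip(krow, prow))

def simple_cell(kernel, patch):
    """Rectified response: fires only for its preferred contrast polarity."""
    return max(0.0, dot(kernel, patch))

def complex_cell(kernel, patch):
    """Max over the two polarities, so the sign of the contrast no longer matters."""
    negated = [[-k for k in row] for row in kernel]
    return max(simple_cell(kernel, patch), simple_cell(negated, patch))

# 9x9 patch containing a vertical edge: dark on the left, light on the right.
edge = [[0.2 if x < 0 else 0.8 for x in range(-4, 5)] for _ in range(9)]
flipped = [row[::-1] for row in edge]  # mirrored: light left, dark right

vertical_pref = gabor_kernel(theta=0.0)            # prefers vertical edges
horizontal_pref = gabor_kernel(theta=math.pi / 2)  # prefers horizontal edges

print("simple, matching orientation:  ", simple_cell(vertical_pref, edge))
print("simple, orthogonal orientation:", simple_cell(horizontal_pref, edge))
print("simple, reversed polarity:     ", simple_cell(vertical_pref, flipped))
print("complex, reversed polarity:    ", complex_cell(vertical_pref, flipped))
```

The simple cell responds strongly only when both orientation and polarity match, while the toy complex cell gives the same response to the edge and to its mirror image.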
In the auditory pathway, the basilar membrane of the cochlea plays an analogous role to the retina, and it also has a topographic organization according to the frequency of sounds, producing the rough equivalent of a Fourier transform of sound into a spectrogram. This basic sound signal is then processed in auditory pathways to extract relevant patterns of sound over time, in much the same way as occurs in vision.
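As a rough illustration of what "a Fourier transform of sound into a spectrogram" means, the sketch below cuts a synthetic signal into short frames and computes the magnitude of each frequency component per frame. It is a signal-processing analogy only, not a model of the cochlea; the sample rate, frame length, test tones, and function names are all assumptions chosen so the arithmetic is easy to follow.

```python
import cmath
import math

def dft_magnitudes(frame):
    """Magnitude of each non-negative frequency bin of a plain discrete Fourier transform."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]

def spectrogram(signal, frame_len):
    """Split the signal into non-overlapping frames and transform each one."""
    return [dft_magnitudes(signal[start:start + frame_len])
            for start in range(0, len(signal) - frame_len + 1, frame_len)]

SAMPLE_RATE = 800  # samples per second (assumed for this toy example)
FRAME_LEN = 80     # 0.1 s frames, so frequency bins are 10 Hz apart

def tone(freq_hz, n_samples):
    """Pure sine tone at the given frequency."""
    return [math.sin(2.0 * math.pi * freq_hz * t / SAMPLE_RATE)
            for t in range(n_samples)]

# A low tone followed by a higher tone.
signal = tone(100, 160) + tone(250, 160)
frames = spectrogram(signal, FRAME_LEN)

# For each time frame, report the frequency of the strongest bin.
peak_hz = [max(range(len(f)), key=f.__getitem__) * SAMPLE_RATE / FRAME_LEN
           for f in frames]
print(peak_hz)
```

Each row of `frames` is one time slice and each column one frequency band, which is the sense in which a frequency-ordered (tonotopic) layout resembles a spectrogram.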
Figure \(6.5\): Felleman & Van Essen's (1991) diagram of the anatomical connectivity of visual processing pathways, starting with retinal ganglion cells (RGC) to the LGN of the thalamus, then primary visual cortex (V1) and on up.
Moving up beyond the primary visual cortex, the perceptual system provides an excellent example of the power of hierarchically organized layers of neural detectors, as we discussed in the Networks Chapter. Figure 6.5 shows the anatomical connectivity patterns of all of the major visual areas, starting from the retinal ganglion cells (RGC) to LGN to V1 and on up. The specific patterns of connectivity allow a hierarchical structure to be extracted, as shown, even though there are many interconnections outside of a strict hierarchy as well.
Figure \(6.6\): Division of What vs Where (ventral vs. dorsal) pathways in visual processing.
Figure 6.6 puts these areas into their anatomical locations, showing more clearly a what vs where (ventral vs dorsal) split in visual processing. The projections going in a ventral direction from V1 to V4 to areas of inferotemporal cortex (IT) (TE, TEO, labeled as PIT for posterior IT in the previous figure) are important for recognizing the identity ("what") of objects in the visual input, while those going up through parietal cortex extract spatial ("where") information, including motion signals in areas MT and MST. We will see later in this chapter how each of these visual streams of processing can function independently, and also interact together to solve important computational problems in perception.
Here is a quick summary of the flow of information up the what side of the visual pathway (pictured on the right side of Figure 6.5):
- V1 -- primary visual cortex, which encodes the image in terms of oriented edge detectors that respond to edges (transitions in illumination) along different angles of orientation. We will see in the first simulation in this chapter how these edge detectors develop through self-organizing learning, driven by the reliable statistics of natural images.
- V2 -- secondary visual cortex, which encodes combinations of edge detectors to develop a vocabulary of intersections and junctions, along with many other basic visual features (e.g., 3D depth selectivity, basic textures, etc.), that provide the foundation for detecting more complex shapes. These V2 neurons also encode these features in a broader range of locations, starting a process that ends up with IT neurons being able to recognize an object regardless of where it appears in the visual field (i.e., invariant object recognition).
- V4 -- detects more complex shape features, over an even larger range of locations (and sizes, angles, etc.).
- IT-posterior (PIT) -- detects entire object shapes, over a wide range of locations, sizes, and angles. For example, there is an area near the fusiform gyrus on the bottom surface of the temporal lobe, called the fusiform face area (FFA), that appears especially responsive to faces. As we saw in the Networks Chapter, however, objects are encoded in distributed representations over a broad range of areas in IT.
- IT-anterior (AIT) -- this is where visual information becomes extremely abstract and semantic in nature -- it can encode all manner of important information about different people, places and things.
In contrast, the where aspect of visual processing, going up in a dorsal direction through the parietal cortex (areas MT, VIP, LIP, MST), contains areas that are important for processing motion, depth, and other spatial features.