We begin with episodic memory, because it is such a major part of our conscious lives, and indeed of our identities. For example, the movie Total Recall, loosely based on the Philip K. Dick short story We Can Remember It for You Wholesale, explores this connection between episodic memories and our sense of self. Everyone with a functioning hippocampus has this remarkable "tape recorder" constantly encoding everything that happens during waking life -- we don't have to exert particular effort to recall what happened 20 minutes or a few hours ago -- it is just automatically there. Most people end up forgetting the vast majority of the daily flux of their lives, retaining only the particularly salient or meaningful events.
However, a tiny percentage of otherwise seemingly "normal" people have exceptional memory, or hyperthymesia. Interestingly, it is not the hippocampus itself that differentiates these people from you and me -- instead they are characterized by the obsessive rehearsal and retrieval of episodic memories, with areas of the basal ganglia apparently enlarged (which is associated with obsessive compulsive disorder (OCD)). As we'll see in Executive Function, the basal ganglia participate not only in motor control and reinforcement learning, but also the reinforcement of updating and maintenance of active memory. This suggests that in normal human brains, the hippocampus has the raw ability to encode and remember every day of our lives, but most people just don't bother to rehearse these memories to the point where they can all be reliably retrieved. Indeed, a major complaint that these people with exceptional memory have is that they are unable to forget all the unpleasant stuff in life that most people just let go.
So what exactly makes the hippocampus such an exceptionally good episodic memory system? Our investigation begins with failure: specifically, the failure of a "generic" cortical neural network model of the sort we've been exploring in this textbook to exhibit any kind of useful episodic memory ability. This failure was first documented by McCloskey and Cohen (1989), using a generic backpropagation network trained on the AB-AC paired associate list learning task (Barnes & Underwood, 1959) (Figure 8.1). This task involves learning an initial list of arbitrary word pairs, called the AB list -- for example:
- locomotive - dishtowel
- window - reason
- bicycle - tree
People are tested on their ability to recall the B associate for each A item, and training on the AB list ends when they achieve perfect recall. Then, they start learning the AC list, which involves new associates for the previous A items:
- locomotive - cloud
- window - book
- bicycle - couch
After 1, 5, 10, and 20 iterations of learning this AC list, people are tested on their ability to recall the original AB items, without any additional training on those items. Figure 8.1 shows that there is a significant amount of interference on the AB list as a result of learning the AC items, due to the considerable overlap between the two lists, but even after 20 iterations through the AC items, people can still recall about 50% of the AB list. In contrast, McCloskey & Cohen (1989) showed that the network model exhibited catastrophic interference -- performance on the AB list went to 0% immediately. They concluded that this invalidated all neural network models of human cognition, because obviously people have much better episodic memory abilities.
But we'll see that this kind of wholesale abandonment of neural networks is unjustified (indeed, the brain is a massive neural network, so there must be some neural network description of any phenomenon, and we take these kinds of challenges as informative opportunities to identify the relevant mechanisms). In the following exploration we will see that there are certain network parameters that reduce the levels of interference. The most important manipulation required is to increase the level of inhibition so that fewer neurons are active, which reduces the overlap between the internal representations of the AB and AC list items, thus allowing the system to learn AC without overwriting the prior AB memories. We'll then see that the hippocampal system exploits this trick to an extreme degree (along with a few others), making it an exceptionally good episodic memory system.
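The interference effect in the AB-AC task can be illustrated with a much simpler network than the one McCloskey & Cohen used. The following is a minimal sketch under stated assumptions, not their model: it substitutes a single-layer network trained with the delta rule for the backpropagation network, codes each A item, list context, and response as a single one-hot unit, and scores recall by the most active output unit. All names and parameter values are illustrative.

```python
def make_input(item, context, n_items=3, n_contexts=2):
    """One-hot A item plus one-hot list context (0 = AB list, 1 = AC list)."""
    x = [0.0] * (n_items + n_contexts)
    x[item] = 1.0
    x[n_items + context] = 1.0
    return x


def forward(weights, x):
    """Linear output units: each row of weights belongs to one response unit."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def train(weights, pairs, lr=0.2, epochs=100):
    """Online delta rule: w += lr * (target - output) * input."""
    for _ in range(epochs):
        for x, target_idx in pairs:
            out = forward(weights, x)
            for o, row in enumerate(weights):
                err = (1.0 if o == target_idx else 0.0) - out[o]
                for i, xi in enumerate(x):
                    row[i] += lr * err * xi


def accuracy(weights, pairs):
    """Fraction of pairs whose most active output is the trained response."""
    correct = 0
    for x, target_idx in pairs:
        out = forward(weights, x)
        if max(range(len(out)), key=out.__getitem__) == target_idx:
            correct += 1
    return correct / len(pairs)


def run_ab_ac(n_items=3):
    """Train AB, then AC; return (AB before AC, AB after AC, AC after AC)."""
    # Response units 0..n_items-1 are the B words, the rest are the C words.
    weights = [[0.0] * (n_items + 2) for _ in range(2 * n_items)]
    ab = [(make_input(i, 0, n_items), i) for i in range(n_items)]
    ac = [(make_input(i, 1, n_items), n_items + i) for i in range(n_items)]
    train(weights, ab)
    ab_before = accuracy(weights, ab)
    train(weights, ac)
    return ab_before, accuracy(weights, ab), accuracy(weights, ac)


if __name__ == "__main__":
    print(run_ab_ac())
```

In this sketch the weights from each A item are shared between the two lists, so training the AC responses to criterion drags those weights away from the B responses, and AB recall collapses even though the AB list was learned perfectly beforehand. The list-context units absorb only a small part of the change, which is why they fail to protect the older list.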
Exploration of Catastrophic Interference
Open the ABAC simulation and follow the directions there.
The Hippocampus and Pattern Separation / Pattern Completion
The hippocampus is specifically optimized to rapidly record episodic memories using highly sparse representations (i.e., having relatively few neurons active) that minimize overlap (through pattern separation) and thus interference. This idea is consistent with such a large quantity of data that it is close to established fact (a rarity in cognitive neuroscience). This data includes the basic finding of episodic memory impairments (and particularly in pattern separation) that result from selective hippocampal lesions, the unique features of the hippocampal anatomy, which are so distinctive relative to other brain areas that they cry out for an explanation, and the vast repertoire of neural recording data from different hippocampal areas. We start with an overview of hippocampal anatomy, followed by the neural recording data and an understanding of how relatively sparse neural activity levels also result in pattern separation, which minimizes interference.
The anatomy of the hippocampus proper and the areas that feed into it is shown in Figure 8.2. The hippocampus represents one of two "summits" on top of the hierarchy of interconnected cortical areas (where the bottom are sensory input areas, e.g., primary visual cortex) -- the other such summit is the prefrontal cortex explored in the Executive Function Chapter. Thus, it possesses a critical feature for an episodic memory system: access to a very high-level summary of everything of note going on in your brain at the moment. This information, organized along the dorsal vs. ventral pathways explored in the Perception and Attention Chapter, converges on the parahippocampal (PHC) (dorsal) and perirhinal (PRC) (ventral) areas, which then feed into the entorhinal cortex (EC), and then into the hippocampus proper. The major hippocampal areas include the dentate gyrus (DG) and the areas of "Ammon's horn" (cornu ammonis (CA) in Latin), CA3 and CA1 (what happened to CA2? It turns out it is basically the same as CA3, so we just use that label). All of these strange names have to do with the shapes of these areas, including the term "hippocampus" itself, which refers to the seahorse shape it has in the human brain (hippocampus is Greek for seahorse).
The basic episodic memory encoding story in terms of this anatomy goes like this. The high-level summary of everything in the brain is activated in EC, which then drives the DG and CA3 areas via the perforant pathway -- the end result of this is a highly sparse, distinct pattern of neural firing in CA3, which represents the main "engram" of the hippocampus. The EC also drives activity in CA1, which has the critical feature of being able to then re-activate this same EC pattern all by itself (i.e., an invertible mapping or auto-encoder relationship between CA1 and EC). These patterns of activity then drive synaptic plasticity (learning) in all the interconnected synapses, with the most important being the synaptic connections among CA3 neurons (in the CA3 recurrent pathway), and the connections between CA3 and CA1 (the Schaffer collateral pathway). These plastic changes effectively "glue together" the different neurons in the CA3 engram, and associate them with the CA1 invertible pattern, so that subsequent retrieval of the CA3 engram can then activate the CA1, then EC, and back out to the cortex. Thus, the primary function of the hippocampus is to bind together all the disparate elements of an episode, and then be able to retrieve this conjunctive memory and reinstate it out into the cortex during recall. This is how a memory can come "flooding back" -- it floods back from CA3 to CA1 to EC to cortex, reactivating something approximating the original brain pattern at the time the memory was encoded.
As noted in the introduction, every attempt to simplify and modularize memory in this fashion is inaccurate, and in fact memory encoding is distributed among all the neurons that are active at the time of the episode. For example, learning in the perforant pathway is important for reactivating the CA3 engram from the EC inputs (especially when they represent only a partial memory retrieval cue). In addition, learning all the way through the cortical pathways into and out of the hippocampus "greases" the retrieval process. Indeed, if a memory pattern is reactivated frequently, then these cortical connections can be strong enough to drive reactivation of the full memory, without the benefit of the hippocampus at all. We discuss this consolidation process in detail later. Finally, the retrieval process can be enhanced by controlled retrieval of memory, using top-down strategies driven by the prefrontal cortex. We don't consider this aspect of controlled retrieval here, but it depends on a combination of activation-based and weight-based memory analogous to some features we will explore in Executive Function.
Properties of Hippocampal Neurons: Sparseness, Pattern Separation
A representative picture of a critical difference between the hippocampus (CA3, CA1) and cortex is shown in Figure 8.3, where it is clear that CA3 and CA1 neurons fire much less often than those in the cortex (entorhinal cortex and subiculum). This is what we mean by sparseness in the hippocampal representation -- for any given episode, only relatively few neurons are firing, and conversely, each neuron only fires under a very specific circumstance. In rats, these circumstances tend to be identifiable as spatial locations, i.e., place cells, but this is not generally true of primate hippocampus. This sparseness is thought to result from high levels of GABA inhibition in these areas, keeping many neurons below threshold, and requiring active neurons to receive a relatively high level of excitatory input to overcome this inhibition. The direct benefit of this sparseness is that the engrams for different episodes will overlap less, just from basic probabilities (Figure 8.4). For example, if the probability of a neuron being active for a given episode is 1% (typical of the DG), then the probability that it is active in both of two random episodes is that value squared, which is 0.01% (a very small number). In comparison, if the probability is higher, e.g., 25% (typical of cortex), then there is a 6.25% chance of overlap for two episodes. David Marr appears to have been the first to point out this pattern separation property of sparse representations, in an influential 1971 paper.
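The arithmetic above can be checked directly. The following is a minimal Python sketch, assuming each neuron is active independently and with the same probability in every episode (an idealization); the function names are illustrative and do not come from any simulator.

```python
def joint_activity_probability(p_active: float) -> float:
    """Chance that one neuron is active in both of two independent episodes."""
    return p_active * p_active


def expected_shared_neurons(n_neurons: int, p_active: float) -> float:
    """Expected number of neurons active in both episodes, out of n_neurons."""
    return n_neurons * joint_activity_probability(p_active)


if __name__ == "__main__":
    for label, p in [("DG-like", 0.01), ("cortex-like", 0.25)]:
        joint = joint_activity_probability(p)
        print(f"{label}: p = {p:.2%}, joint = {joint:.4%}")
```

In a hypothetical population of 10,000 neurons, for instance, a 1% activity level gives an expected overlap of about one neuron between two random episodes, compared with 625 neurons at a 25% activity level.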
The connection between activity levels and pattern separation can also be observed within the hippocampus itself, by comparing the firing properties of DG vs. CA3 neurons, where DG neurons have the sparsest activity levels, even compared to the somewhat less sparse CA3 (roughly 2-5% activity level). Figure 8.5 from a study by Leutgeb et al. (2007) shows that the DG exhibits more pattern separation than the CA3, as a function of systematic morphing of an environment from a square to a circle and back again. The DG neurons exhibit a greater variety of neural firing as a function of this environmental change, suggesting that they separate these different environments to a greater extent than the CA3. There are many other compelling demonstrations of pattern separation in various hippocampal areas relative to cortex, and in particular in DG relative to other areas (see e.g., the extensive work of Kesner on this topic).
Another factor that contributes to effective pattern separation is the broad and diffuse connectivity from EC to DG and CA3, via the perforant pathway. This allows many different features in EC to be randomly combined in DG and CA3, enabling them to be sensitive to combinations or conjunctions of inputs. Because of the high inhibitory threshold associated with sparse activations, a given neuron in these areas must receive significant excitation from several of these diffuse input sources at once in order to fire. In other words, these neurons have conjunctive representations.
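The combined effect of diffuse random connectivity and strong inhibition can be sketched in a few lines of Python. This is a toy illustration under stated assumptions rather than the hippocampal model used in the explorations: inhibition is approximated by keeping only the k most strongly driven units, the connectivity is random with uniform random weights, and the layer sizes, connection probability, and function names are all made up for the example. Two input patterns that share half of their active units are passed through the same projection, and the overlap of the resulting codes is measured for a sparse and a dense setting of k.

```python
import random


def make_projection(n_in, n_out, p_connect, rng):
    """Random, diffuse connectivity: each receiving unit samples a random
    subset of inputs, with random positive weights (an assumption)."""
    return [
        [rng.random() if rng.random() < p_connect else 0.0 for _ in range(n_in)]
        for _ in range(n_out)
    ]


def encode(active_inputs, weights, k):
    """Return the set of the k receiving units with the largest net input.
    Keeping only k winners is a crude stand-in for inhibition."""
    net = [sum(row[i] for i in active_inputs) for row in weights]
    ranked = sorted(range(len(net)), key=lambda j: net[j], reverse=True)
    return set(ranked[:k])


def mean_output_overlap(k, n_trials=10, seed=0):
    """Average fraction of winners shared by two inputs that overlap by 50%."""
    rng = random.Random(seed)
    n_in, n_out = 100, 400
    total = 0.0
    for _ in range(n_trials):
        weights = make_projection(n_in, n_out, 0.25, rng)
        units = rng.sample(range(n_in), 30)
        pattern_a = set(units[:20])
        pattern_b = set(units[10:])  # shares 10 of its 20 active units with A
        code_a = encode(pattern_a, weights, k)
        code_b = encode(pattern_b, weights, k)
        total += len(code_a & code_b) / k
    return total / n_trials


if __name__ == "__main__":
    print("sparse (2% active):", mean_output_overlap(k=8))
    print("dense (25% active):", mean_output_overlap(k=100))
```

A unit only wins the sparse competition if it happens to receive strong input from many of the active units in a pattern, so the winners for the two inputs tend to differ even though the inputs themselves overlap by half. With the dense setting, far more units clear the bar for both inputs and the codes overlap much more.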
Pattern separation is important for enabling the hippocampus to rapidly encode novel episodes with a minimum of interference on prior learning, because the patterns of neurons involved overlap relatively little.
Pattern Completion: Cued Recall
While pattern separation is important for encoding new memories, this encoding would be useless unless these memories can be subsequently recalled. This recall process is also known as pattern completion, where a partial retrieval cue triggers the completion of the full original pattern associated with the memory. For example, if I cue you with the question: "did you go to summer camp as a kid?" you can pattern complete from this to memories of summer camp, or not, as the case may be. The amazing thing about human memory is that it is a content-addressable memory -- any sufficiently specific subset of information can serve as a retrieval cue, enabling recovery of previously-encoded episodic memories. In contrast, memory in a computer is accessed by a memory address or a variable pointer, which has no relationship to the actual content stored in that memory. Modern web search engines like Google demonstrate the importance of content addressability, and function much like the human memory system, taking search terms as retrieval cues to find relevant "memories" (web pages) with related information. As you probably know from searching the web, the more specific you can make your query, the more likely you will retrieve relevant information -- the same principle applies to human memory as well.
In the hippocampus, pattern completion is facilitated by the recurrent connections among CA3 neurons, which glue them together during encoding, such that a subset of CA3 neurons can trigger recall of the remainder. In addition, the synaptic changes during encoding in the perforant pathway make it more likely that the original DG and CA3 neurons will become reactivated by a partial retrieval cue.
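The idea that recurrent connections let part of a stored pattern retrieve the rest can be illustrated with a classic Hopfield-style autoassociator. This is a stand-in chosen for brevity, not the CA3 model used in this chapter: the units take values of +1 or -1 rather than forming sparse patterns, the weights come from a simple Hebbian outer-product rule, and the network size, number of patterns, and function names are assumptions made for the sketch.

```python
import random


def train_hebbian(patterns):
    """Recurrent weights from a Hebbian outer-product rule (no self-connections)."""
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    weights[i][j] += p[i] * p[j] / n
    return weights


def complete(cue, weights, n_steps=5):
    """Repeatedly update every unit from its recurrent input.
    Units with zero net input keep their current value."""
    state = list(cue)
    for _ in range(n_steps):
        net = [sum(w * s for w, s in zip(row, state)) for row in weights]
        state = [1 if h > 0 else -1 if h < 0 else s for h, s in zip(net, state)]
    return state


def recall_from_half_cue(seed=1, n_units=200, n_patterns=3):
    """Store random patterns, then cue with the first half of one of them."""
    rng = random.Random(seed)
    patterns = [
        [rng.choice((-1, 1)) for _ in range(n_units)] for _ in range(n_patterns)
    ]
    weights = train_hebbian(patterns)
    original = patterns[0]
    half = n_units // 2
    cue = original[:half] + [0] * (n_units - half)  # second half is unknown
    return original, cue, complete(cue, weights)


if __name__ == "__main__":
    original, cue, recalled = recall_from_half_cue()
    matches = sum(1 for a, b in zip(original, recalled) if a == b)
    print(f"{matches} of {len(original)} units match the stored pattern")
```

The cue supplies only half of the stored pattern, and the learned recurrent weights fill in the missing half. Storing many more patterns in a network of this size would eventually cause them to interfere with one another, which is one reason the sparse, pattern-separated codes described earlier matter for a system like CA3.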
Interestingly, there is a direct tension or tradeoff between pattern separation and pattern completion, and the detailed parameters of the hippocampal anatomy can be seen as optimizing this tradeoff (O'Reilly & McClelland, 1994). Pattern separation makes it more likely that the system will treat the retrieval cue like a novel stimulus, and thus encode a new distinct engram pattern in CA3, instead of completing to the old one. Likewise, if the system is too good at pattern completion, it will reactivate old memories instead of encoding new pattern separated ones, for truly novel episodes. Although the anatomical parameters in our model do help to find a good balance between these different forces of completion and separation, it is also likely that the hippocampus benefits from strategic influences from other brain areas, e.g., prefrontal cortex executive control areas, to emphasize either completion or separation depending on whether the current demands require recall or encoding, respectively. We will explore this issue further in the Executive Function Chapter.
Now, let's explore how the hippocampus encodes and recalls memories, using the AB-AC task. Just click on the following exploration and follow the instructions from there:
Complementary Learning Systems
As noted earlier, when McCloskey & Cohen first discovered the phenomenon of catastrophic interference, they concluded that neural networks are fatally flawed and should not be considered viable models of human cognition. This is the same thing that happened with Minsky and Papert in 1969, in the context of networks that lack a hidden layer and thus cannot learn more difficult mappings such as XOR (see Learning Chapter for more details). In both cases, there are ready solutions to these problems, but people seem all too willing to seize upon an excuse to discount the neural network model of the mind. Perhaps it is just too reductionistic or otherwise scary to think that everything that goes on in your brain could really boil down to mere neurons... However, this problem may not be unique to neural networks -- researchers often discount various theories of the mind, including Bayesian models for example, when they don't accord with some pattern of data. The trick is to identify when any given theory is fundamentally flawed given challenging data; the devil is in the details, and oftentimes there are ways to reconcile or refine an existing theory without "throwing out the baby with the bathwater".
Such musings aside, there are (at least) two possible solutions to the catastrophic interference problem. One would be to somehow improve the performance of a generic neural network model in episodic memory tasks, inevitably by reducing overlap in one way or another among the representations that form. The other would be to introduce a specialized episodic memory system, i.e., the hippocampus, which has parameters that are specifically optimized for low-interference rapid learning through pattern separation, while retaining the generic neural network functionality as a model of neocortical learning. The advantage of this latter perspective, known as the complementary learning systems (CLS) framework (McClelland, McNaughton & O'Reilly, 1995; Norman & O'Reilly, 2003), is that the things you do to make the generic neural model better at episodic memory actually interfere with its ability to be a good model of neocortex. Specifically, neocortical learning for things like object recognition (as we saw in the Perception Chapter), and semantic inference (as we'll see in the Language Chapter) really benefit from highly overlapping distributed representations, and slow interleaved learning. These overlapping distributed representations enable patterns of neural activity to encode complex, high-dimensional similarity structures among items (objects, words, etc), which is critical for obtaining a "common sense" understanding of the world. Figure 8.6 summarizes this fundamental tradeoff between statistical or semantic learning (associated with the neocortex) and episodic memory (associated with the hippocampus).
Consistent with this basic tradeoff, people with exceptional episodic memory abilities (as discussed earlier) often suffer from a commensurate difficulty with generalizing knowledge across episodes. Even more extreme, autistic memory savants, who can memorize all manner of detailed information on various topics, generally show an even more profound lack of common sense reasoning and general ability to get by in the real world. In these cases, it was speculated that the neocortex also functions much more like a hippocampus, with sparser activity patterns, resulting in overall greater capacity for memorizing specifics, but correspondingly poor abilities to generalize across experiences to produce common sense reasoning (McClelland, 2000).
Amnesia: Anterograde vs. Retrograde
Having seen how the intact hippocampus functions, you may be wondering what goes wrong to produce amnesia. The Hollywood version of amnesia involves getting hit on the head, followed by a complete forgetting of everything you know (e.g., your spouse becomes a stranger). Then of course another good whack restores those memories, but not before many zany hijinks have ensued. In reality, there are many different sources of amnesia, and memory researchers typically focus on the kind that is caused by direct damage to the hippocampus and related structures, known as hippocampal amnesia. The most celebrated case of this is a person known to science as H.M. (Henry Molaison), who had his hippocampus removed in 1953 to treat otherwise intractable epilepsy. He then developed the inability to learn new episodic information (anterograde amnesia), as well as some degree of forgetting of previously learned knowledge (retrograde amnesia). But he remembered how to talk, the meanings of different words and objects, how to ride a bike, and could learn all manner of new motor skills. This was a clear indication that the hippocampus is critical for learning only some kinds of new knowledge.
More careful studies with H.M. showed that he could also learn new semantic information, but that this occurred relatively slowly, and the learned knowledge was more brittle in the way it could be accessed, compared to neurologically intact people. This further clarifies that the hippocampus is critical for episodic, but not semantic learning. However, for most people semantic information can be learned initially via the hippocampus, and then more slowly acquired by the neocortex over time. One indication that this process occurs is that H.M. lost his most recent memories prior to the surgery, more than older memories (i.e., a temporally-graded retrograde gradient, also known as a Ribot gradient). Thus, the older memories had somehow become consolidated outside of the hippocampus, suggesting that this gradual process of the neocortex learning information that is initially encoded in the hippocampus is actually taking place. We discuss this process in the next section.
Certain drugs can cause a selective case of anterograde amnesia. For example, the benzodiazepines (including the widely-studied drug midazolam) potentiate GABA inhibition throughout the brain, but benzodiazepine (GABA-A) receptors are densely expressed in the hippocampus, and because of its high levels of inhibition, the hippocampus is especially sensitive to these drugs. At the right dosage, this inhibition is sufficient to prevent the synaptic plasticity within the hippocampus that forms new memories, but previously-learned memories can still be reactivated. This then gives rise to a purer case of anterograde, without retrograde, amnesia. Experimentally, midazolam impairs hippocampal-dependent rapid memory encoding but spares other forms of integrative learning such as reinforcement learning (e.g., Hirshman et al., 2001; Frank, O'Reilly & Curran, 2006).
Another source of amnesia comes from Korsakoff's syndrome, typically resulting from lack of vitamin B1 due to long-term alcoholism. This apparently affects parts of the thalamus and the mammillary bodies, which in turn influence the hippocampus via various neuromodulatory pathways, including GABA innervation from the medial septum, which can then influence learning and recall dynamics in the hippocampus.
Memory Consolidation from Hippocampus to Neocortex
Why do we dream? Is there something useful happening in our brains while we sleep, or is it just random noise and jumbled nonsensical associations? Can you actually learn a foreign language while sleeping? Our enduring fascination with the mysteries of sleep and dreaming may explain the excitement surrounding the idea that memories can somehow migrate from the hippocampus to the neocortex while we sleep. This process, known as memory consolidation, was initially motivated by the observation that more recent memories were more likely to be lost when people suffer from acquired amnesia, as in the case of H.M. discussed above. More recently, neural recordings in the hippocampus during wakefulness and sleep have revealed that patterns of activity that occur while a rat is running a maze seem to also be reactivated when the animal is then asleep. However, the measured levels of reactivation are relatively weak compared to the patterns that were active during the actual behavior, so it is not clear how strong a learning signal could be generated from this. Furthermore, there is considerable controversy over the presence of the temporally-graded retrograde gradients in well-controlled animal studies, raising some doubts about the existence of the consolidation phenomenon in the first place. Nevertheless, on balance it seems safe to conclude that this process does occur at least to some extent, in at least some situations, even if not fully ubiquitous. In humans, slow wave oscillations during non-REM sleep are thought to be associated with memory consolidation. Indeed, one recent study showed that external induction of slow wave oscillations during sleep actually resulted in enhanced subsequent hippocampal-dependent memories for items encoded just prior to sleep (Marshall et al., 2006).
One prediction from the complementary learning systems perspective regarding this consolidation process is that the information encoded in the neocortex will be of a different character to that initially encoded by the hippocampus, due to the very different nature of the learning and representations in these two systems. Thus, to the extent that episodic memories can be encoded in the neocortex, they will become more "semanticized" and generalized, integrating with other existing memories, as compared to the more distinct and crisp pattern separated representations originally encoded in the hippocampus. Available evidence appears to support this idea, for example by comparing the nature of the intact memories from hippocampal amnesics to neurologically intact controls.
Role of Space in the Hippocampus
A large amount of research on the hippocampus takes place in the rat, and spatial navigation is one of the most important behavioral functions for a rat. Thus, it is perhaps not too surprising that the rat hippocampus exhibits robust place cell firing (as shown in the figures above), where individual DG, CA3 and CA1 neurons respond to a particular location in space. A given neuron will have a different place cell location in different environments, and there does not appear to be any kind of topography or other systematic organization to these place cells. This is consistent with the random, diffuse nature of the perforant pathway projections into these areas, and the effects of pattern separation.
More recently, spatial coding in the entorhinal cortex has been discovered, in the form of grid cells. These grid cells form a regular hexagonal lattice or grid over space, and appear to depend on various forms of oscillations. These grid cells may then provide the raw spatial information that gets integrated into the place cells within the hippocampus proper. In addition, head direction cells have been found in a number of different areas that project into the hippocampus, and these cells provide a nice dead reckoning signal about where the rat is facing based on the accumulation of recent movements.
The combination of all these cell types provides a solid basis for spatial navigation in the rat, and various computational models have been developed that show how these different signals can work together to support navigation behavior. An exploration model of this domain will be available in a future edition.
An important property of the hippocampus is an overall oscillation in the rate of neural firing, in the so-called theta frequency band in rats, which ranges from about 8 to 12 cycles per second. As shown in Figure 8.7, different areas of the hippocampus are out of phase with each other with respect to this theta oscillation, and this raises the possibility that these phase differences may enable the hippocampus to learn more effectively. Hasselmo et al. (2002) argued that this theta phase relationship enables the system to alternate between encoding of new information vs. recall of existing information. This is an appealing idea, because as we discussed earlier, there can be a benefit to altering the hippocampal parameters to optimize encoding or retrieval based on various other kinds of demands.
The emergent software now supports an extension to this basic theta encoding vs. retrieval idea that enables Leabra error-driven learning to shape two different pathways of learning in the hippocampus, all within one standard trial of processing (see Ketz, Morkonda & O'Reilly, 2013, for the published paper). Each pathway has an effective minus and plus phase activation state (although in fact they share the same plus phase). The main pathway, trained on the standard minus to plus phase difference, involves CA3-driven recall of the corresponding CA1 activity pattern, which can then reactivate EC and so on out to cortex. The second pathway, trained using a special initial phase of settling within the minus phase, is the CA1 <-> EC invertible auto-encoder, which ensures that CA1 can actually reactivate the EC if it is correctly recalled. In our standard hippocampal model explored previously, this auto-encoder pathway is trained in advance on all possible sub-patterns within a single subgroup of EC and CA1 units (which we call a "slot"). This new model suggests how this auto-encoder can instead be learned via the theta phase cycle.
See Hippocampus Theta Phase for details on this theta phase version of the hippocampus, which is recommended for any computationally demanding hippocampal applications.
Theta oscillations are also thought to play a critical role in the grid cell activations in the EC layers, and may also serve to encode temporal sequence information, because place field firing shows theta phase precession, with different place fields firing at different points within the unfolding theta wave. We will cover these topics in greater detail in a subsequent revision.
The Function of the Subiculum
The subiculum is often neglected in theories of hippocampal function, and yet it likely plays various important roles. Anatomically, it is situated in a similar location as the entorhinal cortex (EC) relative to the other hippocampal areas, but instead of being interconnected with neocortical areas, it is interconnected more directly with subcortical areas (Figure 8.2). Thus, by analogy to the EC, we can think of it as the input/output pathway for subcortical information to/from the hippocampus. One very important function that the subiculum may perform is computing the relative novelty of a given situation, and communicating this to the midbrain dopamine systems and thence to basal ganglia, to modulate behavior appropriately (Lisman & Grace, 2005). Novelty can have complex affective consequences, being both anxiogenic (anxiety producing) and motivational for driving further exploration, and generally increases overall arousal levels. The hippocampus is uniquely capable of determining how novel a situation is, taking into account the full conjunction of the relevant spatial and other contextual information. The subiculum could potentially compute novelty by comparing CA1 and EC states during the recall phase of the theta oscillation, for example, but this is purely conjecture at this point. Incorporating this novelty signal is an important goal for future computational models.