Neural Dynamics and Control Group


A recurrent network model of planning explains hippocampal replay and human behavior
bioRxiv (2023)
K Jensen, G Hennequin*, and M Mattar*



Abstract

When interacting with complex environments, humans can rapidly adapt their behavior to changes in task or context. To facilitate this adaptation, we often spend substantial periods of time contemplating possible futures before acting. For such planning to be rational, the benefits of planning to future behavior must at least compensate for the time spent thinking. Here we capture these features of human behavior by developing a neural network model where not only actions, but also planning, are controlled by prefrontal cortex. This model consists of a meta-reinforcement learning agent augmented with the ability to plan by sampling imagined action sequences drawn from its own policy, which we refer to as ‘rollouts’. Our results demonstrate that this agent learns to plan when planning is beneficial, explaining the empirical variability in human thinking times. Additionally, the patterns of policy rollouts employed by the artificial agent closely resemble patterns of rodent hippocampal replays recently recorded in a spatial navigation task, in terms of both their spatial statistics and their relationship to subsequent behavior. Our work provides a new theory of how the brain could implement planning through prefrontal-hippocampal interactions, where hippocampal replays are triggered by – and in turn adaptively affect – prefrontal dynamics.
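For readers who want a concrete picture of what a policy 'rollout' is, here is a minimal sketch in Python. It assumes a toy tabular softmax policy and a known deterministic transition model purely for illustration; the paper's agent instead uses a recurrent network trained with meta-reinforcement learning, and all names below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins (assumed for illustration only):
# a small state space, a fixed deterministic transition model,
# and a tabular softmax policy in place of the paper's RNN.
N_STATES, N_ACTIONS, HORIZON = 16, 4, 5
transition = rng.integers(N_STATES, size=(N_STATES, N_ACTIONS))
reward_state = 7
logits = rng.normal(size=(N_STATES, N_ACTIONS))

def policy(state):
    """Softmax action probabilities for one state."""
    z = np.exp(logits[state] - logits[state].max())
    return z / z.sum()

def rollout(state):
    """Sample one imagined action sequence from the agent's own policy."""
    actions, imagined_return = [], 0.0
    for _ in range(HORIZON):
        a = rng.choice(N_ACTIONS, p=policy(state))
        actions.append(int(a))
        state = transition[state, a]
        if state == reward_state:  # imagined reward encountered
            imagined_return += 1.0
            break
    return actions, imagined_return

# 'Plan' by sampling several rollouts and keeping the most promising one.
plans = [rollout(0) for _ in range(20)]
best_actions, best_return = max(plans, key=lambda p: p[1])
print(best_actions, best_return)
```

Note that in the model, whether to spend time on another rollout or to act is itself a decision controlled by the (prefrontal) policy; the snippet above only illustrates the sampling-and-scoring step, not that learned arbitration.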


Try playing the game below ↓ (you will need a keyboard...). You have 20 seconds to explore the maze (which has cyclic boundaries), find the hidden reward, get teleported away, and return to the reward as many times as you can. After the 20 seconds, a new environment is generated with a new hidden reward location, and you can do it all over again (and again!).

(If pressing the up/down arrows also annoyingly scrolls the page, try holding the Shift key at the same time.)
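('Cyclic boundaries' means the maze wraps around like a torus: walking off one edge brings you back in on the opposite edge. A tiny sketch of that wrapping, with the grid size chosen purely for illustration:)

```python
GRID = 4  # assumed side length; the actual game's maze may differ

def step(x, y, dx, dy, grid=GRID):
    """Move on a torus: coordinates wrap modulo the grid size."""
    return (x + dx) % grid, (y + dy) % grid

print(step(3, 0, 1, -1))  # -> (0, 3): wrapped around on both axes
```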