Synergizing habits and goals with variational Bayes: A new framework for biological and artificial embodied agents

Published June 19, 2024

By Dongqi Han, Researcher


Diagrams showing features of habitual behavior (e.g., eating a snack while focusing on work) and goal-directed behavior (e.g., planning a meal to lose weight). Left: habitual behavior, with features such as automatic, model-free, and fast. Right: goal-directed behavior, with features such as thoughtful, model-based, and slow.

In the intertwined worlds of psychology, cognitive neuroscience, and artificial intelligence, scientists continue to pursue the elusive goal of decoding and mimicking human and animal behavior. One of the most intriguing aspects of this research is the interplay between two types of behavior: habitual and goal-directed. Traditionally, these behaviors are believed to be managed by two distinct systems within the brain: habitual behaviors are fast and automatic, while goal-directed behaviors are slow and flexible. However, a recent paper in Nature Communications, “Synergizing Habits and Goals with Variational Bayes,” by researchers from Microsoft Research Asia and collaborators from the Okinawa Institute of Science and Technology, introduces a theoretical framework that challenges this traditional view. Rather than treating the two types of behavior as separate, it integrates them using variational Bayesian methods, which are statistical techniques for approximately updating beliefs, expressed as probability distributions, in light of new evidence. The result is a new account of how habitual and goal-directed behaviors interact and influence the decision-making of biological and artificial embodied agents (hereinafter referred to as “agents”).

The core idea

The paper proposes the Bayesian behavior framework, which aims to enhance the understanding of behavior in sensorimotor tasks. At its core, this framework uses variational Bayesian methods to model human and animal actions. The key innovation is the Bayesian intention variable, designed to bridge habitual behavior and goal-directed behavior. Habitual behaviors are driven by a prior distribution of intention that is shaped by sensory cues rather than explicit goals. In contrast, goal-directed behaviors are guided by a posterior distribution of intention conditioned on specific goals, which is inferred by minimizing variational free energy.
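As a rough guide to the quantities involved, the goal-directed inference can be written in the generic form of a variational free energy. The notation is an assumption made here for illustration and is not taken from the paper: z is the intention, o the sensory observation, g the goal, p(z | o) the habitual (prior) intention, and q(z | o, g) the goal-directed (posterior) intention.

```latex
\mathcal{F}(q) \;=\;
\mathbb{E}_{q(z \mid o, g)}\!\left[ -\log p(g \mid z, o) \right]
\;+\;
D_{\mathrm{KL}}\!\left[\, q(z \mid o, g) \,\middle\|\, p(z \mid o) \,\right]
```

The first term favors intentions that are predicted to achieve the goal, and the second penalizes departures from the habitual intention. The paper's full objective may contain additional terms, so this should be read only as a sketch of the general structure.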

The authors argue that habitual and goal-directed behaviors should not be treated independently. Instead, these behaviors share neural pathways and can build on each other’s strengths. For example, habitual behaviors, while inflexible, offer finely honed motor skills that goal-directed behaviors can leverage for more complex planning. This synergistic approach comes to fruition through two key mechanisms: first, by minimizing the divergence between the habitual and goal-directed intentions, and second, by combining the prior and posterior intentions into a unified, synergized intention via inverse variance-weighted averaging. This consolidated intention then empowers the agent to effectively engage with its environment.
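The second mechanism can be illustrated with a minimal sketch, assuming both intentions are diagonal Gaussians described by a mean and a variance per dimension. The function name `synergize` and its argument names are hypothetical, and the paper's exact parameterization may differ; the code shows only the standard inverse variance-weighted average.

```python
from typing import List, Sequence, Tuple


def synergize(
    mu_prior: Sequence[float],
    var_prior: Sequence[float],
    mu_post: Sequence[float],
    var_post: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Combine two diagonal Gaussians by inverse variance weighting.

    Each dimension is weighted by its precision (1 / variance), so the
    more certain intention contributes more to the combined mean.
    """
    mu_out: List[float] = []
    var_out: List[float] = []
    for mp, vp, mq, vq in zip(mu_prior, var_prior, mu_post, var_post):
        prec_p = 1.0 / vp  # precision of the habitual (prior) intention
        prec_q = 1.0 / vq  # precision of the goal-directed (posterior) intention
        total = prec_p + prec_q
        mu_out.append((prec_p * mp + prec_q * mq) / total)
        var_out.append(1.0 / total)  # combined variance is below both inputs
    return mu_out, var_out


# Illustrative numbers only: a vague habitual intention (variance 4.0)
# combined with a sharper goal-directed intention (variance 1.0).
mu_s, var_s = synergize([0.0], [4.0], [1.0], [1.0])
print(mu_s, var_s)
```

With these illustrative inputs the combined mean is about 0.8, pulled toward the more precise goal-directed intention, and the combined variance is about 0.8, lower than either input.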

Figure 2: (a) An overview of the Bayesian behavior framework. (b) and (c): Diagrams of the framework in learning and in behaving.

Simulation experiments

The framework was tested through simulations of vision-based sensorimotor tasks in a T-maze environment. The results replicated observations from neuroscience and psychology experiments.

1. Transition from goal-directed to habitual behavior: The simulations demonstrated that with repetitive trials, an agent’s behavior naturally transitions from slow, goal-directed behavior to faster, habitual behavior. This transition is driven by the increasing precision of habitual intentions, reducing the computational burden on goal-directed processes.

2. Behavior change after reward devaluation: The study also explored how agents adapt their behaviors when the reward values change, mirroring the concept of outcome devaluation in psychology. Agents with extensive training showed more resistance to behavior change, reflecting the robust nature of habitual behaviors.

3. Zero-shot goal-directed planning: The framework demonstrated the ability to tackle new goals without additional training. By leveraging existing habitual behaviors, the agent could efficiently plan and execute new tasks.

Figure 3: The trained agent (a-c) can perform goal-directed planning for unseen goals (d, e). (a) Illustration of the experimental setting. Unlike the previous habitization experiment, the rewards are the same for the left and right exits. After stage 2 (adaptation), the model is fixed, and the agent's goal-directed planning capacity is tested (stage 3). (b) An example of agent behavior during stage 2 (movement trajectories of 10 trials in each plot, aerial view). (c) Statistics of policy diversity using purely habitual behavior (actions computed from the prior intention). In total, 12 agents trained with different random seeds are each tested for 60 trials. (d) Statistics of the success rate in planning with different kinds of goals (tested using 12 agents and 10 episodes per agent in each case). (e) Examples of movement trajectories and internal predictions of current and future observations in goal-directed planning.

Key insights for cognitive neuroscience

1. How does an agent arbitrate between model-free, habitual behavior and model-based, goal-directed behavior?

The paper proposes that the agent uses a synergized intention, calculated as an inverse variance-weighted average of habitual and goal-directed intentions. This approach inherently measures the uncertainty of behaviors by analyzing the statistical variance of the intention distribution. The framework allows the agent to dynamically and autonomously adjust this balance during training by minimizing free energy and reinforcement learning loss.

2. How does an agent autonomously transfer from slow, goal-directed behavior to fast, habitual behavior with repetitive trials?

The simulations demonstrate that the variance of habitual intention is initially high when adapting to a new task but decreases with repeated trials due to the simplicity of model-free decisions. As the variance decreases, the balance shifts progressively toward habitual intention. A mechanism is introduced to early-stop goal-directed active inference when the synergized intention is precise enough, conserving computational resources while maintaining high behavior precision. This explains why extensive training results in a transition from goal-directed to habitual behavior.
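The shift in balance and the early-stopping idea can be sketched as follows, assuming scalar Gaussian intentions. The function names and the threshold value are hypothetical choices for illustration; the paper's actual stopping criterion may be defined differently.

```python
def habitual_weight(var_prior: float, var_post: float) -> float:
    """Share of the synergized mean contributed by the habitual intention."""
    prec_p = 1.0 / var_prior
    prec_q = 1.0 / var_post
    return prec_p / (prec_p + prec_q)


def synergized_variance(var_prior: float, var_post: float) -> float:
    """Variance of the inverse variance-weighted combination."""
    return 1.0 / (1.0 / var_prior + 1.0 / var_post)


def can_stop_early(var_prior: float, var_post: float,
                   threshold: float = 0.05) -> bool:
    """Return True when the combined intention is already precise enough.

    The threshold is an arbitrary illustrative value, not one from the paper.
    """
    return synergized_variance(var_prior, var_post) < threshold


# Hold the goal-directed variance fixed and shrink the habitual variance,
# mimicking what repeated trials are described as doing.
VAR_POST = 0.5
for var_prior in (10.0, 1.0, 0.1, 0.01):
    w = habitual_weight(var_prior, VAR_POST)
    stop = can_stop_early(var_prior, VAR_POST)
    print(f"var_prior={var_prior:<5} habitual_weight={w:.3f} stop_early={stop}")
```

In this toy setting the habitual weight rises toward 1 as the habitual variance shrinks, and the early-stopping check only passes once the combined variance falls below the chosen threshold.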

3. How does an agent perform goal-directed planning for a novel goal that it has not been trained to accomplish?

The agent needs an internal predictive model of the environment with which to perform a mental search for motor patterns. The goal-directed intention is inferred under a constraint from the habitual intention, imposed by the KL-divergence term in active inference. This constraint keeps goal-directed planning effective by leveraging the well-developed low-level motor skills encoded in the habitual intention and the shared policy network. Consequently, the framework allows the agent to efficiently generalize its behavior to novel goals.

Together, these answers describe the dynamic interaction between habitual and goal-directed behaviors, and the mechanisms that enable efficient and flexible behavior in agents.
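For readers who want a concrete picture of the KL-divergence constraint in the third answer, here is a minimal sketch that assumes both intentions are diagonal Gaussians. The function name `kl_diag_gaussians` is hypothetical. The formula is the standard closed-form KL divergence between Gaussians, and how the paper weights or combines this term is not shown.

```python
import math
from typing import Sequence


def kl_diag_gaussians(
    mu_q: Sequence[float],
    var_q: Sequence[float],
    mu_p: Sequence[float],
    var_p: Sequence[float],
) -> float:
    """KL[q || p] for diagonal Gaussians, summed over dimensions.

    Here q stands for the goal-directed (posterior) intention and
    p for the habitual (prior) intention.
    """
    total = 0.0
    for mq, vq, mp, vp in zip(mu_q, var_q, mu_p, var_p):
        total += 0.5 * (math.log(vp / vq) + (vq + (mq - mp) ** 2) / vp - 1.0)
    return total


# Illustrative values: the penalty grows as the goal-directed intention
# moves away from the habitual one.
print(kl_diag_gaussians([0.0], [1.0], [0.0], [1.0]))  # identical distributions
print(kl_diag_gaussians([1.0], [1.0], [0.0], [1.0]))  # shifted mean
```

The penalty is zero when the two intentions coincide and grows as the goal-directed intention departs from the habitual one, which is how such a term can keep planning close to well-practiced motor patterns.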

Broader implications

The implications of this research extend beyond theoretical modeling. In machine learning and AI, this framework can inform the design of more efficient and adaptable systems. For instance, combining reinforcement learning with active inference could enhance the decision-making capabilities of autonomous agents in complex environments.

Conclusion

The paper marks a significant advancement in our understanding of behavior in the context of cognitive science. By integrating habitual and goal-directed behavior through a Bayesian framework, it offers a comprehensive model that balances efficiency and flexibility. This research not only advances theoretical knowledge but also provides new insights for practical applications in AI and robotics.

Readers interested in the details and mathematical foundations of this framework are encouraged to read the full paper. As the fields of cognitive science and AI continue to evolve, Microsoft researchers remain committed to embracing innovative perspectives through interdisciplinary work.
