Researchers from Google DeepMind, in collaboration with Mila and McGill University, have tackled the challenge of efficiently training reinforcement learning (RL) agents by defining appropriate reward functions. Reinforcement learning relies on a reward signal: desired behaviors are rewarded and undesired ones penalized. Designing effective reward functions is therefore crucial for RL agents to learn efficiently, but it often demands significant effort from environment designers. The paper proposes leveraging Vision-Language Models (VLMs) to automate the generation of reward functions.
Until now, defining reward functions for RL agents has been a manual, labor-intensive process that often requires domain expertise. The paper introduces a framework called Code as Reward (VLM-CaR), which uses pre-trained VLMs to automatically generate dense reward functions for RL agents. Unlike directly querying a VLM for a reward at every step, which is computationally expensive and unreliable, VLM-CaR produces reward functions through code generation, significantly reducing the computational burden. With this framework, the researchers aim to provide accurate, interpretable rewards derived from visual inputs.
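To make that contrast concrete, here is a minimal Python sketch of the two approaches. The `query_vlm` helper, the prompt wording, and the pixel logic inside `generated_reward` are hypothetical placeholders, not the paper's actual interface; the point is that the VLM is queried once to produce code, which then runs cheaply at every environment step.

```python
import numpy as np

# Hypothetical stand-in for a pre-trained VLM API; a real system
# would send the prompt and image to an actual model endpoint.
def query_vlm(prompt: str, image: np.ndarray) -> str:
    raise NotImplementedError("placeholder for a VLM API call")

# Approach 1: query the VLM for a reward at every environment step
# (expensive and unreliable, per the paper).
def vlm_reward(observation: np.ndarray) -> float:
    answer = query_vlm("Is the task solved? Answer yes or no.", observation)
    return 1.0 if answer.strip().lower() == "yes" else 0.0

# Approach 2 (VLM-CaR-style): the VLM is queried once to emit *code*.
# Below is the kind of plain-Python reward program it might produce,
# which inspects the visual observation with no model in the loop.
def generated_reward(observation: np.ndarray) -> float:
    goal_region = observation[40:60, 40:60]           # assumed goal crop
    green_pixels = (goal_region[..., 1] > 200).sum()  # assumed color check
    return 1.0 if green_pixels > 25 else 0.0
```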
VLM-CaR operates in three stages: generating programs, verifying programs, and RL training. In the first stage, a pre-trained VLM is prompted to describe the task and its sub-tasks based on initial and goal images of the environment; these descriptions are then used to produce an executable program for each sub-task. The generated programs are verified for correctness against expert and random trajectories: a valid program should assign higher returns to expert behavior than to random behavior. Once verified, the programs serve as reward functions for training RL agents, enabling efficient training of RL policies even in environments where rewards are sparse or unavailable.
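As a rough illustration of the verification and training stages, the sketch below accepts a candidate reward program only if it scores expert trajectories above random ones, then plugs it into a generic RL loop. The trajectory format, the acceptance margin, the `policy.act`/`policy.update` interface, and the Gymnasium-style environment API are assumptions for the sketch, not the paper's exact procedure.

```python
from typing import Callable, List
import numpy as np

Trajectory = List[np.ndarray]  # one trajectory = a list of image observations

def verify_program(reward_fn: Callable[[np.ndarray], float],
                   expert_trajs: List[Trajectory],
                   random_trajs: List[Trajectory],
                   margin: float = 0.5) -> bool:
    """Accept a generated program only if it scores expert trajectories
    clearly above random ones (an assumed acceptance rule)."""
    expert_score = np.mean([sum(reward_fn(o) for o in t) for t in expert_trajs])
    random_score = np.mean([sum(reward_fn(o) for o in t) for t in random_trajs])
    return expert_score - random_score > margin

def train_with_generated_reward(env, policy, reward_fn, episodes: int = 100):
    """Generic RL loop (Gymnasium-style env assumed) that replaces the
    environment's sparse or missing reward with the verified dense reward."""
    for _ in range(episodes):
        obs, _ = env.reset()
        done = False
        while not done:
            action = policy.act(obs)
            obs, _, terminated, truncated, _ = env.step(action)
            policy.update(obs, action, reward_fn(obs))  # dense reward here
            done = terminated or truncated
```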
In conclusion, the proposed method addresses the problem of manually defining reward functions by providing a systematic framework for generating interpretable rewards from visual observations. VLM-CaR demonstrates the potential to significantly improve the training efficiency and performance of RL agents across a variety of environments.
Pragati Jhunjhunwala is a consulting intern at MarktechPost. She is currently pursuing her B.Tech from the Indian Institute of Technology (IIT), Kharagpur. She is a tech enthusiast and has a keen interest in the scope of software and data science applications. She is always reading about developments in different fields of AI and ML.