Official Code Repository for "InCLET: Large Language Model In-context Learning can Improve Embodied Instruction-following"

InCLET: Large Language Model In-context Learning can Improve Embodied Instruction-following

Abstract: Natural language-conditioned reinforcement learning (NLC-RL) empowers embodied agents to complete various tasks following human instructions. However, unbounded natural language examples still introduce considerable complexity for an agent solving concrete RL tasks, which can distract policy learning from completing the task. Consequently, extracting effective task representations from human instructions emerges as a critical component of NLC-RL. While previous methods have attempted to address this issue by learning task-related representations using large language models (LLMs), they rely heavily on pre-collected task data and require an extra training procedure. In this study, we uncover the inherent capability of LLMs to generate task representations and present a novel method, In-Context Learning Embedding as Task representation (InCLET). InCLET is grounded in the foundational finding that LLM in-context learning over trajectories can greatly help represent tasks. We therefore first employ an LLM to imagine task trajectories following the natural language instruction, then use the LLM's in-context learning to generate task representations, and finally aggregate and project them into a compact, low-dimensional task representation. This representation is then used to train an instruction-following agent. We conduct experiments on various embodied control environments, and the results show that InCLET creates effective task representations. Furthermore, this representation significantly improves RL training efficiency compared to baseline methods.
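To make the pipeline concrete, the following is a minimal sketch of the three stages described above (trajectory imagination, in-context embedding, aggregation and projection). All names here (llm.generate, llm.embed, projector) are hypothetical placeholders, not this repository's actual API; the real implementation lives in algorithms/translation/llm_encoder.py.

    # Hypothetical sketch of the InCLET pipeline; all names and interfaces
    # here are illustrative assumptions, not the repository's actual API.
    import numpy as np

    def inclet_task_representation(instruction, llm, projector, n_trajectories=8):
        # 1. Imagine trajectories that follow the natural language instruction.
        trajectories = [llm.generate(f"Instruction: {instruction}\nTrajectory:")
                        for _ in range(n_trajectories)]
        # 2. In-context learning: embed the instruction together with each
        #    imagined trajectory and take the LLM's hidden state as a task embedding.
        embeddings = [llm.embed(f"{traj}\nInstruction: {instruction}")
                      for traj in trajectories]
        # 3. Aggregate the embeddings and project them into a compact,
        #    low-dimensional task representation for the RL policy.
        return projector(np.mean(embeddings, axis=0))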

Setup

  1. Please complete the following steps to install the conda environment and related Python packages
    • Package install
    pip install -r requirements.txt
  2. The environments used in this work require MuJoCo, the CLEVR-Robot environment, and an LLM as dependencies. Please set them up following their respective installation instructions; a typical setup sequence is sketched after this list.
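
For reference, a typical setup sequence might look like the following; the environment name and Python version are assumptions, not pinned by this repository:

    # Assumed conda environment name and Python version -- adjust as needed.
    conda create -n inclet python=3.9
    conda activate inclet
    pip install -r requirements.txt

    # Quick sanity check that the MuJoCo Python bindings import correctly
    # (whether this is mujoco or mujoco_py depends on requirements.txt).
    python -c "import mujoco"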

Usage

Training the goal-conditioned policy of InCLET:

Before training the Goal-Conditioned Policy (GCP), we need to train a TL translator using the process described in step 4. When training of the TL translator model is complete, please place the model at the designated location:

<project_path>/models/
algorithms/translation/llm_encoder.py
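
As an illustration of how such a checkpoint might be consumed, here is a minimal loading sketch; the file name translator.pt, the PyTorch format, and the restore step are assumptions, and the repository's actual loading logic lives in algorithms/translation/llm_encoder.py:

    # Hypothetical loading sketch; the checkpoint name and format are
    # assumptions, not the repository's documented interface.
    import torch

    state_dict = torch.load("models/translator.pt", map_location="cpu")
    # The encoder defined in algorithms/translation/llm_encoder.py would then
    # restore these weights before the goal-conditioned policy is trained.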
  • FrankaKitchen
    • Instruction-following policy: run the following command in a shell
    python kitchen_train.py --seed <SEED>
    • The goal-conditioned-policy models will be saved in kitchen_model.
    • The goal-conditioned-policy TensorBoard logs will be saved in kitchen_train.
    • The goal-conditioned-policy evaluation results will be saved in kitchen_callback.
  • CLEVR-Robot
    • Instruction-following policy: run the following command in a shell
    python ball_train.py --seed <SEED>
    • The goal-conditioned-policy models will be saved in ball_model.
    • The goal-conditioned-policy TensorBoard logs will be saved in ball_train.
    • The goal-conditioned-policy evaluation results will be saved in ball_callback.
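
For example, to train on FrankaKitchen with seed 0 and monitor training progress (the seed value is an arbitrary example):

    python kitchen_train.py --seed 0
    tensorboard --logdir kitchen_train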
