♊RoboDual

Important

🌟 Stay up to date at opendrivelab.com!


The official implementation of our paper:
Towards Synergistic, Generalized, and Efficient Dual-System for Robotic Manipulation

Overview of RoboDual:

Our objective is to develop a synergistic dual-system framework that complements the generalizability of a large-scale pre-trained generalist with the efficient, task-specific adaptation of a specialist. (a) The fast specialist policy achieves real-time, accurate control with the aid of the slower but more generalizable outputs of the generalist, which is trained on large-scale data. (b) RoboDual shows significant improvements in both performance and efficiency over either policy used alone, and surpasses previous state-of-the-art methods in the real-robot setting.

Qingwen Bu, Li Chen, et al.

📝 Paper | 🌍 Project Page

📬 Point of contact: Qingwen Bu ( qingwen@opendrivelab.com ) or Li Chen ( ilnehc@opendrivelab.com )

🔥 Highlight

[Auto-regressive Generalist + Diffusion Action Specialist] We introduce a novel approach that integrates generalist and specialist policies into a synergistic framework, dubbed ♊RoboDual, in the spirit of a dual-system design.
[Decoupled Training & Input] The framework facilitates the flexible integration of diverse modalities and allows the two models to be decoupled in terms of training data, thereby enhancing their individual strengths and capabilities.
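To make the decoupled-input idea concrete, here is a minimal sketch of how one observation could be routed to the two models. The observation keys follow CALVIN's naming (rgb_static, rgb_gripper, depth_static, ...), and the ModalityConfig fields mirror the --with_depth, --with_gripper and --with_tactile flags of the specialist training script. The names ModalityConfig and route_inputs, and the choice of which inputs each model receives, are illustrative assumptions rather than RoboDual's actual data pipeline; the specialist's additional conditioning on the generalist's outputs is omitted here.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass
class ModalityConfig:
    # Hypothetical config mirroring the --with_depth / --with_gripper / --with_tactile flags.
    with_depth: bool = False
    with_gripper: bool = False
    with_tactile: bool = False


def route_inputs(obs: Dict[str, Any], cfg: ModalityConfig) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Hypothetical split of one CALVIN observation into generalist and specialist inputs."""
    # Assumption: the OpenVLA-based generalist sees only the static RGB view and the instruction.
    generalist = {"rgb_static": obs["rgb_static"], "instruction": obs["instruction"]}
    # The specialist always gets the static RGB view; extra sensory modalities are opt-in.
    specialist = {"rgb_static": obs["rgb_static"]}
    if cfg.with_depth:
        specialist["depth_static"] = obs["depth_static"]
    if cfg.with_gripper:
        # Gripper-view inputs cover both RGB and depth, as described for --with_gripper.
        specialist["rgb_gripper"] = obs["rgb_gripper"]
        specialist["depth_gripper"] = obs["depth_gripper"]
    if cfg.with_tactile:
        # Assumption: "visuo-tactile input" means CALVIN's tactile RGB and depth images.
        specialist["rgb_tactile"] = obs["rgb_tactile"]
        specialist["depth_tactile"] = obs["depth_tactile"]
    return generalist, specialist
```

Because the generalist's inputs never change with these flags, new sensors can be added to the specialist alone without retraining the generalist.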

Current Endeavors on Dual-systems

The trend of dual-systems for robotics is shown below. In particular, asynchronous implementations include:

Helix from Figure
HiRT from Tsinghua
LCB from UC Berkeley
RoboDual (This work)

Following RoboDual, dual-system architectures in robotics have been converging on the 'VLM + Diffusion Transformer' paradigm.

Asynchronous inference in a dual-system allows a more decoupled design and enables more flexible, scalable reasoning.
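The asynchronous schedule can be illustrated with a sequential simulation: the slow generalist is queried only every few control steps, its latest output is cached, and the fast specialist acts at every step conditioned on that cache. This is a hypothetical sketch; the period of 4 steps and the function name run_dual_rate are assumptions, and a real deployment would run the generalist concurrently (e.g. in a separate thread or process) rather than inline.

```python
from typing import Any, Callable, Iterable, List, Optional


def run_dual_rate(
    observations: Iterable[Any],
    generalist: Callable[[Any], Any],
    specialist: Callable[[Any, Any], Any],
    generalist_period: int = 4,  # hypothetical: re-query the generalist every 4 control steps
) -> List[Any]:
    """Sequentially simulate a slow generalist feeding a fast specialist."""
    if generalist_period < 1:
        raise ValueError("generalist_period must be >= 1")
    actions: List[Any] = []
    cached: Optional[Any] = None
    for t, obs in enumerate(observations):
        if t % generalist_period == 0:
            # Slow path: refresh the generalist's guidance (latents and/or coarse actions).
            cached = generalist(obs)
        # Fast path: the specialist acts every step, reusing the most recent guidance.
        actions.append(specialist(obs, cached))
    return actions


if __name__ == "__main__":
    # Toy callables standing in for the two policies.
    acts = run_dual_rate(range(6), lambda o: f"g{o}", lambda o, c: f"{c}->a{o}")
    print(acts)  # ['g0->a0', 'g0->a1', 'g0->a2', 'g0->a3', 'g4->a4', 'g4->a5']
```

The control rate is bounded by the specialist alone, while the generalist's latency only affects how stale its guidance is.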

Beyond latents, explicit representations (e.g., coarse action output from the System-2 as in RoboDual) should also be explored!
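As a concrete example of an explicit representation: OpenVLA, on which the generalist is built, predicts each action dimension as one of 256 discrete bins, so its output can be decoded into a coarse continuous action for the specialist to refine. The sketch below uses plain uniform bins over a normalized [-1, 1] range; the function names are hypothetical, and the exact bin edges, normalization statistics and token-to-bin mapping used in the codebase are not reproduced here.

```python
from typing import List, Sequence


def discretize(action: Sequence[float], n_bins: int = 256, low: float = -1.0, high: float = 1.0) -> List[int]:
    """Map each normalized action dimension to a uniform bin index in [0, n_bins - 1]."""
    width = (high - low) / n_bins
    indices = []
    for a in action:
        a = min(max(a, low), high)          # clip to the normalized range
        i = int((a - low) / width)          # floor, since (a - low) >= 0
        indices.append(min(i, n_bins - 1))  # a == high would otherwise land in bin n_bins
    return indices


def undiscretize(indices: Sequence[int], n_bins: int = 256, low: float = -1.0, high: float = 1.0) -> List[float]:
    """Decode bin indices to bin centers, i.e. a coarse continuous action."""
    width = (high - low) / n_bins
    return [low + (i + 0.5) * width for i in indices]


if __name__ == "__main__":
    coarse = undiscretize(discretize([0.37, -1.0, 1.0]))
    print(coarse)  # [0.37109375, -0.99609375, 0.99609375]
```

The decoded action is off by at most half a bin width (1/256 here), which is exactly the residual a fine-grained specialist can correct.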

📢 News

[2025/04] Code of RoboDual released. Check it out!
[2024/10] We released our paper on arXiv.

📌 TODO list

Release checkpoints for reproduction (Scheduled Release Date: Mid-April, 2025)

🎮 Getting Started

(Optional) We use conda to manage the environment.

conda create -n robodual python=3.10 -y
conda activate robodual

Install dependencies.

# Install pytorch
# Look up https://pytorch.org/get-started/previous-versions/ with your cuda version for a correct command
pip install torch torchvision torchaudio

# Clone our repo and pip install to download dependencies
git clone git@github.com:OpenDriveLab/RoboDual.git
cd RoboDual
pip install -e .

# Install Flash Attention 2 for training (https://github.com/Dao-AILab/flash-attention)
pip install packaging ninja
ninja --version; echo $?  # Verify Ninja --> should return exit code "0"
pip install "flash-attn==2.5.5" --no-build-isolation
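After installation, a quick way to confirm that the key packages are importable is to look them up with importlib.util.find_spec, which locates a module without importing it (importing torch or flash_attn can be slow). This helper is a hypothetical convenience script, not part of the repository; flash_attn is the import name of the flash-attn package.

```python
import importlib.util
from typing import Dict, Iterable


def check_modules(names: Iterable[str]) -> Dict[str, bool]:
    """Report whether each top-level module can be found, without importing it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Packages installed by the commands above; adjust the list to your setup.
    status = check_modules(["torch", "torchvision", "torchaudio", "flash_attn"])
    for name, ok in status.items():
        print(f"{name:12s} {'OK' if ok else 'MISSING'}")
```

This only checks that the packages are installed; it does not verify that torch was built against your CUDA version.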

Install CALVIN simulator.

git clone --recurse-submodules https://github.com/mees/calvin.git
export CALVIN_ROOT=$(pwd)/calvin
cd $CALVIN_ROOT
sh install.sh

⭐ Model Checkpoints

Generalist Policy:
Specialist Policy:

Experiment on CALVIN

☑️ Relevant Files:

Training

vla-scripts/
- train_generalist_calvin.py: Train OpenVLA on CALVIN dataset
- train_specialist_calvin.py: Train DiT specialist with pre-trained generalist
prismatic/vla/datasets/
- calvin_dataset.py: Data loader for CALVIN dataset

Evaluation

vla-scripts/
- evaluate_calvin.py: Initiate evaluation on CALVIN
- dual_sys_evaluation.py: RoboDual-specific core implementation

Model

prismatic/models/policy/:
- diffusion_policy.py: Core implementation of our DiT action expert

1️⃣ Generalist Training

Our generalist model is built upon OpenVLA. First, change vla_path to the local path of your OpenVLA model.
By default, we employ parameter-efficient fine-tuning with LoRA rank 32.
Then initiate training with 8 GPUs:

torchrun --standalone --nnodes 1 --nproc-per-node 8 vla-scripts/train_generalist_calvin.py \
                                 --dataset_name "calvin" \
                                 --run_root_dir "run_log"
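To see why LoRA is parameter-efficient, compare the trainable parameters of one linear layer with those of its rank-32 adapter. LoRA keeps the weight W (d_out x d_in) frozen and learns two factors B (d_out x r) and A (r x d_in), so the update B @ A adds r * (d_in + d_out) parameters. The 4096 x 4096 shape below is an assumption chosen to match the hidden size of the Llama-2-7B backbone used by OpenVLA; which layers are actually adapted is determined by the training script, and biases are ignored.

```python
from typing import Tuple


def lora_param_counts(d_in: int, d_out: int, rank: int) -> Tuple[int, int, float]:
    """Return (full layer params, LoRA adapter params, adapter/full ratio), ignoring biases."""
    full = d_in * d_out              # frozen weight W
    lora = rank * (d_in + d_out)     # trainable B (d_out x r) + A (r x d_in)
    return full, lora, lora / full


if __name__ == "__main__":
    full, lora, ratio = lora_param_counts(4096, 4096, 32)
    print(full, lora, f"{ratio:.4%}")  # 16777216 262144 1.5625%
```

For a square 4096-wide projection, the rank-32 adapter trains roughly 1.6% of the layer's parameters.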

2️⃣ Specialist Training

We do not train the generalist end-to-end with the specialist; keeping the generalist frozen works equally well on CALVIN. To further fine-tune the generalist during specialist training, set freeze_slow = False in the config.
Start training (100k steps) on CALVIN with 8 GPUs:

# --num_inference_steps: sampling steps for DiT
# --cond_drop_chance: condition drop chance for classifier-free guidance
# --with_depth: use depth input
# --with_gripper: use gripper-view inputs (both RGB and depth)
# --with_tactile: use visuo-tactile input
# --batch_size / --learning_rate: fine-tuning batch size and learning rate
torchrun --standalone --nnodes 1 --nproc-per-node 8 vla-scripts/train_specialist_calvin.py \
                                 --num_inference_steps 5 \
                                 --cond_drop_chance 0.1 \
                                 --with_depth True \
                                 --with_gripper True \
                                 --with_tactile True \
                                 --batch_size 8 \
                                 --learning_rate 1e-4 \
                                 --dataset_name "calvin" \
                                 --run_root_dir "run_log"
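The --cond_drop_chance flag (training) and the --with_cfg flag (evaluation) correspond to the two halves of classifier-free guidance. A minimal sketch of the standard formulation follows: during training, the condition is replaced by a null condition with the given probability; at sampling time, the conditional and unconditional predictions are combined as pred = pred_uncond + w * (pred_cond - pred_uncond). The guidance scale w, the use of None as the null condition, and the function names are assumptions not stated in this README.

```python
import random
from typing import Any, List, Optional, Sequence


def maybe_drop_condition(cond: Any, drop_chance: float, rng: random.Random) -> Optional[Any]:
    """Training: replace the condition with a null condition (None here) with probability drop_chance."""
    return None if rng.random() < drop_chance else cond


def cfg_combine(pred_uncond: Sequence[float], pred_cond: Sequence[float], guidance_scale: float) -> List[float]:
    """Sampling: extrapolate from the unconditional toward the conditional prediction."""
    return [u + guidance_scale * (c - u) for u, c in zip(pred_uncond, pred_cond)]


if __name__ == "__main__":
    rng = random.Random(0)
    dropped = sum(maybe_drop_condition("cond", 0.1, rng) is None for _ in range(10_000))
    print(dropped / 10_000)                          # close to 0.1
    print(cfg_combine([1.0, 0.0], [3.0, 2.0], 2.0))  # [5.0, 4.0]
```

With w = 1 this reduces to the purely conditional prediction, and w > 1 pushes samples further toward the condition at the cost of an extra unconditional forward pass per DiT step.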

3️⃣ Evaluation

First, set your CALVIN_ROOT environment variable with:

export CALVIN_ROOT=/path/to/your/calvin_root_path
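Because evaluation depends on this variable, it can help to fail early with a clear message when it is unset or points to a missing directory. The helper below is a hypothetical sketch, not part of vla-scripts/evaluate_calvin.py; it only checks that the path exists and does not validate the CALVIN installation itself.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_calvin_root(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return CALVIN_ROOT as a Path, raising early if it is unset or not a directory."""
    env = os.environ if env is None else env
    root = env.get("CALVIN_ROOT")
    if not root:
        raise RuntimeError("CALVIN_ROOT is not set; export it before running evaluation")
    path = Path(root).expanduser()
    if not path.is_dir():
        raise FileNotFoundError(f"CALVIN_ROOT points to a non-existent directory: {path}")
    return path


if __name__ == "__main__":
    print(resolve_calvin_root())
```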

Start evaluation on CALVIN (multi-GPU is also supported):

# --with_depth: use depth input
# --with_gripper: use gripper-view inputs (both RGB and depth)
# --with_cfg: enable classifier-free guidance
torchrun --standalone --nnodes 1 --nproc-per-node 1 vla-scripts/evaluate_calvin.py \
                                 --generalist_path "/path/to/calvin_generalist" \
                                 --specialist_path "/path/to/calvin_specialist" \
                                 --with_depth \
                                 --with_gripper \
                                 --with_cfg \
                                 --log_dir calvin

Please refer to vla-scripts/evaluate_calvin.py for all evaluation options.

📝 Citation

If you find our code or models useful in your work, please cite our paper:

@article{bu2024robodual,
  title={Towards Synergistic, Generalized, and Efficient Dual-System for Robotic Manipulation}, 
  author={Qingwen Bu and Hongyang Li and Li Chen and Jisong Cai and Jia Zeng and Heming Cui and Maoqing Yao and Yu Qiao},
  journal={arXiv preprint arXiv:2410.08001},
  year={2024}
}

Acknowledgements

We thank OpenVLA and Latte for their open-sourced work!
