This repository contains the PyTorch implementation of 3DFlowAction: Learning Cross-Embodiment Manipulation from 3D Flow World Model
[📖 arXiv] [🤖 model] [📑 dataset]
Manipulation remains a challenging task for robots; a major obstacle is the lack of a large, uniform dataset for teaching robots manipulation skills. We observe that understanding how objects should move in 3D space is crucial for guiding manipulation actions, and that this insight applies to both humans and robots. We therefore develop a 3D flow world model, which predicts the future motion of interacting objects in 3D space to guide action planning. We also introduce a flow-guided rendering mechanism that predicts the final scene state and uses GPT-4o to evaluate whether the predicted flow aligns with the task description, enabling closed-loop planning for robots. The predicted 3D optical flow then serves as a constraint for an optimization-based policy that computes the robot's manipulation actions. Extensive experiments show strong generalization across diverse robotic tasks and effective cross-embodiment adaptation without hardware-specific training.
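To give a feel for how a predicted 3D flow can constrain action planning, below is a minimal sketch of one plausible optimization step: it fits a rigid transform to the tracked object points between consecutive flow steps via the Kabsch algorithm, which a downstream controller could replay on the gripper. This is an illustrative reimplementation, not the repository's actual policy; the `flow` variable and its shapes are assumptions.

```python
import numpy as np

def rigid_transform(src: np.ndarray, dst: np.ndarray):
    """Kabsch algorithm: best-fit rotation R and translation t with dst ~= R @ src + t.

    src, dst: (N, 3) arrays of corresponding 3D points.
    """
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)            # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) > 0 else -1.0  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

# Hypothetical world-model output: T timesteps of N tracked object points in 3D.
flow = np.random.rand(8, 64, 3)  # stand-in for the predicted 3D flow

# Per-step rigid object motion that the end-effector should reproduce.
for t in range(len(flow) - 1):
    R, trans = rigid_transform(flow[t], flow[t + 1])
    # Feed (R, trans) to the robot's Cartesian controller here.
```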
- Release the moving-object detection pipeline for BridgeV2
- Release ManiFlow-110k
- Release model weights of the 3D Flow World Model
- Release inference code for the 3D Flow World Model
- Release training code for the 3D Flow World Model
- Release real-world robot implementation code
This project builds on Cotracker3, VideoDepthAnything, and GroundingSam2.
```bash
conda env create -f environment.yaml
```

We use BridgeV2 as an example of generating task-related 3D flow.
```
# Source data structure
BridgeV2-Processed
├── depth
│   ├── 0_meter.npz
│   ├── 1_meter.npz
├── frames
│   ├── 0.jpg
│   ├── 1.jpg
├── instructions.txt
```
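As a sanity check, a sample can be loaded like this. This is a minimal sketch that assumes each `*_meter.npz` holds a per-pixel depth map in meters and that `instructions.txt` has one task description per line; inspect the stored keys and file layout against the released data before relying on them.

```python
import numpy as np
from PIL import Image

# Load the first frame and its depth map from a BridgeV2-Processed episode.
frame = np.asarray(Image.open("BridgeV2-Processed/frames/0.jpg"))   # (H, W, 3) uint8
npz = np.load("BridgeV2-Processed/depth/0_meter.npz")
print(npz.files)                        # inspect the stored keys first
depth = npz[npz.files[0]]               # assumed: (H, W) depth in meters

with open("BridgeV2-Processed/instructions.txt") as f:
    instruction = f.readline().strip()  # assumed: one task description per line

print(frame.shape, depth.shape, instruction)
```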
```bash
# Process
cd preprocess/BridgeV2
python moving_obj_det_pipeline_all.py
bash run_scripts/preprocess_bridge_dataset.sh

# Train the 3D Flow World Model
bash run_scripts/train_flow_3d_bridge_wovae_slurm.sh

# Visualize the generated 3D flow
python scripts/flow_generation/viz_3d_flow_batch.py

# Inference: put the released checkpoint at results/release/checkpoints/epoch_400
bash run_scripts/inference.sh
```
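The visualization step above implies lifting 2D point tracks into 3D using the depth maps. For reference, here is a minimal back-projection sketch under a standard pinhole camera model; the intrinsics `fx, fy, cx, cy` and the array shapes are assumptions for illustration, not values taken from this repository.

```python
import numpy as np

def backproject_tracks(tracks_2d: np.ndarray, depths: np.ndarray,
                       fx: float, fy: float, cx: float, cy: float) -> np.ndarray:
    """Lift 2D point tracks to 3D camera coordinates with a pinhole model.

    tracks_2d: (T, N, 2) pixel coordinates (u, v) per frame.
    depths:    (T, H, W) per-frame depth maps in meters.
    Returns:   (T, N, 3) point trajectories in camera space, i.e. a 3D flow.
    """
    T, N, _ = tracks_2d.shape
    flow_3d = np.zeros((T, N, 3))
    for t in range(T):
        u, v = tracks_2d[t, :, 0], tracks_2d[t, :, 1]
        z = depths[t, v.astype(int), u.astype(int)]  # sample depth at each track point
        flow_3d[t, :, 0] = (u - cx) * z / fx
        flow_3d[t, :, 1] = (v - cy) * z / fy
        flow_3d[t, :, 2] = z
    return flow_3d
```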