3DFlowAction: Learning Cross-Embodiment Manipulation from 3D Flow World Model

This repository contains the PyTorch implementation of 3DFlowAction: Learning Cross-Embodiment Manipulation from 3D Flow World Model.

[📖 arXiv] [🤖 model] [📑 dataset]

Overview

Manipulation remains a challenging task for robots; a major obstacle is the lack of a large, uniform dataset for teaching manipulation skills. We observe that understanding how objects should move in 3D space is crucial for guiding manipulation actions, and that this insight applies to both humans and robots. We therefore develop a 3D flow world model that predicts the future movement of interacting objects in 3D space to guide action planning. We also introduce a flow-guided rendering mechanism that predicts the final state and uses GPT-4o to evaluate whether the predicted flow aligns with the task description, enabling closed-loop planning. The predicted 3D optical flow then serves as a constraint for an optimization policy that determines the robot's manipulation actions. Extensive experiments show strong generalization across diverse robotic tasks and effective cross-embodiment adaptation without hardware-specific training.
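For intuition on how a predicted 3D flow can constrain actions: if the manipulated object is assumed to move rigidly, the rigid transform that best explains the flow field can be recovered in closed form with the Kabsch/Procrustes algorithm. This is an illustrative sketch under that rigidity assumption, not the paper's actual optimization policy; the function name is hypothetical.

```python
import numpy as np

def rigid_transform_from_flow(points, flow):
    """Estimate the rigid (R, t) that best explains a 3D flow field.

    points: (N, 3) object points at time t.
    flow:   (N, 3) predicted 3D displacements.
    Returns rotation R (3, 3) and translation t (3,) minimizing
    ||R @ p + t - (p + flow)||^2 (Kabsch algorithm).
    """
    src = points
    dst = points + flow
    src_c = src - src.mean(axis=0)
    dst_c = dst - dst.mean(axis=0)
    H = src_c.T @ dst_c                      # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst.mean(axis=0) - R @ src.mean(axis=0)
    return R, t
```

The recovered (R, t) could then be mapped to an end-effector motion for a grasped object; a real policy would also handle non-rigid motion and collision constraints.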

TODO

  • Release the moving object detection pipeline for BridgeV2
  • Release ManiFlow-110k
  • Release model weights of the 3D Flow World Model
  • Release inference code of the 3D Flow World Model
  • Release training code of the 3D Flow World Model
  • Release real-world robot implementation code

Step0: Install environment requirements

This project depends on Cotracker3, VideoDepthAnything, and GroundingSam2.

conda env create -f environment.yaml

Step1: Extract 2D optical flow for the manipulated object (moving object detection pipeline)

# We use BridgeV2 as an example to generate task-related 3D flow
# Source data structure
BridgeV2-Processed
├── depth
│   ├── 0_meter.npz
│   ├── 1_meter.npz
├── frames
│   ├── 0.jpg
│   ├── 1.jpg
├── instructions.txt

# Process
cd preprocess/BridgeV2
python moving_obj_det_pipeline_all.py
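The pipeline script above is the source of truth; the core idea behind moving-object detection from 2D point tracks can be sketched as keeping only tracks whose net displacement across the clip exceeds a threshold. The function name and threshold below are illustrative, not from the repo.

```python
import numpy as np

def moving_track_mask(tracks, min_disp=5.0):
    """Flag tracks that belong to a moving object.

    tracks:   (T, N, 2) 2D point tracks in pixels (e.g., from Cotracker3).
    min_disp: minimum net displacement in pixels to count as "moving".
    Returns a boolean mask of shape (N,).
    """
    # Net displacement between the first and last frame of each track.
    disp = np.linalg.norm(tracks[-1] - tracks[0], axis=-1)
    return disp > min_disp
```

A real pipeline would additionally filter jittery background tracks (e.g., by per-frame motion rather than endpoints) and intersect the mask with an object segmentation from GroundingSam2.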

Step2: Use VideoDepthAnything to estimate depth of frames and project the 2D flow into 3D space
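This step combines estimated depth with camera intrinsics to lift 2D tracks into 3D. A minimal pinhole back-projection sketch (the intrinsics matrix `K` is an assumption here; the repo's actual projection code may differ):

```python
import numpy as np

def backproject(uv, depth, K):
    """Lift 2D pixels to 3D camera coordinates with a pinhole model.

    uv:    (N, 2) pixel coordinates (u, v).
    depth: (N,) metric depth per pixel (e.g., from VideoDepthAnything).
    K:     (3, 3) camera intrinsics.
    Returns (N, 3) points in the camera frame.
    """
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    x = (uv[:, 0] - cx) / fx * depth
    y = (uv[:, 1] - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1)
```

The 3D flow of a track is then simply `backproject` applied at consecutive frames, differenced: `flow = p_next - p_curr`.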

Step3: Prepare 3D optical flow for training

bash run_scripts/preprocess_bridge_dataset.sh

Step4: Training

run_scripts/train_flow_3d_bridge_wovae_slurm.sh

Step5: Visualize evaluation results

python scripts/flow_generation/viz_3d_flow_batch.py
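The script above is the repo's batch visualizer; for a quick standalone look at a single flow field, a minimal matplotlib sketch (function and output file name are arbitrary, not from the repo):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless rendering, no display needed
import matplotlib.pyplot as plt

def viz_3d_flow(points, flow, out_path="flow.png"):
    """Draw 3D flow vectors as arrows anchored at their source points.

    points: (N, 3) start positions; flow: (N, 3) displacement vectors.
    """
    fig = plt.figure()
    ax = fig.add_subplot(projection="3d")
    ax.quiver(points[:, 0], points[:, 1], points[:, 2],
              flow[:, 0], flow[:, 1], flow[:, 2], color="tab:blue")
    ax.set_xlabel("x"); ax.set_ylabel("y"); ax.set_zlabel("z")
    fig.savefig(out_path, dpi=150)
    plt.close(fig)
```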

Step6: Inference using release checkpoints

# Put the released checkpoint in results/release/checkpoints/epoch_400
bash run_scripts/inference.sh
