This repository contains the PyTorch implementation of 3DFlowAction: Learning Cross-Embodiment Manipulation from 3D Flow World Model
[📖 arXiv] [🤖 model] [📑 dataset]
Manipulation remains a challenging task for robots; a major obstacle is the lack of a large, uniform dataset for teaching robots manipulation skills. We observe that understanding how objects should move in 3D space is crucial for guiding manipulation actions, and that this insight applies to both humans and robots. We therefore develop a 3D flow world model, which predicts the future motion of interacting objects in 3D space to guide action planning. We also introduce a flow-guided rendering mechanism that predicts the final scene state and uses GPT-4o to evaluate whether the predicted flow aligns with the task description, enabling closed-loop planning for robots. The predicted 3D optical flow then serves as a constraint for an optimization-based policy that computes the robot's manipulation actions. Extensive experiments show strong generalization across diverse robotic tasks and effective cross-embodiment adaptation without hardware-specific training.
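To give a feel for how a predicted 3D flow can constrain action planning, below is a minimal sketch of one plausible optimization step: it fits a rigid transform to the tracked object points between consecutive flow steps via the Kabsch algorithm, which a downstream controller could replay on the gripper. This is an illustrative reimplementation, not the repository's actual policy; the `flow` variable and its shapes are assumptions.

```python
import numpy as np

def rigid_transform(src: np.ndarray, dst: np.ndarray):
    """Kabsch algorithm: best-fit rotation R and translation t with dst ~= R @ src + t.

    src, dst: (N, 3) arrays of corresponding 3D points.
    """
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)            # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) > 0 else -1.0  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

# Hypothetical world-model output: T timesteps of N tracked object points in 3D.
flow = np.random.rand(8, 64, 3)  # stand-in for the predicted 3D flow

# Per-step rigid object motion that the end-effector should reproduce.
for t in range(len(flow) - 1):
    R, trans = rigid_transform(flow[t], flow[t + 1])
    # Feed (R, trans) to the robot's Cartesian controller here.
```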
- Release the moving-object detection pipeline for BridgeV2
- Release ManiFlow-110k
- Release model weights of the 3D Flow World Model
- Release inference code for the 3D Flow World Model
- Release training code for the 3D Flow World Model
- Release real-world robot implementation code
This project builds on Cotracker3, VideoDepthAnything, and GroundingSam2.
```bash
conda env create -f environment.yaml
```

We use BridgeV2 as an example of generating task-related 3D flow.
```
# Source data structure
BridgeV2-Processed
├── depth
│   ├── 0_meter.npz
│   ├── 1_meter.npz
├── frames
│   ├── 0.jpg
│   ├── 1.jpg
├── instructions.txt
```
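As a sanity check, a sample can be loaded like this. This is a minimal sketch that assumes each `*_meter.npz` holds a per-pixel depth map in meters and that `instructions.txt` has one task description per line; inspect the stored keys and file layout against the released data before relying on them.

```python
import numpy as np
from PIL import Image

# Load the first frame and its depth map from a BridgeV2-Processed episode.
frame = np.asarray(Image.open("BridgeV2-Processed/frames/0.jpg"))   # (H, W, 3) uint8
npz = np.load("BridgeV2-Processed/depth/0_meter.npz")
print(npz.files)                        # inspect the stored keys first
depth = npz[npz.files[0]]               # assumed: (H, W) depth in meters

with open("BridgeV2-Processed/instructions.txt") as f:
    instruction = f.readline().strip()  # assumed: one task description per line

print(frame.shape, depth.shape, instruction)
```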
```bash
# Process
cd preprocess/BridgeV2
python moving_obj_det_pipeline_all.py
bash run_scripts/preprocess_bridge_dataset.sh

# Train the 3D Flow World Model
bash run_scripts/train_flow_3d_bridge_wovae_slurm.sh

# Visualize the generated 3D flow
python scripts/flow_generation/viz_3d_flow_batch.py

# Inference: put the released checkpoint at results/release/checkpoints/epoch_400
bash run_scripts/inference.sh
```
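The visualization step above implies lifting 2D point tracks into 3D using the depth maps. For reference, here is a minimal back-projection sketch under a standard pinhole camera model; the intrinsics `fx, fy, cx, cy` and the array shapes are assumptions for illustration, not values taken from this repository.

```python
import numpy as np

def backproject_tracks(tracks_2d: np.ndarray, depths: np.ndarray,
                       fx: float, fy: float, cx: float, cy: float) -> np.ndarray:
    """Lift 2D point tracks to 3D camera coordinates with a pinhole model.

    tracks_2d: (T, N, 2) pixel coordinates (u, v) per frame.
    depths:    (T, H, W) per-frame depth maps in meters.
    Returns:   (T, N, 3) point trajectories in camera space, i.e. a 3D flow.
    """
    T, N, _ = tracks_2d.shape
    flow_3d = np.zeros((T, N, 3))
    for t in range(T):
        u, v = tracks_2d[t, :, 0], tracks_2d[t, :, 1]
        z = depths[t, v.astype(int), u.astype(int)]  # sample depth at each track point
        flow_3d[t, :, 0] = (u - cx) * z / fx
        flow_3d[t, :, 1] = (v - cy) * z / fy
        flow_3d[t, :, 2] = z
    return flow_3d
```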