Leverage LLMs to generate code policies and VLMs to build online semantic memory, enabling the robot to perform long-horizon tasks.
- Open Vocabulary Mobile Manipulation (OVMM): Use LLMs such as Qwen to generate code policies for mobile manipulation tasks.
- Plan Generation: We use Abstract Syntax Trees (ASTs) to generate plans from the code policies. Each statement is converted into an equivalent managed operation and added to the task queue (see the sketch after this list).
- Semantic Memory: We use vision-language models such as SigLIP to build online semantic memory, which is used to perform long-horizon tasks.
- Task Execution: Tasks are executed in order from the task queue, which is updated based on the semantic memory.
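As a minimal sketch of the AST-based plan generation step, the snippet below parses an LLM-generated code policy with Python's `ast` module and converts each top-level call into a managed operation on a task queue. The operation names and `policy_to_task_queue` helper are illustrative assumptions, not the real `stretch` API:

```python
import ast

# Hypothetical managed operations; names are illustrative, not the real stretch API.
OPERATIONS = {
    "navigate_to": lambda *args: ("NavigateOperation", args),
    "pick": lambda *args: ("PickOperation", args),
    "place": lambda *args: ("PlaceOperation", args),
}

def policy_to_task_queue(code_policy: str) -> list:
    """Parse an LLM-generated code policy and convert each top-level
    call statement into an equivalent managed operation."""
    task_queue = []
    tree = ast.parse(code_policy)
    for node in tree.body:
        # Only handle simple expression statements that are function calls.
        if isinstance(node, ast.Expr) and isinstance(node.value, ast.Call):
            call = node.value
            name = getattr(call.func, "id", None)
            if name in OPERATIONS:
                # Extract literal arguments (e.g. object names) from the call.
                args = [ast.literal_eval(a) for a in call.args]
                task_queue.append(OPERATIONS[name](*args))
    return task_queue

# Example: a code policy the LLM might emit for "put the toy in the box".
policy = 'navigate_to("toy")\npick("toy")\nnavigate_to("box")\nplace("box")'
print(policy_to_task_queue(policy))
```

Executing the plan then amounts to popping operations off the queue in order, with the queue revised whenever the semantic memory changes.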
To run OVMM, you first need to launch Docker with the `stretch_ros2_bridge` server on your robot with this command:
```bash
cd stretch_ai && bash scripts/run_stretch_ai_ros2_bridge.sh
```
Then proceed to run OVMM on your PC with this command:
```bash
cd stretch_ai && bash scripts/run_stretch_ai_gpu_client.sh
python3 -m stretch.app.run_ovmm --robot_ip $ROBOT_IP --enable-realtime-updates
```
- The `robot_ip` argument is used to communicate with the robot.
- The `--enable-realtime-updates` flag is used to enable real-time updates from the robot. The server uses a modified `slam_toolbox` ROS package to send pose graph vertices and edges to the client. The client uses this information to match incoming observations and update the semantic memory.
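To illustrate how observations might be anchored to pose graph vertices and folded into semantic memory, here is a minimal sketch using SigLIP embeddings via Hugging Face `transformers`. The `SemanticMemory` class and its update logic are hypothetical simplifications, not the actual client code:

```python
import numpy as np
import torch
from transformers import AutoModel, AutoProcessor

# SigLIP checkpoint on the Hugging Face hub.
model = AutoModel.from_pretrained("google/siglip-base-patch16-224")
processor = AutoProcessor.from_pretrained("google/siglip-base-patch16-224")

class SemanticMemory:
    """Hypothetical store mapping pose graph vertices to image embeddings."""

    def __init__(self):
        self.entries = []  # list of (vertex_id, pose, embedding)

    def add_observation(self, vertex_id, pose, image):
        """Embed an RGB observation and attach it to a pose graph vertex."""
        inputs = processor(images=image, return_tensors="pt")
        with torch.no_grad():
            emb = model.get_image_features(**inputs)[0]
        emb = emb / emb.norm()  # normalize for cosine similarity
        self.entries.append((vertex_id, pose, emb))

    def query(self, text):
        """Return the pose of the stored observation best matching a text query."""
        inputs = processor(text=[text], padding="max_length", return_tensors="pt")
        with torch.no_grad():
            q = model.get_text_features(**inputs)[0]
        q = q / q.norm()
        scores = [float(emb @ q) for _, _, emb in self.entries]
        best = int(np.argmax(scores))
        return self.entries[best][1]  # pose where the best match was seen
```

Because each embedding is keyed to a pose graph vertex, when `slam_toolbox` publishes corrected vertices only the stored poses need updating; the embeddings themselves remain valid.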
After running OVMM, you will be prompted to enter a long-horizon task in natural language. The robot will then execute the task using the generated code policies and semantic memory. Example tasks:
- Pick and Place: "Pick up the toy and place it inside the box."
- Navigation: "Go to the sofa and pick up the remote."
- Engagement: "Go to the chair and wave at the person and then crack a joke."