prompt_craft.md

December 2023

tl;dr: A pipeline that uses ChatGPT for robotics tasks via prompt engineering, with the model writing high-level code for execution. Similar to CaP (Code as Policies).

Overall impression

Robotics systems, unlike text-only apps, require a deep understanding of real-world physics and environmental context, plus the ability to perform physical actions.

An LLM's out-of-the-box understanding of basic concepts (control, camera geometry, physical form factors) makes it an excellent choice for building generalizable and user-friendly robotics pipelines.

PromptCraft replaces a specialized engineer-in-the-loop with a user-on-the-loop. --> Polishing the interaction between the user and the robot, or automating as much of it as possible, is the key to real-world application (productization).

PromptCraft is NOT a fully automated process. It needs a human on the loop to monitor and intervene when the LLM generates unexpected behavior, especially in safety-critical applications.

PromptCraft does not use a VLM; it relies on an LLM only.

Key ideas

  • Pipeline to construct a ChatGPT-based robotics app
    • Define a high-level robot function library.
    • Prompt with the objectives and the allowed functions.
    • The user stays on the loop to evaluate the output.
    • Deploy onto the robot.
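
The first two steps of the pipeline above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `RobotFunction` schema, the drone-style function names, and the prompt wording are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RobotFunction:
    """One entry in the high-level function library (hypothetical schema)."""
    signature: str
    description: str


# Hypothetical drone API, for illustration only.
FUNCTION_LIBRARY = [
    RobotFunction("get_position(object_name)", "return the [x, y, z] position of a named object"),
    RobotFunction("fly_to(position)", "fly the drone to an [x, y, z] position"),
    RobotFunction("takeoff()", "take off"),
    RobotFunction("land()", "land"),
]


def build_prompt(objective: str, library: list[RobotFunction]) -> str:
    """Assemble a prompt that states the objective and lists the allowed functions."""
    lines = [
        "You are helping control a robot.",
        "Use only the functions listed below as the robot API.",
        "You may define helper functions built from them.",
        "",
        "Available functions:",
    ]
    lines += [f"- {fn.signature}: {fn.description}" for fn in library]
    lines += ["", f"Objective: {objective}", "Reply with Python code only."]
    return "\n".join(lines)


prompt = build_prompt("Inspect the turbine and land.", FUNCTION_LIBRARY)
print(prompt)
```

Listing the signatures explicitly is what bounds the answer: the model is asked for code against a known API rather than for free-form text.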

Technical details

  • Creating a high-level function library and listing its functions in the prompt is the key concept that unlocks the ability to solve robotics tasks with ChatGPT. It avoids unbounded text-based answers and avoids API under-specification.
  • The capability to write new functions confers flexibility and robustness to LLMs.
  • The dialog/conversation ability of ChatGPT is a surprisingly effective vehicle for interactive behavior correction.
  • The use of simulators can be particularly useful for evaluating the model's performance before deployment in the real world. --> Simulation (Habitat, AirSim, etc.) is the right vehicle to evaluate closed-loop high-level task planning.
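
One way to picture the evaluate-before-deploy step is a check that generated code only calls the listed functions (or helpers it defines itself), followed by a dry run against a stand-in simulator. The sketch below is an assumption-laden illustration: `ALLOWED`, `StubSim`, `run_in_sim`, and the sample `GENERATED` code are hypothetical and do not come from PromptCraft, Habitat, or AirSim. Only plain calls such as `fly_to(...)` are checked; method calls and builtins are not handled.

```python
import ast

# Hypothetical whitelist; it mirrors whatever function library the prompt listed.
ALLOWED = {"takeoff", "land", "fly_to", "get_position"}


def unknown_calls(code: str, allowed: set[str]) -> set[str]:
    """Return plain function names that are called but neither allowed nor defined in the code."""
    nodes = list(ast.walk(ast.parse(code)))
    defined = {n.name for n in nodes if isinstance(n, ast.FunctionDef)}
    called = {
        n.func.id
        for n in nodes
        if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)
    }
    return called - allowed - defined


class StubSim:
    """Toy stand-in for a simulator: records each high-level call instead of moving anything."""

    def __init__(self):
        self.log = []

    def takeoff(self):
        self.log.append(("takeoff",))

    def land(self):
        self.log.append(("land",))

    def get_position(self, object_name):
        self.log.append(("get_position", object_name))
        return [0.0, 0.0, 0.0]  # placeholder coordinates

    def fly_to(self, position):
        self.log.append(("fly_to", tuple(position)))


def run_in_sim(code: str, sim: StubSim) -> list:
    """Reject code that calls unlisted functions, then execute it against the stub."""
    bad = unknown_calls(code, ALLOWED)
    if bad:
        raise ValueError(f"generated code calls unknown functions: {sorted(bad)}")
    # Expose only the whitelisted functions. This is NOT a security sandbox.
    env = {"__builtins__": {}}
    env.update({name: getattr(sim, name) for name in ALLOWED})
    exec(code, env)
    return sim.log


# Example of what an LLM reply might look like, including a newly written helper.
GENERATED = """
def hop(target):
    takeoff()
    fly_to(get_position(target))
    land()

hop("turbine")
"""

log = run_in_sim(GENERATED, StubSim())
print([step[0] for step in log])
```

The user on the loop would inspect the recorded log (or the simulator rendering in a real setup) and, if the behavior is wrong, describe the problem in the next chat turn rather than editing the code by hand.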

Notes

  • Applications of LLMs in robotics include visual-language navigation, language-based human-robot interaction, and visual-language manipulation control (PerAct, CLIPort by Dieter Fox).