Isaac Lab Eureka

Overview

This repository is an implementation of Eureka: Human-Level Reward Design via Coding Large Language Models in Isaac Lab. It prompts an LLM to discover and tune reward functions automatically for your specific task.

We support both the native OpenAI API and the Azure OpenAI API.
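
At a high level, each Eureka iteration asks the LLM for candidate reward functions, trains a policy with each candidate, and feeds the resulting training metrics back into the next prompt. The sketch below is only a conceptual outline of that loop, not this repository's actual code; query_llm and train_and_evaluate are hypothetical placeholders.

    # Conceptual sketch of the Eureka loop -- not this repository's actual code.
    def query_llm(task_description: str, feedback: str, n: int) -> list[str]:
        """Hypothetical: ask the LLM for n candidate reward-function snippets."""
        ...

    def train_and_evaluate(reward_code: str) -> float:
        """Hypothetical: train a policy with this reward; return its success metric."""
        ...

    def eureka(task_description: str, iterations: int, candidates: int) -> str | None:
        feedback = ""
        best_code, best_score = None, float("-inf")
        for _ in range(iterations):
            results = []
            for code in query_llm(task_description, feedback, candidates):
                try:
                    results.append((code, train_and_evaluate(code)))
                except Exception as err:  # bad LLM code: record the error and skip
                    results.append((code, err))
            for code, score in results:
                if not isinstance(score, Exception) and score > best_score:
                    best_code, best_score = code, score
            # Metrics and error messages become the feedback for the next prompt.
            feedback = "\n".join(str(score) for _, score in results)
        return best_code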

Installation

  • Make sure that you have either an OpenAI API key or an Azure OpenAI API key.

  • Install Isaac Lab by following the installation guide.

  • Using a Python interpreter that has Isaac Lab installed, install Isaac Lab Eureka:

    python -m pip install -e exts/isaaclab_eureka
    

Running Isaac Lab Eureka

Run Eureka from the repository root directory, IsaacLabEureka.

The API key must be exposed to the script via an environment variable. We follow the OpenAI API convention and use OPENAI_API_KEY, AZURE_OPENAI_API_KEY, and AZURE_OPENAI_ENDPOINT.

Running with the OpenAI API

Linux
OPENAI_API_KEY=your_key python scripts/train.py --task=Isaac-Cartpole-Direct-v0 --max_training_iterations=100 --rl_library="rl_games"
Windows

PowerShell

$env:OPENAI_API_KEY="your_key"
python scripts\train.py --task=Isaac-Cartpole-Direct-v0 --max_training_iterations=100 --rl_library="rl_games"

Command line

set OPENAI_API_KEY=your_key
python scripts\train.py --task=Isaac-Cartpole-Direct-v0 --max_training_iterations=100 --rl_library="rl_games"

Running with the Azure OpenAI API

Linux
AZURE_OPENAI_API_KEY=your_key AZURE_OPENAI_ENDPOINT=azure_endpoint_url python scripts/train.py --task=Isaac-Cartpole-Direct-v0 --max_training_iterations=100 --rl_library="rl_games"
Windows

PowerShell

$env:AZURE_OPENAI_API_KEY="your_key"
$env:AZURE_OPENAI_ENDPOINT="azure_endpoint_url"
python scripts\train.py --task=Isaac-Cartpole-Direct-v0 --max_training_iterations=100 --rl_library="rl_games"

Command line

set AZURE_OPENAI_API_KEY=your_key
set AZURE_OPENAI_ENDPOINT=azure_endpoint_url
python scripts\train.py --task=Isaac-Cartpole-Direct-v0 --max_training_iterations=100 --rl_library="rl_games"

Running Eureka-Trained Policies

For each Eureka run, logs for the Eureka iterations are available under IsaacLabEureka/logs/eureka. This directory holds files containing the output from each Eureka iteration, as well as the output and metrics of the final Eureka results for the task. The TensorBoard log also contains a Text tab that shows the raw LLM output and the provided feedback at every iteration.
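
To browse these logs, including the Text tab, you can point TensorBoard at the Eureka log directory. A minimal example, assuming TensorBoard is installed in your Python environment:

    tensorboard --logdir logs/eureka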

In addition, policies trained during the Eureka run are saved under IsaacLabEureka/logs/rl_runs. This directory contains checkpoints for each valid Eureka run, similar to the checkpoints available when training with Isaac Lab.

To run inference on a Eureka-trained policy, locate the path to the desired checkpoint and run the scripts/play.py script.

For RSL RL, run:

    python scripts/play.py --task=Isaac-Cartpole-Direct-v0 --checkpoint=/path/to/desired/checkpoint.pt --num_envs=20 --rl_library="rsl_rl"

For RL-Games, run:

    python scripts/play.py --task=Isaac-Cartpole-Direct-v0 --checkpoint=/path/to/desired/checkpoint.pth --num_envs=20 --rl_library="rl_games"

Limitations

  • Isaac Lab Eureka currently only supports tasks implemented in the direct-workflow style, based on the DirectRLEnv class. Available examples can be found in the task config. Following the DirectRLEnv interface, we assume each task implements its observation function in a method named _get_observations() (see the sketch after this list).
  • Currently, only RSL RL and RL-Games libraries are supported.
  • Due to limitations of multiprocessing on Windows, running with num_parallel_runs > 1 is not supported on that platform.
  • When running with num_parallel_runs > 1 on a single-GPU machine, the training runs execute in parallel in the background, increasing CPU and memory usage.
  • The best policy is selected based on the success_metric defined for the task. For best performance, make sure to define an accurate success metric in the task config to guide the reward-function generation process.
  • During the reward generation process, the LLM may generate code that introduces syntax or logic errors into the training process. In such cases, the error message is propagated to the output and the Eureka iteration is skipped.
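
For reference, the sketch below illustrates the _get_observations() interface that Eureka assumes. This is a simplified, hypothetical skeleton rather than a complete task: a real task also implements the remaining DirectRLEnv methods (actions, rewards, dones, resets), and the import path may differ across Isaac Lab versions.

    # Minimal, hypothetical sketch of the direct-workflow observation interface.
    import torch
    from omni.isaac.lab.envs import DirectRLEnv  # may be `isaaclab.envs` in newer Isaac Lab

    class MyTaskEnv(DirectRLEnv):
        def _get_observations(self) -> dict:
            # Eureka assumes the task produces its observations here.
            # The zero tensor is a placeholder for the real task state.
            obs = torch.zeros(self.num_envs, 4, device=self.device)
            return {"policy": obs}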

Code formatting

We have a pre-commit template to automatically format your code. To install pre-commit:

pip install pre-commit

Then you can run pre-commit with:

pre-commit run --all-files