Commit a263946

Update OpenEnv guide with latest details (#4552)
Co-authored-by: burtenshaw <ben.burtenshaw@gmail.com>
1 parent 1a9ff52 commit a263946

1 file changed

+31
-13
lines changed

docs/source/openenv.md

Lines changed: 31 additions & 13 deletions
Original file line number | Diff line number | Diff line change
@@ -1,14 +1,15 @@
11
# OpenEnv Integration for Training LLMs with Environments
22

3-
## Overview
4-
53
[OpenEnv](https://github.com/meta-pytorch/OpenEnv) is an open-source framework from Meta's PyTorch team for defining, deploying, and interacting with environments in reinforcement learning (RL) and agentic workflows. It offers [Gymnasium-style APIs](https://gymnasium.farama.org) (e.g., `reset()` and `step()`) to interface with environments in a standard manner, and supports running these environments as backend servers (for example, via HTTP or containerised execution). You can find a collection of ready-to-use OpenEnv environments on the [Hugging Face Hub](https://huggingface.co/collections/openenv/environment-hub).
64

75
In this guide, we’ll focus on **how to integrate OpenEnv with TRL**, but feel free to explore the links above to dive deeper into OpenEnv itself.
86

97
> [!NOTE]
108
> You can explore ready-to-use example scripts in the [`examples/scripts/openenv/`](https://github.com/huggingface/trl/blob/main/examples/scripts/openenv/) directory.
119
10+
> [!NOTE]
11+
> Explore the [OpenEnv docs](https://meta-pytorch.org/OpenEnv/) for more details.
12+
1213
## Installation
1314

1415
To use OpenEnv with TRL, install the framework:
@@ -54,7 +55,7 @@ def rollout_func(
5455

5556
The typical pattern when combining OpenEnv with TRL looks like this:
5657

57-
1. Start or connect to an OpenEnv environment (e.g., an HTTP endpoint or Dockerized env).
58+
1. Start or connect to an OpenEnv environment (e.g., a Dockerized env or HTTP endpoint).
5859
2. Generate completions from your model — either via `trl.experimental.openenv.generate_rollout_completions` when using colocated vLLM, or by hitting your inference server when using vLLM in server mode.
5960
3. Step through the environment using each completion to compute rewards or metrics.
6061
4. Add environment results (e.g., `env_reward`) to the rollout result dict.
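The four steps above can be sketched as a small, self-contained loop. This is only an illustration under stated assumptions: `ToyEchoEnv`, `StepResult`, `fake_generate`, and `rollout` are hypothetical stand-ins invented for this sketch, not the OpenEnv client or any TRL API. The length-based reward scale is arbitrary.

```python
from dataclasses import dataclass


@dataclass
class StepResult:
    """Hypothetical container for what an environment step returns."""
    reward: float
    done: bool


class ToyEchoEnv:
    """Hypothetical stand-in for an environment client (not the real EchoEnv)."""

    def reset(self) -> None:
        # Start a fresh episode.
        self.steps = 0

    def step(self, message: str) -> StepResult:
        # Reward grows with message length; the 0.1 factor is an assumption.
        self.steps += 1
        return StepResult(reward=0.1 * len(message), done=True)


def fake_generate(prompts: list[str]) -> list[str]:
    # Placeholder for model generation (step 2); a real setup would call vLLM.
    return [prompt.upper() + "!" for prompt in prompts]


def rollout(prompts: list[str], env: ToyEchoEnv) -> dict:
    completions = fake_generate(prompts)   # step 2: generate completions
    rewards = []
    for completion in completions:         # step 3: step through the environment
        env.reset()
        rewards.append(env.step(completion).reward)
    # Step 4: attach environment results to the rollout result dict.
    return {"completions": completions, "env_reward": rewards}


env = ToyEchoEnv()                         # step 1: start or connect to an env
result = rollout(["hi", "hello"], env)
print(result)
```

In a real integration, the stub environment would be replaced by an OpenEnv client and the placeholder generator by the model's own generation path. The shape of the returned dict, with completions alongside an `env_reward` list, is the part that carries over.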
@@ -105,16 +106,16 @@ args = GRPOConfig(
105106
You can run OpenEnv environments in three different ways:
106107

107108
- We can load the environment from the Hugging Face Hub and execute it as a Docker container.
108-
- We can launch the environment directly using Uvicorn in Python, which you need on Google Colab.
109109
- We can connect to a hosted environment running on the Hugging Face Hub.
110+
- We can launch the environment directly using Uvicorn in Python.
110111

111112
<hfoptions id="env_mode">
112113

113114
<hfoption id="docker">
114115

115116
**Load from Hugging Face Hub** *(recommended)*
116117

117-
We can use the `from_hub` method to load the environment from the hub. This method will automatically start a Docker container for the environment on your local machine. `openenv/echo-env` is the repo_id of the space on the hub.
118+
We can use the [`from_hub`](https://meta-pytorch.org/OpenEnv/core/#core.http_env_client.HTTPEnvClient.from_hub) method to load the environment from the hub. This method will automatically start a Docker container for the environment on your local machine. [`openenv/echo-env`](https://huggingface.co/spaces/openenv/echo_env) is the repo_id of the space on the hub.
118119

119120
```python
120121
env = EchoEnv.from_hub("openenv/echo-env")
@@ -144,6 +145,10 @@ Here, we map the ports from 8001 to 8000 to make space for a vLLM server, but yo
144145
>
145146
> ![open_env_launch_docker](https://huggingface.co/datasets/trl-lib/documentation-images/resolve/main/open_env_launch_docker.png)
146147
148+
> [!NOTE]
149+
> You can also use the **Docker option** with `from_docker_image` by providing the image name.
150+
> For more details, refer to the official [OpenEnv documentation](https://meta-pytorch.org/OpenEnv/core/).
151+
147152
</hfoption>
148153
<hfoption id="space">
149154

@@ -163,13 +168,16 @@ env = EchoEnv(base_url="https://openenv-echo-env.hf.space")
163168
> * Select **“Embed this Space.”**
164169
> * Copy the connection URL.
165170
171+
> [!WARNING]
172+
> **Currently**, it is recommended to **duplicate the Space to your own account** to avoid potential concurrency issues.
173+
166174
</hfoption>
167175

168176
<hfoption id="local">
169177

170178
**Local Python process**
171179

172-
You can start the server manually as a local Python process. For more details about the available environments, refer to the [OpenEnv repository](https://github.com/meta-pytorch/OpenEnv/tree/main/src/envs).
180+
You can start the server manually as a local Python process. For more details about the available environments, refer to the [OpenEnv catalog](https://meta-pytorch.org/OpenEnv/environments/).
173181

174182
```bash
175183
hf download openenv/echo_env --repo-type=space --local-dir=echo_env
@@ -186,9 +194,21 @@ env = EchoEnv(base_url="http://0.0.0.0:8001")
186194

187195
</hfoptions>
188196

197+
## Environments Catalog
198+
199+
Environment development is active and evolving.
200+
The best way to explore the **current catalog of maintained environments** is by visiting the official OpenEnv [catalog](https://huggingface.co/collections/openenv/environment-hub).
201+
202+
Custom environments are also supported. To learn how to create your own, check out the guide on [Building Your Own Environment with OpenEnv](https://meta-pytorch.org/OpenEnv/environment-builder/).
203+
204+
Environments are tightly integrated with the Hub, allowing you to **push new environments directly** so the community can easily pull, reuse, and adapt them for their own use cases.
205+
189206
## A simple example
190207

191-
The [echo.py](https://github.com/huggingface/trl/blob/main/examples/scripts/openenv/echo.py) script demonstrates a minimal, end-to-end integration between TRL and OpenEnv. In this example, the Echo environment rewards completions based on their text length, encouraging the model to generate longer outputs. This pattern can be extended to any custom environment that provides structured feedback or task-based rewards:
208+
> [!NOTE]
209+
> You can explore more ready-to-use example scripts in the [`examples/scripts/openenv/`](https://github.com/huggingface/trl/blob/main/examples/scripts/openenv/) directory.
210+
211+
The [echo.py](https://github.com/huggingface/trl/blob/main/examples/scripts/openenv/echo.py) script demonstrates a minimal, end-to-end integration between TRL and OpenEnv. In this example, the [Echo environment](https://meta-pytorch.org/OpenEnv/environments/echo/) rewards completions based on their text length, encouraging the model to generate longer outputs. This pattern can be extended to any custom environment that provides structured feedback or task-based rewards:
192212

193213
```python
194214
from envs.echo_env import EchoEnv, EchoAction
@@ -325,19 +345,17 @@ Below is the reward curve from training:
325345

326346
<iframe src="https://trl-lib-trackio.hf.space?project=openenv&metrics=train/rewards/reward_from_env/mean&runs=qgallouedec-1761202871&sidebar=hidden&navbar=hidden" style="width:600px; height:500px; border:0;"></iframe>
327347

328-
To learn more about how to create custom environments, see the [OpenEnv documentation](https://github.com/meta-pytorch/OpenEnv/blob/main/src/envs/README.md).
329-
330348
## Advanced Example
331349

332-
Let's level this up a bit by training a model to interact with a more complex environment. We'll use the game word guessing game [wordle](https://www.nytimes.com/games/wordle/index.html) from the `textarena` environment.
350+
Let's level this up a bit by training a model to interact with a more complex environment. We'll use the word-guessing game [Wordle](https://www.nytimes.com/games/wordle/index.html) from the [`TextArena`](https://meta-pytorch.org/OpenEnv/environments/textarena/) environment.
333351

334352
### The TextArena Environment
335353

336354
[TextArena](https://huggingface.co/papers/2504.11442) is an open-source collection of competitive text-based games designed to evaluate reasoning skills in LLMs using textual games like Wordle, Snake, Tic-Tac-Toe, and more. Research has shown that such games improve model performance on reasoning tasks.
337355

338-
![image of textarena](https://huggingface.co/datasets/trl-lib/documentation-images/resolve/main/text_arena_evals.png)
356+
![image of TextArena](https://huggingface.co/datasets/trl-lib/documentation-images/resolve/main/text_arena_evals.png)
339357

340-
We will use the `textarena` environment to train a model to play Wordle. The environment is a simple text based response environment that allows the model to interact with the game by making guesses and receive feedback on them.
358+
We will use the `TextArena` environment to train a model to play Wordle. The environment is a simple text-based environment in which the model interacts with the game by making guesses and receiving feedback on them.
341359

342360
### Wordle
343361

@@ -563,7 +581,7 @@ You can also manually start the TextArena environment in a Docker container befo
563581
docker run -d -p 8001:8001 registry.hf.space/burtenshaw-textarena:latest
564582
```
565583

566-
Then connect to it using `--env-url http://localhost:8001`.
584+
Then connect to it using `--env-mode docker-local --env-host localhost --env-port 8001`.
567585

568586
### Results
569587
