# OpenEnv Integration for Training LLMs with Environments
[OpenEnv](https://github.com/meta-pytorch/OpenEnv) is an open-source framework from Meta's PyTorch team for defining, deploying, and interacting with environments in reinforcement learning (RL) and agentic workflows. It offers [Gymnasium-style APIs](https://gymnasium.farama.org) (e.g., `reset()` and `step()`) to interface with environments in a standard manner, and supports running these environments as backend servers (for example, via HTTP or containerised execution). You can find a collection of ready-to-use OpenEnv environments on the [Hugging Face Hub](https://huggingface.co/collections/openenv/environment-hub).

In this guide, we’ll focus on **how to integrate OpenEnv with TRL**, but feel free to explore the links above to dive deeper into OpenEnv itself.

> [!NOTE]
> You can explore ready-to-use example scripts in the [`examples/scripts/openenv/`](https://github.com/huggingface/trl/blob/main/examples/scripts/openenv/) directory.

> [!NOTE]
> Explore the [OpenEnv docs](https://meta-pytorch.org/OpenEnv/) for more details.

## Installation

To use OpenEnv with TRL, install the framework:

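One way to install it (an assumption on our part; the OpenEnv README is the authoritative source for current install instructions) is directly from the repository:

```shell
# Assumed install path; check the OpenEnv README for the current instructions
pip install trl
pip install git+https://github.com/meta-pytorch/OpenEnv.git
```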
The typical pattern when combining OpenEnv with TRL looks like this:

1. Start or connect to an OpenEnv environment (e.g., a Dockerized env or HTTP endpoint).
2. Generate completions from your model — either via `trl.experimental.openenv.generate_rollout_completions` when using colocated vLLM, or by hitting your inference server when using vLLM in server mode.
3. Step through the environment using each completion to compute rewards or metrics.
4. Add environment results (e.g., `env_reward`) to the rollout result dict.

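The steps above can be sketched as a custom rollout function. This is a minimal illustration with hypothetical helper names (`generate_completions` and the env client are stand-ins, not the exact TRL API):

```python
# Minimal sketch of the rollout pattern. `generate_completions` and the env
# client interface are hypothetical stand-ins for your actual setup.

def rollout_func(prompts, env, generate_completions):
    """Generate completions, step the env with each one, and attach rewards."""
    completions = generate_completions(prompts)  # step 2: sample from the model
    env_rewards = []
    for completion in completions:
        env.reset()                    # fresh episode per completion (step 1)
        result = env.step(completion)  # step 3: env computes feedback
        env_rewards.append(result.reward)
    # step 4: expose env results alongside the completions
    return {"completions": completions, "env_reward": env_rewards}
```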
You can run OpenEnv environments in three different ways:

- We can load the environment from the Hugging Face Hub and execute it as a Docker container.
- We can connect to a hosted environment running on the Hugging Face Hub.
- We can launch the environment directly using Uvicorn in Python.

<hfoptions id="env_mode">

<hfoption id="docker">

**Load from Hugging Face Hub** *(recommended)*

We can use the [`from_hub`](https://meta-pytorch.org/OpenEnv/core/#core.http_env_client.HTTPEnvClient.from_hub) method to load the environment from the hub. This method will automatically start a Docker container for the environment on your local machine. [`openenv/echo-env`](https://huggingface.co/spaces/openenv/echo_env) is the repo_id of the space on the hub.

```python
from envs.echo_env import EchoEnv
env = EchoEnv.from_hub("openenv/echo-env")
```

> **Currently**, it is recommended to **duplicate the Space to your own account** to avoid potential concurrency issues.

</hfoption>

<hfoption id="local">

**Local Python process**

You can start the server manually as a local Python process. For more details about the available environments, refer to the [OpenEnv catalog](https://meta-pytorch.org/OpenEnv/environments/).

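As an illustration, an environment's server app can be served with Uvicorn along these lines (the module path `envs.echo_env.server.app:app` is an assumption; check each environment's README for its actual entry point):

```shell
# Assumed entry point; substitute the server module of your chosen environment
python -m uvicorn envs.echo_env.server.app:app --host 0.0.0.0 --port 8001
```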
</hfoption>

</hfoptions>

The best way to explore the **current catalog of maintained environments** is by visiting the official OpenEnv [catalog](https://huggingface.co/collections/openenv/environment-hub).

Custom environments are also supported. To learn how to create your own, check out the guide on [Building Your Own Environment with OpenEnv](https://meta-pytorch.org/OpenEnv/environment-builder/).

Environments are tightly integrated with the Hub, allowing you to **push new environments directly** so the community can easily pull, reuse, and adapt them for their own use cases.

## A simple example

> [!NOTE]
> You can explore more ready-to-use example scripts in the [`examples/scripts/openenv/`](https://github.com/huggingface/trl/blob/main/examples/scripts/openenv/) directory.

The [echo.py](https://github.com/huggingface/trl/blob/main/examples/scripts/openenv/echo.py) script demonstrates a minimal, end-to-end integration between TRL and OpenEnv. In this example, the [Echo environment](https://meta-pytorch.org/OpenEnv/environments/echo/) rewards completions based on their text length, encouraging the model to generate longer outputs. This pattern can be extended to any custom environment that provides structured feedback or task-based rewards:

```python
from envs.echo_env import EchoEnv, EchoAction
```

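Since Echo scores completions by their text length, the reward logic can be pictured as follows. This is a hypothetical stand-in for the environment's actual scoring, shown only to make the pattern concrete:

```python
def length_reward(completion: str, scale: float = 0.01) -> float:
    # Hypothetical shaping: reward grows linearly with completion length,
    # mirroring how the Echo environment favors longer outputs.
    return scale * len(completion)
```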
To learn more about how to create custom environments, see the [OpenEnv documentation](https://github.com/meta-pytorch/OpenEnv/blob/main/src/envs/README.md).

## Advanced Example

Let's level this up a bit by training a model to interact with a more complex environment. We'll use the word-guessing game [Wordle](https://www.nytimes.com/games/wordle/index.html) from the [`TextArena`](https://meta-pytorch.org/OpenEnv/environments/textarena/) environment.

### The TextArena Environment

[TextArena](https://huggingface.co/papers/2504.11442) is an open-source collection of competitive text-based games designed to evaluate reasoning skills in LLMs using textual games like Wordle, Snake, Tic-Tac-Toe, and more. Research has shown that such games improve model performance on reasoning tasks.


339
357
340
-
We will use the `textarena` environment to train a model to play Wordle. The environment is a simple text based response environment that allows the model to interact with the game by making guesses and receive feedback on them.
358
+
We will use the `TextArena` environment to train a model to play Wordle. It is a simple text-based environment that lets the model interact with the game by making guesses and receiving feedback on them.

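To make the kind of feedback Wordle provides concrete, here is a generic sketch (not TextArena's exact format) of per-letter scoring:

```python
def wordle_feedback(guess: str, target: str) -> str:
    """Return one mark per letter: G (correct spot), Y (in word, wrong spot),
    X (absent). Naive variant: repeated letters are not de-duplicated as in
    real Wordle."""
    feedback = []
    for i, ch in enumerate(guess):
        if target[i] == ch:
            feedback.append("G")
        elif ch in target:
            feedback.append("Y")
        else:
            feedback.append("X")
    return "".join(feedback)
```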
### Wordle

You can also manually start the TextArena environment in a Docker container before running the script:

```
docker run -d -p 8001:8001 registry.hf.space/burtenshaw-textarena:latest
```

Then connect to it using `--env-mode docker-local --env-host localhost --env-port 8001`.