Commit a4872d9 (1 parent: 3f66564)

Update OpenEnv docs (#4328)

1 file changed: docs/source/openenv.md (+5 additions, −7 deletions)
## Overview

[OpenEnv](https://github.com/meta-pytorch/OpenEnv) is an open-source framework from Meta's PyTorch team for defining, deploying, and interacting with environments in reinforcement learning (RL) and agentic workflows. It offers [Gymnasium-style APIs](https://gymnasium.farama.org) (e.g., `reset()` and `step()`) to interface with environments in a standard manner, and supports running these environments as backend servers (for example via HTTP or containerised execution). You can find a collection of ready-to-use OpenEnv environments on the [Hugging Face Hub](https://huggingface.co/collections/openenv/environment-hub).

In this guide, we’ll focus on **how to integrate OpenEnv with TRL**, but feel free to explore the links above to dive deeper into OpenEnv itself.
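To make the Gymnasium-style contract concrete, here is a minimal in-process sketch of such an environment. All names here (`ToyEchoEnv`, `StepResult`, and their fields) are illustrative assumptions, not OpenEnv's actual client code; a real OpenEnv environment exposes the same `reset()`/`step()`/`state()` surface but typically runs behind an HTTP server or container:

```python
from dataclasses import dataclass

# Hypothetical stand-in for an OpenEnv environment: same Gymnasium-style
# surface (reset/step/state), but running in-process instead of over HTTP.
@dataclass
class StepResult:
    observation: str
    reward: float
    done: bool

class ToyEchoEnv:
    """Echoes each message back and rewards longer messages."""

    def __init__(self):
        self._history = []

    def reset(self) -> StepResult:
        # Start a fresh episode.
        self._history = []
        return StepResult(observation="", reward=0.0, done=False)

    def step(self, message: str) -> StepResult:
        # Reward scales with message length, like the Echo example later on.
        self._history.append(message)
        return StepResult(
            observation=f"echo: {message}",
            reward=float(len(message)),
            done=False,
        )

    def state(self) -> list:
        # Full episode history so far.
        return list(self._history)

env = ToyEchoEnv()
env.reset()
result = env.step("hello")
print(result.observation, result.reward)  # echo: hello 5.0
```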
## Installation

To use OpenEnv with TRL, install the framework:

```bash
pip install openenv-core
```
1816

1917
## Using `rollout_func` with OpenEnv environments
2018

21-
TRL's [`GRPOTrainer`] supports _custom rollout logic_ through the `rollout_func` argument. This lets you override the trainer's default text-generation loop and directly interact with OpenEnv environments — for example, to compute environment-based rewards instead of purely model-based ones.
19+
TRL's [`GRPOTrainer`] supports _custom rollout logic_ through the `rollout_func` argument. This lets you override the trainer's default text-generation loop and directly interact with OpenEnv environments — for instance, to compute environment-driven rewards instead of relying solely on model-based signals.
2220

2321
### Rollout Function Signature
2422
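As a rough sketch of the shape such a function can take: it receives a batch of prompts and returns a dict of token ids and logprobs, where extra keys can carry environment-derived signals to reward functions. Treat the details below as assumptions, not TRL's exact API: the real signature also receives trainer-provided arguments, and generation plus tokenization are stubbed here so the sketch runs standalone:

```python
# Stand-ins for a model/vLLM generation call and a tokenizer, so this
# sketch runs without any server. In a real rollout you would generate
# completions remotely and call env.step(...) on an OpenEnv environment.
def fake_generate(prompt: str) -> str:
    return prompt + " world"

def fake_tokenize(text: str) -> list[int]:
    # One "token id" per character, purely for illustration.
    return [ord(c) for c in text]

def rollout_func(prompts: list[str]) -> dict:
    completions = [fake_generate(p) for p in prompts]
    # Environment-driven reward: completion length, mirroring the Echo
    # environment; a real implementation would read rewards from env.step().
    rewards = [float(len(c)) for c in completions]
    return {
        "prompt_ids": [fake_tokenize(p) for p in prompts],
        "completion_ids": [fake_tokenize(c) for c in completions],
        "logprobs": [[0.0] * len(fake_tokenize(c)) for c in completions],
        "env_reward": rewards,  # extra key, forwarded to reward functions
    }

out = rollout_func(["hello"])
print(out["env_reward"])  # [11.0]
```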

[…]

By using OpenEnv in this loop, you can:

[…]

## A simple example

The [echo.py](https://github.com/huggingface/trl/blob/main/examples/scripts/openenv/echo.py) script demonstrates a minimal, end-to-end integration between TRL and OpenEnv. In this example, the Echo environment rewards completions based on their text length, encouraging the model to generate longer outputs. This pattern can be extended to any custom environment that provides structured feedback or task-based rewards:
```python
from envs.echo_env import EchoEnv, EchoAction
```
