Merge branch 'main' into crafter
huangshiyu13 authored Mar 22, 2024
2 parents d2564da + 534d54d commit cf4661a
Showing 43 changed files with 2,067 additions and 446 deletions.
37 changes: 19 additions & 18 deletions Gallery.md

Large diffs are not rendered by default.

11 changes: 9 additions & 2 deletions Project.md
@@ -3,6 +3,13 @@
Here is a list of research projects that use OpenRL.
If you use OpenRL in your research projects, feel free to tell us about it and join the list.

### LLMArena

Description: LLMArena is a novel and easily extensible framework for evaluating the diverse capabilities of LLMs in dynamic multi-agent environments.

- Paper: [LLMArena: Assessing Capabilities of Large Language Models in Dynamic Multi-Agent Environments](https://arxiv.org/abs/2402.16499)
- Authors: Junzhe Chen, Xuming Hu, Shuodi Liu, Shiyu Huang, Wei-Wei Tu, Zhaofeng He, Lijie Wen

### TiZero

Description: TiZero is a reinforcement learning agent for the full Google Research Football game, trained with distributed self-play.
@@ -18,7 +25,7 @@ However, in many practical applications, it is important to develop reasonable a
In this paper, we propose an on-policy framework for discovering multiple strategies for the same task.
Experimental results show that our method efficiently finds diverse strategies in a wide variety of reinforcement learning tasks.

- Paper: [DGPO: Discovering Multiple Strategies with Diversity-Guided Policy Optimization](https://arxiv.org/abs/2207.05631) (AAMAS Extended Abstract 2023)
- Authors: Wenze Chen, Shiyu Huang, Yuan Chiang, Ting Chen, Jun Zhu
- Paper: [DGPO: Discovering Multiple Strategies with Diversity-Guided Policy Optimization](https://arxiv.org/abs/2207.05631) (AAAI 2024)
- Authors: Wenze Chen, Shiyu Huang, Yuan Chiang, Tim Pearce, Wei-Wei Tu, Ting Chen, Jun Zhu


46 changes: 24 additions & 22 deletions README.md
@@ -28,7 +28,7 @@
[![Embark](https://img.shields.io/badge/discord-OpenRL-%237289da.svg?logo=discord)](https://discord.gg/qMbVT2qBhr)
[![slack badge](https://img.shields.io/badge/Slack-join-blueviolet?logo=slack&amp)](https://join.slack.com/t/openrlhq/shared_invite/zt-1tqwpvthd-Eeh0IxQ~DIaGqYXoW2IUQg)

OpenRL-v0.1.10 is updated on Oct 27, 2023
OpenRL-v0.2.1 is updated on Dec 20, 2023

The main branch is the latest version of OpenRL and is under active development. If you just want to try OpenRL, you can switch to the stable branch.
@@ -58,6 +58,8 @@ Currently, the features supported by OpenRL include:

- Reinforcement learning training support for natural language tasks (such as dialogue)

- Support [DeepSpeed](https://github.com/microsoft/DeepSpeed)

- Support [Arena](https://openrl-docs.readthedocs.io/en/latest/arena/index.html), which allows convenient evaluation of various agents (even submissions for [JiDi](https://openrl-docs.readthedocs.io/en/latest/arena/index.html#performing-local-evaluation-of-agents-submitted-to-the-jidi-platform-using-openrl)) in a competitive environment; see the sketch below.
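
For illustration, evaluating two locally saved agents against each other looks roughly like the following minimal sketch (module paths, class names, and agent directories are assumptions drawn from the arena examples, not a verbatim API reference):

```python
# Minimal arena sketch; names and paths are assumptions taken from the
# arena examples, so treat this as a starting point rather than exact API.
from openrl.arena import make_arena
from openrl.arena.agents.local_agent import LocalAgent

arena = make_arena("tictactoe_v3", use_tqdm=False)  # hypothetical env id
agent1 = LocalAgent("./agent_a")  # directory of a trained/submitted agent
agent2 = LocalAgent("./agent_b")
arena.reset(agents={"agent1": agent1, "agent2": agent2}, total_games=10)
result = arena.run(parallel=True)
arena.close()
print(result)  # aggregated win/loss statistics per agent
```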

@@ -120,6 +122,7 @@ Environments currently supported by OpenRL (for more details, please refer to [G
- [DeepMind Control](https://shimmy.farama.org/environments/dm_control/)
- [Snake](http://www.jidiai.cn/env_detail?envid=1)
- [gym-pybullet-drones](https://github.com/utiasDSL/gym-pybullet-drones)
- [EnvPool](https://github.com/sail-sg/envpool)
- [GridWorld](./examples/gridworld/)
- [Super Mario Bros](https://github.com/Kautenja/gym-super-mario-bros)
- [Gym Retro](https://github.com/openai/retro)
@@ -160,19 +163,19 @@ Here we provide a table for the comparison of OpenRL and existing popular RL lib
OpenRL employs a modular design and high-level abstraction, allowing users to accomplish training for various tasks
through a unified and user-friendly interface.
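
As a concrete illustration of that interface, a minimal CartPole training script looks roughly like the following sketch (it follows the documented quickstart and is not part of this diff):

```python
# Minimal OpenRL quickstart sketch (based on the documented CartPole example).
from openrl.envs.common import make
from openrl.modules.common import PPONet as Net
from openrl.runners.common import PPOAgent as Agent

env = make("CartPole-v1", env_num=9)  # 9 parallel environments
net = Net(env)                        # PPO network matching the env spaces
agent = Agent(net)                    # trainer wrapping the network
agent.train(total_time_steps=20000)   # short training run
```

The same make/Net/Agent pattern carries over to the multi-agent, self-play, and NLP settings compared in the table below.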

| Library | NLP/RLHF | Multi-agent | Self-Play Training | Offline RL | Bilingual Document |
|:------------------------------------------------------------------:|:------------------:|:--------------------:|:--------------------:|:------------------:|:------------------:|
| **[OpenRL](https://github.com/OpenRL-Lab/openrl)** | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
| [Stable Baselines3](https://github.com/DLR-RM/stable-baselines3) | :x: | :x: | :x: | :x: | :x: |
| [Ray/RLlib](https://github.com/ray-project/ray/tree/master/rllib/) | :x: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :x: |
| [DI-engine](https://github.com/opendilab/DI-engine/)                | :x:                | :heavy_check_mark:   | not fully supported  | :heavy_check_mark: | :heavy_check_mark: |
| [Tianshou](https://github.com/thu-ml/tianshou)                      | :x:                | not fully supported  | not fully supported  | :heavy_check_mark: | :heavy_check_mark: |
| [MARLlib](https://github.com/Replicable-MARL/MARLlib)               | :x:                | :heavy_check_mark:   | not fully supported  | :x:                | :x:                |
| [MAPPO Benchmark](https://github.com/marlbenchmark/on-policy) | :x: | :heavy_check_mark: | :x: | :x: | :x: |
| [RL4LMs](https://github.com/allenai/RL4LMs) | :heavy_check_mark: | :x: | :x: | :x: | :x: |
| [trlx](https://github.com/CarperAI/trlx) | :heavy_check_mark: | :x: | :x: | :x: | :x: |
| [trl](https://github.com/huggingface/trl) | :heavy_check_mark: | :x: | :x: | :x: | :x: |
| [TimeChamber](https://github.com/inspirai/TimeChamber) | :x: | :x: | :heavy_check_mark: | :x: | :x: |
| Library | NLP/RLHF | Multi-agent | Self-Play Training | Offline RL | [DeepSpeed](https://github.com/microsoft/DeepSpeed) |
|:------------------------------------------------------------------:|:------------------:|:--------------------:|:--------------------:|:------------------:|:--------------------:|
| **[OpenRL](https://github.com/OpenRL-Lab/openrl)** | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
| [Stable Baselines3](https://github.com/DLR-RM/stable-baselines3) | :x: | :x: | :x: | :x: | :x: |
| [Ray/RLlib](https://github.com/ray-project/ray/tree/master/rllib/) | :x: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :x: |
| [DI-engine](https://github.com/opendilab/DI-engine/)                | :x:                | :heavy_check_mark:   | not fully supported  | :heavy_check_mark: | :x:                  |
| [Tianshou](https://github.com/thu-ml/tianshou)                      | :x:                | not fully supported  | not fully supported  | :heavy_check_mark: | :x:                  |
| [MARLlib](https://github.com/Replicable-MARL/MARLlib)               | :x:                | :heavy_check_mark:   | not fully supported  | :x:                | :x:                  |
| [MAPPO Benchmark](https://github.com/marlbenchmark/on-policy) | :x: | :heavy_check_mark: | :x: | :x: | :x: |
| [RL4LMs](https://github.com/allenai/RL4LMs) | :heavy_check_mark: | :x: | :x: | :x: | :x: |
| [trlx](https://github.com/CarperAI/trlx) | :heavy_check_mark: | :x: | :x: | :x: | :heavy_check_mark: |
| [trl](https://github.com/huggingface/trl) | :heavy_check_mark: | :x: | :x: | :x: | :heavy_check_mark: |
| [TimeChamber](https://github.com/inspirai/TimeChamber) | :x: | :x: | :heavy_check_mark: | :x: | :x: |

## Installation

@@ -334,7 +337,7 @@ If you are using OpenRL in your research project, you are also welcome to join t
- Join the [slack](https://join.slack.com/t/openrlhq/shared_invite/zt-1tqwpvthd-Eeh0IxQ~DIaGqYXoW2IUQg) group to discuss
OpenRL usage and development with us.
- Join the [Discord](https://discord.gg/qMbVT2qBhr) group to discuss OpenRL usage and development with us.
- Send an E-mail to: [huangshiyu@4paradigm.com](huangshiyu@4paradigm.com)
- Send an E-mail to: [huangsy1314@163.com](huangsy1314@163.com)
- Join the [GitHub Discussion](https://github.com/orgs/OpenRL-Lab/discussions).

The OpenRL framework is still under continuous development, and its documentation is still being written.
@@ -352,7 +355,7 @@ At present, OpenRL is maintained by the following maintainers:
- Yiwen Sun([@YiwenAI](https://github.com/YiwenAI))

More contributors are welcome to join our maintenance team (send an E-mail
to [huangshiyu@4paradigm.com](huangshiyu@4paradigm.com)
to [huangsy1314@163.com](huangsy1314@163.com)
to apply to join the OpenRL team).

## Supporters
@@ -376,12 +379,11 @@ to apply to join the OpenRL team).
If our work has been helpful to you, please feel free to cite us:

```latex
@misc{openrl2023,
title={OpenRL},
author={OpenRL Contributors},
publisher = {GitHub},
howpublished = {\url{https://github.com/OpenRL-Lab/openrl}},
year={2023},
@article{huang2023openrl,
title={OpenRL: A Unified Reinforcement Learning Framework},
author={Huang, Shiyu and Chen, Wentse and Sun, Yiwen and Bie, Fuqing and Tu, Wei-Wei},
journal={arXiv preprint arXiv:2312.16189},
year={2023}
}
```

29 changes: 15 additions & 14 deletions README_zh.md
@@ -29,7 +29,7 @@
[![Embark](https://img.shields.io/badge/discord-OpenRL-%237289da.svg?logo=discord)](https://discord.gg/qMbVT2qBhr)
[![slack badge](https://img.shields.io/badge/Slack-join-blueviolet?logo=slack&amp)](https://join.slack.com/t/openrlhq/shared_invite/zt-1tqwpvthd-Eeh0IxQ~DIaGqYXoW2IUQg)

OpenRL-v0.1.10 is updated on Oct 27, 2023
OpenRL-v0.2.1 is updated on Dec 20, 2023

The main branch is the latest version of OpenRL and is under active development. If you just want to try OpenRL, you can switch to the stable branch.
@@ -51,6 +51,7 @@ OpenRL is developed based on PyTorch, with the goal of providing the reinforcement learning research community with a
- Support offline reinforcement learning training with expert data
- Support self-play training
- Support reinforcement learning training for natural language tasks (such as dialogue)
- Support [DeepSpeed](https://github.com/microsoft/DeepSpeed)
- Support [Arena](https://openrl-docs.readthedocs.io/zh/latest/arena/index.html), which allows convenient evaluation of various agents (even agents submitted to the [JiDi platform](https://openrl-docs.readthedocs.io/zh/latest/arena/index.html#openrl)) in competitive multi-agent environments.
- Support importing models and data from [Hugging Face](https://huggingface.co/), including loading [Stable-baselines3 models](https://openrl-docs.readthedocs.io/zh/latest/sb3/index.html) from Hugging Face for testing and training.
- Provide a [detailed tutorial](https://openrl-docs.readthedocs.io/zh/latest/custom_env/index.html) on integrating user-defined environments into OpenRL.
@@ -96,6 +97,7 @@ Environments currently supported by OpenRL (for more details, please refer to [Gallery](Gallery.md)):
- [DeepMind Control](https://shimmy.farama.org/environments/dm_control/)
- [Snake](http://www.jidiai.cn/env_detail?envid=1)
- [gym-pybullet-drones](https://github.com/utiasDSL/gym-pybullet-drones)
- [EnvPool](https://github.com/sail-sg/envpool)
- [GridWorld](./examples/gridworld/)
- [Super Mario Bros](https://github.com/Kautenja/gym-super-mario-bros)
- [Gym Retro](https://github.com/openai/retro)
@@ -128,18 +130,18 @@ OpenRL-Lab will continue to maintain and update OpenRL; everyone is welcome to join our [open-source community

Here we provide a table comparing OpenRL with other commonly used reinforcement learning libraries. OpenRL adopts a modular design and high-level abstractions, allowing users to complete training for various tasks through a unified, easy-to-use interface.

| Library                                                             | NLP/RLHF           | Multi-agent Training | Self-Play Training   | Offline RL         | Bilingual Document |
| Library                                                             | NLP/RLHF           | Multi-agent Training | Self-Play Training   | Offline RL         | [DeepSpeed](https://github.com/microsoft/DeepSpeed) |
|:------------------------------------------------------------------:|:------------------:|:--------------------:|:--------------------:|:------------------:|:------------------:|
| **[OpenRL](https://github.com/OpenRL-Lab/openrl)**                  | :heavy_check_mark: | :heavy_check_mark:   | :heavy_check_mark:   | :heavy_check_mark: | :heavy_check_mark: |
| [Stable Baselines3](https://github.com/DLR-RM/stable-baselines3)    | :x:                | :x:                  | :x:                  | :x:                | :x:                |
| [Ray/RLlib](https://github.com/ray-project/ray/tree/master/rllib/) | :x:                | :heavy_check_mark:   | :heavy_check_mark:   | :heavy_check_mark: | :x:                |
| [DI-engine](https://github.com/opendilab/DI-engine/)                | :x:                | :heavy_check_mark:   | not fully supported  | :heavy_check_mark: | :heavy_check_mark: |
| [Tianshou](https://github.com/thu-ml/tianshou)                      | :x:                | not fully supported  | not fully supported  | :heavy_check_mark: | :heavy_check_mark: |
| [DI-engine](https://github.com/opendilab/DI-engine/)                | :x:                | :heavy_check_mark:   | not fully supported  | :heavy_check_mark: | :x:                |
| [Tianshou](https://github.com/thu-ml/tianshou)                      | :x:                | not fully supported  | not fully supported  | :heavy_check_mark: | :x:                |
| [MARLlib](https://github.com/Replicable-MARL/MARLlib)               | :x:                | :heavy_check_mark:   | not fully supported  | :x:                | :x:                |
| [MAPPO Benchmark](https://github.com/marlbenchmark/on-policy)       | :x:                | :heavy_check_mark:   | :x:                  | :x:                | :x:                |
| [RL4LMs](https://github.com/allenai/RL4LMs)                         | :heavy_check_mark: | :x:                  | :x:                  | :x:                | :x:                |
| [trlx](https://github.com/CarperAI/trlx)                            | :heavy_check_mark: | :x:                  | :x:                  | :x:                | :x:                |
| [trl](https://github.com/huggingface/trl)                           | :heavy_check_mark: | :x:                  | :x:                  | :x:                | :x:                |
| [trlx](https://github.com/CarperAI/trlx)                            | :heavy_check_mark: | :x:                  | :x:                  | :x:                | :heavy_check_mark: |
| [trl](https://github.com/huggingface/trl)                           | :heavy_check_mark: | :x:                  | :x:                  | :x:                | :heavy_check_mark: |
| [TimeChamber](https://github.com/inspirai/TimeChamber)              | :x:                | :x:                  | :heavy_check_mark:   | :x:                | :x:                |

## Installation
@@ -294,7 +296,7 @@ openrl --mode train --env CartPole-v1
- Join the [slack](https://join.slack.com/t/openrlhq/shared_invite/zt-1tqwpvthd-Eeh0IxQ~DIaGqYXoW2IUQg) group to discuss OpenRL usage and development with us.
- Join the [Discord](https://discord.gg/qMbVT2qBhr) group to discuss OpenRL usage and development with us.
- Send an E-mail to: [huangshiyu@4paradigm.com](huangshiyu@4paradigm.com)
- Send an E-mail to: [huangsy1314@163.com](huangsy1314@163.com)
- Join the [GitHub Discussion](https://github.com/orgs/OpenRL-Lab/discussions).

The OpenRL framework is still under continuous development and documentation; you are welcome to join us in making this project better:
@@ -309,7 +311,7 @@ The OpenRL framework is still under continuous development and documentation; you are welcome to join us in making the
- [Shiyu Huang](https://huangshiyu13.github.io/)([@huangshiyu13](https://github.com/huangshiyu13))
- Wenze Chen([@Chen001117](https://github.com/Chen001117))

More contributors are welcome to join our maintenance team (send an E-mail to [huangshiyu@4paradigm.com](huangshiyu@4paradigm.com) to apply to join the OpenRL team).
More contributors are welcome to join our maintenance team (send an E-mail to [huangsy1314@163.com](huangsy1314@163.com) to apply to join the OpenRL team).

## Supporters

@@ -332,12 +334,11 @@
If our work has been helpful to you, feel free to cite us:

```latex
@misc{openrl2023,
title={OpenRL},
author={OpenRL Contributors},
publisher = {GitHub},
howpublished = {\url{https://github.com/OpenRL-Lab/openrl}},
year={2023},
@article{huang2023openrl,
title={OpenRL: A Unified Reinforcement Learning Framework},
author={Huang, Shiyu and Chen, Wentse and Sun, Yiwen and Bie, Fuqing and Tu, Wei-Wei},
journal={arXiv preprint arXiv:2312.16189},
year={2023}
}
```

11 changes: 9 additions & 2 deletions examples/atari/README.md
@@ -8,12 +8,19 @@ Then install AutoROM via:
or:
```shell
pip install autorom

AutoROM --accept-license
```

Or, if you cannot download the ROMs automatically, you can download them manually from [Google Drive](https://drive.google.com/file/d/1agerLX3fP2YqUCcAkMF7v_ZtABAOhlA7/view?usp=sharing).
Then install the ROMs via:
```shell
pip install autorom
AutoROM --source-file <path-to-Roms.tar.gz>
```


## Usage

```shell
python train_ppo.py --config atari_ppo.yaml
python train_ppo.py
```
13 changes: 9 additions & 4 deletions examples/atari/atari_ppo.yaml
@@ -2,22 +2,27 @@ seed: 0
lr: 2.5e-4
critic_lr: 2.5e-4
episode_length: 128
ppo_epoch: 4
gamma: 0.99
ppo_epoch: 3
gain: 0.01
use_linear_lr_decay: true
use_share_model: true
entropy_coef: 0.01
hidden_size: 512
num_mini_batch: 4
clip_param: 0.1
num_mini_batch: 8
clip_param: 0.2
value_loss_coef: 0.5
max_grad_norm: 10

run_dir: ./run_results/
experiment_name: atari_ppo

log_interval: 1
use_recurrent_policy: false
use_valuenorm: true
use_adv_normalize: true

wandb_entity: openrl-lab
experiment_name: atari_ppo

vec_info_class:
id: "EPS_RewardInfo"
6 changes: 3 additions & 3 deletions examples/atari/train_ppo.py
@@ -43,11 +43,11 @@

def train():
cfg_parser = create_config_parser()
cfg = cfg_parser.parse_args()
cfg = cfg_parser.parse_args(["--config", "atari_ppo.yaml"])

# create environment and set environment parallelism
env = make(
"ALE/Pong-v5", env_num=9, cfg=cfg, asynchronous=True, env_wrappers=env_wrappers
"ALE/Pong-v5", env_num=16, cfg=cfg, asynchronous=True, env_wrappers=env_wrappers
)

# create the neural network
@@ -56,7 +56,7 @@ def train():
env, cfg=cfg, device="cuda" if "macOS" not in get_system_info()["OS"] else "cpu"
)
# initialize the trainer
agent = Agent(net, use_wandb=True)
agent = Agent(net, use_wandb=True, project_name="Pong-v5")
# start training, set total number of training steps to 5,000,000

agent.train(total_time_steps=5000000)
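
After training, the agent can be evaluated by re-attaching an environment, roughly as in the hedged sketch below (adapted from the documented CartPole quickstart and applied to the Pong setup above; the env count and exact return shapes are assumptions):

```python
# Hedged evaluation sketch (not part of this commit).
import numpy as np

env = make("ALE/Pong-v5", env_num=4, env_wrappers=env_wrappers)
agent.set_env(env)  # attach the freshly trained agent to the eval envs
obs, info = env.reset()
done = False
while not np.all(done):
    action, _ = agent.act(obs, deterministic=True)
    obs, reward, done, info = env.step(action)
env.close()
```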
20 changes: 20 additions & 0 deletions examples/envpool/README.md
@@ -0,0 +1,20 @@
## Installation


Install envpool with:

``` shell
pip install envpool
```

Note 1: envpool only supports the Linux operating system.

## Usage

You can use `OpenRL` to train CartPole with envpool via:

``` shell
python train_ppo.py
```

You can also add custom wrappers in `envpool_wrapper.py`. Currently we have `VecAdapter` and `VecMonitor` wrappers.
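
To sanity-check the installation, envpool can also be exercised directly, as in this minimal sketch (the reset/step signatures vary across envpool and gym versions, so the return values shown are assumptions):

```python
# Quick envpool smoke test (independent of OpenRL; API varies by version).
import envpool
import numpy as np

envs = envpool.make("CartPole-v1", env_type="gym", num_envs=8)
obs = envs.reset()                # batched observations, e.g. shape (8, 4)
actions = np.zeros(8, dtype=int)  # dummy actions for all 8 envs
obs, rewards, dones, info = envs.step(actions)
print(obs.shape, rewards.shape)
```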
