Merge branch 'main' into crafter
huangshiyu13 authored Mar 22, 2024
2 parents d2564da + 534d54d commit cf4661a
Showing 43 changed files with 2,067 additions and 446 deletions.
37 changes: 19 additions & 18 deletions Gallery.md

Large diffs are not rendered by default.

11 changes: 9 additions & 2 deletions Project.md
@@ -3,6 +3,13 @@
Here is a list of research projects that use OpenRL.
If you use OpenRL in your research projects, feel free to tell us about it and join the list.

### LLMArena

Description: LLMArena is a novel and easily extensible framework for evaluating the diverse capabilities of LLMs in dynamic multi-agent environments.

- Paper: [LLMArena: Assessing Capabilities of Large Language Models in Dynamic Multi-Agent Environments](https://arxiv.org/abs/2402.16499)
- Authors: Junzhe Chen, Xuming Hu, Shuodi Liu, Shiyu Huang, Wei-Wei Tu, Zhaofeng He, Lijie Wen

### TiZero

Description: TiZero is a reinforcement learning agent for the full Google Research Football game, trained with distributed self-play.
@@ -18,7 +25,7 @@ However, in many practical applications, it is important to develop reasonable a
In this paper, we propose an on-policy framework for discovering multiple strategies for the same task.
Experimental results show that our method efficiently finds diverse strategies in a wide variety of reinforcement learning tasks.

- Paper: [DGPO: Discovering Multiple Strategies with Diversity-Guided Policy Optimization](https://arxiv.org/abs/2207.05631) (AAMAS Extended Abstract 2023)
- Authors: Wenze Chen, Shiyu Huang, Yuan Chiang, Ting Chen, Jun Zhu
- Paper: [DGPO: Discovering Multiple Strategies with Diversity-Guided Policy Optimization](https://arxiv.org/abs/2207.05631) (AAAI 2024)
- Authors: Wenze Chen, Shiyu Huang, Yuan Chiang, Tim Pearce, Wei-Wei Tu, Ting Chen, Jun Zhu


46 changes: 24 additions & 22 deletions README.md
@@ -28,7 +28,7 @@
[![Embark](https://img.shields.io/badge/discord-OpenRL-%237289da.svg?logo=discord)](https://discord.gg/qMbVT2qBhr)
[![slack badge](https://img.shields.io/badge/Slack-join-blueviolet?logo=slack&amp)](https://join.slack.com/t/openrlhq/shared_invite/zt-1tqwpvthd-Eeh0IxQ~DIaGqYXoW2IUQg)

OpenRL-v0.1.10 is updated on Oct 27, 2023
OpenRL-v0.2.1 is updated on Dec 20, 2023

The main branch is the latest version of OpenRL and is under active development. If you just want to try OpenRL, you can switch to the stable branch.
@@ -58,6 +58,8 @@ Currently, the features supported by OpenRL include:

- Reinforcement learning training support for natural language tasks (such as dialogue)

- Support [DeepSpeed](https://github.com/microsoft/DeepSpeed)

- Support [Arena](https://openrl-docs.readthedocs.io/en/latest/arena/index.html), which allows convenient evaluation of various agents (even submissions for [JiDi](https://openrl-docs.readthedocs.io/en/latest/arena/index.html#performing-local-evaluation-of-agents-submitted-to-the-jidi-platform-using-openrl)) in a competitive environment; see the sketch below.
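
For illustration, evaluating two locally saved agents against each other looks roughly like the following minimal sketch (module paths, class names, and agent directories are assumptions drawn from the arena examples, not a verbatim API reference):

```python
# Minimal arena sketch; names and paths are assumptions taken from the
# arena examples, so treat this as a starting point rather than exact API.
from openrl.arena import make_arena
from openrl.arena.agents.local_agent import LocalAgent

arena = make_arena("tictactoe_v3", use_tqdm=False)  # hypothetical env id
agent1 = LocalAgent("./agent_a")  # directory of a trained/submitted agent
agent2 = LocalAgent("./agent_b")
arena.reset(agents={"agent1": agent1, "agent2": agent2}, total_games=10)
result = arena.run(parallel=True)
arena.close()
print(result)  # aggregated win/loss statistics per agent
```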

@@ -120,6 +122,7 @@ Environments currently supported by OpenRL (for more details, please refer to [G
- [DeepMind Control](https://shimmy.farama.org/environments/dm_control/)
- [Snake](http://www.jidiai.cn/env_detail?envid=1)
- [gym-pybullet-drones](https://github.com/utiasDSL/gym-pybullet-drones)
- [EnvPool](https://github.com/sail-sg/envpool)
- [GridWorld](./examples/gridworld/)
- [Super Mario Bros](https://github.com/Kautenja/gym-super-mario-bros)
- [Gym Retro](https://github.com/openai/retro)
@@ -160,19 +163,19 @@ Here we provide a table for the comparison of OpenRL and existing popular RL lib
OpenRL employs a modular design and high-level abstraction, allowing users to accomplish training for various tasks
through a unified and user-friendly interface.
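
As a concrete illustration of that interface, a minimal CartPole training script looks roughly like the following sketch (it follows the documented quickstart and is not part of this diff):

```python
# Minimal OpenRL quickstart sketch (based on the documented CartPole example).
from openrl.envs.common import make
from openrl.modules.common import PPONet as Net
from openrl.runners.common import PPOAgent as Agent

env = make("CartPole-v1", env_num=9)  # 9 parallel environments
net = Net(env)                        # PPO network matching the env spaces
agent = Agent(net)                    # trainer wrapping the network
agent.train(total_time_steps=20000)   # short training run
```

The same make/Net/Agent pattern carries over to the multi-agent, self-play, and NLP settings compared in the table below.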

| Library | NLP/RLHF | Multi-agent | Self-Play Training | Offline RL | Bilingual Document |
|:------------------------------------------------------------------:|:------------------:|:--------------------:|:--------------------:|:------------------:|:------------------:|
| **[OpenRL](https://github.com/OpenRL-Lab/openrl)** | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
| [Stable Baselines3](https://github.com/DLR-RM/stable-baselines3) | :x: | :x: | :x: | :x: | :x: |
| [Ray/RLlib](https://github.com/ray-project/ray/tree/master/rllib/) | :x: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :x: |
| [DI-engine](https://github.com/opendilab/DI-engine/)                | :x:                | :heavy_check_mark:   | not fully supported  | :heavy_check_mark: | :heavy_check_mark: |
| [Tianshou](https://github.com/thu-ml/tianshou)                      | :x:                | not fully supported  | not fully supported  | :heavy_check_mark: | :heavy_check_mark: |
| [MARLlib](https://github.com/Replicable-MARL/MARLlib)               | :x:                | :heavy_check_mark:   | not fully supported  | :x:                | :x:                |
| [MAPPO Benchmark](https://github.com/marlbenchmark/on-policy) | :x: | :heavy_check_mark: | :x: | :x: | :x: |
| [RL4LMs](https://github.com/allenai/RL4LMs) | :heavy_check_mark: | :x: | :x: | :x: | :x: |
| [trlx](https://github.com/CarperAI/trlx) | :heavy_check_mark: | :x: | :x: | :x: | :x: |
| [trl](https://github.com/huggingface/trl) | :heavy_check_mark: | :x: | :x: | :x: | :x: |
| [TimeChamber](https://github.com/inspirai/TimeChamber) | :x: | :x: | :heavy_check_mark: | :x: | :x: |
| Library | NLP/RLHF | Multi-agent | Self-Play Training | Offline RL | [DeepSpeed](https://github.com/microsoft/DeepSpeed) |
|:------------------------------------------------------------------:|:------------------:|:--------------------:|:--------------------:|:------------------:|:--------------------:|
| **[OpenRL](https://github.com/OpenRL-Lab/openrl)** | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
| [Stable Baselines3](https://github.com/DLR-RM/stable-baselines3) | :x: | :x: | :x: | :x: | :x: |
| [Ray/RLlib](https://github.com/ray-project/ray/tree/master/rllib/) | :x: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :x: |
| [DI-engine](https://github.com/opendilab/DI-engine/)                | :x:                | :heavy_check_mark:   | not fully supported  | :heavy_check_mark: | :x:                  |
| [Tianshou](https://github.com/thu-ml/tianshou)                      | :x:                | not fully supported  | not fully supported  | :heavy_check_mark: | :x:                  |
| [MARLlib](https://github.com/Replicable-MARL/MARLlib)               | :x:                | :heavy_check_mark:   | not fully supported  | :x:                | :x:                  |
| [MAPPO Benchmark](https://github.com/marlbenchmark/on-policy) | :x: | :heavy_check_mark: | :x: | :x: | :x: |
| [RL4LMs](https://github.com/allenai/RL4LMs) | :heavy_check_mark: | :x: | :x: | :x: | :x: |
| [trlx](https://github.com/CarperAI/trlx) | :heavy_check_mark: | :x: | :x: | :x: | :heavy_check_mark: |
| [trl](https://github.com/huggingface/trl) | :heavy_check_mark: | :x: | :x: | :x: | :heavy_check_mark: |
| [TimeChamber](https://github.com/inspirai/TimeChamber) | :x: | :x: | :heavy_check_mark: | :x: | :x: |

## Installation

@@ -334,7 +337,7 @@ If you are using OpenRL in your research project, you are also welcome to join t
- Join the [slack](https://join.slack.com/t/openrlhq/shared_invite/zt-1tqwpvthd-Eeh0IxQ~DIaGqYXoW2IUQg) group to discuss
OpenRL usage and development with us.
- Join the [Discord](https://discord.gg/qMbVT2qBhr) group to discuss OpenRL usage and development with us.
- Send an E-mail to: [huangshiyu@4paradigm.com](huangshiyu@4paradigm.com)
- Send an E-mail to: [huangsy1314@163.com](huangsy1314@163.com)
- Join the [GitHub Discussion](https://github.com/orgs/OpenRL-Lab/discussions).

The OpenRL framework is still under continuous development, and its documentation is still being written.
@@ -352,7 +355,7 @@ At present, OpenRL is maintained by the following maintainers:
- Yiwen Sun([@YiwenAI](https://github.com/YiwenAI))

More contributors are welcome to join our maintenance team (send an E-mail
to [huangshiyu@4paradigm.com](huangshiyu@4paradigm.com)
to [huangsy1314@163.com](huangsy1314@163.com)
to apply to join the OpenRL team).

## Supporters
@@ -376,12 +379,11 @@ to apply to join the OpenRL team).
If our work has been helpful to you, please feel free to cite us:

```latex
@misc{openrl2023,
title={OpenRL},
author={OpenRL Contributors},
publisher = {GitHub},
howpublished = {\url{https://github.com/OpenRL-Lab/openrl}},
year={2023},
@article{huang2023openrl,
title={OpenRL: A Unified Reinforcement Learning Framework},
author={Huang, Shiyu and Chen, Wentse and Sun, Yiwen and Bie, Fuqing and Tu, Wei-Wei},
journal={arXiv preprint arXiv:2312.16189},
year={2023}
}
```

29 changes: 15 additions & 14 deletions README_zh.md
@@ -29,7 +29,7 @@
[![Embark](https://img.shields.io/badge/discord-OpenRL-%237289da.svg?logo=discord)](https://discord.gg/qMbVT2qBhr)
[![slack badge](https://img.shields.io/badge/Slack-join-blueviolet?logo=slack&amp)](https://join.slack.com/t/openrlhq/shared_invite/zt-1tqwpvthd-Eeh0IxQ~DIaGqYXoW2IUQg)

OpenRL-v0.1.10 is updated on Oct 27, 2023
OpenRL-v0.2.1 is updated on Dec 20, 2023

The main branch is the latest version of OpenRL and is under active development. If you just want to try OpenRL, you can switch to the stable branch.
@@ -51,6 +51,7 @@ OpenRL is developed based on PyTorch, with the goal of providing the reinforcement learning research community with a
- Support offline reinforcement learning training with expert data
- Support self-play training
- Support reinforcement learning training for natural language tasks (such as dialogue)
- Support [DeepSpeed](https://github.com/microsoft/DeepSpeed)
- Support [Arena](https://openrl-docs.readthedocs.io/zh/latest/arena/index.html), which allows convenient evaluation of various agents (even agents submitted to the [JiDi platform](https://openrl-docs.readthedocs.io/zh/latest/arena/index.html#openrl)) in competitive multi-agent environments.
- Support importing models and data from [Hugging Face](https://huggingface.co/), including loading [Stable-baselines3 models](https://openrl-docs.readthedocs.io/zh/latest/sb3/index.html) from Hugging Face for testing and training.
- Provide a [detailed tutorial](https://openrl-docs.readthedocs.io/zh/latest/custom_env/index.html) on integrating user-defined environments into OpenRL.
@@ -96,6 +97,7 @@ Environments currently supported by OpenRL (for more details, please refer to [Gallery](Gallery.md)):
- [DeepMind Control](https://shimmy.farama.org/environments/dm_control/)
- [Snake](http://www.jidiai.cn/env_detail?envid=1)
- [gym-pybullet-drones](https://github.com/utiasDSL/gym-pybullet-drones)
- [EnvPool](https://github.com/sail-sg/envpool)
- [GridWorld](./examples/gridworld/)
- [Super Mario Bros](https://github.com/Kautenja/gym-super-mario-bros)
- [Gym Retro](https://github.com/openai/retro)
@@ -128,18 +130,18 @@ OpenRL-Lab will continue to maintain and update OpenRL; everyone is welcome to join our [open-source community

Here we provide a table comparing OpenRL with other commonly used reinforcement learning libraries. OpenRL adopts a modular design and high-level abstractions, allowing users to complete training for various tasks through a unified, easy-to-use interface.

| Library                                                             | NLP/RLHF           | Multi-agent Training | Self-Play Training   | Offline RL         | Bilingual Document |
| Library                                                             | NLP/RLHF           | Multi-agent Training | Self-Play Training   | Offline RL         | [DeepSpeed](https://github.com/microsoft/DeepSpeed) |
|:------------------------------------------------------------------:|:------------------:|:--------------------:|:--------------------:|:------------------:|:------------------:|
| **[OpenRL](https://github.com/OpenRL-Lab/openrl)**                  | :heavy_check_mark: | :heavy_check_mark:   | :heavy_check_mark:   | :heavy_check_mark: | :heavy_check_mark: |
| [Stable Baselines3](https://github.com/DLR-RM/stable-baselines3)    | :x:                | :x:                  | :x:                  | :x:                | :x:                |
| [Ray/RLlib](https://github.com/ray-project/ray/tree/master/rllib/) | :x:                | :heavy_check_mark:   | :heavy_check_mark:   | :heavy_check_mark: | :x:                |
| [DI-engine](https://github.com/opendilab/DI-engine/)                | :x:                | :heavy_check_mark:   | not fully supported  | :heavy_check_mark: | :heavy_check_mark: |
| [Tianshou](https://github.com/thu-ml/tianshou)                      | :x:                | not fully supported  | not fully supported  | :heavy_check_mark: | :heavy_check_mark: |
| [DI-engine](https://github.com/opendilab/DI-engine/)                | :x:                | :heavy_check_mark:   | not fully supported  | :heavy_check_mark: | :x:                |
| [Tianshou](https://github.com/thu-ml/tianshou)                      | :x:                | not fully supported  | not fully supported  | :heavy_check_mark: | :x:                |
| [MARLlib](https://github.com/Replicable-MARL/MARLlib)               | :x:                | :heavy_check_mark:   | not fully supported  | :x:                | :x:                |
| [MAPPO Benchmark](https://github.com/marlbenchmark/on-policy)       | :x:                | :heavy_check_mark:   | :x:                  | :x:                | :x:                |
| [RL4LMs](https://github.com/allenai/RL4LMs)                         | :heavy_check_mark: | :x:                  | :x:                  | :x:                | :x:                |
| [trlx](https://github.com/CarperAI/trlx)                            | :heavy_check_mark: | :x:                  | :x:                  | :x:                | :x:                |
| [trl](https://github.com/huggingface/trl)                           | :heavy_check_mark: | :x:                  | :x:                  | :x:                | :x:                |
| [trlx](https://github.com/CarperAI/trlx)                            | :heavy_check_mark: | :x:                  | :x:                  | :x:                | :heavy_check_mark: |
| [trl](https://github.com/huggingface/trl)                           | :heavy_check_mark: | :x:                  | :x:                  | :x:                | :heavy_check_mark: |
| [TimeChamber](https://github.com/inspirai/TimeChamber)              | :x:                | :x:                  | :heavy_check_mark:   | :x:                | :x:                |

## Installation
@@ -294,7 +296,7 @@ openrl --mode train --env CartPole-v1
- Join the [slack](https://join.slack.com/t/openrlhq/shared_invite/zt-1tqwpvthd-Eeh0IxQ~DIaGqYXoW2IUQg) group to discuss OpenRL usage and development with us.
- Join the [Discord](https://discord.gg/qMbVT2qBhr) group to discuss OpenRL usage and development with us.
- Send an E-mail to: [huangshiyu@4paradigm.com](huangshiyu@4paradigm.com)
- Send an E-mail to: [huangsy1314@163.com](huangsy1314@163.com)
- Join the [GitHub Discussion](https://github.com/orgs/OpenRL-Lab/discussions).

The OpenRL framework is still under continuous development and documentation; you are welcome to join us in making this project better:
@@ -309,7 +311,7 @@ The OpenRL framework is still under continuous development and documentation; you are welcome to join us in making the
- [Shiyu Huang](https://huangshiyu13.github.io/)([@huangshiyu13](https://github.com/huangshiyu13))
- Wenze Chen([@Chen001117](https://github.com/Chen001117))

More contributors are welcome to join our maintenance team (send an E-mail to [huangshiyu@4paradigm.com](huangshiyu@4paradigm.com) to apply to join the OpenRL team).
More contributors are welcome to join our maintenance team (send an E-mail to [huangsy1314@163.com](huangsy1314@163.com) to apply to join the OpenRL team).

## Supporters

@@ -332,12 +334,11 @@
If our work has been helpful to you, feel free to cite us:

```latex
@misc{openrl2023,
title={OpenRL},
author={OpenRL Contributors},
publisher = {GitHub},
howpublished = {\url{https://github.com/OpenRL-Lab/openrl}},
year={2023},
@article{huang2023openrl,
title={OpenRL: A Unified Reinforcement Learning Framework},
author={Huang, Shiyu and Chen, Wentse and Sun, Yiwen and Bie, Fuqing and Tu, Wei-Wei},
journal={arXiv preprint arXiv:2312.16189},
year={2023}
}
```

11 changes: 9 additions & 2 deletions examples/atari/README.md
@@ -8,12 +8,19 @@ Then install AutoROM via:
or:
```shell
pip install autorom

AutoROM --accept-license
```

Or, if you cannot download the ROMs automatically, you can download them manually from [Google Drive](https://drive.google.com/file/d/1agerLX3fP2YqUCcAkMF7v_ZtABAOhlA7/view?usp=sharing).
Then install the ROMs via:
```shell
pip install autorom
AutoROM --source-file <path-to-Roms.tar.gz>
```


## Usage

```shell
python train_ppo.py --config atari_ppo.yaml
python train_ppo.py
```
13 changes: 9 additions & 4 deletions examples/atari/atari_ppo.yaml
@@ -2,22 +2,27 @@ seed: 0
lr: 2.5e-4
critic_lr: 2.5e-4
episode_length: 128
ppo_epoch: 4
gamma: 0.99
ppo_epoch: 3
gain: 0.01
use_linear_lr_decay: true
use_share_model: true
entropy_coef: 0.01
hidden_size: 512
num_mini_batch: 4
clip_param: 0.1
num_mini_batch: 8
clip_param: 0.2
value_loss_coef: 0.5
max_grad_norm: 10

run_dir: ./run_results/
experiment_name: atari_ppo

log_interval: 1
use_recurrent_policy: false
use_valuenorm: true
use_adv_normalize: true

wandb_entity: openrl-lab
experiment_name: atari_ppo

vec_info_class:
id: "EPS_RewardInfo"
6 changes: 3 additions & 3 deletions examples/atari/train_ppo.py
@@ -43,11 +43,11 @@

def train():
cfg_parser = create_config_parser()
cfg = cfg_parser.parse_args()
cfg = cfg_parser.parse_args(["--config", "atari_ppo.yaml"])

# create environment and set environment parallelism
env = make(
"ALE/Pong-v5", env_num=9, cfg=cfg, asynchronous=True, env_wrappers=env_wrappers
"ALE/Pong-v5", env_num=16, cfg=cfg, asynchronous=True, env_wrappers=env_wrappers
)

# create the neural network
@@ -56,7 +56,7 @@ def train():
env, cfg=cfg, device="cuda" if "macOS" not in get_system_info()["OS"] else "cpu"
)
# initialize the trainer
agent = Agent(net, use_wandb=True)
agent = Agent(net, use_wandb=True, project_name="Pong-v5")
# start training, set total number of training steps to 5,000,000

agent.train(total_time_steps=5000000)
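
After training, the agent can be evaluated by re-attaching an environment, roughly as in the hedged sketch below (adapted from the documented CartPole quickstart and applied to the Pong setup above; the env count and exact return shapes are assumptions):

```python
# Hedged evaluation sketch (not part of this commit).
import numpy as np

env = make("ALE/Pong-v5", env_num=4, env_wrappers=env_wrappers)
agent.set_env(env)  # attach the freshly trained agent to the eval envs
obs, info = env.reset()
done = False
while not np.all(done):
    action, _ = agent.act(obs, deterministic=True)
    obs, reward, done, info = env.step(action)
env.close()
```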
20 changes: 20 additions & 0 deletions examples/envpool/README.md
@@ -0,0 +1,20 @@
## Installation


Install envpool with:

``` shell
pip install envpool
```

Note 1: envpool only supports the Linux operating system.

## Usage

You can use `OpenRL` to train CartPole with envpool via:

``` shell
python train_ppo.py
```

You can also add custom wrappers in `envpool_wrapper.py`. Currently we have `VecAdapter` and `VecMonitor` wrappers.
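
To sanity-check the installation, envpool can also be exercised directly, as in this minimal sketch (the reset/step signatures vary across envpool and gym versions, so the return values shown are assumptions):

```python
# Quick envpool smoke test (independent of OpenRL; API varies by version).
import envpool
import numpy as np

envs = envpool.make("CartPole-v1", env_type="gym", num_envs=8)
obs = envs.reset()                # batched observations, e.g. shape (8, 4)
actions = np.zeros(8, dtype=int)  # dummy actions for all 8 envs
obs, rewards, dones, info = envs.step(actions)
print(obs.shape, rewards.shape)
```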
