rlearn.dev

A Reinforcement Learning Library [dev]: DDPG, TD3, SAC, PPO and more

In the future, this repo will move to rlearn. The reason for creating rlearn_dev: the original rlearn.py was built around custom, non-gym environments, which carried too much legacy baggage.

Installation

pip install -e .
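
A quick, optional check that the editable install worked (this assumes the package imports as rlearn_dev, the name used in the Usage section below):

import rlearn_dev
# For an editable install this should point into the cloned source tree
# (assuming rlearn_dev is a regular package with an __init__.py).
print(rlearn_dev.__file__)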

Versions

  • draft: Draft version, used for designing algorithms; API not yet stable, but working
  • naive: Naive version, a straightforward implementation with no or only minor optimizations; can be used for benchmarking
  • main: Stable version, ready for production, with more docs and tests
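
The version tier is part of each agent's import path. The SAC path below is taken verbatim from the Usage example; the generic pattern for other methods is an assumption and should be checked against the repository layout.

# naive-tier SAC (path confirmed by the Usage example below)
from rlearn_dev.methods.sac.naive.agent import SACAgent

# assumed general pattern (verify against the source tree):
#   rlearn_dev.methods.<method>.<version>.agent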

Features

Showcase

MuJoCo Ant-v5

Training MuJoCo Ant-v5 with the naive-version SAC from rlearn_dev, with exploration noise actively added (deterministic=False), reaches a score of 6000+.

[ant-sac training result image]

For comparison, the well-known third-party library ElegantRL reports a benchmark training the Ant robot in GPU-parallel Isaac Gym environments: 6000 points in 3 hours, peaking at 12000 (code open-sourced).

Training log:

2024-11-23 07:45:24 | INFO   | Episode 2019/5000 [4134912]: Average Reward: 5611.59677, detail: [6339.41974365 6330.48039567 6412.41894184 6317.87476665 6403.85789968
   99.67731813 6430.00267087 6559.04241766]
2024-11-23 07:45:41 | INFO   | Episode 2020/5000 [4136960]: Average Reward: 6320.81176, detail: [6131.89531841 6141.40146106 6371.71031779 6146.98573506 6462.2939436
 6572.73806615 6467.93920164 6271.53004489]
2024-11-23 07:45:59 | INFO   | Episode 2021/5000 [4139008]: Average Reward: 6108.65877, detail: [6366.46026193 6499.5197772  3587.1468893  6650.29738074 6514.41845971
 6469.3829467  6313.48620102 6468.5582827 ]
2024-11-23 07:46:17 | INFO   | Episode 2022/5000 [4141056]: Average Reward: 6082.90059, detail: [3445.91923514 6416.88338573 6495.99878767 6572.26091719 6577.24471379
 6474.37533201 6336.68020873 6343.8421299 ]
2024-11-23 07:46:35 | INFO   | Episode 2023/5000 [4143104]: Average Reward: 6054.10073, detail: [6386.18024652 6042.66135112 6577.68305068 6432.44589925 6414.15561133
 6458.69392698 3451.87516831 6669.11062416]
2024-11-23 07:46:53 | INFO   | Episode 2024/5000 [4145152]: Average Reward: 6399.13154, detail: [6017.71464908 6581.0859381  6549.04273248 6514.8605336  6270.06884443
 6471.11665318 6546.70169277 6242.46127681]
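
Each log line reports the mean of the per-environment episode returns (the detail array, here 8 parallel environments). For example, for episode 2020:

import numpy as np

# detail values copied from the episode 2020 log line above
detail = np.array([6131.89531841, 6141.40146106, 6371.71031779, 6146.98573506,
                   6462.2939436, 6572.73806615, 6467.93920164, 6271.53004489])
print(detail.mean())  # ~6320.81, matching the logged Average Reward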

Methods

Agent | version | env | description | demo
--- | --- | --- | --- | ---
C51 | naive | VecEnv | Categorical DQN | demo
QAC | naive | Env | Q Actor-Critic | demo
DDPG | naive | VecEnv | DDPG | demo
TD3 | naive | VecEnv | Twin Delayed DDPG | demo
SAC | naive | VecEnv | Soft Actor-Critic | demo
MCPG | basic | Env | Monte-Carlo REINFORCE | demo
PPO | naive | VecEnv | Proximal Policy Optimization | demo
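
The env column distinguishes agents built on a single gymnasium Env (QAC, MCPG) from those built on a vectorized VecEnv (C51, DDPG, TD3, SAC, PPO); the Usage example below shows the VecEnv case. A minimal sketch of the two environment types, using only gymnasium:

import gymnasium as gym

# Env agents (QAC, MCPG) are constructed with a single environment:
env = gym.make('CartPole-v1')

# VecEnv agents (C51, DDPG, TD3, SAC, PPO) are constructed with a vectorized
# environment, as in the Usage example below:
vec_env = gym.vector.SyncVectorEnv([lambda: gym.make('Hopper-v4') for _ in range(4)])

print(env.observation_space)      # single-env space
print(vec_env.observation_space)  # batched space (leading num_envs dimension)

env.close()
vec_env.close()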

Usage

import gymnasium as gym
from rlearn_dev.methods.sac.naive.agent import SACAgent as Agent

def make_env(env_id, seed, idx, capture_video, run_name):
    def _make_env():
        if capture_video and idx == 0:
            env = gym.make(env_id, render_mode="rgb_array")
            env = gym.wrappers.RecordVideo(
                env,
                f"videos/{run_name}",
                name_prefix=f"sac_{env_id}",
                episode_trigger=lambda episode_idx: episode_idx % 100 == 0,
                video_length=0, # 0 means infinite
            )
        else:
            env = gym.make(env_id)
        env = gym.wrappers.RecordEpisodeStatistics(env)
        env.action_space.seed(seed)
        return env

    return _make_env

def test_sac_naive_autotune():
    capture_video = True
    # capture_video = False
    run_name = 'test'
    num_envs = 5
    env_id = 'Hopper-v4'
    env = gym.vector.SyncVectorEnv([make_env(env_id, 36, i, capture_video, run_name)
                                     for i in range(num_envs)])
    g_seed = 36
    config = {
        'gamma': 0.99,
        'batch_size': 64,
        'tau': 0.005,
        'actor_lr': 0.0003,
        'critic_lr': 0.0003,
        'buffer_size': 100_000,
        'exploration_noise': 0.1,
        'autotune': True, # True
    }
    agent = Agent(env, config=config, seed=g_seed)
    learn_config = {
        'max_episodes': 30_000//256,
        'max_episode_steps': 256,
        'max_total_steps': 1_000_000,
        'verbose_freq': 1,
    }
    agent.learn(**learn_config)
    env.close() 
    
if __name__ == '__main__':
    test_sac_naive_autotune()
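
The example above trains the naive SAC agent with automatic entropy tuning (autotune: True) on 5 parallel Hopper-v4 environments, recording a video every 100 episodes for the first environment only. Training presumably stops when either the episode budget (max_episodes of at most max_episode_steps steps each) or max_total_steps is exhausted, whichever comes first.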

TODO

Methods

  • SACT, SACT-v2
  • PPG (phasic policy gradient)
  • PPO-RND
  • tabular methods

Common

  • add more docs and tests
  • add more papers and references
  • abstract nnblock
  • make EnvPlayer more flexible
  • more unified interfaces
  • self-contained gymnasium-like(v1) envs
  • read source code sb3-off-policy
  • process TimeLimit & modify ReplayBuffer gym-time-limit
  • support multi-agent
  • support gymnasium-v1
  • support distributed training
  • parallel DataCollector

Frequently Asked Questions

Unable to record videos

The latest version of gymnasium depends on a moviepy version that has some issues. Workaround:

1. pip install --upgrade decorator==4.0.2
2. pip uninstall moviepy decorator
3. pip install moviepy

Reference

Common

PPO

DDPG

TD3

SAC

C51

Tutorials for Reinforcement Learning
