A Reinforcement Learning Library [dev]
In the future, this repo will move to rlearn.

rlearn_dev was created because the original rlearn.py was built around custom non-gym environments and carried too much legacy baggage.

Install in editable mode:

```
pip install -e .
```
- draft: Draft version, used for designing algorithms; API not yet stable but working
- naive: Naive version, a straightforward implementation with no or only minor optimizations; usable for benchmarking
- main: Stable version, ready for production, with more docs and tests
Training MuJoCo Ant-v5 with the naive SAC from rlearn_dev, with exploration noise actively added (`deterministic=False`), reaches a score of 6000+.

For comparison, the well-known third-party library ElegantRL reports a benchmark of training the Ant robot with GPU-parallel Isaac Gym environments: a score of 6000 in 3 hours, peaking at 12000 (code is open source).

Training log:
```
2024-11-23 07:45:24 | INFO | Episode 2019/5000 [4134912]: Average Reward: 5611.59677, detail: [6339.41974365 6330.48039567 6412.41894184 6317.87476665 6403.85789968   99.67731813 6430.00267087 6559.04241766]
2024-11-23 07:45:41 | INFO | Episode 2020/5000 [4136960]: Average Reward: 6320.81176, detail: [6131.89531841 6141.40146106 6371.71031779 6146.98573506 6462.2939436  6572.73806615 6467.93920164 6271.53004489]
2024-11-23 07:45:59 | INFO | Episode 2021/5000 [4139008]: Average Reward: 6108.65877, detail: [6366.46026193 6499.5197772  3587.1468893  6650.29738074 6514.41845971 6469.3829467  6313.48620102 6468.5582827 ]
2024-11-23 07:46:17 | INFO | Episode 2022/5000 [4141056]: Average Reward: 6082.90059, detail: [3445.91923514 6416.88338573 6495.99878767 6572.26091719 6577.24471379 6474.37533201 6336.68020873 6343.8421299 ]
2024-11-23 07:46:35 | INFO | Episode 2023/5000 [4143104]: Average Reward: 6054.10073, detail: [6386.18024652 6042.66135112 6577.68305068 6432.44589925 6414.15561133 6458.69392698 3451.87516831 6669.11062416]
2024-11-23 07:46:53 | INFO | Episode 2024/5000 [4145152]: Average Reward: 6399.13154, detail: [6017.71464908 6581.0859381  6549.04273248 6514.8605336  6270.06884443 6471.11665318 6546.70169277 6242.46127681]
```
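Reading the log: each line reports the episode index, the cumulative environment steps in brackets, and the per-environment episode returns in `detail`; the reported average is simply the mean of that array. A small sketch of how such a line could be parsed (the field layout is inferred from the log above; this run apparently used 8 parallel envs with 256 steps per episode, hence 2048 steps per logged episode):

```python
import re
from statistics import mean

line = ("Episode 2019/5000 [4134912]: Average Reward: 5611.59677, "
        "detail: [6339.41974365 6330.48039567 6412.41894184 6317.87476665 "
        "6403.85789968 99.67731813 6430.00267087 6559.04241766]")

m = re.search(r"Episode (\d+)/(\d+) \[(\d+)\]: Average Reward: ([\d.]+), "
              r"detail: \[([^\]]+)\]", line)
episode, total_episodes, total_steps = int(m.group(1)), int(m.group(2)), int(m.group(3))
avg = float(m.group(4))
detail = [float(x) for x in m.group(5).split()]

# the reported average is the mean of the per-env returns
assert abs(mean(detail) - avg) < 1e-4
# the bracketed counter grows by num_envs * steps_per_episode (8 * 256 = 2048 here)
assert total_steps == episode * 2048
```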
state | Agent | version | env | description | demo |
---|---|---|---|---|---|
✅ | C51 | naive | VecEnv | Categorical DQN | demo |
✅ | QAC | naive | Env | Q Actor-Critic | demo |
✅ | DDPG | naive | VecEnv | DDPG | demo |
✅ | TD3 | naive | VecEnv | Twin Delayed DDPG | demo |
✅ | SAC | naive | VecEnv | Soft Actor-Critic | demo |
✅ | MCPG | basic | Env | Monte-Carlo REINFORCE | demo |
✅ | PPO | naive | VecEnv | Proximal Policy Optimization | demo |
Example: training the naive SAC agent on Hopper-v4 with vectorized environments:

```python
import gymnasium as gym
from rlearn_dev.methods.sac.naive.agent import SACAgent as Agent

def make_env(env_id, seed, idx, capture_video, run_name):
    def _make_env():
        if capture_video and idx == 0:
            env = gym.make(env_id, render_mode="rgb_array")
            env = gym.wrappers.RecordVideo(
                env,
                f"videos/{run_name}",
                name_prefix=f"sac_{env_id}",
                episode_trigger=lambda episode_idx: episode_idx % 100 == 0,
                video_length=0,  # 0 means infinite
            )
        else:
            env = gym.make(env_id)
        env = gym.wrappers.RecordEpisodeStatistics(env)
        env.action_space.seed(seed)
        return env
    return _make_env

def test_sac_naive_autotune():
    capture_video = True
    # capture_video = False
    run_name = 'test'
    num_envs = 5
    env_id = 'Hopper-v4'
    env = gym.vector.SyncVectorEnv([make_env(env_id, 36, i, capture_video, run_name)
                                    for i in range(num_envs)])
    g_seed = 36
    config = {
        'gamma': 0.99,
        'batch_size': 64,
        'tau': 0.005,
        'actor_lr': 0.0003,
        'critic_lr': 0.0003,
        'buffer_size': 100_000,
        'exploration_noise': 0.1,
        'autotune': True,
    }
    agent = Agent(env, config=config, seed=g_seed)
    learn_config = {
        'max_episodes': 30_000 // 256,
        'max_episode_steps': 256,
        'max_total_steps': 1000_000,
        'verbose_freq': 1,
    }
    agent.learn(**learn_config)
    env.close()

if __name__ == '__main__':
    test_sac_naive_autotune()
```
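The `autotune: True` flag suggests the entropy temperature α is tuned automatically. Below is a minimal sketch of the standard SAC temperature update from the SAC-v2 paper (rlearn_dev's actual implementation may differ); the 3-D action space and learning rate are hypothetical:

```python
import math

def alpha_update(log_alpha, log_pi, target_entropy, lr=1e-3):
    """One gradient-descent step on the standard SAC temperature loss
    J(alpha) = -alpha * (log_pi + target_entropy), with log_pi held constant."""
    alpha = math.exp(log_alpha)
    grad = -alpha * (log_pi + target_entropy)  # dJ/d(log_alpha)
    return log_alpha - lr * grad

target_entropy = -3.0  # common heuristic: -dim(action space), here a 3-D action
log_alpha = 0.0

# policy entropy below target (log_pi too high) -> temperature should grow
log_alpha_up = alpha_update(log_alpha, log_pi=5.0, target_entropy=target_entropy)
# policy entropy above target -> temperature should shrink
log_alpha_down = alpha_update(log_alpha, log_pi=-8.0, target_entropy=target_entropy)
```

The update pushes α up when the policy is less random than the entropy target and down otherwise, so exploration pressure adapts during training.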
TODO:
- SACT, SACT-v2
- PPG (Phasic Policy Gradient)
- PPO-RND
- tabular methods
- add more docs and tests
- add more papers and references
- abstract nnblock
- make `EnvPlayer` more flexible: more unified interfaces
- self-contained gymnasium-like (v1) envs
- read source code of sb3 off-policy
- process TimeLimit & modify ReplayBuffer (gym-time-limit)
- support multi-agent
- support gymnasium-v1
- support distributed training
- parallel DataCollector
The moviepy version that the latest gymnasium depends on has some issues. A workaround:

```
pip install --upgrade decorator==4.0.2
pip uninstall moviepy decorator
pip install moviepy
```