Reinforcement Learning: Theory and Python Implementation (Chinese edition: 强化学习：原理与Python实现)

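The world's first reinforcement learning tutorial book with accompanying TensorFlow 2 code

China's first printed algorithms book with accompanying TensorFlow 2 code

Side-by-side TensorFlow 2 and PyTorch 1 code now available

This book covers reinforcement learning theory and its Python implementation.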

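Complete theory: The whole book uses a single, consistent mathematical framework to teach the theoretical foundations of reinforcement learning rigorously, and proofs are given for the main theorems. The chapters build on one another step by step and cover all mainstream reinforcement learning algorithms, including non-deep algorithms such as eligibility traces and deep algorithms such as soft actor-critic.
Rich case studies: The reinforcement learning algorithms are implemented with Python 3.10, Gym 0.23, and TensorFlow 2 / PyTorch 1, and run on your favorite operating system (Windows, macOS, or Linux). All implementations follow a uniform style and are compact and lightweight. Chapters 1 to 9 provide implementations of the algorithms whose environments depend only on a minimal installation of Gym, so they run on computers without a GPU. Chapters 10 to 12 present several popular comprehensive case studies that involve a full installation of Gym and custom extensions, and they run on a computer with an ordinary GPU.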

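2020 update

The deep reinforcement learning part of the book now includes side-by-side implementations in TensorFlow 2 and PyTorch 1. Both versions correspond strictly to the pseudocode in the text. They differ only in the agent implementation; the program structure and agent parameters are identical.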

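2022 update

Because GitHub changed how it renders Jupyter Notebooks so that outputs are hidden by default, I have converted the notebooks to HTML files, stored in the ./html folder, so that program outputs are easy to view.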

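Introduction to reinforcement learning (code: useGym)
Markov decision processes (code: useBellman, CliffWalking)
Model-based numerical iteration (code: FrozenLake)
Value iteration with episodic updates (code: Blackjack)
Temporal-difference value iteration (code: Taxi)
Function approximation (code: MountainCar)
Policy gradient with episodic updates (code: CartPole)
Actor-critic methods (code: Acrobot)
Deterministic policies for continuous action spaces (code: Pendulum)
Case study: video games (code: Breakout, Pong, Seaquest)
Case study: board games (code: TicTacToe, Reversi, boardgame2)
Case study: autonomous driving (code: AirSimNH)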

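QQ groups

QQ group: 948110103 (errata reports can be posted here; for other questions, please Google before asking. The group owner and administrators do not provide free consulting.)
Multi-task group: 696984257 (not a beginners' group; covers multi-task reinforcement learning, meta reinforcement learning, lifelong reinforcement learning, and transfer reinforcement learning. Do not post errata reports here, and please Google before asking.)
About the verification question for joining: due to a QQ bug, verification may fail even when the answer is entered correctly. Retrying on another device, switching input methods, or trying again on another day may solve the problem. If the answer contains English letters, note that it is case-sensitive.
The QQ groups listed in the preface of the print edition (935702193 and 243613392) are full and no longer accept new members. Thank you for understanding.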

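Errata and updates

1st edition, 1st printing (August 2019): errata and updates. All pirated copies sold on Pinduoduo are of this printing; we recommend returning them and buying a newer printing from Tmall, JD.com, or Dangdang.
1st edition, 2nd printing (November 2019): errata and updates
1st edition, 3rd printing (December 2019): errata and updates
1st edition, 4th printing (September 2020): errata and updates
1st edition, 5th printing (November 2020): errata and updates
1st edition, 6th printing (January 2021): errata and updates
1st edition, 7th printing (October 2021): errata and updates
No errata or updates are provided for the electronic edition.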

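How to identify the printing of a paper copy / how to determine when a paper copy was printed

The page just before the Preface is titled "图书在版编目（CIP）数据" (Cataloging in Publication data). The table at the bottom of that page contains an entry "版次" (edition/printing), which states when the copy was printed and which printing it belongs to.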

Table of mathematical notation used in this book

Download PDF

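Electronic edition

The book is sold in both print and electronic editions. However, the electronic edition does not come with the accompanying errata and updates, and its formulas are poorly rendered, which makes reading difficult. We therefore recommend the print edition. Platforms selling the electronic edition include, but are not limited to:

Kindle e-book: https://www.amazon.cn/dp/B07X936G34/
JD Reading (京东读书): https://e.jd.com/30513215.html
Huazhang Classroom (华章课堂): http://www.hzcourse.com/web/refbook/detail/8397/226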

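Reader Anesck's notes and commentary on the key concepts of this book

Chapter 1 | Chapter 2 | Chapter 3 | Chapter 4 | Chapter 5 | Chapter 6 | Chapter 7 | Chapter 8 | Chapter 9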

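FAQ

Q: Installing TensorFlow or PyTorch fails on Windows. A: Install Visual Studio 2022 on Windows 10/11 (if an older version of Visual Studio is installed, uninstall it completely first). For more details and other installation problems, please Google. For PyTorch installation, see: https://mp.weixin.qq.com/s/uRx1XOPrfFOdMlRU6I-eyA
Q: Running the code in Visual Studio, Visual Studio Code, or PyCharm fails, for example because the function display() cannot be found. A: The code in this repo is written for the Jupyter Notebook environment and runs only in Jupyter Notebook. We recommend installing the latest version of Anaconda and running the downloaded notebooks directly. (display() is a function that is available only in Jupyter Notebook.) There is no need to install Visual Studio Code or PyCharm. For more details or other errors, please Google.
Q: My results are not exactly the same as the results included in the repo. A: For code in this repo that uses TensorFlow or PyTorch, the included results were all produced on a CPU. GPU computation is inherently not exactly reproducible (please Google for details). Gym 0.22 reimplemented its seeding mechanism; some results were produced with Gym <= 0.21 and have not been rerun with Gym 0.22. If you find abnormal results when running with Gym 0.22 on a CPU, please send an errata report. Thank you.
Q: Will a GPU run faster than a CPU? A: Code that does not use TensorFlow or PyTorch does not use the GPU at all. For code that does, the networks are generally small, so a GPU may actually be slower. To run PyTorch code on a GPU, the Tensor objects must be placed on the GPU (this may require modifying the code).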
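The last answer can be illustrated with a minimal PyTorch sketch. It is not taken from this repo's notebooks: the two-layer network (4 inputs, 2 outputs, roughly the size of a CartPole-v0 Q-network) and the names `net`, `observation`, and `action` are hypothetical. What it shows is that the network's parameters and every input tensor must be on the same device, and that the chosen action is converted back to a plain Python number before it would be handed to Gym.

```python
import torch
import torch.nn as nn

# Use the GPU if PyTorch can see one; otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical small Q-network: 4 state dimensions -> 2 action values.
net = nn.Sequential(
    nn.Linear(4, 64),
    nn.ReLU(),
    nn.Linear(64, 2),
).to(device)  # move the parameters onto the chosen device

# Observations from Gym arrive on the CPU (lists / NumPy arrays), so they
# must be converted to a tensor on the same device as the network.
observation = [0.01, -0.02, 0.03, 0.04]
state = torch.as_tensor(observation, dtype=torch.float32, device=device).unsqueeze(0)

with torch.no_grad():
    q_values = net(state)  # shape (1, 2), stored on `device`

# .item() copies the scalar back to the host as a Python number,
# which is what env.step() expects for a discrete action.
action = int(q_values.argmax(dim=1).item())
print(device, action)
```

On a machine without a visible GPU the same code runs unchanged on the CPU. For networks this small, the per-step cost of copying observations to the GPU and results back can outweigh the speedup of the matrix multiplications.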

Reinforcement Learning: Theory and Python Implementation

The First Reinforcement Learning Tutorial Book with TensorFlow 2 Implementation

Code in both TensorFlow 2 and PyTorch 1

This is a tutorial book on reinforcement learning that covers both the theory and its Python implementation.

Theory: Starting from a uniform mathematical framework, this book derives the theory and algorithms of reinforcement learning, covering all major algorithms, including eligibility traces and soft actor-critic.
Practice: Every chapter is accompanied by high-quality implementations based on Python 3.10, Gym 0.23, and TensorFlow 2 / PyTorch 1. The code in the first 9 chapters depends only on a minimal installation of Gym and can run on a laptop without a GPU. The code in the last 3 chapters can run on a laptop with an ordinary GPU. All code is compatible with Windows, Linux, and macOS.

Please email me if you are interested in publishing this book in other languages. The English version will be published by Springer Nature.

Table of Codes

Chapter	Environment & Closed-Form Policy	Agent
2	CliffWalking-v0	Bellman
3	FrozenLake-v1	DP
4	Blackjack-v1	MC
5	Taxi-v3	SARSA, ExpectedSARSA, QL, DoubleQL, SARSA(λ)
6	MountainCar-v0	SARSA, SARSA(λ), DQN tf torch, DoubleDQN tf torch, DuelDQN tf torch
7	CartPole-v0	VPG tf torch, VPGwBaseline tf torch, OffPolicyVPG tf torch, OffPolicyVPGwBaseline tf torch
8	Acrobot-v1	QAC tf torch, AdvantageAC tf torch, EligibilityTraceAC tf torch, PPO tf torch, NPG tf torch, TRPO tf torch, OffPAC tf torch
9	Pendulum-v1	DDPG tf torch, TD3 tf torch
10	LunarLander-v2	SQL tf torch, SAC tf torch, SACwA tf torch
10	LunarLanderContinuous-v2	SACwA tf torch
11	BipedalWalker-v3	ES, ARS
12	PongNoFrameskip-v4	CategoricalDQN tf torch, QR-DQN tf torch, IQN tf torch
13	BernoulliMAB-v0	UCB
13	GaussianMAB-v0	UCB
14	TicTacToe-v0	AlphaZero tf torch
15	HumanoidBulletEnv-v0	BehaviorClone tf torch, GAIL tf torch
16	Tiger-v0	VI

xs818818/rl-book
