Welcome! This course is jointly taught by UC Berkeley and the Tsinghua-Berkeley Shenzhen Institute (TBSI).
- Prof. Scott Moura (UC Berkeley) <smoura [at] berkeley.edu>
- Co-Instructor Saehong Park (UC Berkeley) <sspark [at] berkeley.edu>
- TA Xinyi Zhou (TBSI) <zxyyx48 [at] 163.com>
China Time | California Time |
---|---|
July 7, 8, 9, 10 (Tu-F) | July 6, 7, 8, 9 (M-Th) |
July 14, 15, 16, 17 (Tu-F) | July 13, 14, 15, 16 (M-Th) |
all at 08:30-10:05 China Time | all at 5:30pm PT - 7:05pm PT |
Day | Topic | Speaker | Pre-recorded Lecture | Slides / Notes | Real-time Lecture Recordings |
---|---|---|---|---|---|
1 | 1a. Introduction - Course Org | Scott Moura | Zoom Recording PW: 1e*OV@Re | LEC1a Slides | Recording Link PW: 9L%JePa= |
1 | 1b. Introduction – History of RL | Scott Moura | Zoom Recording PW: 1k.E69^o | LEC1a Slides | |
1 | 1c. Optimal Control Intro | Scott Moura | Zoom Recording PW: 2B&=2@*@ | | |
2 | 2a. Dynamic Programming | Scott Moura | Zoom Recording PW: 3F*1rg%? | LEC2a Notes | Recording Link PW: 8Q?#51=J |
2 | 2b. Case Study: Linear Quadratic Regulator (LQR) | Scott Moura | Zoom Recording PW: 5Y#4=58& | LEC2b Notes | |
3 | 3a. Policy Evaluation & Policy Improvement | Scott Moura | Zoom Recording PW: 9N@%H4&@ | LEC3a Notes | Recording Link PW: 1A@@0G63 |
3 | 3b. Policy Iteration Algorithm | Scott Moura | Zoom Recording PW: 6y+!+6#9 | LEC3b Notes | |
3 | 3c. Case Study: LQR | Scott Moura | Zoom Recording PW: 6D@YkC&= | LEC3c Notes | |
4 | 4a. Approximate DP: TD Error & Value Function Approx. | Scott Moura | Zoom Recording PW: 6v&78$We | LEC4a Notes | Recording Link PW: 4t=#ye7T |
4 | 4b. Case Study: LQR | Scott Moura | Zoom Recording PW: 1O^fh.8+ | LEC4b Notes | Installation Recording PW: 2s+83!eQ |
4 | 4c. Online RL with ADP | Scott Moura | Zoom Recording PW: 0q=.4378 | LEC4c Notes | |
5 | 5a. Actor-Critic Method | Scott Moura | Zoom Recording PW: 2y!@@#$7 | LEC5a Notes | Recording Link PW: 1Z^6B28+ |
5 | 5b. Case Study: Offshore Wind | Scott Moura | | LEC5b Notes | |
6 | 6a. Markov Decision Process | Saehong Park | Zoom Recording PW: 5L=*%&2i | LEC6 Notes | Recording Link PW: 4L*=91?@ |
6 | 6b. Q-Learning | Saehong Park | Zoom Recording PW: 3K!+fj^V | | |
7 | 7a. Policy Optimization | Saehong Park | Zoom Recording PW: 0W$fa0$M | LEC7a Notes | Recording Link PW: 9j++=3$5 |
7 | 7b. Policy Gradient | Saehong Park | Zoom Recording PW: 2N++5&I3 | LEC7b Notes | |
7 | 7c. Policy Gradient | Saehong Park | Zoom Recording PW: 3j%n80** | LEC7c Notes | |
8 | 8a. Actor Critic | Saehong Park | Zoom Recording PW: 2F!WI9$8 | LEC8a Notes | Recording Link PW: 0W$+=9P* |
8 | 8b. Actor Critic | Saehong Park | Zoom Recording PW: 9r$HH%59 | LEC8b Notes | |
8 | 8c. RL for Energy Systems: Battery Fast-charging | Saehong Park | Zoom Recording PW: 9r$HH%59 | Slides | |
- Optimal Control
- Dynamic Programming
- Principle of Optimality & Value Functions
- Case Study: Linear Quadratic Regulator (LQR)
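For the LQR case study, the dynamic programming recursion specializes to the backward Riccati equation. Below is a minimal numerical sketch for an illustrative double-integrator system (the matrices are assumptions for demonstration, not taken from the course notes):

```python
import numpy as np

# Illustrative discrete-time double integrator: x_{k+1} = A x_k + B u_k,
# stage cost x'Qx + u'Ru, terminal cost x'Qf x (here Qf = Q).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])
N = 50  # horizon length

# Backward Riccati recursion from dynamic programming:
#   K_k = (R + B' P_{k+1} B)^{-1} B' P_{k+1} A
#   P_k = Q + A' P_{k+1} A - A' P_{k+1} B K_k
P = Q.copy()  # P_N = Qf
for k in range(N):
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P = Q + A.T @ P @ A - A.T @ P @ B @ K

# For a long horizon, K approaches the stationary LQR gain, and the
# closed-loop dynamics A - B K should be stable (eigenvalues inside
# the unit circle).
eigs = np.linalg.eigvals(A - B @ K)
```

Running the recursion longer simply drives `P` and `K` closer to the stationary solution of the discrete algebraic Riccati equation.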
- Principle of Optimality & Value Functions
- Policy Evaluation & Policy Improvement
- Policy Iteration Algorithm & Variants
- Case Study: LQR
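The policy evaluation / policy improvement loop above can be sketched on a tiny tabular MDP. The transition probabilities and rewards below are toy numbers chosen for illustration, not from the course notes:

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP:
# P[a][s, s'] = transition probability, r[s, a] = expected reward.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # action 0
    [[0.5, 0.5], [0.1, 0.9]],   # action 1
])
r = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma = 0.9
n_states, n_actions = 2, 2

policy = np.zeros(n_states, dtype=int)  # start with action 0 everywhere
for _ in range(20):
    # Policy evaluation: solve (I - gamma * P_pi) v = r_pi exactly.
    P_pi = np.array([P[policy[s], s] for s in range(n_states)])
    r_pi = np.array([r[s, policy[s]] for s in range(n_states)])
    v = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
    # Policy improvement: greedy one-step lookahead on Q(s, a).
    q = r + gamma * np.array([[P[a, s] @ v for a in range(n_actions)]
                              for s in range(n_states)])
    new_policy = q.argmax(axis=1)
    if np.array_equal(new_policy, policy):
        break  # policy is stable, hence optimal
    policy = new_policy
```

Because evaluation is exact here, the loop terminates in a handful of iterations with a policy that is greedy with respect to its own value function.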
- Approximate Dynamic Programming (ADP)
- Temporal Difference (TD) Error
- Value Function Approximation
- Case Study: LQR
- Online RL with ADP
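The TD error and value function approximation ideas can be sketched with semi-gradient TD(0) and linear features on the classic 5-state random walk (the environment setup here is illustrative; one-hot features make it the tabular special case):

```python
import numpy as np

rng = np.random.default_rng(0)

# 5-state random walk: start in the middle, step left/right uniformly,
# terminate off either end; reward +1 only when exiting to the right.
n_states = 5
phi = np.eye(n_states)   # one-hot features -> V(s) = w @ phi[s]
w = np.zeros(n_states)
alpha, gamma = 0.1, 1.0

for episode in range(2000):
    s = 2
    while True:
        s_next = s + rng.choice([-1, 1])
        if s_next < 0:            # left terminal, reward 0
            r, v_next, done = 0.0, 0.0, True
        elif s_next >= n_states:  # right terminal, reward +1
            r, v_next, done = 1.0, 0.0, True
        else:
            r, v_next, done = 0.0, w @ phi[s_next], False
        # TD error: delta = r + gamma * V(s') - V(s)
        delta = r + gamma * v_next - w @ phi[s]
        w += alpha * delta * phi[s]   # semi-gradient update
        if done:
            break
        s = s_next

# True values for this walk are V(s) = (s + 1) / 6.
```

With a constant step size the estimates fluctuate around the true values; a decaying step size would converge exactly.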
- Actor-Critic Method
- Case Study: Offshore Wind
- Q-Learning
- Q-learning algorithm
- Advanced Q-learning algorithms, e.g., Deep Q-Networks (DQN)
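Tabular Q-learning, the basis for DQN, can be sketched on a toy deterministic chain (the environment and hyperparameters below are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 4-state chain: action 1 moves right, action 0 moves left
# (saturating at the ends); reaching the last state pays +1 and resets.
n_states, n_actions = 4, 2
gamma, alpha, eps = 0.9, 0.5, 0.3
Q = np.zeros((n_states, n_actions))

s = 0
for step in range(20000):
    # Epsilon-greedy behavior policy.
    a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    r = 1.0 if s_next == n_states - 1 else 0.0
    # Off-policy update toward the greedy target max_a' Q(s', a').
    # The terminal state is never updated, so bootstrapping from it
    # contributes zero, as intended.
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    s = 0 if s_next == n_states - 1 else s_next

greedy = Q.argmax(axis=1)  # should move right in every non-terminal state
```

Since the chain is deterministic, the Q-values converge to their exact fixed point, e.g. Q(s=2, right) = 1 and Q(s=0, right) = 0.9².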
- Policy Gradient
- Policy Optimization
- Vanilla policy gradient (REINFORCE)
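Vanilla policy gradient (REINFORCE) can be sketched on a toy two-armed bandit with a softmax policy; one-step episodes make the return equal to the immediate reward. The reward means and step sizes are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

theta = np.zeros(2)           # policy parameters: one logit per action
means = np.array([0.2, 0.8])  # illustrative mean rewards
alpha = 0.1
baseline = 0.0                # running-average baseline reduces variance

for episode in range(3000):
    # Softmax policy pi(a) = exp(theta_a) / sum_b exp(theta_b).
    probs = np.exp(theta - theta.max())
    probs /= probs.sum()
    a = rng.choice(2, p=probs)
    G = means[a] + 0.1 * rng.standard_normal()   # noisy return
    # grad log pi(a | theta) for softmax: one_hot(a) - probs.
    grad_log = -probs
    grad_log[a] += 1.0
    # REINFORCE update: theta += alpha * (G - b) * grad log pi(a).
    theta += alpha * (G - baseline) * grad_log
    baseline += 0.05 * (G - baseline)

probs = np.exp(theta - theta.max())
probs /= probs.sum()
```

After training, the policy should place most of its probability on the better arm; the baseline does not bias the gradient, it only reduces its variance.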
- Actor-Critic using Policy Gradient
- Actor-Critic using Policy Gradient
- Advanced actor-critic algorithms, e.g., Deep Deterministic Policy Gradient (DDPG)
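A one-step actor-critic combines the two previous ideas: a softmax actor updated by the policy gradient, with a TD(0) critic whose TD error serves as the advantage estimate. The toy chain environment and step sizes below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy 4-state chain: action 1 moves right, action 0 moves left;
# reaching the last state pays +1 and resets to state 0.
n_states, n_actions = 4, 2
gamma = 0.9
theta = np.zeros((n_states, n_actions))   # actor logits
V = np.zeros(n_states)                    # critic (tabular)
alpha_actor, alpha_critic = 0.1, 0.2

s = 0
for step in range(20000):
    logits = theta[s] - theta[s].max()
    probs = np.exp(logits)
    probs /= probs.sum()
    a = rng.choice(n_actions, p=probs)
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    done = s_next == n_states - 1
    r = 1.0 if done else 0.0
    # Critic TD error doubles as the actor's advantage signal.
    delta = r + (0.0 if done else gamma * V[s_next]) - V[s]
    V[s] += alpha_critic * delta
    grad_log = -probs
    grad_log[a] += 1.0
    theta[s] += alpha_actor * delta * grad_log
    s = 0 if done else s_next
```

After training, the actor should prefer moving right in every non-terminal state, and the critic's values should increase toward the goal.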
- RL for energy systems
- Case Study: Battery Fast-charging
- 2020 Lecture Notes [Updated 2020-7-16]
- 2019 Lecture Notes
- TensorFlow review [Updated 2020-7-12]
- Homework [Updated 2020-7-16]