This repository contains materials for the workshop sessions of the Deep Reinforcement Learning (DRL) course taught at Sharif University of Technology (SUT) in fall 2024. Each week covers a different aspect of reinforcement learning, from basic concepts to advanced topics.
Week | Topic | Methods | Environment | Action Space | Key Tools |
---|---|---|---|---|---|
1 | Python Basics | N/A | N/A | N/A | Python basics, libraries setup |
2 | Agents and Environments | Custom RL framework, OpenAI Gym | N/A | N/A | OpenAI Gym |
3 | Tabular-based RL | DP (Value/Policy Iteration), MC | FrozenLake | Discrete | NumPy, OpenAI Gym |
4 | Temporal Difference (TD) Methods | SARSA, Q-Learning, Expected SARSA | Cliff-World | Discrete | NumPy, Matplotlib |
5 | Function Approximation (Linear) | Linear Approximation | CartPole | Discrete | scikit-learn (RBFSampler) |
6 | Tile Coding and ANNs | Tile Coding, ANN with Dropout | MountainCar | Discrete | scikit-learn, Keras |
7 | CNNs and LSTMs | CNN, LSTM | CIFAR, Tesla Stock Prices | N/A | Keras, TensorFlow |
8 | Optimal Control (ADP) | ADP, On/Off-policy, LQR Gains | Quanser Helicopter | Continuous | Custom simulators |
9 | Advanced ADP and Deep Value-based RL | Weighted Residuals, DQN | CartPole | Discrete | TensorFlow, OpenAI Gym |
10 | Deep Value-based RL Extensions | Double DQN, Dueling DQN, D3QN with PER | CartPole | Discrete | TensorFlow, OpenAI Gym |
11 | Policy Gradient Methods 1 | REINFORCE, VPG | CartPole | Discrete | TensorFlow, Keras |
12 | Policy Gradient Methods 2 | Actor-Critic, A3C, A2C | CartPole, Pendulum | Discrete/Continuous | TensorFlow, Threading |
13 | Advanced Actor-Critic 1 | DDPG, TD3 | Pendulum | Continuous | TensorFlow, Keras |
14 | Advanced Actor-Critic 2 | SAC, PPO-Clip | Pendulum | Continuous | TensorFlow, Keras |
## Course Outline
- Chapter 1: Introduction
  - Python fundamentals for reinforcement learning
  - Basic programming concepts and tools needed for the course
  - Part 1: Building a simple RL framework from scratch
  - Part 2: Introduction to OpenAI Gym
  - Understanding the interaction between agents and environments
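The agent-environment loop at the heart of every later week can be sketched in plain Python. The toy environment below mimics the Gym-style `reset`/`step` convention but is a simplified stand-in, not the actual Gym API (Gym's `step` also returns an info dict, and newer versions split the done flag):

```python
import random

class CoinFlipEnv:
    """Toy environment: guess a coin flip; the episode ends after 10 steps."""
    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return 0  # a single dummy state

    def step(self, action):
        self.t += 1
        reward = 1.0 if action == random.randint(0, 1) else 0.0
        done = self.t >= 10
        return 0, reward, done  # next state, reward, episode-finished flag

class RandomAgent:
    """Agent that picks one of two actions uniformly at random."""
    def act(self, state):
        return random.randint(0, 1)

env, agent = CoinFlipEnv(), RandomAgent()
state, done, total = env.reset(), False, 0.0
while not done:                       # one full episode of interaction
    action = agent.act(state)
    state, reward, done = env.step(action)
    total += reward
print(f"episode return: {total}")
```

Every algorithm in the course plugs a smarter `act` (and a learning update) into this same loop.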
- Chapter 2: Tabular-based Reinforcement Learning
  - Part 1: Dynamic Programming
    - Value Iteration
    - Policy Iteration
    - Implementation on the FrozenLake environment
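Value iteration reduces to a few NumPy lines once the transition and reward tables are given. The sketch below uses a hand-made two-state MDP (the numbers are illustrative, not the actual FrozenLake tables):

```python
import numpy as np

# Toy MDP: P[s, a, s'] = transition probability, R[s, a] = expected reward.
P = np.array([[[0.8, 0.2], [0.2, 0.8]],
              [[0.0, 1.0], [1.0, 0.0]]])
R = np.array([[0.0, 0.0],
              [1.0, 0.0]])
gamma = 0.9

V = np.zeros(2)
for _ in range(500):
    # Bellman optimality backup: V(s) <- max_a [R(s,a) + gamma * sum_s' P(s,a,s') V(s')]
    Q = R + gamma * P @ V            # shape (S, A); P @ V sums over s'
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:  # stop once the values have converged
        V = V_new
        break
    V = V_new

policy = Q.argmax(axis=1)            # greedy policy w.r.t. the converged values
```

Policy iteration alternates the analogous policy-evaluation and greedy-improvement steps instead of folding them into one backup.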
  - Part 2: Monte Carlo Methods
    - On-policy first-visit MC
    - Off-policy MC with importance sampling
    - Applications on the FrozenLake environment
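First-visit MC prediction averages, for each state, the returns observed after that state's first occurrence in each episode. A self-contained sketch on an illustrative random-walk chain (not the actual FrozenLake dynamics):

```python
import random
from collections import defaultdict

random.seed(0)

def run_episode():
    """Random walk on states 0..4 starting at 2; +1 reward for reaching state 4."""
    s, traj = 2, []
    while s not in (0, 4):
        s_next = s + random.choice((-1, 1))
        traj.append((s, 1.0 if s_next == 4 else 0.0))
        s = s_next
    return traj

gamma = 1.0
returns = defaultdict(list)
for _ in range(5000):
    traj = run_episode()
    # compute the return following every step, walking backwards
    G, Gs = 0.0, [0.0] * len(traj)
    for t in reversed(range(len(traj))):
        G = traj[t][1] + gamma * G
        Gs[t] = G
    # first-visit: record the return only at a state's first occurrence
    seen = set()
    for t, (s, _) in enumerate(traj):
        if s not in seen:
            seen.add(s)
            returns[s].append(Gs[t])

V = {s: sum(g) / len(g) for s, g in returns.items()}
```

For this symmetric chain the true values are V(s) = s/4, so the estimates should land near 0.25, 0.5, 0.75. The off-policy variant reweights each return by a product of importance-sampling ratios instead of averaging directly.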
  - Implementation of various TD algorithms on Cliff-world:
    - SARSA
    - Q-Learning
    - Expected SARSA
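The three algorithms share the same TD update and differ only in the bootstrap target. A sketch on a toy Q-table (all numbers below are hypothetical, just to show the three targets side by side):

```python
import numpy as np

rng = np.random.default_rng(0)
Q = rng.random((4, 2))             # toy action-value table: 4 states, 2 actions
alpha, gamma, eps = 0.1, 0.99, 0.1
s, a, r, s2, a2 = 0, 1, 1.0, 3, 0  # one transition; a2 is the eps-greedy next action

# SARSA: bootstrap with the action actually taken next (on-policy)
target_sarsa = r + gamma * Q[s2, a2]
# Q-Learning: bootstrap with the greedy action (off-policy)
target_qlearn = r + gamma * Q[s2].max()
# Expected SARSA: bootstrap with the eps-greedy expectation over next actions
probs = np.full(2, eps / 2)
probs[Q[s2].argmax()] += 1 - eps
target_esarsa = r + gamma * probs @ Q[s2]

# the TD update itself is identical; only the target changes
Q[s, a] += alpha * (target_qlearn - Q[s, a])
```

Since the greedy target is a maximum, it always upper-bounds both the SARSA sample and the expected-SARSA average for the same Q-table.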
- Chapter 3: Function Approximation Methods
  - Linear approximation of the action-value function
  - Implementation on CartPole using RBFSampler from scikit-learn
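A rough sketch of the approach: project raw states through scikit-learn's `RBFSampler` (random Fourier features), then learn one linear head per action on top. Shapes and hyperparameters here are illustrative, not the workshop's actual settings:

```python
import numpy as np
from sklearn.kernel_approximation import RBFSampler

rng = np.random.default_rng(0)
rbf = RBFSampler(gamma=1.0, n_components=100, random_state=0)
states = rng.standard_normal((32, 4))   # a fake batch of 4-d CartPole-like states
feats = rbf.fit_transform(states)       # shape (32, 100)

n_actions = 2
W = np.zeros((100, n_actions))          # one linear weight vector per action
q_values = feats @ W                    # Q(s, a) = phi(s)^T w_a

# semi-gradient update toward a TD target for one (state, action) sample
alpha, a, target = 0.01, 1, 1.0
phi = feats[0]
W[:, a] += alpha * (target - phi @ W[:, a]) * phi
```

Because Q is linear in the features, the semi-gradient is just the feature vector itself, which is what makes this family of methods cheap and stable.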
  - Part 1: Tile Coding
    - Implementation on the MountainCar environment
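Tile coding overlays several offset grids ("tilings") so that each input activates one tile per tiling. The sketch below is a simplified 1-D illustration (MountainCar states are 2-D, and production tile coders hash indices), with made-up sizes:

```python
import numpy as np

N_TILINGS, N_TILES = 4, 8

def tile_indices(x, lo=0.0, hi=1.0):
    """Active tile index in each tiling for a scalar input in [lo, hi]."""
    width = (hi - lo) / N_TILES
    idxs = []
    for t in range(N_TILINGS):
        offset = t * width / N_TILINGS           # each tiling is shifted slightly
        i = int((x - lo + offset) / width)
        i = min(max(i, 0), N_TILES)              # clamp at the edges
        idxs.append(t * (N_TILES + 1) + i)       # unique flat index per tiling
    return idxs

w = np.zeros(N_TILINGS * (N_TILES + 1))

def value(x):
    """The value estimate is the sum of the active tiles' weights."""
    return sum(w[i] for i in tile_indices(x))

# one gradient step toward a target: the active tiles share the error equally
alpha, x, target = 0.1, 0.3, 1.0
err = target - value(x)
for i in tile_indices(x):
    w[i] += alpha / N_TILINGS * err
```

The offsets give generalization between nearby inputs while keeping every update a handful of table writes.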
  - Part 2: Artificial Neural Networks with Keras
    - House price prediction
    - Dropout techniques for overfitting prevention
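What a Keras `Dropout` layer does during training can be illustrated with a small NumPy sketch of inverted dropout (the scaling convention Keras uses); the layer size and drop rate here are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, p_drop=0.5, training=True):
    """Inverted dropout: zero units with probability p_drop during training and
    rescale the survivors by 1/(1-p_drop), so the expected activation matches
    evaluation mode, where the layer is the identity."""
    if not training:
        return x
    mask = rng.random(x.shape) >= p_drop   # keep each unit with prob 1 - p_drop
    return x * mask / (1.0 - p_drop)

h = np.ones(10_000)                        # a fake layer of activations
h_train = dropout(h, p_drop=0.5)           # roughly half zeros, half 2.0s
```

Randomly silencing units prevents co-adaptation, which is why dropout acts as a regularizer against overfitting.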
  - Convolutional Neural Networks (CNNs)
    - Implementation with Keras on the CIFAR dataset
  - Long Short-Term Memory (LSTM) networks
    - Tesla stock price prediction
- Chapter 4: Optimal Control
  - Approximate Dynamic Programming (ADP)
    - Continuous-time systems
    - Case study: Quanser helicopter
  - Linear Quadratic Regulator (LQR)
    - Finding gains using ADP (on-policy and off-policy approaches)
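For a known linear model, the LQR gain can be found by iterating the Riccati equation to its fixed point; the sketch below uses the discrete-time recursion on made-up system matrices (the workshop's ADP treatment of the continuous-time, model-free case differs, but the gain being sought is the analogue of this K):

```python
import numpy as np

# Illustrative double-integrator-like system x_{k+1} = A x_k + B u_k
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)            # state cost
R = np.array([[1.0]])    # control cost

# Iterate the discrete-time Riccati equation until P stops changing
P = np.eye(2)
for _ in range(1000):
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)   # gain at this iterate
    P_next = Q + A.T @ P @ (A - B @ K)
    if np.allclose(P_next, P, atol=1e-10):
        P = P_next
        break
    P = P_next

K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)       # LQR gain: u = -K x
```

The resulting feedback u = -Kx places all closed-loop eigenvalues of A - BK inside the unit circle, which is what the stability test below checks.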
  - Part 1: ADP using the weighted residuals method
- Chapter 5: Deep Value-based RL
  - Part 2: Deep Q-Network (DQN) implementation
    - Solving the CartPole environment
  - Double DQN
  - Dueling Double DQN
  - Sumtree for prioritized experience replay (PER)
  - D3QN with PER to solve CartPole
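The data structure behind PER is a binary sum-tree: leaves hold per-transition priorities, internal nodes hold subtree sums, so sampling proportional to priority costs O(log n). A minimal sketch (capacity assumed a power of two; a full replay buffer would also store the transitions themselves and importance-sampling weights):

```python
import random

class SumTree:
    """Sum-tree over priorities in a 1-based heap layout: root at index 1,
    leaves at indices [capacity, 2 * capacity)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.tree = [0.0] * (2 * capacity)

    def update(self, idx, priority):
        """Set leaf idx's priority and propagate the change up to the root."""
        i = idx + self.capacity
        self.tree[i] = priority
        i //= 2
        while i >= 1:
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]
            i //= 2

    def total(self):
        return self.tree[1]            # sum of all priorities

    def sample(self, value):
        """Descend from the root to the leaf whose interval covers value in [0, total)."""
        i = 1
        while i < self.capacity:
            left = 2 * i
            if value < self.tree[left]:
                i = left
            else:
                value -= self.tree[left]
                i = left + 1
        return i - self.capacity

tree = SumTree(4)
for idx, p in enumerate([1.0, 2.0, 3.0, 4.0]):
    tree.update(idx, p)
sampled = tree.sample(random.uniform(0.0, tree.total()))
```

Drawing `value` uniformly from [0, total) makes leaf i land with probability p_i / total, exactly the proportional prioritization PER needs.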
- Chapter 6: Policy Gradient Methods
  - Introduction to TensorFlow
  - REINFORCE and Vanilla Policy Gradient (VPG) methods to solve CartPole
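The REINFORCE update for a softmax policy can be sketched on a two-armed bandit (an illustrative task, not CartPole; the learning rate, episode count, and reward model are all made up):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

true_means = np.array([0.0, 1.0])   # arm 1 pays more on average
theta = np.zeros(2)                 # policy parameters: one logit per arm
alpha = 0.1

for _ in range(2000):
    probs = softmax(theta)
    a = rng.choice(2, p=probs)
    G = true_means[a] + rng.normal(0, 0.1)   # sampled return for this "episode"
    # for a softmax policy, grad log pi(a) = one_hot(a) - probs
    grad_log_pi = -probs
    grad_log_pi[a] += 1.0
    theta += alpha * G * grad_log_pi         # REINFORCE: ascend G * grad log pi(a)

probs = softmax(theta)
```

After training, nearly all probability mass should sit on the better arm. VPG adds a learned baseline to the return to cut the variance of this same estimator.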
  - Introduction to Actor-Critic
  - Threading and multiprocessing in Python
  - Asynchronous Advantage Actor-Critic (A3C)
  - Advantage Actor-Critic (A2C)
- Chapter 7: Advanced Actor-Critic
  - Deep Deterministic Policy Gradient (DDPG)
  - Twin Delayed Deep Deterministic Policy Gradient (TD3)
  - Soft Actor-Critic (SAC)
  - Proximal Policy Optimization (PPO) with clipping
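The PPO clipped surrogate objective fits in a few lines of NumPy; the ratios and advantages below are fabricated purely to exercise both branches of the clip:

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """PPO clipped surrogate (to be maximized):
    mean over the batch of min(r * A, clip(r, 1-eps, 1+eps) * A),
    where r = pi_new(a|s) / pi_old(a|s)."""
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    return np.minimum(ratio * advantage, clipped * advantage).mean()

ratio = np.array([0.5, 1.0, 1.5, 2.0])   # fake probability ratios
adv = np.array([1.0, 1.0, 1.0, -1.0])    # fake advantage estimates
loss = ppo_clip_loss(ratio, adv)
```

The clip caps the incentive to push the ratio past 1 + eps when the advantage is positive, while the outer min keeps the objective a pessimistic bound when the advantage is negative; that is what makes large destabilizing policy steps unprofitable.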
## Prerequisites
- Basic Python programming knowledge
- Understanding of basic machine learning concepts
- Familiarity with neural networks
- Basic understanding of control theory
## Getting Started
- Clone this repository
- Install required dependencies (requirements.txt will be provided)
- Follow the weekly materials in order
- Complete the TODO sections, exercises, and assignments
## Requirements
- Python 3.x
- OpenAI Gym
- TensorFlow/Keras
- scikit-learn
- NumPy
- Matplotlib
- Other specific requirements will be listed in requirements.txt
## Contributing
If you find any issues or have suggestions for improvements, please feel free to open an issue or submit a pull request.
## License
This repository is licensed under the MIT License.