A mini-project to build a deep RL agent that plays Pong in the OpenAI Gym. It was made at the CCNSS summer school.

Immiora/deepPongRL

 
 

DeepPongRL

Training of the RL agent using policy gradients.

Training was done in the OpenAI Gym environment with Python 2.7.

Used packages:

  • OpenAI Gym
  • Keras (TF backend)

Example trained

Problem statement: We would like to train an RL agent to win at the game of Pong.

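Policy gradients: A. Karpathy described a policy gradient algorithm for training an RL agent on Pong. The agent learns by processing pixel information from each frame of the game. The policy, prob(action|image), is improved by weighting the network's gradient for each action taken by the reward that followed it, so that actions leading to positive reward become more likely and actions leading to negative reward become less likely.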
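The reward weighting described above can be sketched in plain Python. This is a minimal illustration, not the code from this repository: the function names, the two-action (UP/DOWN) sigmoid policy, the discount factor of 0.99, and the reset of the running sum at each scored point are all assumptions borrowed from the general approach in Karpathy's write-up.

```python
def discount_rewards(rewards, gamma=0.99):
    """Return the discounted return for each time step of one episode.

    Assumption: a non-zero reward marks the end of a Pong point, so the
    running sum is reset there and credit does not leak across points.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        if rewards[t] != 0:
            running = 0.0  # point boundary
        running = running * gamma + rewards[t]
        returns[t] = running
    return returns


def policy_gradient_signals(probs_up, actions_up, returns):
    """Weight the log-probability gradient by the return that followed.

    probs_up[t]   : policy output, probability of moving UP at step t
    actions_up[t] : 1 if UP was sampled, 0 if DOWN was sampled
    returns[t]    : discounted return from discount_rewards

    For a sigmoid output, d log p(action) / d logit = action - prob.
    """
    return [(a - p) * g for p, a, g in zip(probs_up, actions_up, returns)]


if __name__ == "__main__":
    # Toy episode: the agent wins one point, then loses one.
    rewards = [0, 1, 0, -1]
    returns = discount_rewards(rewards, gamma=0.5)
    print(returns)
    print(policy_gradient_signals([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0], returns))
```

The resulting per-step signals would be backpropagated through the network in place of a supervised error, which is what makes rewarded actions more probable.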

Deep Q-learning: As an alternative to policy gradients, we looked at the deep Q-learning algorithm.
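For contrast with policy gradients, the core of Q-learning is a one-step target that the Q-network is regressed towards, usually combined with epsilon-greedy exploration. The sketch below shows only these two pieces under stated assumptions: the helper names, the discount factor, and the exploration scheme are illustrative and are not taken from this repository.

```python
import random


def q_learning_target(reward, next_q_values, done, gamma=0.99):
    """One-step Q-learning target: r + gamma * max over a' of Q(s', a').

    next_q_values holds the network's Q estimates for the next state.
    At the end of an episode there is no next state, so the target is r.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_values)


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


if __name__ == "__main__":
    print(q_learning_target(1.0, [0.0, 2.0], done=False, gamma=0.5))
    print(epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0))
```

Unlike the policy gradient update, this target bootstraps from the network's own estimates of future value rather than from the observed episode return.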

Best model performance: One of our best models (policy gradient, MLP with 1 hidden layer of 200 units, ReLU activation) learnt to beat the built-in AI agent in the OpenAI Gym Pong environment.
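The shape of such a network can be illustrated with a dependency-free forward pass. The project itself used Keras; the plain-Python version below is only a sketch, and the single sigmoid output (probability of moving UP), the weight initialisation, and the function names are assumptions rather than details confirmed by this repository.

```python
import math
import random


def init_mlp(n_inputs, n_hidden=200, seed=0):
    """Create weights for an MLP with one hidden layer (hypothetical init)."""
    rng = random.Random(seed)
    scale_in = 1.0 / math.sqrt(n_inputs)
    scale_hidden = 1.0 / math.sqrt(n_hidden)
    w1 = [[rng.gauss(0.0, scale_in) for _ in range(n_inputs)]
          for _ in range(n_hidden)]
    w2 = [rng.gauss(0.0, scale_hidden) for _ in range(n_hidden)]
    return w1, w2


def forward(x, w1, w2):
    """Return (probability of UP, hidden activations) for one input vector."""
    # Hidden layer: linear map followed by ReLU.
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w1]
    # Output layer: a single logit squashed by a sigmoid.
    logit = sum(w * h for w, h in zip(w2, hidden))
    p_up = 1.0 / (1.0 + math.exp(-logit))
    return p_up, hidden


if __name__ == "__main__":
    # Tiny input for illustration; a real input would be a flattened frame.
    w1, w2 = init_mlp(n_inputs=4, n_hidden=200)
    p_up, hidden = forward([1.0, 0.0, 0.5, 0.0], w1, w2)
    print(p_up, len(hidden))
```

At each frame the agent would sample UP with probability `p_up` and DOWN otherwise.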

Best model training: The policy gradient model was trained for ~5000 episodes and almost reached a reward of 0, the point at which the agent scores as many points as the built-in opponent.
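Per-episode rewards in Pong are noisy, so progress towards a reward of 0 is easier to read from a smoothed curve. How the training curve here was smoothed is not stated; the exponential running mean below is one common choice, with a hypothetical function name and decay value.

```python
def running_mean(episode_rewards, decay=0.99):
    """Exponentially smoothed reward, one value per episode."""
    smoothed = []
    running = None
    for r in episode_rewards:
        # The first episode initialises the average; later ones blend in.
        running = r if running is None else decay * running + (1.0 - decay) * r
        smoothed.append(running)
    return smoothed


if __name__ == "__main__":
    print(running_mean([-21, -19, -20, -15], decay=0.5))
```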

Model comparison: In a preliminary model comparison, we observed that shallower models (1 hidden layer) converged faster, although final performance depends on the number of training episodes. We also saw that policy gradient training was faster and more gradual than deep Q-learning.

Training details: Our final note concerns the difference in computation time between GPU and CPU. We found that smaller networks trained faster on a CPU than on a GPU.
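A comparison like this can be reproduced by timing the same training step on each device. The helper below is a hypothetical sketch, not the measurement code used for this project; `time_call` and the placeholder workload are illustrative names. With a TensorFlow backend, one way to force a CPU run is to set the `CUDA_VISIBLE_DEVICES` environment variable to an empty string before TensorFlow is imported.

```python
import time


def time_call(fn, repeats=5):
    """Return the best wall-clock time in seconds over several calls to fn."""
    best = float("inf")
    for _ in range(repeats):
        start = time.time()
        fn()
        best = min(best, time.time() - start)
    return best


if __name__ == "__main__":
    # Placeholder workload; replace with one training step of the model.
    print(time_call(lambda: sum(i * i for i in range(100000))))
```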
