Comparison of Deep Q-Network and Proximal Policy Optimization on Simple Pong Environment

sinanutkuulu/Reinforcement-Learning-in-Pong-DQN

Pong-Game-DQN

Analysis of Deep Q-Network algorithm on Simple Pong Environment

State Representation

The agent's state is defined by the following six variables:

  • Position of the left paddle on the y-axis
  • Position of the right paddle on the y-axis
  • Position of the ball on the y-axis
  • Position of the ball on the x-axis
  • Velocity of the ball on the x-direction
  • Velocity of the ball on the y-direction
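
The six state variables above can be packed into a single observation vector. The following is a minimal sketch; the function and argument names are illustrative assumptions, not necessarily the identifiers used in this repository:

```python
import numpy as np

def get_state(left_paddle_y, right_paddle_y, ball_x, ball_y, ball_vx, ball_vy):
    """Pack the six state variables into one observation vector.

    Ordering follows the list above: left paddle y, right paddle y,
    ball y, ball x, ball x-velocity, ball y-velocity.
    (Hypothetical helper; names are assumptions.)
    """
    return np.array(
        [left_paddle_y, right_paddle_y, ball_y, ball_x, ball_vx, ball_vy],
        dtype=np.float32,
    )
```

A fixed ordering and dtype keep the observation compatible with the input layer of the Q-network.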

Reward Function Definition

A set of predefined constants defines the rewards and penalties for the different in-game events as follows:

  1. Game end condition: If the game concludes, the function checks the outcome:
  • If the agent scores, a reward of +10 is granted.
  • If the opponent scores, a penalty of -10 is applied.
  2. In-game rewards/penalties: For ongoing games:
  • The function first checks for a collision between the ball and the agent's paddle. If the ball hits the center of the paddle, a reward of +0.1 is given; otherwise, a penalty of -0.1 is applied.
  • If there is no collision, the function evaluates whether the agent is moving toward or away from the ball, using the change in vertical distance between the ball and the paddle. Moving toward the ball earns a reward of +0.5; moving away incurs a penalty of -0.5.
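
The reward rules above can be sketched as a single function. This is a hedged illustration of the described logic, not the repository's actual code; the flag and distance parameters are assumed names:

```python
def compute_reward(game_over, agent_scored, hit_paddle, hit_center,
                   prev_dist, curr_dist):
    """Reward sketch following the rules described above.

    Hypothetical signature: `prev_dist`/`curr_dist` are the vertical
    distances between ball and paddle before and after the agent's move.
    """
    # Game end: +10 if the agent scored, -10 if the opponent did.
    if game_over:
        return 10.0 if agent_scored else -10.0
    # Paddle collision: +0.1 for hitting the center, -0.1 otherwise.
    if hit_paddle:
        return 0.1 if hit_center else -0.1
    # No collision: +0.5 for moving toward the ball, -0.5 for moving away.
    return 0.5 if curr_dist < prev_dist else -0.5
```

The small shaping rewards (±0.1, ±0.5) give the agent a dense learning signal between the sparse ±10 scoring events.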

Results

(Figure: results screenshot, 2023-08-31)

Test of agent (DQN) against nominal player

(Figure: test screenshot, 2023-08-31)
