This repository implements reinforcement learning algorithms for the classic side-scrolling platform game Dangerous Dave. The game used here is a Python clone of the original, available at mwolfart/dangerous-dave.
Dangerous Dave is a challenging side-scrolling platform game in which the player controls the character Dave as he navigates levels filled with enemies and obstacles. In this project, we apply reinforcement learning techniques to train an agent to play Dangerous Dave effectively. Drawing inspiration from RL techniques that succeeded on similar hard-exploration games, such as Montezuma's Revenge, we focus on improving exploration with Random Network Distillation (RND) and on stable, efficient policy learning with Proximal Policy Optimization (PPO).
Here's a demo of the trained agent playing Dangerous Dave:
- Clone the repository:
git clone https://github.com/suvigyavijay/dangerous-dave-rl.git
- Navigate to the project directory:
cd dangerous-dave-rl
- Create a virtual environment and install the dependencies:
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
To test the game, run the following command:
python game.py
To test the environment, run the following command:
python env.py
To train the reinforcement learning agent, run the following command:
python agent.py --train --model-type ppo
The repository currently implements the following reinforcement learning algorithms:
- Proximal Policy Optimization (PPO): A policy gradient method that aims to optimize the policy in a stable manner by constraining the update step.
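- Random Network Distillation (RND): An exploration strategy that rewards novelty. A predictor network is trained to match the output of a fixed, randomly initialized target network, and its prediction error on a state serves as the intrinsic reward.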
- algo.cfg: Configuration file for the RL algorithms, specifying parameters such as total timesteps and common settings.
- game.cfg: Configuration file for the game environment, specifying details such as map dimensions, reward structures, and termination conditions.
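The two algorithms listed above can be illustrated with a small, dependency-free sketch. It is not the code in algos/ppo.py or algos/rnd.py: the names `ppo_clipped_objective` and `TinyRND` are hypothetical, the "networks" are plain linear maps, and the clip coefficient of 0.2 is an assumed common default rather than a value read from algo.cfg.

```python
import random


def ppo_clipped_objective(ratios, advantages, clip_coef=0.2):
    """Mean PPO clipped surrogate objective (a quantity to maximize).

    ratios[i] is pi_new(a|s) / pi_old(a|s) for sample i.
    clip_coef=0.2 is an assumed default, not taken from algo.cfg.
    """
    terms = []
    for r, a in zip(ratios, advantages):
        clipped_r = max(1.0 - clip_coef, min(r, 1.0 + clip_coef))
        # Taking the minimum removes any gain from moving the ratio
        # outside the [1 - clip_coef, 1 + clip_coef] interval.
        terms.append(min(r * a, clipped_r * a))
    return sum(terms) / len(terms)


class TinyRND:
    """Hypothetical linear stand-in for RND.

    The target weights are random and never trained; the predictor is
    trained to match the target, and its error is the intrinsic reward.
    """

    def __init__(self, obs_dim, feat_dim=4, lr=0.05, seed=0):
        rng = random.Random(seed)
        self.target = [[rng.gauss(0.0, 1.0) for _ in range(obs_dim)]
                       for _ in range(feat_dim)]
        self.predictor = [[0.0] * obs_dim for _ in range(feat_dim)]
        self.lr = lr

    @staticmethod
    def _forward(weights, obs):
        return [sum(w * x for w, x in zip(row, obs)) for row in weights]

    def intrinsic_reward(self, obs):
        """Mean squared error between predictor and target features."""
        t = self._forward(self.target, obs)
        p = self._forward(self.predictor, obs)
        return sum((pi - ti) ** 2 for pi, ti in zip(p, t)) / len(t)

    def update(self, obs):
        """One gradient step on the predictor only; the target stays frozen."""
        t = self._forward(self.target, obs)
        p = self._forward(self.predictor, obs)
        for i, row in enumerate(self.predictor):
            err = p[i] - t[i]
            for j, x in enumerate(obs):
                row[j] -= self.lr * 2.0 * err * x / len(t)
```

Repeatedly calling `update` on the same observation shrinks its intrinsic reward, while observations the predictor has not been trained on keep a high reward. That difference is what pushes the agent toward unvisited parts of a level.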
- agent.py: Contains the implementation of the reinforcement learning agent.
- algo.cfg: Configuration file for the RL algorithms.
- algos/: Directory containing the implementation of RL algorithms.
  - ppo.py: Implementation of the PPO algorithm.
  - rnd.py: Implementation of the RND algorithm.
  - utils.py: Utility functions for the RL algorithms.
- ddave/: Contains the game assets and logic for Dangerous Dave.
  - __init__.py: Initializes the ddave module.
  - helper.py: Helper functions for the game.
  - levels/: Directory containing level design files.
    - 1.txt: Configuration for Level 1.
    - 2.txt: Configuration for Level 2.
    - 3.txt: Configuration for Level 3.
  - tiles/: Directory containing image resources for the game.
    - game/: Subdirectory with image files for game elements.
    - ui/: Subdirectory with image files for the user interface.
  - utils.py: Additional utility functions specific to Dangerous Dave.
- env.py: Defines the environment for the Dangerous Dave game.
- game.cfg: Configuration file for the game environment.
- game.py: Contains the main game loop for Dangerous Dave.
- requirements.txt: Lists the dependencies required to run the project.
The game environment is designed with both image-based and text-based observation spaces, allowing various types of observation inputs for reinforcement learning algorithms. Key configurable parameters include trophy score, item score, observation space representation, random agent respawn, grid world configuration, false door episode termination, and current level.
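As an illustration of how such parameters might be loaded, here is a minimal sketch using Python's standard `configparser`. It rests on several assumptions: that the settings live in an INI-style file, that the section is called `[env]`, and that the keys have the names shown. All key names and values below are hypothetical placeholders and are not copied from game.cfg.

```python
import configparser
from dataclasses import dataclass

# Hypothetical example content; the real game.cfg may use different
# section names, key names, and values.
EXAMPLE_CFG = """
[env]
trophy_score = 1000
item_score = 100
observation = image
random_respawn = false
grid_world = false
false_door_terminates = true
level = 1
"""


@dataclass(frozen=True)
class EnvSettings:
    trophy_score: int
    item_score: int
    observation: str            # "image" or "text"
    random_respawn: bool
    grid_world: bool
    false_door_terminates: bool
    level: int


def load_env_settings(text):
    """Parse INI-style text into a typed, immutable settings object."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    env = parser["env"]
    observation = env.get("observation", "image")
    if observation not in ("image", "text"):
        raise ValueError(f"unsupported observation type: {observation!r}")
    return EnvSettings(
        trophy_score=env.getint("trophy_score"),
        item_score=env.getint("item_score"),
        observation=observation,
        random_respawn=env.getboolean("random_respawn"),
        grid_world=env.getboolean("grid_world"),
        false_door_terminates=env.getboolean("false_door_terminates"),
        level=env.getint("level"),
    )


settings = load_env_settings(EXAMPLE_CFG)
```

Converting the raw strings into a frozen dataclass once, at startup, surfaces a mistyped value as an immediate error rather than as a silent change in reward or termination behavior partway through training.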
We used a vectorized environment to run the game and collect samples with different policies. Specifically, we ran 64 parallel environments (NUM_ENVS = 64) on a 64-core processor using a customized VecEnv wrapper from Stable Baselines3. This setup let us collect diverse experience efficiently and significantly sped up training; one round of training takes around 3 hours with this configuration. With this setup, the agent solved Level 1 and began learning Level 2 of Dangerous Dave.
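The idea behind a vectorized environment can be sketched as follows. This is not the customized VecEnv wrapper used in the project: `ToyEnv`, `SyncVecEnv`, and `collect_rollout` are hypothetical names, the toy environment is a stand-in for the game, and the sketch steps its environments sequentially in one process, whereas a setup that uses 64 cores would presumably distribute them across worker processes.

```python
import random

NUM_ENVS = 64  # the value stated in the text


class ToyEnv:
    """Hypothetical stand-in for one game instance: walk right to reach a goal."""

    def __init__(self, goal=5, max_steps=20):
        self.goal = goal
        self.max_steps = max_steps
        self.pos = 0
        self.steps = 0

    def reset(self):
        self.pos = 0
        self.steps = 0
        return self.pos

    def step(self, action):
        # action: 0 = move left, 1 = move right
        self.pos = max(0, self.pos + (1 if action == 1 else -1))
        self.steps += 1
        reached = self.pos >= self.goal
        done = reached or self.steps >= self.max_steps
        return self.pos, (1.0 if reached else 0.0), done


class SyncVecEnv:
    """Steps a batch of environments with one call (sequentially, one process)."""

    def __init__(self, env_fns):
        self.envs = [fn() for fn in env_fns]

    def reset(self):
        return [env.reset() for env in self.envs]

    def step(self, actions):
        obs, rewards, dones = [], [], []
        for env, action in zip(self.envs, actions):
            o, r, d = env.step(action)
            if d:
                # Auto-reset so every slot keeps producing samples.
                o = env.reset()
            obs.append(o)
            rewards.append(r)
            dones.append(d)
        return obs, rewards, dones


def collect_rollout(vec_env, policy, num_steps):
    """Gather (obs, action, reward, done) tuples from all environments."""
    obs = vec_env.reset()
    batch = []
    for _ in range(num_steps):
        actions = [policy(o) for o in obs]
        next_obs, rewards, dones = vec_env.step(actions)
        batch.extend(zip(obs, actions, rewards, dones))
        obs = next_obs
    return batch


rng = random.Random(0)
vec_env = SyncVecEnv([ToyEnv for _ in range(NUM_ENVS)])
rollout = collect_rollout(vec_env, lambda o: rng.choice([0, 1]), num_steps=10)
```

Each call to `step` yields one transition per environment, so 10 steps across 64 environments produce 640 samples for a single policy update.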
- mwolfart for the original Dangerous Dave Python clone.
- Gymnasium for creating the reinforcement learning environment.
- OpenAI Spinning Up for RL resources.
- Reinforcement learning with prediction-based rewards for the RND exploration strategy.
- Proximal Policy Optimization for understanding the PPO algorithm.
- Go-Explore: a New Approach for Hard-Exploration Problems for inspiration on exploration strategies.
- CleanRL for inspiration on PPO and RND implementations.
- Stable Baselines3 for the vectorized environment wrapper.