ME5406_Project2_Dynamic_Obstacle_Grid_FHL

This repository implements the reinforcement learning algorithms PPO and A2C to solve the problem of Dynamic Obstacle Avoidance in a Generalized Environment, and tests how well the models trained with these algorithms generalize and transfer to other environments.

Project Description

The objective of this project is to use Deep Reinforcement Learning techniques to solve Dynamic Obstacle Avoidance in a Generalized Environment. The problem is essentially a grid-world scenario in which the agent must travel from the start point, leave the room through an exit placed at a random position along the wall, and reach the goal located in another room, while avoiding collisions with dynamic obstacles in the environment. A configurable field of view gives the agent either partial or full observation of the environment. The generalization ability of the trained model is also tested.
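To make the idea of partial observation concrete, the sketch below crops a square window around the agent from a grid. It is a minimal illustration only, not the code in custom_env/env.py: the function name egocentric_view, the cell codes, and the choice of padding out-of-grid cells with walls are all assumptions. Choosing a window large enough to cover the whole grid would correspond to full observation.

```python
from typing import List, Tuple

WALL = 1  # hypothetical cell code, also used to pad cells outside the grid


def egocentric_view(grid: List[List[int]], pos: Tuple[int, int], size: int) -> List[List[int]]:
    """Return a size x size window centred on pos, padding outside cells with WALL."""
    if size % 2 == 0:
        raise ValueError("size must be odd so the agent sits at the centre")
    half = size // 2
    row, col = pos
    view = []
    for r in range(row - half, row + half + 1):
        line = []
        for c in range(col - half, col + half + 1):
            inside = 0 <= r < len(grid) and 0 <= c < len(grid[0])
            line.append(grid[r][c] if inside else WALL)
        view.append(line)
    return view


# Hypothetical cell codes: 0 = empty, 1 = wall, 2 = obstacle, 3 = goal.
grid = [
    [0, 0, 0, 0],
    [0, 2, 0, 0],
    [0, 0, 0, 3],
]
view = egocentric_view(grid, pos=(0, 0), size=3)
for line in view:
    print(line)
```

With the agent in the top-left corner, the top row and left column of the window fall outside the grid and are padded, while the obstacle diagonally below the agent stays visible.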

The available environments are ThreeRooms-Dynamic-Obstacles-21x21-v0 and FourRooms-Dynamic-Obstacles-21x21-v0. The project also provides:

  • 📈 Tensorboard Logging
  • 📜 Local Reproducibility via Seeding
  • 🎮 Gameplay Video Capture
  • 🧫 Experiment Management with Weights and Biases
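Local reproducibility via seeding usually means fixing every random number generator the training loop touches. The helper below is a minimal sketch of that idea, not the project's own code: the name seed_everything is hypothetical, and NumPy and PyTorch are seeded only if they happen to be installed.

```python
import os
import random


def seed_everything(seed: int) -> None:
    """Seed the random number generators commonly used in RL training."""
    random.seed(seed)
    # PYTHONHASHSEED is read at interpreter start-up, so this only
    # affects subprocesses launched after this call.
    os.environ["PYTHONHASHSEED"] = str(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; nothing to seed
    try:
        import torch
        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass  # PyTorch not installed; nothing to seed


seed_everything(1)
first = [random.random() for _ in range(3)]
seed_everything(1)
second = [random.random() for _ in range(3)]
print(first == second)
```

Seeding alone does not guarantee bit-identical GPU runs; some CUDA kernels are non-deterministic unless PyTorch's deterministic settings are also enabled.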

Project Preparation

Virtual Environment Creation

First, create a virtual environment using Anaconda and activate it (tested on Ubuntu 18.04).

$ conda create -n obstacle_grid python=3.6
$ source activate obstacle_grid

Requirements Installation

The project is based on Python 3.6.8. A new virtual environment is recommended for the requirements. Install the required packages listed in requirements.txt using:

pip install -r requirements.txt

To use GPU acceleration, make sure to install the appropriate CUDA version; the installation command for the project is:

conda install pytorch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2 cudatoolkit=11.0 -c pytorch

Project Structure

  • ME5406 Project2
    • ./algoithms
      • This folder contains the implementation of PPO and A2C.
      • ppo.py - implementation of PPO-Clip, with hyperparameter tuning options.
      • a2c.py - implementation of A2C, with hyperparameter tuning options.
    • ./custom_env
      • This folder contains the environment construction details of the project.
      • env.py - specific construction of the environment can be found in this file.
    • ./plot
      • This folder contains scripts for plotting metrics from the training and evaluation processes.
    • ./resuls
      • This folder contains some of the plotted results, e.g. for algorithm comparison and hyperparameter tuning.
    • ./storage
      • This folder includes the trained models, TensorBoard logs, and CSV logs.
      • trained models
    • ./utils
      • Useful tools, such as data storage format conversion, are contained in this folder.
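For readers unfamiliar with PPO-Clip, the clipped surrogate objective can be sketched in a few lines of plain Python. This is an illustration of the standard formula under stated assumptions, not the contents of ppo.py: the function name ppo_clip_loss and the default clip_eps=0.2 are hypothetical, and a real implementation would operate on tensors and add value and entropy terms.

```python
import math
from typing import Sequence


def ppo_clip_loss(
    new_log_probs: Sequence[float],
    old_log_probs: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,
) -> float:
    """Negative mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the updated and the data-collecting policy.
        ratio = math.exp(new_lp - old_lp)
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Taking the minimum removes the incentive to move the ratio
        # further outside the clipping range.
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)


# One action became 1.5x more likely under the new policy, with positive advantage.
loss = ppo_clip_loss([math.log(1.5)], [0.0], [1.0])
print(loss)
```

In this example the ratio of 1.5 exceeds 1 + clip_eps, so the clipped term is the one that contributes to the loss.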

Project Execution

The main scripts of the project are train.py, evaluate.py, and visualize.py. For detailed usage, refer to the argument parser in the corresponding file. Examples of training, evaluation, and visualization follow.

Set --prod-mode to True if you want to use production mode with wandb.

Train

  • A2C Agent
python train.py --env 'ThreeRoom' --algo a2c --frames-per-proc 8
python train.py --env 'ThreeRoom' --algo a2c --frames-per-proc 8 --memory --recurrence 2
  • PPO Agent
python train.py --env 'ThreeRoom' --algo ppo --frames-per-proc 128
python train.py --env 'ThreeRoom' --algo ppo --frames-per-proc 128 --memory --recurrence 2
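As a rough guide to how the flags in the commands above might be declared, here is an argparse sketch. It is a guess at the shape of the parser, not a copy of train.py: the flag names come from this README, but the types, defaults, help texts, and the str2bool helper are assumptions, so check the real parser before relying on them.

```python
import argparse


def str2bool(text: str) -> bool:
    """Parse 'True'/'False'-style strings, as in '--prod-mode True'."""
    value = text.strip().lower()
    if value in {"true", "1", "yes"}:
        return True
    if value in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {text!r}")


def build_parser() -> argparse.ArgumentParser:
    # Defaults and help texts below are assumptions, not taken from train.py.
    parser = argparse.ArgumentParser(description="Sketch of the training options")
    parser.add_argument("--env", required=True, help="e.g. ThreeRoom or FourRoom")
    parser.add_argument("--algo", required=True, choices=["a2c", "ppo"])
    parser.add_argument("--frames-per-proc", type=int, default=None,
                        help="frames collected per process before each update")
    parser.add_argument("--memory", action="store_true",
                        help="give the agent a recurrent memory")
    parser.add_argument("--recurrence", type=int, default=1,
                        help="time steps the gradient is back-propagated through")
    parser.add_argument("--prod-mode", type=str2bool, default=False,
                        help="enable wandb logging")
    return parser


args = build_parser().parse_args(
    ["--env", "ThreeRoom", "--algo", "ppo",
     "--frames-per-proc", "128", "--memory", "--recurrence", "2"]
)
print(args)
```

Passing an explicit argument list to parse_args, as done here, makes the parser easy to exercise without touching the command line.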

Evaluate

  • Evaluate in 3-room Environment
python evaluate.py --eval_env 'ThreeRoom' --algo ppo --recurrence 1
python evaluate.py --eval_env 'ThreeRoom' --algo ppo --memory --recurrence 2
  • Evaluate in 4-room Environment
python evaluate.py --eval_env 'FourRoom' --algo ppo --recurrence 1
python evaluate.py --eval_env 'FourRoom' --algo ppo --memory --recurrence 2

Visualize

  • Visualize in 3-room environment
python visualize.py --env 'ThreeRoom' --algo ppo --recurrence 1
python visualize.py --env 'ThreeRoom' --algo ppo --memory --recurrence 2
  • Visualize in 4-room environment
python visualize.py --env 'FourRoom' --algo ppo --recurrence 1
python visualize.py --env 'FourRoom' --algo ppo --memory --recurrence 2

TensorBoard & WandB

During training, logs are recorded in TensorBoard and Weights & Biases. An example of using TensorBoard:

cd storage/ppo_4
tensorboard --logdir=./ --host=127.0.0.1

Result Display

For a video illustration, please refer to the video, which discusses some problems, possible analysis, and conclusions with voice-over narration.

3-room Environment

A2C Agent PPO Agent

4-room Environment

PPO Agent PPO+LSTM2 Agent PPO+LSTM4 Agent