Reacher Udacity

In this project, the agent is a double-jointed robotic arm that must learn to reach a goal position represented by a green sphere. The agent is trained with Deep Deterministic Policy Gradient (DDPG), implemented with Prioritized Experience Replay.
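To give a rough idea of what prioritized replay involves, here is a minimal sketch of a proportional prioritized buffer using only the standard library. It is not the code in this repository: the class name, the `alpha`, `beta` and `eps` values, and the choice to normalize weights by the batch maximum are all assumptions made for illustration.

```python
import random


class PrioritizedReplayBuffer:
    """Illustrative proportional prioritized replay (hypothetical, not the repo's class)."""

    def __init__(self, capacity, alpha=0.6, seed=0):
        self.capacity = capacity
        self.alpha = alpha            # 0 = uniform sampling, 1 = fully proportional
        self.data = []                # stored transitions
        self.priorities = []          # one priority per stored transition
        self.pos = 0                  # next write index (ring buffer)
        self.rng = random.Random(seed)

    def add(self, transition):
        # New transitions get the current maximum priority so they are likely to be replayed soon.
        max_prio = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(max_prio)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        n = len(self.data)
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        idx = self.rng.choices(range(n), weights=probs, k=batch_size)
        # Importance-sampling weights compensate for the non-uniform sampling.
        weights = [(n * probs[i]) ** (-beta) for i in idx]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        return idx, [self.data[i] for i in idx], weights

    def update_priorities(self, idx, td_errors, eps=1e-5):
        # Priority is the magnitude of the TD error; eps keeps it strictly positive.
        for i, err in zip(idx, td_errors):
            self.priorities[i] = abs(err) + eps
```

In a DDPG training loop, the returned weights would typically scale the critic loss per sample, and `update_priorities` would be called with the new TD errors after each update.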

The environment

The agent receives a reward of +0.1 for every time step in which its hand is at the goal location.

An observation of the environment is a vector of 33 elements describing the position, rotation, linear velocity, and angular velocity of the arm's joints. The agent interacts with the environment by applying torque to each of its joints. Every torque value must lie between -1 and 1.
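Because the torques are bounded, any exploration noise added to the policy output has to be clipped back into range before it is sent to the environment. The sketch below shows one way to do that. The action size of 4, the Gaussian noise, and the function names are assumptions for illustration and may differ from what this repository does.

```python
import random

ACTION_SIZE = 4  # assumption: two torque components for each of the two joints


def clip_action(action, low=-1.0, high=1.0):
    """Clamp every torque component into the valid range [low, high]."""
    return [min(max(a, low), high) for a in action]


def noisy_action(policy_action, noise_std=0.2, rng=None):
    """Add zero-mean Gaussian exploration noise, then clip (hypothetical helper)."""
    rng = rng or random.Random()
    noisy = [a + rng.gauss(0.0, noise_std) for a in policy_action]
    return clip_action(noisy)


if __name__ == "__main__":
    print(noisy_action([0.0] * ACTION_SIZE, rng=random.Random(0)))
```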

This is an episodic task with a continuous action space. An episode ends after 1002 time steps. It could easily be turned into a continuing task by ignoring the terminal states and instead treating a fixed number of collected time steps as one episode.

This project is considered solved when the agent achieves an average reward of at least 30.0 over 100 consecutive episodes. In this implementation, the best result obtained was an average reward of 37.36 after 412 episodes.
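The solving criterion can be checked with a rolling window over the episode scores. The following is a minimal sketch. The function name is hypothetical, and whether the reported episode count refers to the start or the end of the 100-episode window is not stated here, so this version returns the episode at which the window is complete.

```python
from collections import deque


def first_solved_episode(scores, target=30.0, window=100):
    """Return the 1-based episode at which the rolling mean first reaches target, or None."""
    recent = deque(maxlen=window)  # keeps only the last `window` scores
    for episode, score in enumerate(scores, start=1):
        recent.append(score)
        if len(recent) == window and sum(recent) / window >= target:
            return episode
    return None
```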

Dependencies

This project is a requirement of the Udacity Deep Reinforcement Learning Nanodegree. The environment is provided by Udacity. The project depends on the following packages:

- Python 3.6
- NumPy
- PyTorch
- Unity ML-Agents Beta v0.4

Getting Started

Linux (Debian-based)

Install Python 3.6 (later versions are not compatible with the Unity ML-Agents version required by this environment)

sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt update
sudo apt install python3.6-full

(Optional) Create a virtual environment for this project

cd <parent folder of venv>
python3.6 -m venv <name of the env>
source <path to venv>/bin/activate

Install the Python dependencies

python3 -m pip install numpy torch

Download the Unity ML-Agents release file for version Beta v0.4, then unzip it into a folder of your choosing

Build Unity ML-Agents

cd <path ml-agents>/python
python3 -m pip install .

Clone this repository, download the environment created by Udacity, and unzip it into the world folder

git clone https://github.com/jhonasiv/reacher-udacity
cd reacher-udacity
mkdir world
wget https://s3-us-west-1.amazonaws.com/udacity-drlnd/P2/Reacher/one_agent/Reacher_Linux.zip
unzip Reacher_Linux.zip -d world
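Once the environment is unzipped, a quick random rollout is a simple way to confirm that the setup works. The sketch below assumes the `unityagents` interface shipped with ML-Agents v0.4 (`brain_names`, `reset`, `step`, and the `rewards` and `local_done` fields). The helper name, the action size of 4, and the binary path in the usage comment are assumptions, so adjust them to your layout.

```python
import random


def run_random_episode(env, action_size=4, max_steps=2000, seed=0):
    """Play one episode with uniformly random torques and return the total reward."""
    rng = random.Random(seed)
    brain_name = env.brain_names[0]                  # use the first brain the build exposes
    env_info = env.reset(train_mode=False)[brain_name]
    total = 0.0
    for _ in range(max_steps):
        # Sample each torque component uniformly from the valid range [-1, 1].
        action = [rng.uniform(-1.0, 1.0) for _ in range(action_size)]
        env_info = env.step(action)[brain_name]
        total += env_info.rewards[0]
        if env_info.local_done[0]:                   # the environment signalled episode end
            break
    return total


# Usage with the real build (the binary path is an assumption about the unzipped layout):
#   from unityagents import UnityEnvironment
#   env = UnityEnvironment(file_name="world/Reacher_Linux/Reacher.x86_64")
#   print(run_random_episode(env))
#   env.close()
```

A random policy should score close to zero, so a small positive total simply indicates that observations, actions, and rewards are flowing correctly.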

Demo

This is how the agent behaves after being trained until the rolling average reward over 100 episodes reached 35.0.

Figure: agent trained with an average score of 35.0

Running the application

Execute the main.py file
```
python3 src/main.py
```
For more information on the available command line arguments, use:
```
python3 src/main.py --help
```
- Some notable CLI arguments:
  - --eval: runs the application in evaluation mode, skipping the training step; model_path must be set.
  - --buffer_size: maximum size of the experience buffer.
  - --a_lr: learning rate for the actor.
  - --c_lr: learning rate for the critic.
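For readers curious how such flags are commonly wired up, here is a minimal `argparse` sketch. It is not the parser in src/main.py: the `--model_path` spelling and every default value below are placeholders chosen for illustration, so rely on the `--help` output for the actual options.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Train or evaluate a DDPG agent on Reacher.")
    parser.add_argument("--eval", action="store_true",
                        help="run in evaluation mode and skip training")
    parser.add_argument("--model_path", type=str, default=None,
                        help="checkpoint to load (flag name assumed for this sketch)")
    parser.add_argument("--buffer_size", type=int, default=100000,
                        help="maximum size of the experience buffer (placeholder default)")
    parser.add_argument("--a_lr", type=float, default=1e-4,
                        help="actor learning rate (placeholder default)")
    parser.add_argument("--c_lr", type=float, default=1e-3,
                        help="critic learning rate (placeholder default)")
    return parser


def parse_args(argv=None):
    parser = build_parser()
    args = parser.parse_args(argv)
    # Evaluation needs a trained model, so reject --eval without a checkpoint.
    if args.eval and args.model_path is None:
        parser.error("--eval requires --model_path")
    return args
```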
