In-environment footage, captured by a human player.
⚠ Note: This is the code for my article Meta-Reinforcement Learning on FloydHub. This repository is for DeepMind Lab and the Harlow task environment. For the git submodule containing all the TensorFlow code and the DeepMind Lab wrapper, see this repository. For the two-step task, see this repository instead. ⚠
Here, we try to reproduce the simulations of the Harlow task described in the following two papers:
- Learning to Reinforcement Learn, Wang et al., 2016
- Prefrontal cortex as a meta-reinforcement learning system, Wang et al., 2018
To reproduce the Harlow task, we used DeepMind Lab, a 3D learning environment that provides a suite of challenging 3D navigation and puzzle-solving tasks for learning agents. Its primary purpose is to act as a testbed for research in artificial intelligence, especially deep reinforcement learning. For more information about DeepMind Lab, check out their repo here.
I answer questions and give more information here.
- Clone the repository:
$ git clone https://github.com/mtrazzi/harlow.git
- Change current directory to `harlow/python`:
$ cd harlow/python
- Fetch the git submodule `meta_rl`:
python$ git submodule init
python$ git submodule update --remote
- Change current directory to the root of the repo:
python$ cd ..
- Make sure that everything is correctly set up in `WORKSPACE` and `python.BUILD` (cf. section Configure the repo).
- Install dependencies and build with Bazel:
harlow$ sh install.sh
harlow$ sh build.sh
- Train the meta-RL Agent:
harlow$ sh train.sh
To run the Harlow environment as a human, run:
harlow$ bazel run :game -- --level_script=contributed/psychlab/harlow
For a live example of the Harlow agent, run:
harlow$ bazel run :python_harlow --define graphics=sdl --incompatible_remove_native_http_archive=false -- --level_script contributed/psychlab/harlow --width=640 --height=480
harlow
├── WORKSPACE
├── python.BUILD
├── python
│   ├── meta-rl
│   │   ├── harlow.py
│   │   └── meta_rl
│   │       ├── worker.py
│   │       └── ac_network.py
│   └── dmlab_module.c
└── data
    └── brady_konkle_oliva2008
        ├── 0001.png
        ├── 0002.png
        └── README.md
Most of our code is in the `python` directory, where you'll find our git submodule `meta-rl`, which contains the three most important files: `harlow.py`, `meta_rl/worker.py` and `meta_rl/ac_network.py`.
⚠ For more details about those three important files, check out the README of the `meta-rl` repo, which also contains more information about the different architectures we tried and the ongoing projects. ⚠
Apart from that, the other essential files are:
- The Lua script for the Harlow task environment: `game_scripts/levels/contributed/psychlab/factories/harlow_factory.lua`.
- The file `dmlab_module.c`, which creates a Python API to use DeepMind Lab (see the short usage sketch after this list).
- The folder `data/brady_konkle_oliva2008`, which you can tweak to use your own dataset using the instructions in Dataset (the instructions in the README.md are for downloading the data of this paper).
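As a rough illustration of how the Python API built from `dmlab_module.c` is typically driven, here is a minimal sketch based on the standard `deepmind_lab` module (the observation name, config values and action index below are assumptions; `harlow.py` handles this wiring in practice):

```python
import numpy as np
import deepmind_lab  # the module built from dmlab_module.c

# Create the Harlow level with a small RGB observation (config values are strings).
env = deepmind_lab.Lab(
    'contributed/psychlab/harlow',
    ['RGB_INTERLEAVED'],
    config={'width': '84', 'height': '84'})

env.reset()

# DeepMind Lab actions are 7 integers (look, strafe, move, fire, jump, crouch).
action = np.zeros((7,), dtype=np.intc)
action[0] = 20  # rotate the view by a few pixels per frame

reward = env.step(action, num_steps=4)  # repeat the action for 4 frames
if env.is_running():
    frame = env.observations()['RGB_INTERLEAVED']  # HxWx3 uint8 array
```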
We tried to reproduce the results from Prefrontal cortex as a meta-reinforcement learning system (see Simulation 5 in Methods). We launched n=5 trainings using 5 different seeds, with the same hyperparameters as the paper, to compare with the results obtained by Wang et al. The main differences from the paper's setup are the following:
- We removed the CNN.
- We replaced the stacked LSTM with a single 48-unit LSTM (same as for the Harlow task); a minimal sketch of such a recurrent core follows this list.
- We drastically reduced the action space so that the agent can only perform left and right actions (each directly targeting the center of an image).
- We added artificial `NO-OPS` to force the agent to fixate for multiple frames (and to remove the noise at the beginning of an episode).
- We used a dataset of 42 pictures, instead of 1000 image samples from ImageNet.
- We used only 1 thread, on one CPU, instead of 32 threads on 32 GPUs.
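For reference, here is a minimal sketch (not the repo's exact `ac_network.py`) of a recurrent actor-critic core matching the description above: no CNN, a single 48-unit LSTM fed with the observation concatenated with the previous reward and previous one-hot action, and two heads (a policy over the two left/right actions and a value estimate). All names and shapes here are illustrative assumptions:

```python
import tensorflow as tf  # assumes TensorFlow 1.x, as in the rest of this repo

N_ACTIONS = 2           # left / right only
LSTM_UNITS = 48         # single LSTM instead of the paper's stacked LSTM
OBS_SIZE = 84 * 84 * 3  # flattened frame; the actual size is an assumption

obs = tf.placeholder(tf.float32, [None, OBS_SIZE], name="obs")
prev_reward = tf.placeholder(tf.float32, [None, 1], name="prev_reward")
prev_action = tf.placeholder(tf.float32, [None, N_ACTIONS], name="prev_action")

# Meta-RL input: observation + previous reward + previous action, fed as one sequence.
rnn_in = tf.expand_dims(tf.concat([obs, prev_reward, prev_action], axis=1), 0)

cell = tf.nn.rnn_cell.BasicLSTMCell(LSTM_UNITS)
rnn_out, _state = tf.nn.dynamic_rnn(cell, rnn_in, dtype=tf.float32)
rnn_out = tf.reshape(rnn_out, [-1, LSTM_UNITS])

policy = tf.layers.dense(rnn_out, N_ACTIONS, activation=tf.nn.softmax)  # actor head
value = tf.layers.dense(rnn_out, 1)                                     # critic head
```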
For each seed, the training consisted of ~10k episodes (instead of 10^5 episodes per thread (32 threads) in the paper). We chose this number of episodes because, in our case, learning seemed to have plateaued after ~7k episodes for all seeds.
For our dataset we used profile pictures of our friends at 42 (a software engineering school), resized to 256x256 (to tweak the dataset to your own needs, see here).
Example of a run of the agent on the dataset (after training):
What the agent sees for the run above (after pre-processing):
Here is the reward curve (one color per seed) after ~10k episodes (which took approximately 3 days to train) on FloydHub's CPU:
I added a repo containing checkpoints (for TensorBoard) for the 5 seeds here, and the corresponding curves for rewards, policy loss and entropy loss in this repository.
To tweak the dataset:
- Add your own images in `data/brady_konkle_oliva2008/`, named `0001.png`, `0002.png`, etc. The images must be 256x256, like in the original `brady_konkle_oliva2008` dataset (see the resizing sketch after this list).
- Change `DATASET_SIZE` in `game_scripts/datasets/brady_konkle_oliva2008.lua` to your number of images.
- Change `TRAIN_BATCH` and `TEST_BATCH` in `game_scripts/levels/contributed/psychlab/factories/harlow_factory.lua` to the number of images you will use in train and test, respectively.
- Change `DATASET_SIZE` in `python/meta-rl/harlow.py` to your number of images.
- Change `DATASET_SIZE` in `python/meta-rl/meta_rl/ac_network.py` to your number of images.
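For convenience, here is a small helper sketch (not part of the repo) for preparing such a dataset with Pillow: it resizes every image from a source folder (the folder name below is just an example) to 256x256 and writes them to `data/brady_konkle_oliva2008/` as `0001.png`, `0002.png`, and so on:

```python
import os
from PIL import Image

SRC_DIR = "my_pictures"  # assumption: your raw images live here
DST_DIR = "data/brady_konkle_oliva2008"

files = sorted(f for f in os.listdir(SRC_DIR)
               if f.lower().endswith((".png", ".jpg", ".jpeg")))
for i, name in enumerate(files, start=1):
    img = Image.open(os.path.join(SRC_DIR, name)).convert("RGB")
    img.resize((256, 256)).save(os.path.join(DST_DIR, "{:04d}.png".format(i)))

# Remember to also update DATASET_SIZE, TRAIN_BATCH and TEST_BATCH as described above.
print("Wrote {} images".format(len(files)))
```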
The DeepMind Lab release supports Python 2.7, but you can find some documentation for Python 3 here.
Currently, our branch `python2` supports Python 2.7 and macOS, and our branch `master` supports Python 3.6 and Linux.
- The branch `python2` should work on the iMacs available at 42 (a software engineering school).
- The branch `master` was tested on FloydHub's instances (using `Tensorflow 1.12` and CPU). To switch to GPU, change `tf.device("/cpu:0")` to `tf.device("/device:GPU:0")` in `harlow.py` (see the snippet after this list).
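For illustration, the device switch mentioned above amounts to something like the following inside `harlow.py` (the code under the `with` block is only a placeholder):

```python
import tensorflow as tf

# CPU-only (current master branch):
# with tf.device("/cpu:0"):
# GPU instead:
with tf.device("/device:GPU:0"):
    # build the global network, optimizer, etc. here, as harlow.py does
    global_step = tf.Variable(0, trainable=False, name="global_step")
```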
All the pip packages should be either already installed on FloydHub or installed with `install.sh`.
However, if you want to run this repository on your machine, here are the requirements:
numpy==1.16.2
tensorflow==1.12.0
six==1.12.0
scipy==1.2.1
skimage==0.0
setuptools==40.8.0
Pillow==5.4.1
This work uses awjuliani's Meta-RL implementation.
I couldn't have done it without my dear friend Kevin Costa, and the additional details kindly provided by Jane Wang.