In-environment footage, captured by a human player.
⚠ Note: This is the code for my article Meta-Reinforcement Learning on FloydHub. This repository is for DeepMind Lab and the Harlow task environment. For the git submodule containing all the TensorFlow code and the DeepMind Lab wrapper, see this repository. For the two-step task, see this repository instead. ⚠
Here, we try to reproduce the simulations of the Harlow task described in the following two papers:
- Learning to Reinforcement Learn, Wang et al., 2016
- Prefrontal cortex as a meta-reinforcement learning system, Wang et al., 2018
To reproduce the Harlow task, we used DeepMind Lab, a 3D learning environment that provides a suite of challenging 3D navigation and puzzle-solving tasks for learning agents. Its primary purpose is to act as a testbed for research in artificial intelligence, especially deep reinforcement learning. For more information about DeepMind Lab, check out their repo here.
I answer questions and give more information here.
- Clone the repository:
$ git clone https://github.com/mtrazzi/harlow.git
- Change current directory to `harlow/python`:
$ cd harlow/python
- Fetch the git submodule `meta_rl`:
python$ git submodule init
python$ git submodule update --remote
- Change current directory to the root of the repo:
python$ cd ..
- Make sure that everything is correctly set up in `WORKSPACE` and `python.BUILD` (cf. section Configure the repo).
- Install dependencies and build with Bazel:
harlow$ sh install.sh
harlow$ sh build.sh
- Train the meta-RL Agent:
harlow$ sh train.sh
To run the Harlow environment as a human, run:
harlow$ bazel run :game -- --level_script=contributed/psychlab/harlow
For a live example of the Harlow agent, run:
harlow$ bazel run :python_harlow --define graphics=sdl --incompatible_remove_native_http_archive=false -- --level_script contributed/psychlab/harlow --width=640 --height=480
harlow
├── WORKSPACE
├── python.BUILD
├── python
│   ├── meta-rl
│   │   ├── harlow.py
│   │   └── meta_rl
│   │       ├── worker.py
│   │       └── ac_network.py
│   └── dmlab_module.c
└── data
    └── brady_konkle_oliva2008
        ├── 0001.png
        ├── 0002.png
        └── README.md
Most of our code is in the `python` directory, where you'll find our git submodule `meta-rl`, which contains the three most important files: `harlow.py`, `meta_rl/worker.py` and `meta_rl/ac_network.py`.
⚠ For more details about those three important files, check out the README of the `meta-rl` repo, which also contains more information about the different architectures we tried and the ongoing projects. ⚠
Apart from that, the other essential files are:
- The Lua script for the Harlow task environment: `game_scripts/levels/contributed/psychlab/factories/harlow_factory.lua`.
- The file `dmlab_module.c`, which creates a Python API to use DeepMind Lab (see the short usage sketch after this list).
- The folder `data/brady_konkle_oliva2008`, which you can tweak to use your own dataset using the instructions in Dataset (the instructions in the README.md are for downloading the data of this paper).
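As a rough illustration of how the Python API built from `dmlab_module.c` is typically driven, here is a minimal sketch based on the standard `deepmind_lab` module (the observation name, config values and action index below are assumptions; `harlow.py` handles this wiring in practice):

```python
import numpy as np
import deepmind_lab  # the module built from dmlab_module.c

# Create the Harlow level with a small RGB observation (config values are strings).
env = deepmind_lab.Lab(
    'contributed/psychlab/harlow',
    ['RGB_INTERLEAVED'],
    config={'width': '84', 'height': '84'})

env.reset()

# DeepMind Lab actions are 7 integers (look, strafe, move, fire, jump, crouch).
action = np.zeros((7,), dtype=np.intc)
action[0] = 20  # rotate the view by a few pixels per frame

reward = env.step(action, num_steps=4)  # repeat the action for 4 frames
if env.is_running():
    frame = env.observations()['RGB_INTERLEAVED']  # HxWx3 uint8 array
```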
We tried to reproduce the results from Prefrontal cortex as a meta-reinforcement learning system (see Simulation 5 in Methods). We launched n=5 trainings using 5 different seeds, with the same hyperparameters as the paper, to compare with the results obtained by Wang et al. The main differences from the paper's setup are the following:
- We removed the CNN.
- We replaced the stacked LSTM with a single 48-unit LSTM (same as for the Harlow task); a minimal sketch of such a recurrent core follows this list.
- We drastically reduced the action space so that the agent can only perform left and right actions (each directly targeting the center of an image).
- We added artificial `NO-OPS` to force the agent to fixate for multiple frames (and to remove the noise at the beginning of an episode).
- We used a dataset of 42 pictures, instead of 1000 image samples from ImageNet.
- We used only 1 thread, on one CPU, instead of 32 threads on 32 GPUs.
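For reference, here is a minimal sketch (not the repo's exact `ac_network.py`) of a recurrent actor-critic core matching the description above: no CNN, a single 48-unit LSTM fed with the observation concatenated with the previous reward and previous one-hot action, and two heads (a policy over the two left/right actions and a value estimate). All names and shapes here are illustrative assumptions:

```python
import tensorflow as tf  # assumes TensorFlow 1.x, as in the rest of this repo

N_ACTIONS = 2           # left / right only
LSTM_UNITS = 48         # single LSTM instead of the paper's stacked LSTM
OBS_SIZE = 84 * 84 * 3  # flattened frame; the actual size is an assumption

obs = tf.placeholder(tf.float32, [None, OBS_SIZE], name="obs")
prev_reward = tf.placeholder(tf.float32, [None, 1], name="prev_reward")
prev_action = tf.placeholder(tf.float32, [None, N_ACTIONS], name="prev_action")

# Meta-RL input: observation + previous reward + previous action, fed as one sequence.
rnn_in = tf.expand_dims(tf.concat([obs, prev_reward, prev_action], axis=1), 0)

cell = tf.nn.rnn_cell.BasicLSTMCell(LSTM_UNITS)
rnn_out, _state = tf.nn.dynamic_rnn(cell, rnn_in, dtype=tf.float32)
rnn_out = tf.reshape(rnn_out, [-1, LSTM_UNITS])

policy = tf.layers.dense(rnn_out, N_ACTIONS, activation=tf.nn.softmax)  # actor head
value = tf.layers.dense(rnn_out, 1)                                     # critic head
```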
For each seed, the training consisted of ~10k episodes (instead of 10^5 episodes per thread (32 threads) in the paper). We chose this number of episodes because, in our case, learning seemed to have plateaued after ~7k episodes for all seeds.
For our dataset we used profile pictures of our friends at 42 (a software engineering school), resized to 256x256 (to tweak the dataset to your own needs, see here).
Example of a run of the agent on the dataset (after training):
What the agent sees for the run above (after pre-processing):
Here is the reward curve (one color per seed) after ~10k episodes (which took approximately 3 days to train) on FloydHub's CPU:
I added a repo containing checkpoints (for TensorBoard) for the 5 seeds here, and the corresponding curves for rewards, policy loss and entropy loss in this repository.
To tweak the dataset:
- Add your own images in `data/brady_konkle_oliva2008/`, named `0001.png`, `0002.png`, etc. The images must be 256x256, like in the original `brady_konkle_oliva2008` dataset (see the resizing sketch after this list).
- Change `DATASET_SIZE` in `game_scripts/datasets/brady_konkle_oliva2008.lua` to your number of images.
- Change `TRAIN_BATCH` and `TEST_BATCH` in `game_scripts/levels/contributed/psychlab/factories/harlow_factory.lua` to the number of images you will use in train and test, respectively.
- Change `DATASET_SIZE` in `python/meta-rl/harlow.py` to your number of images.
- Change `DATASET_SIZE` in `python/meta-rl/meta_rl/ac_network.py` to your number of images.
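For convenience, here is a small helper sketch (not part of the repo) for preparing such a dataset with Pillow: it resizes every image from a source folder (the folder name below is just an example) to 256x256 and writes them to `data/brady_konkle_oliva2008/` as `0001.png`, `0002.png`, and so on:

```python
import os
from PIL import Image

SRC_DIR = "my_pictures"  # assumption: your raw images live here
DST_DIR = "data/brady_konkle_oliva2008"

files = sorted(f for f in os.listdir(SRC_DIR)
               if f.lower().endswith((".png", ".jpg", ".jpeg")))
for i, name in enumerate(files, start=1):
    img = Image.open(os.path.join(SRC_DIR, name)).convert("RGB")
    img.resize((256, 256)).save(os.path.join(DST_DIR, "{:04d}.png".format(i)))

# Remember to also update DATASET_SIZE, TRAIN_BATCH and TEST_BATCH as described above.
print("Wrote {} images".format(len(files)))
```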
The DeepMind Lab release supports Python 2.7, but you can find some documentation for Python 3 here.
Currently, our branch `python2` supports Python 2.7 and macOS, and our branch `master` supports Python 3.6 and Linux.
- The branch `python2` should work on the iMacs available at 42 (a software engineering school).
- The branch `master` was tested on FloydHub's instances (using `Tensorflow 1.12` and CPU). To switch to GPU, change `tf.device("/cpu:0")` to `tf.device("/device:GPU:0")` in `harlow.py` (see the snippet after this list).
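For illustration, the device switch mentioned above amounts to something like the following inside `harlow.py` (the code under the `with` block is only a placeholder):

```python
import tensorflow as tf

# CPU-only (current master branch):
# with tf.device("/cpu:0"):
# GPU instead:
with tf.device("/device:GPU:0"):
    # build the global network, optimizer, etc. here, as harlow.py does
    global_step = tf.Variable(0, trainable=False, name="global_step")
```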
All the pip packages should be either already installed on FloydHub or installed with `install.sh`.
However, if you want to run this repository on your machine, here are the requirements:
numpy==1.16.2
tensorflow==1.12.0
six==1.12.0
scipy==1.2.1
skimage==0.0
setuptools==40.8.0
Pillow==5.4.1
This work uses awjuliani's Meta-RL implementation.
I couldn't have done it without my dear friend Kevin Costa, and the additional details kindly provided by Jane Wang.