This codebase extends the universe starter agent provided by OpenAI so that it is able to solve a number of MiniWoB environments.
- Python 3.5
- Golang
- six
- TensorFlow
- tmux
- htop
- gym
- libjpeg-turbo
- universe
- opencv-python
- numpy
- scipy
- Docker
- Conda
- ...
conda create --name miniwob-starter-agent python=3.5
source activate miniwob-starter-agent
sudo apt-get install -y golang libjpeg-turbo8-dev make tmux htop cmake libjpeg-dev
sudo apt-get install -y python-numpy python-dev zlib1g-dev xvfb libav-tools xorg-dev python-opengl libboost-all-dev libsdl2-dev swig
pip install "gym[atari]"
pip install universe
pip install six
pip install tensorflow
conda install -y -c https://conda.binstar.org/menpo opencv3
conda install -y numpy
conda install -y scipy
sudo apt-get -y remove tmux
sudo apt-get install wget tar libevent-dev libncurses-dev
VERSION=2.6 && mkdir ~/tmux-src && wget -qO- https://github.com/tmux/tmux/releases/download/${VERSION}/tmux-${VERSION}.tar.gz | tar xvz -C ~/tmux-src && cd ~/tmux-src/tmux*
./configure && make -j"$(nproc)" && sudo make install
cd && rm -rf ~/tmux-src
pip install opencv-python
pip install numpy --upgrade --ignore-installed
Note that it is important to build the newDockerImage and to replace the runtimes.yml file located at universe's installation path with the file inside this repository. There are also shell scripts available for an automatic installation on Ubuntu or macOS.
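If you are unsure where universe is installed, the following sketch shows one way to locate the package and copy the file (it assumes runtimes.yml sits at the top level of the installed universe package and that the replacement file lives in the root of this repository; adjust the paths to your layout):
UNIVERSE_PATH=$(python -c "import universe, os; print(os.path.dirname(universe.__file__))")
cp runtimes.yml "${UNIVERSE_PATH}/runtimes.yml"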
python train.py --num-workers 2 --env-id wob.mini.ClickTest-v0 --log-dir /tmp/clicktest
The command above will train an agent on the ClickTest task over the VNC protocol.
It will launch two workers that learn in parallel (--num-workers flag) and will write intermediate results into the given log directory.
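The same pattern works for other MiniWoB tasks; for example (the environment id below is only an illustration and assumes the corresponding wob.mini task is registered in your universe version):
python train.py --num-workers 4 --env-id wob.mini.ClickButton-v0 --log-dir /tmp/clickbutton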
The code will launch the following processes:
- worker-0 - a process that runs policy gradient
- worker-1 - a process identical to worker-0, but using different random noise from the environment
- ps - the parameter server, which synchronizes the parameters among the different workers
- tb - a tensorboard process for convenient display of the statistics of learning
Once you start the training process, it will create a tmux session with a window for each of these processes. You can connect to them by typing tmux a in the console.
Once in the tmux session, you can see all your windows with ctrl-b w.
To switch to window number 0, type ctrl-b 0. Look up the tmux documentation for more commands.
To access TensorBoard to see various monitoring metrics of the agent, open http://localhost:12345/ in a browser.
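If that port is not reachable, you can also start TensorBoard yourself against the log directory; a minimal sketch, assuming the log directory from the example above:
tensorboard --logdir /tmp/clicktest --port 12345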
The VNC environments are hosted on the Google cloud and have an interface that's different from a conventional Atari Gym environment; luckily, with the help of several wrappers (applied within the envs.py file) the experience should look to the agent much as if it were playing locally. The problem itself is harder, because observations and actions are delayed by the latency of the network.
More interestingly, you can also peek at what the agent is doing with a VNC viewer.
You can use your system viewer via open vnc://localhost:5900 (or open vnc://${docker_ip}:5900), or connect TurboVNC to that ip/port.
The VNC password is "openai".
Note that the default behavior of train.py is to start the remotes on a local machine. Take a look at https://github.com/openai/universe/blob/master/doc/remotes.rst for documentation on managing your remotes. Pass the additional -r flag to point to pre-existing instances.
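For instance, a pre-existing remote can be addressed with universe's remote spec (the host and ports below are placeholders):
python train.py --num-workers 2 --env-id wob.mini.ClickTest-v0 --log-dir /tmp/clicktest -r vnc://remote-host:5900+15900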
For best performance, it is recommended that the number of workers not exceed the number of available CPU cores.
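On Linux you can check that number first and choose --num-workers accordingly:
nproc   # prints the number of available CPU cores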
You can stop the experiment with the tmux kill-session command.
The agent within this extension is still able to operate on Atari environments. To enable the use of Flash game environments, the flashgames.json file must be added to universe's installation path.
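As with runtimes.yml above, one way to do this is to copy the file into the installed package; the exact destination (for example a runtimes/ subdirectory) depends on your universe version, so treat both paths below as assumptions:
UNIVERSE_PATH=$(python -c "import universe, os; print(os.path.dirname(universe.__file__))")
cp flashgames.json "${UNIVERSE_PATH}/runtimes/flashgames.json"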
To evaluate the performance of the algorithm, it was compared to a random agent and a human player solving the MiniWoB tasks. Since this method is not capable of natural language processing, most of the environments had to be excluded. Furthermore, some other environments turned out to be unstable, which made their evaluation impossible as well. This left 25 environments for evaluation, on each of which the agent was trained for 12 hours, unless training finished earlier.
On 35% of the tested tasks the agent was able to compete with a human player, and on 22% of the evaluated tasks it even outperformed the human player.
The logs of these experiments can be found here.