
MiniWoB-starter-agent

This codebase implements an extension of OpenAI's universe starter agent that is able to solve a number of MiniWoB environments.

MiniWoB

Dependencies

Getting Started

conda create --name miniwob-starter-agent python=3.5
source activate miniwob-starter-agent

sudo apt-get install -y golang libjpeg-turbo8-dev make tmux htop cmake libjpeg-dev
sudo apt-get install -y python-numpy python-dev zlib1g-dev xvfb libav-tools xorg-dev python-opengl libboost-all-dev libsdl2-dev swig
pip install "gym[atari]"
pip install universe
pip install six
pip install tensorflow
conda install -y -c https://conda.binstar.org/menpo opencv3
conda install -y numpy
conda install -y scipy

sudo apt-get -y remove tmux
sudo apt-get install wget tar libevent-dev libncurses-dev
VERSION=2.6 && mkdir ~/tmux-src && wget -qO- https://github.com/tmux/tmux/releases/download/${VERSION}/tmux-${VERSION}.tar.gz | tar xvz -C ~/tmux-src && cd ~/tmux-src/tmux*
./configure && make -j"$(nproc)" && sudo make install
cd && rm -rf ~/tmux-src

pip install opencv-python

pip install numpy --upgrade --ignore-installed

Note that it is important to build the newDockerImage and to replace the runtimes.yml file located at universe's installation path with the file inside this repository. There are also shell scripts available for an automatic installation on Ubuntu or macOS.
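A minimal sketch of those two steps, assuming universe was installed with pip, the replacement runtimes.yml sits at the root of this repository, and newDockerImage is a directory containing a Dockerfile (the image tag miniwob-runtime is just an example):

# build the Docker image from the newDockerImage directory
docker build -t miniwob-runtime ./newDockerImage

# locate universe's installation path and replace its runtimes.yml with the one from this repository
UNIVERSE_PATH=$(python -c "import universe, os; print(os.path.dirname(universe.__file__))")
cp runtimes.yml "${UNIVERSE_PATH}/runtimes.yml"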

MiniWoB ClickTest

python train.py --num-workers 2 --env-id wob.mini.ClickTest-v0 --log-dir /tmp/clicktest

The command above will train an agent on the ClickTest task through the VNC protocol. It will spawn two workers that learn in parallel (--num-workers flag) and will write intermediate results to the given log directory.

The code will launch the following processes:

  • worker-0 - a process that runs policy gradient
  • worker-1 - a process identical to worker-0 that uses different random noise from the environment
  • ps - the parameter server, which synchronizes the parameters among the different workers
  • tb - a tensorboard process for convenient display of the statistics of learning

Once you start the training process, it will create a tmux session with a window for each of these processes. You can connect to them by typing tmux a in the console. Once in the tmux session, you can see all your windows with ctrl-b w. To switch to window number 0, type: ctrl-b 0. Look up tmux documentation for more commands.
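For reference, the tmux commands mentioned above in one place:

tmux a            # attach to the training session
# inside the session:
#   ctrl-b w      list all windows
#   ctrl-b 0      switch to window 0 (worker-0)
#   ctrl-b d      detach from the session again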

To access TensorBoard to see various monitoring metrics of the agent, open http://localhost:12345/ in a browser.
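If you train on a remote machine, one possible way to reach TensorBoard (not part of this repository, just a common pattern) is an SSH tunnel; user@remote-host is a placeholder:

ssh -L 12345:localhost:12345 user@remote-host
# then open http://localhost:12345/ in your local browser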

Screenshot: TensorBoard statistics for the IdentifyShape task

The VNC environments are hosted on Google Cloud and have an interface that's different from a conventional Atari Gym environment; luckily, with the help of several wrappers (used within the envs.py file) the experience for the agent should be similar to playing locally. The problem itself is more difficult because the observations and actions are delayed by the latency of the network.

More interestingly, you can also peek at what the agent is doing with a VNCViewer.

Screenshot: VNC view of the ClickTest task

You can use your system viewer with open vnc://localhost:5900 (or open vnc://${docker_ip}:5900) or connect TurboVNC to that IP/port. The VNC password is "openai".
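For example (vncviewer stands for whatever VNC client you have installed, e.g. the TurboVNC viewer):

open vnc://localhost:5900        # macOS built-in Screen Sharing
vncviewer localhost:5900         # Linux, e.g. a TurboVNC/TigerVNC client
# VNC password: openai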

Note that the default behavior of train.py is to start the remotes on the local machine. Take a look at https://github.com/openai/universe/blob/master/doc/remotes.rst for documentation on managing your remotes. Pass the additional -r flag to point to pre-existing instances.
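A hedged example of pointing train.py at an already running remote; the vnc://host:5900+15900 spec (VNC port plus rewarder port) follows the universe remotes documentation linked above, and your-remote-host is a placeholder:

python train.py --num-workers 1 --env-id wob.mini.ClickTest-v0 \
    --log-dir /tmp/clicktest -r vnc://your-remote-host:5900+15900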

For best performance, it is recommended that the number of workers does not exceed the number of available CPU cores.
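You can check the available core count before choosing --num-workers:

nproc                  # Linux
sysctl -n hw.ncpu      # macOS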

You can stop the experiment with the tmux kill-session command.
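For example, assuming the session uses the upstream universe-starter-agent's default name a3c (tmux ls shows the active sessions if this fork names it differently):

tmux kill-session -t a3c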

Atari and Flashgames

The agent within this extension is still able to operate on Atari environments. To enable the use of Flash game environments, the flashgames.json file must be added to universe's installation path.
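A minimal sketch of that step, assuming flashgames.json sits at the root of this repository and universe was installed with pip:

# locate universe's installation path and copy flashgames.json there (per the note above;
# in the upstream universe package the file lives under runtimes/, so adjust the destination if needed)
UNIVERSE_PATH=$(python -c "import universe, os; print(os.path.dirname(universe.__file__))")
cp flashgames.json "${UNIVERSE_PATH}/"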

Evaluation

To evaluate the performance of the algorithm, it was compared to a random agent and a human player solving the MiniWoB tasks. Since this method is not capable of natural language processing, most of the environments had to be excluded. Furthermore, some other environments turned out to be unstable, which made their evaluation impossible as well. In the end, 25 environments remained for evaluation; the agent was trained on each of them for 12 hours, unless training finished earlier.

Chart: evaluation results

On 35% of the tested tasks the agent was able to compete with a human player, and on 22% of the evaluated tasks it even outperformed the human player.

Screenshot: the IdentifyShape task

The logs of these experiments can be found here.
