DOM-Q-NET: Grounded RL on Structured Language

"DOM-Q-NET: Grounded RL on Structured Language", International Conference on Learning Representations (2019). Sheng Jia, Jamie Kiros, Jimmy Ba. [arxiv] [openreview]

Architecture

Demo

Trained multitask agent: https://www.youtube.com/watch?v=eGzTDIvX4IY
Facebook login: https://www.youtube.com/watch?v=IQytRUKmWhs&t=2s

Requirements

You need to install Selenium and a Chrome driver that Selenium can use.
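
A quick way to confirm both requirements before launching an experiment is sketched below. It is only an illustration: the helper name `check_requirements` is hypothetical, and it assumes the Chrome driver is installed as a `chromedriver` executable on your PATH, which may differ from how this repo locates it.

```python
import importlib.util
import shutil


def check_requirements():
    """Report whether the selenium package and a chromedriver binary are visible.

    Assumption: the driver is reachable as `chromedriver` on PATH.
    """
    return {
        "selenium": importlib.util.find_spec("selenium") is not None,
        "chromedriver": shutil.which("chromedriver") is not None,
    }


if __name__ == "__main__":
    for name, found in check_requirements().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

If either entry is reported as missing, install it before running the launch scripts.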

Installation

  1. Clone this repo.
  2. Download the MiniWoB++ environment from the original repo, https://github.com/stanfordnlp/miniwob-plusplus,
    and copy the miniwob-plusplus/html folder to miniwob/html in this repo.
  3. The html folder can actually be stored anywhere, as long as you do one of the following:
  • Set the environment variable "WOB_PATH" to
    file://"your-path-to-miniwob-plusplus"/html/miniwob
    E.g. "your-path-to-miniwob-plusplus" is "/h/sheng/DOM-Q-NET/miniwob"
  • Directly modify the base_url on line 33 of instance.py to
    file://"your-path-to-miniwob-plusplus"/html/miniwob/
    In my case, base_url='file:///h/sheng/DOM-Q-NET/miniwob/html/miniwob/'
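
The two options above can be pictured with the following sketch, which builds the file:// URL from an absolute html folder path and lets "WOB_PATH" take precedence when it is set. The function `miniwob_base_url` is a hypothetical helper written for illustration; how instance.py actually reads "WOB_PATH" is not shown here and may differ.

```python
import os
from pathlib import Path


def miniwob_base_url(html_root, env=None):
    """Return the base URL of the MiniWoB task pages.

    html_root: absolute path to the copied miniwob-plusplus html folder.
    env: mapping used to look up WOB_PATH (defaults to os.environ).
    """
    env = os.environ if env is None else env
    override = env.get("WOB_PATH")
    if override:
        # Assumption: WOB_PATH already holds a complete file:// URL.
        return override.rstrip("/") + "/"
    # Path.as_uri() raises ValueError if the path is not absolute.
    return (Path(html_root) / "miniwob").as_uri() + "/"


if __name__ == "__main__":
    print(miniwob_base_url("/h/sheng/DOM-Q-NET/miniwob/html", env={}))
```

On a POSIX system, the example path yields the same URL as the base_url quoted above.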

Run experiment

Experiment launch files are stored under runs. For example,

cd runs/hard2medium9tasks/
sh run1.sh

will launch an 11-task multitask experiment (social-media, search-engine, login-user, enter-password, click-checkboxes, click-option, enter-dynamic-text, enter-text, email-inbox-delete, click-tab-2, navigation-tree).

Multitask Assumptions

State & Action restrictions

| Item | Maximum number of items |
| --- | --- |
| DOM tree leaves (action space) | 160 |
| DOM tree | 200 |
| Instruction tokens | 16 |
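
Fixed limits like these usually mean that every state is padded to a common size so that tasks can share one network. The sketch below shows one way to do that with a validity mask. The constant names and the `pad_to_limit` helper are hypothetical, and rejecting oversized inputs (rather than truncating them) is an assumption, not something this repo documents.

```python
MAX_DOM_LEAVES = 160          # action space size
MAX_DOM_NODES = 200           # whole DOM tree
MAX_INSTRUCTION_TOKENS = 16


def pad_to_limit(items, limit, pad_value=0):
    """Pad a sequence to `limit` entries and return (padded, mask).

    mask holds 1 for real entries and 0 for padding.
    """
    items = list(items)
    if len(items) > limit:
        # Assumption: inputs exceeding the limit are rejected, not truncated.
        raise ValueError(f"{len(items)} items exceed the limit of {limit}")
    n_pad = limit - len(items)
    return items + [pad_value] * n_pad, [1] * len(items) + [0] * n_pad


if __name__ == "__main__":
    token_ids, mask = pad_to_limit([5, 7, 9], MAX_INSTRUCTION_TOKENS)
    print(len(token_ids), sum(mask))
```

The mask lets downstream code ignore padded positions, for example when selecting among DOM leaves as actions.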

Attribute embeddings & vocabulary

| Attribute | Max vocabulary | Embedding dimension |
| --- | --- | --- |
| Tag | 100 | 16 |
| Text (shared with instructions) | 600 | 48 |
| Class | 100 | 16 |
  • UNKnown tokens
    These are assigned a random vector so that the cosine similarity with the text attribute can still yield 1.0 for the direct alignment.
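
To see why a random vector is enough for alignment, the sketch below computes a cosine score (here, cosine similarity) between 48-dimensional vectors, matching the text embedding dimension above. It is a standalone illustration in plain Python, not the repo's code: the function names, the Gaussian initialization, and the fixed seed are all assumptions.

```python
import math
import random

TEXT_EMBEDDING_DIM = 48  # text embedding dimension from the table above


def random_vector(dim, rng):
    """Draw a random vector (Gaussian entries are an assumption)."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


rng = random.Random(0)
unk_vector = random_vector(TEXT_EMBEDDING_DIM, rng)
other_vector = random_vector(TEXT_EMBEDDING_DIM, rng)

# The same vector on both sides aligns perfectly (1.0 up to rounding).
same_token_score = cosine_similarity(unk_vector, unk_vector)
# Two independent random vectors are very unlikely to align.
different_token_score = cosine_similarity(unk_vector, other_vector)
```

When the same vector appears in both the instruction and a DOM text attribute, the score reaches 1.0, while unrelated random vectors score well below that.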

Acknowledgement

Credit to Dopamine for the implementation of prioritized replay used in dstructs/dopamine_segtree.py