This repository has been archived by the owner on Nov 3, 2023. It is now read-only.

finished the readme file.
mojtaba-komeili committed Aug 12, 2021
1 parent e17af41 commit 01f3666
17 changes: 10 additions & 7 deletions parlai/crowdsourcing/tasks/wizard_of_internet/README.md
# Wizard of Internet

NOTE: this is still a WIP; some of the related modules are not here yet.
This is the crowdsourcing task from the Internet-Augmented Dialogue Generation paper ([link](https://arxiv.org/abs/2107.07566)).
It uses the [Mephisto](https://github.com/facebookresearch/Mephisto) platform to collect dialogue data from human workers on Amazon Mechanical Turk.

* NOTE: the filter for talking about the pandemic has been removed.
* NOTE: update the wait times for the onboarding task (`TUTORIAL_WAIT_TIMES`).
* TODO: make onboarding state numberings consistent between Python and React.
* TODO: document function output types.
* TODO: explain template personas vs. curated personas.
* TODO: mention the files needed as part of the conf/yaml file (remember the `data` folder is in `.gitignore`).
## How to use
Having set up your ParlAI and Mephisto environments properly (make sure you can run the Mephisto demos), you should be able to run this task easily. Most of the configurations for running the task are in the `conf/dev.yaml` file. Note the files needed in the `data` directory:
*sample_personas.txt* and *sample_locations* are needed to create the curated personas.
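As a rough illustration, the search server's address is configured via a `search_server` entry in `conf/dev.yaml`. This fragment is a hypothetical sketch: the host and port are placeholders, and the exact nesting of keys in the shipped `conf/dev.yaml` should be treated as authoritative.

```yaml
# Hypothetical fragment -- check the shipped conf/dev.yaml for the exact
# structure. The value should point at your running search server.
search_server: 'http://localhost:8080'
```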

You need to have a functional search server running, and set its address as `search_server` in the `conf/dev.yaml` file. You may set the server up to search the internet or any knowledge source of your choosing.
This server responds to the search requests sent by the worker who takes the *wizard* role during this task:
it receives a JSON payload with two keys, `q` and `n`, where `q` is a string containing the search query and `n` is an integer giving the number of pages to return.
It sends its response, also as JSON, under a key named `response`, which holds a list of documents retrieved for the received search query. Each document is a *string->string* mapping (dictionary) with at least 3 fields: `url`, `title`, and `content` (see [SearchEngineRetriever](https://github.com/facebookresearch/ParlAI/blob/70ee4a2c63008774fc9e66a8392847554920a14d/parlai/agents/rag/retrieve_api.py#L73) for more info on how this task interacts with the search server).
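The request/response contract described above can be sketched as a minimal HTTP server using only the Python standard library. Everything here is a hypothetical illustration: `fake_search` returns canned placeholder documents, and a real deployment would query an actual search index or the web.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def fake_search(query: str, n: int) -> dict:
    """Build a reply in the shape this task expects: a `response` key holding
    a list of documents, each with `url`, `title`, and `content` fields.
    (Canned results for illustration; a real server would hit a search index.)
    """
    docs = [
        {
            "url": f"https://example.com/{i}",
            "title": f"Result {i} for '{query}'",
            "content": f"Placeholder page content about {query}.",
        }
        for i in range(n)
    ]
    return {"response": docs}


class SearchHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # The client sends a JSON body with keys `q` (the query string)
        # and `n` (the number of pages to return).
        length = int(self.headers.get("Content-Length", 0))
        request = json.loads(self.rfile.read(length))
        reply = json.dumps(fake_search(request["q"], int(request["n"])))
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(reply.encode("utf-8"))


if __name__ == "__main__":
    # Point `search_server` in conf/dev.yaml at this address.
    HTTPServer(("0.0.0.0", 8080), SearchHandler).serve_forever()
```

The `__main__` guard means the module can be imported (e.g. for testing the response shape) without starting the server.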
