This repository has been archived by the owner on Nov 3, 2023. It is now read-only.
Commit 01f3666 (1 parent: e17af41), showing 1 changed file with 10 additions and 7 deletions.
# Wizard of Internet

NOTE: this is still a WIP: some of the related modules are not here yet.
This is the crowdsourcing task from the Internet-Augmented Dialogue Generation paper ([link](https://arxiv.org/abs/2107.07566)).
It uses the [Mephisto](https://github.com/facebookresearch/Mephisto) platform to collect dialogue data from human workers on Amazon Mechanical Turk.

* NOTE: the filter for talking about the pandemic has been removed.
* NOTE: update the wait times for the onboarding task, `TUTORIAL_WAIT_TIMES`.
* TODO: make the onboarding state numberings consistent between Python and React.
* TODO: document the functions' output types.
* TODO: explain template personas vs. curated personas.
* TODO: mention the files needed as part of the conf/yaml file (remember: the `data` folder is in `.gitignore`).
## How to use
Having set up your ParlAI and Mephisto environments properly (make sure you can run the Mephisto demos), you should be able to run this task easily. Most of the configuration for running the task is in the `conf/dev.yaml` file. Note the files needed in the `data` directory:
*sample_personas.txt* and *sample_locations* are needed to create the curated personas.
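
As a minimal illustration, the search-related fragment of `conf/dev.yaml` might look like the following. Only the `search_server` key is described in this README; the address shown is a hypothetical placeholder for wherever your server happens to run:

```yaml
# conf/dev.yaml (fragment): point the task at your running search server.
# The address below is a hypothetical example.
search_server: "http://localhost:8080"
```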
You need to have a functional search server running, and you must set its address in `search_server` in the `conf/dev.yaml` file. You may set the server up to search the internet or any knowledge source of your choosing.
This server responds to the search requests sent by the worker who takes the *wizard* role during this task:
it receives a JSON object with two keys, `q` and `n`, where `q` is a string holding the search query and `n` is an integer giving the number of pages to return.
It sends its response, also as JSON, under a key named `response`, which holds the list of documents retrieved for the received search query. Each document is a mapping (dictionary) of *string->string* with at least 3 fields: `url`, `title`, and `content` (see [SearchEngineRetriever](https://github.com/facebookresearch/ParlAI/blob/70ee4a2c63008774fc9e66a8392847554920a14d/parlai/agents/rag/retrieve_api.py#L73) for more info on how this task interacts with the search server).
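
To make the request/response contract concrete, here is a minimal sketch of the server-side logic in Python. `handle_search_request` and the toy `docs` list are hypothetical names used only for illustration; a real server would wrap this logic in an HTTP endpoint and query an actual retrieval backend instead of a hardcoded list:

```python
def handle_search_request(payload, knowledge_source):
    """Answer one search request in the format this task expects.

    `payload` is the parsed JSON body: {"q": <query string>, "n": <num pages>}.
    `knowledge_source` stands in for whatever backend you search (the
    internet, a local corpus, etc.); here it is a plain list of
    {"url", "title", "content"} dicts that we filter naively by substring.
    """
    query, n_docs = payload["q"], payload["n"]
    hits = [doc for doc in knowledge_source
            if query.lower() in doc["content"].lower()]
    # The reply is a JSON object with a single key, "response",
    # holding the list of retrieved documents.
    return {"response": hits[:n_docs]}


# Toy knowledge source standing in for a real retrieval backend.
docs = [
    {"url": "https://example.com/a", "title": "Pizza",
     "content": "Pizza is a baked dish."},
    {"url": "https://example.com/b", "title": "Pasta",
     "content": "Pasta is an Italian staple."},
]

result = handle_search_request({"q": "pizza", "n": 1}, docs)
```

In a real deployment you would parse the incoming POST body with `json.loads`, replace the naive substring filter with your search backend, and serialize the returned dictionary with `json.dumps` before sending it back to the wizard's client.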