This repository has been archived by the owner on Nov 3, 2023. It is now read-only.

[TOD][Datasets][Easy] MetalWoz into ParlAI (User + System utterances) #4183

Merged
moyapchen merged 73 commits into main from simpler_tod_5c_metalwoz
Dec 23, 2021

Conversation

moyapchen

@moyapchen moyapchen commented Nov 16, 2021

Code for processing MetalWoz into System + User Simulator teachers.

Getting MetalWoz into the Conversations format is a pain, so I don't even try here. (The User Simulator paper documents it this way as well.)
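
For readers unfamiliar with the System / User Simulator split, here is a minimal sketch of the idea. It is not the code in this PR: it assumes a dialogue is just a list of alternating utterances with the system speaking first, and `split_turns` and its arguments are hypothetical names made up for illustration. Each utterance becomes a label for its speaker's teacher, with everything said before it as context.

```python
from typing import Dict, List, Tuple

# (context so far, utterance to predict)
Example = Tuple[str, str]


def split_turns(turns: List[str], system_first: bool = True) -> Dict[str, List[Example]]:
    """Split alternating turns into system and user-simulator examples.

    Hypothetical helper: assumes strictly alternating speakers.
    """
    examples: Dict[str, List[Example]] = {"system": [], "user": []}
    history: List[str] = []
    for i, utterance in enumerate(turns):
        # Even-indexed turns belong to whoever speaks first.
        is_system = (i % 2 == 0) == system_first
        speaker = "system" if is_system else "user"
        # The speaker's teacher learns to produce this utterance from the history.
        examples[speaker].append(("\n".join(history), utterance))
        history.append(utterance)
    return examples


if __name__ == "__main__":
    dialogue = [
        "Hello, how may I help you?",
        "I need an alarm.",
        "For what time?",
        "7am.",
    ]
    for speaker, pairs in split_turns(dialogue).items():
        for context, label in pairs:
            print(f"[{speaker}] context={context!r} label={label!r}")
```

The point of the split is that both teachers see the same dialogue but are trained on opposite sides of it, which does not require restructuring the data into the full Conversations format.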


Datasets added in this substack:

  • Google SGD
    • Google SGD Simulation Splits (In-domain, Out-domain)
  • MetalWoz
  • MSR_E2E
  • Multidogo
  • MultiWoz V2.2
  • Taskmaster
  • Taskmaster2
  • Taskmaster3 (TicketTalk)

Test plan:
Regression test, `parlai dd` of dataset

Moya Chen added 17 commits November 15, 2021 20:15
See documentation block in `tod_agents.py`
As noted in the README, this agent takes data generated from `tod_world_script.py` and dumps it out to a teacher.

(Note that I tried setting up a regression test for this teacher, but I could not get it to save its output to a directory path that did not include my local homedir name.)
See documentation block in `tod_agents.py`

(I'm not 100% sure if `conftest.py` is the right file to change, though I did notice that `pytest.ini` was necessary to get pytest to run.)
See documentation in `tod_world_script.py` for usage.
Refactor Google SGD away from old format into TOD Conversations format.

Datasets added in this substack:
* *Google SGD*
   * Google SGD Simulation Splits (In-domain, Out-domain)
* MetalWoz
* MSR_E2E
* Multidogo
* MultiWoz V2.2
* Taskmaster
* Taskmaster2
* Taskmaster3 (TicketTalk)

Test plan:
Regression test, `parlai dd` of dataset
Code for processing Google SGD into In-domain and Out-domain data via `build.py`, used via agents.

Datasets added in this substack:
* Google SGD
   * **Google SGD Simulation Splits (In-domain, Out-domain)**
* MetalWoz
* MSR_E2E
* Multidogo
* MultiWoz V2.2
* Taskmaster
* Taskmaster2
* Taskmaster3 (TicketTalk)

Test plan:
Regression test, `parlai dd` of dataset
Code for processing MetalWoz into System + User Simulator teachers.

Getting MetalWoz into the Conversations format is a pain, so I don't even try here. (The paper documents it this way as well.)

----------------------------
Datasets added in this substack:
* Google SGD
   * Google SGD Simulation Splits (In-domain, Out-domain)
* **MetalWoz**
* MSR_E2E
* Multidogo
* MultiWoz V2.2
* Taskmaster
* Taskmaster2
* Taskmaster3 (TicketTalk)

Test plan:
Regression test, `parlai dd` of dataset

@stephenroller stephenroller left a comment

Stamped

Base automatically changed from simpler_tod_5b_google_sgd_sim_splits to main December 22, 2021 21:46
@moyapchen moyapchen merged commit 0720f4a into main Dec 23, 2021
@moyapchen moyapchen deleted the simpler_tod_5c_metalwoz branch December 23, 2021 01:46