This repository has been archived by the owner on Nov 3, 2023. It is now read-only.

[TOD][Dataset][Easyish] Google Simulation Splits #4182

Merged
merged 62 commits into from
Dec 22, 2021

@moyapchen moyapchen commented Nov 16, 2021

Code for processing Google SGD into In-domain and Out-domain data via build.py, usable via agents.

Datasets added in this substack:

  • Google SGD
    • Google SGD Simulation Splits (In-domain, Out-domain)
  • MetalWoz
  • MSR_E2E
  • Multidogo
  • MultiWoz V2.2
  • Taskmaster
  • Taskmaster2
  • Taskmaster3 (TicketTalk)

Test plan:
Regression test, parlai dd of dataset
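
For readers unfamiliar with the idea, an in-domain / out-domain split can be sketched roughly as below. This is only an illustration under stated assumptions, not the logic in this PR's `build.py`: the held-out domain set, the `split_by_domain` and `domains_of` helpers, and the rule "any held-out domain sends the dialogue to out-domain" are all hypothetical. The only thing taken from the SGD data format is that each dialogue carries a `services` list with names such as `Hotels_2`.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical held-out domains, chosen only for this example.
HELD_OUT_DOMAINS: Set[str] = {"Flights", "Hotels"}


def domains_of(dialogue: Dict) -> Set[str]:
    """Map SGD service names such as 'Hotels_2' to domain names such as 'Hotels'."""
    return {service.rsplit("_", 1)[0] for service in dialogue["services"]}


def split_by_domain(
    dialogues: Iterable[Dict], held_out: Set[str]
) -> Dict[str, List[Dict]]:
    """Assumed rule: a dialogue touching any held-out domain is out-domain."""
    splits: Dict[str, List[Dict]] = {"in_domain": [], "out_domain": []}
    for dialogue in dialogues:
        key = "out_domain" if domains_of(dialogue) & held_out else "in_domain"
        splits[key].append(dialogue)
    return splits
```

Calling `split_by_domain(dialogues, HELD_OUT_DOMAINS)` on a list of loaded SGD dialogues returns the two buckets; the real split in the PR may differ.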

Moya Chen added 16 commits November 15, 2021 20:15
See documentation block in `tod_agents.py`
As noted in the README, this agent takes data generated from `tod_world_script.py` and dumps it out to a teacher.

(Note that I tried setting up a regression test for this teacher, but I could not get it to save to an output directory whose path did not include my local homedir name.)
See documentation block in `tod_agents.py`

(I'm not 100% sure if `conftest.py` is the right file to change, though I did notice that `pytest.ini` was necessary to get pytest to run.)
See documentation in `tod_world_script.py` for usage.
Refactor Google SGD away from old format into TOD Conversations format.

Datasets added in this substack:
* *Google SGD*
   * Google SGD Simulation Splits (In-domain, Out-domain)
* MetalWoz
* MSR_E2E
* Multidogo
* MultiWoz V2.2
* Taskmaster
* Taskmaster2
* Taskmaster3 (TicketTalk)

Test plan:
Regression test, `parlai dd` of dataset
Code for processing Google SGD into In-domain and Out-domain data via `build.py`, usable via agents.

Datasets added in this substack:
* Google SGD
   * **Google SGD Simulation Splits (In-domain, Out-domain)**
* MetalWoz
* MSR_E2E
* Multidogo
* MultiWoz V2.2
* Taskmaster
* Taskmaster2
* Taskmaster3 (TicketTalk)

Test plan:
Regression test, `parlai dd` of dataset

@stephenroller stephenroller left a comment

Worth doing one extra verification that this reproduces the paper folds, just in case.

Stamped.

Base automatically changed from simpler_tod_5a_google_sgd to main December 22, 2021 18:08
…ta and new one; realized I was missing a +1 in the episode length count

moyapchen commented Dec 22, 2021

Ran an eval on the new dataset with the old models and got the same numbers.

Also with `parlai dd -t google_sgd_simulation_splits:OutDomainSystemTeacher`:

train: 12:53:39 | loaded 2303 episodes with a total of 60589 examples
valid: 12:52:55 | loaded 768 episodes with a total of 20324 examples
test: 12:52:23 | loaded 768 episodes with a total of 20130 examples
test (on UserSimulatorTeacher): 12:27:44 | loaded 768 episodes with a total of 9681 examples

vs `parlai dd -t fb:google_sgd_rl_splits:OutDomainSystemTeacher`:

train: 12:22:41 | loaded 2303 episodes with a total of 60589 examples
valid: 12:23:44 | loaded 768 episodes with a total of 20324 examples
test: 12:24:02 | loaded 768 episodes with a total of 20130 examples
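
The by-eye comparison of the two runs above could also be scripted. The following is a minimal sketch, not part of the PR: `parse_counts` and the regular expression are hypothetical helpers that assume the `loaded N episodes with a total of M examples` line format shown in the output, with the fold label typed in front of each line as in this comment.

```python
import re
from typing import Dict, Tuple

# Matches lines like "train: 12:53:39 | loaded 2303 episodes with a total of 60589 examples".
LOADED_RE = re.compile(
    r"^(?P<fold>\w+):.*loaded (?P<episodes>\d+) episodes "
    r"with a total of (?P<examples>\d+) examples"
)


def parse_counts(log: str) -> Dict[str, Tuple[int, int]]:
    """Return {fold: (episodes, examples)} for every line that matches."""
    counts: Dict[str, Tuple[int, int]] = {}
    for line in log.splitlines():
        match = LOADED_RE.match(line.strip())
        if match:
            counts[match["fold"]] = (
                int(match["episodes"]),
                int(match["examples"]),
            )
    return counts


NEW_RUN = """\
train: 12:53:39 | loaded 2303 episodes with a total of 60589 examples
valid: 12:52:55 | loaded 768 episodes with a total of 20324 examples
test: 12:52:23 | loaded 768 episodes with a total of 20130 examples
"""

OLD_RUN = """\
train: 12:22:41 | loaded 2303 episodes with a total of 60589 examples
valid: 12:23:44 | loaded 768 episodes with a total of 20324 examples
test: 12:24:02 | loaded 768 episodes with a total of 20130 examples
"""

if __name__ == "__main__":
    # Timestamps differ between runs, so only the counts are compared.
    print(parse_counts(NEW_RUN) == parse_counts(OLD_RUN))
```

Lines whose label is not a single word followed by a colon, such as the `test (on UserSimulatorTeacher)` line, do not match the pattern and are skipped.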

@moyapchen moyapchen merged commit 0848a71 into main Dec 22, 2021
@moyapchen moyapchen deleted the simpler_tod_5b_google_sgd_sim_splits branch December 22, 2021 21:46