This repository has been archived by the owner on Nov 3, 2023. It is now read-only.

[TOD][Datasets][Easy] MetalWoz into ParlAI (User + System utterances) #4183

Merged 73 commits, Dec 23, 2021
e365e48
[TOD] Core conversation structure, serialization, const tokens
Nov 15, 2021
c939174
[Tod] Agents, teacher metrics, and tests for these
Nov 16, 2021
3bf655f
[TOD] Tod json structure to teacher task
Nov 16, 2021
6cb4b86
[TOD] Core conversation structure, serialization, const tokens
Nov 15, 2021
1480def
fix test by adding init folder
Nov 16, 2021
de84801
[Tod] Agents, teacher metrics, and tests for these
Nov 16, 2021
638eb28
[TOD] World, world metrics, script, tests
Nov 16, 2021
0e3f492
hmmm... hoping stacks don't bite me. (change that was kept in upper d…
Nov 16, 2021
0643a62
Merge branch 'simpler_tod_1_core_only' into simpler_tod_2_agents_teac…
Nov 16, 2021
37aced2
minor, remove commented out print
Nov 16, 2021
4f91279
Merge branch 'simpler_tod_2_agents_teachers' into simpler_tod_3_world
Nov 16, 2021
b05930f
comment
Nov 16, 2021
5086e85
more comment updates (not sure if it actually helps clarity..)
Nov 16, 2021
1e30035
Merge branch 'simpler_tod_3_world' into simpler_tod_4_tod_json
Nov 16, 2021
9a25fc5
[TOD][Dataset][Easy] Google SGD in TOD Conversations format
Nov 16, 2021
faa2356
[TOD][Dataset][Easyish] Google Simulation Splits
Nov 16, 2021
9426997
[TOD][Datasets][Easy] MetalWoz
Nov 16, 2021
51ed1a9
Merge branch 'main' into simpler_tod_1_core_only
Nov 16, 2021
a6508be
Merge branch 'simpler_tod_1_core_only' into simpler_tod_2_agents_teac…
Nov 16, 2021
eebc36b
Merge branch 'simpler_tod_2_agents_teachers' into simpler_tod_3_world
Nov 16, 2021
3675781
use same version of black as in the pre-commit hook
Nov 16, 2021
086c91c
Merge branch 'simpler_tod_2_agents_teachers' into simpler_tod_3_world
Nov 16, 2021
0bc961e
use same version of black as in the pre-commit hook
Nov 16, 2021
ed26407
Merge branch 'simpler_tod_3_world' into simpler_tod_4_tod_json
Nov 16, 2021
677df09
Merge branch 'simpler_tod_4_tod_json' into simpler_tod_5a_google_sgd
Nov 16, 2021
24ee898
black with version from pre-commit hook
Nov 16, 2021
3ca7ae3
Merge branch 'simpler_tod_4_tod_json' into simpler_tod_5a_google_sgd
Nov 16, 2021
3145e0e
Shouldn't worry about tod_json being in task_list
Nov 16, 2021
1b2a3fb
Merge branch 'simpler_tod_4_tod_json' into simpler_tod_5a_google_sgd
Nov 16, 2021
f44b17b
add to task list; run lint with right version of black
Nov 16, 2021
43474c4
Merge branch 'simpler_tod_5a_google_sgd' into simpler_tod_5b_google_s…
Nov 16, 2021
d290ecd
Merge branch 'simpler_tod_5b_google_sgd_sim_splits' into simpler_tod_…
Nov 17, 2021
7c3ccf5
lint with right version
Nov 17, 2021
dfc4989
Merge branch 'main' into simpler_tod_2_agents_teachers
Nov 29, 2021
2f15448
address eric comments; add new readme + more documentation
Nov 30, 2021
abd1c7e
Merge branch 'simpler_tod_2_agents_teachers' into simpler_tod_3_world
Nov 30, 2021
5d0197d
minor wording change
Nov 30, 2021
39792a8
Merge branch 'simpler_tod_2_agents_teachers' into simpler_tod_3_world
Nov 30, 2021
76bfa89
add more documentation to world tests (following comment on teacher te…
Nov 30, 2021
73c5c7a
minor comment update
Nov 30, 2021
f6acccb
Merge branch 'simpler_tod_3_world' into simpler_tod_4_tod_json
Nov 30, 2021
dc4b70e
Merge branch 'simpler_tod_4_tod_json' into simpler_tod_5a_google_sgd
Nov 30, 2021
1299b68
Merge branch 'simpler_tod_5a_google_sgd' into simpler_tod_5b_google_s…
Nov 30, 2021
58965d3
Merge branch 'simpler_tod_4_tod_json' into simpler_tod_5c_metalwoz
Nov 30, 2021
55aa3ca
Merge branch 'simpler_tod_5b_google_sgd_sim_splits' into simpler_tod_…
Nov 30, 2021
7ab9d70
update to respect actual count of episodes (I think this might have i…
Dec 1, 2021
c6c728d
Merge branch 'main' into simpler_tod_2_agents_teachers
Dec 1, 2021
b3283d0
Merge branch 'simpler_tod_2_agents_teachers' into simpler_tod_3_world
Dec 1, 2021
85ab0fd
Merge branch 'simpler_tod_3_world' into simpler_tod_4_tod_json
Dec 1, 2021
0969aa1
Merge branch 'simpler_tod_4_tod_json' into simpler_tod_5a_google_sgd
Dec 1, 2021
1869cee
Merge branch 'simpler_tod_5a_google_sgd' into simpler_tod_5b_google_s…
Dec 1, 2021
609f930
Merge branch 'simpler_tod_5b_google_sgd_sim_splits' into simpler_tod_…
Dec 1, 2021
0580ff0
Merge branch 'main' into simpler_tod_2_agents_teachers
Dec 2, 2021
e00accf
Merge branch 'simpler_tod_2_agents_teachers' into simpler_tod_3_world
Dec 2, 2021
701da8d
Merge branch 'simpler_tod_3_world' into simpler_tod_4_tod_json
Dec 2, 2021
d519dc2
Merge branch 'simpler_tod_4_tod_json' into simpler_tod_5a_google_sgd
Dec 2, 2021
c7c1c64
Merge branch 'simpler_tod_5a_google_sgd' into simpler_tod_5b_google_s…
Dec 2, 2021
828f44f
Merge branch 'simpler_tod_5b_google_sgd_sim_splits' into simpler_tod_…
Dec 2, 2021
9466144
regen after changing tod teacher logic to respect episode/examples le…
Dec 2, 2021
1392d99
regen after changing tod teacher logic to respect episode/examples le…
Dec 2, 2021
71b5af8
Merge branch 'simpler_tod_5a_google_sgd' into simpler_tod_5b_google_s…
Dec 2, 2021
9da65a6
Merge branch 'simpler_tod_5b_google_sgd_sim_splits' into simpler_tod_…
Dec 2, 2021
7b24acf
Merge branch 'main' into simpler_tod_3_world
Dec 18, 2021
e3fa063
Merge branch 'simpler_tod_3_world' into simpler_tod_4_tod_json
Dec 18, 2021
2384563
Merge branch 'simpler_tod_4_tod_json' into simpler_tod_5a_google_sgd
Dec 18, 2021
d9ba7e4
Merge branch 'main' into simpler_tod_5a_google_sgd
Dec 22, 2021
acd6ffe
not sure why this comment keeps not being merged correctly ugh...
Dec 22, 2021
a753a6d
Merge branch 'simpler_tod_5a_google_sgd' into simpler_tod_5b_google_s…
Dec 22, 2021
66d8bf8
Merge branch 'simpler_tod_5b_google_sgd_sim_splits' into simpler_tod_…
Dec 22, 2021
0f49cb5
noticed a difference in episode lengths between old version of this da…
Dec 22, 2021
0fb3ecb
Merge branch 'main' into simpler_tod_5b_google_sgd_sim_splits
Dec 22, 2021
66e09ee
Merge branch 'simpler_tod_5b_google_sgd_sim_splits' into simpler_tod_…
Dec 22, 2021
00ae154
Merge branch 'main' into simpler_tod_5c_metalwoz
Dec 22, 2021
117 changes: 84 additions & 33 deletions parlai/tasks/metalwoz/agents.py
@@ -4,74 +4,125 @@
# This source code is licensed under the MIT license found in the
# LICENSE file in the root directory of this source tree.

from parlai.core.opt import Opt
from parlai.core.params import ParlaiParser
from parlai.core.teachers import DialogTeacher
from parlai.utils.io import PathManager
from parlai.utils.data import DatatypeHelper
from .build import build
import os
import pandas as pd
import hashlib
from typing import Optional


class MetalWozTeacher(DialogTeacher):
class MetalWozTeacherBase(DialogTeacher):
@classmethod
def add_cmdline_args(
cls, parser: ParlaiParser, partial_opt: Optional[Opt] = None
) -> ParlaiParser:
super().add_cmdline_args(parser, partial_opt)
parser.add_argument(
"--metalwoz-domains", nargs="+", help="Use only a subset of the domains"
)
return parser

def _path(self, opt):
fold = DatatypeHelper.fold(opt['datatype'])
if fold == 'train' or fold == 'valid':
folder = os.path.join(opt['datapath'], 'metalwoz', 'train')
fold = DatatypeHelper.fold(opt["datatype"])
if fold == "train" or fold == "valid":
folder = os.path.join(opt["datapath"], "metalwoz", "train")
else:
folder = os.path.join(opt['datapath'], 'metalwoz', 'test')
folder = os.path.join(opt["datapath"], "metalwoz", "test")
return folder, fold

def __init__(self, opt, shared=None):
if shared is None:
build(opt)
folder, fold = self._path(opt)
self.fold = fold
opt['datafile'] = os.path.join(folder, fold)
opt["datafile"] = os.path.join(folder, fold)
super().__init__(opt, shared)

def _hash(self, string):
return int(hashlib.sha1(string.encode('utf-8')).hexdigest(), 16) % 10

def setup_data(self, datapath):
def load_data(self, datapath):
folder, fold = os.path.split(datapath)
with PathManager.open(os.path.join(folder, 'tasks.txt')) as taskf:
with PathManager.open(os.path.join(folder, "tasks.txt")) as taskf:
tasks_table = pd.read_json(taskf, lines=True)

dfolder = os.path.join(folder, 'dialogues')
dfolder = os.path.join(folder, "dialogues")

data = []

for filename in PathManager.ls(dfolder):
domain = filename.replace(".txt", "")
if (
self.opt["metalwoz_domains"]
and domain not in self.opt["metalwoz_domains"]
):
continue
fullfn = os.path.join(dfolder, filename)
with PathManager.open(fullfn) as dataf:
data.append(pd.read_json(dataf, lines=True))
lines = pd.read_json(dataf, lines=True)
lines = lines.merge(tasks_table, on="task_id")
data.append(lines.to_dict("records"))

data = pd.concat(data, axis=0)
data = data.sample(frac=1.0, random_state=83741) # metal in l33t numbers, lol
data = data.merge(tasks_table, on='task_id')
data['fold'] = data['domain_x'].apply(self._hash)
# Quick check to make sure we didn't fat-finger the spelling of some domain
if self.opt["metalwoz_domains"]:
assert len(data) == len(self.opt["metalwoz_domains"])

for _, row in data.iterrows():
if fold == 'valid' and row['fold'] != 9:
continue
if fold == 'train' and row['fold'] == 9:
continue
texts = [row['bot_role']] + list(row['turns'])
if "test" in self.fold:
flat = []
for domain in data:
flat.extend(domain)
return flat

return DatatypeHelper.split_subset_data_by_fold(self.fold, data, 0.8, 0.1, 0.1)


class SystemTeacher(MetalWozTeacherBase):
def setup_data(self, datapath):
data = self.load_data(datapath)
for row in data:
texts = [row["bot_role"]] + list(row["turns"])
prompts, labels = texts[::2], texts[1::2]
for i, (prompt, label) in enumerate(zip(prompts, labels)):
yield {
'text': prompt,
'label': label,
'bot_role': row['bot_role'],
'bot_prompt': row['bot_prompt'],
'user_role': row['user_role'],
'user_prompt': row['user_prompt'],
'utterance_id': row['id'],
'domain': row['domain_x'],
'task_id': row['task_id'],
"text": prompt,
"label": label,
"bot_role": row["bot_role"],
"bot_prompt": row["bot_prompt"],
"user_role": row["user_role"],
"user_prompt": row["user_prompt"],
"utterance_id": row["id"],
"domain": row["domain_x"],
"task_id": row["task_id"],
}, i == 0


class UserSimulatorTeacher(MetalWozTeacherBase):
def setup_data(self, datapath):
data = self.load_data(datapath)
for row in data:
texts = list(row["turns"])
prompts, labels = (
[f"{row['user_role']}\n{texts[0]}"] + texts[2::2],
texts[1::2],
)
for i, (prompt, label) in enumerate(zip(prompts, labels)):
yield {
"text": prompt,
"label": label,
"bot_role": row["bot_role"],
"bot_prompt": row["bot_prompt"],
"user_role": row["user_role"],
"user_prompt": row["user_prompt"],
"utterance_id": row["id"],
"domain": row["domain_x"],
"task_id": row["task_id"],
}, i == 0


class MetalWozTeacher(SystemTeacher):
pass


class DefaultTeacher(MetalWozTeacher):
pass
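
The `--metalwoz-domains` filter in `load_data` can be looked at in isolation. The sketch below is a simplified stand-in, not the PR's code: `select_domains` and the sample file listing are hypothetical, and it works on plain strings rather than a `PathManager.ls` result. It mirrors the two steps visible in the diff: skip any dialogue file whose domain was not requested, then check that the number of domains kept equals the number requested, which catches a misspelled domain name.

```python
from typing import List, Optional


def select_domains(filenames: List[str], requested: Optional[List[str]]) -> List[str]:
    """Return the domains to load, mirroring the filter in load_data.

    `select_domains` is a hypothetical helper written for this sketch.
    """
    kept = []
    for filename in filenames:
        # Each dialogue file is named after its domain, e.g. BOOKING_FLIGHT.txt.
        domain = filename.replace(".txt", "")
        if requested and domain not in requested:
            continue
        kept.append(domain)

    # Same sanity check as in the diff: a misspelled domain matches no file,
    # so fewer domains are kept than were requested.
    if requested:
        assert len(kept) == len(requested), f"unknown domain in {requested}"
    return kept


# Hypothetical directory listing, for illustration only.
files = ["BOOKING_FLIGHT.txt", "LOOK_UP_INFO.txt", "ALARM_SET.txt"]
print(select_domains(files, ["BOOKING_FLIGHT"]))
print(select_domains(files, None))
```

Because the check compares counts, it works only if each loaded domain contributes exactly one entry, which matches the diff's one-list-per-file `data.append(...)`.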
26 changes: 13 additions & 13 deletions parlai/tasks/metalwoz/build.py
@@ -10,32 +10,32 @@

RESOURCES = [
DownloadableFile(
'https://download.microsoft.com/download/E/B/8/EB84CB1A-D57D-455F-B905-3ABDE80404E5/metalwoz-v1.zip',
'metalwoz-v1.zip',
'2a2ae3b25760aa2725e70bc6480562fa5d720c9689a508d28417631496d6764f',
"https://download.microsoft.com/download/E/B/8/EB84CB1A-D57D-455F-B905-3ABDE80404E5/metalwoz-v1.zip",
"metalwoz-v1.zip",
"2a2ae3b25760aa2725e70bc6480562fa5d720c9689a508d28417631496d6764f",
),
DownloadableFile(
'https://download.microsoft.com/download/0/c/4/0c4a8893-cbf9-4a43-a44a-09bab9539234/metalwoz-test-v1.zip',
'metalwoz-test-v1.zip',
'6722d1d9ec05334dd801972767ae3bdefcd15f71bf73fea1d098f214a96a7c6c',
"https://download.microsoft.com/download/0/c/4/0c4a8893-cbf9-4a43-a44a-09bab9539234/metalwoz-test-v1.zip",
"metalwoz-test-v1.zip",
"6722d1d9ec05334dd801972767ae3bdefcd15f71bf73fea1d098f214a96a7c6c",
),
]


def build(opt):
dpath = os.path.join(opt['datapath'], 'metalwoz')
version = '1.0'
dpath = os.path.join(opt["datapath"], "metalwoz")
version = "1.0"

if not build_data.built(dpath, version_string=version):
if build_data.built(dpath):
build_data.remove_dir(dpath)
build_data.make_dir(dpath)
build_data.make_dir(os.path.join(dpath, 'train', 'dialogues'))
build_data.make_dir(os.path.join(dpath, 'test', 'dialogues'))
build_data.make_dir(os.path.join(dpath, "train", "dialogues"))
build_data.make_dir(os.path.join(dpath, "test", "dialogues"))

# Download the data.
RESOURCES[0].download_file(os.path.join(dpath, 'train'))
RESOURCES[1].download_file(os.path.join(dpath, 'test'))
RESOURCES[0].download_file(os.path.join(dpath, "train"))
RESOURCES[1].download_file(os.path.join(dpath, "test"))

build_data.untar(os.path.join(dpath, 'test'), 'dstc8_metalwoz_heldout.zip')
build_data.untar(os.path.join(dpath, "test"), "dstc8_metalwoz_heldout.zip")
build_data.mark_done(dpath, version_string=version)
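
The version check in `build` follows a common pattern: rebuild only when the data directory is missing or was built for a different version. The sketch below imitates that control flow using only the standard library. It is not ParlAI's `build_data` implementation: the `.built` marker file name and the helpers `is_built`, `mark_done`, and `ensure_built` are assumptions made for illustration, and the download and unpack steps are replaced by appending to a list.

```python
import os
import shutil
import tempfile
from typing import List, Optional

MARKER = ".built"  # assumed marker file name, for illustration only


def is_built(path: str, version: Optional[str] = None) -> bool:
    """True if a marker exists and, when a version is given, matches it."""
    marker = os.path.join(path, MARKER)
    if not os.path.isfile(marker):
        return False
    if version is None:
        return True
    with open(marker) as f:
        return f.read().strip() == version


def mark_done(path: str, version: str) -> None:
    """Record the version that the directory was built for."""
    with open(os.path.join(path, MARKER), "w") as f:
        f.write(version)


def ensure_built(path: str, version: str, log: List[str]) -> None:
    """Rebuild only when the directory is absent or has another version."""
    if not is_built(path, version):
        if is_built(path):
            # A build for a different version exists: clear it first.
            shutil.rmtree(path)
        os.makedirs(os.path.join(path, "train", "dialogues"))
        os.makedirs(os.path.join(path, "test", "dialogues"))
        log.append(version)  # stands in for the download and unpack steps
        mark_done(path, version)


root = tempfile.mkdtemp()
dpath = os.path.join(root, "metalwoz")
builds: List[str] = []
ensure_built(dpath, "1.0", builds)  # first call builds
ensure_built(dpath, "1.0", builds)  # second call is a no-op
ensure_built(dpath, "1.1", builds)  # a version bump triggers a rebuild
print(builds)
shutil.rmtree(root)
```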
4 changes: 4 additions & 0 deletions parlai/tasks/metalwoz/test.py
@@ -9,3 +9,7 @@

class TestDefaultTeacher(AutoTeacherTest):
task = "metalwoz"


class TestUserSimulatorTeacher(AutoTeacherTest):
task = "metalwoz:UserSimulatorTeacher"
78 changes: 78 additions & 0 deletions parlai/tasks/metalwoz/test/metalwoz_UserSimulatorTeacher_test.yml
@@ -0,0 +1,78 @@
acts:
- - bot_prompt: Offer to help the user book a flight to Greece.
bot_role: You are a bot designed to book flights
domain: BOOKING_FLIGHT
episode_done: false
eval_labels:
- Hello, I want to book a flight.
id: metalwoz:UserSimulatorTeacher
task_id: b5ca362f
text: 'You are interacting with a bot designed to book flights

Hello how may I help you?'
user_prompt: You would like to know how to get a flight to Greece. If the bot
starts booking you a flight to Greece, tell them you were only curious and that
you do not wish to actually book the flight.
user_role: You are interacting with a bot designed to book flights
utterance_id: e0868745
- - bot_prompt: Offer to help the user book a flight to Greece.
bot_role: You are a bot designed to book flights
domain: BOOKING_FLIGHT
episode_done: false
eval_labels:
- Yes that's where I want to go.
id: metalwoz:UserSimulatorTeacher
task_id: b5ca362f
text: Can I help you book a flight to greece?
user_prompt: You would like to know how to get a flight to Greece. If the bot
starts booking you a flight to Greece, tell them you were only curious and that
you do not wish to actually book the flight.
user_role: You are interacting with a bot designed to book flights
utterance_id: e0868745
- - bot_prompt: Offer to help the user book a flight to Greece.
bot_role: You are a bot designed to book flights
domain: BOOKING_FLIGHT
episode_done: false
eval_labels:
- I want to leave from Dallas.
id: metalwoz:UserSimulatorTeacher
task_id: b5ca362f
text: Perfect, and when would you like to leave?
user_prompt: You would like to know how to get a flight to Greece. If the bot
starts booking you a flight to Greece, tell them you were only curious and that
you do not wish to actually book the flight.
user_role: You are interacting with a bot designed to book flights
utterance_id: e0868745
- - bot_prompt: Offer to help the user book a flight to Greece.
bot_role: You are a bot designed to book flights
domain: BOOKING_FLIGHT
episode_done: false
eval_labels:
- I want to leave by next Friday.
id: metalwoz:UserSimulatorTeacher
task_id: b5ca362f
text: Alright, i have your flight leaving from dallas and landing in athens. When
would you like to fly there?
user_prompt: You would like to know how to get a flight to Greece. If the bot
starts booking you a flight to Greece, tell them you were only curious and that
you do not wish to actually book the flight.
user_role: You are interacting with a bot designed to book flights
utterance_id: e0868745
- - bot_prompt: Offer to help the user book a flight to Greece.
bot_role: You are a bot designed to book flights
domain: BOOKING_FLIGHT
episode_done: true
eval_labels:
- Wait please don't do that, I was only curious. I don't actually want to book
a flight
id: metalwoz:UserSimulatorTeacher
task_id: b5ca362f
text: Alright booking your flight between now and next friday. You will be alerted
when your flight is book and vased on the lowest available price.
user_prompt: You would like to know how to get a flight to Greece. If the bot
starts booking you a flight to Greece, tell them you were only curious and that
you do not wish to actually book the flight.
user_role: You are interacting with a bot designed to book flights
utterance_id: e0868745
num_episodes: 2319
num_examples: 14067
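
The acts in this fixture follow from how the two teachers slice a dialogue's `turns` list. The sketch below is a minimal standalone version of that slicing, not the teacher classes themselves: `system_pairs` and `user_pairs` are hypothetical helper names, and the `turns` list is reconstructed from the first three acts above by interleaving each `text` with its label. The system teacher prepends `bot_role` and uses the bot's turns as labels; the user simulator prepends `user_role` to the bot's opening turn and uses the user's turns as labels.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def system_pairs(row: Dict) -> List[Pair]:
    """(prompt, label) pairs where the labels are the bot's turns."""
    texts = [row["bot_role"]] + list(row["turns"])
    return list(zip(texts[::2], texts[1::2]))


def user_pairs(row: Dict) -> List[Pair]:
    """(prompt, label) pairs where the labels are the user's turns."""
    texts = list(row["turns"])
    prompts = [f"{row['user_role']}\n{texts[0]}"] + texts[2::2]
    return list(zip(prompts, texts[1::2]))


row = {
    "bot_role": "You are a bot designed to book flights",
    "user_role": "You are interacting with a bot designed to book flights",
    # Turns reconstructed from the fixture: bot and user alternate, bot first.
    "turns": [
        "Hello how may I help you?",
        "Hello, I want to book a flight.",
        "Can I help you book a flight to greece?",
        "Yes that's where I want to go.",
        "Perfect, and when would you like to leave?",
        "I want to leave from Dallas.",
    ],
}

for prompt, label in user_pairs(row):
    print(repr(prompt), "->", repr(label))
```

Since `zip` stops at the shorter sequence, a trailing turn with no reply produces no example in either helper.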
70 changes: 70 additions & 0 deletions parlai/tasks/metalwoz/test/metalwoz_UserSimulatorTeacher_train.yml
@@ -0,0 +1,70 @@
acts:
- - bot_prompt: Tell the user you don't have the information they are asking for
bot_role: You are a bot designed to fetch information from the internet
domain: LOOK_UP_INFO
episode_done: false
id: metalwoz:UserSimulatorTeacher
labels:
- I would like to ask a question Can you provide general information?
task_id: 4fdf58c3
text: 'You are interacting with a bot designed to fetch information from the internet

Hello how may I help you?'
user_prompt: Ask the bot a question about common world knowledge
user_role: You are interacting with a bot designed to fetch information from the
internet
utterance_id: 2454253d
- - bot_prompt: Tell the user you don't have the information they are asking for
bot_role: You are a bot designed to fetch information from the internet
domain: LOOK_UP_INFO
episode_done: false
id: metalwoz:UserSimulatorTeacher
labels:
- What is the meaning of life?
task_id: 4fdf58c3
text: 'yes'
user_prompt: Ask the bot a question about common world knowledge
user_role: You are interacting with a bot designed to fetch information from the
internet
utterance_id: 2454253d
- - bot_prompt: Tell the user you don't have the information they are asking for
bot_role: You are a bot designed to fetch information from the internet
domain: LOOK_UP_INFO
episode_done: false
id: metalwoz:UserSimulatorTeacher
labels:
- Does life have meaning?
task_id: 4fdf58c3
text: Sorry, I don't have the information that you are asking.
user_prompt: Ask the bot a question about common world knowledge
user_role: You are interacting with a bot designed to fetch information from the
internet
utterance_id: 2454253d
- - bot_prompt: Tell the user you don't have the information they are asking for
bot_role: You are a bot designed to fetch information from the internet
domain: LOOK_UP_INFO
episode_done: false
id: metalwoz:UserSimulatorTeacher
labels:
- How many Oreos can a gorilla eat?
task_id: 4fdf58c3
text: Sorry, I don't have the information that you are asking.
user_prompt: Ask the bot a question about common world knowledge
user_role: You are interacting with a bot designed to fetch information from the
internet
utterance_id: 2454253d
- - bot_prompt: Tell the user you don't have the information they are asking for
bot_role: You are a bot designed to fetch information from the internet
domain: LOOK_UP_INFO
episode_done: true
id: metalwoz:UserSimulatorTeacher
labels:
- Does my butt look big?
task_id: 4fdf58c3
text: Sorry, I don't have the information that you are asking.
user_prompt: Ask the bot a question about common world knowledge
user_role: You are interacting with a bot designed to fetch information from the
internet
utterance_id: 2454253d
num_episodes: 30287
num_examples: 185359