-
Notifications
You must be signed in to change notification settings - Fork 2.1k
SaferDialogues parallel test and valid set with BAD dataset #4623
Conversation
70875d8
to
c458039
Compare
parlai/tasks/saferdialogues/build.py
Outdated
|
||
RESOURCES = [ | ||
DownloadableFile( | ||
f'http://parl.ai/downloads/saferdialogues/saferdialogues_{SAFERDIALOGUES_VERSION}.tar.gz', | ||
f'saferdialogues_{SAFERDIALOGUES_VERSION}.tar.gz', | ||
'7ecf062fdf3e0ddfd16c52998341f543331e0dbaa1fcdb27026e8b0b80446fed', | ||
'3d1bc731fb0c63d9f61a52b3d5f1aab0911cda1ca38f49822c79c18a2dc8e834', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will the new teacher be overriding the previous version (the one with the old valid and test split), I wonder if it would be nicer to keep both splits (one for reproducibility in your paper, and one here)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no it's not overriding it, the new file just includes all the data (original for reproducibility) and the new data for the new teacher that this PR introduces. The default task points to the original teacher + data from the paper. To use the new data where the splits are parallel with the BAD dataset, need to explicitly use SaferDialoguesBAD
teacher.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks great just one q
@@ -1,9 +1,10 @@ | |||
#!/usr/bin/env python3 | |||
|
|||
# Copyright (c) Meta Platforms, Inc. and affiliates. | |||
# Copyright (c) Facebook, Inc. and its affiliates. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to update our copyrights @stephenroller ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes we do
@@ -1,86 +1,66 @@ | |||
acts: | |||
- - episode_done: true | |||
eval_labels: | |||
- I agree with that, they're kids after all. | |||
- i am also a fun maker |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why did the train/val/test values change in the original dataset?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch, it was just shuffled but should be updated so it's identical
Can you rebase/merge main in? |
77f513e
to
b8365cb
Compare
@@ -10,13 +10,13 @@ | |||
import parlai.core.build_data as build_data | |||
import os | |||
|
|||
SAFERDIALOGUES_VERSION = "v0.1" | |||
SAFERDIALOGUES_VERSION = "v0.4" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LOL what happened to version 0.2 ~ 0.3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!!! 🚀
Adding a new teacher
SaferDialoguesBAD
that has a different test and valid set that is parallel with the BAD dataset (train set is the same amongst both teachers)Added test for the new teacher, pytest passes.