This repository has been archived by the owner on Nov 3, 2023. It is now read-only.

[task] Fix Cornell Movies #3627

Merged
merged 3 commits into from
Apr 30, 2021

Conversation

stephenroller
Contributor

Patch description
The Cornell Movies teacher wasn't properly respecting odd/even conversations and would produce turns with no labels. This patch fixes that. It also takes the opportunity to migrate the teacher to DialogTeacher, away from our deprecated format.
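
For illustration only, here is a minimal sketch of the kind of pairing logic involved. It is not the code from this patch. It assumes "odd/even" refers to the number of utterances in a conversation, and that a DialogTeacher-style data generator yields `(message, new_episode)` pairs. The function name `pair_turns` and the sample conversation are hypothetical.

```python
from typing import Dict, Iterator, List, Tuple


def pair_turns(utterances: List[str]) -> Iterator[Tuple[Dict[str, str], bool]]:
    """Yield (message, new_episode) pairs from one conversation.

    Hypothetical helper: utterances are consumed two at a time, the first
    as the prompt text and the second as its label. A trailing unpaired
    utterance is dropped, so no turn is emitted without a label.
    """
    # Stop one short of the end so utterances[i + 1] always exists.
    for i in range(0, len(utterances) - 1, 2):
        message = {"text": utterances[i], "label": utterances[i + 1]}
        # Only the first turn of a conversation starts a new episode.
        yield message, i == 0


# A conversation with an odd number of utterances (made-up data).
odd_conversation = ["Hi.", "Hello.", "How are you?"]
turns = list(pair_turns(odd_conversation))
```

With three utterances, only one labeled turn is produced. The last utterance has no reply to serve as a label, so it is skipped rather than emitted as a turn with no label.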

Manually confirmed that the data folds are the same; the data that is now missing was just blank lines.

Also fix a small typo in TA that surfaced the bug in the first place.

Fixes #3626.

Testing steps
Updated teacher test, and manual train loop

@stephenroller stephenroller requested a review from spencerp April 28, 2021 12:48
@stephenroller stephenroller changed the title Fix Cornell Movies [task] Fix Cornell Movies Apr 28, 2021
Contributor

@spencerp spencerp left a comment

Make sure to add build back before merging.

Also, have there been any published results that used the duplicate data?

parlai/tasks/cornell_movie/agents.py
text: The "real you".
num_episodes: 8310
- num_examples: 16759
+ num_examples: 13914
Contributor

Oh wow

Contributor Author

Ya but the difference didn't have labels, so they weren't being used for metrics anyway.

Contributor

Ah interesting. So it was just eating compute but not directly affecting the model training. Presumably this was effectively reducing the batch size, then.
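
As a rough illustration of that presumption (not ParlAI code), the sketch below counts how many examples in a batch carry a label. It assumes examples without a label are skipped when the loss is computed. The helper `effective_batch_size` and the sample batch are hypothetical.

```python
from typing import Dict, List


def effective_batch_size(batch: List[Dict[str, str]]) -> int:
    """Count the examples in a batch that carry a non-empty label."""
    return sum(1 for example in batch if example.get("label"))


# Made-up batch: two labeled turns and two turns with no label.
batch = [
    {"text": "Hi.", "label": "Hello."},
    {"text": "How are you?"},  # no label: assumed to add nothing to the loss
    {"text": "Bye.", "label": "See you."},
    {"text": ""},  # blank line with no label
]
print(effective_batch_size(batch), "of", len(batch), "examples have labels")
```

Under that assumption, a nominal batch of four would behave like a batch of two for the purposes of the loss.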

@stephenroller stephenroller merged commit c757eb0 into master Apr 30, 2021
@stephenroller stephenroller deleted the cornellfix branch April 30, 2021 13:40
Development

Successfully merging this pull request may close these issues.

Fail to train any agent with conell_movie dataset