Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix target_train_data_fraction overriding pretrain_data_fraction #1070

Merged
merged 10 commits into from
Apr 23, 2020

Conversation

pyeres
Copy link
Contributor

@pyeres pyeres commented Apr 16, 2020

This PR addresses the bug described in issue #1066 ("Results with pretrain_data_fraction args").

This PR...

  • Adds a new private field to the Task object (_instance_generators) to store instance generators with the appropriate phase-specific data fraction settings.
  • Adds new methods, getter get_instance_generator and setter set_instance_generator, to Task to access the _instance_generators.
  • Updates preprocessing, training, and evaluation code to use the new getter and setter.
  • Updates a test that uses evaluation logic to use the new getter and setter.
  • Adds tests for Task's that an exception is raised when training instance generator is requested without specifying the phase (because phase determines data fraction).

Validation: the config provided in issue #1066 was run with these code changes. Results logged for the task (MNLI) using a pretraining data fraction were more in line with expectation: after 25k steps training accuracy was ~100%, while val and test accuracy were ~75%.

@pyeres pyeres linked an issue Apr 16, 2020 that may be closed by this pull request
@pyeres pyeres changed the title [WIP] Fix target_train_data_fraction overriding pretrain_data_fraction Fix target_train_data_fraction overriding pretrain_data_fraction Apr 16, 2020
@pyeres pyeres marked this pull request as ready for review April 16, 2020 17:45
@HaokunLiu HaokunLiu mentioned this pull request Apr 16, 2020
Copy link
Contributor

@sleepinyourhat sleepinyourhat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see some naming issues that I'd recommend fixing, but the logic all looks reasonable. Merge when ready.

jiant/tasks/tasks.py Outdated Show resolved Hide resolved
jiant/trainer.py Outdated Show resolved Hide resolved
jiant/trainer.py Show resolved Hide resolved
@pyeres pyeres merged commit 14fae87 into master Apr 23, 2020
@pyeres pyeres deleted the py/fix_data_fraction branch April 23, 2020 17:46
@jeswan jeswan added the jiant-v1-legacy Relevant to versions <= v1.3.2 label Sep 17, 2020
leo-liuzy pushed a commit to leo-liuzy/dynamic_jiant that referenced this pull request Nov 11, 2020
…-mll#1070)

* add private instance generator field to Task w/ getter and setter

* update build_tasks to use new instance generator Task field setter

* update trainer to get phase-appropriate instance generators

* update evaluate to use new Task instance generator getter

* update pred writing task to use new instance generator getter/setter

* add tests for new Task instance generator functionality
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
jiant-v1-legacy Relevant to versions <= v1.3.2
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Results with pretrain_data_fraction args
3 participants