
Add tests to Trainer #6605

Merged: 5 commits merged into master from test_trainer on Aug 20, 2020
Conversation

sgugger (Collaborator) commented Aug 19, 2020

This PR moves the tests of the various data collators into test_data_collator.py and adds tests of the Trainer on a simple regression problem. While writing the tests, a few problems were uncovered:

  • The number of epochs was documented as a float but used as an int; the documentation is fixed.
  • One more step was performed than specified by the max_steps argument.
  • The evaluation loss was wrong whenever the evaluation dataset length is not a round multiple of the batch size (a minimal illustration of this follows the list).
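
For the last point, a minimal sketch of the required weighting (illustrative only, not the PR's code): per-batch mean losses have to be weighted by the actual number of samples in each batch, otherwise a smaller final batch is over-weighted.

```python
def averaged_eval_loss(per_batch_losses, batch_sizes):
    """Average per-batch *mean* losses, weighted by the actual batch sizes.

    Example: an eval set of 10 samples with batch size 4 yields batches of
    4, 4 and 2 samples; a plain mean over the three batch losses would give
    the 2-sample batch twice the weight it deserves.
    """
    total = sum(loss * n for loss, n in zip(per_batch_losses, batch_sizes))
    return total / sum(batch_sizes)
```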

Those three things are also fixed in this PR. With the regression infrastructure, we can add more tests (for custom data collators, optimizers, schedulers, etc.) since each training run is fast. I will do that in follow-up PRs, as this one is already a decent size.
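
To make the regression setup concrete, here is a minimal sketch of a Trainer run on such a problem (the class and argument names are illustrative, not the PR's exact test code, and assume a recent transformers version):

```python
import torch
from torch import nn
from torch.utils.data import Dataset
from transformers import Trainer, TrainingArguments

class RegressionDataset(Dataset):
    """Tiny y = 2x + 3 (+ noise) dataset; features are returned as float tensors."""

    def __init__(self, length=64, seed=42):
        g = torch.Generator().manual_seed(seed)
        self.x = torch.randn(length, generator=g)
        self.y = 2.0 * self.x + 3.0 + 0.1 * torch.randn(length, generator=g)

    def __len__(self):
        return len(self.x)

    def __getitem__(self, i):
        return {"input_x": self.x[i], "labels": self.y[i]}


class RegressionModel(nn.Module):
    """Learns a and b in y = a * x + b; returns (loss, predictions) like a HF model."""

    def __init__(self):
        super().__init__()
        self.a = nn.Parameter(torch.zeros(1))
        self.b = nn.Parameter(torch.zeros(1))

    def forward(self, input_x, labels=None):
        preds = self.a * input_x + self.b
        if labels is None:
            return (preds,)
        return (nn.functional.mse_loss(preds, labels), preds)


args = TrainingArguments(output_dir="./test_trainer_output",
                         num_train_epochs=3, per_device_train_batch_size=8)
trainer = Trainer(model=RegressionModel(), args=args,
                  train_dataset=RegressionDataset(),
                  eval_dataset=RegressionDataset(length=30))  # 30 is not a multiple of the eval batch size
trainer.train()
print(trainer.evaluate())  # eval_loss must be averaged correctly over the uneven final batch
```

A run like this finishes in seconds on CPU, which is what makes it cheap to add one such test per Trainer feature.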

codecov bot commented Aug 19, 2020

Codecov Report

Merging #6605 into master will increase coverage by 0.08%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master    #6605      +/-   ##
==========================================
+ Coverage   80.21%   80.30%   +0.08%     
==========================================
  Files         156      156              
  Lines       28178    28205      +27     
==========================================
+ Hits        22604    22650      +46     
+ Misses       5574     5555      -19     
Impacted Files Coverage Δ
src/transformers/configuration_reformer.py 100.00% <ø> (ø)
src/transformers/modeling_bert.py 88.42% <ø> (ø)
src/transformers/modeling_utils.py 88.05% <ø> (ø)
src/transformers/training_args.py 91.26% <ø> (+10.67%) ⬆️
src/transformers/configuration_utils.py 95.97% <100.00%> (+0.02%) ⬆️
src/transformers/data/data_collator.py 90.90% <100.00%> (ø)
src/transformers/generation_tf_utils.py 86.21% <100.00%> (-0.26%) ⬇️
src/transformers/modeling_albert.py 83.50% <100.00%> (+0.13%) ⬆️
src/transformers/modeling_distilbert.py 97.84% <100.00%> (+0.02%) ⬆️
src/transformers/modeling_longformer.py 92.02% <100.00%> (+0.07%) ⬆️
... and 15 more

Continue to review the full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9a86321...3f89b2d.

julien-c (Member):
Re. the eval loss, did you also run test_trainer_distributed.py on a multi-gpu machine?

sgugger (Collaborator, Author) commented Aug 19, 2020

No, I don't have a multi-GPU machine set up. It does not seem like this test uses the eval_loss anywhere; it only computes a metric.

julien-c (Member):

> No, I don't have a multi-GPU machine set up.

You can use the office machines! Not necessarily related to this PR, but keep in mind to run this test once in a while.

sgugger (Collaborator, Author) commented on these lines from the diff:

        return self.length

    def __getitem__(self, i):
        # Workaround the fact the default data collator wants tensors of ints except for the labels.

Thinking about this more, the default_data_collator should accept float tensors as inputs for input_embeds at least, so I could use it here.
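
As an aside, a quick illustrative check of that workaround (not part of the PR): features handed to default_data_collator as pre-built float tensors are stacked with their dtype preserved, whereas plain Python numbers are rebuilt into new tensors by the collator.

```python
import torch
from transformers import default_data_collator

# Pre-built float tensors are stacked by the default collator, so the float
# dtype survives; this is the workaround referred to in the comment above.
features = [
    {"input_x": torch.tensor(0.5), "labels": torch.tensor(1.0)},
    {"input_x": torch.tensor(-0.3), "labels": torch.tensor(0.4)},
]
batch = default_data_collator(features)
print(batch["input_x"].dtype, batch["labels"].dtype)  # expected: torch.float32 torch.float32
```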

sshleifer (Contributor) left a comment:

Randomly read this, nice tests!

sgugger (Collaborator, Author) commented Aug 20, 2020

> Not necessarily related to this PR, but keep in mind to run this test once in a while.

It was indeed broken, though not because of this PR; it is fixed now. I also know how to run it periodically now :-)

@sgugger sgugger merged commit 573bdb0 into master Aug 20, 2020
@sgugger sgugger deleted the test_trainer branch August 20, 2020 15:13
Zigur pushed a commit to Zigur/transformers that referenced this pull request Oct 26, 2020
* Add tests to Trainer

* Test if removing long breaks everything

* Remove ugly hack

* Fix distributed test

* Use float for number of epochs
fabiocapsouza pushed a commit to fabiocapsouza/transformers that referenced this pull request Nov 15, 2020
* Add tests to Trainer

* Test if removing long breaks everything

* Remove ugly hack

* Fix distributed test

* Use float for number of epochs
fabiocapsouza added a commit to fabiocapsouza/transformers that referenced this pull request Nov 15, 2020