Add tests to Trainer #6605
Conversation
Codecov Report

@@            Coverage Diff             @@
##           master    #6605      +/-   ##
==========================================
+ Coverage   80.21%   80.30%   +0.08%
==========================================
  Files         156      156
  Lines       28178    28205      +27
==========================================
+ Hits        22604    22650      +46
+ Misses       5574     5555      -19
==========================================

Continue to review full report at Codecov.
Re. the eval loss, did you also run test_trainer_distributed.py on a multi-GPU machine?
No, I don't have a multi-GPU machine set up. This test does not seem to use eval_loss anywhere; it only computes a metric.
You can use the office machines! Not necessarily related to this PR, but keep in mind that this test should be run once in a while.
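For reference, a minimal sketch of how such a periodic run might look, assuming the usual one-process-per-GPU launch pattern via torch.distributed.launch (the nproc_per_node value is an assumption, not taken from this PR):

import subprocess
import sys

# Launch tests/test_trainer_distributed.py with one process per GPU.
subprocess.run(
    [
        sys.executable,
        "-m",
        "torch.distributed.launch",
        "--nproc_per_node", "2",  # assumption: set to the number of GPUs available
        "tests/test_trainer_distributed.py",
    ],
    check=True,  # raise if the distributed test run fails
)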
tests/test_trainer.py (Outdated)
        return self.length

    def __getitem__(self, i):
        # Workaround the fact the default data collator wants tensors of ints except for the labels.
Thinking about this more, the default_data_collator should accept float tensors as inputs for input_embeds at least, so I could use this here.
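To make the workaround concrete, here is a minimal sketch (not the PR's actual code; the dataset and the input_x feature name are made up for illustration). It relies on the fact that default_data_collator stacks values that are already torch tensors as-is, so their float dtype survives, whereas plain Python numbers get converted to tensors of ints for every key except the labels:

import torch
from torch.utils.data import Dataset
from transformers import default_data_collator

class FloatFeatureDataset(Dataset):
    # Hypothetical regression dataset: y = 2x + 3 with float features.
    def __init__(self, length=8):
        self.length = length
        self.x = torch.randn(length)
        self.y = 2.0 * self.x + 3.0

    def __len__(self):
        return self.length

    def __getitem__(self, i):
        # Workaround: return ready-made float tensors. Values that are already
        # tensors are stacked by the default collator with their dtype intact,
        # instead of being cast to ints.
        return {"input_x": self.x[i], "labels": self.y[i]}

ds = FloatFeatureDataset()
batch = default_data_collator([ds[i] for i in range(4)])
print(batch["input_x"].dtype)  # torch.float32 — the floats survive collation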
randomly read this, nice tests!
It was indeed broken, though not due to this PR; fixed. I know how to run it periodically now :-)
* Add tests to Trainer
* Test if removing long breaks everything
* Remove ugly hack
* Fix distributed test
* Use float for number of epochs
This reverts commit bf0c455.
This PR moves the tests of the various data_collator functions into test_data_collator.py and adds tests of the Trainer on a simple regression problem. While testing, a few problems were uncovered, among them the handling of max_steps; those are also fixed in the PR. With the regression infrastructure, we can add more tests (for custom data collators, optimizers, schedulers, etc.) since each training run is fast. Will do so in follow-up PRs, as this one was already getting to a decent size.
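As an illustration of what such a regression-based test can look like, here is a hedged sketch checking that training stops at max_steps. The names (RegressionDataset, RegressionModel, input_x) and values are illustrative, not the PR's exact code:

import torch
from torch.utils.data import Dataset
from transformers import Trainer, TrainingArguments

class RegressionDataset(Dataset):
    # Tiny synthetic dataset for y = 2x + 3 (illustrative values).
    def __init__(self, length=64):
        self.x = torch.randn(length)
        self.y = 2.0 * self.x + 3.0

    def __len__(self):
        return len(self.x)

    def __getitem__(self, i):
        return {"input_x": self.x[i], "labels": self.y[i]}

class RegressionModel(torch.nn.Module):
    # Minimal model whose forward matches what Trainer expects: it receives the
    # collated batch as keyword arguments and returns the loss first.
    def __init__(self):
        super().__init__()
        self.a = torch.nn.Parameter(torch.zeros(()))
        self.b = torch.nn.Parameter(torch.zeros(()))

    def forward(self, input_x=None, labels=None):
        preds = self.a * input_x + self.b
        loss = torch.nn.functional.mse_loss(preds, labels)
        return (loss, preds)

def test_max_steps_stops_training():
    args = TrainingArguments(
        output_dir="./trainer_test_output",  # assumption: any scratch directory
        max_steps=10,
        per_device_train_batch_size=4,
    )
    trainer = Trainer(model=RegressionModel(), args=args, train_dataset=RegressionDataset())
    result = trainer.train()
    # train() returns a TrainOutput whose global_step should equal max_steps.
    assert result.global_step == 10

Because the model only has two scalar parameters, each training run finishes in a fraction of a second, which is what makes it cheap to pile more Trainer tests on top of this setup.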