Add tests to Trainer #6605
Conversation
Codecov Report

@@            Coverage Diff             @@
##           master    #6605      +/-   ##
==========================================
+ Coverage   80.21%   80.30%   +0.08%
==========================================
  Files         156      156
  Lines       28178    28205      +27
==========================================
+ Hits        22604    22650      +46
+ Misses       5574     5555      -19
==========================================

Continue to review full report at Codecov.
Re. the eval loss, did you also run test_trainer_distributed.py on a multi-GPU machine?
No, I don't have a multi-GPU machine set up. This test does not seem to use eval_loss anywhere; it only computes a metric.
You can use the office machines! Not necessarily related to this PR, but keep in mind that this test should be run once in a while.
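For reference, a minimal sketch of how such a periodic run might look, assuming the usual one-process-per-GPU launch pattern via torch.distributed.launch (the nproc_per_node value is an assumption, not taken from this PR):

import subprocess
import sys

# Launch tests/test_trainer_distributed.py with one process per GPU.
subprocess.run(
    [
        sys.executable,
        "-m",
        "torch.distributed.launch",
        "--nproc_per_node", "2",  # assumption: set to the number of GPUs available
        "tests/test_trainer_distributed.py",
    ],
    check=True,  # raise if the distributed test run fails
)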
tests/test_trainer.py (Outdated)
        return self.length

    def __getitem__(self, i):
        # Workaround the fact the default data collator wants tensors of ints except for the labels.
Thinking about this more, the default_data_collator should accept float tensors as inputs for input_embeds at least, so I could use this here.
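To make the workaround concrete, here is a minimal sketch (not the PR's actual code; the dataset and the input_x feature name are made up for illustration). It relies on the fact that default_data_collator stacks values that are already torch tensors as-is, so their float dtype survives, whereas plain Python numbers get converted to tensors of ints for every key except the labels:

import torch
from torch.utils.data import Dataset
from transformers import default_data_collator

class FloatFeatureDataset(Dataset):
    # Hypothetical regression dataset: y = 2x + 3 with float features.
    def __init__(self, length=8):
        self.length = length
        self.x = torch.randn(length)
        self.y = 2.0 * self.x + 3.0

    def __len__(self):
        return self.length

    def __getitem__(self, i):
        # Workaround: return ready-made float tensors. Values that are already
        # tensors are stacked by the default collator with their dtype intact,
        # instead of being cast to ints.
        return {"input_x": self.x[i], "labels": self.y[i]}

ds = FloatFeatureDataset()
batch = default_data_collator([ds[i] for i in range(4)])
print(batch["input_x"].dtype)  # torch.float32 — the floats survive collation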
randomly read this, nice tests!
It was indeed broken, though not due to this PR; fixed. I know how to run it periodically now :-)
* Add tests to Trainer
* Test if removing long breaks everything
* Remove ugly hack
* Fix distributed test
* Use float for number of epochs
This reverts commit bf0c455.
This PR moves the tests of the various data_collator functions into test_data_collator.py and adds tests of the Trainer on a simple regression problem. While testing, a few problems were uncovered, among them the handling of max_steps; those are also fixed in the PR. With the regression infrastructure, we can add more tests (for custom data collators, optimizers, schedulers, etc.) since each training run is fast. Will do so in follow-up PRs, as this one was already getting to a decent size.
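As an illustration of what such a regression-based test can look like, here is a hedged sketch checking that training stops at max_steps. The names (RegressionDataset, RegressionModel, input_x) and values are illustrative, not the PR's exact code:

import torch
from torch.utils.data import Dataset
from transformers import Trainer, TrainingArguments

class RegressionDataset(Dataset):
    # Tiny synthetic dataset for y = 2x + 3 (illustrative values).
    def __init__(self, length=64):
        self.x = torch.randn(length)
        self.y = 2.0 * self.x + 3.0

    def __len__(self):
        return len(self.x)

    def __getitem__(self, i):
        return {"input_x": self.x[i], "labels": self.y[i]}

class RegressionModel(torch.nn.Module):
    # Minimal model whose forward matches what Trainer expects: it receives the
    # collated batch as keyword arguments and returns the loss first.
    def __init__(self):
        super().__init__()
        self.a = torch.nn.Parameter(torch.zeros(()))
        self.b = torch.nn.Parameter(torch.zeros(()))

    def forward(self, input_x=None, labels=None):
        preds = self.a * input_x + self.b
        loss = torch.nn.functional.mse_loss(preds, labels)
        return (loss, preds)

def test_max_steps_stops_training():
    args = TrainingArguments(
        output_dir="./trainer_test_output",  # assumption: any scratch directory
        max_steps=10,
        per_device_train_batch_size=4,
    )
    trainer = Trainer(model=RegressionModel(), args=args, train_dataset=RegressionDataset())
    result = trainer.train()
    # train() returns a TrainOutput whose global_step should equal max_steps.
    assert result.global_step == 10

Because the model only has two scalar parameters, each training run finishes in a fraction of a second, which is what makes it cheap to pile more Trainer tests on top of this setup.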