Fix GA loss bugs and add unit test #35121
Conversation
Thanks! These solutions make sense, and I ran the tests myself. Left some nits to allow less leeway in the test closeness. cc @ArthurZucker
tests/trainer/test_trainer.py
Outdated
diff_broken = [abs(base - grad) for base, grad in zip(base_loss_callback.losses, broken_loss_callback.losses)]

# all diff truth should be quite close
self.assertLess(max(diff_truth), 0.3, f"Difference {max(diff_truth)} is not within 0.3")
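For context, the diff_truth / diff_broken lists compare per-step losses recorded during training. As a rough illustration only (the callback name and hook are assumptions, not necessarily the PR's exact code), such losses can be collected with a TrainerCallback:

```python
from transformers import TrainerCallback

class StoreLossCallback(TrainerCallback):
    """Collect the training loss reported at each logging step."""

    def __init__(self):
        self.losses = []

    def on_log(self, args, state, control, logs=None, **kwargs):
        # Trainer reports the running loss in the `logs` dict
        if logs is not None and "loss" in logs:
            self.losses.append(logs["loss"])
```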
Let's be a bit more aggressive and do 0.15 (this passes). I still feel that's quite big but I can't figure out why (my tests showed 0.001 should be doable).
Suggested change:
- self.assertLess(max(diff_truth), 0.3, f"Difference {max(diff_truth)} is not within 0.3")
+ self.assertLess(max(diff_truth), 0.15, f"Difference {max(diff_truth)} is not within 0.15")
It is strange: I tested on both Mac and Windows, and max(diff_truth) is 0.144, so 0.15 may fail on some other machine.
Done! I used TinyStories to narrow the gap down to the same value as you. The code is submitted.
tests/trainer/test_trainer.py
Outdated
diff_broken = [abs(base - grad) for base, grad in zip(base_loss_callback.losses, broken_loss_callback.losses)]

# all diff truth should be quite close
self.assertLess(max(diff_truth), 0.3, f"Difference {max(diff_truth)} is not within 0.3")
Suggested change:
- self.assertLess(max(diff_truth), 0.3, f"Difference {max(diff_truth)} is not within 0.3")
+ self.assertLess(max(diff_truth), 0.2, f"Difference {max(diff_truth)} is not within 0.2")
Similarly we can be aggressive here too
I managed to reduce the gap to 1e-4 by padding all input labels to the same length. However, this method did not work for the GPT-2 model. I will continue to explore other solutions.
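To illustrate the padding idea, here is a minimal sketch (names and usage are assumptions, not the PR's code): labels can be padded to a common length with -100, the ignore index of PyTorch's cross-entropy loss, so the padded positions do not change the loss value:

```python
def pad_labels(labels, max_length, pad_value=-100):
    # -100 is ignored by cross-entropy, so padding with it equalizes
    # sequence lengths without affecting the per-example loss
    return labels + [pad_value] * (max_length - len(labels))

# hypothetical usage on a tokenized `datasets.Dataset` with a "labels" column:
# dataset = dataset.map(lambda ex: {"labels": pad_labels(ex["labels"], max_length=128)})
```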
Sorry @muellerzr but this does not solve:
FAILED examples/pytorch/test_pytorch_examples.py::ExamplesTests::test_run_speech_recognition_seq2seq - TypeError: Wav2Vec2Model.forward() got an unexpected keyword argument 'num_items_in_batch'
so I am not sure I understand. Related to #35113 and #35128.
We can't merge with the broken test
@ArthurZucker The Wav2Vec2Model bug occurs because SpeechEncoderDecoderModel accepts variable keyword arguments in its forward signature: transformers/src/transformers/models/speech_encoder_decoder/modeling_speech_encoder_decoder.py, line 457 in c8c8dff.
But it dispatches those arguments to its encoder and decoder, which do not accept variable arguments: transformers/src/transformers/models/speech_encoder_decoder/modeling_speech_encoder_decoder.py, lines 489 to 493 in c8c8dff.
I think the better solution is to modify its decoder to accept variable arguments. I proposed a new commit and the tests succeed.
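To illustrate the dispatch problem in general terms (this is a sketch under assumed names, not the repository's code): when a composite model's forward accepts **kwargs but its sub-modules do not, extra keys such as num_items_in_batch must either be filtered against the sub-module's signature or the sub-module must itself accept variable keyword arguments, which is the route taken here. One generic way to filter:

```python
import inspect

def filter_forward_kwargs(module, kwargs):
    """Keep only the keyword arguments that module.forward actually declares."""
    accepted = set(inspect.signature(module.forward).parameters)
    return {key: value for key, value in kwargs.items() if key in accepted}

# hypothetical use inside a composite forward():
# decoder_kwargs = filter_forward_kwargs(self.decoder, kwargs)
# decoder_outputs = self.decoder(input_ids=decoder_input_ids, **decoder_kwargs)
```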
Finally all unit tests passed. Please check again. @muellerzr @ArthurZucker
Thanks a lot @techkang !
I did not dive deep enough into the test, my bad 🤗
Merging ASAP and doing the patch
set_seed(42)
import datasets

- model_name = "distilgpt2"
+ model_name = "nickypro/tinyllama-110M"
Okay, would be nice if we had a safetensors model here but alright.
* fix GA bugs and add unit test
* narrow down model loss unit test diff gap
* format code to make ruff happy
* send num_items_in_batch argument to decoder
* fix GA loss bug in BertLMHeadModel
* use TinyStories-33M to narrow down diff gap
* fotmat code
* missing .config
* avoid add extra args

---------

Co-authored-by: kangsheng <kangsheng@meituan.com>
What does this PR do?
There are two ways to fix GA loss bugs:
1. Use num_items_in_batch in the loss function defined by the model. In this case, model_accepts_loss_kwargs is True.
2. Use compute_loss_func.
However, the previous unit test only covered the second condition, so I introduced a new unit test to cover the first condition and fixed the bugs along the way.
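As a hedged sketch of the second approach (the shapes and shift logic below are assumptions for a causal-LM setup, not this PR's code), a compute_loss_func passed to the Trainer can sum the token losses and divide by num_items_in_batch, so gradient accumulation does not change the effective loss scale:

```python
import torch.nn.functional as F

def compute_loss_func(outputs, labels, num_items_in_batch):
    logits = outputs.logits
    # shift so that each position predicts the next token
    shift_logits = logits[..., :-1, :].contiguous()
    shift_labels = labels[..., 1:].contiguous()
    loss = F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
        ignore_index=-100,
        reduction="sum",
    )
    # normalize by the number of label tokens in the full (accumulated) batch
    return loss / num_items_in_batch

# trainer = Trainer(model=model, args=training_args, ..., compute_loss_func=compute_loss_func)
```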
Who can review?
@muellerzr @ArthurZucker