When wrapping T5 (from Hugging Face's transformers) with PyTorch Lightning, the loss changes; meanwhile, different max lengths of the source sentence (all larger than the actual length) lead to different loss values #12533
Labels: needs triage (waiting to be triaged by maintainers)
🐛 Bug
Bug #1: Different losses when using T5 (from Hugging Face's transformers) with PyTorch Lightning versus using it directly. I think they should be the same.
Bug #2: Different losses when using different `max_input_len` values (all larger than the actual sentence length) with PyTorch Lightning, whereas Hugging Face's transformers outputs the same loss. Varying `max_input_len` with plain transformers: here. The loss is the same.

To Reproduce
See the description above.
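Since both bugs may share a root cause, below is a minimal, self-contained script (my own sketch, assuming the difference comes from how padded label positions are handled; the sentences and lengths are placeholders, not my real data). If pad tokens in `labels` are left as the pad id instead of being replaced with -100, T5's cross-entropy averages over them and the loss changes with `max_input_len`; with the masking line, the loss is identical for every length:

```python
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")
model.eval()  # disable dropout so the comparison is deterministic

src = "translate English to German: The house is wonderful."
tgt = "Das Haus ist wunderbar."

for max_len in (16, 32, 64):  # all larger than the actual sentence length
    enc = tokenizer(src, padding="max_length", max_length=max_len,
                    return_tensors="pt")
    labels = tokenizer(tgt, padding="max_length", max_length=max_len,
                       return_tensors="pt").input_ids

    # Variant A: pad tokens left in the labels -> they count toward the
    # cross-entropy average, so the loss changes with max_len.
    # Variant B: pad tokens replaced with -100 -> ignored by the loss,
    # so the loss is the same for every max_len.
    masked = labels.clone()
    masked[masked == tokenizer.pad_token_id] = -100

    with torch.no_grad():
        loss_raw = model(input_ids=enc.input_ids,
                         attention_mask=enc.attention_mask,
                         labels=labels).loss
        loss_masked = model(input_ids=enc.input_ids,
                            attention_mask=enc.attention_mask,
                            labels=masked).loss
    print(f"max_len={max_len}: raw={loss_raw.item():.4f} "
          f"masked={loss_masked.item():.4f}")
```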
Expected behavior
The loss should be the same with and without PyTorch Lightning, and it should not change for any `max_input_len` larger than the actual sentence length.
Environment

- How you installed PyTorch (conda, pip, source): conda
- Output of `torch.__config__.show()`:

Additional context
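For clarity, this is the shape of the wrapper I mean (a simplified sketch, not my exact training code; the class name `T5FineTuner`, the learning rate, and the batch keys are placeholders):

```python
import pytorch_lightning as pl
import torch
from transformers import T5ForConditionalGeneration


class T5FineTuner(pl.LightningModule):
    def __init__(self, model_name: str = "t5-small"):
        super().__init__()
        self.model = T5ForConditionalGeneration.from_pretrained(model_name)

    def training_step(self, batch, batch_idx):
        # This is the loss that disagrees with the plain transformers run.
        outputs = self.model(
            input_ids=batch["input_ids"],
            attention_mask=batch["attention_mask"],
            labels=batch["labels"],
        )
        self.log("train_loss", outputs.loss)
        return outputs.loss

    def configure_optimizers(self):
        return torch.optim.AdamW(self.parameters(), lr=1e-4)
```

One thing worth noting: Lightning runs `training_step` with the module in train mode, so dropout is active; if the plain transformers loss was computed in eval mode, that alone would make the two losses differ.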