When wrapping T5 (from Hugging Face's transformers) with PyTorch Lightning, the loss changes; meanwhile, different max lengths of the source sentence (all larger than the actual length) lead to different loss values #12533
Labels: needs triage (waiting to be triaged by maintainers)
🐛 Bug
Bug #1: Different losses when using T5 (from Hugging Face's transformers) with PyTorch Lightning versus using it directly. I think they should be the same.
Bug #2: Different losses when using different `max_input_len` values (all larger than the actual sentence length) with PyTorch Lightning, whereas Hugging Face's transformers outputs the same loss. Varying `max_input_len` with plain transformers: here. The loss is the same.

To Reproduce
See the description above.
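Since both bugs may share a root cause, below is a minimal, self-contained script (my own sketch, assuming the difference comes from how padded label positions are handled; the sentences and lengths are placeholders, not my real data). If pad tokens in `labels` are left as the pad id instead of being replaced with -100, T5's cross-entropy averages over them and the loss changes with `max_input_len`; with the masking line, the loss is identical for every length:

```python
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")
model.eval()  # disable dropout so the comparison is deterministic

src = "translate English to German: The house is wonderful."
tgt = "Das Haus ist wunderbar."

for max_len in (16, 32, 64):  # all larger than the actual sentence length
    enc = tokenizer(src, padding="max_length", max_length=max_len,
                    return_tensors="pt")
    labels = tokenizer(tgt, padding="max_length", max_length=max_len,
                       return_tensors="pt").input_ids

    # Variant A: pad tokens left in the labels -> they count toward the
    # cross-entropy average, so the loss changes with max_len.
    # Variant B: pad tokens replaced with -100 -> ignored by the loss,
    # so the loss is the same for every max_len.
    masked = labels.clone()
    masked[masked == tokenizer.pad_token_id] = -100

    with torch.no_grad():
        loss_raw = model(input_ids=enc.input_ids,
                         attention_mask=enc.attention_mask,
                         labels=labels).loss
        loss_masked = model(input_ids=enc.input_ids,
                            attention_mask=enc.attention_mask,
                            labels=masked).loss
    print(f"max_len={max_len}: raw={loss_raw.item():.4f} "
          f"masked={loss_masked.item():.4f}")
```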
Expected behavior
The loss should be the same with and without PyTorch Lightning, and it should not change for any `max_input_len` larger than the actual sentence length.
Environment

- How you installed PyTorch (conda, pip, source): conda
- Output of `torch.__config__.show()`:

Additional context
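For clarity, this is the shape of the wrapper I mean (a simplified sketch, not my exact training code; the class name `T5FineTuner`, the learning rate, and the batch keys are placeholders):

```python
import pytorch_lightning as pl
import torch
from transformers import T5ForConditionalGeneration


class T5FineTuner(pl.LightningModule):
    def __init__(self, model_name: str = "t5-small"):
        super().__init__()
        self.model = T5ForConditionalGeneration.from_pretrained(model_name)

    def training_step(self, batch, batch_idx):
        # This is the loss that disagrees with the plain transformers run.
        outputs = self.model(
            input_ids=batch["input_ids"],
            attention_mask=batch["attention_mask"],
            labels=batch["labels"],
        )
        self.log("train_loss", outputs.loss)
        return outputs.loss

    def configure_optimizers(self):
        return torch.optim.AdamW(self.parameters(), lr=1e-4)
```

One thing worth noting: Lightning runs `training_step` with the module in train mode, so dropout is active; if the plain transformers loss was computed in eval mode, that alone would make the two losses differ.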