Shifting labels for causal LM when using label smoother #17987

seungeunrho · 2022-07-01T17:25:10Z

What does this PR do?

When training a causal LM such as GPT-2, the loss is computed inside the model's forward() function, where the labels are shifted internally. When label smoothing is applied, however, the loss is computed in the Trainer's compute_loss function, and the labels are not shifted. As a result, the labels are misaligned with their corresponding input_ids. This PR fixes that misalignment.
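To make the misalignment concrete, here is a minimal pure-Python sketch of a label-smoothed loss with an optional shift. It is not the code from this PR: the function names, the `shift_labels` flag as used here, and the toy logits are assumptions made for illustration, and the real Trainer operates on tensors rather than lists. The toy model's logits at position t favour the token at position t + 1, which is what a causal LM is trained to do.

```python
import math

IGNORE_INDEX = -100  # assumed "ignore this position" label value


def log_softmax(row):
    """Numerically stable log-softmax over one row of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def label_smoothed_loss(logits, labels, epsilon=0.1, shift_labels=False):
    """Hypothetical helper: logits is [seq_len][vocab], labels is [seq_len]."""
    if shift_labels:
        # Position t predicts token t + 1, so drop the last logit row
        # and the first label to line the two sequences up.
        logits = logits[:-1]
        labels = labels[1:]
    nll, smooth, count = 0.0, 0.0, 0
    for row, label in zip(logits, labels):
        if label == IGNORE_INDEX:
            continue
        log_probs = log_softmax(row)
        nll -= log_probs[label]                    # usual cross-entropy term
        smooth -= sum(log_probs) / len(log_probs)  # uniform-target term
        count += 1
    if count == 0:
        raise ValueError("no active labels")
    return (1 - epsilon) * nll / count + epsilon * smooth / count


# Toy data: labels are a copy of input_ids, as is typical for causal LM training.
input_ids = [0, 1, 2, 3]
labels = list(input_ids)
vocab_size = 4
logits = []
for t in range(len(input_ids)):
    # The wrap-around for the last position is arbitrary; that row is dropped when shifting.
    next_token = input_ids[(t + 1) % len(input_ids)]
    logits.append([5.0 if j == next_token else 0.0 for j in range(vocab_size)])

unshifted = label_smoothed_loss(logits, labels, shift_labels=False)
shifted = label_smoothed_loss(logits, labels, shift_labels=True)
print(f"unshifted: {unshifted:.3f}  shifted: {shifted:.3f}")
```

With the shift, the loss scores each position against the token it is supposed to predict and comes out much lower for this toy model. Without it, every position is scored against its own input token, so a model that predicts the next token well is penalized.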

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

@sgugger

When training a causal LM, the loss is computed inside the model's forward() function, where the labels are shifted internally. When label smoothing is applied, however, the loss is computed in the Trainer's compute_loss function, and the labels are not shifted. This misaligns the labels with their corresponding inputs. This commit fixes the misalignment.

Resolves huggingface#17960

On branch shift_labels_for_causalLM
Changes to be committed:
modified: src/transformers/trainer.py
modified: src/transformers/trainer_pt_utils.py

sgugger

Thanks a lot for your PR! Left a small comment and it should be good to merge.
Make sure to run make style on your branch to apply formatting.

src/transformers/trainer.py

HuggingFaceDocBuilderDev · 2022-07-01T17:35:28Z

The documentation is not available anymore as the PR was closed or merged.


sgugger approved these changes Jul 1, 2022


Update trainer.py

295fc48

seungeunrho requested a review from sgugger July 1, 2022 18:13

Update src/transformers/trainer.py

e7b47ae

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

sgugger merged commit 6890d19 into huggingface:main Jul 1, 2022
