[Mask2Former] Move normalization for numerical stability #29542
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
```python
cross_entropy_loss_pos = criterion(inputs, torch.ones_like(inputs)) / height_and_width
cross_entropy_loss_neg = criterion(inputs, torch.zeros_like(inputs)) / height_and_width

loss_pos = torch.matmul(cross_entropy_loss_pos, labels.T)
loss_neg = torch.matmul(cross_entropy_loss_neg, (1 - labels).T)
loss = loss_pos + loss_neg
```
Fine for me. But the name might be a bit misleading (I am not 100% sure): does `criterion(inputs, torch.ones_like(inputs)) / height_and_width` really represent `cross_entropy_loss`?
I am not checking the full context here, but personally, I might just do something like:
```python
cross_entropy_loss_pos = criterion(inputs, torch.ones_like(inputs))
cross_entropy_loss_neg = criterion(inputs, torch.zeros_like(inputs))

loss_pos = torch.matmul(cross_entropy_loss_pos / height_and_width, labels.T)
loss_neg = torch.matmul(cross_entropy_loss_neg / height_and_width, (1 - labels).T)
```
to avoid (if any) possible confusion.
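(For reference, a minimal standalone sketch, with made-up shapes and values and names mirroring the snippet above, confirming that dividing before or after the matmul gives the same result up to floating-point rounding, so the suggestion only changes where the division sits:)

```python
import torch

torch.manual_seed(0)
num_queries, num_labels, height_and_width = 8, 4, 1024

# Made-up stand-ins for the per-pixel losses and the binary target masks.
cross_entropy_loss_pos = torch.rand(num_queries, height_and_width)
labels = torch.randint(0, 2, (num_labels, height_and_width)).float()

divide_after = torch.matmul(cross_entropy_loss_pos, labels.T) / height_and_width
divide_before = torch.matmul(cross_entropy_loss_pos / height_and_width, labels.T)

# Scalar division commutes with matmul, so both orderings agree (up to rounding).
print(torch.allclose(divide_after, divide_before, atol=1e-6))  # True
```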
Good point! Used your suggestion
Thanks!
Force-pushed from 51acd85 to 1a3c162:
* Move normalization for numerical stability
* Apply suggestions from code review (remove useless `x = x` line)
* PR comment - normalize later to preserve var name meaning
What does this PR do?
Moving the normalization before the matmul operation makes the calculation more stable and less likely to overflow.
These are the same changes as introduced in #26086, which was closed after becoming stale.
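To make the motivation concrete, here is a minimal, self-contained sketch, not the actual Mask2Former code: all shapes, magnitudes, and names (`inputs`, `labels`, `criterion`, `height_and_width`) are assumptions chosen for illustration. It shows how summing the unnormalized per-pixel losses in the matmul can produce values outside the fp16 range, while dividing by `height_and_width` first keeps every intermediate small:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
num_queries, num_labels, height_and_width = 8, 4, 256 * 256

# Illustrative logits and binary target masks (magnitudes exaggerated on purpose).
inputs = torch.randn(num_queries, height_and_width) * 10
labels = torch.randint(0, 2, (num_labels, height_and_width)).float()

criterion = nn.BCEWithLogitsLoss(reduction="none")
cross_entropy_loss_pos = criterion(inputs, torch.ones_like(inputs))

# Old order: sum ~65k per-pixel losses first, divide afterwards. The intermediate
# exceeds the fp16 maximum (~65504), so it would overflow in half precision.
divide_after = torch.matmul(cross_entropy_loss_pos, labels.T)
print(divide_after.half().isinf().any())   # tensor(True)

# New order (this PR): divide each per-pixel loss first, then matmul. The result is
# mathematically the same, but the intermediates stay on the order of a mean loss.
divide_before = torch.matmul(cross_entropy_loss_pos / height_and_width, labels.T)
print(divide_before.half().isinf().any())  # tensor(False)
```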