TF: XLA stable softmax #16892

gante · 2022-04-22T14:09:16Z

What does this PR do?

As discussed in the thread about XLA problems (#16838), this PR adds a stable wrapper for the softmax operation, and replaces tf.nn.softmax by the wrapped function.

This PR:

- Adds the wrapped softmax, named stable_softmax, in tf_utils.py. Its docstring explains why it is needed and why the new operation is valid;
- Adds tests for the wrapped softmax, including XLA tests;
- Replaces tf.nn.softmax with stable_softmax everywhere except in the doctests (I think it would overcomplicate the examples, and XLA should not be needed there);
- Removes the skipIf for XLA tests, as they can now run successfully on a CPU.
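
For readers unfamiliar with the trick, here is a minimal sketch in plain Python of the property that makes an offset-based wrapper valid: softmax is unchanged when the same constant is added to every logit. The helper names `softmax` and `offset_softmax` and the offset value `1e-9` are assumptions made for illustration; the actual stable_softmax in tf_utils.py wraps tf.nn.softmax and may differ in detail.

```python
import math


def softmax(values):
    """Plain softmax over a list of floats, using the usual max-subtraction."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def offset_softmax(values, offset=1e-9):
    """Hypothetical stand-in for the wrapper: shift every logit by one constant."""
    return softmax([v + offset for v in values])


# The last entry mimics a masked position (a very large negative logit).
logits = [2.0, -1.0, 0.5, -1e9]
plain = softmax(logits)
shifted = offset_softmax(logits)

# Largest element-wise gap between the two results.
max_diff = max(abs(a - b) for a, b in zip(plain, shifted))
print(max_diff)
```

Because the constant cancels between numerator and denominator, any gap that remains comes from floating-point rounding only.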

Closes #16838

HuggingFaceDocBuilderDev · 2022-04-22T14:23:12Z

The documentation is not available anymore as the PR was closed or merged.

sgugger

Great find for the bug and thanks a lot for fixing all models!

ydshieh · 2022-04-22T15:44:33Z

tests/test_modeling_tf_common.py

+        xla_out = xla_masked_softmax(x, boolean_mask)
+        out = masked_softmax(x, boolean_mask)
+        assert tf.experimental.numpy.allclose(xla_out, out)
+


Could we have a test for batch_size > 1?

Added batch size > 1 👍

ydshieh

Good for me, just left 2 nits.
(didn't check the changes in TFGPT2, TFT5 tests though. Let me know if you prefer me to check those too.)

Thank you, @gante 💯

Rocketknight1 · 2022-04-22T15:51:23Z

This looks good to me! Do you think it would be better to change stable_softmax to only add the offset if we're running on CPU? It makes very little difference either way, but we could hide the complexity of that inside stable_softmax and keep our code paths entirely unchanged on GPU. I'm not certain, though - since it's such a small change maybe we can just do it everywhere.

ydshieh · 2022-04-22T15:57:44Z

> This looks good to me! Do you think it would be better to change stable_softmax to only add the offset if we're running on CPU? It makes very little difference either way, but we could hide the complexity of that inside stable_softmax and keep our code paths entirely unchanged on GPU. I'm not certain, though - since it's such a small change maybe we can just do it everywhere.

Good point! I hope this won't affect tests on GPU (at least not the PT/TF equivalence tests, which use a 1e-5 tolerance). Let's see!

gante · 2022-04-22T16:11:38Z

@Rocketknight1 @ydshieh if you run the test and print the difference between stable_softmax and tf.nn.softmax, the difference is exactly 0.0 -- I don't think we need to worry about that :D
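
One plausible reason for an exact 0.0 difference is that a tiny offset is absorbed by rounding when it is added to a logit of ordinary magnitude. The sketch below emulates float32 addition in plain Python. Both the float32 dtype and the `1e-9` offset are assumptions, since neither is stated in this thread, and `to_float32` and `add_in_float32` are hypothetical helpers.

```python
import struct

OFFSET = 1e-9  # assumed offset; the thread does not state the value


def to_float32(x):
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def add_in_float32(x, offset=OFFSET):
    """Emulate a float32 addition: round the inputs, add, round the result."""
    return to_float32(to_float32(x) + to_float32(offset))


# Report, for a few sample logits, whether the offset leaves the value unchanged.
for logit in (3.5, -0.25, 1e-12):
    unchanged = add_in_float32(logit) == to_float32(logit)
    print(logit, unchanged)
```

For logits of ordinary magnitude the offset falls below half a unit in the last place and disappears; only values very close to zero actually change.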

ydshieh · 2022-04-22T18:12:37Z

@gante With this, do we still have issues regarding sampling in generate()? Sorry, I didn't really follow that issue about sampling, but I would like to know a bit more 😄

gante · 2022-04-22T18:15:33Z

@ydshieh after this fix, the errors related to generate() are gone -- they were caused by the forward pass in the models, which in turn were caused by the issue this PR solves

ydshieh · 2022-04-22T18:36:19Z

(I might be completely wrong below)

I could imagine that we (will) have tests like:

- testing non-XLA and XLA generate() with sampling
  - even with this PR, the difference in output logits between the two might still be as large as, say, 1e-3?
  - if so, sampling might give different results?
  - if not, what's the magnitude of the diff we get after this PR?
- testing PT and TF generate() with sampling
  - same potential issue as above?
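
To make the concern above concrete, here is a small sketch of how a tiny shift in probabilities can flip a sampled token even when the random draw is identical. The numbers are invented for illustration and are not measurements from this PR; `sample_index` is a hypothetical helper implementing plain inverse-CDF sampling.

```python
from itertools import accumulate


def sample_index(probs, u):
    """Inverse-CDF sampling: first index whose cumulative probability exceeds u."""
    for i, cumulative in enumerate(accumulate(probs)):
        if u < cumulative:
            return i
    return len(probs) - 1  # guard against rounding at the upper edge


probs_a = [0.5000, 0.3000, 0.2000]
probs_b = [0.4990, 0.3010, 0.2000]  # hypothetical perturbation of about 1e-3
u = 0.4995  # the same uniform draw is used for both distributions

print(sample_index(probs_a, u), sample_index(probs_b, u))
```

A flip like this only happens when the draw lands within the perturbation of a cumulative boundary, so it should be rare, but a test that compares sampled sequences exactly could still hit it.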

Thanks 🙏

ydshieh · 2022-04-22T18:39:33Z

OK, I saw your previous comment

I've spun up an Nvidia T4 ( = no tf32 format) and got an error < 1e-5 for all cases

Rocketknight1 · 2022-04-25T12:17:18Z

Based on the testing results, I'm happy for this to be merged now! If this is an XLA bug, though, we should make sure to revert our changes once none of the TF versions we support are affected by it anymore.

Should we add a TODO to the masked_softmax function or a reminder somewhere to make sure that we document why this change is here, and when it can be removed?

patrickvonplaten

Great - the solution is clean!

gante · 2022-04-25T18:33:23Z

@Rocketknight1 added a TODO with instructions related to when to deprecate 👍
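
One way such a TODO could eventually be acted on is a version gate that drops the offset once the upstream fix ships. This is only a sketch: `parse_version`, `needs_offset`, and `HYPOTHETICAL_FIXED_VERSION` are made-up names, and the version number is a placeholder because this thread does not say which TF release (if any) contains a fix.

```python
from itertools import takewhile

# Placeholder only: the release that fixes the XLA issue is not known here.
HYPOTHETICAL_FIXED_VERSION = (99, 0, 0)


def parse_version(text):
    """Turn a string such as '2.8.0' or '2.9.0rc1' into a tuple of ints."""
    numbers = []
    for piece in text.split(".")[:3]:
        digits = "".join(takewhile(str.isdigit, piece))
        numbers.append(int(digits) if digits else 0)
    return tuple(numbers)


def needs_offset(tf_version, fixed_in=HYPOTHETICAL_FIXED_VERSION):
    """Return True while the installed version predates the assumed fix."""
    return parse_version(tf_version) < fixed_in


print(needs_offset("2.8.0"))
```

In a real codebase the version string would come from the installed TensorFlow package, and the workaround could be deleted outright once every supported version is at or above the fixed release.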

gante added 2 commits April 22, 2022 11:38

add stable softmax

8f48420

replace tf.nn.softmax by stable_softmax

9aad2a6

gante added 2 commits April 22, 2022 14:34

move import

ff7631c

fix examples

5c5903a

gante marked this pull request as ready for review April 22, 2022 15:27

gante requested review from sgugger, patrickvonplaten, Rocketknight1 and ydshieh April 22, 2022 15:27

sgugger approved these changes Apr 22, 2022

Apply suggestions from code review

f8870d2

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

ydshieh reviewed Apr 22, 2022

better fn docstring

4c87fd9

ydshieh reviewed Apr 22, 2022

test batch size > 1

ecdacd9

ydshieh approved these changes Apr 22, 2022

Rocketknight1 approved these changes Apr 22, 2022

patrickvonplaten approved these changes Apr 25, 2022

add todo

872edca

gante merged commit e03966e into huggingface:main Apr 25, 2022

gante deleted the stable_softmax branch April 25, 2022 19:10

elusenji pushed a commit to elusenji/transformers that referenced this pull request Jun 12, 2022

TF: XLA stable softmax (huggingface#16892)

c3f1305

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
