Fix ORTTrainer failure on DeBERTa(base/v2/sew_d) fp16 training #18529

JingyaHuang · 2022-08-08T15:10:04Z

What does this PR do?

Context

It was reported in huggingface/optimum#305 that training DeBERTa with optimum.onnxruntime.ORTTrainer is broken.
After investigation, the breakage has two causes:

1. At that time, XDropout did not have a symbolic function. One has since been implemented by @garymm in "support ONNX export of XDropout in deberta{,_v2} and sew_d" (#17502), which has been merged into the main branch of transformers.
2. The implementation of DeBERTa has some numpy/math operations that lead to an incorrect export. This will be fixed in "Deberta V2: Fix critical trace warnings to allow ONNX export" (#18272).

With those two fixes, fp32 training works, but mixed-precision training still fails because some MatMul nodes receive inputs with mismatched dtypes. In #18272, some sqrt results are cast to fp32; they need to be cast back to fp16 before the MatMul ops, and this PR adds that re-cast.
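To illustrate the dtype mismatch described above, here is a minimal sketch in NumPy rather than the actual transformers code. It assumes a MatMul that, like the ONNX MatMul operator, requires both inputs to share one element type. The names `strict_matmul` and `attention_scores` and the `recast` flag are hypothetical and exist only for this example.

```python
import numpy as np


def strict_matmul(a, b):
    """Hypothetical stand-in for an ONNX MatMul node: both inputs must share a dtype."""
    if a.dtype != b.dtype:
        raise TypeError(f"MatMul input dtype mismatch: {a.dtype} vs {b.dtype}")
    return a @ b


def attention_scores(query, key, recast):
    """Scale the transposed key by sqrt(head_dim), then multiply with the query."""
    head_dim = query.shape[-1]
    # The sqrt is computed in fp32. A 1-element array (not a 0-d scalar) is
    # used so that NumPy promotes purely by dtype.
    scale = np.sqrt(np.array([head_dim], dtype=np.float32))
    if recast:
        # The re-cast step: bring the fp32 scale back to the activation dtype.
        scale = scale.astype(query.dtype)
    # Without the re-cast, fp16 / fp32 promotes this operand to fp32.
    scaled_key = np.swapaxes(key, -1, -2) / scale
    return strict_matmul(query, scaled_key)


rng = np.random.default_rng(0)
query = rng.standard_normal((2, 4, 8)).astype(np.float16)
key = rng.standard_normal((2, 4, 8)).astype(np.float16)

try:
    attention_scores(query, key, recast=False)
except TypeError as err:
    print("without re-cast:", err)

scores = attention_scores(query, key, recast=True)
print("with re-cast:", scores.dtype, scores.shape)
```

The sketch only models the type constraint; how the exporter actually propagates dtypes through the traced graph is not shown here.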

Fixes huggingface/optimum#305

Who can review?

@LysandreJik @patrickvonplaten @lewtun

@michaelbenayoun

…upport Opacus training (huggingface#18486)
* changing BartLearnedPositionalEmbedding forward signature and references to it
* removing debugging dead code (thanks style checker)
* blackened modeling_bart file
* removing copy inconsistencies via make fix-copies
* changing references to copied signatures in Bart variants
* make fix-copies once more
* using expand over repeat (thanks @michaelbenayoun)
* expand instead of repeat for all model copies
Co-authored-by: Daniel Jones <jonesdaniel@microsoft.com>

* Create _config.py
* Create _toctree.yml
* Create index.mdx (not sure about "du / ihr" or "sie")
* Create quicktour.mdx
* Update _toctree.yml
* Update build_documentation.yml
* Update build_pr_documentation.yml
* fix build
* Update index.mdx
* Update quicktour.mdx
* Create installation.mdx
* Update _toctree.yml

…face#18272)
* Fix critical trace warnings to allow ONNX export
* Force input to `sqrt` to be float type
* Cleanup code
* Remove unused import statement
* Update model sew
* Small refactor
  Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>
* Use broadcasting instead of repeat
* Implement suggestion
  Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>
* Match deberta v2 changes in sew_d
* Improve code quality
* Update code quality
* Consistency of small refactor
* Match changes in sew_d
Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>

JingyaHuang · 2022-08-11T15:28:17Z

Closing, as it turned out to be too messy even after rebasing.

JingyaHuang and others added 30 commits July 19, 2022 16:16

Modify DeBERTa-v2 modeling

71d0185

Partly fix DeBERTa

e38019f

Merge branch 'huggingface:main' into fix-deberta-tracing

f9d60cd

revert v2

17ac3f4

revert v2

a6b409d

Fix DeBERTa-v2

f2c6090

Merge branch 'huggingface:main' into fix-deberta-tracing

a2354d5

Fix Deberta with PR in trfrs

d996118

Fix Deberta with PR in trfrs

a6b2fd4

add xdropout symbolic

d4d46dc

Merge branch 'huggingface:main' into fix-deberta-tracing

effc31f

Fix docstring

f3e1e42

Fix sew_d

1beef74

Correct dtype inside torch.sqrt

5643493

Merge branch 'huggingface:main' into fix-deberta-tracing

110546d

Merge branch 'huggingface:main' into fix-deberta-tracing

2e16211

Merge branch 'huggingface:main' into fix-deberta-tracing

49da99e

Merge branch 'huggingface:main' into fix-deberta-tracing

be80b54

Merge branch 'huggingface:main' into fix-deberta-tracing

f448029

Try pos_key_layer dtype

61175f5

Try pos_query_layer dtype

d136a54

Try query_layer dtype

371a5ca

Try query_layer dtype

e8a8e1d

Try query_layer dtype

8d7f23e

Merge branch 'huggingface:main' into fix-deberta-tracing

73b0a49

Revert matmul changes

1446018

Hacky test torch.half

fb97fb6

Test remove dtype

dc52a6d

Test matmul unified dtype

f5cdb7b

Test matmul unified dtype

ef4e1ba

pocca2048 and others added 15 commits August 11, 2022 14:31

Fix LayoutLMv3 documentation (huggingface#17932)

093969d

* fix typos
* fix sequence_length docs of LayoutLMv3Model
* delete trailing white spaces
* fix layoutlmv3 docs more
* apply make fixup & quality
* change to two versions of input docstring
* apply make fixup & quality

Skip broken tests

ab67979

Modify DeBERTa-v2 modeling

7fe3b35

revert v2

d20e35b

revert v2

eab3d18

Fix DeBERTa-v2

070b268

Fix Deberta with PR in trfrs

4aa9153

Fix Deberta with PR in trfrs

f41f2cb

add xdropout symbolic

ec086ad

Fix sew_d

a1b55ca

Fix copies of sew-d

1a6f0ea

Update DeBERTa fix

af584db

JingyaHuang changed the base branch from main to albertvillanova-patch-1 August 11, 2022 15:00

JingyaHuang changed the base branch from albertvillanova-patch-1 to main August 11, 2022 15:00

JingyaHuang added 5 commits August 11, 2022 15:13

Modify DeBERTa-v2 modeling

0595de0

revert v2

9900f3c

Fix Deberta with PR in trfrs

050ccc8

Test optimized fix

b1ac7ea

Update DeBERTa fix

b871846

JingyaHuang changed the base branch from main to albertvillanova-patch-1 August 11, 2022 15:15

JingyaHuang changed the base branch from albertvillanova-patch-1 to main August 11, 2022 15:15

JingyaHuang removed request for LysandreJik, lewtun and patrickvonplaten August 11, 2022 15:27

JingyaHuang closed this Aug 11, 2022

JingyaHuang deleted the fix-deberta-tracing branch August 22, 2022 08:22
