[TIR, Relay] improve bfloat16 support #2
Conversation
.gitignore (Outdated)
@@ -11,7 +11,10 @@ __pycache__/
.Python
env/
build/
build_debug/
Please don't change this. You can change it locally, but don't upstream it.
ok, I'll fix this.
include/tvm/tir/op.h (Outdated)
static const Op& op = Op::Get("tir." #OpName);       \
if (x.dtype().is_bfloat16()) {                        \
  DataType srcType = x.dtype();                       \
  DataType dstType(kDLFloat, 32, srcType.lanes());    \
Please align those trailing `\` continuations so they line up in a column.
ok
@@ -40,6 +40,8 @@
    "nn.conv3d_transpose",
    "nn.dense",
    "nn.batch_matmul",
    "nn.bias_add",
Not sure we can change this default list. Better to have a separate CPU list; otherwise you need to evaluate the impact on NV hardware.
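For illustration, a rough sketch of what a separate CPU list could look like instead of editing the shared default; the list contents and the `always_list_for` helper are hypothetical, not from this PR:

```python
# Hypothetical sketch: keep the default (GPU-evaluated) "always convert" list
# untouched and compose a CPU-specific list on top of it, so NV hardware is unaffected.
DEFAULT_ALWAYS_LIST = [
    "nn.conv3d_transpose",
    "nn.dense",
    "nn.batch_matmul",
]

# CPU-only additions live in their own list.
CPU_EXTRA_ALWAYS_LIST = ["nn.bias_add"]


def always_list_for(target_kind: str):
    """Return the op list to use for a given target kind (illustrative helper)."""
    if target_kind.startswith("llvm"):  # CPU targets
        return DEFAULT_ALWAYS_LIST + CPU_EXTRA_ALWAYS_LIST
    return DEFAULT_ALWAYS_LIST
```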
@@ -126,3 +155,4 @@ def test_fp16_conversion(target, dev):
    test_basic_build()
    test_fp16_build()
    test_fp16_conversion()
Do we need to add a test_bf16_conversion, like the fp16 one?
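For reference, a minimal sketch of what such a test might look like, mirroring test_fp16_conversion; it assumes ToMixedPrecision accepts "bfloat16" after this change, and the toy network is illustrative:

```python
# Hypothetical sketch of a bf16 counterpart to test_fp16_conversion.
import tvm
from tvm import relay


def test_bf16_conversion():
    # Build a tiny float32 graph.
    x = relay.var("x", shape=(1, 16), dtype="float32")
    w = relay.var("w", shape=(8, 16), dtype="float32")
    mod = tvm.IRModule.from_expr(relay.Function([x, w], relay.nn.dense(x, w)))

    # Run AMP targeting bfloat16 (assumes the pass accepts "bfloat16" after this PR).
    mod = relay.transform.InferType()(mod)
    mod = relay.transform.ToMixedPrecision("bfloat16")(mod)

    # The dense compute should now be in bfloat16.
    assert "bfloat16" in str(mod)
```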
LGTM!
Thanks for reviewing, this PR has been merged into the official repo.
* Revert "[skip ci] Revert "[ci] Default to n=2 for test parallelism (apache#12376)" (apache#12413)"

  This reverts commit 478b672.

* [ci] Default to n=2 for test parallelism

  This is attempt #2 of apache#12376, which was reverted in apache#12413. The changes in `plugin.py` should keep all the tests on the same node so sporadic failures don't happen due to scheduling.

  Co-authored-by: driazati <driazati@users.noreply.github.com>
Motivation:
We are enabling bfloat16 in BYOC-oneDNN along the path: [float32 graph] --> <AMP> --> [bfloat16 graph] --> <BYOC> --> [TVM + oneDNN module]. However, some passes, such as FoldConstant, could not handle bfloat16 before the improvements below.

Changes:
With these improvements, a float32 graph can now be converted to bfloat16 through AMP and then lowered to run inference in bfloat16 mode.
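A minimal sketch of that path, assuming AMP accepts "bfloat16" after this PR; the toy conv2d network and the llvm target are illustrative, not from the PR:

```python
import numpy as np
import tvm
from tvm import relay

# Toy float32 graph (illustrative).
x = relay.var("x", shape=(1, 3, 32, 32), dtype="float32")
w = relay.const(np.random.rand(8, 3, 3, 3).astype("float32"))
y = relay.nn.conv2d(x, w, kernel_size=(3, 3), channels=8)
mod = tvm.IRModule.from_expr(relay.Function([x], y))

# AMP: rewrite float32 ops into bfloat16 where the op lists allow it.
mod = relay.transform.InferType()(mod)
mod = relay.transform.ToMixedPrecision("bfloat16")(mod)

# Passes such as FoldConstant must now handle bfloat16 constants.
mod = relay.transform.FoldConstant()(mod)

# Lower and build for a CPU target (illustrative).
lib = relay.build(mod, target="llvm")
```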
Tested Models (gluoncv):
As @AndrewZhaoLuo said at apache#8069
Pending:
The support for bfloat16 in BYOC-oneDNN is based on the multi-blocking layout transform and extensions to BYOC-oneDNN, and is still pending.