
[XLA:CPU] [oneDNN] Enable Dot op (MatMul) in BF16 Type #8402

Conversation

mahmoud-abuzaina
Contributor

This PR adds BF16 support to the oneDNN MatMul op by allowing the Dot op to keep the BF16 type until it is handled by the OneDnnMatMulRewriter pass.
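For context, here is a minimal JAX sketch (my illustration, not code from this PR) of the kind of computation the change targets on XLA:CPU: a matmul whose operands and result are bf16, which, as I read the description, can now stay in bf16 until the OneDnnMatMulRewriter pass handles it rather than being upcast to f32 earlier in the pipeline.

```python
# Hypothetical illustration, not code from this PR: a bf16 matmul that lowers
# to an XLA Dot op. On an XLA:CPU build with oneDNN enabled, this is the kind
# of Dot the OneDnnMatMulRewriter pass is expected to pick up in bf16.
import jax
import jax.numpy as jnp

@jax.jit
def matmul(a, b):
    return jnp.dot(a, b)  # lowers to an XLA Dot (MatMul) op

a = jnp.ones((128, 256), dtype=jnp.bfloat16)
b = jnp.ones((256, 64), dtype=jnp.bfloat16)
out = matmul(a, b)
print(out.dtype)  # bfloat16
```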

@github-actions github-actions bot added the kokoro:force-run Forces CI to rerun label Jan 11, 2024
@kokoro-team kokoro-team removed the kokoro:force-run Forces CI to rerun label Jan 11, 2024
@golechwierowicz golechwierowicz requested review from ezhulenev and d0k and removed request for ezhulenev January 12, 2024 09:02
@penpornk penpornk added ready to pull PR ready for merge process kokoro:force-run Forces CI to rerun labels Jan 15, 2024
@kokoro-team kokoro-team removed the kokoro:force-run Forces CI to rerun label Jan 15, 2024
copybara-service bot pushed a commit that referenced this pull request Jan 16, 2024
Imported from GitHub PR #8402

This PR adds BF16 support to the oneDNN MatMul op by allowing the Dot op to keep the BF16 type until it is handled by the OneDnnMatMulRewriter pass.
Copybara import of the project:

--
4f7ddbc by Mahmoud Abuzaina <mahmoud.abuzaina@intel.com>:

Enable MatMul op in BF16

Merging this change closes #8402

FUTURE_COPYBARA_INTEGRATE_REVIEW=#8402 from Intel-tensorflow:mabuzain/enable-bf16-matmul 4f7ddbc
PiperOrigin-RevId: 598823232
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Mar 19, 2024
…recision

Imported from GitHub PR openxla/xla#10687

Several weeks ago, a change enabled the "simplify-fp-conversions" pass in cpu_compiler.cc unconditionally for Intel CPUs:

[PR-8402](openxla/xla#8402) - [XLA:CPU] [oneDNN] Enable Dot op (MatMul) in BF16 Type

I noticed the following issue when the "simplify-fp-conversions" pass in cpu_compiler.cc is enabled unconditionally.

My model uses bf16 operators (e.g. convolution). I want to JIT-compile and run it on CPU while preserving intermediate bf16 accuracy.

The CPU compiler runs the `float-normalization-bf16` pass, which converts a bf16 convolution to f32_convolution + convert_to_bf16 + convert_to_f32 (because a typical CPU does not support bf16 computation).

The CPU compiler (on Xeon) also runs the `simplify-fp-conversions` pass, which simplifies `f32_convolution + convert_to_bf16 + convert_to_f32` to just `f32_convolution`.

As a result, the whole model is computed in f32 precision internally, and the conversion to bf16 happens only at the very end.

In some cases we want to execute a bf16 model on CPU but get results whose accuracy is close to what bf16 hardware would produce.

To control the accuracy we can use the debug option `xla_allow_excess_precision`. It defaults to true, so the `simplify-fp-conversions` pass is enabled.

If we need to emulate bf16 computation on an Intel CPU, we can set `XLA_FLAGS="--xla_allow_excess_precision=false"`; in that case `simplify-fp-conversions` is not added to the cpu_compiler pipeline, f32 op results are converted back to bf16 immediately, and bf16 accuracy is preserved internally.

[gpu_compiler.cc](https://github.com/openxla/xla/blob/main/xla/service/gpu/gpu_compiler.cc#L1359) already enables the `SimplifyFPConversions` pass only if `debug_options.xla_allow_excess_precision()` is true.
Copybara import of the project:

--
796dc83ef34455e53b83c02dc68cd6d71306e654 by Alexander Pivovarov <pivovaa@amazon.com>:

[CPU] Add SimplifyFPConversions only if xla_allow_excess_precision

Merging this change closes #10687

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#10687 from apivovarov:fix_cpu_SimplifyFPConversions 796dc83ef34455e53b83c02dc68cd6d71306e654
PiperOrigin-RevId: 617252815
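As a usage sketch of the flag discussed in the commit message above (my addition, assuming a JAX front end; the flag name comes from the commit message, the surrounding script is illustrative), this is how one might disable excess precision so the XLA:CPU pipeline keeps intermediate results rounded to bf16:

```python
# Illustrative sketch: set XLA_FLAGS before the XLA CPU backend is initialized
# so that the simplify-fp-conversions pass is not added to the cpu_compiler
# pipeline and f32 results of normalized bf16 ops are rounded back to bf16.
import os
os.environ["XLA_FLAGS"] = "--xla_allow_excess_precision=false"

import jax
import jax.numpy as jnp

@jax.jit
def f(x, w):
    y = jnp.dot(x, w)   # bf16 dot; may be normalized to f32 + converts on CPU
    return jnp.tanh(y)  # with excess precision disabled, y is rounded to bf16 first

x = jnp.ones((8, 16), dtype=jnp.bfloat16)
w = jnp.ones((16, 4), dtype=jnp.bfloat16)
print(f(x, w).dtype)  # bfloat16
```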