[XLA:CPU] [oneDNN] Enable Dot op (MatMul) in BF16 Type #8402
Closed
mahmoud-abuzaina wants to merge 1 commit into openxla:main from Intel-tensorflow:mabuzain/enable-bf16-matmul
Conversation
golechwierowicz requested review from ezhulenev and d0k and removed the request for ezhulenev (January 12, 2024, 09:02)
d0k approved these changes (Jan 12, 2024)
penpornk added the ready to pull (PR ready for merge process) and kokoro:force-run (forces CI to rerun) labels (Jan 15, 2024)
copybara-service bot pushed a commit that referenced this pull request on Jan 16, 2024:

Imported from GitHub PR #8402

This PR adds BF16 support to the oneDNN MatMul op by allowing the Dot op to keep the BF16 type until it is handled by the OneDnnMatMulRewriter pass.

Copybara import of the project:

-- 4f7ddbc by Mahmoud Abuzaina <mahmoud.abuzaina@intel.com>: Enable MatMul op in BF16

Merging this change closes #8402

FUTURE_COPYBARA_INTEGRATE_REVIEW=#8402 from Intel-tensorflow:mabuzain/enable-bf16-matmul 4f7ddbc
PiperOrigin-RevId: 598823232
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request on Jan 17, 2024:

Imported from GitHub PR openxla/xla#8402

This PR adds BF16 support to the oneDNN MatMul op by allowing the Dot op to keep the BF16 type until it is handled by the OneDnnMatMulRewriter pass.

Copybara import of the project:

-- 4f7ddbcd5ecf7a4b3cfd140abd9a73d193e9ca39 by Mahmoud Abuzaina <mahmoud.abuzaina@intel.com>: Enable MatMul op in BF16

Merging this change closes #8402
PiperOrigin-RevId: 599132673
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request on Mar 19, 2024:

[CPU] Add SimplifyFPConversions only if xla_allow_excess_precision

Imported from GitHub PR openxla/xla#10687

Several weeks ago, a change enabled the "simplify-fp-conversions" pass in cpu_compiler.cc unconditionally for Intel CPUs: [PR-8402](openxla/xla#8402) - [XLA:CPU] [oneDNN] Enable Dot op (MatMul) in BF16 Type.

I noticed the following issue with the "simplify-fp-conversions" pass being enabled unconditionally in cpu_compiler.cc. My model uses bf16 operators (e.g. convolution), and I want to JIT-compile and run it on CPU while preserving intermediate bf16 accuracy. The CPU compiler uses the `float-normalization-bf16` pass, which converts a bf16 convolution into f32_convolution + convert_to_bf16 + convert_to_f32 (because a typical CPU does not support bf16 computation). The CPU compiler (on Xeon) also uses the `simplify-fp-conversions` pass, which simplifies `f32_convolution + convert_to_bf16 + convert_to_f32` to just `f32_convolution`. As a result, the whole model was internally converted to f32 precision, and the conversion to bf16 happened only at the very end.

In some cases we want to execute a bf16 model on CPU but get results with accuracy similar to running it on bf16 hardware. To control the accuracy we can use the debug option `xla_allow_excess_precision`. By default it is true, so the `simplify-fp-conversions` pass is enabled. If we need to emulate bf16 computation on an Intel CPU, we can set `XLA_FLAGS="--xla_allow_excess_precision=false"`; in that case `simplify-fp-conversions` is not added to the cpu_compiler pipeline, f32 op results are converted to bf16 immediately, and bf16 accuracy is preserved internally.

[gpu_compiler.cc](https://github.com/openxla/xla/blob/main/xla/service/gpu/gpu_compiler.cc#L1359) already enables the `SimplifyFPConversions` pass only if `debug_options.xla_allow_excess_precision()` is true.

Copybara import of the project:

-- 796dc83ef34455e53b83c02dc68cd6d71306e654 by Alexander Pivovarov <pivovaa@amazon.com>: [CPU] Add SimplifyFPConversions only if xla_allow_excess_precision

Merging this change closes #10687

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#10687 from apivovarov:fix_cpu_SimplifyFPConversions 796dc83ef34455e53b83c02dc68cd6d71306e654
PiperOrigin-RevId: 617252815
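As a rough illustration of the workflow described in that commit message, the sketch below sets the flag from a JAX program before XLA is initialized. Only the flag name `--xla_allow_excess_precision=false` comes from the message above; the JAX driver code, function name, and shapes are illustrative assumptions, not part of the PR.

```python
import os

# Assumption for illustration: XLA_FLAGS must be set before XLA is
# initialized, i.e. before importing jax in this process.
os.environ["XLA_FLAGS"] = "--xla_allow_excess_precision=false"

import jax
import jax.numpy as jnp


@jax.jit
def chained_matmul(a, b, c):
    # Two bf16 dots in a row; with excess precision disabled, the
    # intermediate product is rounded back to bf16 between the dots
    # instead of flowing through in f32.
    return jnp.dot(jnp.dot(a, b), c)


key = jax.random.PRNGKey(0)
a = jax.random.normal(key, (64, 64)).astype(jnp.bfloat16)
b = jax.random.normal(key, (64, 64)).astype(jnp.bfloat16)
c = jax.random.normal(key, (64, 64)).astype(jnp.bfloat16)

print(chained_matmul(a, b, c).dtype)  # bfloat16
```

With the flag left at its default (true), the intermediate result may be kept in f32 end to end; setting it to false trades that extra precision for behavior closer to real bf16 hardware, as the commit message explains.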
steeve pushed a commit to zml/xla that referenced this pull request on Aug 30, 2024.
This PR adds BF16 support to the oneDNN MatMul op by allowing the Dot op to keep the BF16 type until it is handled by the OneDnnMatMulRewriter pass.
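A minimal sketch of the behavior this change targets, assuming a CPU build of JAX (the jit/lowering driver code below is illustrative and not from the PR): a jitted bf16 dot keeps its bf16 element type through HLO optimization, so on CPUs with the required oneDNN support the OneDnnMatMulRewriter pass can rewrite it into a oneDNN matmul. Dumping the optimized HLO shows whether that rewrite happened; the exact custom-call target name is an implementation detail and is not asserted here.

```python
import jax
import jax.numpy as jnp


def matmul(a, b):
    # A plain dot whose operands and result stay in bf16.
    return jnp.dot(a, b)


a = jnp.ones((128, 128), dtype=jnp.bfloat16)
b = jnp.ones((128, 128), dtype=jnp.bfloat16)

# Lower and compile for the default backend (CPU here), then dump the
# optimized HLO text.
compiled = jax.jit(matmul).lower(a, b).compile()

# On an XLA:CPU build with oneDNN enabled (and a CPU that supports it), the
# optimized HLO may show the bf16 dot rewritten into a oneDNN matmul custom
# call; otherwise it remains a regular dot.
print(compiled.as_text())
```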