From f8fbb6ee7d037eb9b8811355a7dac4a6866e9a9f Mon Sep 17 00:00:00 2001 From: "jack.kirk" Date: Wed, 23 Feb 2022 14:19:29 +0000 Subject: [PATCH 1/7] Added extension proposal for sycl_ext_ONEAPI_bf16_math. Signed-off-by: jack.kirk --- .../sycl_ext_oneapi_bf16_math.asciidoc | 105 ++++++++++++++++++ 1 file changed, 105 insertions(+) create mode 100644 sycl/doc/extensions/experimental/sycl_ext_oneapi_bf16_math.asciidoc diff --git a/sycl/doc/extensions/experimental/sycl_ext_oneapi_bf16_math.asciidoc b/sycl/doc/extensions/experimental/sycl_ext_oneapi_bf16_math.asciidoc new file mode 100644 index 0000000000000..42e9e18e9fd5d --- /dev/null +++ b/sycl/doc/extensions/experimental/sycl_ext_oneapi_bf16_math.asciidoc @@ -0,0 +1,105 @@ +# Bfloat16 math functions Extension for DPC++: = SYCL_ONEAPI_bf16_math +:source-highlighter: coderay +:coderay-linenums-mode: table +:dpcpp: pass:[DPC++] + +// This section needs to be after the document title. +:doctype: book +:toc2: +:toc: left +:encoding: utf-8 +:lang: en + +:blank: pass:[ +] + +// Set the default source code type in this document to C++, +// for syntax highlighting purposes. This is needed because +// docbook uses c++ and html5 uses cpp. +:language: {basebackend@docbook:c++:cpp} + + +== Notice + +Copyright (c) 2021-2021 Intel Corporation. All rights reserved. + +IMPORTANT: This specification is a draft. + +NOTE: The APIs described in this specification are experimental. Future versions of this extension may change these APIs in ways that are incompatible with the versions described here. + +NOTE: Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are +trademarks of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. +used by permission by Khronos. + +This extension is written against the SYCL 2020 revision 4 specification. All +references below to the "core SYCL specification" or to section numbers in the +SYCL specification refer to that revision. + +## Contributors + +* Jack Kirk + +## Introduction + +This document proposes extending the `fma`, `fmin`, `fmax` and `fabs` SYCL floating point math functions to support the `bfloat16` type introduced in the `SYCL_EXT_INTEL_BF16_CONVERSION` extension. This proposal assumes that devices which support the `SYCL_EXT_INTEL_BF16_CONVERSION` extension have the `bfloat16` scalar data type: `bfloat16`, and the `bfloat16` vector data types: `bfloat16_1`, `bfloat16_2`, `bfloat16_3`, `bfloat16_4`, `bfloat16_8` and `bfloat16_16` available at compile-time, in line with corresponding `half` types that are available at compile time on devices that +have `aspect::fp16` as described in the SYCL specification: https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:opencl:extension-fp16. Therefore the `fma`, `fmin`, `fmax` and `fabs` functions should support all of these vector and scalar types which we refer to as `genbfloat16`. Initially this experimental extension may also support the corresponding storage types for each of these `genbfloat16` types, namely `short`, `short3`, `short4`, `short8`, and `short16`. +The descriptions of the `fma`, `fmin`, `fmax` and `fabs` SYCL floating point math functions can be found in the SYCL specification: https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#_math_functions. This proposal does not propose any changes to the expected behavior of these math functions beyond the new support for the `genbfloat16` types. 
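For illustration, a minimal usage sketch is shown below. It is not part of the proposal: it assumes an implementation of this extension (guarded by the `SYCL_ONEAPI_BF16_MATH` macro defined later in this document) and the `bfloat16` type from `SYCL_EXT_INTEL_BF16_CONVERSION`; the kernel, function and namespace names are illustrative only.

```c++
#ifdef SYCL_ONEAPI_BF16_MATH
// Hypothetical usage sketch; names are illustrative.
// The bfloat16 type is assumed to come from SYCL_EXT_INTEL_BF16_CONVERSION.
using sycl::ext::intel::experimental::bfloat16;
namespace bf16_math = sycl::ext::oneapi::experimental;

void scale_bias_relu(sycl::queue &q, sycl::buffer<bfloat16, 1> &data,
                     bfloat16 scale, bfloat16 bias, bfloat16 bound) {
  q.submit([&](sycl::handler &cgh) {
    auto acc = data.get_access<sycl::access::mode::read_write>(cgh);
    cgh.parallel_for<class bf16_scale_bias_relu>(
        data.get_range(), [=](sycl::id<1> i) {
          // d = data[i] * scale + bias, using the proposed bfloat16 fma.
          bfloat16 d = bf16_math::fma(acc[i], scale, bias);
          // ReLU-style clamp at zero, then saturate at |bound|.
          d = bf16_math::fmax(d, bfloat16{0.0f});
          acc[i] = bf16_math::fmin(d, bf16_math::fabs(bound));
        });
  });
}
#endif
```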
A discussion issue has been raised at the bottom of this document on whether the specified maximum precision error should change for these functions for the new types. + +## Feature test macro + +This extension provides a feature-test macro as described in the core SYCL +specification section 6.3.3 "Feature test macros". Therefore, an implementation +supporting this extension must predefine the macro +SYCL_ONEAPI_BF16_MATH to one of the values defined in the table +below. Applications can test for the existence of this macro to determine if the +implementation supports this feature, or applications can test the macro’s value +to determine which of the extension’s APIs the implementation supports. + +[%header,cols="1,5"] +|=== +|Value |Description +|1 |Initial extension implementation. +|=== + +## Motivation + +In order to take full advantage of the new matrix extension, https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_oneapi_matrix.asciidoc, we need to introduce dedicated functions that support the new `bfloat16` data type, which can then be used as element wise operations on matrices. + +## New function declarations + +```c++ +namespace sycl::ext::oneapi::experimental { + +// genbfloat16 fma (genbfloat16 a, genbfloat16 b, genbfloat16 c) +template +detail::enable_if_t::value, T> fma(T a, T b, + T c); + +// genbfloat16 fmax (genbfloat16 x, genbfloat16 y) +template +detail::enable_if_t::value, T> fmax(T x, T y); + +// genbfloat16 fmin (genbfloat16 x, genbfloat16 y) +template +detail::enable_if_t::value, T> fmin(T x, T y); + +// genbfloat16 fabs (genbfloat16 x) +template +detail::enable_if_t::value, T> fabs(T x); + +} // namespace sycl::ext::oneapi::experimental +``` + +## Issues for future discussion + +1. We shold decide if there should be a different specified maximum precision error for these math functions when using the `genbfloat16` types. + +2. In the future we will wish to add an additional Fused Multiply Add function which performs RELU saturation. However such a function should also allow operands to be `half` types, and as such may fall outside of the scope of the current extension. + + +## Revision History + +[frame="none",options="header"] +|====================== +|Rev |Date |Author |Changes +|1 |2022-02-23 |Jack Kirk |Initial working draft. +|====================== From 5e6eb9302c6c7d0c00a3cae28d891d2565a6114a Mon Sep 17 00:00:00 2001 From: "jack.kirk" Date: Wed, 23 Feb 2022 14:26:12 +0000 Subject: [PATCH 2/7] small improvement --- .../experimental/sycl_ext_oneapi_bf16_math.asciidoc | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/sycl/doc/extensions/experimental/sycl_ext_oneapi_bf16_math.asciidoc b/sycl/doc/extensions/experimental/sycl_ext_oneapi_bf16_math.asciidoc index 42e9e18e9fd5d..d9b06b6ed6267 100644 --- a/sycl/doc/extensions/experimental/sycl_ext_oneapi_bf16_math.asciidoc +++ b/sycl/doc/extensions/experimental/sycl_ext_oneapi_bf16_math.asciidoc @@ -1,4 +1,4 @@ -# Bfloat16 math functions Extension for DPC++: = SYCL_ONEAPI_bf16_math +# Bfloat16 math functions extension for DPC++: = SYCL_ONEAPI_bf16_math :source-highlighter: coderay :coderay-linenums-mode: table :dpcpp: pass:[DPC++] @@ -42,7 +42,7 @@ SYCL specification refer to that revision. This document proposes extending the `fma`, `fmin`, `fmax` and `fabs` SYCL floating point math functions to support the `bfloat16` type introduced in the `SYCL_EXT_INTEL_BF16_CONVERSION` extension. 
This proposal assumes that devices which support the `SYCL_EXT_INTEL_BF16_CONVERSION` extension have the `bfloat16` scalar data type: `bfloat16`, and the `bfloat16` vector data types: `bfloat16_1`, `bfloat16_2`, `bfloat16_3`, `bfloat16_4`, `bfloat16_8` and `bfloat16_16` available at compile-time, in line with corresponding `half` types that are available at compile time on devices that have `aspect::fp16` as described in the SYCL specification: https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:opencl:extension-fp16. Therefore the `fma`, `fmin`, `fmax` and `fabs` functions should support all of these vector and scalar types which we refer to as `genbfloat16`. Initially this experimental extension may also support the corresponding storage types for each of these `genbfloat16` types, namely `short`, `short3`, `short4`, `short8`, and `short16`. -The descriptions of the `fma`, `fmin`, `fmax` and `fabs` SYCL floating point math functions can be found in the SYCL specification: https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#_math_functions. This proposal does not propose any changes to the expected behavior of these math functions beyond the new support for the `genbfloat16` types. A discussion issue has been raised at the bottom of this document on whether the specified maximum precision error should change for these functions for the new types. +The descriptions of the `fma`, `fmin`, `fmax` and `fabs` SYCL floating point math functions can be found in the SYCL specification: https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#_math_functions. This proposal does not propose any changes to the expected behavior of these math functions beyond the new support for the `genbfloat16` types. A discussion issue has been raised at the bottom of this document on whether the specified maximum precision error should change for these functions when they use the `genbfloat16` data types. ## Feature test macro From e4aa08d9cc28a84a9b016709138dba0b294c0cdb Mon Sep 17 00:00:00 2001 From: "jack.kirk" Date: Thu, 3 Mar 2022 15:35:04 +0000 Subject: [PATCH 3/7] Discussed unified extensions and vec types in issues. --- .../sycl_ext_oneapi_bf16_math.asciidoc | 39 ++++++++++--------- 1 file changed, 21 insertions(+), 18 deletions(-) diff --git a/sycl/doc/extensions/experimental/sycl_ext_oneapi_bf16_math.asciidoc b/sycl/doc/extensions/experimental/sycl_ext_oneapi_bf16_math.asciidoc index d9b06b6ed6267..1d47b87ad1acd 100644 --- a/sycl/doc/extensions/experimental/sycl_ext_oneapi_bf16_math.asciidoc +++ b/sycl/doc/extensions/experimental/sycl_ext_oneapi_bf16_math.asciidoc @@ -1,4 +1,4 @@ -# Bfloat16 math functions extension for DPC++: = SYCL_ONEAPI_bf16_math +# Bfloat16 math functions extension for DPC++: SYCL_ONEAPI_bf16_math :source-highlighter: coderay :coderay-linenums-mode: table :dpcpp: pass:[DPC++] @@ -38,11 +38,16 @@ SYCL specification refer to that revision. * Jack Kirk +## Motivation + +In order to take full advantage of the new matrix extension, https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_oneapi_matrix.asciidoc, for many applications it is necessary to introduce dedicated functions that support the new `bfloat16` data type, which can then be used as element wise operations on matrices. + ## Introduction -This document proposes extending the `fma`, `fmin`, `fmax` and `fabs` SYCL floating point math functions to support the `bfloat16` type introduced in the `SYCL_EXT_INTEL_BF16_CONVERSION` extension. 
This proposal assumes that devices which support the `SYCL_EXT_INTEL_BF16_CONVERSION` extension have the `bfloat16` scalar data type: `bfloat16`, and the `bfloat16` vector data types: `bfloat16_1`, `bfloat16_2`, `bfloat16_3`, `bfloat16_4`, `bfloat16_8` and `bfloat16_16` available at compile-time, in line with corresponding `half` types that are available at compile time on devices that -have `aspect::fp16` as described in the SYCL specification: https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:opencl:extension-fp16. Therefore the `fma`, `fmin`, `fmax` and `fabs` functions should support all of these vector and scalar types which we refer to as `genbfloat16`. Initially this experimental extension may also support the corresponding storage types for each of these `genbfloat16` types, namely `short`, `short3`, `short4`, `short8`, and `short16`. +This document proposes extending the `fma`, `fmin`, `fmax` and `fabs` SYCL floating point math functions to support the `bfloat16` type introduced in the `SYCL_EXT_INTEL_BF16_CONVERSION` extension. This proposal assumes that devices which support the aspect introduced in the `SYCL_EXT_INTEL_BF16_CONVERSION` extension have the `bfloat16` scalar (`bfloat16`) and vector data types (See the issues section for an action item regarding suitable vector data types) available at compile-time, in line with corresponding `half` types that are available at compile time on devices that +have `aspect::fp16` as described in the SYCL specification: https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:opencl:extension-fp16. Therefore the `fma`, `fmin`, `fmax` and `fabs` functions should support all of the `bfloat16` vector and scalar types which we refer to as `genbfloat16` in this document. Initially an implementation of this experimental extension may support the corresponding storage types for each of these `genbfloat16` types, e.g. `bfloat16` has storage type `uint16_t`. The descriptions of the `fma`, `fmin`, `fmax` and `fabs` SYCL floating point math functions can be found in the SYCL specification: https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#_math_functions. This proposal does not propose any changes to the expected behavior of these math functions beyond the new support for the `genbfloat16` types. A discussion issue has been raised at the bottom of this document on whether the specified maximum precision error should change for these functions when they use the `genbfloat16` data types. +This extension is intended to be backend agnostic, such that any backend may implement `bfloat16` versions of these math functions when available. This means that a pre-requisite for this extension is the existence of a backend agnostic `bfloat16` class and aspect. Ideally the existing `SYCL_EXT_INTEL_BF16_CONVERSION` extension can be generalized to other backends for this purpose. ## Feature test macro @@ -60,41 +65,39 @@ to determine which of the extension’s APIs the implementation supports. |1 |Initial extension implementation. |=== -## Motivation - -In order to take full advantage of the new matrix extension, https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_oneapi_matrix.asciidoc, we need to introduce dedicated functions that support the new `bfloat16` data type, which can then be used as element wise operations on matrices. 
- ## New function declarations ```c++ namespace sycl::ext::oneapi::experimental { -// genbfloat16 fma (genbfloat16 a, genbfloat16 b, genbfloat16 c) +// Available only when "T" is one of the genbfloat16 types. template -detail::enable_if_t::value, T> fma(T a, T b, - T c); +T fma(T a, T b, T c); -// genbfloat16 fmax (genbfloat16 x, genbfloat16 y) +// Available only when "T" is one of the genbfloat16 types. template -detail::enable_if_t::value, T> fmax(T x, T y); +T fmax(T x, T y); -// genbfloat16 fmin (genbfloat16 x, genbfloat16 y) +// Available only when "T" is one of the genbfloat16 types. template -detail::enable_if_t::value, T> fmin(T x, T y); +T fmin(T x, T y); -// genbfloat16 fabs (genbfloat16 x) +// Available only when "T" is one of the genbfloat16 types. template -detail::enable_if_t::value, T> fabs(T x); +T fabs(T x); } // namespace sycl::ext::oneapi::experimental ``` -## Issues for future discussion +## Issues for future discussion/resolution -1. We shold decide if there should be a different specified maximum precision error for these math functions when using the `genbfloat16` types. +1. Decide if there should be a different specified maximum precision error for these math functions when using the `genbfloat16` types. 2. In the future we will wish to add an additional Fused Multiply Add function which performs RELU saturation. However such a function should also allow operands to be `half` types, and as such may fall outside of the scope of the current extension. +3. Decide the appropriate `bfloat16` vector types to support. Should the range of vector types match the range used for other data types such as half, float, and double: e.g. `bfloat16_1`, `bfloat16_2`, `bfloat16_3`, `bfloat16_4`, `bfloat16_8` and `bfloat16_16`. Natural storage types for `bfloat16` and `bfloat16_2` would probably be `uint16_t` and `uint32_t` respectively. What about the other vector types? The choice of `bfloat16` storage types should reflect the intended use of `bfloat16` for all compatible backends. + +4. Should we join together all `bfloat16` related extensions such as this one and a generalized `SYCL_EXT_INTEL_BF16_CONVERSION` into a single extension. In this case should the `bfloat16` vector types be defined in such a unified extension? ## Revision History From fffde29c0aae95b79106d3e9a9cac0cbe7d5cea1 Mon Sep 17 00:00:00 2001 From: JackAKirk Date: Tue, 5 Apr 2022 10:09:44 +0100 Subject: [PATCH 4/7] moved bfloat16 math functions description to bfloat16 doc --- .../sycl_ext_oneapi_bf16_math.asciidoc | 108 ----- .../sycl_ext_oneapi_bfloat16.asciidoc | 370 ++++++++++++++++++ 2 files changed, 370 insertions(+), 108 deletions(-) delete mode 100644 sycl/doc/extensions/experimental/sycl_ext_oneapi_bf16_math.asciidoc create mode 100644 sycl/doc/extensions/experimental/sycl_ext_oneapi_bfloat16.asciidoc diff --git a/sycl/doc/extensions/experimental/sycl_ext_oneapi_bf16_math.asciidoc b/sycl/doc/extensions/experimental/sycl_ext_oneapi_bf16_math.asciidoc deleted file mode 100644 index 1d47b87ad1acd..0000000000000 --- a/sycl/doc/extensions/experimental/sycl_ext_oneapi_bf16_math.asciidoc +++ /dev/null @@ -1,108 +0,0 @@ -# Bfloat16 math functions extension for DPC++: SYCL_ONEAPI_bf16_math -:source-highlighter: coderay -:coderay-linenums-mode: table -:dpcpp: pass:[DPC++] - -// This section needs to be after the document title. -:doctype: book -:toc2: -:toc: left -:encoding: utf-8 -:lang: en - -:blank: pass:[ +] - -// Set the default source code type in this document to C++, -// for syntax highlighting purposes. 
This is needed because -// docbook uses c++ and html5 uses cpp. -:language: {basebackend@docbook:c++:cpp} - - -== Notice - -Copyright (c) 2021-2021 Intel Corporation. All rights reserved. - -IMPORTANT: This specification is a draft. - -NOTE: The APIs described in this specification are experimental. Future versions of this extension may change these APIs in ways that are incompatible with the versions described here. - -NOTE: Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are -trademarks of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. -used by permission by Khronos. - -This extension is written against the SYCL 2020 revision 4 specification. All -references below to the "core SYCL specification" or to section numbers in the -SYCL specification refer to that revision. - -## Contributors - -* Jack Kirk - -## Motivation - -In order to take full advantage of the new matrix extension, https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_oneapi_matrix.asciidoc, for many applications it is necessary to introduce dedicated functions that support the new `bfloat16` data type, which can then be used as element wise operations on matrices. - -## Introduction - -This document proposes extending the `fma`, `fmin`, `fmax` and `fabs` SYCL floating point math functions to support the `bfloat16` type introduced in the `SYCL_EXT_INTEL_BF16_CONVERSION` extension. This proposal assumes that devices which support the aspect introduced in the `SYCL_EXT_INTEL_BF16_CONVERSION` extension have the `bfloat16` scalar (`bfloat16`) and vector data types (See the issues section for an action item regarding suitable vector data types) available at compile-time, in line with corresponding `half` types that are available at compile time on devices that -have `aspect::fp16` as described in the SYCL specification: https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:opencl:extension-fp16. Therefore the `fma`, `fmin`, `fmax` and `fabs` functions should support all of the `bfloat16` vector and scalar types which we refer to as `genbfloat16` in this document. Initially an implementation of this experimental extension may support the corresponding storage types for each of these `genbfloat16` types, e.g. `bfloat16` has storage type `uint16_t`. -The descriptions of the `fma`, `fmin`, `fmax` and `fabs` SYCL floating point math functions can be found in the SYCL specification: https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#_math_functions. This proposal does not propose any changes to the expected behavior of these math functions beyond the new support for the `genbfloat16` types. A discussion issue has been raised at the bottom of this document on whether the specified maximum precision error should change for these functions when they use the `genbfloat16` data types. -This extension is intended to be backend agnostic, such that any backend may implement `bfloat16` versions of these math functions when available. This means that a pre-requisite for this extension is the existence of a backend agnostic `bfloat16` class and aspect. Ideally the existing `SYCL_EXT_INTEL_BF16_CONVERSION` extension can be generalized to other backends for this purpose. - -## Feature test macro - -This extension provides a feature-test macro as described in the core SYCL -specification section 6.3.3 "Feature test macros". 
Therefore, an implementation -supporting this extension must predefine the macro -SYCL_ONEAPI_BF16_MATH to one of the values defined in the table -below. Applications can test for the existence of this macro to determine if the -implementation supports this feature, or applications can test the macro’s value -to determine which of the extension’s APIs the implementation supports. - -[%header,cols="1,5"] -|=== -|Value |Description -|1 |Initial extension implementation. -|=== - -## New function declarations - -```c++ -namespace sycl::ext::oneapi::experimental { - -// Available only when "T" is one of the genbfloat16 types. -template -T fma(T a, T b, T c); - -// Available only when "T" is one of the genbfloat16 types. -template -T fmax(T x, T y); - -// Available only when "T" is one of the genbfloat16 types. -template -T fmin(T x, T y); - -// Available only when "T" is one of the genbfloat16 types. -template -T fabs(T x); - -} // namespace sycl::ext::oneapi::experimental -``` - -## Issues for future discussion/resolution - -1. Decide if there should be a different specified maximum precision error for these math functions when using the `genbfloat16` types. - -2. In the future we will wish to add an additional Fused Multiply Add function which performs RELU saturation. However such a function should also allow operands to be `half` types, and as such may fall outside of the scope of the current extension. - -3. Decide the appropriate `bfloat16` vector types to support. Should the range of vector types match the range used for other data types such as half, float, and double: e.g. `bfloat16_1`, `bfloat16_2`, `bfloat16_3`, `bfloat16_4`, `bfloat16_8` and `bfloat16_16`. Natural storage types for `bfloat16` and `bfloat16_2` would probably be `uint16_t` and `uint32_t` respectively. What about the other vector types? The choice of `bfloat16` storage types should reflect the intended use of `bfloat16` for all compatible backends. - -4. Should we join together all `bfloat16` related extensions such as this one and a generalized `SYCL_EXT_INTEL_BF16_CONVERSION` into a single extension. In this case should the `bfloat16` vector types be defined in such a unified extension? - -## Revision History - -[frame="none",options="header"] -|====================== -|Rev |Date |Author |Changes -|1 |2022-02-23 |Jack Kirk |Initial working draft. -|====================== diff --git a/sycl/doc/extensions/experimental/sycl_ext_oneapi_bfloat16.asciidoc b/sycl/doc/extensions/experimental/sycl_ext_oneapi_bfloat16.asciidoc new file mode 100644 index 0000000000000..e53a9f3fdf1b7 --- /dev/null +++ b/sycl/doc/extensions/experimental/sycl_ext_oneapi_bfloat16.asciidoc @@ -0,0 +1,370 @@ += sycl_ext_oneapi_bfloat16 + +:source-highlighter: coderay +:coderay-linenums-mode: table + +// This section needs to be after the document title. +:doctype: book +:toc2: +:toc: left +:encoding: utf-8 +:lang: en + +:blank: pass:[ +] + +// Set the default source code type in this document to C++, +// for syntax highlighting purposes. This is needed because +// docbook uses c++ and html5 uses cpp. +:language: {basebackend@docbook:c++:cpp} + +// This is necessary for asciidoc, but not for asciidoctor +:cpp: C++ + +== Notice + +IMPORTANT: This specification is a draft. + +Copyright (c) 2021-2022 Intel Corporation. All rights reserved. + +NOTE: Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are +trademarks of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. +used by permission by Khronos. 
+ +== Dependencies + +This extension is written against the SYCL 2020 specification, Revision 4. + +== Status + +Draft + +This is a preview extension specification, intended to provide early access to +a feature for review and community feedback. When the feature matures, this +specification may be released as a formal extension. + +Because the interfaces defined by this specification are not final and are +subject to change they are not intended to be used by shipping software +products. + +== Version + +Revision: 4 + +== Introduction + +This extension adds functionality to convert value of single-precision +floating-point type(`float`) to `bfloat16` type and vice versa. The extension +doesn't add support for `bfloat16` type as such, instead it uses 16-bit integer +type(`uint16_t`) as a storage for `bfloat16` values. + +The purpose of conversion from float to bfloat16 is to reduce the amount of memory +required to store floating-point numbers. Computations are expected to be done with +32-bit floating-point values. + +This extension is an optional kernel feature as described in +https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:optional-kernel-features[section 5.7] +of the SYCL 2020 spec. Therefore, attempting to submit a kernel using this +feature to a device that does not support it should cause a synchronous +`errc::kernel_not_supported` exception to be thrown from the kernel invocation +command (e.g. from `parallel_for`). + +== Feature test macro + +This extension provides a feature-test macro as described in the core SYCL +specification section 6.3.3 "Feature test macros". Therefore, an implementation +supporting this extension must predefine the macro +`SYCL_EXT_ONEAPI_BFLOAT16` to one of the values defined in the table +below. Applications can test for the existence of this macro to determine if +the implementation supports this feature, or applications can test the macro’s + value to determine which of the extension’s APIs the implementation supports. + +[%header,cols="1,5"] +|=== +|Value |Description +|1 |Initial extension version. Base features are supported. +|=== + +== Extension to `enum class aspect` + +[source] +---- +namespace sycl { +enum class aspect { + ... + ext_oneapi_bfloat16 +} +} +---- + +If a SYCL device has the `ext_oneapi_bfloat16` aspect, then it natively +supports conversion of values of `float` type to `bfloat16` and back. + +If the device doesn't have the aspect, objects of `bfloat16` class must not be +used in the device code. + +**NOTE**: The `ext_oneapi_bfloat16` aspect is not yet supported. The +`bfloat16` class is currently supported only on Xe HP GPU and Nvidia A100 GPU. + +== New `bfloat16` class + +The `bfloat16` class below provides the conversion functionality. Conversion +from `float` to `bfloat16` is done with round to nearest even(RTE) rounding +mode. + +[source] +---- +namespace sycl { +namespace ext { +namespace oneapi { +namespace experimental { + +class bfloat16 { + using storage_t = uint16_t; + storage_t value; + +public: + bfloat16() = default; + bfloat16(const bfloat16 &) = default; + ~bfloat16() = default; + + // Explicit conversion functions + static storage_t from_float(const float &a); + static float to_float(const storage_t &a); + + // Convert from float to bfloat16 + bfloat16(const float &a); + bfloat16 &operator=(const float &a); + + // Convert from bfloat16 to float + operator float() const; + + // Get bfloat16 as uint16. 
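+  // (No conversion is performed here; the raw storage_t bits of the bfloat16 value are returned, see Table 1.)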
+ operator storage_t() const; + + // Convert to bool type + explicit operator bool(); + + friend bfloat16 operator-(bfloat16 &bf) { /* ... */ } + + // OP is: prefix ++, -- + friend bfloat16 &operatorOP(bfloat16 &bf) { /* ... */ } + + // OP is: postfix ++, -- + friend bfloat16 operatorOP(bfloat16 &bf, int) { /* ... */ } + + // OP is: +=, -=, *=, /= + friend bfloat16 &operatorOP(bfloat16 &lhs, const bfloat16 &rhs) { /* ... */ } + + // OP is +, -, *, / + friend bfloat16 operatorOP(const bfloat16 &lhs, const bfloat16 &rhs) { /* ... */ } + template + friend bfloat16 operatorOP(const bfloat16 &lhs, const T &rhs) { /* ... */ } + template + friend bfloat16 operatorOP(const T &lhs, const bfloat16 &rhs) { /* ... */ } + + // OP is ==,!=, <, >, <=, >= + friend bool operatorOP(const bfloat16 &lhs, const bfloat16 &rhs) { /* ... */ } + template + friend bool operatorOP(const bfloat16 &lhs, const T &rhs) { /* ... */ } + template + friend bool operatorOP(const T &lhs, const bfloat16 &rhs) { /* ... */ } +}; + +} // namespace experimental +} // namespace oneapi +} // namespace ext +} // namespace sycl +---- + +Table 1. Member functions of `bfloat16` class. +|=== +| Member Function | Description + +| `static storage_t from_float(const float &a);` +| Explicitly convert from `float` to `bfloat16`. + +| `static float to_float(const storage_t &a);` +| Interpret `a` as `bfloat16` and explicitly convert it to `float`. + +| `bfloat16(const float& a);` +| Construct `bfloat16` from `float`. Converts `float` to `bfloat16`. + +| `bfloat16 &operator=(const float &a);` +| Replace the value with `a` converted to `bfloat16` + +| `operator float() const;` +| Return `bfloat16` value converted to `float`. + +| `operator storage_t() const;` +| Return `uint16_t` value, whose bits represent `bfloat16` value. + +| `explicit operator bool() { /* ... */ }` +| Convert `bfloat16` to `bool` type. Return `false` if the value equals to + zero, return `true` otherwise. + +| `friend bfloat16 operator-(bfloat16 &bf) { /* ... */ }` +| Construct new instance of `bfloat16` class with negated value of the `bf`. + +| `friend bfloat16 &operatorOP(bfloat16 &bf) { /* ... */ }` +| Perform an in-place `OP` prefix arithmetic operation on the `bf`, + assigning the result to the `bf` and return the `bf`. + + OP is: `++, --` + +| `friend bfloat16 operatorOP(bfloat16 &bf, int) { /* ... */ }` +| Perform an in-place `OP` postfix arithmetic operation on `bf`, assigning + the result to the `bf` and return a copy of `bf` before the operation is + performed. + + OP is: `++, --` + +| `friend bfloat16 operatorOP(const bfloat16 &lhs, const bfloat16 &rhs) { /* ... */ }` +| Perform an in-place `OP` arithmetic operation between the `lhs` and the `rhs` + and return the `lhs`. + + OP is: `+=, -=, *=, /=` + +| `friend type operatorOP(const bfloat16 &lhs, const bfloat16 &rhs) { /* ... */ }` +| Construct a new instance of the `bfloat16` class with the value of the new + `bfloat16` instance being the result of an OP arithmetic operation between + the `lhs` `bfloat16` and `rhs` `bfloat16` values. + + OP is `+, -, *, /` + +| `template + friend bfloat16 operatorOP(const bfloat16 &lhs, const T &rhs) { /* ... */ }` +| Construct a new instance of the `bfloat16` class with the value of the new + `bfloat16` instance being the result of an OP arithmetic operation between + the `lhs` `bfloat16` value and `rhs` of template type `T`. Type `T` must be + convertible to `float`. + + OP is `+, -, *, /` + +| `template + friend bfloat16 operatorOP(const T &lhs, const bfloat16 &rhs) { /* ... 
*/ }` +| Construct a new instance of the `bfloat16` class with the value of the new + `bfloat16` instance being the result of an OP arithmetic operation between + the `lhs` of template type `T` and `rhs` `bfloat16` value. Type `T` must be + convertible to `float`. + + OP is `+, -, *, /` + +| `friend bool operatorOP(const bfloat16 &lhs, const bfloat16 &rhs) { /* ... */ }` +| Perform comparison operation OP between `lhs` `bfloat16` and `rhs` `bfloat16` + values and return the result as a boolean value. + +OP is `==, !=, <, >, <=, >=` + +| `template + friend bool operatorOP(const bfloat16 &lhs, const T &rhs) { /* ... */ }` +| Perform comparison operation OP between `lhs` `bfloat16` and `rhs` of + template type `T` and return the result as a boolean value. Type `T` must be + convertible to `float`. + +OP is `==, !=, <, >, <=, >=` + +| `template + friend bool operatorOP(const T &lhs, const bfloat16 &rhs) { /* ... */ }` +| Perform comparison operation OP between `lhs` of template type `T` and `rhs` + `bfloat16` value and return the result as a boolean value. Type `T` must be + convertible to `float`. + +OP is `==, !=, <, >, <=, >=` +|=== + +== Example + +[source] +---- +#include +#include + +using sycl::ext::oneapi::experimental::bfloat16; + +bfloat16 operator+(const bfloat16 &lhs, const bfloat16 &rhs) { + return static_cast(lhs) + static_cast(rhs); +} + +float foo(float a, float b) { + // Convert from float to bfloat16. + bfloat16 A {a}; + bfloat16 B {b}; + + // Convert A and B from bfloat16 to float, do addition on floating-pointer + // numbers, then convert the result to bfloat16 and store it in C. + bfloat16 C = A + B; + + // Return the result converted from bfloat16 to float. + return C; +} + +int main (int argc, char *argv[]) { + float data[3] = {7.0, 8.1, 0.0}; + sycl::device dev; + sycl::queue deviceQueue{dev}; + sycl::buffer buf {data, sycl::range<1> {3}}; + + if (dev.has(sycl::aspect::ext_oneapi_bfloat16)) { + deviceQueue.submit ([&] (sycl::handler& cgh) { + auto numbers = buf.get_access (cgh); + cgh.single_task ([=] () { + numbers[2] = foo(numbers[0], numbers[1]); + }); + }); + } + return 0; +} +---- + +== New bfloat16 math functions + +In order to take full advantage of the new matrix extension, https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_oneapi_matrix.asciidoc, for many applications it is necessary to introduce dedicated functions that support the new `bfloat16` data type, which can then be used as element wise operations on matrices. +The `fma`, `fmin`, `fmax` and `fabs` SYCL floating point math functions are extended to support the `bfloat16` type. This proposal assumes that devices which support the `ext_oneapi_bfloat16` aspect have the `bfloat16` scalar (`bfloat16`) and vector data types (See the issues section for an action item regarding suitable vector data types) available at compile-time, in line with corresponding `half` types that are available at compile time on devices that +have `aspect::fp16` as described in the SYCL specification: https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:opencl:extension-fp16. Therefore the `fma`, `fmin`, `fmax` and `fabs` functions should support all of the `bfloat16` vector and scalar types which we refer to as `genbfloat16` in this document. +The descriptions of the `fma`, `fmin`, `fmax` and `fabs` SYCL floating point math functions can be found in the SYCL specification: https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#_math_functions. 
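As an informal illustration of the intended behaviour (not a required implementation), the scalar forms can be pictured in terms of the existing `float` math functions together with the conversions provided by the `bfloat16` class defined above. Note that evaluating `fma` in `float` and rounding back can double-round in rare cases, whereas a conforming implementation returns the correctly rounded `bfloat16` result; the helper names below are illustrative only.

```c++
// Illustrative reference semantics only; not part of this extension.
using sycl::ext::oneapi::experimental::bfloat16;

inline bfloat16 ref_fabs(bfloat16 x) { return bfloat16{sycl::fabs(float(x))}; }

inline bfloat16 ref_fmin(bfloat16 x, bfloat16 y) {
  // Matches SYCL fmin NaN handling: if one argument is a NaN, the other is returned.
  return bfloat16{sycl::fmin(float(x), float(y))};
}

inline bfloat16 ref_fmax(bfloat16 x, bfloat16 y) {
  return bfloat16{sycl::fmax(float(x), float(y))};
}

inline bfloat16 ref_fma(bfloat16 a, bfloat16 b, bfloat16 c) {
  // Double-rounding caveat: see the note above.
  return bfloat16{sycl::fma(float(a), float(b), float(c))};
}
```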
This extension does not introduce any changes to the expected behavior of these math functions beyond the new support for the `genbfloat16` types. A discussion issue has been raised at the bottom of this document on whether the specified maximum precision error should change for these functions when they use the `genbfloat16` data types. + +== New bfloat16 math function declarations + +```c++ +namespace sycl::ext::oneapi::experimental { + +// Available only when "T" is one of the genbfloat16 types. +template +T fma(T a, T b, T c); + +// Available only when "T" is one of the genbfloat16 types. +template +T fmax(T x, T y); + +// Available only when "T" is one of the genbfloat16 types. +template +T fmin(T x, T y); + +// Available only when "T" is one of the genbfloat16 types. +template +T fabs(T x); + +} // namespace sycl::ext::oneapi::experimental +``` + +== Issues + +1. Decide if there should be a different specified maximum precision error for these math functions when using the `genbfloat16` types. + +2. Decide the appropriate `bfloat16` vector types to support. Should the range of vector types match the range used for other data types such as half, float, and double: e.g. `bfloat16_1`, `bfloat16_2`, `bfloat16_3`, `bfloat16_4`, `bfloat16_8` and `bfloat16_16`. Natural storage types for `bfloat16` and `bfloat16_2` would probably be `uint16_t` and `uint32_t` respectively. What about the other vector types? The choice of `bfloat16` storage types should reflect the intended use of `bfloat16` for all compatible backends. + +== Revision History + +[cols="5,15,15,70"] +[grid="rows"] +[options="header"] +|======================================== +|Rev|Date|Author|Changes +|1|2021-08-02|Alexey Sotkin |Initial public working draft +|2|2021-08-17|Alexey Sotkin |Add explicit conversion functions + + Add operator overloadings + + Apply code review suggestions +|3|2021-08-18|Alexey Sotkin |Remove `uint16_t` constructor +|4|2022-03-07|Aidan Belton and Jack Kirk |Switch from Intel vendor specific to oneapi +|5|2022-04-05|Jack Kirk | Added section for bfloat16 math builtins +|======================================== From b176db23eff159e476ac008c5a24fdd9c53e2508 Mon Sep 17 00:00:00 2001 From: JackAKirk Date: Fri, 15 Apr 2022 12:01:25 +0100 Subject: [PATCH 5/7] Added descriptions for new bfloat16 functs --- .../sycl_ext_oneapi_bfloat16.asciidoc | 81 ++++++++++++++++++- 1 file changed, 79 insertions(+), 2 deletions(-) diff --git a/sycl/doc/extensions/experimental/sycl_ext_oneapi_bfloat16.asciidoc b/sycl/doc/extensions/experimental/sycl_ext_oneapi_bfloat16.asciidoc index 88b6c73b02514..ff11e6597b4d3 100644 --- a/sycl/doc/extensions/experimental/sycl_ext_oneapi_bfloat16.asciidoc +++ b/sycl/doc/extensions/experimental/sycl_ext_oneapi_bfloat16.asciidoc @@ -103,7 +103,7 @@ If the device doesn't have the aspect, objects of `bfloat16` class must not be used in the device code. **NOTE**: The `ext_oneapi_bfloat16` aspect is not yet supported. The -`bfloat16` class is currently supported only on Xe HP GPU and Nvidia A100 GPU. +`bfloat16` class is currently supported only on Xe HP GPU and Nvidia GPUs with Compute Capability >= SM80. == New `bfloat16` class @@ -316,9 +316,85 @@ int main (int argc, char *argv[]) { } ---- +== New bfloat16 math functions + +Many applications will require dedicated functions that take parameters of type `bfloat16`. This extension adds `bfloat16` support to the `fma`, `fmin`, `fmax` and `fabs` SYCL floating point math functions. 
These functions can be used as element wise operations on matrices, and as such application developers can use these functions together with the `bfloat16` implementation of the sycl_ext_oneapi_matrix extension. +The descriptions of the `fma`, `fmin`, `fmax` and `fabs` SYCL floating point math functions can be found in the SYCL specification: https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#_math_functions. +The following functions are only available only when "T" is `bfloat16` `or `sycl::marray`, where {N} means any positive value of size_t type. + +=== fma + +```c++ +namespace sycl::ext::oneapi::experimental { + +template +T fma(T a, T b, T c); +} // namespace sycl::ext::oneapi::experimental +``` + +==== Description + +Returns the correctly rounded +floating-point representation of the +sum of `c` with the infinitely precise +product of `a` and `b`. Rounding of +intermediate products shall not +occur. + +=== fmax + +```c++ +namespace sycl::ext::oneapi::experimental { +template +T fmax(T x, T y); +} // namespace sycl::ext::oneapi::experimental +``` + +==== Description + +Returns `y` if +`x < y`, otherwise it +returns `x`. If one argument is a +NaN, `fmax()` returns the other +argument. If both arguments are +NaNs, `fmax()` returns a NaN. + +=== fmin + +```c++ +namespace sycl::ext::oneapi::experimental { +template +T fmin(T x, T y); +} // namespace sycl::ext::oneapi::experimental +``` + +==== Description + +Returns `y` if +`y < x`, otherwise it +returns `x`. If one argument is a +NaN, `fmax()` returns the other +argument. If both arguments are +NaNs, `fmax()` returns a NaN. + +=== fabs + +```c++ +namespace sycl::ext::oneapi::experimental { +template +T fabs(T x); +} // namespace sycl::ext::oneapi::experimental +``` + +==== Description + +Compute absolute value of a `bfloat16`. + == Issues -None. +2. The CUDA backend has no use case that would necessitate support of the `vec` class in bfloat16 math builtins, since `marray` is always preferred. Support for the `vec` class can be added if other backends require it. + +3. Decide on a roadmap to extend support of `bfloat16` to more SYCL 2020 math functions. == Revision History @@ -333,4 +409,5 @@ None. Apply code review suggestions |3|2021-08-18|Alexey Sotkin |Remove `uint16_t` constructor |4|2022-03-07|Aidan Belton and Jack Kirk |Switch from Intel vendor specific to oneapi +|5|2022-04-05|Jack Kirk | Added section for bfloat16 math builtins |======================================== From ea4a64bfdfd83bfafb3e92fa15abbc860ee6ed24 Mon Sep 17 00:00:00 2001 From: JackAKirk Date: Fri, 15 Apr 2022 12:55:52 +0100 Subject: [PATCH 6/7] format --- .../sycl_ext_oneapi_bfloat16.asciidoc | 20 +++++++++---------- 1 file changed, 9 insertions(+), 11 deletions(-) diff --git a/sycl/doc/extensions/experimental/sycl_ext_oneapi_bfloat16.asciidoc b/sycl/doc/extensions/experimental/sycl_ext_oneapi_bfloat16.asciidoc index ff11e6597b4d3..480ad901ddf84 100644 --- a/sycl/doc/extensions/experimental/sycl_ext_oneapi_bfloat16.asciidoc +++ b/sycl/doc/extensions/experimental/sycl_ext_oneapi_bfloat16.asciidoc @@ -48,7 +48,7 @@ products. == Version -Revision: 4 +Revision: 5 == Introduction @@ -318,9 +318,11 @@ int main (int argc, char *argv[]) { == New bfloat16 math functions -Many applications will require dedicated functions that take parameters of type `bfloat16`. This extension adds `bfloat16` support to the `fma`, `fmin`, `fmax` and `fabs` SYCL floating point math functions. 
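Since these functions accept `sycl::marray<bfloat16, {N}>` as well as scalar `bfloat16`, a row or tile of matrix elements can be processed in a single call. A hypothetical sketch follows; it assumes the final form of this extension and the helper name is illustrative.

```c++
// Hypothetical element-wise use over a small tile of bfloat16 matrix data.
using sycl::ext::oneapi::experimental::bfloat16;
namespace bf16_math = sycl::ext::oneapi::experimental;

sycl::marray<bfloat16, 8> fused_scale_clamp(const sycl::marray<bfloat16, 8> &row,
                                            const sycl::marray<bfloat16, 8> &scale,
                                            const sycl::marray<bfloat16, 8> &bias,
                                            bfloat16 limit) {
  // Each call operates lane by lane on all 8 elements.
  sycl::marray<bfloat16, 8> r = bf16_math::fma(row, scale, bias);
  sycl::marray<bfloat16, 8> bound(limit);  // broadcast the clamp bound to all lanes
  // Clamp the magnitude of each element to `limit`.
  return bf16_math::fmin(bf16_math::fabs(r), bound);
}
```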
These functions can be used as element wise operations on matrices, and as such application developers can use these functions together with the `bfloat16` implementation of the sycl_ext_oneapi_matrix extension. +Many applications will require dedicated functions that take parameters of type `bfloat16`. This extension adds `bfloat16` support to the `fma`, `fmin`, `fmax` and `fabs` SYCL floating point math functions. These functions can be used as element wise operations on matrices, and as such can used together with the `bfloat16` implementation of the sycl_ext_oneapi_matrix extension. + The descriptions of the `fma`, `fmin`, `fmax` and `fabs` SYCL floating point math functions can be found in the SYCL specification: https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#_math_functions. -The following functions are only available only when "T" is `bfloat16` `or `sycl::marray`, where {N} means any positive value of size_t type. + +The following functions are only available when `T` is `bfloat16` `or `sycl::marray`, where {N} means any positive value of size_t type. === fma @@ -334,12 +336,8 @@ T fma(T a, T b, T c); ==== Description -Returns the correctly rounded -floating-point representation of the -sum of `c` with the infinitely precise -product of `a` and `b`. Rounding of -intermediate products shall not -occur. +Returns the correctly rounded floating-point representation of the sum of `c` with the infinitely precise product of `a` and `b`. +Rounding of intermediate products shall not occur. The mantissa LSB rounds to the nearest even. Subnormal numbers are supported. === fmax @@ -392,9 +390,9 @@ Compute absolute value of a `bfloat16`. == Issues -2. The CUDA backend has no use case that would necessitate support of the `vec` class in bfloat16 math builtins, since `marray` is always preferred. Support for the `vec` class can be added if other backends require it. +1. The CUDA backend does not have a use case that would necessitate support of the `vec` class in bfloat16 math builtins, and `marray` would always be preferred over `vec` if there were to be added `vec` support in the CUDA backend. For portability reasons, support for the `vec` class can be easily added if other backends require it. -3. Decide on a roadmap to extend support of `bfloat16` to more SYCL 2020 math functions. +2. We should decide on a roadmap to extend support of `bfloat16` to other SYCL 2020 math functions. == Revision History From 0ae399aecc9c08b5c0601492e87ece9f410c6ae3 Mon Sep 17 00:00:00 2001 From: JackAKirk Date: Fri, 15 Apr 2022 13:02:46 +0100 Subject: [PATCH 7/7] format --- .../experimental/sycl_ext_oneapi_bfloat16.asciidoc | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/sycl/doc/extensions/experimental/sycl_ext_oneapi_bfloat16.asciidoc b/sycl/doc/extensions/experimental/sycl_ext_oneapi_bfloat16.asciidoc index 480ad901ddf84..bec08876ed084 100644 --- a/sycl/doc/extensions/experimental/sycl_ext_oneapi_bfloat16.asciidoc +++ b/sycl/doc/extensions/experimental/sycl_ext_oneapi_bfloat16.asciidoc @@ -318,11 +318,11 @@ int main (int argc, char *argv[]) { == New bfloat16 math functions -Many applications will require dedicated functions that take parameters of type `bfloat16`. This extension adds `bfloat16` support to the `fma`, `fmin`, `fmax` and `fabs` SYCL floating point math functions. These functions can be used as element wise operations on matrices, and as such can used together with the `bfloat16` implementation of the sycl_ext_oneapi_matrix extension. 
+Many applications will require dedicated functions that take parameters of type `bfloat16`. This extension adds `bfloat16` support to the `fma`, `fmin`, `fmax` and `fabs` SYCL floating point math functions. These functions can be used as element wise operations on matrices, supplementing the `bfloat16` support in the sycl_ext_oneapi_matrix extension. The descriptions of the `fma`, `fmin`, `fmax` and `fabs` SYCL floating point math functions can be found in the SYCL specification: https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#_math_functions. -The following functions are only available when `T` is `bfloat16` `or `sycl::marray`, where {N} means any positive value of size_t type. +The following functions are only available when `T` is `bfloat16` or `sycl::marray`, where `{N}` means any positive value of `size_t` type. === fma @@ -390,7 +390,7 @@ Compute absolute value of a `bfloat16`. == Issues -1. The CUDA backend does not have a use case that would necessitate support of the `vec` class in bfloat16 math builtins, and `marray` would always be preferred over `vec` if there were to be added `vec` support in the CUDA backend. For portability reasons, support for the `vec` class can be easily added if other backends require it. +1. The CUDA backend does not have a use case that would necessitate support of the `vec` class in bfloat16 math functions, and `marray` would always be preferred over `vec` if `vec` support were to be added in the CUDA backend. For portability reasons, support for the `vec` class can be easily added if other backends require it. 2. We should decide on a roadmap to extend support of `bfloat16` to other SYCL 2020 math functions.
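For background on the round-to-nearest-even behaviour referred to in this document (the `bfloat16` class converts from `float` with RTE rounding, and the `fma` description states that the mantissa LSB rounds to the nearest even), the following self-contained sketch shows one common way such a conversion can be written at the bit level. It is illustrative only and is not part of the extension; production implementations typically also special-case NaN inputs.

```c++
#include <cstdint>
#include <cstring>

// Illustrative round-to-nearest-even conversion between float and the
// uint16_t storage format used for bfloat16 (NaN handling omitted).
inline uint16_t float_to_bfloat16_bits_rte(float f) {
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));  // reinterpret the float's bits
  // Add 0x7FFF plus the bit that will become the new LSB, so that values
  // exactly half-way between two bfloat16 values round to even, then truncate.
  uint32_t round_bias = 0x7FFFu + ((bits >> 16) & 1u);
  bits += round_bias;
  return static_cast<uint16_t>(bits >> 16);
}

inline float bfloat16_bits_to_float(uint16_t h) {
  // Widening is exact: the extra float mantissa bits are simply zero.
  uint32_t bits = static_cast<uint32_t>(h) << 16;
  float f;
  std::memcpy(&f, &bits, sizeof(f));
  return f;
}
```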