From f8fbb6ee7d037eb9b8811355a7dac4a6866e9a9f Mon Sep 17 00:00:00 2001 From: "jack.kirk" Date: Wed, 23 Feb 2022 14:19:29 +0000 Subject: [PATCH 1/7] Added extension proposal for sycl_ext_ONEAPI_bf16_math. Signed-off-by: jack.kirk --- .../sycl_ext_oneapi_bf16_math.asciidoc | 105 ++++++++++++++++++ 1 file changed, 105 insertions(+) create mode 100644 sycl/doc/extensions/experimental/sycl_ext_oneapi_bf16_math.asciidoc diff --git a/sycl/doc/extensions/experimental/sycl_ext_oneapi_bf16_math.asciidoc b/sycl/doc/extensions/experimental/sycl_ext_oneapi_bf16_math.asciidoc new file mode 100644 index 0000000000000..42e9e18e9fd5d --- /dev/null +++ b/sycl/doc/extensions/experimental/sycl_ext_oneapi_bf16_math.asciidoc @@ -0,0 +1,105 @@ +# Bfloat16 math functions Extension for DPC++: = SYCL_ONEAPI_bf16_math +:source-highlighter: coderay +:coderay-linenums-mode: table +:dpcpp: pass:[DPC++] + +// This section needs to be after the document title. +:doctype: book +:toc2: +:toc: left +:encoding: utf-8 +:lang: en + +:blank: pass:[ +] + +// Set the default source code type in this document to C++, +// for syntax highlighting purposes. This is needed because +// docbook uses c++ and html5 uses cpp. +:language: {basebackend@docbook:c++:cpp} + + +== Notice + +Copyright (c) 2021-2021 Intel Corporation. All rights reserved. + +IMPORTANT: This specification is a draft. + +NOTE: The APIs described in this specification are experimental. Future versions of this extension may change these APIs in ways that are incompatible with the versions described here. + +NOTE: Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are +trademarks of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. +used by permission by Khronos. + +This extension is written against the SYCL 2020 revision 4 specification. All +references below to the "core SYCL specification" or to section numbers in the +SYCL specification refer to that revision. + +## Contributors + +* Jack Kirk + +## Introduction + +This document proposes extending the `fma`, `fmin`, `fmax` and `fabs` SYCL floating point math functions to support the `bfloat16` type introduced in the `SYCL_EXT_INTEL_BF16_CONVERSION` extension. This proposal assumes that devices which support the `SYCL_EXT_INTEL_BF16_CONVERSION` extension have the `bfloat16` scalar data type: `bfloat16`, and the `bfloat16` vector data types: `bfloat16_1`, `bfloat16_2`, `bfloat16_3`, `bfloat16_4`, `bfloat16_8` and `bfloat16_16` available at compile-time, in line with corresponding `half` types that are available at compile time on devices that +have `aspect::fp16` as described in the SYCL specification: https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:opencl:extension-fp16. Therefore the `fma`, `fmin`, `fmax` and `fabs` functions should support all of these vector and scalar types which we refer to as `genbfloat16`. Initially this experimental extension may also support the corresponding storage types for each of these `genbfloat16` types, namely `short`, `short3`, `short4`, `short8`, and `short16`. +The descriptions of the `fma`, `fmin`, `fmax` and `fabs` SYCL floating point math functions can be found in the SYCL specification: https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#_math_functions. This proposal does not propose any changes to the expected behavior of these math functions beyond the new support for the `genbfloat16` types. 
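For illustration, a minimal usage sketch is shown below. It is not part of the proposal: it assumes an implementation of this extension (guarded by the `SYCL_ONEAPI_BF16_MATH` macro defined later in this document) and the `bfloat16` type from `SYCL_EXT_INTEL_BF16_CONVERSION`; the kernel, function and namespace names are illustrative only.

```c++
#ifdef SYCL_ONEAPI_BF16_MATH
// Hypothetical usage sketch; names are illustrative.
// The bfloat16 type is assumed to come from SYCL_EXT_INTEL_BF16_CONVERSION.
using sycl::ext::intel::experimental::bfloat16;
namespace bf16_math = sycl::ext::oneapi::experimental;

void scale_bias_relu(sycl::queue &q, sycl::buffer<bfloat16, 1> &data,
                     bfloat16 scale, bfloat16 bias, bfloat16 bound) {
  q.submit([&](sycl::handler &cgh) {
    auto acc = data.get_access<sycl::access::mode::read_write>(cgh);
    cgh.parallel_for<class bf16_scale_bias_relu>(
        data.get_range(), [=](sycl::id<1> i) {
          // d = data[i] * scale + bias, using the proposed bfloat16 fma.
          bfloat16 d = bf16_math::fma(acc[i], scale, bias);
          // ReLU-style clamp at zero, then saturate at |bound|.
          d = bf16_math::fmax(d, bfloat16{0.0f});
          acc[i] = bf16_math::fmin(d, bf16_math::fabs(bound));
        });
  });
}
#endif
```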
A discussion issue has been raised at the bottom of this document on whether the specified maximum precision error should change for these functions for the new types. + +## Feature test macro + +This extension provides a feature-test macro as described in the core SYCL +specification section 6.3.3 "Feature test macros". Therefore, an implementation +supporting this extension must predefine the macro +SYCL_ONEAPI_BF16_MATH to one of the values defined in the table +below. Applications can test for the existence of this macro to determine if the +implementation supports this feature, or applications can test the macro’s value +to determine which of the extension’s APIs the implementation supports. + +[%header,cols="1,5"] +|=== +|Value |Description +|1 |Initial extension implementation. +|=== + +## Motivation + +In order to take full advantage of the new matrix extension, https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_oneapi_matrix.asciidoc, we need to introduce dedicated functions that support the new `bfloat16` data type, which can then be used as element wise operations on matrices. + +## New function declarations + +```c++ +namespace sycl::ext::oneapi::experimental { + +// genbfloat16 fma (genbfloat16 a, genbfloat16 b, genbfloat16 c) +template +detail::enable_if_t::value, T> fma(T a, T b, + T c); + +// genbfloat16 fmax (genbfloat16 x, genbfloat16 y) +template +detail::enable_if_t::value, T> fmax(T x, T y); + +// genbfloat16 fmin (genbfloat16 x, genbfloat16 y) +template +detail::enable_if_t::value, T> fmin(T x, T y); + +// genbfloat16 fabs (genbfloat16 x) +template +detail::enable_if_t::value, T> fabs(T x); + +} // namespace sycl::ext::oneapi::experimental +``` + +## Issues for future discussion + +1. We shold decide if there should be a different specified maximum precision error for these math functions when using the `genbfloat16` types. + +2. In the future we will wish to add an additional Fused Multiply Add function which performs RELU saturation. However such a function should also allow operands to be `half` types, and as such may fall outside of the scope of the current extension. + + +## Revision History + +[frame="none",options="header"] +|====================== +|Rev |Date |Author |Changes +|1 |2022-02-23 |Jack Kirk |Initial working draft. +|====================== From 5e6eb9302c6c7d0c00a3cae28d891d2565a6114a Mon Sep 17 00:00:00 2001 From: "jack.kirk" Date: Wed, 23 Feb 2022 14:26:12 +0000 Subject: [PATCH 2/7] small improvement --- .../experimental/sycl_ext_oneapi_bf16_math.asciidoc | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/sycl/doc/extensions/experimental/sycl_ext_oneapi_bf16_math.asciidoc b/sycl/doc/extensions/experimental/sycl_ext_oneapi_bf16_math.asciidoc index 42e9e18e9fd5d..d9b06b6ed6267 100644 --- a/sycl/doc/extensions/experimental/sycl_ext_oneapi_bf16_math.asciidoc +++ b/sycl/doc/extensions/experimental/sycl_ext_oneapi_bf16_math.asciidoc @@ -1,4 +1,4 @@ -# Bfloat16 math functions Extension for DPC++: = SYCL_ONEAPI_bf16_math +# Bfloat16 math functions extension for DPC++: = SYCL_ONEAPI_bf16_math :source-highlighter: coderay :coderay-linenums-mode: table :dpcpp: pass:[DPC++] @@ -42,7 +42,7 @@ SYCL specification refer to that revision. This document proposes extending the `fma`, `fmin`, `fmax` and `fabs` SYCL floating point math functions to support the `bfloat16` type introduced in the `SYCL_EXT_INTEL_BF16_CONVERSION` extension. 
This proposal assumes that devices which support the `SYCL_EXT_INTEL_BF16_CONVERSION` extension have the `bfloat16` scalar data type: `bfloat16`, and the `bfloat16` vector data types: `bfloat16_1`, `bfloat16_2`, `bfloat16_3`, `bfloat16_4`, `bfloat16_8` and `bfloat16_16` available at compile-time, in line with corresponding `half` types that are available at compile time on devices that have `aspect::fp16` as described in the SYCL specification: https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:opencl:extension-fp16. Therefore the `fma`, `fmin`, `fmax` and `fabs` functions should support all of these vector and scalar types which we refer to as `genbfloat16`. Initially this experimental extension may also support the corresponding storage types for each of these `genbfloat16` types, namely `short`, `short3`, `short4`, `short8`, and `short16`. -The descriptions of the `fma`, `fmin`, `fmax` and `fabs` SYCL floating point math functions can be found in the SYCL specification: https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#_math_functions. This proposal does not propose any changes to the expected behavior of these math functions beyond the new support for the `genbfloat16` types. A discussion issue has been raised at the bottom of this document on whether the specified maximum precision error should change for these functions for the new types. +The descriptions of the `fma`, `fmin`, `fmax` and `fabs` SYCL floating point math functions can be found in the SYCL specification: https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#_math_functions. This proposal does not propose any changes to the expected behavior of these math functions beyond the new support for the `genbfloat16` types. A discussion issue has been raised at the bottom of this document on whether the specified maximum precision error should change for these functions when they use the `genbfloat16` data types. ## Feature test macro From e4aa08d9cc28a84a9b016709138dba0b294c0cdb Mon Sep 17 00:00:00 2001 From: "jack.kirk" Date: Thu, 3 Mar 2022 15:35:04 +0000 Subject: [PATCH 3/7] Discussed unified extensions and vec types in issues. --- .../sycl_ext_oneapi_bf16_math.asciidoc | 39 ++++++++++--------- 1 file changed, 21 insertions(+), 18 deletions(-) diff --git a/sycl/doc/extensions/experimental/sycl_ext_oneapi_bf16_math.asciidoc b/sycl/doc/extensions/experimental/sycl_ext_oneapi_bf16_math.asciidoc index d9b06b6ed6267..1d47b87ad1acd 100644 --- a/sycl/doc/extensions/experimental/sycl_ext_oneapi_bf16_math.asciidoc +++ b/sycl/doc/extensions/experimental/sycl_ext_oneapi_bf16_math.asciidoc @@ -1,4 +1,4 @@ -# Bfloat16 math functions extension for DPC++: = SYCL_ONEAPI_bf16_math +# Bfloat16 math functions extension for DPC++: SYCL_ONEAPI_bf16_math :source-highlighter: coderay :coderay-linenums-mode: table :dpcpp: pass:[DPC++] @@ -38,11 +38,16 @@ SYCL specification refer to that revision. * Jack Kirk +## Motivation + +In order to take full advantage of the new matrix extension, https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_oneapi_matrix.asciidoc, for many applications it is necessary to introduce dedicated functions that support the new `bfloat16` data type, which can then be used as element wise operations on matrices. + ## Introduction -This document proposes extending the `fma`, `fmin`, `fmax` and `fabs` SYCL floating point math functions to support the `bfloat16` type introduced in the `SYCL_EXT_INTEL_BF16_CONVERSION` extension. 
This proposal assumes that devices which support the `SYCL_EXT_INTEL_BF16_CONVERSION` extension have the `bfloat16` scalar data type: `bfloat16`, and the `bfloat16` vector data types: `bfloat16_1`, `bfloat16_2`, `bfloat16_3`, `bfloat16_4`, `bfloat16_8` and `bfloat16_16` available at compile-time, in line with corresponding `half` types that are available at compile time on devices that -have `aspect::fp16` as described in the SYCL specification: https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:opencl:extension-fp16. Therefore the `fma`, `fmin`, `fmax` and `fabs` functions should support all of these vector and scalar types which we refer to as `genbfloat16`. Initially this experimental extension may also support the corresponding storage types for each of these `genbfloat16` types, namely `short`, `short3`, `short4`, `short8`, and `short16`. +This document proposes extending the `fma`, `fmin`, `fmax` and `fabs` SYCL floating point math functions to support the `bfloat16` type introduced in the `SYCL_EXT_INTEL_BF16_CONVERSION` extension. This proposal assumes that devices which support the aspect introduced in the `SYCL_EXT_INTEL_BF16_CONVERSION` extension have the `bfloat16` scalar (`bfloat16`) and vector data types (See the issues section for an action item regarding suitable vector data types) available at compile-time, in line with corresponding `half` types that are available at compile time on devices that +have `aspect::fp16` as described in the SYCL specification: https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:opencl:extension-fp16. Therefore the `fma`, `fmin`, `fmax` and `fabs` functions should support all of the `bfloat16` vector and scalar types which we refer to as `genbfloat16` in this document. Initially an implementation of this experimental extension may support the corresponding storage types for each of these `genbfloat16` types, e.g. `bfloat16` has storage type `uint16_t`. The descriptions of the `fma`, `fmin`, `fmax` and `fabs` SYCL floating point math functions can be found in the SYCL specification: https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#_math_functions. This proposal does not propose any changes to the expected behavior of these math functions beyond the new support for the `genbfloat16` types. A discussion issue has been raised at the bottom of this document on whether the specified maximum precision error should change for these functions when they use the `genbfloat16` data types. +This extension is intended to be backend agnostic, such that any backend may implement `bfloat16` versions of these math functions when available. This means that a pre-requisite for this extension is the existence of a backend agnostic `bfloat16` class and aspect. Ideally the existing `SYCL_EXT_INTEL_BF16_CONVERSION` extension can be generalized to other backends for this purpose. ## Feature test macro @@ -60,41 +65,39 @@ to determine which of the extension’s APIs the implementation supports. |1 |Initial extension implementation. |=== -## Motivation - -In order to take full advantage of the new matrix extension, https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_oneapi_matrix.asciidoc, we need to introduce dedicated functions that support the new `bfloat16` data type, which can then be used as element wise operations on matrices. 
- ## New function declarations ```c++ namespace sycl::ext::oneapi::experimental { -// genbfloat16 fma (genbfloat16 a, genbfloat16 b, genbfloat16 c) +// Available only when "T" is one of the genbfloat16 types. template -detail::enable_if_t::value, T> fma(T a, T b, - T c); +T fma(T a, T b, T c); -// genbfloat16 fmax (genbfloat16 x, genbfloat16 y) +// Available only when "T" is one of the genbfloat16 types. template -detail::enable_if_t::value, T> fmax(T x, T y); +T fmax(T x, T y); -// genbfloat16 fmin (genbfloat16 x, genbfloat16 y) +// Available only when "T" is one of the genbfloat16 types. template -detail::enable_if_t::value, T> fmin(T x, T y); +T fmin(T x, T y); -// genbfloat16 fabs (genbfloat16 x) +// Available only when "T" is one of the genbfloat16 types. template -detail::enable_if_t::value, T> fabs(T x); +T fabs(T x); } // namespace sycl::ext::oneapi::experimental ``` -## Issues for future discussion +## Issues for future discussion/resolution -1. We shold decide if there should be a different specified maximum precision error for these math functions when using the `genbfloat16` types. +1. Decide if there should be a different specified maximum precision error for these math functions when using the `genbfloat16` types. 2. In the future we will wish to add an additional Fused Multiply Add function which performs RELU saturation. However such a function should also allow operands to be `half` types, and as such may fall outside of the scope of the current extension. +3. Decide the appropriate `bfloat16` vector types to support. Should the range of vector types match the range used for other data types such as half, float, and double: e.g. `bfloat16_1`, `bfloat16_2`, `bfloat16_3`, `bfloat16_4`, `bfloat16_8` and `bfloat16_16`. Natural storage types for `bfloat16` and `bfloat16_2` would probably be `uint16_t` and `uint32_t` respectively. What about the other vector types? The choice of `bfloat16` storage types should reflect the intended use of `bfloat16` for all compatible backends. + +4. Should we join together all `bfloat16` related extensions such as this one and a generalized `SYCL_EXT_INTEL_BF16_CONVERSION` into a single extension. In this case should the `bfloat16` vector types be defined in such a unified extension? ## Revision History From fffde29c0aae95b79106d3e9a9cac0cbe7d5cea1 Mon Sep 17 00:00:00 2001 From: JackAKirk Date: Tue, 5 Apr 2022 10:09:44 +0100 Subject: [PATCH 4/7] moved bfloat16 math functions description to bfloat16 doc --- .../sycl_ext_oneapi_bf16_math.asciidoc | 108 ----- .../sycl_ext_oneapi_bfloat16.asciidoc | 370 ++++++++++++++++++ 2 files changed, 370 insertions(+), 108 deletions(-) delete mode 100644 sycl/doc/extensions/experimental/sycl_ext_oneapi_bf16_math.asciidoc create mode 100644 sycl/doc/extensions/experimental/sycl_ext_oneapi_bfloat16.asciidoc diff --git a/sycl/doc/extensions/experimental/sycl_ext_oneapi_bf16_math.asciidoc b/sycl/doc/extensions/experimental/sycl_ext_oneapi_bf16_math.asciidoc deleted file mode 100644 index 1d47b87ad1acd..0000000000000 --- a/sycl/doc/extensions/experimental/sycl_ext_oneapi_bf16_math.asciidoc +++ /dev/null @@ -1,108 +0,0 @@ -# Bfloat16 math functions extension for DPC++: SYCL_ONEAPI_bf16_math -:source-highlighter: coderay -:coderay-linenums-mode: table -:dpcpp: pass:[DPC++] - -// This section needs to be after the document title. -:doctype: book -:toc2: -:toc: left -:encoding: utf-8 -:lang: en - -:blank: pass:[ +] - -// Set the default source code type in this document to C++, -// for syntax highlighting purposes. 
This is needed because -// docbook uses c++ and html5 uses cpp. -:language: {basebackend@docbook:c++:cpp} - - -== Notice - -Copyright (c) 2021-2021 Intel Corporation. All rights reserved. - -IMPORTANT: This specification is a draft. - -NOTE: The APIs described in this specification are experimental. Future versions of this extension may change these APIs in ways that are incompatible with the versions described here. - -NOTE: Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are -trademarks of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. -used by permission by Khronos. - -This extension is written against the SYCL 2020 revision 4 specification. All -references below to the "core SYCL specification" or to section numbers in the -SYCL specification refer to that revision. - -## Contributors - -* Jack Kirk - -## Motivation - -In order to take full advantage of the new matrix extension, https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_oneapi_matrix.asciidoc, for many applications it is necessary to introduce dedicated functions that support the new `bfloat16` data type, which can then be used as element wise operations on matrices. - -## Introduction - -This document proposes extending the `fma`, `fmin`, `fmax` and `fabs` SYCL floating point math functions to support the `bfloat16` type introduced in the `SYCL_EXT_INTEL_BF16_CONVERSION` extension. This proposal assumes that devices which support the aspect introduced in the `SYCL_EXT_INTEL_BF16_CONVERSION` extension have the `bfloat16` scalar (`bfloat16`) and vector data types (See the issues section for an action item regarding suitable vector data types) available at compile-time, in line with corresponding `half` types that are available at compile time on devices that -have `aspect::fp16` as described in the SYCL specification: https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:opencl:extension-fp16. Therefore the `fma`, `fmin`, `fmax` and `fabs` functions should support all of the `bfloat16` vector and scalar types which we refer to as `genbfloat16` in this document. Initially an implementation of this experimental extension may support the corresponding storage types for each of these `genbfloat16` types, e.g. `bfloat16` has storage type `uint16_t`. -The descriptions of the `fma`, `fmin`, `fmax` and `fabs` SYCL floating point math functions can be found in the SYCL specification: https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#_math_functions. This proposal does not propose any changes to the expected behavior of these math functions beyond the new support for the `genbfloat16` types. A discussion issue has been raised at the bottom of this document on whether the specified maximum precision error should change for these functions when they use the `genbfloat16` data types. -This extension is intended to be backend agnostic, such that any backend may implement `bfloat16` versions of these math functions when available. This means that a pre-requisite for this extension is the existence of a backend agnostic `bfloat16` class and aspect. Ideally the existing `SYCL_EXT_INTEL_BF16_CONVERSION` extension can be generalized to other backends for this purpose. - -## Feature test macro - -This extension provides a feature-test macro as described in the core SYCL -specification section 6.3.3 "Feature test macros". 
Therefore, an implementation -supporting this extension must predefine the macro -SYCL_ONEAPI_BF16_MATH to one of the values defined in the table -below. Applications can test for the existence of this macro to determine if the -implementation supports this feature, or applications can test the macro’s value -to determine which of the extension’s APIs the implementation supports. - -[%header,cols="1,5"] -|=== -|Value |Description -|1 |Initial extension implementation. -|=== - -## New function declarations - -```c++ -namespace sycl::ext::oneapi::experimental { - -// Available only when "T" is one of the genbfloat16 types. -template -T fma(T a, T b, T c); - -// Available only when "T" is one of the genbfloat16 types. -template -T fmax(T x, T y); - -// Available only when "T" is one of the genbfloat16 types. -template -T fmin(T x, T y); - -// Available only when "T" is one of the genbfloat16 types. -template -T fabs(T x); - -} // namespace sycl::ext::oneapi::experimental -``` - -## Issues for future discussion/resolution - -1. Decide if there should be a different specified maximum precision error for these math functions when using the `genbfloat16` types. - -2. In the future we will wish to add an additional Fused Multiply Add function which performs RELU saturation. However such a function should also allow operands to be `half` types, and as such may fall outside of the scope of the current extension. - -3. Decide the appropriate `bfloat16` vector types to support. Should the range of vector types match the range used for other data types such as half, float, and double: e.g. `bfloat16_1`, `bfloat16_2`, `bfloat16_3`, `bfloat16_4`, `bfloat16_8` and `bfloat16_16`. Natural storage types for `bfloat16` and `bfloat16_2` would probably be `uint16_t` and `uint32_t` respectively. What about the other vector types? The choice of `bfloat16` storage types should reflect the intended use of `bfloat16` for all compatible backends. - -4. Should we join together all `bfloat16` related extensions such as this one and a generalized `SYCL_EXT_INTEL_BF16_CONVERSION` into a single extension. In this case should the `bfloat16` vector types be defined in such a unified extension? - -## Revision History - -[frame="none",options="header"] -|====================== -|Rev |Date |Author |Changes -|1 |2022-02-23 |Jack Kirk |Initial working draft. -|====================== diff --git a/sycl/doc/extensions/experimental/sycl_ext_oneapi_bfloat16.asciidoc b/sycl/doc/extensions/experimental/sycl_ext_oneapi_bfloat16.asciidoc new file mode 100644 index 0000000000000..e53a9f3fdf1b7 --- /dev/null +++ b/sycl/doc/extensions/experimental/sycl_ext_oneapi_bfloat16.asciidoc @@ -0,0 +1,370 @@ += sycl_ext_oneapi_bfloat16 + +:source-highlighter: coderay +:coderay-linenums-mode: table + +// This section needs to be after the document title. +:doctype: book +:toc2: +:toc: left +:encoding: utf-8 +:lang: en + +:blank: pass:[ +] + +// Set the default source code type in this document to C++, +// for syntax highlighting purposes. This is needed because +// docbook uses c++ and html5 uses cpp. +:language: {basebackend@docbook:c++:cpp} + +// This is necessary for asciidoc, but not for asciidoctor +:cpp: C++ + +== Notice + +IMPORTANT: This specification is a draft. + +Copyright (c) 2021-2022 Intel Corporation. All rights reserved. + +NOTE: Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are +trademarks of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. +used by permission by Khronos. 
+ +== Dependencies + +This extension is written against the SYCL 2020 specification, Revision 4. + +== Status + +Draft + +This is a preview extension specification, intended to provide early access to +a feature for review and community feedback. When the feature matures, this +specification may be released as a formal extension. + +Because the interfaces defined by this specification are not final and are +subject to change they are not intended to be used by shipping software +products. + +== Version + +Revision: 4 + +== Introduction + +This extension adds functionality to convert value of single-precision +floating-point type(`float`) to `bfloat16` type and vice versa. The extension +doesn't add support for `bfloat16` type as such, instead it uses 16-bit integer +type(`uint16_t`) as a storage for `bfloat16` values. + +The purpose of conversion from float to bfloat16 is to reduce the amount of memory +required to store floating-point numbers. Computations are expected to be done with +32-bit floating-point values. + +This extension is an optional kernel feature as described in +https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:optional-kernel-features[section 5.7] +of the SYCL 2020 spec. Therefore, attempting to submit a kernel using this +feature to a device that does not support it should cause a synchronous +`errc::kernel_not_supported` exception to be thrown from the kernel invocation +command (e.g. from `parallel_for`). + +== Feature test macro + +This extension provides a feature-test macro as described in the core SYCL +specification section 6.3.3 "Feature test macros". Therefore, an implementation +supporting this extension must predefine the macro +`SYCL_EXT_ONEAPI_BFLOAT16` to one of the values defined in the table +below. Applications can test for the existence of this macro to determine if +the implementation supports this feature, or applications can test the macro’s + value to determine which of the extension’s APIs the implementation supports. + +[%header,cols="1,5"] +|=== +|Value |Description +|1 |Initial extension version. Base features are supported. +|=== + +== Extension to `enum class aspect` + +[source] +---- +namespace sycl { +enum class aspect { + ... + ext_oneapi_bfloat16 +} +} +---- + +If a SYCL device has the `ext_oneapi_bfloat16` aspect, then it natively +supports conversion of values of `float` type to `bfloat16` and back. + +If the device doesn't have the aspect, objects of `bfloat16` class must not be +used in the device code. + +**NOTE**: The `ext_oneapi_bfloat16` aspect is not yet supported. The +`bfloat16` class is currently supported only on Xe HP GPU and Nvidia A100 GPU. + +== New `bfloat16` class + +The `bfloat16` class below provides the conversion functionality. Conversion +from `float` to `bfloat16` is done with round to nearest even(RTE) rounding +mode. + +[source] +---- +namespace sycl { +namespace ext { +namespace oneapi { +namespace experimental { + +class bfloat16 { + using storage_t = uint16_t; + storage_t value; + +public: + bfloat16() = default; + bfloat16(const bfloat16 &) = default; + ~bfloat16() = default; + + // Explicit conversion functions + static storage_t from_float(const float &a); + static float to_float(const storage_t &a); + + // Convert from float to bfloat16 + bfloat16(const float &a); + bfloat16 &operator=(const float &a); + + // Convert from bfloat16 to float + operator float() const; + + // Get bfloat16 as uint16. 
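+  // (No conversion is performed here; the raw storage_t bits of the bfloat16 value are returned, see Table 1.)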
+ operator storage_t() const; + + // Convert to bool type + explicit operator bool(); + + friend bfloat16 operator-(bfloat16 &bf) { /* ... */ } + + // OP is: prefix ++, -- + friend bfloat16 &operatorOP(bfloat16 &bf) { /* ... */ } + + // OP is: postfix ++, -- + friend bfloat16 operatorOP(bfloat16 &bf, int) { /* ... */ } + + // OP is: +=, -=, *=, /= + friend bfloat16 &operatorOP(bfloat16 &lhs, const bfloat16 &rhs) { /* ... */ } + + // OP is +, -, *, / + friend bfloat16 operatorOP(const bfloat16 &lhs, const bfloat16 &rhs) { /* ... */ } + template + friend bfloat16 operatorOP(const bfloat16 &lhs, const T &rhs) { /* ... */ } + template + friend bfloat16 operatorOP(const T &lhs, const bfloat16 &rhs) { /* ... */ } + + // OP is ==,!=, <, >, <=, >= + friend bool operatorOP(const bfloat16 &lhs, const bfloat16 &rhs) { /* ... */ } + template + friend bool operatorOP(const bfloat16 &lhs, const T &rhs) { /* ... */ } + template + friend bool operatorOP(const T &lhs, const bfloat16 &rhs) { /* ... */ } +}; + +} // namespace experimental +} // namespace oneapi +} // namespace ext +} // namespace sycl +---- + +Table 1. Member functions of `bfloat16` class. +|=== +| Member Function | Description + +| `static storage_t from_float(const float &a);` +| Explicitly convert from `float` to `bfloat16`. + +| `static float to_float(const storage_t &a);` +| Interpret `a` as `bfloat16` and explicitly convert it to `float`. + +| `bfloat16(const float& a);` +| Construct `bfloat16` from `float`. Converts `float` to `bfloat16`. + +| `bfloat16 &operator=(const float &a);` +| Replace the value with `a` converted to `bfloat16` + +| `operator float() const;` +| Return `bfloat16` value converted to `float`. + +| `operator storage_t() const;` +| Return `uint16_t` value, whose bits represent `bfloat16` value. + +| `explicit operator bool() { /* ... */ }` +| Convert `bfloat16` to `bool` type. Return `false` if the value equals to + zero, return `true` otherwise. + +| `friend bfloat16 operator-(bfloat16 &bf) { /* ... */ }` +| Construct new instance of `bfloat16` class with negated value of the `bf`. + +| `friend bfloat16 &operatorOP(bfloat16 &bf) { /* ... */ }` +| Perform an in-place `OP` prefix arithmetic operation on the `bf`, + assigning the result to the `bf` and return the `bf`. + + OP is: `++, --` + +| `friend bfloat16 operatorOP(bfloat16 &bf, int) { /* ... */ }` +| Perform an in-place `OP` postfix arithmetic operation on `bf`, assigning + the result to the `bf` and return a copy of `bf` before the operation is + performed. + + OP is: `++, --` + +| `friend bfloat16 operatorOP(const bfloat16 &lhs, const bfloat16 &rhs) { /* ... */ }` +| Perform an in-place `OP` arithmetic operation between the `lhs` and the `rhs` + and return the `lhs`. + + OP is: `+=, -=, *=, /=` + +| `friend type operatorOP(const bfloat16 &lhs, const bfloat16 &rhs) { /* ... */ }` +| Construct a new instance of the `bfloat16` class with the value of the new + `bfloat16` instance being the result of an OP arithmetic operation between + the `lhs` `bfloat16` and `rhs` `bfloat16` values. + + OP is `+, -, *, /` + +| `template + friend bfloat16 operatorOP(const bfloat16 &lhs, const T &rhs) { /* ... */ }` +| Construct a new instance of the `bfloat16` class with the value of the new + `bfloat16` instance being the result of an OP arithmetic operation between + the `lhs` `bfloat16` value and `rhs` of template type `T`. Type `T` must be + convertible to `float`. + + OP is `+, -, *, /` + +| `template + friend bfloat16 operatorOP(const T &lhs, const bfloat16 &rhs) { /* ... 
*/ }` +| Construct a new instance of the `bfloat16` class with the value of the new + `bfloat16` instance being the result of an OP arithmetic operation between + the `lhs` of template type `T` and `rhs` `bfloat16` value. Type `T` must be + convertible to `float`. + + OP is `+, -, *, /` + +| `friend bool operatorOP(const bfloat16 &lhs, const bfloat16 &rhs) { /* ... */ }` +| Perform comparison operation OP between `lhs` `bfloat16` and `rhs` `bfloat16` + values and return the result as a boolean value. + +OP is `==, !=, <, >, <=, >=` + +| `template + friend bool operatorOP(const bfloat16 &lhs, const T &rhs) { /* ... */ }` +| Perform comparison operation OP between `lhs` `bfloat16` and `rhs` of + template type `T` and return the result as a boolean value. Type `T` must be + convertible to `float`. + +OP is `==, !=, <, >, <=, >=` + +| `template + friend bool operatorOP(const T &lhs, const bfloat16 &rhs) { /* ... */ }` +| Perform comparison operation OP between `lhs` of template type `T` and `rhs` + `bfloat16` value and return the result as a boolean value. Type `T` must be + convertible to `float`. + +OP is `==, !=, <, >, <=, >=` +|=== + +== Example + +[source] +---- +#include +#include + +using sycl::ext::oneapi::experimental::bfloat16; + +bfloat16 operator+(const bfloat16 &lhs, const bfloat16 &rhs) { + return static_cast(lhs) + static_cast(rhs); +} + +float foo(float a, float b) { + // Convert from float to bfloat16. + bfloat16 A {a}; + bfloat16 B {b}; + + // Convert A and B from bfloat16 to float, do addition on floating-pointer + // numbers, then convert the result to bfloat16 and store it in C. + bfloat16 C = A + B; + + // Return the result converted from bfloat16 to float. + return C; +} + +int main (int argc, char *argv[]) { + float data[3] = {7.0, 8.1, 0.0}; + sycl::device dev; + sycl::queue deviceQueue{dev}; + sycl::buffer buf {data, sycl::range<1> {3}}; + + if (dev.has(sycl::aspect::ext_oneapi_bfloat16)) { + deviceQueue.submit ([&] (sycl::handler& cgh) { + auto numbers = buf.get_access (cgh); + cgh.single_task ([=] () { + numbers[2] = foo(numbers[0], numbers[1]); + }); + }); + } + return 0; +} +---- + +== New bfloat16 math functions + +In order to take full advantage of the new matrix extension, https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_oneapi_matrix.asciidoc, for many applications it is necessary to introduce dedicated functions that support the new `bfloat16` data type, which can then be used as element wise operations on matrices. +The `fma`, `fmin`, `fmax` and `fabs` SYCL floating point math functions are extended to support the `bfloat16` type. This proposal assumes that devices which support the `ext_oneapi_bfloat16` aspect have the `bfloat16` scalar (`bfloat16`) and vector data types (See the issues section for an action item regarding suitable vector data types) available at compile-time, in line with corresponding `half` types that are available at compile time on devices that +have `aspect::fp16` as described in the SYCL specification: https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:opencl:extension-fp16. Therefore the `fma`, `fmin`, `fmax` and `fabs` functions should support all of the `bfloat16` vector and scalar types which we refer to as `genbfloat16` in this document. +The descriptions of the `fma`, `fmin`, `fmax` and `fabs` SYCL floating point math functions can be found in the SYCL specification: https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#_math_functions. 
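As an informal illustration of the intended behaviour (not a required implementation), the scalar forms can be pictured in terms of the existing `float` math functions together with the conversions provided by the `bfloat16` class defined above. Note that evaluating `fma` in `float` and rounding back can double-round in rare cases, whereas a conforming implementation returns the correctly rounded `bfloat16` result; the helper names below are illustrative only.

```c++
// Illustrative reference semantics only; not part of this extension.
using sycl::ext::oneapi::experimental::bfloat16;

inline bfloat16 ref_fabs(bfloat16 x) { return bfloat16{sycl::fabs(float(x))}; }

inline bfloat16 ref_fmin(bfloat16 x, bfloat16 y) {
  // Matches SYCL fmin NaN handling: if one argument is a NaN, the other is returned.
  return bfloat16{sycl::fmin(float(x), float(y))};
}

inline bfloat16 ref_fmax(bfloat16 x, bfloat16 y) {
  return bfloat16{sycl::fmax(float(x), float(y))};
}

inline bfloat16 ref_fma(bfloat16 a, bfloat16 b, bfloat16 c) {
  // Double-rounding caveat: see the note above.
  return bfloat16{sycl::fma(float(a), float(b), float(c))};
}
```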
This extension does not introduce any changes to the expected behavior of these math functions beyond the new support for the `genbfloat16` types. A discussion issue has been raised at the bottom of this document on whether the specified maximum precision error should change for these functions when they use the `genbfloat16` data types. + +== New bfloat16 math function declarations + +```c++ +namespace sycl::ext::oneapi::experimental { + +// Available only when "T" is one of the genbfloat16 types. +template +T fma(T a, T b, T c); + +// Available only when "T" is one of the genbfloat16 types. +template +T fmax(T x, T y); + +// Available only when "T" is one of the genbfloat16 types. +template +T fmin(T x, T y); + +// Available only when "T" is one of the genbfloat16 types. +template +T fabs(T x); + +} // namespace sycl::ext::oneapi::experimental +``` + +== Issues + +1. Decide if there should be a different specified maximum precision error for these math functions when using the `genbfloat16` types. + +2. Decide the appropriate `bfloat16` vector types to support. Should the range of vector types match the range used for other data types such as half, float, and double: e.g. `bfloat16_1`, `bfloat16_2`, `bfloat16_3`, `bfloat16_4`, `bfloat16_8` and `bfloat16_16`. Natural storage types for `bfloat16` and `bfloat16_2` would probably be `uint16_t` and `uint32_t` respectively. What about the other vector types? The choice of `bfloat16` storage types should reflect the intended use of `bfloat16` for all compatible backends. + +== Revision History + +[cols="5,15,15,70"] +[grid="rows"] +[options="header"] +|======================================== +|Rev|Date|Author|Changes +|1|2021-08-02|Alexey Sotkin |Initial public working draft +|2|2021-08-17|Alexey Sotkin |Add explicit conversion functions + + Add operator overloadings + + Apply code review suggestions +|3|2021-08-18|Alexey Sotkin |Remove `uint16_t` constructor +|4|2022-03-07|Aidan Belton and Jack Kirk |Switch from Intel vendor specific to oneapi +|5|2022-04-05|Jack Kirk | Added section for bfloat16 math builtins +|======================================== From b176db23eff159e476ac008c5a24fdd9c53e2508 Mon Sep 17 00:00:00 2001 From: JackAKirk Date: Fri, 15 Apr 2022 12:01:25 +0100 Subject: [PATCH 5/7] Added descriptions for new bfloat16 functs --- .../sycl_ext_oneapi_bfloat16.asciidoc | 81 ++++++++++++++++++- 1 file changed, 79 insertions(+), 2 deletions(-) diff --git a/sycl/doc/extensions/experimental/sycl_ext_oneapi_bfloat16.asciidoc b/sycl/doc/extensions/experimental/sycl_ext_oneapi_bfloat16.asciidoc index 88b6c73b02514..ff11e6597b4d3 100644 --- a/sycl/doc/extensions/experimental/sycl_ext_oneapi_bfloat16.asciidoc +++ b/sycl/doc/extensions/experimental/sycl_ext_oneapi_bfloat16.asciidoc @@ -103,7 +103,7 @@ If the device doesn't have the aspect, objects of `bfloat16` class must not be used in the device code. **NOTE**: The `ext_oneapi_bfloat16` aspect is not yet supported. The -`bfloat16` class is currently supported only on Xe HP GPU and Nvidia A100 GPU. +`bfloat16` class is currently supported only on Xe HP GPU and Nvidia GPUs with Compute Capability >= SM80. == New `bfloat16` class @@ -316,9 +316,85 @@ int main (int argc, char *argv[]) { } ---- +== New bfloat16 math functions + +Many applications will require dedicated functions that take parameters of type `bfloat16`. This extension adds `bfloat16` support to the `fma`, `fmin`, `fmax` and `fabs` SYCL floating point math functions. 
These functions can be used as element wise operations on matrices, and as such application developers can use these functions together with the `bfloat16` implementation of the sycl_ext_oneapi_matrix extension. +The descriptions of the `fma`, `fmin`, `fmax` and `fabs` SYCL floating point math functions can be found in the SYCL specification: https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#_math_functions. +The following functions are only available only when "T" is `bfloat16` `or `sycl::marray`, where {N} means any positive value of size_t type. + +=== fma + +```c++ +namespace sycl::ext::oneapi::experimental { + +template +T fma(T a, T b, T c); +} // namespace sycl::ext::oneapi::experimental +``` + +==== Description + +Returns the correctly rounded +floating-point representation of the +sum of `c` with the infinitely precise +product of `a` and `b`. Rounding of +intermediate products shall not +occur. + +=== fmax + +```c++ +namespace sycl::ext::oneapi::experimental { +template +T fmax(T x, T y); +} // namespace sycl::ext::oneapi::experimental +``` + +==== Description + +Returns `y` if +`x < y`, otherwise it +returns `x`. If one argument is a +NaN, `fmax()` returns the other +argument. If both arguments are +NaNs, `fmax()` returns a NaN. + +=== fmin + +```c++ +namespace sycl::ext::oneapi::experimental { +template +T fmin(T x, T y); +} // namespace sycl::ext::oneapi::experimental +``` + +==== Description + +Returns `y` if +`y < x`, otherwise it +returns `x`. If one argument is a +NaN, `fmax()` returns the other +argument. If both arguments are +NaNs, `fmax()` returns a NaN. + +=== fabs + +```c++ +namespace sycl::ext::oneapi::experimental { +template +T fabs(T x); +} // namespace sycl::ext::oneapi::experimental +``` + +==== Description + +Compute absolute value of a `bfloat16`. + == Issues -None. +2. The CUDA backend has no use case that would necessitate support of the `vec` class in bfloat16 math builtins, since `marray` is always preferred. Support for the `vec` class can be added if other backends require it. + +3. Decide on a roadmap to extend support of `bfloat16` to more SYCL 2020 math functions. == Revision History @@ -333,4 +409,5 @@ None. Apply code review suggestions |3|2021-08-18|Alexey Sotkin |Remove `uint16_t` constructor |4|2022-03-07|Aidan Belton and Jack Kirk |Switch from Intel vendor specific to oneapi +|5|2022-04-05|Jack Kirk | Added section for bfloat16 math builtins |======================================== From ea4a64bfdfd83bfafb3e92fa15abbc860ee6ed24 Mon Sep 17 00:00:00 2001 From: JackAKirk Date: Fri, 15 Apr 2022 12:55:52 +0100 Subject: [PATCH 6/7] format --- .../sycl_ext_oneapi_bfloat16.asciidoc | 20 +++++++++---------- 1 file changed, 9 insertions(+), 11 deletions(-) diff --git a/sycl/doc/extensions/experimental/sycl_ext_oneapi_bfloat16.asciidoc b/sycl/doc/extensions/experimental/sycl_ext_oneapi_bfloat16.asciidoc index ff11e6597b4d3..480ad901ddf84 100644 --- a/sycl/doc/extensions/experimental/sycl_ext_oneapi_bfloat16.asciidoc +++ b/sycl/doc/extensions/experimental/sycl_ext_oneapi_bfloat16.asciidoc @@ -48,7 +48,7 @@ products. == Version -Revision: 4 +Revision: 5 == Introduction @@ -318,9 +318,11 @@ int main (int argc, char *argv[]) { == New bfloat16 math functions -Many applications will require dedicated functions that take parameters of type `bfloat16`. This extension adds `bfloat16` support to the `fma`, `fmin`, `fmax` and `fabs` SYCL floating point math functions. 
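Since these functions accept `sycl::marray<bfloat16, {N}>` as well as scalar `bfloat16`, a row or tile of matrix elements can be processed in a single call. A hypothetical sketch follows; it assumes the final form of this extension and the helper name is illustrative.

```c++
// Hypothetical element-wise use over a small tile of bfloat16 matrix data.
using sycl::ext::oneapi::experimental::bfloat16;
namespace bf16_math = sycl::ext::oneapi::experimental;

sycl::marray<bfloat16, 8> fused_scale_clamp(const sycl::marray<bfloat16, 8> &row,
                                            const sycl::marray<bfloat16, 8> &scale,
                                            const sycl::marray<bfloat16, 8> &bias,
                                            bfloat16 limit) {
  // Each call operates lane by lane on all 8 elements.
  sycl::marray<bfloat16, 8> r = bf16_math::fma(row, scale, bias);
  sycl::marray<bfloat16, 8> bound(limit);  // broadcast the clamp bound to all lanes
  // Clamp the magnitude of each element to `limit`.
  return bf16_math::fmin(bf16_math::fabs(r), bound);
}
```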
These functions can be used as element wise operations on matrices, and as such application developers can use these functions together with the `bfloat16` implementation of the sycl_ext_oneapi_matrix extension. +Many applications will require dedicated functions that take parameters of type `bfloat16`. This extension adds `bfloat16` support to the `fma`, `fmin`, `fmax` and `fabs` SYCL floating point math functions. These functions can be used as element wise operations on matrices, and as such can used together with the `bfloat16` implementation of the sycl_ext_oneapi_matrix extension. + The descriptions of the `fma`, `fmin`, `fmax` and `fabs` SYCL floating point math functions can be found in the SYCL specification: https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#_math_functions. -The following functions are only available only when "T" is `bfloat16` `or `sycl::marray`, where {N} means any positive value of size_t type. + +The following functions are only available when `T` is `bfloat16` `or `sycl::marray`, where {N} means any positive value of size_t type. === fma @@ -334,12 +336,8 @@ T fma(T a, T b, T c); ==== Description -Returns the correctly rounded -floating-point representation of the -sum of `c` with the infinitely precise -product of `a` and `b`. Rounding of -intermediate products shall not -occur. +Returns the correctly rounded floating-point representation of the sum of `c` with the infinitely precise product of `a` and `b`. +Rounding of intermediate products shall not occur. The mantissa LSB rounds to the nearest even. Subnormal numbers are supported. === fmax @@ -392,9 +390,9 @@ Compute absolute value of a `bfloat16`. == Issues -2. The CUDA backend has no use case that would necessitate support of the `vec` class in bfloat16 math builtins, since `marray` is always preferred. Support for the `vec` class can be added if other backends require it. +1. The CUDA backend does not have a use case that would necessitate support of the `vec` class in bfloat16 math builtins, and `marray` would always be preferred over `vec` if there were to be added `vec` support in the CUDA backend. For portability reasons, support for the `vec` class can be easily added if other backends require it. -3. Decide on a roadmap to extend support of `bfloat16` to more SYCL 2020 math functions. +2. We should decide on a roadmap to extend support of `bfloat16` to other SYCL 2020 math functions. == Revision History From 0ae399aecc9c08b5c0601492e87ece9f410c6ae3 Mon Sep 17 00:00:00 2001 From: JackAKirk Date: Fri, 15 Apr 2022 13:02:46 +0100 Subject: [PATCH 7/7] format --- .../experimental/sycl_ext_oneapi_bfloat16.asciidoc | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/sycl/doc/extensions/experimental/sycl_ext_oneapi_bfloat16.asciidoc b/sycl/doc/extensions/experimental/sycl_ext_oneapi_bfloat16.asciidoc index 480ad901ddf84..bec08876ed084 100644 --- a/sycl/doc/extensions/experimental/sycl_ext_oneapi_bfloat16.asciidoc +++ b/sycl/doc/extensions/experimental/sycl_ext_oneapi_bfloat16.asciidoc @@ -318,11 +318,11 @@ int main (int argc, char *argv[]) { == New bfloat16 math functions -Many applications will require dedicated functions that take parameters of type `bfloat16`. This extension adds `bfloat16` support to the `fma`, `fmin`, `fmax` and `fabs` SYCL floating point math functions. These functions can be used as element wise operations on matrices, and as such can used together with the `bfloat16` implementation of the sycl_ext_oneapi_matrix extension. 
+Many applications will require dedicated functions that take parameters of type `bfloat16`. This extension adds `bfloat16` support to the `fma`, `fmin`, `fmax` and `fabs` SYCL floating point math functions. These functions can be used as element wise operations on matrices, supplementing the `bfloat16` support in the sycl_ext_oneapi_matrix extension. The descriptions of the `fma`, `fmin`, `fmax` and `fabs` SYCL floating point math functions can be found in the SYCL specification: https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#_math_functions. -The following functions are only available when `T` is `bfloat16` `or `sycl::marray`, where {N} means any positive value of size_t type. +The following functions are only available when `T` is `bfloat16` or `sycl::marray`, where `{N}` means any positive value of `size_t` type. === fma @@ -390,7 +390,7 @@ Compute absolute value of a `bfloat16`. == Issues -1. The CUDA backend does not have a use case that would necessitate support of the `vec` class in bfloat16 math builtins, and `marray` would always be preferred over `vec` if there were to be added `vec` support in the CUDA backend. For portability reasons, support for the `vec` class can be easily added if other backends require it. +1. The CUDA backend does not have a use case that would necessitate support of the `vec` class in bfloat16 math functions, and `marray` would always be preferred over `vec` if `vec` support were to be added in the CUDA backend. For portability reasons, support for the `vec` class can be easily added if other backends require it. 2. We should decide on a roadmap to extend support of `bfloat16` to other SYCL 2020 math functions.
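For background on the round-to-nearest-even behaviour referred to in this document (the `bfloat16` class converts from `float` with RTE rounding, and the `fma` description states that the mantissa LSB rounds to the nearest even), the following self-contained sketch shows one common way such a conversion can be written at the bit level. It is illustrative only and is not part of the extension; production implementations typically also special-case NaN inputs.

```c++
#include <cstdint>
#include <cstring>

// Illustrative round-to-nearest-even conversion between float and the
// uint16_t storage format used for bfloat16 (NaN handling omitted).
inline uint16_t float_to_bfloat16_bits_rte(float f) {
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));  // reinterpret the float's bits
  // Add 0x7FFF plus the bit that will become the new LSB, so that values
  // exactly half-way between two bfloat16 values round to even, then truncate.
  uint32_t round_bias = 0x7FFFu + ((bits >> 16) & 1u);
  bits += round_bias;
  return static_cast<uint16_t>(bits >> 16);
}

inline float bfloat16_bits_to_float(uint16_t h) {
  // Widening is exact: the extra float mantissa bits are simply zero.
  uint32_t bits = static_cast<uint32_t>(h) << 16;
  float f;
  std::memcpy(&f, &bits, sizeof(f));
  return f;
}
```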