From 103144c0b2d58d6a570efe7d94cf48ecbc53f965 Mon Sep 17 00:00:00 2001 From: Qiu Chaofan Date: Tue, 25 Apr 2023 16:26:25 +0800 Subject: [PATCH 1/6] Create 0000-additional-float-types.md --- text/0000-additional-float-types.md | 139 ++++++++++++++++++++++++++++ 1 file changed, 139 insertions(+) create mode 100644 text/0000-additional-float-types.md diff --git a/text/0000-additional-float-types.md b/text/0000-additional-float-types.md new file mode 100644 index 00000000000..1de845517c1 --- /dev/null +++ b/text/0000-additional-float-types.md @@ -0,0 +1,139 @@ +- Feature Name: `additional-float-types` +- Start Date: 2023-6-28 +- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000) +- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000) + +# Summary +[summary]: #summary + +This RFC proposes new floating point types `f16` and `f128` into core language and standard library. Also this RFC introduces `f80`, `doubledouble`, `bf16` into `core::arch` for inter-op with existing native code. + +# Motivation +[motivation]: #motivation + +IEEE-754 standard defines binary floating point formats, including binary16, binary32, binary64 and binary128. The binary32 and binary64 correspond to `f32` and `f64` types in Rust, while binary16 and binary128 are used in multiple scenarios (machine learning, scientific computing, etc.) and accepted by some modern architectures (by software or hardware). + +In C/C++ world, there're already types representing these formats, along with more legacy non-standard types specific to some platform. Introduce them in a limited way would help improve FFI against such code. + +# Guide-level explanation +[guide-level-explanation]: #guide-level-explanation + +`f16` and `f128` are primitive floating types, they can be used just like `f32` or `f64`. They always conform to binary16 and binary128 format defined in IEEE-754, which means size of `f16` is always 16-bit, and size of `f128` is always 128-bit. 
+ +```rust +let val1 = 1.0; // Default type is still f64 +let val2: f128 = 1.0; +let val3: f16 = 1.0; +let val4 = 1.0f128; // Suffix of f128 literal +let val5 = 1.0f16; // Suffix of f16 literal + +println!("Size of f128 in bytes: {}", std::mem::size_of_val(&val2)); // 16 +println!("Size of f16 in bytes: {}", std::mem::size_of_val(&val3)); // 2 +``` + +Because not every target supports `f16` and `f128`, compiler provides conditional guards for them: + +```rust +#[cfg(target_has_f128)] +fn get_f128() -> f128 { 1.0f128 } + +#[cfg(target_has_f16)] +fn get_f16() -> f16 { 1.0f16 } +``` + +All operators, constants and math functions defined for `f32` and `f64` in core, are also defined for `f16` and `f128`, and guarded by respective conditional guards. + +`f80` type is defined in `core::arch::{x86, x86_64}`. `doubledouble` type is defined in `core::arch::{powerpc, powerpc64}`. `bf16` type is defined in `core::arch::{arm, aarch64, x86, x86_64}`. They do not have literal representation. + +# Reference-level explanation +[reference-level-explanation]: #reference-level-explanation + +## `f16` type + +`f16` consists of 1 bit of sign, 5 bits of exponent, 10 bits of mantissa. + +The following `From` and `TryFrom` traits are implemented for conversion between `f16` and other types: + +```rust +impl From for f32 { /* ... */ } +impl From for f64 { /* ... */ } +impl From for f16 { /* ... */ } +impl From for f16 { /* ... */ } +impl From for f16 { /* ... */ } +``` + +`f16` will generate `half` type in LLVM IR. + +## `f128` type + +`f128` consists of 1 bit of sign, 15 bits of exponent, 112 bits of mantissa. + +`f128` is available for on targets having (1) hardware instructions or software emulation for 128-bit float type; (2) backend support for `f128` type on the target; (3) essential target features enabled (if any). + +The list of targets supporting `f128` type may change over time. Initially, it includes `powerpc64le-*`. 
+ +The following traits are also implemented for conversion between `f128` and other types: + +```rust +impl From for f128 { /* ... */ } +impl From for f128 { /* ... */ } +impl From for f128 { /* ... */ } +impl From for f128 { /* ... */ } +impl From for f128 { /* ... */ } +impl From for f128 { /* ... */ } +impl From for f128 { /* ... */ } +impl From for f128 { /* ... */ } +impl From for f128 { /* ... */ } +impl From for f128 { /* ... */ } +impl From for f128 { /* ... */ } +impl From for f128 { /* ... */ } +``` + +`f128` will generate `fp128` type in LLVM IR. + + +`std::simd` defines new vector types with `f16` or `f128` element: `f16x2` `f16x4` `f16x8` `f16x16` `f16x32` `f128x2` `f128x4`. + +For `doubledouble` type, conversion intrinsics are available under `core::arch::{powerpc, powerpc64}`. For `f80` type, conversion intrinsics are available under `core::arch::{x86, x86_64}`. + +## Architectures specific types + +As for non-standard types, `f80` generates `x86_fp80`, `doubledouble` generates `ppc_fp128`, `bf16` generates `bfloat`. + +# Drawbacks +[drawbacks]: #drawbacks + +Unlike f32 and f64, although there are platform independent implementation of supplementary intrinsics on these types, not every target support the two types natively, with regards to the ABI. Adding them will be a challenge for handling different cases. + +# Rationale and alternatives +[rationale-and-alternatives]: #rationale-and-alternatives + +There are some crates aiming for similar functionality: + +- [f128](https://github.com/jkarns275/f128) provides binding to `__float128` type in GCC. +- [half](https://github.com/starkat99/half-rs) provides implementation of binary16 and bfloat16 types. + +However, besides the disadvantage of usage inconsistency between primitive type and type from crate, there are still issues around those bindings. + +The availablity of additional float types depends on CPU/OS/ABI/features of different targets heavily. 
Evolution of LLVM may also unlock possibility of the types on new targets. Implementing them in compiler handles the stuff at the best location. + +Most of such crates defines their type on top of C binding. But extended float type definition in C is complex and confusing. The meaning of `long double`, `_Float128` varies by targets or compiler options. Implementing in Rust compiler helps to maintain a stable codegen interface. + +And since third party tools also relies on Rust internal code, implementing additional float types in compiler also help the tools to recognize them. + +# Prior art +[prior-art]: #prior-art + +We have a previous proposal on `f16b` type to represent `bfloat16`: https://github.com/joshtriplett/rfcs/blob/f16b/text/0000-f16b.md + +# Unresolved questions +[unresolved-questions]: #unresolved-questions + +This proposal does not introduce `c_longdouble` type for FFI, because it means one of `f128`, `doubledouble`, `f64` or `f80` on different cases. Also for `c_float128`. + +# Future possibilities +[future-possibilities]: #future-possibilities + +More functions will be added to those platform dependent float types, like casting between `f128` and `doubledouble`. + +For targets not supporting `f16` or `f128`, we may be able to introduce a 'limited mode', where the types are not fully functional, but user can load, store and call functions with such arguments. 
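The first patch specifies `f16` as 1 sign bit, 5 exponent bits and 10 mantissa bits. It also proposes `From` conversions between `f16` and other types.

The sketch below illustrates what a software fallback for that layout involves:

- It widens a binary16 bit pattern to `f32`.
- It narrows an `f32` back to binary16, rounding to nearest with ties to even.

It works on raw `u16` bit patterns, so it compiles without the proposed primitive type. The names `f16_bits_to_f32`, `round_shift_even` and `f32_to_f16_bits` are hypothetical helpers chosen for illustration. They are not part of the RFC or of `core`.

```rust
/// Widen an IEEE 754 binary16 bit pattern (1 sign bit, 5 exponent bits with
/// bias 15, 10 fraction bits) to `f32`. Every binary16 value is exactly
/// representable as an `f32`, so this direction never rounds.
fn f16_bits_to_f32(h: u16) -> f32 {
    let sign = ((h as u32) & 0x8000) << 16;
    let exp = ((h >> 10) & 0x1f) as u32;
    let frac = (h & 0x03ff) as u32;
    match exp {
        // Zero or subnormal: value = frac * 2^-24, computed exactly in f32.
        0 => {
            let magnitude = frac as f32 / 16_777_216.0; // 2^24
            if sign != 0 { -magnitude } else { magnitude }
        }
        // Infinity (frac == 0) or NaN (frac != 0, quiet bit forced on).
        0x1f => {
            let quiet = if frac != 0 { 0x0040_0000 } else { 0 };
            f32::from_bits(sign | 0x7f80_0000 | quiet | (frac << 13))
        }
        // Normal: rebias the exponent (15 -> 127) and left-align the fraction.
        _ => f32::from_bits(sign | ((exp + 127 - 15) << 23) | (frac << 13)),
    }
}

/// Round `m >> shift` to nearest, ties to even (requires 1 <= shift <= 31).
fn round_shift_even(m: u32, shift: u32) -> u32 {
    let kept = m >> shift;
    let rem = m & ((1u32 << shift) - 1);
    let half = 1u32 << (shift - 1);
    if rem > half || (rem == half && (kept & 1) == 1) {
        kept + 1
    } else {
        kept
    }
}

/// Narrow an `f32` to a binary16 bit pattern with round-to-nearest-even.
/// Out-of-range magnitudes become infinity; tiny ones become signed zero.
fn f32_to_f16_bits(x: f32) -> u16 {
    let bits = x.to_bits();
    let sign = ((bits >> 16) & 0x8000) as u16;
    let exp = ((bits >> 23) & 0xff) as i32;
    let frac = bits & 0x007f_ffff;

    if exp == 0xff {
        // Infinity stays infinity; NaN keeps its top payload bits plus the quiet bit.
        return if frac == 0 { sign | 0x7c00 } else { sign | 0x7e00 | (frac >> 13) as u16 };
    }
    let e = exp - 127 + 15; // rebias 127 -> 15
    if e >= 0x1f {
        return sign | 0x7c00; // too large: overflow to infinity
    }
    if e <= 0 {
        // Subnormal (or zero) result, measured in units of 2^-24.
        if e < -10 {
            return sign; // below half the smallest subnormal: rounds to zero
        }
        let m = frac | 0x0080_0000; // restore the implicit leading 1
        // A carry into bit 10 correctly yields the smallest normal number.
        return sign | round_shift_even(m, (14 - e) as u32) as u16;
    }
    // Normal result. A carry out of the fraction bumps the exponent, which
    // also rounds values at or above 65520 up to infinity.
    sign | (((e as u32) << 10) + round_shift_even(frac, 13)) as u16
}
```

Widening is exact because every binary16 value fits in `f32`. The narrowing direction is where the rounding decisions live:

- overflow to infinity at 65520 and above;
- ties to even;
- gradual underflow into subnormals.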
From 49539190ae1122803aca53339a52d77c7bd8061e Mon Sep 17 00:00:00 2001 From: Qiu Chaofan Date: Thu, 29 Jun 2023 01:46:03 +0800 Subject: [PATCH 2/6] Give RFC number --- ...additional-float-types.md => 3451-additional-float-types.md} | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) rename text/{0000-additional-float-types.md => 3451-additional-float-types.md} (98%) diff --git a/text/0000-additional-float-types.md b/text/3451-additional-float-types.md similarity index 98% rename from text/0000-additional-float-types.md rename to text/3451-additional-float-types.md index 1de845517c1..68808d9a234 100644 --- a/text/0000-additional-float-types.md +++ b/text/3451-additional-float-types.md @@ -1,6 +1,6 @@ - Feature Name: `additional-float-types` - Start Date: 2023-6-28 -- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000) +- RFC PR: [rust-lang/rfcs#3451](https://github.com/rust-lang/rfcs/pull/3451) - Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000) # Summary From 676f1611671e3e43194b4b02fdf23a300a0a2e8c Mon Sep 17 00:00:00 2001 From: Qiu Chaofan Date: Tue, 4 Jul 2023 01:09:35 +0800 Subject: [PATCH 3/6] Format and update description - Rename doubledouble to f64f64 - Add architecture info about x86 and arm (as suggested) - Add description to target related types (as suggested) - Add link to IEEE-754 and LLVM LangRef --- text/3451-additional-float-types.md | 45 ++++++++++++++++++++--------- 1 file changed, 31 insertions(+), 14 deletions(-) diff --git a/text/3451-additional-float-types.md b/text/3451-additional-float-types.md index 68808d9a234..cda8810bb0b 100644 --- a/text/3451-additional-float-types.md +++ b/text/3451-additional-float-types.md @@ -6,19 +6,19 @@ # Summary [summary]: #summary -This RFC proposes new floating point types `f16` and `f128` into core language and standard library. Also this RFC introduces `f80`, `doubledouble`, `bf16` into `core::arch` for inter-op with existing native code. 
+This RFC proposes new floating point types `f16` and `f128` into core language and standard library. Also, this RFC introduces `f80`, `f64f64`, and `bf16` into `core::arch` for target-specific support, and `core::ffi::c_longdouble` for FFI interop. # Motivation [motivation]: #motivation -IEEE-754 standard defines binary floating point formats, including binary16, binary32, binary64 and binary128. The binary32 and binary64 correspond to `f32` and `f64` types in Rust, while binary16 and binary128 are used in multiple scenarios (machine learning, scientific computing, etc.) and accepted by some modern architectures (by software or hardware). +[IEEE-754] standard defines binary floating point formats, including `binary16`, `binary32`, `binary64` and `binary128`. `binary32` and `binary64` correspond to `f32` and `f64` types in Rust, but there is currently no representation for `binary16` or `binary128`; these have uses in multiple scenarios (machine learning, scientific computing, etc.) and accepted by some modern architectures (by software or hardware), so this RFC proposes to add representations for them to the language. -In C/C++ world, there're already types representing these formats, along with more legacy non-standard types specific to some platform. Introduce them in a limited way would help improve FFI against such code. +In C/C++ world, there are already types representing these formats, along with more legacy non-standard types specific to some platform. Introduce them in a limited way would help improve FFI against such code. # Guide-level explanation [guide-level-explanation]: #guide-level-explanation -`f16` and `f128` are primitive floating types, they can be used just like `f32` or `f64`. They always conform to binary16 and binary128 format defined in IEEE-754, which means size of `f16` is always 16-bit, and size of `f128` is always 128-bit. +`f16` and `f128` are primitive floating types, they can be used just like `f32` or `f64`. 
They always conform to the `binary16` and `binary128` formats defined in [IEEE-754], which means size of `f16` is always 16-bit, and size of `f128` is always 128-bit. ```rust let val1 = 1.0; // Default type is still f64 @@ -31,7 +31,9 @@ println!("Size of f128 in bytes: {}", std::mem::size_of_val(&val2)); // 16 println!("Size of f16 in bytes: {}", std::mem::size_of_val(&val3)); // 2 ``` -Because not every target supports `f16` and `f128`, compiler provides conditional guards for them: +`f16` and `f128` will only be available on hardware that supports or natively emulates these type via LLVM's `half` and `fp128`, as mentioned in the [LLVM reference for floating types]. This means that the semantics of `f16` and `f128` are fixed as IEEE compliant in every supported platform, different from `long double` in C. + +Because not every target supports `f16` and `f128`, compiler provides conditional guards. ```rust #[cfg(target_has_f128)] @@ -43,14 +45,16 @@ fn get_f16() -> f16 { 1.0f16 } All operators, constants and math functions defined for `f32` and `f64` in core, are also defined for `f16` and `f128`, and guarded by respective conditional guards. -`f80` type is defined in `core::arch::{x86, x86_64}`. `doubledouble` type is defined in `core::arch::{powerpc, powerpc64}`. `bf16` type is defined in `core::arch::{arm, aarch64, x86, x86_64}`. They do not have literal representation. +- The `f80` type is defined in `core::arch::{x86, x86_64}` as 80-bit extended precision. +- The `f64f64` type is defined in `core::arch::{powerpc, powerpc64}` and represent's PowerPC's non-IEEE double-double format (two `f64`s used to aproximate `f128`). +- `bf16` type is defined in `core::arch::{arm, aarch64, x86, x86_64}` and represents the "brain" float, a truncated `f32` with SIMD support on some hardware. These types do not have literal representation. 
# Reference-level explanation [reference-level-explanation]: #reference-level-explanation ## `f16` type -`f16` consists of 1 bit of sign, 5 bits of exponent, 10 bits of mantissa. +`f16` consists of 1 bit of sign, 5 bits of exponent, 10 bits of mantissa. It is always in accordance with [IEEE-754]. The following `From` and `TryFrom` traits are implemented for conversion between `f16` and other types: @@ -70,7 +74,7 @@ impl From for f16 { /* ... */ } `f128` is available for on targets having (1) hardware instructions or software emulation for 128-bit float type; (2) backend support for `f128` type on the target; (3) essential target features enabled (if any). -The list of targets supporting `f128` type may change over time. Initially, it includes `powerpc64le-*`. +The list of targets supporting `f128` type may change over time. Initially, it includes `powerpc64le-*`, `x86_64-*` and `aarch64-*` The following traits are also implemented for conversion between `f128` and other types: @@ -91,14 +95,24 @@ impl From for f128 { /* ... */ } `f128` will generate `fp128` type in LLVM IR. +For `f64f64` type, conversion intrinsics are available under `core::arch::{powerpc, powerpc64}`. For `f80` type, conversion intrinsics are available under `core::arch::{x86, x86_64}`. -`std::simd` defines new vector types with `f16` or `f128` element: `f16x2` `f16x4` `f16x8` `f16x16` `f16x32` `f128x2` `f128x4`. +## Architectures specific types -For `doubledouble` type, conversion intrinsics are available under `core::arch::{powerpc, powerpc64}`. For `f80` type, conversion intrinsics are available under `core::arch::{x86, x86_64}`. 
+- `core::arch::{x86, x86_64}::f80` generates LLVM's `x86_fp80`, 80-bit extended precision +- `core::arch::{powerpc, powerpc64}::f64f64` generates LLVM's `ppc_fp128`, a `f128` emulated type via dual `f64`s +- `core::arch::{arm, aarch64, x86, x86_64}::bf16` generates LLVM's `bfloat`, 16-bit "brain" floats used in AVX and ARMv8.6-A -## Architectures specific types +Where possible, `From` will be implemented to convert `f80` and `f64f64` to `f128`. -As for non-standard types, `f80` generates `x86_fp80`, `doubledouble` generates `ppc_fp128`, `bf16` generates `bfloat`. +## FFI types + +`core::ffi::c_longdouble` will always represent whatever `long double` does in C. Rust will defer to the compiler backend (LLVM) for what exactly this represents, but it will approximately be: + +- 80-bit extended precision (f80) on `x86` and `x86_64`: +- `f64` double precision with MSVC +- `f128` quadruple precision on AArch64 +- `f64f64` on PowerPC # Drawbacks [drawbacks]: #drawbacks @@ -129,11 +143,14 @@ We have a previous proposal on `f16b` type to represent `bfloat16`: https://gith # Unresolved questions [unresolved-questions]: #unresolved-questions -This proposal does not introduce `c_longdouble` type for FFI, because it means one of `f128`, `doubledouble`, `f64` or `f80` on different cases. Also for `c_float128`. +This proposal does not introduce `c_longdouble` type for FFI, because it means one of `f128`, `f64f64`, `f64` or `f80` on different cases. Also for `c_float128`. # Future possibilities [future-possibilities]: #future-possibilities -More functions will be added to those platform dependent float types, like casting between `f128` and `doubledouble`. +More functions will be added to those platform dependent float types, like casting between `f128` and `f64f64`. For targets not supporting `f16` or `f128`, we may be able to introduce a 'limited mode', where the types are not fully functional, but user can load, store and call functions with such arguments. 
+ +[LLVM reference for floating types]: https://llvm.org/docs/LangRef.html#floating-point-types +[IEEE-754]: https://en.wikipedia.org/wiki/IEEE_754 From 3e9bb4925ba776902516a989d9d03a5b7a116156 Mon Sep 17 00:00:00 2001 From: Aaron Franke Date: Mon, 7 Aug 2023 10:12:40 -0500 Subject: [PATCH 4/6] Rename additional float formats file to 3453-f16-and-f128 Co-authored-by: Qiu Chaofan --- text/{3451-additional-float-types.md => 3453-f16-and-f128.md} | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename text/{3451-additional-float-types.md => 3453-f16-and-f128.md} (100%) diff --git a/text/3451-additional-float-types.md b/text/3453-f16-and-f128.md similarity index 100% rename from text/3451-additional-float-types.md rename to text/3453-f16-and-f128.md From 757e62cebb408d2bf408e4a62b07ecc57b13f633 Mon Sep 17 00:00:00 2001 From: Aaron Franke Date: Mon, 7 Aug 2023 10:13:44 -0500 Subject: [PATCH 5/6] Add `f16` and `f128` float types Co-authored-by: Qiu Chaofan Co-authored-by: Trevor Gross Co-authored-by: Jacob Lifshay Co-authored-by: Clar Charr --- text/3453-f16-and-f128.md | 112 +++++++++++++++----------------------- 1 file changed, 45 insertions(+), 67 deletions(-) diff --git a/text/3453-f16-and-f128.md b/text/3453-f16-and-f128.md index cda8810bb0b..229f9f0d5f9 100644 --- a/text/3453-f16-and-f128.md +++ b/text/3453-f16-and-f128.md @@ -1,29 +1,31 @@ -- Feature Name: `additional-float-types` -- Start Date: 2023-6-28 -- RFC PR: [rust-lang/rfcs#3451](https://github.com/rust-lang/rfcs/pull/3451) -- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000) +- Feature Name: `f16_and_f128` +- Start Date: 2023-07-02 +- RFC PR: [rust-lang/rfcs#3453](https://github.com/rust-lang/rfcs/pull/3453) +- Rust Issue: [rust-lang/rfcs#2629](https://github.com/rust-lang/rfcs/issues/2629) # Summary [summary]: #summary -This RFC proposes new floating point types `f16` and `f128` into core language and standard library. 
Also, this RFC introduces `f80`, `f64f64`, and `bf16` into `core::arch` for target-specific support, and `core::ffi::c_longdouble` for FFI interop. +This RFC proposes adding new IEEE-compliant floating point types `f16` and `f128` into the core language and standard library. We will provide a soft float implementation for all targets, and use hardware support where possible. # Motivation [motivation]: #motivation -[IEEE-754] standard defines binary floating point formats, including `binary16`, `binary32`, `binary64` and `binary128`. `binary32` and `binary64` correspond to `f32` and `f64` types in Rust, but there is currently no representation for `binary16` or `binary128`; these have uses in multiple scenarios (machine learning, scientific computing, etc.) and accepted by some modern architectures (by software or hardware), so this RFC proposes to add representations for them to the language. +The IEEE 754 standard defines many binary floating point formats. The most common of these types are the binary32 and binary64 formats, available in Rust as `f32` and `f64`. However, other formats are useful in various uncommon scenarios. The binary16 format is useful for situations where storage compactness is important and low precision is acceptable, such as HDR images, mesh quantization, and AI neural networks.[^1] The binary128 format is useful for situations where high precision is needed, such as scientific computing contexts. -In C/C++ world, there are already types representing these formats, along with more legacy non-standard types specific to some platform. Introduce them in a limited way would help improve FFI against such code. +The proposal is to add `f16` and `f128` types in Rust to represent IEEE 754 binary16 and binary128 respectively. Having `f16` and `f128` types in the Rust language would make Rust an optimal environment for more advanced use cases. 
Unlike third-party crates, this enables the compiler to perform optimizations for hardware with native support for these types, allows defining literals for these types, and would provide one single canonical data type for these floats, making it easier to exchange data between libraries. + +This RFC does not have the goal of covering the entire IEEE 754 standard, since it does not include `f256` and the decimal-float types. This RFC also does not have the goal of adding existing platform-specific float types such as x86's 80-bit double-extended-precision. This RFC does not make a judgement of whether those types should be added in the future, such discussion can be left to a future RFC, but it is not the goal of this RFC. # Guide-level explanation [guide-level-explanation]: #guide-level-explanation -`f16` and `f128` are primitive floating types, they can be used just like `f32` or `f64`. They always conform to the `binary16` and `binary128` formats defined in [IEEE-754], which means size of `f16` is always 16-bit, and size of `f128` is always 128-bit. +`f16` and `f128` are primitive floating types, they can be used like `f32` or `f64`. They always conform to binary16 and binary128 formats defined in the IEEE 754 standard, which means the size of `f16` is always 16-bit, the size of `f128` is always 128-bit, the amount of exponent and mantissa bits follows the standard, and all operations are IEEE 754-compliant. 
```rust let val1 = 1.0; // Default type is still f64 -let val2: f128 = 1.0; -let val3: f16 = 1.0; +let val2: f128 = 1.0; // Explicit f128 type +let val3: f16 = 1.0; // Explicit f16 type let val4 = 1.0f128; // Suffix of f128 literal let val5 = 1.0f16; // Suffix of f16 literal @@ -31,32 +33,18 @@ println!("Size of f128 in bytes: {}", std::mem::size_of_val(&val2)); // 16 println!("Size of f16 in bytes: {}", std::mem::size_of_val(&val3)); // 2 ``` -`f16` and `f128` will only be available on hardware that supports or natively emulates these type via LLVM's `half` and `fp128`, as mentioned in the [LLVM reference for floating types]. This means that the semantics of `f16` and `f128` are fixed as IEEE compliant in every supported platform, different from `long double` in C. - -Because not every target supports `f16` and `f128`, compiler provides conditional guards. - -```rust -#[cfg(target_has_f128)] -fn get_f128() -> f128 { 1.0f128 } - -#[cfg(target_has_f16)] -fn get_f16() -> f16 { 1.0f16 } -``` +Every target should support `f16` and `f128`, either in hardware or software. Most platforms do not have hardware support and therefore will need to use a software implementation. -All operators, constants and math functions defined for `f32` and `f64` in core, are also defined for `f16` and `f128`, and guarded by respective conditional guards. - -- The `f80` type is defined in `core::arch::{x86, x86_64}` as 80-bit extended precision. -- The `f64f64` type is defined in `core::arch::{powerpc, powerpc64}` and represent's PowerPC's non-IEEE double-double format (two `f64`s used to aproximate `f128`). -- `bf16` type is defined in `core::arch::{arm, aarch64, x86, x86_64}` and represents the "brain" float, a truncated `f32` with SIMD support on some hardware. These types do not have literal representation. 
+All [operators](https://doc.rust-lang.org/stable/std/primitive.f64.html#trait-implementations), [constants](https://doc.rust-lang.org/stable/std/f64/consts/), and [math functions](https://doc.rust-lang.org/stable/std/primitive.f64.html#implementations) defined for `f32` and `f64` in `core`, must also be defined for `f16` and `f128` in `core`. Similarly, all functionality defined for `f32` and `f64` in `std` must also be defined for `f16` and `f128`. # Reference-level explanation [reference-level-explanation]: #reference-level-explanation ## `f16` type -`f16` consists of 1 bit of sign, 5 bits of exponent, 10 bits of mantissa. It is always in accordance with [IEEE-754]. +`f16` consists of 1 bit of sign, 5 bits of exponent, 10 bits of mantissa. It is exactly equivalent to the 16-bit IEEE 754 binary16 [half-precision floating-point format](https://en.wikipedia.org/wiki/Half-precision_floating-point_format). -The following `From` and `TryFrom` traits are implemented for conversion between `f16` and other types: +The following traits will be implemented for conversion between `f16` and other types: ```rust impl From for f32 { /* ... */ } @@ -66,17 +54,17 @@ impl From for f16 { /* ... */ } impl From for f16 { /* ... */ } ``` -`f16` will generate `half` type in LLVM IR. +Conversions to `f16` will also be available with `as` casts, which allow for truncated conversions. -## `f128` type +`f16` will generate the `half` type in LLVM IR. It is also equivalent to C++ `std::float16_t`, C `_Float16`, and GCC `__fp16`. `f16` is ABI-compatible with all of these. `f16` values must be aligned in memory on a multiple of 16 bits, or 2 bytes. -`f128` consists of 1 bit of sign, 15 bits of exponent, 112 bits of mantissa. 
+On the hardware level, `f16` can be accelerated on RISC-V via [the Zfh or Zfhmin extensions](https://five-embeddev.com/riscv-isa-manual/latest/zfh.html), on x86 with AVX-512 via [the FP16 instruction set](https://en.wikipedia.org/wiki/AVX-512#FP16), on [some Arm platforms](https://developer.arm.com/documentation/100067/0607/Other-Compiler-specific-Features/Half-precision-floating-point-number-format), and on PowerISA via [VSX on PowerISA v3.1B and later](https://files.openpower.foundation/s/dAYSdGzTfW4j2r2). Most platforms do not have hardware support and therefore will need to use a software implementation. -`f128` is available for on targets having (1) hardware instructions or software emulation for 128-bit float type; (2) backend support for `f128` type on the target; (3) essential target features enabled (if any). +## `f128` type -The list of targets supporting `f128` type may change over time. Initially, it includes `powerpc64le-*`, `x86_64-*` and `aarch64-*` +`f128` consists of 1 bit of sign, 15 bits of exponent, 112 bits of mantissa. It is exactly equivalent to the 128-bit IEEE 754 binary128 [quadruple-precision floating-point format](https://en.wikipedia.org/wiki/Quadruple-precision_floating-point_format). -The following traits are also implemented for conversion between `f128` and other types: +The following traits will be implemented for conversion between `f128` and other types: ```rust impl From for f128 { /* ... */ } @@ -93,64 +81,54 @@ impl From for f128 { /* ... */ } impl From for f128 { /* ... */ } ``` -`f128` will generate `fp128` type in LLVM IR. - -For `f64f64` type, conversion intrinsics are available under `core::arch::{powerpc, powerpc64}`. For `f80` type, conversion intrinsics are available under `core::arch::{x86, x86_64}`. - -## Architectures specific types +Conversions from `i128`/`u128` to `f128` will also be available with `as` casts, which allow for truncated conversions. 
-- `core::arch::{x86, x86_64}::f80` generates LLVM's `x86_fp80`, 80-bit extended precision -- `core::arch::{powerpc, powerpc64}::f64f64` generates LLVM's `ppc_fp128`, a `f128` emulated type via dual `f64`s -- `core::arch::{arm, aarch64, x86, x86_64}::bf16` generates LLVM's `bfloat`, 16-bit "brain" floats used in AVX and ARMv8.6-A +`f128` will generate the `fp128` type in LLVM IR. It is also equivalent to C++ `std::float128_t`, C `_Float128`, and GCC `__float128`. `f128` is ABI-compatible with all of these. `f128` values must be aligned in memory on a multiple of 128 bits, or 16 bytes. -Where possible, `From` will be implemented to convert `f80` and `f64f64` to `f128`. - -## FFI types - -`core::ffi::c_longdouble` will always represent whatever `long double` does in C. Rust will defer to the compiler backend (LLVM) for what exactly this represents, but it will approximately be: - -- 80-bit extended precision (f80) on `x86` and `x86_64`: -- `f64` double precision with MSVC -- `f128` quadruple precision on AArch64 -- `f64f64` on PowerPC +On the hardware level, `f128` can be accelerated on RISC-V via [the Q extension](https://five-embeddev.com/riscv-isa-manual/latest/q.html), on IBM [S/390x G5 and later](https://doi.org/10.1147%2Frd.435.0707), and on PowerISA via [BFP128, an optional part of PowerISA v3.0C and later](https://files.openpower.foundation/s/XXFoRATEzSFtdG8). Most platforms do not have hardware support and therefore will need to use a software implementation. # Drawbacks [drawbacks]: #drawbacks -Unlike f32 and f64, although there are platform independent implementation of supplementary intrinsics on these types, not every target support the two types natively, with regards to the ABI. Adding them will be a challenge for handling different cases. +While `f32` and `f64` have very broad support in most hardware, hardware support for `f16` and `f128` is more niche. On most systems software emulation will be required. 
Therefore, the main drawback is implementation difficulty. # Rationale and alternatives [rationale-and-alternatives]: #rationale-and-alternatives There are some crates aiming for similar functionality: -- [f128](https://github.com/jkarns275/f128) provides binding to `__float128` type in GCC. -- [half](https://github.com/starkat99/half-rs) provides implementation of binary16 and bfloat16 types. +- [f128](https://github.com/jkarns275/f128) provides binding to the `__float128` type in GCC. +- [half](https://crates.io/crates/half) provides an implementation of binary16 and bfloat16 types. -However, besides the disadvantage of usage inconsistency between primitive type and type from crate, there are still issues around those bindings. +However, besides the disadvantage of usage inconsistency between primitive types and types from a crate, there are still issues around those bindings. -The availablity of additional float types depends on CPU/OS/ABI/features of different targets heavily. Evolution of LLVM may also unlock possibility of the types on new targets. Implementing them in compiler handles the stuff at the best location. +The ability to accelerate additional float types heavily depends on CPU/OS/ABI/features of different targets heavily. Evolution of LLVM may unlock possibilities of accelerating the types on new targets. Implementing them in the compiler allows the compiler to perform optimizations for hardware with native support for these types. -Most of such crates defines their type on top of C binding. But extended float type definition in C is complex and confusing. The meaning of `long double`, `_Float128` varies by targets or compiler options. Implementing in Rust compiler helps to maintain a stable codegen interface. - -And since third party tools also relies on Rust internal code, implementing additional float types in compiler also help the tools to recognize them. 
+Crates may define their type on top of a C binding, but extended float type definition in C is complex and confusing. The meaning of C types may vary by target and/or compiler options. Implementing `f16` and `f128` in the Rust compiler helps to maintain a stable codegen interface and ensures that all users have one single canonical definition of 16-bit and 128-bit float types, making it easier to exchange data between crates and languages. # Prior art [prior-art]: #prior-art -We have a previous proposal on `f16b` type to represent `bfloat16`: https://github.com/joshtriplett/rfcs/blob/f16b/text/0000-f16b.md +As noted above, there are crates that provide these types, one for `f16` and one for `f128`. Another prior art to reference is [RFC 1504 for int128](https://rust-lang.github.io/rfcs/1504-int128.html). + +Many other languages and compilers have support for these proposed float types. As mentioned above, C has `_Float16` and `_Float128`, and C++ has `std::float16_t` and `std::float128_t`. Glibc supports 128-bit floats in software on [many architectures](https://sourceware.org/git/?p=glibc.git;a=blob;f=NEWS;hb=81325b12b14c44887f1633a2c180a413afc2b504#l143). + +This RFC is designed as a subset of [RFC 3451](https://github.com/rust-lang/rfcs/pull/3451), which proposes adding a variety of float types, including ones not in this RFC designed for interoperability with other languages. + +Both this RFC and RFC 3451 are built upon the discussion in [issue 2629](https://github.com/rust-lang/rfcs/issues/2629). + +The main consensus of the discussion thus far is that more float types would be useful, especially the IEEE 754 types proposed in this RFC as `f16` and `f128`. Other types can be discussed in a future RFC. # Unresolved questions [unresolved-questions]: #unresolved-questions -This proposal does not introduce `c_longdouble` type for FFI, because it means one of `f128`, `f64f64`, `f64` or `f80` on different cases. Also for `c_float128`. 
+The main unresolved parts of this RFC are the implementation details in the context of the Rust compiler and standard library. The behavior of `f16` and `f128` is well-defined by the IEEE 754 standard, and is not up for debate. Whether these types should be included in the language is the main question of this RFC, which will be resolved when this RFC is accepted. + +Several future questions are intentionally left unresolved, and should be handled by another RFC. This RFC does not have the goal of covering the entire IEEE 754 standard, since it does not include `f256` and the decimal-float types. This RFC also does not have the goal of adding existing platform-specific float types such as x86's 80-bit double-extended-precision. # Future possibilities [future-possibilities]: #future-possibilities -More functions will be added to those platform dependent float types, like casting between `f128` and `f64f64`. - -For targets not supporting `f16` or `f128`, we may be able to introduce a 'limited mode', where the types are not fully functional, but user can load, store and call functions with such arguments. +See [RFC 3451](https://github.com/rust-lang/rfcs/pull/3451) for discussion about adding more float types. RFC 3451 is mostly a superset of this RFC. -[LLVM reference for floating types]: https://llvm.org/docs/LangRef.html#floating-point-types -[IEEE-754]: https://en.wikipedia.org/wiki/IEEE_754 +[^1]: Existing AI neural networks often use the [16-bit brain float format](https://en.wikipedia.org/wiki/Bfloat16_floating-point_format) instead of 16-bit half precision, which is a truncated version of 32-bit single precision. This is done to allow performing operations with 32-bit floats and quickly convert to 16-bit for storage. 
From 5ef7fac8ecd947b4389cb181e361a93126b47dd5 Mon Sep 17 00:00:00 2001 From: Aaron Franke Date: Sat, 7 Oct 2023 14:58:04 -0500 Subject: [PATCH 6/6] Minor tweaks from tgross35 Co-authored-by: Trevor Gross --- text/3453-f16-and-f128.md | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/text/3453-f16-and-f128.md b/text/3453-f16-and-f128.md index 229f9f0d5f9..321f85c75db 100644 --- a/text/3453-f16-and-f128.md +++ b/text/3453-f16-and-f128.md @@ -13,14 +13,14 @@ This RFC proposes adding new IEEE-compliant floating point types `f16` and `f128 The IEEE 754 standard defines many binary floating point formats. The most common of these types are the binary32 and binary64 formats, available in Rust as `f32` and `f64`. However, other formats are useful in various uncommon scenarios. The binary16 format is useful for situations where storage compactness is important and low precision is acceptable, such as HDR images, mesh quantization, and AI neural networks.[^1] The binary128 format is useful for situations where high precision is needed, such as scientific computing contexts. -The proposal is to add `f16` and `f128` types in Rust to represent IEEE 754 binary16 and binary128 respectively. Having `f16` and `f128` types in the Rust language would make Rust an optimal environment for more advanced use cases. Unlike third-party crates, this enables the compiler to perform optimizations for hardware with native support for these types, allows defining literals for these types, and would provide one single canonical data type for these floats, making it easier to exchange data between libraries. +This RFC proposes adding `f16` and `f128` primitive types in Rust to represent IEEE 754 binary16 and binary128, respectively. Having `f16` and `f128` types in the Rust language would allow Rust to better support the above-mentioned use cases, allowing for optimizations and native support that may not be possible in a third-party crate.
Additionally, providing a single canonical data type for these floating point representations will make it easier to exchange data between libraries. This RFC does not have the goal of covering the entire IEEE 754 standard, since it does not include `f256` and the decimal-float types. This RFC also does not have the goal of adding existing platform-specific float types such as x86's 80-bit double-extended-precision. This RFC does not make a judgement of whether those types should be added in the future, such discussion can be left to a future RFC, but it is not the goal of this RFC. # Guide-level explanation [guide-level-explanation]: #guide-level-explanation -`f16` and `f128` are primitive floating types, they can be used like `f32` or `f64`. They always conform to binary16 and binary128 formats defined in the IEEE 754 standard, which means the size of `f16` is always 16-bit, the size of `f128` is always 128-bit, the amount of exponent and mantissa bits follows the standard, and all operations are IEEE 754-compliant. +`f16` and `f128` are primitive floating-point types and can be used like `f32` or `f64`. They always conform to the binary16 and binary128 formats defined in the IEEE 754 standard, which means the size of `f16` is always 16 bits, the size of `f128` is always 128 bits, the number of exponent and mantissa bits follows the standard, and all operations are IEEE 754-compliant. Float literals of these types use the `f16` and `f128` suffixes, respectively. ```rust let val1 = 1.0; // Default type is still f64 @@ -83,7 +83,7 @@ impl From for f128 { /* ... */ } Conversions from `i128`/`u128` to `f128` will also be available with `as` casts, which allow for truncated conversions. -`f128` will generate the `fp128` type in LLVM IR. It is also equivalent to C++ `std::float128_t`, C `_Float128`, and GCC `__float128`. `f128` is ABI-compatible with all of these. `f128` values must be aligned in memory on a multiple of 128 bits, or 16 bytes.
+`f128` will generate the `fp128` type in LLVM IR. It is also equivalent to C++ `std::float128_t`, C `_Float128`, and GCC `__float128`. `f128` is ABI-compatible with all of these. `f128` values must be aligned in memory on a multiple of 128 bits, or 16 bytes. LLVM provides support for 128-bit float math operations. On the hardware level, `f128` can be accelerated on RISC-V via [the Q extension](https://five-embeddev.com/riscv-isa-manual/latest/q.html), on IBM [S/390x G5 and later](https://doi.org/10.1147%2Frd.435.0707), and on PowerISA via [BFP128, an optional part of PowerISA v3.0C and later](https://files.openpower.foundation/s/XXFoRATEzSFtdG8). Most platforms do not have hardware support and therefore will need to use a software implementation. @@ -111,9 +111,9 @@ Crates may define their type on top of a C binding, but extended float type defi As noted above, there are crates that provide these types, one for `f16` and one for `f128`. Another prior art to reference is [RFC 1504 for int128](https://rust-lang.github.io/rfcs/1504-int128.html). -Many other languages and compilers have support for these proposed float types. As mentioned above, C has `_Float16` and `_Float128`, and C++ has `std::float16_t` and `std::float128_t`. Glibc supports 128-bit floats in software on [many architectures](https://sourceware.org/git/?p=glibc.git;a=blob;f=NEWS;hb=81325b12b14c44887f1633a2c180a413afc2b504#l143). +Many other languages and compilers have support for these proposed float types. As mentioned above, C has `_Float16` and `_Float128` ([IEC 60559 WG 14 N2601](https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2601.pdf)), and C++ has `std::float16_t` and `std::float128_t` ([P1467R9](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p1467r9.html)). Glibc supports 128-bit floats in software on [many architectures](https://sourceware.org/git/?p=glibc.git;a=blob;f=NEWS;hb=81325b12b14c44887f1633a2c180a413afc2b504#l143). 
GCC also provides the `libquadmath` library for 128-bit float math operations. -This RFC is designed as a subset of [RFC 3451](https://github.com/rust-lang/rfcs/pull/3451), which proposes adding a variety of float types, including ones not in this RFC designed for interoperability with other languages. +This RFC was split from [RFC 3451], which proposed adding a variety of float types beyond what is in this RFC, including interoperability types like `c_longdouble`. The remaining portions of [RFC 3451] have since developed into [RFC 3456]. Both this RFC and RFC 3451 are built upon the discussion in [issue 2629](https://github.com/rust-lang/rfcs/issues/2629). @@ -129,6 +129,9 @@ Several future questions are intentionally left unresolved, and should be handle # Future possibilities [future-possibilities]: #future-possibilities -See [RFC 3451](https://github.com/rust-lang/rfcs/pull/3451) for discussion about adding more float types. RFC 3451 is mostly a superset of this RFC. +See [RFC 3456] for discussion about adding more float types, including `f80`, `bf16`, and `c_longdouble`, which extends the discussion in [RFC 3451]. [^1]: Existing AI neural networks often use the [16-bit brain float format](https://en.wikipedia.org/wiki/Bfloat16_floating-point_format) instead of 16-bit half precision, which is a truncated version of 32-bit single precision. This is done to allow performing operations with 32-bit floats and quickly convert to 16-bit for storage. + +[RFC 3451]: https://github.com/rust-lang/rfcs/pull/3451 +[RFC 3456]: https://github.com/rust-lang/rfcs/pull/3456
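As a rough illustration of the binary16 (1 sign, 5 exponent, 10 mantissa bits) and binary128 (1 sign, 15 exponent, 112 mantissa bits) layouts that `f16` and `f128` are required to follow, the sketch below decodes raw bit patterns held in a `u128` into an approximate `f64` on stable Rust, without the proposed primitives. The `Layout` struct and `decode` function are hypothetical helpers, not part of the proposal; the sketch skips infinities and NaN, and binary128 values lose precision (and may overflow or underflow) when squeezed into `f64`.

```rust
/// Hypothetical description of an IEEE 754 binary interchange format.
struct Layout {
    exp_bits: u32,
    mant_bits: u32,
}

const BINARY16: Layout = Layout { exp_bits: 5, mant_bits: 10 };
const BINARY128: Layout = Layout { exp_bits: 15, mant_bits: 112 };

/// Hypothetical helper: decode a finite value stored in the low bits of `bits`
/// into an approximate f64. The all-ones exponent (infinity/NaN) is not handled.
fn decode(bits: u128, layout: &Layout) -> f64 {
    let sign_bit = (bits >> (layout.exp_bits + layout.mant_bits)) & 1;
    let exp_field = (bits >> layout.mant_bits) & ((1u128 << layout.exp_bits) - 1);
    let mant_field = bits & ((1u128 << layout.mant_bits) - 1);
    // Exponent bias is 2^(exp_bits - 1) - 1: 15 for binary16, 16383 for binary128.
    let bias = (1i32 << (layout.exp_bits - 1)) - 1;

    let fraction = mant_field as f64 / 2f64.powi(layout.mant_bits as i32);
    let magnitude = if exp_field == 0 {
        // Subnormal: no implicit leading 1, exponent fixed at 1 - bias.
        fraction * 2f64.powi(1 - bias)
    } else {
        // Normal: implicit leading 1.
        (1.0 + fraction) * 2f64.powi(exp_field as i32 - bias)
    };
    if sign_bit == 1 { -magnitude } else { magnitude }
}

fn main() {
    // binary16: 0x3C00 encodes 1.0 and 0x7BFF encodes the largest finite value, 65504.
    println!("{}", decode(0x3C00, &BINARY16));
    println!("{}", decode(0x7BFF, &BINARY16));
    // binary128: a biased exponent of 0x3FFF with a zero mantissa encodes 1.0.
    println!("{}", decode(0x3FFFu128 << 112, &BINARY128));
}
```

Parameterizing over the field widths makes the point that the two formats differ only in how many bits go to the exponent and mantissa, which is what the guide-level statement about exponent and mantissa bits following the standard refers to.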