Additional float types #3451
@@ -0,0 +1,139 @@
- Feature Name: `additional-float-types`
- Start Date: 2023-6-28
- RFC PR: [rust-lang/rfcs#3451](https://github.com/rust-lang/rfcs/pull/3451)
- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000)

# Summary
[summary]: #summary

This RFC proposes adding the new floating-point types `f16` and `f128` to the core language and standard library. It also introduces `f80`, `doubledouble`, and `bf16` in `core::arch` for interoperability with existing native code.

# Motivation
[motivation]: #motivation

The IEEE-754 standard defines binary floating-point formats, including binary16, binary32, binary64, and binary128. binary32 and binary64 correspond to the `f32` and `f64` types in Rust, while binary16 and binary128 are used in several domains (machine learning, scientific computing, etc.) and are supported by some modern architectures, either in hardware or through software emulation.

In the C/C++ world, there are already types representing these formats, along with legacy non-standard types specific to particular platforms. Introducing them in a limited way would improve FFI with such code.

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

`f16` and `f128` are primitive floating-point types and can be used just like `f32` or `f64`. They always conform to the binary16 and binary128 formats defined in IEEE-754, which means `f16` is always 16 bits wide and `f128` is always 128 bits wide.

```rust
let val1 = 1.0; // Default type is still f64
let val2: f128 = 1.0;
let val3: f16 = 1.0;
let val4 = 1.0f128; // Suffix of f128 literal
let val5 = 1.0f16; // Suffix of f16 literal

println!("Size of f128 in bytes: {}", std::mem::size_of_val(&val2)); // 16
println!("Size of f16 in bytes: {}", std::mem::size_of_val(&val3)); // 2
```

Because not every target supports `f16` and `f128`, the compiler provides conditional compilation guards for them:

```rust
#[cfg(target_has_f128)]
fn get_f128() -> f128 { 1.0f128 }

#[cfg(target_has_f16)]
fn get_f16() -> f16 { 1.0f16 }
```

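As a rough illustration of how such a guard might be combined with a fallback, the sketch below picks the widest available float type behind an alias. The names `Widest` and `sum_widest` are hypothetical, and `target_has_f128` is the cfg name proposed in this RFC rather than something current compilers set, so on today's toolchains only the `f64` branch is compiled.

```rust
// `target_has_f128` is the cfg proposed in this RFC; compilers that do not
// know it treat it as unset and take the fallback below.
#[cfg(target_has_f128)]
pub type Widest = f128;

// Fallback for targets without `f128` (hypothetical alias, not part of the RFC).
#[cfg(not(target_has_f128))]
pub type Widest = f64;

/// Sum `f32` values in the widest float type the target offers.
/// Relies only on `From<f32>` and `Sum`, which `f64` has today and which
/// the RFC lists (`From<f32>`) or implies (operators) for `f128`.
pub fn sum_widest(values: &[f32]) -> Widest {
    values.iter().map(|&v| Widest::from(v)).sum()
}
```

Callers written against `Widest` would not need their own `cfg` attributes; whether such an alias belongs in user code or in a library is a design choice this RFC does not take a position on.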
All operators, constants, and math functions defined for `f32` and `f64` in `core` are also defined for `f16` and `f128`, guarded by the respective conditional guards.

The `f80` type is defined in `core::arch::{x86, x86_64}`. The `doubledouble` type is defined in `core::arch::{powerpc, powerpc64}`. The `bf16` type is defined in `core::arch::{arm, aarch64, x86, x86_64}`. None of them has a literal representation.

> Thanks, I did not add the mention of

> Yep, bf16 should be simulated when the target arch is not supported

> For people that have never heard of Also the RFC needs to say what their semantics are, if IEEE doesn't specify them.

> nobody seems to agree on

# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

## `f16` type

`f16` consists of 1 sign bit, 5 exponent bits, and 10 mantissa bits.

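The binary16 layout can be made concrete with a small decoder. The following is a minimal sketch, not part of the RFC: the helper name `f16_bits_to_f32` is hypothetical, it operates on a raw `u16` bit pattern so that it compiles without the proposed `f16` type, and it assumes the IEEE-754 binary16 exponent bias of 15.

```rust
/// Decode an IEEE-754 binary16 bit pattern into an `f32`.
/// Every binary16 value is exactly representable as an `f32`.
fn f16_bits_to_f32(bits: u16) -> f32 {
    let sign = ((bits >> 15) & 1) as u32; // 1 sign bit
    let exponent = ((bits >> 10) & 0x1f) as u32; // 5 exponent bits, bias 15
    let fraction = (bits & 0x3ff) as u32; // 10 mantissa bits

    match exponent {
        // Zero and subnormals: value = fraction * 2^-24.
        0 => {
            let magnitude = fraction as f32 / 16_777_216.0; // 2^24
            if sign == 1 { -magnitude } else { magnitude }
        }
        // Infinity (fraction == 0) or NaN (fraction != 0).
        0x1f => f32::from_bits((sign << 31) | (0xffu32 << 23) | (fraction << 13)),
        // Normal numbers: rebias the exponent from 15 to 127 and widen the
        // fraction from 10 to 23 bits.
        _ => f32::from_bits((sign << 31) | ((exponent + 112) << 23) | (fraction << 13)),
    }
}
```

For instance, `f16_bits_to_f32(0x3C00)` is `1.0` and `f16_bits_to_f32(0x7BFF)` is `65504.0`, the largest finite binary16 value. With 11 significant bits (10 stored plus one implicit), binary16 represents every `u8` and `i8` exactly but not every `u16`, which is consistent with the integer `From` impls listed for `f16`.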
The following `From` and `TryFrom` traits are implemented for conversion between `f16` and other types:

```rust
impl From<f16> for f32 { /* ... */ }
impl From<f16> for f64 { /* ... */ }
impl From<bool> for f16 { /* ... */ }
impl From<u8> for f16 { /* ... */ }
impl From<i8> for f16 { /* ... */ }
```

`f16` is lowered to the `half` type in LLVM IR.

## `f128` type

`f128` consists of 1 sign bit, 15 exponent bits, and 112 mantissa bits.

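To illustrate the binary128 layout, the sketch below splits a raw 128-bit pattern into its three fields. It is not part of the RFC: `Binary128Fields` and `split_binary128` are hypothetical names, and the code uses `u128` so that it does not depend on the proposed `f128` type. The exponent bias of 16383 is the one IEEE-754 defines for binary128.

```rust
/// The three fields of an IEEE-754 binary128 bit pattern.
#[derive(Debug, PartialEq, Eq)]
struct Binary128Fields {
    sign: bool,
    biased_exponent: u16, // 15 bits, bias 16383
    fraction: u128,       // low 112 bits
}

/// Split a raw 128-bit pattern into sign, exponent, and fraction.
fn split_binary128(bits: u128) -> Binary128Fields {
    Binary128Fields {
        sign: (bits >> 127) == 1,
        biased_exponent: ((bits >> 112) & 0x7fff) as u16,
        fraction: bits & ((1u128 << 112) - 1),
    }
}
```

Under this layout, `1.0` has the bit pattern `0x3fff << 112`: sign clear, biased exponent 16383, fraction zero.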
`f128` is available on targets that have (1) hardware instructions or software emulation for a 128-bit float type; (2) backend support for the `f128` type on the target; and (3) the essential target features enabled (if any).

The list of targets supporting the `f128` type may change over time. Initially, it includes `powerpc64le-*`.

> I don't know for sure what targets support it, but we should aim to at least support the major 64-bit CPUs. There is also a risc target per @aaronfranke here #2629 (comment) but I'm not sure how

The following traits are also implemented for conversion between `f128` and other types:

```rust
impl From<f16> for f128 { /* ... */ }
impl From<f32> for f128 { /* ... */ }
impl From<f64> for f128 { /* ... */ }
impl From<bool> for f128 { /* ... */ }
impl From<u8> for f128 { /* ... */ }
impl From<i8> for f128 { /* ... */ }
impl From<u16> for f128 { /* ... */ }
impl From<i16> for f128 { /* ... */ }
impl From<u32> for f128 { /* ... */ }
impl From<i32> for f128 { /* ... */ }
impl From<u64> for f128 { /* ... */ }
impl From<i64> for f128 { /* ... */ }
```

`f128` is lowered to the `fp128` type in LLVM IR.

`std::simd` defines new vector types with `f16` or `f128` elements: `f16x2`, `f16x4`, `f16x8`, `f16x16`, `f16x32`, `f128x2`, `f128x4`.

For the `doubledouble` type, conversion intrinsics are available under `core::arch::{powerpc, powerpc64}`. For the `f80` type, conversion intrinsics are available under `core::arch::{x86, x86_64}`.

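For readers unfamiliar with the double-double format: it represents a value as the unevaluated sum of two `f64` values, a high part and a low part. The sketch below shows the idea using Knuth's TwoSum, which splits `a + b` into a rounded sum and its exact rounding error. It is an illustration only: the `DoubleDouble` struct and `two_sum` function are hypothetical names, not the proposed `core::arch` type or its intrinsics.

```rust
/// A value stored as the unevaluated sum `hi + lo` of two `f64`s.
/// Hypothetical illustration type, not the proposed `doubledouble`.
#[derive(Clone, Copy, Debug, PartialEq)]
struct DoubleDouble {
    hi: f64,
    lo: f64,
}

/// Knuth's TwoSum: returns `hi` and `lo` such that `hi` is the rounded
/// sum of `a` and `b`, and `hi + lo` equals `a + b` exactly
/// (barring overflow).
fn two_sum(a: f64, b: f64) -> DoubleDouble {
    let hi = a + b;
    let b_virtual = hi - a;
    let a_virtual = hi - b_virtual;
    // Whatever was rounded away from each operand ends up in `lo`.
    let lo = (a - a_virtual) + (b - b_virtual);
    DoubleDouble { hi, lo }
}
```

Adding `1.0` and `2^-60` in plain `f64` just returns `1.0`, while `two_sum` keeps the small term in `lo`. This is why the format offers more precision than `f64` but, unlike binary128, no wider exponent range.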
## Architecture-specific types

As for the non-standard types, `f80` is lowered to `x86_fp80`, `doubledouble` to `ppc_fp128`, and `bf16` to `bfloat` in LLVM IR.

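Since `bf16` is not an IEEE-754 interchange format, a short sketch of its usual definition may help. bfloat16 keeps the sign bit and all 8 exponent bits of binary32 but only the top 7 mantissa bits, so a bfloat16 value is the upper half of an `f32` bit pattern. The helper names below are hypothetical, and the rounding mode (round to nearest, ties to even) is an assumption: this RFC does not specify how narrowing conversions round.

```rust
/// Narrow an `f32` to a bfloat16 bit pattern.
/// Assumes round-to-nearest, ties-to-even (not specified by the RFC).
fn f32_to_bf16_bits(x: f32) -> u16 {
    let bits = x.to_bits();
    if x.is_nan() {
        // Set a mantissa bit so truncation cannot turn a NaN into infinity.
        return ((bits >> 16) as u16) | 0x0040;
    }
    // Add 0x7fff, plus 1 if the kept low bit is odd, then truncate.
    let round_bias = 0x7fff + ((bits >> 16) & 1);
    ((bits + round_bias) >> 16) as u16
}

/// Widen a bfloat16 bit pattern to `f32`; this direction is always exact.
fn bf16_bits_to_f32(bits: u16) -> f32 {
    f32::from_bits((bits as u32) << 16)
}
```

Because the exponent field matches binary32, bfloat16 covers the same range as `f32` with far less precision, in contrast to binary16, which trades range for precision.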
# Drawbacks
[drawbacks]: #drawbacks

Unlike `f32` and `f64`, not every target supports these two types natively with regard to the ABI, even though platform-independent implementations of the supplementary intrinsics exist. Adding them means handling many different cases, which will be a challenge.

Comment on lines +117 to +120

> Not every platform supports f32 and f64 natively either. For example, RISC-V without the F or D extensions (ex: ISA string of Whatever emulation Rust already does to support

> For riscv without hardware float support there is a defined soft-float ABI. There is not for f16/bf16. Same for x86_64. Many other architectures likely don't have a defined soft-float abi for f128 either. And as I understand it AArch64 doesn't have a soft-float abi at all as Neon support is mandatory and floats are even allowed inside the kernel unlike eg x86_64 where floats are disabled in the kernel to avoid having to save them on syscalls.

# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives

There are some crates aiming for similar functionality:

- [f128](https://github.com/jkarns275/f128) provides bindings to the `__float128` type in GCC.
- [half](https://github.com/starkat99/half-rs) provides implementations of the binary16 and bfloat16 types.

However, besides the inconsistency between using a primitive type and using a type from a crate, there are still issues with those bindings.

The availability of additional float types depends heavily on the CPU, OS, ABI, and enabled features of each target. The evolution of LLVM may also make the types available on new targets. Implementing them in the compiler handles these concerns in the most appropriate place.

Most such crates define their types on top of C bindings. But the definition of extended float types in C is complex and confusing: the meaning of `long double` and `_Float128` varies by target and compiler options. Implementing them in the Rust compiler helps maintain a stable codegen interface.

And since third-party tools also rely on Rust internal code, implementing the additional float types in the compiler also helps those tools recognize them.

# Prior art
[prior-art]: #prior-art

There is a previous proposal for an `f16b` type to represent `bfloat16`: https://github.com/joshtriplett/rfcs/blob/f16b/text/0000-f16b.md

# Unresolved questions
[unresolved-questions]: #unresolved-questions

This proposal does not introduce a `c_longdouble` type for FFI, because it would mean one of `f128`, `doubledouble`, `f64`, or `f80` depending on the target. The same applies to `c_float128`.

# Future possibilities
[future-possibilities]: #future-possibilities

More functions will be added for the platform-dependent float types, such as casts between `f128` and `doubledouble`.

For targets that do not support `f16` or `f128`, we may be able to introduce a 'limited mode' in which the types are not fully functional, but users can still load and store values and call functions with such arguments.

> I suggest using these symbols, all starting with an `f` prefix for consistency:
>
> - `f128`: C `_Float128`, LLVM `fp128`, GCC `__float128`
> - `f16`: C `_Float16`, LLVM `half`, GCC `__fp16`
> - `f16b`: C++ `std::bfloat16_t`, LLVM `bfloat`, GCC `__bf16`
> - `f80e`: LLVM `x86_fp80`, GCC `__float80`
> - `f64f64`: LLVM `ppc_fp128`, GCC `__ibm128` `doubledouble`, or maybe `f64f64e` means not standard
>
> And these symbols can be used as literal suffix as is

> +1 for `f16b` in favor over `bf16` for consistency, I liked that about @joshtriplett's original proposal.
>
> I don't think we should introduce something like `f128x` - `doubledouble` or something like the GCC or LLVM types would be better IMO. Reason being, it's kind of ambiguous and specific to one architecture - PowerPC is even moving away from it. Better to give it an unambiguous name since it will be used relatively rarely.

> i think we should use `bf16` rather than `f16b` since that is widely recognized whereas `f16b` isn't, and `f64_f64` instead of `f128x` since it really is 2 `f64` values and could be easily emulated on any other architecture (do not use `f64x2` since that's already used by `Simd<f64, 2>`). also `f<N>x` names are more or less defined by IEEE 754 to be wider than `N` bits, so e.g. `f64x` would be approximately any type wider than `f64` but less than `f128` such as `f80`, `f16x` could be the `f24` type used by some GPUs for depth buffers. so logically `f80x` would need to be more than 80 bits and `f128x` would need to be more than 128 bits.

> `f64f64` is OK, no need for the underscore, it looks like `doubledouble`; `f80e` instead of `f80x` if `f80x` is not acceptable.
>
> Still vote for `f16b`. It's Rust specific, we can create a relationship between `bf16` and `f16b` in Rust, that won't be a burden.

> What about `f64x2` to indicate it's two f64 glued together?

> do not use f64x2 since that's already used by Simd<f64, 2>