# Allow floating-point operations to provide extra precision than specified, as an optimization (#2686)
- Feature Name: `allow_extra_fp_precision`
- Start Date: 2019-04-08
- RFC PR: [rust-lang/rfcs#2686](https://github.com/rust-lang/rfcs/pull/2686)
- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000)

# Summary
[summary]: #summary

Update the Rust specification to allow floating-point operations to provide
*more* precision than specified, but never less; this allows many safe
optimizations.

# Motivation
[motivation]: #motivation

Some platforms provide instructions to run a series of floating-point
operations quickly, such as fused multiply-add instructions; using these
instructions can provide performance wins of 2x or more. These instructions
may provide *more* precision than IEEE floating-point operations require,
such as by performing multiple operations before rounding or otherwise losing
precision. Similarly, high-performance floating-point code could perform
multiple operations in higher-precision floating-point registers before
converting back to a lower-precision format.
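As an illustration of the kind of code that benefits, consider a dot product. Today, fused operations must be requested explicitly via `mul_add`; under this RFC a compiler could emit the same fused instructions for the plain `acc + x * y` form. A minimal sketch (the `dot` function is illustrative, not part of any API):

```rust
/// Dot product with explicit fused multiply-adds. Each `mul_add`
/// computes `x * y + acc` with a single rounding step; under this
/// RFC, a compiler could emit the same fused instructions for the
/// plain `acc + x * y` form automatically.
fn dot(xs: &[f64], ys: &[f64]) -> f64 {
    xs.iter().zip(ys).fold(0.0, |acc, (x, y)| x.mul_add(*y, acc))
}

fn main() {
    let d = dot(&[1.0, 2.0, 3.0], &[4.0, 5.0, 6.0]);
    // 4 + 10 + 18 is exact in f64, so fused and unfused forms agree here.
    assert_eq!(d, 32.0);
    println!("{d}");
}
```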

In general, providing more precision than required should not cause a
mathematical algorithm to fail or to lose numeric accuracy.

This RFC proposes allowing floating-point types to perform intermediate
calculations using more precision than the type itself, as long as they provide
*at least* as much precision as the IEEE 754 standard requires.

See the [prior art section](#prior-art) for precedent in several other
languages and compilers.

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

After this RFC, we could explain floating-point operations as follows:

Floating-point operations in Rust have a guaranteed minimum accuracy, which
specifies how far the result may differ from the infinitely accurate,
mathematically exact answer. The implementation of Rust for any target platform
must provide at least that much accuracy. In some cases, Rust can perform
operations with higher accuracy than required, and doing so provides greater
performance (such as by removing intermediate rounding steps). This will never
make any floating-point operation *less* accurate, but it can make
floating-point operations *more* accurate, bringing the result closer to the
mathematically exact answer.
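A small sketch of what "more accurate" means here, using `mul_add` to stand in for the extra intermediate precision a compiler could use. For the stored binary value of `0.1`, the exact value of `a * 10.0 - 1.0` is 2⁻⁵⁴ (the representation error of `0.1` scaled by ten); the doubly-rounded form loses it, while the single-rounding form returns it exactly:

```rust
fn main() {
    let a: f64 = 0.1; // actually 0.1 + ~5.55e-18 in binary64

    // Intermediate rounding: a * 10.0 rounds to exactly 1.0,
    // so the computed difference is 0.0.
    let plain = a * 10.0 - 1.0;

    // One rounding at the end: the exact value of a*10.0 - 1.0
    // for the stored operand is 2^-54, returned exactly.
    let fused = a.mul_add(10.0, -1.0);

    assert_eq!(plain, 0.0);
    assert_eq!(fused, 2f64.powi(-54));
    println!("plain = {plain:e}, fused = {fused:e}");
}
```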
# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

Currently, Rust's [specification for floating-point
types](https://doc.rust-lang.org/reference/types/numeric.html#floating-point-types)
states that:

> The IEEE 754-2008 "binary32" and "binary64" floating-point types are `f32` and `f64`, respectively.

This RFC proposes updating that definition as follows:

The `f32` and `f64` types represent the IEEE 754-2008 "binary32" and "binary64"
floating-point types. Operations on those types must provide no less
precision than the IEEE standard requires; such operations may provide *more*
precision than the standard requires, such as by doing a series of operations
with higher precision before storing a value of the desired precision.

rustc may provide a codegen (`-C`) option to disable this behavior, such as `-C
disable-extra-fp-precision`. Rust may also provide an attribute to disable this
behavior from within code, such as `#[disable_extra_fp_precision]`.

# Drawbacks
[drawbacks]: #drawbacks

If Rust already provided bit-for-bit identical floating-point computations
across platforms, this change could potentially allow floating-point
computations to differ by platform (though never below the standards-required
accuracy). However, standards-compliant implementations of math functions on
floating-point values may already vary slightly by platform, sufficiently so to
produce different binary results. This proposal can never make results *less*
accurate; it can only make results *more* accurate.

> **Review:** I'm the last person to argue that we have any sort of bit-for-bit
> reproducibility of floating-point calculations across platforms or even
> optimization levels (I know in regrettable detail many of the reasons why
> not), but it seems like a notable further step to make even the basic
> arithmetic operations dependent on the optimization level, even for normal
> inputs, even on the (numerous) targets where they currently are not.

> **Review:** If the intention of the user was for their Rust program to
> actually have the semantics of the code they actually wrote (e.g., first do a
> multiplication, then a separate addition), then this change takes that away.
> If the user wants higher precision they can write `a.mul_add(b, c)`
> explicitly.
>
> **Reply:** What we're discussing in this RFC is, precisely, 1) whether that's
> actually the definition of the Rust language, and 2) whether it should be.
> Meanwhile, I'm not seeing any indication that that's actually the behavior
> Rust developers expect to get, or that they expect to pay 2x performance by
> default to get it. I'm already editing the RFC to require (rather than
> suggest) an attribute for this.

# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives

We could provide a separate set of types and allow extra accuracy in their
operations; however, this would create ABI differences between floating-point
functions, and the longer, less-well-known types seem unlikely to see
widespread use.

> **Review:** Not necessarily; these wrappers could be …
>
> **Reply:** I mean this in the sense that changing from one to the other would
> be an incompatible API change in a crate. I'll clarify that.
>
> **Review:** If the algorithm does not care about contraction, it might also
> not care about … In other words, once one starts walking down the road of
> lifting assumptions about floating-point arithmetic, contraction is just one
> of the many different assumptions that one might want to lift. Making it
> special does not solve the issue of these APIs having to be generic over
> these.
>
> **Review:** I do not think we have anywhere near a smooth enough UX for
> working with wrappers around primitive arithmetic types for me to seriously
> consider them as a solution for licensing fast-math transformations. There
> are serious papercuts even when trying to be generic over the existing
> primitive types (e.g., you can't use literals without wrapping them in
> ugly …). I also think it's quite questionable whether these should be
> properties of the type. It kind of fits "no infinities/NaNs/etc.", but other
> things are fundamentally about particular operations and therefore may be OK
> in one code region but not in another, even if the same data is being
> operated on.

> **Review:** Prior art shows that people who need or want this are going to
> use them, e.g., "less-well-known" flags like …
>
> **Reply:** Separate types are harder to drop into a code base than a compiler
> flag or attribute, though, because using the type in one place generally
> leads to type errors (and the need for conversions to solve them) at the
> interface with other code.

We could provide an option to enable extra accuracy for the default
floating-point types, but disable it by default. This would leave the majority
of floating-point code unable to use these optimizations, however; defaults
matter, and the majority of code seems likely to use the defaults.

We could do nothing, and require code to use `a.mul_add(b, c)` for
optimization; however, this would not allow for similar future optimizations,
and would not allow code to enable this optimization without substantial
code changes.

> **Review:** We could provide a clippy lint that recognizes …
>
> **Reply:** On this particular point a clippy lint is helpful but not
> necessarily enough. Once the optimizer chews through layers of code it can
> end up at an …

We could narrow the scope of optimization opportunities to *only* include
floating-point contraction and exclude other precision-increasing operations.
See the [future possibilities](#future-possibilities) section for further
discussion on this point.
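To make the "substantial code changes" point concrete, here is a sketch (function names are illustrative) of the rewrite burden today: each contractible expression must be restructured by hand into explicit `mul_add` calls:

```rust
// Evaluating a*x^2 + b*x + c, two ways.

// Natural form: two multiplies and two adds, each rounded separately.
fn poly_plain(a: f64, b: f64, c: f64, x: f64) -> f64 {
    a * x * x + b * x + c
}

// Hand-contracted Horner form: the programmer must restructure the
// expression to expose the fused operations.
fn poly_fused(a: f64, b: f64, c: f64, x: f64) -> f64 {
    a.mul_add(x, b).mul_add(x, c)
}

fn main() {
    // With inputs where every step is exact, the two forms agree:
    // 1*4 + 2*2 + 3 = 11.
    assert_eq!(poly_plain(1.0, 2.0, 3.0, 2.0), 11.0);
    assert_eq!(poly_fused(1.0, 2.0, 3.0, 2.0), 11.0);
}
```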

# Prior art
[prior-art]: #prior-art

This has precedent in several other languages and compilers:

- [C11](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf) allows
  this with the `STDC FP_CONTRACT` pragma enabled, and the default state
  of that pragma is implementation-defined. GCC enables this pragma by
  default, [as does the Microsoft C
  compiler](https://docs.microsoft.com/en-us/cpp/preprocessor/fp-contract?view=vs-2019).

  > **Review:** Note that GCC defaults to …
  >
  > **Reply:** Based on some careful research, as far as I can tell, GCC's …
  >
  > **Review:** My understanding is: the C standard only allows FMA synthesis
  > within a source-level expression. This is extremely inconvenient to respect
  > at the IR level (you'd have to track which source-level expression each
  > operation comes from), so Clang implements this option too, but it defaults
  > to standard compliance by performing contraction in the frontend, where
  > source-level boundaries are still available.

- [The C++ standard](http://eel.is/c++draft/expr.pre#6) states that "The
  values of the floating operands and the results of floating
  expressions may be represented in greater precision and range than
  that required by the type; the types are not changed thereby."

- The [Fortran standard](https://www.fortran.com/F77_std/rjcnf0001-sh-6.html#sh-6.6.4)
  states that "the processor may evaluate any mathematically equivalent
  expression", where "Two arithmetic expressions are mathematically
  equivalent if, for all possible values of their primaries, their
  mathematical values are equal. However, mathematically equivalent
  arithmetic expressions may produce different computational results."

  > **Review:** I'm not familiar with Fortran (or at least this aspect of it),
  > but this quote seems to license far more than contraction, e.g., all sorts
  > of …
  >
  > **Reply:** @rkruppe That's correct; Fortran also allows things like
  > reassociation and commutation, as long as you never ignore parentheses.

# Unresolved questions
[unresolved-questions]: #unresolved-questions

- Should we provide a rustc codegen option?
- Should we provide an attribute?

# Future possibilities
[future-possibilities]: #future-possibilities

The initial implementation of this RFC can simply enable floating-point
contraction within LLVM (and equivalent options in future codegen backends).
However, this RFC also allows other precision-increasing optimizations; in
particular, this RFC would allow the implementation of `f32` or future `f16`
formats using higher-precision registers, without having to apply rounding
after each operation.
> **Review:** This is incorrect. One simple counter-example is `x * x - y * y`,
> which is non-negative for all x and y whose squares are finite floats, but if
> the expression is contracted to `x.mul_add(x, -y * y)` then it can have
> negative results. This can of course snowball into even worse issues
> downstream; e.g., if this is fed into `sqrt()` to get the 2D Euclidean norm,
> contraction can cause you to end up with NaNs on perfectly innocuous vectors.
>
> **Reply:** Any programs that have a problem with that will need to pass
> non-default compiler options on many common C, C++, and Fortran compilers.
> That said, I'll adjust the language.
>
> **Review:** Some C, C++, and Fortran compilers do this (gcc, msvc), and some
> don't (clang). If this were a universally good idea, all of them would do it,
> but that is not the case. That is, those languages are prior art, but I'm
> really missing from the prior art section why this would actually be a good
> idea: are programmers using those languages happy with that "feature"? A sign
> change trickling down your application depending on the optimization level
> (or even debug-information level) can be extremely hard to debug in practice.
> So IMO this issue raised by @rkruppe deserves more analysis than a language
> adjustment.
>
> **Reply:** The beginning of the RFC already makes the rationale quite clear:
> this allows for optimizations on the scale of 2x performance improvements,
> while never reducing the accuracy of a calculation compared to the
> mathematically accurate result.
>
> **Review:** @rkruppe Looking again at your example, I think there's something
> missing from it. Counter-example to the claim as stated: x = 2.0, y = 4.0.
> Both x and y square to finite floats, and `x*x - y*y` should absolutely be
> negative. I don't think those properties alone are enough to reasonably
> expect that you can call `sqrt` on the result and get a non-imaginary answer.
>
> **Reply:** Ugh, sorry, you're right. That's what I get for repeating the
> argument from memory and filling in the gaps without thinking too long. In
> general, of course, x² may be smaller than y². The problematic case is only
> when x = y (plus the aforementioned side conditions); in that case
> `(x * x) - (y * y)` is zero, but with FMA it can be negative. Another
> example, I am told, is complex multiplication when multiplying a number by
> its conjugate.
>
> **Review:** I suspect this is not a valid statement. The original is, in
> pseudocode, `round64(round64(x * x) - round64(y * y))`; the contraction gives
> `round64(x * x - round64(y * y))`. For the result to go negative only in the
> contracted case would require `round64(x * x)` to round up to at least
> `round64(y * y)` while `x * x` itself is below `round64(y * y)`; so
> `round64(x * x) == round64(y * y)` by the "nearest" element of rounding (it
> can't cross `round64(y * y)`). Since we're rounding to nearest, `x * x` is
> less than half a unit of precision away from `round64(y * y)`. This in turn
> means that `x * x - round64(y * y)`, while negative in this case, is less
> than half a unit of precision away from 0, so the outer `round64()` will
> round it up to 0.
>
> **Reply:** If you use y = x, then if `round64(x * x)` rounds up, it's easy to
> see that `round64(x * x - round64(x * x))` is negative. This does not round
> to zero, because units of precision are not absolute but relative (think
> significant figures in scientific notation). For reference (and more
> interesting floating-point information!) see the "fmadd" section of
> https://randomascii.wordpress.com/2013/07/16/floating-point-determinism/
>
> **Review:** So the conclusion, if I read this correctly, is that indeed
> increasing precision locally in some sub-computations can reduce the
> precision of the overall computation, right?
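The disagreement above can be checked directly. A minimal sketch, using `mul_add` to force contraction explicitly; the value of `x` is chosen (my construction, not from the thread) so that `x * x` rounds upward in binary64:

```rust
fn main() {
    // x = 1 + 3*2^-28 is exactly representable, and
    // x^2 = 1 + 3*2^-27 + 9*2^-56, where 9*2^-56 is 9/16 of an ulp
    // (more than half) above the f64 grid, so `x * x` rounds *up*.
    let x: f64 = 1.0 + 3.0 * 2f64.powi(-28);
    let y = x;

    // Uncontracted: round(round(x*x) - round(y*y)) is exactly 0.0.
    let plain = x * x - y * y;

    // Contracted: fma(x, x, -(y*y)) subtracts the *rounded* y*y from
    // the *exact* x*x with one final rounding, exposing the rounding
    // error of x*x as a negative result (-7 * 2^-56).
    let fused = x.mul_add(x, -(y * y));

    assert_eq!(plain, 0.0);
    assert!(fused < 0.0); // so fused.sqrt() would be NaN
}
```

This confirms both sides of the resolution: contraction never makes the individual operation less accurate, yet it can flip the sign of a catastrophically cancelling expression.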