Allow floating-point operations to provide extra precision than specified, as an optimization #2686
Conversation
Allow floating-point operations to provide extra precision than specified, as an optimization. This enables optimizations such as fused multiply-add operations by default.
cc @fenrus75
I really want Rust to have a good story for licensing floating point optimizations, including but not limited to contraction. However, simply turning on contraction by default is not a good step in that direction. Contrary to what the RFC claims, contraction is not "safe" (meaning that it breaks otherwise-working programs; obviously there's no memory safety at stake), and we have not previously reserved the right to do this or given any other indication to users that it might happen.
Let's design a way to opt into and out of this behavior at the crate/module/function level first, and once that's done we can look at how to make more code use it automatically. A fine-grained opt-in and opt-out is very useful even if we end up changing the default, e.g., to ensure code that breaks under contraction can be compiled as part of a crate graph that generally has contraction enabled. There's plenty of design work to keep us busy even without touching defaults:
- compiler options or attributes or ...?
- how does it propagate from callers into callees, if at all? (generally hard problem, but IMO a good story for this is just as valuable as providing the basic feature in the first place)
- what transformations are licensed exactly? (e.g., do we want roughly what the C standard allows, or do we want more like GCC does?)
> back to a lower-precision format.
>
> In general, providing more precision than required should not cause a mathematical algorithm to fail or to lose numeric accuracy.
This is incorrect. One simple counter-example is `x * x - y * y`, which is non-negative for all x and y whose squares are finite floats, but if the expression is contracted to `x.mul_add(x, - y * y)` then it can have negative results. This can of course snowball into even worse issues downstream, e.g., if this is fed into `sqrt()` to get the 2D euclidean norm, contraction can cause you to end up with NaNs on perfectly innocuous vectors.
Any programs that have a problem with that will need to pass non-default compiler options on many common C, C++, and Fortran compilers.
That said, I'll adjust the language.
> Any programs that have a problem with that will need to pass non-default compiler options on many common C, C++, and Fortran compilers.

*Some* C, C++, and Fortran compilers do this (gcc, msvc), some don't (clang). If this were a universally good idea, all of them would do this, but this is not the case. That is, those languages are prior art, but I'm really missing from the prior art section why this would actually be a good idea. Are programmers using those languages happy with that "feature"?
A sign change trickling down your application depending on the optimization level (or even debug-information level) can be extremely hard to debug in practice. So IMO this issue raised by @rkruppe deserves more analysis than a language adjustment.
> why this would actually be a good idea
>
> are programmers using those languages happy with that "feature"?
The beginning of the RFC already makes the rationale quite clear: this allows for optimizations on the scale of 2x performance improvements, while never reducing the accuracy of a calculation compared to the mathematically accurate result.
@rkruppe Looking again at your example, I think there's something missing from it? You said:
> One simple counter-example is `x * x - y * y`, which is non-negative for all x and y whose squares are finite floats
Counter-example to that: `x = 2.0`, `y = 4.0`. Both `x` and `y` square to finite floats, and `x*x - y*y` should absolutely be negative. I don't think those properties alone are enough to reasonably expect that you can call `sqrt` on that and get a non-imaginary result.
Ugh, sorry, you're right. That's what I get for repeating the argument from memory and filling the gaps without thinking too long. In general of course x² may be smaller than y². The problematic case is only when x = y (plus the aforementioned side conditions); in that case `(x * x) - (y * y)` is zero but with FMA it can be negative.

Another example, I am told, is complex multiplication when multiplying a number by its conjugate. I will not elaborate because apparently I cannot be trusted this late in the evening to work out the details correctly.
> This is incorrect. One simple counter-example is `x * x - y * y`, which is non-negative for all x and y whose squares are finite floats, but if the expression is contracted to `x.mul_add(x, - y * y)` then it can have negative results. This can of course snowball into even worse issues downstream, e.g., if this is fed into `sqrt()` to get the 2D euclidean norm, contraction can cause you to end up with NaNs on perfectly innocuous vectors.
I suspect this is not a valid statement.
The original is, in pseudocode:

`round64( round64(x * x) - round64(y * y) )`

The contraction you describe gives:

`round64( x * x - round64(y * y) )`

The case for this to go negative only in the contraction case would require `round64(x * x)` to round up to >= `round64(y * y)` while `x * x` itself is < `round64(y * y)`, so `round64(x * x) == round64(y * y)`: by the "nearest" element of rounding, it can't cross `round64(y * y)`.

Since we're rounding to nearest, that means `x * x` is at most half a unit of precision away from `round64(y * y)`. This in turn means that `x * x - round64(y * y)` is, while negative in this case, less than half a unit of precision away from 0, which means the outer `round64()` will round up to 0.
> > This is incorrect. One simple counter-example is `x * x - y * y`, which is non-negative for all x and y whose squares are finite floats, but if the expression is contracted to `x.mul_add(x, - y * y)` then it can have negative results. This can of course snowball into even worse issues downstream, e.g., if this is fed into `sqrt()` to get the 2D euclidean norm, contraction can cause you to end up with NaNs on perfectly innocuous vectors.
>
> I suspect this is not a valid statement.
>
> the original is in pseudocode: `round64( round64(x * x) - round64(y * y) )`
>
> the contraction you describe gives: `round64( x * x - round64(y * y) )`
If you use `y = x`, then if `round64(x*x)` rounds up, it's easy to see that `round64(x*x - round64(x*x))` is negative. This does not round to zero, because units of precision are not absolute, but relative (think significant figures in scientific notation).
For reference (and more interesting floating point information!) see the "fmadd" section on https://randomascii.wordpress.com/2013/07/16/floating-point-determinism/
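A minimal Rust sketch of the `y = x` case just described, assuming `f64::mul_add` is lowered to an actual fused multiply-add (the value 0.1 is chosen only because its square happens to round upward):

```rust
fn main() {
    let x: f64 = 0.1;
    let y: f64 = x;

    // Strict IEEE evaluation: both squares round to the same value, so this is exactly 0.
    let strict = x * x - y * y;

    // Contracted form: the left product keeps its full precision inside the fused
    // multiply-add, so the result is minus the rounding error of `y * y`.
    let contracted = x.mul_add(x, -(y * y));

    println!("strict     = {strict}, sqrt = {}", strict.sqrt());
    println!("contracted = {contracted}, sqrt = {}", contracted.sqrt()); // negative, then NaN
}
```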
So the conclusion, if I read this correctly, is that indeed increasing precision locally in some sub-computations can reduce precision of the overall computation, right? (Also see here.)
> across platforms, this change could potentially allow floating-point computations to differ by platform (though never below the standards-required accuracy). However, standards-compliant implementations of math functions on floating-point values may already vary slightly by platform, sufficiently so to
I'm the last person to argue we have any sort of bit-for-bit reproducibility of floating point calculations across platforms or even optimization levels (I know in regrettable detail many of the reasons why not), but it seems like a notable further step to make even the basic arithmetic operations dependent on the optimization level, even for normal inputs, even on the (numerous) targets where they are currently not.
> - [C11](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf) allows this with the `STDC FP_CONTRACT` pragma enabled, and the default state of that pragma is implementation-defined. GCC enables this pragma by default, [as does the Microsoft C
Note that GCC defaults to `-ffp-contract=fast`, which goes beyond what's described in the C standard, and according to the documentation the only other option it implements is `off`.
Based on some careful research, as far as I can tell GCC's `-ffp-contract=fast` just changes the default value of `STDC FP_CONTRACT`, nothing else. It does not enable any of the potentially accuracy-reducing "fast-math" optimizations.

(`-ffp-contract=off` means "ignore the pragma", and `-ffp-contract=on` means "don't ignore the pragma" but doesn't change the default.)
My understanding is: the C standard only allows FMA synthesis within a source-level expression. This is extremely inconvenient to respect at the IR level (you'd have to track which source-level expression each operation comes from), so `-ffp-contract=fast` simply disregards source-level information and just contracts IR operations if they're of the suitable form.

Clang implements this option too, but it defaults to standard compliance by performing contraction in the frontend, where source-level boundaries are still available.
expression", where "Two arithmetic expressions are mathematically | ||
equivalent if, for all possible values of their primaries, their | ||
mathematical values are equal. However, mathematically equivalent | ||
arithmetic expressions may produce different computational results." |
I'm not familiar with Fortran (or at least this aspect of it), but this quote seems to license far more than contraction, e.g. all sorts of `-ffast-math`-style transformations that ignore the existence of NaNs. Is that right?
@rkruppe That's correct, Fortran also allows things like reassociation and commutation, as long as you never ignore parentheses.
@rkruppe wrote:
It'd be a step towards parity with other languages, rather than intentionally being slower. I think we need to seriously evaluate whether we're buying anything by intentionally being slower. (And by "slower" here, I don't mean a few percent, I mean 2x slower.)
Any such programs would be broken in C, C++, Fortran, and likely other languages by default; they'd have to explicitly disable the default behavior. Such programs are also going directly against best practices in numerical methods; if anything, we should ideally be linting against code like `(x*x - y*y).sqrt()`.
I've also found no explicit indications that we can't do this. And I've seen no indications that people expect Rust's default behavior to be different than the default behavior of other languages in this regard. What concrete problem are we trying to solve that outweighs a 2x performance win?
Agreed. The RFC already proposes an attribute; I could expand that to provide an attribute with two possible values.
If we have any hope of changing the defaults, the time to do that would be before those defaults are relied on.
I think it makes sense to have a global compiler codegen option, and I also think it makes sense to have an attribute (with a yes/no) that can be applied to any amount of code.
The attribute shouldn't. It should only affect code generation under the scope of the attribute.
My ideal goal would be "anything that strictly increases accuracy, making the result closer to the mathematically accurate answer". That would also include, for instance, doing f32 math in f64 registers and not forcing the result to f32 after each operation, if that'd be faster.

In favor of what?
in favor of something that ensures not negative... like fabs or an if statement

> On Thu, Apr 18, 2019, 06:28 Michael Lamparski ***@***.***> wrote:
>
> Such programs are also going directly against best practices in numerical methods; if anything, we should ideally be linting against code like (x*x - y*y).sqrt().
>
> In favor of what?
> > Any programs that have a problem with that will need to pass non-default compiler options on many common C, C++, and Fortran compilers.
>
> _Some_ C, C++, and Fortran compilers do this (gcc, msvc), some don't (clang). If this were a universally good idea, all of them would do this, but this is not the case.
The point is that the language specification allows it, and many popular
implementations of the languages do it, and authors of code in the
language would have to explicitly disable that if they want their code
to not be affected by it. It isn't a default expectation that this
*can't* happen, and that makes it effectively an opt-out rather than an
opt-in.
> Currently, Rust's [specification for floating-point types](https://doc.rust-lang.org/reference/types/numeric.html#floating-point-types) states that:
>
> > The IEEE 754-2008 "binary32" and "binary64" floating-point types are f32 and f64, respectively.
Shall this be understood as "the layout of `f{32, 64}` is that of `binary{32, 64}`" or as "the layout and arithmetic of `f{32, 64}` is that of `binary{32, 64}`"?

The IEEE-754:2008 standard is very clear that optimizations like replacing `a * b + c` with `fusedMultiplyAdd(a, b, c)` should be opt-in, and not opt-out (e.g. see section 10.4), so depending on how one interprets the above, the proposed change could be a backwards-incompatible change.
> computations to differ by platform (though never below the standards-required accuracy). However, standards-compliant implementations of math functions on floating-point values may already vary slightly by platform, sufficiently so to produce different binary results. This proposal can never make results *less*
If the intention of the user was for their Rust program to actually have the semantics of the code they actually wrote, e.g., first do `a * b`, and then add the result to `c`, performing intermediate rounding according to the precision of the type, this proposal not only makes the result less accurate, but also makes it impossible to actually even express that operation in the Rust language.

If the user wants higher precision they can write `fma(a, b, c)` today, and if the user does not care, they can write `fmul_add(a, b, c)`. This proposal, as presented, does not provide a `first_mul_a_b_then_add_c(a, b, c)` intrinsic that replaces the current semantics, so the current semantics become impossible to write.
> performing intermediate rounding, according to the precision of the type

What we're discussing in this RFC is, precisely, 1) whether that's actually the definition of the Rust language, and 2) whether it should be. Meanwhile, I'm not seeing any indication that that's actually the behavior Rust developers expect to get, or that they expect to pay 2x performance by default to get it.

> but it makes it impossible to actually even express that operation in the Rust language

I'm already editing the RFC to require (rather than suggest) an attribute for this.
> We could provide a separate set of types and allow extra accuracy in their operations; however, this would create ABI differences between floating-point functions, and the longer, less-well-known types seem unlikely to see
Not necessarily, these wrappers could be `repr(transparent)`.
I mean this in the sense that changing from one to the other would be an incompatible API change in a crate. I'll clarify that.
If the algorithm does not care about contraction, it might also not care about `NaN`s, or associativity, or denormals, or ... so if it wants to accept a `NonNaN<Associative<NoDenormals<fXY>>>` type as well as the primitive `f{32, 64}` types, then it has to be generic, and if it's generic, it would also accept a type wrapper lifting the assumption that contraction is not ok, without breaking the API.

In other words, once one starts walking down the road of lifting assumptions about floating-point arithmetic, contraction is just one of the many many different assumptions that one might want to lift. Making it special does not solve the issue of these APIs having to be generic about these.
I do not think we have anywhere near a smooth enough UX for working with wrappers around primitive arithmetic types for me to seriously consider them as a solution for licensing fast-math transformations. There are serious papercuts even when trying to be generic over the existing primitive types (e.g., you can't use literals without wrapping them in ugly `T::from` calls), and we have even less machinery to address the mixing of different types that such wrappers would entail.

I also think it's quite questionable whether these should be properties of the type. It kind of fits "no infinities/nans/etc." but other things are fundamentally about particular operations and therefore may be OK in one code region but not in another code region even if the same data is being operated on.
> We could provide a separate set of types and allow extra accuracy in their operations; however, this would create ABI differences between floating-point functions, and the longer, less-well-known types seem unlikely to see widespread use.
Prior art shows that people who need / want this are going to use them, e.g., "less-well-known" flags like `-ffast-math` are in widespread use, even though they are not enabled by default. So it is unclear to me how much weight this argument should actually have.
Separate types are harder to drop into a code base than a compiler flag or attribute, though, because using the type in one place generally leads to type errors (and need for conversions to solve them) at the interface with other code.
> We could do nothing, and require code to use `a.mul_add(b, c)` for optimization; however, this would not allow for similar future optimizations, and would not allow code to easily enable this optimization without substantial code changes.
We could provide a clippy lint that recognizes `a * b + c` (and many others), and tell people that if they don't care about precision, they can write `a.mul_add(b, c)` instead. We could have a group of clippy lints about these kinds of things that people can enable in bulk.
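To make the suggested rewrite concrete, here is a small sketch of the kind of code such a lint might flag and the explicit contraction it could suggest (the `lerp` function is only an illustration, not something proposed in the thread):

```rust
// A linear interpolation written in the `a * b + c` shape a lint could recognize.
fn lerp(a: f64, b: f64, t: f64) -> f64 {
    a + t * (b - a)
}

// The explicitly contracted form the lint could suggest: one rounding step
// instead of two, via the existing `f64::mul_add` API.
fn lerp_fused(a: f64, b: f64, t: f64) -> f64 {
    t.mul_add(b - a, a)
}
```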
On this particular point a clippy lint is helpful but not necessarily enough. Once the optimizer chews through layers of code it can end up at an `a * b + c` expression without it being anything that is obvious to clippy.
@rkruppe I would prefer even finer-grained control than that, e.g., individual type wrappers that add a single assumption about floating-point math that the compiler is allowed to make and that can be combined. That way I can write:

```rust
pub type Real = Trapless<Finite<Normal<Associative<Contractable<...<f32>...>>>>>>;
```

and use it throughout the parts of my code where it's appropriate. When I need to interface with other crates (or they with me), I can still use:

```rust
pub fn my_algo(x: f32) -> f32 {
    let r: Real = x.into();
    // ... do stuff with r ...
    r.into()
}
```

Sure, some people might go overboard with these, and create complicated trait hierarchies, make all their code generic, etc., but one doesn't really need to do that (if somebody wants to provide a good library to abstract over all of this, similar to how ...).

Global flags for turning these on/off require you to inspect the module/function/crate/`.cargo/config`/... to know what the rules for floating-point arithmetic are, and then use that knowledge to reason about your program; the chances that some code that wasn't intended to play by those rules gets those flags applied (e.g. because it was inlined, monomorphized, etc. into a module with those flags enabled) don't seem worth the risk (reading Fortran here gives me fond memories of writing ...).

The main argument of this RFC is that if we do something like this, then some code that expends 99% of its execution time doing ...
Things not being deterministic just came up recently on URLO: https://users.rust-lang.org/t/result-of-f64-cos-is-slightly-different-on-macos-and-linux-in-some-cases/27198/3?u=scottmcm
In C, even if you make sure your compiler outputs code that uses IEEE 754 floats on all platforms, trying to get the same floating-point results across different platforms, build configurations, and times is an exercise in plugging up a bazillion abstraction leaks. That's par for the course for C. Not for Rust.

I am well aware that floating point is a mere approximation of the real numbers, and that you're suggesting transformations that would increase this accuracy. That said, I still disapprove of the proposed new defaults. I'd much rather not have the compiler try by default to second-guess me on what really should be a perfectly well-defined and predictable operation. I'd much rather the compiler, by default, choose some specific observable output behaviour, and stick to it, just like it normally does. I'll flick the floating-point flags myself if I want to sacrifice determinism for a better approximation of what I've given up on since I was a clueless novice looking around for the reason why NaNs may also have unspecified bit patterns. However, IEEE 754 mandates behaviour for NaNs that makes them opaque unless you specifically crack them open, and NaNs propagate through most floating-point operations, so if their payload can be disregarded, they are essentially fixed points of floating-point operations.

Small floating-point evaluation differences tend to be magnified by systems with chaotic behaviour, which includes most nontrivial physical systems, and treating finite floats as opaque would completely defeat the purpose of doing the floating-point computations in the first place.
By way of providing concrete examples that Rust already provides extra accuracy today on some platforms:
i586-unknown-linux-gnu has more accuracy than x86_64-unknown-linux-gnu, because it does intermediate calculations with more precision. And changing that would substantially reduce performance.
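The original comparison isn't preserved above, but here is a hypothetical sketch of the kind of divergence being described, assuming an x87-based target such as i586-unknown-linux-gnu where intermediates may be kept in 80-bit registers:

```rust
use std::hint::black_box;

fn main() {
    // black_box keeps the values out of compile-time constant folding,
    // so the arithmetic below is actually performed by the target.
    let big: f64 = black_box(1.0e16);
    let one: f64 = black_box(1.0);

    let x = big + one; // exact in an 80-bit x87 register; rounds back to 1e16 as an f64
    let y = x - big;

    // On x86_64 (SSE2) this prints 0; an x87 target may print 1 if `x` stays
    // in an extended-precision register between the two operations.
    println!("{y}");
}
```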
@gnzlbg What code do you expect the compiler to generate when you use those generics? Because ultimately, if you want that, you're asking for pure software floating-point on many platforms.
If you check the LangRef for the LLVM-IR of the floating-point intrinsics, e.g., ..., where flags like ... can be attached to individual operations. So when one uses such a type, I expect that rustc will insert the fast-math flags for each operation as appropriate. That's finer-grained than just inserting them as function attributes for all functions in an LLVM module.
@gnzlbg What machine code do you expect to generate when every operation can potentially have different flags? How much performance do you consider reasonable to sacrifice to get the behavior you're proposing? What specific code do you want to write that depends on having such fine-grained type-level control of this behavior? Not all abstract machines and specifications translate to reasonable machine code on concrete machines.

If you want bit-for-bit identical results for floating point on different platforms, target feature flags, and optimization levels, you're going to end up doing software floating point for many operations on many platforms, and I don't think that's going to meet people's expectations at all. If you can live with the current state that we've had for years, then this RFC is already consistent with that behavior.

I would like to request that discussion of adding much more fine-grained control of specific floating-point flags that weren't already raised in the RFC be part of some other RFC, rather than this one. I already have a mention of the idea of adding specific types, which covers the idea of (for instance) ...
Expanding on my earlier comment, Rust also already allows floating-point accuracy to depend on optimization level, in addition to targets:
So, in practice, Rust already has this behavior, and this RFC does not represent a breaking change. (Worth noting that it's easy enough to reproduce this with f64 as well, just by changing the types and constants.)
For things like ..., from http://www.box2d.org/forum/viewtopic.php?f=3&t=1800#p16480:

So it's not trivial, but apparently it works across processor vendors and such.
> …om other languages "fast math" is widely perceived as an unsafe option to go faster by sacrificing accuracy.

Well, can we perhaps just be real about what we're telling the compiler to allow? `#![fp(allow_fma)]`

Or are there things *besides* just allowing for FMA usage that we're talking about here? (EDIT: in this first wave of optimizations at least)
there most certainly are other things; (f32->f64 tends to be cheap since it's mostly just padding 0 bits)
From what I understand, LLVM (and probably Rust) by default assumes that traps don't occur and the rounding mode is set to round-to-nearest |
On Wed, Apr 24, 2019 at 05:47:09PM -0700, Lokathor wrote:
> well, can we perhaps just be real about what we're telling the compiler to allow?
>
> `#![fp(allow_fma)]`
>
> Or are there things _besides_ just allowing for FMA usage that we're talking about here?
Yes, there are other optimizations this would allow besides
floating-point contraction, such as performing several operations on
`f32` using an `f64` register before putting it back into an `f32`
location. And that has exactly the same property as contraction: extra
precision. I'd like to describe the semantic concept rather than the
specific optimization, since it has the same effect on code.
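A hedged sketch of the "wider intermediates" idea being described: evaluate an f32 expression with f64 intermediates and round back to f32 only once at the end (the helper names and values are illustrative, not from the RFC):

```rust
// Each operation rounds to f32.
fn narrow(a: f32, b: f32, c: f32) -> f32 {
    a * b + c
}

// Intermediates are kept in f64 and rounded to f32 only once at the end,
// which is what "doing f32 math in f64 registers" would amount to.
fn wide(a: f32, b: f32, c: f32) -> f32 {
    ((a as f64) * (b as f64) + (c as f64)) as f32
}

fn main() {
    let (a, b, c) = (0.1_f32, 10.0_f32, -1.0_f32);
    // The results differ for these inputs, and the wide one is at least as
    // close to the mathematically exact value of a*b + c.
    println!("narrow = {:e}", narrow(a, b, c));
    println!("wide   = {:e}", wide(a, b, c));
}
```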
On Wed, Apr 24, 2019 at 05:31:34PM -0700, Kornel wrote:
> I don't mind that behavior. In fact, I'd like even more reckless-approx-math options. Is there a path to opting in to more fast fp math?
>
> Maybe `#[extra_fp_precision(on)]` could be `#[fp(extra_precision)]` and eventually become `#[fp(extra_precision, associative, reciprocal_approx, no_signed_zero, no_traps)]`, etc.
I don't want to add those other flags in this RFC (I *really* want to
avoid the implication of association with `-ffast-math`), but I have no
objection to changing this to `fp(extra_precision(off))` (or perhaps
`fp(no_extra_precision)`), to allow for future `fp(...)` flags. That
seems entirely reasonable.
In my opinion, this attribute should really have an option to disable all the optimisations that may change the exact results of the computations, present and future, so that people who care can write, e.g., .... Also, I find ...
> should explicitly discuss the issue of extra floating-point precision and how to disable it. Furthermore, this change should not become part of a stable Rust release until at least eight stable releases *after* it first becomes implemented in the nightly compiler.
I'm not sure I understand the point of this last sentence. And particularly, why is the reference point the first availability in nightly? I think it would be more useful to guarantee that the optimisations will not be enabled by default on stable until the opt-out has been available as a no-op for a few stable releases.
> loss.) However, with some additional care, applications desiring cross-platform identical results can potentially achieve that on multiple target platforms. In particular, applications prioritizing identical, portable results across two or more target platforms can disable extra floating-point precision entirely.
As I mentioned in a previous comment, mere reproducibility is not always the reason to disable this behaviour. Some algorithms can actually take advantage of the weird special properties of floating-point arithmetic. Such algorithms should remain implementable as Rust libraries, and those should not break just because someone decided they wanted their unrelated floating-point code to be as fast as possible.
I have some concern with this approach, that I'd at least like to see listed in the "drawbacks".
I understand the practical concerns leading here, but from a formal perspective and wanting to make the Rust spec precise, this RFC is a step backwards. That is not at all a Rust-specific problem; ...
> back to a lower-precision format.
>
> In general, providing more precision than required will only bring a calculation closer to the mathematically precise answer, never further away.
That sounds like an extremely strong statement to me that needs further justification. I see no reason to assume such monotonicity here. Different rounding errors happening during a computation might as well just happen to cancel each other, such that removing some errors actually increases the error of the final result.

Extreme example: `1 / 10` has a rounding error, but `1.0/10.0 - 1.0/10.0` actually gives the right result. Providing more precision only on one side of the subtraction increases the error of the entire calculation.
See #2686 (comment)
I work on measuring the numerical error introduced by floating-point arithmetic, and I believe this RFC could be an all-around improvement. It is a speed improvement. It is an accuracy improvement. It could be a determinism improvement.

Finally, as others have suggested (and out of the scope of this RFC), I would love to have the ability to locally enforce strict floating-point manipulation. While I believe that binary reproducibility of floating-point results is often misguided, some operations do require absolute control from the user.
That's a stretch. We could certainly provide binary guarantees for all platforms without introducing non-determinism for all platforms.
Some platforms being ill-behaved does not seem like a good argument for introducing ill-behavedness on sane platforms.^^ (Some good arguments have been made in this thread, but this isn't one.)
There's no UB here, right? Just non-determinism.
For me it is UB in the sense that the code's behavior is not specified and can thus vary on different platforms in a way that is not predictable by the user. My argument is not that the current situation is bad and thus it does not matter if we worsen it, but that the current situation is unregulated, and that this could bring in a flag to improve on the current situation when it matters (by specifying the expected behavior) and let things be when it doesn't (where I believe contraction is a better default).
> A note for users of other languages: this is *not* the equivalent of the "fast math" option provided by some compilers. Unlike such options, this behavior will never make any floating-point operation *less* accurate, but it can make floating-point operations *more* accurate, making the result closer to the mathematically exact answer.
Given #2686 (comment), I think this statement should be removed as it is incorrect -- or else there should be an argument for how we plan to guarantee that we never make things less accurate.
UB is a technical term with a specific meaning, and this is not it. I get what you mean but please let's use terminology correctly, lest it become useless. :)
So I think the argument is that this reduces underspecification for platforms which currently do not faithfully implement IEEE semantics? I agree it does. It also makes those platforms not special exceptions any more. However, it does so by pulling all platforms (in their default config) down to the level of (what I consider to be) "misbehaving" platforms. The proposal is to use the lowest common denominator as the new default. (Please correct me if I misread.) Somehow I cannot see that as progress.

Ultimately this is a question of defaults: I would prefer the default to be IEEE with no exception, and then a way to opt in to deviations from this strict baseline. These deviations would be in the style of ....

There seems to be a spectrum of "IEEE conformance", with full conformance on one end, "whatever C does per default" somewhere in the middle (where it will e.g. use x87 instructions), and full fast-math on the other end. If I read this proposal correctly, it proposes to make the Rust default the same as / close to the C default. But I do not see any good reason for picking this particular spot on the spectrum other than "C did it" (the claim that this never reduces accuracy has been refuted, from what I can tell). So if we ignore C, IMO the most obvious choices for the default are "fully conformant" or "fully fast-math", and the RFC does not do a good enough job arguing for why we should pick another default on some "random" spot in the middle.
Right now, for a particular Rust toolchain and for many particular target platforms we are very close to having bit-per-bit deterministic results on a wide range of different hardware on that platform. That is, if a user of your program hits a bug on some weird target in release mode, you can just pick the same toolchain and options, cross-compile to the target, and debug your program under QEMU and be able to reproduce, debug, and fix the issue.

With this RFC, the results do not only depend on the optimization level, but also on the optimizations that actually get performed. Compiling in ...

The 32-bit x86 without SSE target is the only target mentioned for which debugging is already hard due to these issues. The only debugging tool one ends up having is "look at the assembly" and hope that you can figure the bug out from there. That's a bad user experience even for ... Having reported bugs for these targets and having seen people invest a lot of time into figuring them out, I don't see how making all targets equally hard to debug by default is a good value proposition.

It'd be much simpler to instead use soft-floats on weird targets by default while adding an option that allows users to opt in to the x87 FPU (with a big "warning" that documents known issues). I have yet to run into an actual user that wants to do high-performance work on a 32-bit x86 CPU without SSE in 2019, but if those users end up appearing, we could always invest time and effort into improving that opt-in option when that happens. That sounds much better to me than lowering the debuggability of all other targets to "32-bit x86 without SSE" standards.
I think it may be more useful to have IEEE 754-compliant behavior (no FP traps, round-to-nearest-even, FP exception flags ignored as an output -- basically what LLVM assumes by default on most platforms) be the default, and optimizations that change the results (such as fast-math and some forms of vectorization) be opt-in (at least at the function, crate, and binary levels). This will improve reproducibility and debuggability such that results can be relied on cross-platform (excluding differences in NaN encodings), with a minor performance loss on unusual platforms (x86 without SSE).

IEEE 754 compliance would not apply to SIMD types by default due to ARM (unfortunately) not supporting denormal numbers by default. This is similar to how Rust has reproducible results for integer overflow/wrapping cross-platform even though C allows some forms of integer overflow to be undefined behavior.
We call that unspecified behavior around here. Values which do not have a data dependency on the results of these computations are unaffected by the choice of semantics for floating point.
For completeness: there's an ongoing discussion over exactly what terminology we should use for this sort of thing in Rust (rust-lang/unsafe-code-guidelines#201), though it'll probably be something similar to "unspecified" or "implementation-defined".

Back on-topic: it seems clear that we should be looking into fine-grained opt-in mechanisms for fast-math-y things before we seriously consider any changes to the global default behavior. In particular, #2686 (comment) is exactly what I think we should do.
@RalfJung and others who flirt with ...

Arguments for the default position on the spectrum are indeed needed, so let me try to supply some. I am still not in favor of this RFC, but I think it is much better than the equivalent of .... First off, .... So if we exclude that, we still got the following on top of what's allowed in the RFC (probably a non-exhaustive list, but should include everything ...):

IMO there is no strong reason to include or exclude (1), so whatever. On the other hand, (2) is a very broad license to the compiler (there are no rules about how imprecise it can get) and one that is hard to make good use of in practice (because the compiler generally can't know what level of precision is acceptable for all of its users). Moreover, unless you're targeting a rather specialized chip that has hardware instructions for approximating transcendental functions, you can probably achieve the same effect by just using a different libm, which Rust does not yet support super well but could learn to do without touching the semantics of built-in types and operations.

As for (3), while any change to rounding can break the correctness of some numerical algorithms and snowball into an overall loss of accuracy, increasing precision of intermediate results is rather mild in this respect compared to freely performing reassociation, which can more easily and more drastically affect the results. It is also very important for enabling automatic vectorization of reductions, so it's still commonly enabled, but its benefits are much smaller for code that is not vectorizable.

For these reasons, I am quite sure something roughly like the RFC's position on the spectrum is a reasonable tradeoff between performance improvements and program reliability. Definitely not the only reasonable option, but clearly superior to full-on ....
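As a concrete illustration of why reassociation is a much broader license than contraction, here is a classic case (illustrative values, assuming Rust's current strict evaluation) where merely changing the order of additions changes the result:

```rust
fn main() {
    let (a, b, c) = (1.0e16_f64, -1.0e16_f64, 1.0_f64);
    println!("{}", (a + b) + c); // prints 1: a and b cancel exactly first
    println!("{}", a + (b + c)); // prints 0: b + c rounds back to -1e16, losing the 1
}
```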
@rkruppe thanks for pointing out that full fast-math can cause UB; I agree that that is indeed a qualitative "step" somewhere on the line of floating-point conformance.
I'd like to formally withdraw this RFC. I still think this is a good idea, and I think having this substantial optimization happen by default is important. But there are many concerns that need to be dealt with, and we'd likely need some better ways to opt out of or into this, at both a library-crate level and a project level. I don't have the bandwidth to do that design work at this time, so I'm going to close this.

If someone would be interested in working on the general issue of floating-point precision, FMA, and similar, I would be thrilled to serve as the liaison for it.
Is there any way at present to enable floating-point contractions and/or associative math without dropping to intrinsics? The seeming inability to write things like a good dot product (e.g., https://godbolt.org/z/Y35sda) without intrinsics is a critical issue for adoption in numerical/scientific computing. I think attributes of the ...
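For reference, a sketch of the kind of kernel being referred to (not taken from the godbolt link): a plain dot product, and an explicitly contracted variant using the existing `mul_add` API. Without a license to contract and reassociate, the first version must preserve the strict sequential rounding order, which blocks FMA use and vectorized reductions:

```rust
// Strict form: each multiply and each add rounds separately, in order.
pub fn dot(xs: &[f64], ys: &[f64]) -> f64 {
    xs.iter().zip(ys).fold(0.0, |acc, (&x, &y)| acc + x * y)
}

// Explicitly contracted (still sequential) form using fused multiply-add.
pub fn dot_fma(xs: &[f64], ys: &[f64]) -> f64 {
    xs.iter().zip(ys).fold(0.0, |acc, (&x, &y)| x.mul_add(y, acc))
}
```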
There isn't, but I've made a crate which allows you to use the faster floats without, y'know, great inconvenience.
Rendered
This enables optimizations such as fused multiply-add operations by default, while providing robust mechanisms to disable extra precision for applications that wish to do so.
EDIT: Please note that this RFC has been substantially overhauled to better accommodate applications that wish to disable extra precision. In particular, there's a top-level codegen option (`-C extra-fp-precision=off`) to disable this program-wide.