-
Notifications
You must be signed in to change notification settings - Fork 100
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve sqrt
and sqrtf
by using hardware more often.
#222
Conversation
Change per the updated info in rust-lang/rust#63455 (comment)
Change per the updated info in rust-lang/rust#63455 (comment)
Also tell clippy to hush about some things.
Okay, so I'm a little stumped on how to make this work right. The CI tests with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's fine to just frob CI to do the right thing, it's not really set up in any particular way one way or another.
CI thinks that there's a panic in here, so let's see if it thinks the panic is in the software part or the intrinsics part. Sadly, I don't have the arm7 stuff setup on my home machine so we're just letting the CI do this.
The current PR doesn't change the If there are no further problems I think we can merge this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be best to confirm that this actually generates a hardware instruction for the relevant platforms. Additionally CI should be green before this PR, so while it may not look like it it may still be possible that CI is red because of this PR. In either case CI will need to be green before merging.
#[cfg(all( | ||
feature = "unstable", | ||
any( | ||
all(target_arch = "x86", not(target_feature = "soft-float")), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I may not have been clear enough earlier, but soft-float
and hard-float
are not features, I don't think that target_feature = "soft-float"
is ever set (nor for hard-float
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I got the name of those target features from rustc --print target-features --target TARGET
, so I think at least rustc thinks they exist.
How else should we detect for hard or soft float?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately -C target-feature
, what --print target-features
does, is not equivalent to #[cfg(target_feature = "..")]
. I'd recommend looking at compiled code to see how the #[cfg]
here isn't working.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, I see now that target_feature = "soft-float"
will always evaluate to false (for example), but that brings us back to the question: how do we respect the hard-float/soft-float configuration of the target? If we can't do that, we can't ever be sure that a hardware instruction exists to be called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I don't always have the answers to all the questions, I'm just pointing out how this isn't working as intended. I don't know how this would be detected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's totally fine if you don't know it all! You just happen to usually know the most, and you're "here already". We'll have to ask around I guess.
Without an answer to this problem we're just plain stuck. It would be incorrect to call back to the LLVM intrinsic if the arch's version of hardware floating point isn't enabled, since LLVM will end up calling us again and it's just infinite recursion.
The only real relief to be had is that if there is hardware sqrt then LLVM won't ever call us here, so we could just skip trying to divert back to LLVM at all. Except that's where we started basically and that's what could potentially give the performance regressions that we're trying to avoid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened an issue to seek a wider audience for support: rust-lang/rust#64514
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if I'm understanding the issue it's that if we export sqrt
and sqrtf
from compiler_builtins
unconditionally then any C code that's linked will use this version and it might perform slower than what it's already using. I think the solution is to not export sqrt
and sqrtf
unconditionally not only to avoid the performance issue but also because C code could depend on the specific error handling of the libm it was compiled against that this code can't emulate. core
already depends on fmod
so I think sqrt
can and should be handled in the same way.
Per rust-lang/rust#63455 (comment)
The goal of this change is to be a stepping stone towards getting
sqrt
into core.As part of that, we're making sure that libm is using the hardware instruction for
sqrt
as often as possible.Here's a spreadsheet of what I got when I tried to use core::intrinsics::{sqrtf32, sqrtf64} in godbolt for various targets. Based on this, and discussion in the other PR, I've made early returns with the intrinsic happen more often just in case the
sqrt
function does get called.As part of this, the software solution has been broken into its own function so that it's easier to keep clear from all the hardware cfg selection and early return stuff.
(Also on the side I added the usual little crates.io and docs.rs badges to the readme file.)