Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve sqrt and sqrtf by using hardware more often. #222

Closed
wants to merge 17 commits into from
Closed

Improve sqrt and sqrtf by using hardware more often. #222

wants to merge 17 commits into from

Conversation

Lokathor
Copy link
Contributor

@Lokathor Lokathor commented Aug 24, 2019

Per rust-lang/rust#63455 (comment)

The goal of this change is to be a stepping stone towards getting sqrt into core.

As part of that, we're making sure that libm is using the hardware instruction for sqrt as often as possible.

Here's a spreadsheet of what I got when I tried to use core::intrinsics::{sqrtf32, sqrtf64} in godbolt for various targets. Based on this, and discussion in the other PR, I've made early returns with the intrinsic happen more often just in case the sqrt function does get called.

As part of this, the software solution has been broken into its own function so that it's easier to keep clear from all the hardware cfg selection and early return stuff.

  • Note: Despite what I named the branch, thinking that i'd use cfg_if, I ended up not using it at all.

(Also on the side I added the usual little crates.io and docs.rs badges to the readme file.)

@Lokathor Lokathor added the enhancement New feature or request label Aug 24, 2019
@Lokathor Lokathor requested a review from alexcrichton August 24, 2019 05:20
@Lokathor Lokathor self-assigned this Aug 24, 2019
@Lokathor
Copy link
Contributor Author

Okay, so I'm a little stumped on how to make this work right.

The CI tests with --no-default-features while using the Stable channel, and this PR uses the the "stable" feature to detect when to use core intrinsics.

Copy link
Member

@alexcrichton alexcrichton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine to just frob CI to do the right thing, it's not really set up in any particular way one way or another.

src/math/sqrt.rs Outdated Show resolved Hide resolved
src/math/sqrt.rs Outdated Show resolved Hide resolved
src/math/sqrt.rs Show resolved Hide resolved
@Lokathor
Copy link
Contributor Author

The current PR doesn't change the pow function and it doesn't change the crate version of no-panic but for some reason a panic is being detected in pow. I don't think this is anything I did.

If there are no further problems I think we can merge this.

Copy link
Member

@alexcrichton alexcrichton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be best to confirm that this actually generates a hardware instruction for the relevant platforms. Additionally CI should be green before this PR, so while it may not look like it it may still be possible that CI is red because of this PR. In either case CI will need to be green before merging.

#[cfg(all(
feature = "unstable",
any(
all(target_arch = "x86", not(target_feature = "soft-float")),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I may not have been clear enough earlier, but soft-float and hard-float are not features, I don't think that target_feature = "soft-float" is ever set (nor for hard-float)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I got the name of those target features from rustc --print target-features --target TARGET, so I think at least rustc thinks they exist.

How else should we detect for hard or soft float?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately -C target-feature, what --print target-features does, is not equivalent to #[cfg(target_feature = "..")]. I'd recommend looking at compiled code to see how the #[cfg] here isn't working.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, I see now that target_feature = "soft-float" will always evaluate to false (for example), but that brings us back to the question: how do we respect the hard-float/soft-float configuration of the target? If we can't do that, we can't ever be sure that a hardware instruction exists to be called.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I don't always have the answers to all the questions, I'm just pointing out how this isn't working as intended. I don't know how this would be detected.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's totally fine if you don't know it all! You just happen to usually know the most, and you're "here already". We'll have to ask around I guess.

Without an answer to this problem we're just plain stuck. It would be incorrect to call back to the LLVM intrinsic if the arch's version of hardware floating point isn't enabled, since LLVM will end up calling us again and it's just infinite recursion.

The only real relief to be had is that if there is hardware sqrt then LLVM won't ever call us here, so we could just skip trying to divert back to LLVM at all. Except that's where we started basically and that's what could potentially give the performance regressions that we're trying to avoid.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened an issue to seek a wider audience for support: rust-lang/rust#64514

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if I'm understanding the issue it's that if we export sqrt and sqrtf from compiler_builtins unconditionally then any C code that's linked will use this version and it might perform slower than what it's already using. I think the solution is to not export sqrt and sqrtf unconditionally not only to avoid the performance issue but also because C code could depend on the specific error handling of the libm it was compiled against that this code can't emulate. core already depends on fmod so I think sqrt can and should be handled in the same way.

src/math/sqrt.rs Show resolved Hide resolved
@Lokathor Lokathor closed this Dec 7, 2019
@Lokathor Lokathor removed their assignment Aug 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants