Move sqrt from std to core #63455

Closed
wants to merge 3 commits into from
Conversation

Lokathor
Contributor

Per #50145 (comment) and #50145 (comment)

  • This PR moves the sqrt method of the f32 and f64 types from libstd to libcore.
  • I literally just cut and pasted the methods from one place to the other, it was simpler than I expected it to be.
  • I didn't alter any tests, though naturally any tests that check methods available on a type in std should also still work if the method is available on the type in core.
  • I didn't add any new tests that check that the functionality is available in just core, because I don't know where those tests go. There's a whole lot of folders and no tests/ folder in the project root like a normal crate has. Someone please tell me.
  • I didn't put this behind a feature flag because I don't know how to do that, and also I'm not sure that a feature flag is necessary. If a feature flag is necessary someone also please explain how to do that.

cc @varkor, cc @alexcrichton

@rust-highfive
Collaborator

r? @shepmaster

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Aug 11, 2019
@clarfonthey
Contributor

This can generate a libmath call on systems without hardware sqrt, I believe, which is why it was in std before.

@Lokathor
Contributor Author

According to an issue on the compiler-builtins crate (link), that will apparently never happen with any current target.

If it does on some new target that Rust is ported to, it's as "simple" as patching compiler-builtins to expose that function in libm.

@nikic
Contributor

nikic commented Aug 11, 2019

We should probably resolve #62729 before exposing more things from libcore, as this will likely run into the same issue.

@roblabla
Contributor

I'll try to get a PR to #62729 rolling, exposing the necessary symbols for soft-float impls. The PR will probably need to add a number of new symbols anyways, I can add sqrt then too.

@shepmaster shepmaster added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Aug 12, 2019
@shepmaster
Member

Too deep knowledge for me; reassigning to someone from libs...

r? @Amanieu

@rust-highfive rust-highfive assigned Amanieu and unassigned shepmaster Aug 12, 2019
@alexcrichton
Member

I'm not certain that we're ready to do this just yet. There's a number of reasons that we haven't started moving so many intrinsics to libcore yet. While the compiler-builtins fallback implementation is a prerequisite it has unfortunate interactions with symbol visibility which also causes issues. Native platform libm libraries (or the equivalent) typically have quite optimized routines for various functions (especially things like sqrt). The fallback implementation present in compiler-builtins hasn't been verified to be of adequate performance compared to these native implementations.

By putting a symbol in compiler-builtins we're shadowing the system libm which prevents usage of the native optimized routine if it's available, which basically means that this could result in a performance regression. AFAIK no performance analysis has been done to determine the results of this movement.

Note that on some platforms LLVM will expand the intrinsic inline (such as x86/x86_64 most likely). These will have no performance regression even if Rust's compiler-builtins is slower than the native libm because the native libm isn't called. The worrisome part is platforms where the intrinsic isn't expanded inline and it's lowered to a library call.

@Lokathor
Contributor Author

I thought you said in the compiler-builtins issue that LLVM wouldn't do that? I guess you meant "LLVM won't do that on this particular target"?

@alexcrichton
Member

LLVM in general will sort of just do what's best at the time. If LLVM can lower an intrinsic call like these to an inline instruction sequence, it will. Otherwise, if LLVM can't, it will lower it to a function call. For x86/x86_64 it seems to be dependent on whether the SSE feature is enabled. Although it's on by default it's not always on. I suspect that LLVM's logic for lowering is different for other architectures like ARM or AArch64.

@Lokathor
Contributor Author

Note that f64 sqrt requires sse2, not just sse, though yes both CPU feature sets are enabled by default for rust builds targeting i686 and x86_64. Similarly, ARM and AArch64 have a sqrt instruction available in common, modern target setups. As does wasm32.

Focusing back on point: What concrete action can be done to advance this PR towards being accepted?

Regarding the concerns about possibly bad performance on some targets:

  • On Stable, std calls intrinsics::sqrtf32.
  • On Nightly, you can do the same in no_std with the core_intrinsics unstable feature.
  • If using intrinsics::sqrtf32 (or any other float intrinsic) has some sort of performance problems on some platforms then (as far as I can tell) that's already a problem on those platforms. This does not give us a new problem.
  • If it only might have a performance problem on some platforms, isn't that solved separately with other PRs once a problem is actually detected? Either in the libm crate (adjusting its code) and/or in the compiler-builtins crate (not exporting the libm crate's function so that the system libm function is used).

I certainly want to do this right, but I also want to try and have a clear path forward, to keep making progress, because "floating point in core" has been stalled for many months now.
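
For readers following along, here is a rough sketch of the point about target features: whether sqrt becomes a single instruction depends on what the build enables. The name `sqrt_likely_inline` and the feature list are illustrative assumptions drawn from this comment (SSE2 on x86, AArch64, wasm32); they are not a description of LLVM's actual lowering rules.

```rust
/// Hypothetical helper: true when this build targets a configuration that the
/// comment above says has a hardware sqrt. Illustrative, not exhaustive.
pub fn sqrt_likely_inline() -> bool {
    cfg!(any(
        all(
            any(target_arch = "x86", target_arch = "x86_64"),
            target_feature = "sse2"
        ),
        target_arch = "aarch64",
        target_arch = "wasm32"
    ))
}

/// On stable, `f64::sqrt` from std is the way to reach the sqrt intrinsic.
pub fn sqrt_f64(x: f64) -> f64 {
    x.sqrt()
}
```

Reading the generated assembly is still the only reliable way to see what LLVM actually emitted for a given target.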

@alexcrichton
Member

For moving this forward, are there parts of my previous comment that are unclear? Sorry I don't have a ton of time to manage this, so I can help out sometimes but I cannot personally lead the charge and make sure all the ducks are in a row for all PRs.

@Lokathor
Contributor Author

  • It's not clear how we could ever fully verify against a potential performance regression on all possible targets ahead of time, since we don't really have easy access to all those targets. How much checking is enough checking?
  • This PR doesn't make a change to compiler-builtins or libm, so if those crates shadow the system libm on some platforms and cause performance problems then doesn't the system libm already get shadowed even without this change? Wouldn't we have already gotten a bug report? This is the part I'm the most confused about.

Apart from those questions, I did do a quick check of core-only sqrt on all our official targets by placing

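#![feature(core_intrinsics)] // nightly-only: exposes core::intrinsics
#![no_std]

// f32 square root using only libcore.
pub fn sqrt_f32(f: f32) -> f32 {
    unsafe { core::intrinsics::sqrtf32(f) }
}

// f64 square root using only libcore.
pub fn sqrt_f64(f: f64) -> f64 {
    unsafe { core::intrinsics::sqrtf64(f) }
}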

into godbolt, using Nightly

  • rustc 1.38.0-nightly (60960a260 2019-08-12)

and the compiler args

  • --edition 2018 -C opt-level=3 --target=TARGET

for each TARGET value.

Here's the highlights, though you can also look at the spreadsheet too I guess:

  • 36 of the 96 targets don't actually build and emit ASM on godbolt. "can't find crate for core", iOS files missing, emscripten files missing, things like that. All of these are Tier 2, 2.5, or 3.
    • Note: For asmjs-unknown-emscripten I was able to get some JS output but it seemed like complete nonsense that wasn't performing a sqrt at all, so I counted it in the "doesn't seem to build right on godbolt" group.
  • Some Tier 2 targets compile just fine but DO NOT have a hardware instruction generated.
  • In all other cases, including all Tier 1 targets, wasm32-unknown-unknown, x86 without SSE, and every variant of ARM that did build, a hardware instruction was compiled.

The targets where a call to sqrt is performed are as follows:

  • mips-unknown-linux-musl // MIPS Linux with MUSL
  • mipsel-unknown-linux-musl // MIPS (LE) Linux with MUSL
  • powerpc-unknown-linux-gnu // PowerPC Linux
  • riscv32imac-unknown-none-elf // Bare RISC-V (RV32IMAC ISA)
  • riscv32imc-unknown-none-elf // Bare RISC-V (RV32IMC ISA)
  • riscv64gc-unknown-none-elf // Bare RISC-V (RV64IMAFDC ISA)
  • riscv64imac-unknown-none-elf // Bare RISC-V (RV64IMAC ISA)
  • thumbv7em-none-eabihf // Bare Cortex-M4F, M7F, FPU, hardfloat (this had an instruction for f32 but a call for f64 so we're gonna count it as a call)

If "Bare RISC-V" means "no OS" like I think it does, then they don't really have an existing system libm to get shadowed by us.

Since the libm crate's sqrt comes from the MUSL source I think we're solid on the MUSL targets.

That just leaves PowerPC Linux... I don't know what libm they use, but if it's either MUSL or OpenLibm then that's the same sqrt formula as we're using, so there's no potential for regression there.

In summary, I strongly suspect that we're in the clear.

@alexcrichton
Member

> How much checking is enough checking?

For me, I'd like to personally have faith that someone has investigated this to the degree that they know more about it than me, because I know that I at least personally do not know how this works on all platforms. I think doing something in terms of checking is required and we can see where we feel after that.

> This PR doesn't make a change to compiler-builtins or libm, so if those crates shadow the system libm on some platforms and cause performance problems then doesn't the system libm already get shadowed even without this change? Wouldn't we have already gotten a bug report? This is the part I'm the most confused about.

The compiler-builtins crate doesn't currently export symbols on main platforms that override those found in system libm implementations. It only does that on some platforms (not all) today and otherwise the symbols exported on all platforms are rustc-specific.

> In summary, I strongly suspect that we're in the clear.

Thanks for investigating all that! One thing I'm also worried about though is what happens when the default codegen settings are tweaked. For example if sse2 is disabled on x86/x86_64 or if there's some corresponding feature that enables the usage here for other platforms. And yeah I'm primarily worried about platforms like mips/powerpc/riscv for the glibc variants, because I'd expect that the libm on those platforms have asm implementations that are likely faster than ours and could result in a performance regression.

@Lokathor
Contributor Author

Lokathor commented Aug 15, 2019

> The compiler-builtins crate doesn't currently export symbols on main platforms that override those found in system libm implementations. It only does that on some platforms (not all) today and otherwise the symbols exported on all platforms are rustc-specific.

But none of that is affected by this PR, right?

Here's my logic:

  • The list of symbols that compiler-builtins exports on any particular platform is the exact same, with or without this PR, because that's a totally separate crate from libcore.
  • So if compiler-builtins can shadow a symbol and cause a performance regression on a particular platform, it's already doing that (or not) on each particular platform, even without this PR being merged.
  • So any potential performance loss is already happening, and we should have already had a bug report or something.
  • All this PR appears to change is whether sqrt is visible at the core or std level.

That's why I'm confused about this concern.

> One thing I'm also worried about though is what happens when the default codegen settings are tweaked. For example if sse2 is disabled on x86/x86_64 or if there's some corresponding feature that enables the usage here for other platforms.

What rust calls the i586 target is an x86 target with SSE stuff turned off, in which case your ASM looks more like this:

example::sqrt_f32:
        flds    4(%esp)
        fsqrt
        retl

example::sqrt_f64:
        fldl    4(%esp)
        fsqrt
        retl

Because the intel chips supported sqrt before SSE was added, and they've just stayed backwards compatible all this time, so they can still do it with the feature off.

> And yeah I'm primarily worried about platforms like mips/powerpc/riscv for the glibc variants, because I'd expect that the libm on those platforms have asm implementations that are likely faster than ours and could result in a performance regression.

Unfortunately, as mentioned, I don't have access to a mips, powerpc, or riscv device and I don't know anyone who does.

  • Where do we go to ask someone to build and test rust for us on these targets?
  • If, after some reasonable period of time (3-6 months maybe), we can't contact anyone to test some code on those targets, then can we just merge this? And if a bug report about a regression does come in after that we can adjust compiler-builtins/libm at that point once we have a specific bug report to look at?

@tesuji
Contributor

tesuji commented Aug 16, 2019

@Lokathor, you might want to have a look in https://internals.rust-lang.org/t/gcc-compile-farm-for-rustc/9511 . At the moment, they have mips, powerpc computers. See the list for more info: https://cfarm.tetaneutral.net/machines/list/.

@alexcrichton
Member

Sorry I think this got lost in my inbox.

This whole discussion affects this PR because support in compiler-builtins is a prerequisite to landing this PR. You're right that this PR doesn't affect symbols of compiler-builtins but this feature wouldn't work on platforms where the intrinsic is lowered to a sqrt function call, because that symbol is not present in compiler-builtins and libm is not guaranteed to be linked.

I also unfortunately do not know how to test on mips/powerpc/riscv/etc. A lack of resources to test though I don't think is a great reason to merge this PR. The purpose of review is to head off bug reports about performance regressions and update libm if necessary ahead of time.

One thing I'm also realizing now is that if we add a sqrt symbol to compiler-builtins then it will also affect all C/C++ code linked into a Rust application, so if C/C++ code uses the sqrt symbol on AArch64, for example, this runs a risk of regressing performance there even though the platform has a native instruction (which the native platform sqrt symbol probably uses but our libm implementation doesn't yet).

@Lokathor
Contributor Author

Hmm, can the compiler-builtins and/or libm distribution packaged with rust utilize nightly features like core and std can? If so, we could cfg in the use of nightly intrinsics, or even just inline asm as necessary.

@alexcrichton
Member

Yes, both crates can use nightly features:

@Lokathor
Contributor Author

I had noticed that, but wasn't clear on if that was just someone using it for nightly wanting it to be better for them or if even stable folks got those perks.

So, I think that we need to

  • improve libm to always give the best possible sqrt, without fail, on every arch
  • even then, expose our libm through compiler-builtins as little as possible so that we don't override system libm

Is it correct to assume that we already export stuff from compiler-builtins as little as possible? Can we just focus the work on libm?

@alexcrichton
Member

> improve libm to always give the best possible sqrt, without fail, on every arch

That's certainly always good to have regardless! I was under the impression the purpose of this PR and what you wanted to do was to avoid this step. We haven't had this happen a lot before so I've been trying to figure this out as I go along. I think we've concluded that this isn't required in the specific case of sqrt for Rust on most platforms due to native instructions, but it is required on some architectures and it may be required by C referencing the sqrt symbol.

All in all I think while it may have been possible originally to avoid updating libm, I don't think we can avoid that now. I think we need to make sure that if we move a symbol from libstd to libcore that we've improved our libm implementation to be more than "we just copied it from musl".

> Is it correct to assume that we already export stuff from compiler-builtins as little as possible? Can we just focus the work on libm?

Yes and no. There's no way we can export something from compiler-builtins to only be used by Rust code due to how lowering to libcalls works in LLVM. (we can't configure symbol names, LLVM already picked them for us). In that sense if we export anything from compiler-builtins it is exposed for all C/C++/Rust code.

That being said we don't just blindly reexport everything from libm, so we're still actually choosing what to export.

@Lokathor
Contributor Author

I would say that the main reason for this PR is that right now math libraries can't also easily be no_std because there's no floating ops in core, and if you just call libm yourself in Stable rust you don't get the LLVM intrinsic support and your code slows to a crawl.

So sqrt, and eventually other floating ops, need to make it into core so that it's sorted out well just once, not having each math lib try to solve the problem on its own.
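
To make that cost concrete, here is a sketch of what a no_std math crate might hand-roll on stable today. `sqrt_newton` is a hypothetical name, and the routine is a plain Newton iteration rather than the libm crate's algorithm. It is not guaranteed to be correctly rounded, and nothing tells the compiler to turn it into a hardware sqrt instruction.

```rust
/// Hypothetical software fallback that uses only operations available in `core`.
/// Approximates sqrt with Newton's method; the result may differ from the
/// correctly rounded value in the last bit.
pub fn sqrt_newton(x: f32) -> f32 {
    if x.is_nan() || x < 0.0 {
        return f32::NAN;
    }
    if x == 0.0 || x.is_infinite() {
        return x; // preserves +0.0, -0.0 and +infinity
    }
    // Initial guess: roughly halve the exponent by shifting the bit pattern.
    let mut guess = f32::from_bits((x.to_bits() >> 1) + 0x1fc0_0000);
    // Fixed iteration count, generous so that subnormal inputs also converge.
    for _ in 0..32 {
        guess = 0.5 * (guess + x / guess);
    }
    guess
}
```

Every crate that wants to stay no_std ends up carrying something like this, usually behind a Cargo feature so that std builds can call `f32::sqrt` instead.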

> There's no way we can export something from compiler-builtins to only be used by Rust code due to how lowering to libcalls works in LLVM. (we can't configure symbol names, LLVM already picked them for us).

I understand that the exports are global to the whole program across all languages, but does cfg still apply? Like, if we export sqrt on one compile target, it doesn't have to be exported on all other compile targets right? Because I mostly meant that per-compile-target we should make sure the cfg settings are exporting as little as possible.

@Lokathor
Contributor Author

The concern is that we don't want to accidentally conflict with any existing libm package shipped with the system (which is probably written in C and/or ASM).

If the target is bare-metal, then it has no existing anything (right?), so I don't think we can have a conflict problem.

@alexcrichton
Member

I think the best way forward here is to do something like:

  • Update libm to use the llvm intrinsic on nightly on select targets
  • Export sqrt from compiler builtins
  • Land this PR moving the function to libcore

That would mean we have support in for the symbol being optimized on relevant platforms and it's optimized in rust code in libcore. Platforms like mips and powerpc may be slow but all the infrastructure is in place to speed it up so we can handle that as it comes up.

How's that sound?
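
A rough sketch of the first step, for illustration only. The plan above uses the LLVM intrinsic on nightly; since that needs an unstable feature, this sketch swaps in the stable `core::arch` SSE2 intrinsics on x86_64 to show the same `cfg` shape. The other branch is a placeholder that defers to std, standing in for the libm crate's portable software routine.

```rust
/// Hardware path: only compiled when SSE2 is statically enabled on x86_64.
#[cfg(all(target_arch = "x86_64", target_feature = "sse2"))]
pub fn sqrt(x: f64) -> f64 {
    use core::arch::x86_64::{_mm_cvtsd_f64, _mm_set_sd, _mm_sqrt_sd};
    // SAFETY: the cfg above guarantees the SSE2 instructions are available.
    unsafe {
        let v = _mm_set_sd(x);
        // The low lane of the result holds sqrt of the low lane of `v`.
        _mm_cvtsd_f64(_mm_sqrt_sd(v, v))
    }
}

/// Fallback path for every other configuration.
/// Placeholder: a real libm would run its software algorithm here; std is
/// used only to keep this sketch short.
#[cfg(not(all(target_arch = "x86_64", target_feature = "sse2")))]
pub fn sqrt(x: f64) -> f64 {
    x.sqrt()
}
```

Other targets would get their own `cfg` arm as someone verifies the right instruction or routine for them.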

@Lokathor
Contributor Author

Yes, sounds good.

I will attempt to PR an update to libm later today. I'm less familiar with compiler-builtins, so I'll do that one second.

@ollie27
Member

ollie27 commented Aug 25, 2019

The soft-float ARM targets like arm-unknown-linux-gnueabi also generate calls to sqrt and sqrtf: godbolt. As do x86 targets with -Ctarget-feature=+soft-float: godbolt.

Wouldn't overriding sqrt and sqrtf break C code that relies on those functions setting errno?

@Lokathor
Contributor Author

Ah bl is the instruction, branching to the sqrt function. I should have remembered that. Good catch!

As to the other question: The libm crate attempts to cause FP exceptions, though at the moment it probably doesn't do it right. It doesn't use errno at all.

Please check the open libm PR for any other possible errors if you have time: rust-lang/libm#222

@alexcrichton
Member

Hm yes that's correct that we may not be providing a 100% standards-compliant implementation of sqrt and sqrtf. @ollie27 do you know of any cases where errno is used after a sqrt?

@ollie27
Member

ollie27 commented Aug 30, 2019

> do you know of any cases where errno is used after a sqrt?

I don't have any specific examples but error handling is the main difference between the LLVM intrinsics and libm functions.

Would it be so bad to expect #![no_std] binaries to supply the sqrt and sqrtf symbols themselves the same way fmod, fmin, fmax, memcpy, memmove, memset etc. are required? I think as long as there is an easy to import library supplying the libm symbols it shouldn't be too much trouble.
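
To illustrate the error-handling difference: a C `sqrt` may report a negative argument through `errno` (EDOM), while Rust's `f64::sqrt` just returns NaN and touches no global state. A caller that wants an explicit error has to check for it. The names `SqrtError` and `checked_sqrt` below are hypothetical.

```rust
/// Hypothetical error type standing in for C's EDOM domain error.
#[derive(Debug, PartialEq)]
pub enum SqrtError {
    Domain,
}

/// Hypothetical wrapper: reports a negative input as an explicit error
/// instead of silently producing NaN. NaN inputs pass through as Ok(NaN).
pub fn checked_sqrt(x: f64) -> Result<f64, SqrtError> {
    if x < 0.0 {
        Err(SqrtError::Domain)
    } else {
        Ok(x.sqrt())
    }
}
```

C code that inspects `errno` after calling `sqrt` relies on behavior that an intrinsic-style implementation like the one above does not provide.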

@alexcrichton
Member

The expectation of symbols is basically what already happens today in the sense that you can't use sqrt from core, you need to bring your own.

Symbols like memcpy and such are intended to be provided by compiler-builtins on appropriate platforms and things like fmod are sort of just unfortunate accidents of how things work.

I think though that the difference between LLVM's sqrt intrinsic and the actual sqrt symbol means that we can't export a sqrt symbol from compiler-builtins in theory. It would be good though to investigate the existing symbols exported from compiler-builtins and see if any of them violate this guarantee. Ideally we'd just simply get LLVM to rewrite the sqrt intrinsic to our own custom symbol name like __rust_sqrt so we wouldn't have to deal with any of this.

@Alexendoo Alexendoo added S-blocked Status: Blocked on something else such as an RFC or other implementation work. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 11, 2019
@Alexendoo
Member

Labelling this as blocked (rust-lang/libm#222) per #63455 (comment)

@Dylan-DPC-zz Dylan-DPC-zz added S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). and removed S-blocked Status: Blocked on something else such as an RFC or other implementation work. labels Oct 18, 2019
@bors
Contributor

bors commented Nov 9, 2019

☔ The latest upstream changes (presumably #63871) made this pull request unmergeable. Please resolve the merge conflicts.

@Mark-Simulacrum
Member

I'm going to close this as I think it is indeed blocked and until the underlying issues here are worked out there's not much point in keeping it open. I also feel like with regards to the platform testing, in theory that'll start getting answered soonish with the "Tiers" RFC currently open; I suspect that will hopefully help answer the level of support we should provide given inability to test.

Regardless, though, at this time it sounds like there's not any way to make progress on this PR here -- the work needs to happen elsewhere first.

@Mark-Simulacrum Mark-Simulacrum added S-blocked-closed and removed S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). labels Nov 17, 2019
@Lokathor
Contributor Author

Agreed.

@jyn514 jyn514 added S-blocked Status: Blocked on something else such as an RFC or other implementation work. and removed S-blocked-closed labels Mar 10, 2021
lopopolo added a commit to artichoke/artichoke that referenced this pull request Aug 6, 2021
`spinoso-math` documents itself to be `no_std` and has a `std` feature,
but the crate did not declare the `#![no_std]` pragma.

Many `f64` math functions require `std` and are not available in `core`.
Despite missing support in `core`, without the crate properly declaring
itself as `no_std`, tests with `--no-default-features` incorrectly
passed since `std` is still linked in.

It does not seem likely that `f64` math will be enabled in `core` any
time soon, so make `spinoso-math` require `std`. See rust-lang/rust#63455.

This commit bumps the version of `spinoso-math` to 0.2.0 since removing
a feature is a breaking change (and did break the build of
`artichoke-backend`).
Labels
S-blocked Status: Blocked on something else such as an RFC or other implementation work. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.