Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

f64::log precision change in 1.82 nightly #128386

Closed
ogoffart opened this issue Jul 30, 2024 · 24 comments · Fixed by #129715
Closed

f64::log precision change in 1.82 nightly #128386

ogoffart opened this issue Jul 30, 2024 · 24 comments · Fixed by #129715
Labels
A-floating-point Area: Floating point numbers and arithmetic C-bug Category: This is a bug. regression-untriaged Untriaged performance or correctness regression. T-libs Relevant to the library team, which will review and decide on the PR/issue.

Comments

@ogoffart
Copy link
Contributor

Our CI started to report an error today with nightly on Linux (other platforms are fine).
Simplified the test case is because of a small difference in the computation of log.

(I know it's bad practice to do equality tests on float, but I thought I'd report this behavior change anyway as i don't know if this was intentional).

Code

I tried this code:

fn main() {
    println!("Hello, world! {}", 9f64.log(3.) );
    assert_eq!(9f64.log(3.) , 2.)
}

I expected to see this happen:

Hello, world! 2 and no panics. As this happens in stable

Instead, this happened:

Hello, world! 2.0000000000000004
thread 'main' panicked at src/main.rs:3:5:
assertion `left == right` failed
  left: 2.0000000000000004
 right: 2.0

Version with regression

rustc 1.82.0-nightly (612a33f 2024-07-29)
(Linux only)

@ogoffart ogoffart added C-bug Category: This is a bug. regression-untriaged Untriaged performance or correctness regression. labels Jul 30, 2024
@rustbot rustbot added I-prioritize Issue: Indicates that prioritization has been requested for this issue. needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Jul 30, 2024
@apiraino
Copy link
Contributor

apiraino commented Jul 30, 2024

Bisection seems to indicate this comes after updating out compiler-builtins crate: 80d8270

I guess this is "expected"? cc: @nicholasbishop

@tgross35
Copy link
Contributor

This is probably a result of the linkage change in rust-lang/compiler-builtins#594 since the actual implementation hasn't changed in 5 years https://github.com/rust-lang/libm/blob/279e5f6abe0a2ca9066962d9ec894f0df1f417ac/src/math/log.rs..

Could you dwarfdump the final binaries and take a look whether system or compiler_builtins symbols are being linked?

@tgross35
Copy link
Contributor

tgross35 commented Jul 30, 2024

Obviously we want to have the correct values and will try to make things work, thanks bringing it up. Just note that the docs are extremely lenient here:

The precision of this function is non-deterministic. This means it varies by platform, Rust version, and can even differ within the same execution from one invocation to the next.

I think that if you are relying on exact results for CI then that may prove to be pretty platform-dependent.

@adrian17
Copy link

adrian17 commented Jul 30, 2024

FWIW, we just hit it in our CI too; looks like the runtime value no longer equals the compiler constant-folded result on our platforms.
Not sure how we'll handle it yet.

(EDIT: forgot to add: we noticed this with exp instead of log, but I'm assuming it's caused by the same change)

@saethlin
Copy link
Member

looks like the runtime value no longer equals the compiler constant-folded result on our platforms.

I believe that is #124364

I'm deliberately not labeling this issue as I-unsound because I think this issue is just noticing a specific instance of the above-linked issue.

@alexpyattaev
Copy link

Arguably, this is a true regression, and not a simple "log is fuzzy anyway". Reason being that the resulting error is 2 * f64::EPSILON, and not one ulp as advertised in the implementation of the log function.

FYI dwarfdump is attached. As far as I could read it, it is using the compiler's version of log function.
dwarfdump.txt.gz

Also interesting that in release build there is no issue.

@tgross35
Copy link
Contributor

tgross35 commented Jul 30, 2024

Thanks for posting that, relevant bit of the above:

0x000e7fbb:         DW_TAG_subprogram
                      DW_AT_low_pc      (0x0000000000058ff0)
                      DW_AT_high_pc     (0x0000000000058ffd)
                      DW_AT_frame_base  (DW_OP_reg6 RBP)
                      DW_AT_name        ("log")
                      DW_AT_decl_file   ("/rust/deps/compiler_builtins-0.1.114/src/macros.rs")
                      DW_AT_decl_line   (468)
                      DW_AT_external    (true)

Just to make sure, could you do the same with an older nightly and check that the system version was getting linked before?

@saethlin saethlin added T-libs Relevant to the library team, which will review and decide on the PR/issue. and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Jul 30, 2024
@alexpyattaev
Copy link

I do not have a previous nightly left, but 1.81.0-beta.2 that does not exhibit the bug produces the following

dwarfdump.txt.gz

The specific section for log function that you refer to is not there.

@Amanieu
Copy link
Member

Amanieu commented Jul 31, 2024

I looked into it and it seems that the log function in compiler_builtins is turned into a strong symbol and libm.so is therefore not linked to.

Since this isn't the intended behavior, we probably have to revert rust-lang/compiler-builtins#594 and selectively only provide math symbols on targets that don't have a system libm.

@alexpyattaev
Copy link

Is system libm a good idea? It is rather unlikely that it would have support for recent CPUs in distros like debian... Not sure it would matter though (as log does not seem to be simd-friendly anyway).

@Amanieu
Copy link
Member

Amanieu commented Jul 31, 2024

An alternative would be to just fix our log implementation: it is different from the latest version in musl, so maybe updating it would fix this issue.

@RalfJung
Copy link
Member

RalfJung commented Aug 6, 2024

Arguably, this is a true regression, and not a simple "log is fuzzy anyway". Reason being that the resulting error is 2 * f64::EPSILON, and not one ulp as advertised in the implementation of the log function.

Where is this advertised? Certainly not in our documentation (to the contrary, the documentation explicitly says that the precision is not guaranteed). Comments in the implementation are not stable guarantees.

So no, there's no regression here in the formal sense, though getting higher precision could still be nice.

@alexpyattaev
Copy link

Well I think it is not about accuracy but about consistency. The calculation may not be accurate as such, but it should at least be consistent, i.e. you should not be getting different answers depending on the mood of the optimizer. And if there is an error range that is allowed, it should be mapped out and announced to the user. If no error bound is guaranteed, this function could just as well always return 42.

@RalfJung
Copy link
Member

RalfJung commented Aug 8, 2024

We explicitly document that the optimizer can affect results of these functions. You may not be happy about that, but there's no bug here. If you want to change the specification for these operations to require them to be deterministic, that's a feature request that should go through our usual process -- like exploring what it would actually take to achieve that (AFAIK this is very non-trivial and might even require LLVM changes), and writing up the rationale and the arguments against and in favor in a pre-RFC or a t-libs ACP or something like that. And yes there are arguments against this; achieving this will likely require disabling some optimizations and hence cost performance. I don't have the expertise to help explore the design space here; you may find folks that can help with this on https://internals.rust-lang.org/ or on Zulip.

If no error bound is guaranteed, this function could just as well always return 42.

Please stop using strawman arguments, that is not constructive.

Rust has many places where we don't make hard guarantees for various reasons, but we still aim to provide reasonable outcomes and would consider it a bug if we failed to do so. The line between "reasonable" and "unreasonable" is fuzzy and subjective. Obviously, a log function that ignores its input would not be reasonable and would be considered a bug.

@Amanieu
Copy link
Member

Amanieu commented Aug 8, 2024

To anyone interested in concretely fixing this, consider submitting a PR to port the new log implementation from musl to our implementation in the libm crate, which is used by compiler-builtins.

Also it would be good to check if any other math functions in libm are similarly outdated compared to the latest implementation in musl.

@alexpyattaev
Copy link

I can look into porting the musl code at some point, could be fun to mess with some bits. Are there any guidelines specific to that kind of code? I imagine using platform-specific SIMD is a big no-no there, right?

@beetrees
Copy link
Contributor

beetrees commented Aug 9, 2024

If a new log implementation is going to be ported, it may as well be one that gives correctly rounded results in all cases. Looking at this paper, MUSL's implementation doesn't provide correctly rounded results in all cases, whereas it appears that LLVM libc has such an implementation for both f32 and f64; the CORE-MATH project also has a correctly rounded implementation of log (f32, f64). While Rust doesn't require a correctly rounded implementation of these functions, there's no reason not to use one when there's one available, all else being equal.

@beetrees
Copy link
Contributor

beetrees commented Aug 9, 2024

Regarding platform-specific SIMD: AFAIK it should be fine as long as there is an equivalent non-SIMD implementation for other platforms and for when compiling with the force-soft-floats feature. Basic arithmetic operations (+-*/) are fine regardless.

@beetrees
Copy link
Contributor

beetrees commented Aug 9, 2024

@rustbot label +A-floating-point

@rustbot rustbot added the A-floating-point Area: Floating point numbers and arithmetic label Aug 9, 2024
@alexpyattaev
Copy link

Ok I see. I'll try to dig into this properly at some point in the next week or two, will post here if I've given up.

@apiraino apiraino removed the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Aug 12, 2024
@jdh8
Copy link

jdh8 commented Aug 14, 2024

Both f32::log and f64::log simply call their ln twice and divide. This will introduce another rounding error even if ln were correctly rounded. If we want to have a log whose precision comparable to other functions, we have to internally make a kernel function exceeding f64 precision and then use it across logarithmic functions.

See also my implementation of f32::log (only faithful rounding (error < 1 ulp) so far. I'm considering upgrading to correct rounding (error ≤ 0.5 ulp))
https://github.com/jdh8/metallic-rs/blob/62b9c3e670e487b6c1aea9947bff0b2ec29af564/src/f32/mod.rs#L456

@tgross35
Copy link
Contributor

There is ongoing work to improve our libm implementations, but we are not there yet. Since we plan to do better but it just isn't ready yet, and since we are close to the beta branch (<5 days), I think the best thing to do is just revert this for now and add it back once our implementations are more ready.

@Amjad50 could you post a revert of rust-lang/compiler-builtins#594 and rust-lang/compiler-builtins#577 as applicable, and open an issue (in builtins) to reapply it?

@Amjad50
Copy link
Contributor

Amjad50 commented Aug 26, 2024

Yah, I think I can just revert to the old expected behavior for windows/Mac/Linux platforms and only link include the math module for other targets.

I'll do it later today.

But I'm a bit confused as to why this happened,
since those PRs are making all the symbols 'weak', not sure how they turned 'strong', but in that sense it should have collided with the system math library.

@tgross35
Copy link
Contributor

tgross35 commented Aug 26, 2024

I don't think it was making the symbols weak that caused the problem, instead that we made them available at all. rust-lang/rust didn't get an updated compiler-builtins between when those PRs merged and about a month ago, IIRC.

They aren't getting turned strong, the system math symbols are just also weak (referencing /usr/lib/x86_64-linux-gnu/libm-2.39.a). I'm a bit fuzzy on the actual story is, but this is my understanding from digging before: the linker is allowed to choose any weak symbol when there are >1. Typically, it picks the first symbol it sees, which is usually the first library in its command invocation to provide the symbol. We pass builtins to the linker first, so unless the system symbol is strong we wind up using compiler-builtins by default.

The precision effects here are one concern, but another concern I have is that our routines may not be optimized as well as what is available from system libm. So I think it worth some effort to ensure that (which is in progress) before we effectively make our symbols be used by default everywhere, or otherwise change things so our symbols are truly a fallback rather than more or less the default.

Thanks for being willing to pick this up. I agree that we only need to change back on platforms that provide their own symbols.

Amjad50 added a commit to Amjad50/compiler-builtins that referenced this issue Aug 27, 2024
This fixes such as (rust-lang/rust#128386)
where, our implementation is being used on systems where there is
already `math` library and its more performant and accurate.

So with this change, linux will go back to the previous behavior and not
include these functions, windows and apple were generally not affected.

Looking at the targets we have builtin now in rust, everything else is
probably good to have the math symbols.

> A note on the above, the `hermit` os uses `libm` directly for itself,
> but I think its Ok to keep providing math in `compiler_builtin` for it,
> its technically the same implementation either from `compiler_builtin`
> or `hermit-builtins`.

Signed-off-by: Amjad Alsharafi <26300843+Amjad50@users.noreply.github.com>
workingjubilee added a commit to workingjubilee/rustc that referenced this issue Aug 29, 2024
…r=tgross35

Update `compiler_builtins` to `0.1.123`

Includes rust-lang/compiler-builtins#680 and fixes rust-lang#128386.

Fixed by not including math symbols of `compiler_builtins` into any `unix` target or `wasi`, old behavior is restored

r? tgross35
workingjubilee added a commit to workingjubilee/rustc that referenced this issue Aug 29, 2024
…r=tgross35

Update `compiler_builtins` to `0.1.123`

Includes rust-lang/compiler-builtins#680 and fixes rust-lang#128386.

Fixed by not including math symbols of `compiler_builtins` into any `unix` target or `wasi`, old behavior is restored

r? tgross35
rust-timer added a commit to rust-lang-ci/rust that referenced this issue Aug 29, 2024
Rollup merge of rust-lang#129715 - Amjad50:update-compiler-builtins, r=tgross35

Update `compiler_builtins` to `0.1.123`

Includes rust-lang/compiler-builtins#680 and fixes rust-lang#128386.

Fixed by not including math symbols of `compiler_builtins` into any `unix` target or `wasi`, old behavior is restored

r? tgross35
github-actions bot pushed a commit to rust-lang/miri that referenced this issue Aug 30, 2024
Update `compiler_builtins` to `0.1.123`

Includes rust-lang/compiler-builtins#680 and fixes rust-lang/rust#128386.

Fixed by not including math symbols of `compiler_builtins` into any `unix` target or `wasi`, old behavior is restored

r? tgross35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-floating-point Area: Floating point numbers and arithmetic C-bug Category: This is a bug. regression-untriaged Untriaged performance or correctness regression. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.