Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Miri float ops are less precise than what the standard library expects #4208

Open
RalfJung opened this issue Feb 25, 2025 · 33 comments
Open
Assignees
Labels
A-shims Area: This affects the external function shims C-bug Category: This is a bug.

Comments

@RalfJung
Copy link
Member

RalfJung commented Feb 25, 2025

Tons of standard library float tests for ln, exp, and trigonometric functions fail with Miri's float non-determinism since they expect more precise results than what Miri delivers. I have temporarily disabled float non-determinism while we decide what to do.

A bunch of cases actually expect precise answers for certain cases, e.g.:
https://github.com/rust-lang/rust/blob/29166cd61711776e6d43239d6d18a0eafe6515b1/library/std/tests/floats/f64.rs#L468
https://github.com/rust-lang/rust/blob/29166cd61711776e6d43239d6d18a0eafe6515b1/library/std/tests/floats/f64.rs#L482-L483

Furthermore, there are quite a few cases where the standard library expects more precision than what Miri provides. assert_approx_eq! checks for a divergence of up to 1e-6, which is less than 10 ULP on a value of 1.0 represented as f32. Miri currently adds 16 ULP of error (on top of whatever error the host libm we use to implement these function has). Some of these tests chain multiple operations, such as 60.0f64.sinh().asinh(), so the error accumulates; some of these tests use values sufficiently larger than 1 that 16 ULP end up even exceeding 1e-5. Furthermore, some doctests actually check that the precision is within f32::EPSILON, which is 1 ULP on a value of 1.0; that is obviously never going to fly if Miri adds any kind of non-deterministic error.

I am open to reducing the error Miri applies, and happy for suggestions of what would make a good error scale. (I'd prefer it to be a power of 2, that makes the logic a lot simpler in Miri. ;) However, I also think some of these tests should change, specifically all tests that require either full equality or just 1 ULP error.

@tgross35 @Jules-Bertholet thoughts, opinions?

@Jules-Bertholet
Copy link
Contributor

IEEE 754 § 9.2.1 defines several special cases, including 0.0f64.exp(), where the result must be exact.

@RalfJung
Copy link
Member Author

Does it also define 5.0f64.exp2()?

@Jules-Bertholet
Copy link
Contributor

Does it also define 5.0f64.exp2()?

No.

@RalfJung
Copy link
Member Author

So some of the assert_eq! are still making assumptions about unspecified precision, then.

@Jules-Bertholet
Copy link
Contributor

Actually, scratch that. §9.2 says “a conforming operation shall return results correctly rounded for the applicable rounding direction for all operands in its domain”, which implies that technically, to be conforming an implementation can’t have any error ever. Of course, actual implementations fail this, which means they technically aren’t conforming.

I think that, beyond the special values explicitly called out in §9.2.1, some of this stuff (like 5.0f64.exp2() being exact) should be treated as a guarantee also, as no sane implementation is likely to break it.

@RalfJung
Copy link
Member Author

§9.2 covers which operations?

@Jules-Bertholet
Copy link
Contributor

All the optional mathematical operations. (The required math ops have a similar proviso)

@RalfJung
Copy link
Member Author

Ah, lol. Does anyone actually do this right? Even @eduardosm's softfloat implementations had 1 ULP of error.

Well, the goal of this non-determinism is to expose bugs in code that incorrectly relies on precision assumptions. I'll happily take the advice of a domain expert for what is the best strategy for that. :)

@Jules-Bertholet
Copy link
Contributor

The C standard is explicitly more forgiving, so maybe we should rely on it instead?

(Also, I’m not a domain expert, I just read the standard!)

@tgross35
Copy link
Contributor

tgross35 commented Feb 26, 2025

Since we bind the C functions, we wind up following the same rules. Aiui this is extremely forgiving, sin could be a linear interpolation from a four-entry LUT. In practice, most system libm implementations are likely to be within ~2ULP for most ops, with a few exceptions (musl for example https://wiki.musl-libc.org/mathematical-library).

Representing this well in Miri is tricky because, while it's technically allowed, an implementation that can't produce $e^0=1$ would not be a very good one. For the tests in std that are failing, it feels like maybe we don't want the non-determinism? Since it seems like the goal would be to test the platform's real implementation, as opposed to ensuring that our tests pass on worst case platforms. I think the non-determinism makes more sense for downstream users that aren't just testing the routines themselves.

@beetrees may also have some ideas here.

Ah, lol. Does anyone actually do this right? Even @eduardosm's softfloat implementations had 1 ULP of error.

There are some implementations that provide the cr_* (correctly rounded) version of various routines, I am working on porting some of these from CORE-MATH to our libm. This isn't very common though.

@RalfJung
Copy link
Member Author

FWIW results of 0 remain precise with the current scheme, since we add a relative error not an absolute one. (No idea what you meant by e^x = 0 though.)

@RalfJung
Copy link
Member Author

Since it seems like the goal would be to test the platform's real implementation, as opposed to ensuring that our tests pass on worst case platforms.

Note that some of these are doctests and hence can be read as guarantees that the implementation will be within EPSILON of the expected answer.

@tgross35
Copy link
Contributor

FWIW results of 0 remain precise with the current scheme, since we add a relative error not an absolute one. (No idea what you meant by e^x = 0 though.)

Whoops, that should have been $e^0=1$.

Note that some of these are doctests and hence can be read as guarantees that the implementation will be within EPSILON of the expected answer.

I don't really like that we do this either, it builds the wrong intuition that EPSILON is always a reasonable value to compare float operations to. Assuming this comes from blending the definition of machine epsilon with the mathematical usage of $\epsilon$. It is nice that it looks clean, but it would probably be a good idea to change these tests anyway (being 1ULP > EPSILON for |x| > 2.0).

Is it possible to enable the randomness for most things but exclude tests in coretests / std/tests that are intended to validate system float ops?

@RalfJung
Copy link
Member Author

Is it possible to enable the randomness for most things but exclude tests in coretests / std/tests that are intended to validate system float ops?

That's tricky. We could add a flag to disable that and just set that flag for the entire "run std tests in Miri" run, though.

But I want to first experiment with reducing the error Miri adds, e.g. to 4 ULP.

@RalfJung
Copy link
Member Author

The C standard is explicitly more forgiving, so maybe we should rely on it instead?

Oh, that actually has some quite trivial points in it:

"For each single-argument function f in <math.h> whose mathematical counterpart is symmetric
(even), f(-x) is f(x) for all rounding modes and for all x in the (valid) domain of the function. For
each single-argument function f in <math.h> whose mathematical counterpart is antisymmetric
(odd), f(-x) is -f(x) for the ISO/IEC 60559 rounding modes roundTiesToEven, roundTiesToAway,
and roundTowardZero, and for all x in the (valid) domain of the function."

I think LLVM actually violates this, since the result is effectively non-deterministic (if LLVM evaluates libm calls at compile-time), and therefore cannot satisfy that requirement.

But then F.10.1-3 has a nice list of guarantees that are being made. I would add to that that functions with a clearly defined output range (such as [-1, 1] for sin, cos) must produce output in that range. And I'd suggest we reduce the error down to 4 ULP to avoid breaking so many of the std tests. Just having any non-determinism here should already help, I think.

@LorrensP-2158466 do you want to give this a try? Unfortunately I had to disable the code you added in your first PR as it made a bunch of test fail; turns out this is more complicated than I thought. I would say in a first PR, deal just with the ones defined in src/tools/miri/src/intrinsics/mod.rs.

@RalfJung RalfJung added C-bug Category: This is a bug. A-shims Area: This affects the external function shims labels Feb 27, 2025
@LorrensP-2158466
Copy link
Contributor

Yes, I'd like to give it a try, and just to be sure you want me to:

  • reduce the error to 4 ULP and change the approx_eq! macro to account for that
  • clamp output values to their defined ranges according to the standard

Am I missing something in this thread?

@RalfJung
Copy link
Member Author

Yes, please also make sure the guarantees listed in F.10.1-3 of the C standard are met.

@LorrensP-2158466
Copy link
Contributor

Alright, will do!

@tgross35
Copy link
Contributor

tgross35 commented Feb 27, 2025

My read of the standard was that F.10 paragraph 16 walks back the statement in paragraph 3:

Recommended practice

16 ISO/IEC 60559 specifies correct rounding for the operations in the F.3 table of operations recommended by ISO/IEC 60559, and thereby preserves useful mathematical properties such as symmetry, monotonicity, and periodicity. The corresponding functions with (potentially) reserved cr_-prefixed names (7.33.8) do the same. The C functions in the table, however, are not required to be correctly rounded, but implementations should still preserve as many of these useful mathematical properties as possible.

The "should" wording here seems to imply that the implementations are not required to uphold symmetry/monotonicity/periodicy. Likely not a problem, in any case.

@RalfJung
Copy link
Member Author

What the heck, why would you have paragraph 16 directly contradict what paragraph 3 says?!?

Anyway, we'll just ignore both of them then. 🤷

@Jules-Bertholet
Copy link
Contributor

I don’t see a contradiction. Paragraph 3 specifies a subset of IEEE 754 requirements that C also requires, paragraph 16 notes that this subset is a proper subset.

@LorrensP-2158466
Copy link
Contributor

How do you think I should best implement this? I'm currently looking at this piece of code:

  | "sinf32"
  | "cosf32"
  | "expf32"
  | "exp2f32"
  | "logf32"
  | "log10f32"
  | "log2f32"
  => {
      let [f] = check_intrinsic_arg_count(args)?;
      let f = this.read_scalar(f)?.to_f32()?;
      // Using host floats (but it's fine, these operations do not have
      // guaranteed precision).
      let host = f.to_host();
      let res = match intrinsic_name {
          "sinf32" => host.sin(),
          "cosf32" => host.cos(),
          "expf32" => host.exp(),
          "exp2f32" => host.exp2(),
          "logf32" => host.ln(),
          "log10f32" => host.log10(),
          "log2f32" => host.log2(),
          _ => bug!(),
      };
      let res = res.to_soft();
      // Apply a relative error of 16ULP to introduce some non-determinism
      // simulating imprecise implementations and optimizations.
      // FIXME: temporarily disabled as it breaks std tests.
      // let res = apply_random_float_error_ulp(
      //     this,
      //     res,
      //     4, // log2(16)
      // );
      let res = this.adjust_nan(res, &[f]);
      this.write_scalar(res, dest)?;
  }

For example sin and cos:
sin(x) and cos(x) are in [-1, 1] this can be done with .minimum(-1).maximum(1) ( or clamp(-1, 1)) which also propagates any NaN's and is thus correct. But sin(±0) = 0 and cos(±0) = 1, I'm not quite sure what the best way is to account for this, because 0 means 0 and not 4 ULP around 0. And that is not considering the other math functions.

Should I:

  • split these match arms into their categories (sin and cos, exps together, logs together)?
  • Use something that exists in miri and I don't know about?
  • Something else I'm not seeing?

@RalfJung
Copy link
Member Author

RalfJung commented Feb 28, 2025

You'll have to add a few helper functions, and some of the logic has to move inside the match intrinsic_name. You can't directly use clamp as that's a host float function and once the error has been applied, we shouldn't get back to host floats, so you'll have to implement a clamp on soft-floats.

To return specific values for specific inputs, we need a separate codepath that kicks in before f.to_host(). I would do it something like this:

let fixed_res = match intrinsic_name {
  "sinf32" if f == 0.0 => Some(0.0),
  "cosf32" if f == 0.0 => Some(1.0),
  ...
  _ => None
};
let res = fixed_res.unwrap_or_else(|| {
  let host = f.to_host();
  let res = match intrinsic_name { ... };
  let res = res.to_soft();
  let res = apply_random_float_error_ulp(...);
  // clamp
  match intrinsic_name {
    "sinf32" => res.clamp(-1.0, 1.0),
    ...
    _ => res,
  }
};
let res = this.adjust_nan(res, &[f]);
 this.write_scalar(res, dest)?;

except that the logic for fixed_res and clamp has to be done with soft-floats.

Note that "4 ULP relative error around 0" is still 0. But around 1 there could be some error.

@LorrensP-2158466
Copy link
Contributor

Alright, I was going in that direction but wasn't quite sure. Thanks for the great help!

@rustbot claim

@LorrensP-2158466
Copy link
Contributor

made a bunch of test fail; turns out this is more complicated than I thought. I would say in a first PR, deal just with the ones defined in src/tools/miri/src/intrinsics/mod.rs

Sorry, just to clarify: Do I have to make the changes in the rustc repo or this one?

@RalfJung
Copy link
Member Author

RalfJung commented Mar 3, 2025

It's probably better to do them in the rustc repo so that we can quick feedback whether the tests there still pass. But if you are having trouble setting that up, you can do them here if you want.

Over in the rustc repo, you can run Miri's own tests with ./x test miri -- float (where float is a filter, meaning you only run the tests that have float in their filename). You can run the relevant tests from the standard library with ./x miri core coretests -- f32 f64.

@LorrensP-2158466
Copy link
Contributor

So I implemented the logic and enabled the non-determinism again like so:

let res = fixed_float_value(intrinsic_name, f).unwrap_or_else(||{
    // Using host floats (but it's fine, these operations do not have
    // guaranteed precision).
    let host = f.to_host();
    let res = match intrinsic_name {
        "sinf32" => host.sin(),
        "cosf32" => host.cos(),
        "expf32" => host.exp(),
        "exp2f32" => host.exp2(),
        "logf32" => host.ln(),
        "log10f32" => host.log10(),
        "log2f32" => host.log2(),
        _ => bug!(),
    };
    let res = res.to_soft();

    // Apply a relative error of 4ULP to introduce some non-determinism
    // simulating imprecise implementations and optimizations.
    let res = apply_random_float_error_ulp(
        this,
        res,
        2, // log2(4)
    );

    match intrinsic_name {
        // sin and cos: [-1, 1]
        "sinf32" | "cosf32" => res.clamp(-1.0f32, 1.0f32), 
        // exp: [0, +INF]
        "expf32" | "exp2f32" => res.maximum(Single::ZERO), 
        _ => res,
    }
});
let res = this.adjust_nan(res, &[f]);
this.write_scalar(res, dest)?;

And I test it with this command: ./x miri --no-fail-fast core coretests std -- f32 f64, and to summarize:

  • integer decode tests fail because they do assert_eq!(2f32.powf(100.0).integer_decode(), (8388608, 77, 1)); but powf is non-deterministic so this will almost always fail. So, do you think I should remove this test and explain why?
  • The other tests fail where they expect a diff of 1e-6, but these fail even with a 4ULP relative error. Should I alter the approx_eq macro in the std tests to use the same one as in Miri?

I also added tests to the Miri float tests.
My Branch

@beetrees
Copy link
Contributor

beetrees commented Mar 5, 2025

  • integer decode tests fail because they do assert_eq!(2f32.powf(100.0).integer_decode(), (8388608, 77, 1)); but powf is non-deterministic so this will almost always fail. So, do you think I should remove this test and explain why?

This test is just using powf as a convenience functions to generate the float to test - replacing 2f32.powf(100.0) with 1.2676506e30 and 2f64.powf(100.0) with 1.2676506002282294e30 should make the test past (probably leave a comment explaining that the constants are just 2100).

  • The other tests fail where they expect a diff of 1e-6, but these fail even with a 4ULP relative error. Should I alter the approx_eq macro in the std tests to use the same one as in Miri?

Yes.

@RalfJung
Copy link
Member Author

RalfJung commented Mar 5, 2025 via email

@LorrensP-2158466
Copy link
Contributor

Okey thanks!

But Just looking at the tests of std I noticed that using the the assert_approx_eq! macro of Miri requires a significant refactor. The 4 float testing modules (1 for each float type) rely heavily on the current macro which does EPSILON checking, it uses a lot of constants like:

  • TOL
  • TOL_IMPR or TOL_PREC
  • TOL_0, TOL_P2 (TOL_*)

Different tests use different constants and I don't know how I should change them.

@LorrensP-2158466
Copy link
Contributor

I would say just increase the error to 1e-5.

I thought about it but it still fails:

---- f32::test_acosh stdout ----

thread 'f32::test_acosh' panicked at library/std/tests/floats/f32.rs:642:5:
60.0 is not approximately equal to 60.000015 (threshold 1e-5, difference 1.5258789e-5)
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
note: in Miri, you may have to set `MIRIFLAGS=-Zmiri-env-forward=RUST_BACKTRACE` for the environment variable to have an effect

---- f32::test_exp stdout ----

thread 'f32::test_exp' panicked at library/std/tests/floats/f32.rs:485:5:
148.41316 is not approximately equal to 148.4132 (threshold 1e-5, difference 3.0517578e-5)

@RalfJung
Copy link
Member Author

RalfJung commented Mar 5, 2025

I thought about it but it still fails:

Just make it 1e-4 for those tests. 🤷
For now, the key is to get some reasonable proposal that passes CI. In the PR, we'll discuss with a libs person how to best proceed.

@LorrensP-2158466
Copy link
Contributor

Works For me :), Thanks for helping!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-shims Area: This affects the external function shims C-bug Category: This is a bug.
Projects
None yet
Development

No branches or pull requests

5 participants