Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

math_rem test fails on the i686 target #243

Closed
tranzystorekk opened this issue Dec 2, 2024 · 39 comments · Fixed by #245
Closed

math_rem test fails on the i686 target #243

tranzystorekk opened this issue Dec 2, 2024 · 39 comments · Fixed by #245

Comments

@tranzystorekk
Copy link
Contributor

When testing jaq 2.0.0 for Void Linux, the test run fails on the i686 platform: https://github.com/void-linux/void-packages/actions/runs/12129949574/job/33819327834?pr=50494

@tranzystorekk
Copy link
Contributor Author

Backtrace after recompiling tests in debug:

---- math_rem stdout ----
thread 'math_rem' panicked at /builddir/jaq-2.0.0/jaq-core/src/lib.rs:221:9:
assertion failed: out.eq(ys)
stack backtrace:
   0: rust_begin_unwind
   1: core::panicking::panic_fmt
   2: core::panicking::panic
   3: jaq_core::Filter<F>::yields
             at /builddir/jaq-2.0.0/jaq-core/src/lib.rs:221:9
   4: funs::common::yields
             at ./tests/common/mod.rs:14:5
   5: funs::common::give
             at ./tests/common/mod.rs:22:5
   6: funs::math_rem
             at ./tests/funs.rs:134:5
   7: funs::math_rem::{{closure}}
             at ./tests/funs.rs:82:14
   8: core::ops::function::FnOnce::call_once
             at /builddir/rust-1.82.0/library/core/src/ops/function.rs:250:5

@wader
Copy link
Contributor

wader commented Dec 3, 2024

Looks like this test

jaq/jaq-json/tests/funs.rs

Lines 134 to 138 in 1ac39f8

give(
json!(null),
"try (4000000001 % 0) catch .",
json!("cannot calculate 4000000001 % 0"),
);
hmm guessing 4000000001 overflows etc 32 bit system? catches error message with different number?

@tranzystorekk
Copy link
Contributor Author

I also suspected that, but 4000000001 should fit in 32 bits just fine, and the previous tests also deal with this value without failure;

so it seems this is about the logic behind div-by-zero error catching?

@wader
Copy link
Contributor

wader commented Dec 3, 2024

jaq seems to use isize so i think 4000000001 won't fitt signed 32 bit, that might be the issue. Could you try change 4000000001 to something smaller? (and also change in the error string)

@tranzystorekk
Copy link
Contributor Author

Ah, you're right, I didn't take signed integer into account.

I patched the numbers to make them much smaller in that test and now it succeeds, but it makes me worry that the other tests with that value are only passing by accident...

@01mf02
Copy link
Owner

01mf02 commented Dec 3, 2024

@tranzystorekk, I would be happy to accept a PR from you with the smaller numbers.

@tranzystorekk
Copy link
Contributor Author

tranzystorekk commented Dec 3, 2024

I'm wondering if switching the int repr in jaq-json to i64 would be a viable option (I saw the comment that isize allows ints to easily index arrays, but it seems the trade-off isn't as good if int is also comprehensively tested against arithmetic operations)

simonrupf pushed a commit to simonrupf/jaq that referenced this issue Dec 9, 2024
Reduces large isize to remain below maximum for signed 32 bit int,
resolves 01mf02#243
@GaetanLepage
Copy link

It fails on x86_64-darwin too.

@wader
Copy link
Contributor

wader commented Dec 10, 2024

@GaetanLepage does a smaller number work? could try #245

@GaetanLepage
Copy link

GaetanLepage commented Dec 10, 2024

@GaetanLepage does a smaller number work? could try #245

Well, after more testing, I found out that the problematic line was give(json!(null), "2.1 % 0 | isnan", json!(true));.
Commenting this one made the whole test succeed.

@wader
Copy link
Contributor

wader commented Dec 10, 2024

Oh ok, what does cargo run -- -n '2.1 % 0 | ., isnan, isinfinite' say?

@GaetanLepage
Copy link

Oh ok, what does cargo run -- -n '2.1 % 0 | ., isnan, isinfinite' say?

null
false
false

@GaetanLepage
Copy link

Whereas it is

null
true
false

on aarch64-darwin

@wader
Copy link
Contributor

wader commented Dec 10, 2024

Huh, is this macOS or some other darwin based os?

@01mf02 any idea what is going on? known float behaviour?

@GaetanLepage
Copy link

Huh, is this macOS or some other darwin based os?

MacOS, running on an aarch64 M2 CPU, but with rosetta2 emulation of x86.

@01mf02
Copy link
Owner

01mf02 commented Dec 10, 2024

@01mf02 any idea what is going on? known float behaviour?

I'm sorry, I have no idea what is happening here. If it helps, the documentation for what should happen is here: https://doc.rust-lang.org/std/primitive.f64.html#impl-Rem-for-f64. Perhaps, @GaetanLepage, could you tell us what 2.1 / 0 | ., isnan, isinfinite yields on both architectures that you tested?

@wader
Copy link
Contributor

wader commented Dec 10, 2024

Hmm confusing, i got i M3 macbook so i guess i could install rosetta somehow. I wonder what other language implementation does, could you try this with rosetta?

$ node -p '2.1 % 0'
NaN
$ node -p '2.1 / 0'
Infinity

@GaetanLepage
Copy link

GaetanLepage commented Dec 11, 2024

Perhaps, @GaetanLepage, could you tell us what 2.1 / 0 | ., isnan, isinfinite yields on both architectures that you tested?

In the meantime, our darwin builder got updated to an M4 machine.
Here are the results on this new machine:

$ cargo run -- -n '2.1 % 0 | ., isnan, isinfinite

# aarch64-darwin
null
true
false

# x86_64-darwin (rosetta2)
null
false
false

and

$ cargo run -- -n '2.1 / 0 | ., isnan, isinfinite

# aarch64-darwin
null
false
false

# x86_64-darwin (rosetta2)
null
false
true

@wader
Copy link
Contributor

wader commented Dec 11, 2024

Tried to expand and translate the jq code into what it will end up in rust:

2.1 % 0 | isnan
2.1 % 0 | . == nan
2.1 % 0 | . == 0 / 0
(2.1 % 0) == 0 / 0
(2.1 % 0.0) == 0.0 / 0.0

$ cargo run -- -n '(2.1 % 0.0) == 0.0 / 0.0'
true
$ cargo run --target x86_64-apple-darwin -- -n '(2.1 % 0.0) == 0.0 / 0.0'
false

But seems i'm missing something:

$ rustc -o test <(echo 'fn main() { dbg!(f64::total_cmp(&(2.1 % 0.0), &(0.0 / 0.0))); }') && ./test
[/dev/fd/11:1:13] f64::total_cmp(&(2.1 % 0.0), &(0.0 / 0.0)) = Equal
$ rustc --target x86_64-apple-darwin -o test <(echo 'fn main() { dbg!(f64::total_cmp(&(2.1 % 0.0), &(0.0 / 0.0))); }') && ./test
[/dev/fd/11:1:13] f64::total_cmp(&(2.1 % 0.0), &(0.0 / 0.0)) = Equal

@01mf02 01mf02 closed this as completed in 18aa0a8 Dec 13, 2024
@01mf02 01mf02 reopened this Dec 13, 2024
@01mf02
Copy link
Owner

01mf02 commented Dec 13, 2024

@wader, this rustc <(...) pattern is really nice.

@GaetanLepage: Can you post the output of rustc -o test <(echo 'fn main() { dbg!(2.1 / 0.0); }') && ./test on your two platforms, just to make sure that the problem is on the Rust side and not on the jaq side? If they yield different outputs on your two platforms, could you perhaps file an issue on the Rust issue tracker, given that multiplication and division should be consistent across platforms?

@GaetanLepage
Copy link

GaetanLepage commented Dec 13, 2024

@GaetanLepage: Can you post the output of rustc -o test <(echo 'fn main() { dbg!(2.1 / 0.0); }') && ./test on your two platforms, just to make sure that the problem is on the Rust side and not on the jaq side? If they yield different outputs on your two platforms, could you perhaps file an issue on the Rust issue tracker, given that multiplication and division should be consistent across platforms?

This gives me inf on all x86_64-linux, aarch64-linux, x86_64-darwin and aarch64-darwin.

@wader
Copy link
Contributor

wader commented Dec 13, 2024

Dugg a bit more

With this float_cmp

fn float_cmp(left: f64, right: f64) -> Ordering {
    if left == 0. && right == 0. {
        Ordering::Equal
    } else {
        let lb = left.to_bits() as i64;
        let rb = right.to_bits() as i64;
        dbg!(&left, &right);
        dbg!(lb, rb);
        f64::total_cmp(&left, &right)
    }
}

I get this:

$cargo run --target x86_64-apple-darwin -- -n '0.0/0.0 == 0.0%0.0'
[jaq-json/src/lib.rs:892:9] &left = NaN
[jaq-json/src/lib.rs:892:9] &right = NaN
[jaq-json/src/lib.rs:893:9] lb = -2251799813685248 # NOTE: on aarch64 this is same value as below
[jaq-json/src/lib.rs:894:9] rb = 9221120237041090560
false

So i wonder if one of the NaN-values gets "normalized" etc? ex the NaN payload/flags gets zeroed or something?

Still can't reproduce with same-ish code as simple rust code.

algitbot pushed a commit to alpinelinux/aports that referenced this issue Dec 14, 2024
- bump from 1.6.0 to 2.0.1
- remove cargo patch
- add patch for tests on 32bit architecture,
  see: 01mf02/jaq#243
- package MIT license
@01mf02
Copy link
Owner

01mf02 commented Dec 16, 2024

@wader, thanks for digging! Taking on your approach, may I ask you, @GaetanLepage, to post the output of the following?

rustc -o test <(echo 'fn main() { dbg!(f64::total_cmp(&(2.1 / 0.0), &f64::INFINITY)); }') && ./test

I hope that this will yield different outputs for different architectures ... that would explain jaq's behaviour, and we could file an issue at rust-lang.

@wader
Copy link
Contributor

wader commented Dec 16, 2024

One thought that struck me: could compiling a simple program with rustc and building via cargo differ some way that changes float behaviour? some jaq crates dependency pokes at something? different compile optimization levels?

@GaetanLepage
Copy link

may I ask you, @GaetanLepage, to post the output of the following?

On all architectures, I get:

[/dev/fd/63:1:13] f64::total_cmp(&(2.1 / 0.0), &f64::INFINITY) = Equal

@wader
Copy link
Contributor

wader commented Dec 16, 2024

More digging

The two different NaNs as u64 are these:

[jaq-json/src/lib.rs:890:5] lb = 18444492273895866368
[jaq-json/src/lib.rs:891:5] rb = 9221120237041090560

If one inspect them using my favorite website they are:
https://float.exposed/0xfff8000000000000 signed NaN
https://float.exposed/0x7ff8000000000000 "positive" NaN

Update: seems float.exposed it buggy, for 18444492273895866368 one have to provide it in the "Raw Decimal Integer Value" input field.

I don't really follow where and how in the jaq code we end up with a signed NaN but f64::total_cmp will see them as no equal so maybe a solution to this and possible other NaN-differences is to do:

fn float_cmp(left: f64, right: f64) -> Ordering {
    if (left == 0. && right == 0.) || (left.is_nan() && right.is_nan()) {
        Ordering::Equal
    } else {
        f64::total_cmp(&left, &right)
    }
}

Are the more places to look for this? and it is ok to normalize NaN values like this? maybe only consider signness difference equal?

@01mf02
Copy link
Owner

01mf02 commented Dec 16, 2024

One thought that struck me: could compiling a simple program with rustc and building via cargo differ some way that changes float behaviour? some jaq crates dependency pokes at something? different compile optimization levels?

I am not aware of such behaviour. On the other hand, for integers, I know that compiling with --release disables overflow checks.

@01mf02
Copy link
Owner

01mf02 commented Dec 16, 2024

Are the more places to look for this? and it is ok to normalize NaN values like this? maybe only consider signness difference equal?

It may be OK to normalize NaN values like this, but first, I would like to understand what is going on here. I'd really like to reproduce the platform-specific difference without jaq-related code.

@01mf02
Copy link
Owner

01mf02 commented Dec 16, 2024

Let's get the big cannon out. Can you post the output of the following program?

fn main() {
    let t: f64 = "2".parse().unwrap();
    let z: f64 = "0".parse().unwrap();
    let vals = [t / z, z / z, t % z, z % z];
    for v in vals {
        println!("{v} (bytes = {})", v.to_bits());
    }
    for v1 in &vals {
        for v2 in &vals {
            print!("{:?}\t", v1.total_cmp(v2));
        }
        println!();
    }
}

On my computer, this yields:

inf (bytes = 9218868437227405312)
NaN (bytes = 18444492273895866368)
NaN (bytes = 18444492273895866368)
NaN (bytes = 18444492273895866368)
Equal	Greater	Greater	Greater	
Less	Equal	Equal	Equal	
Less	Equal	Equal	Equal	
Less	Equal	Equal	Equal	

@wader
Copy link
Contributor

wader commented Dec 16, 2024

On macbook M3

$ rustc --target x86_64-apple-darwin  -o test test.rs && ./test
inf (bytes = 9218868437227405312)
NaN (bytes = 18444492273895866368)
NaN (bytes = 9221120237041090560)
NaN (bytes = 9221120237041090560)
Equal	Greater	Less	Less
Less	Equal	Less	Less
Greater	Greater	Equal	Equal
Greater	Greater	Equal	Equal

$ rustc --target aarch64-apple-darwin  -o test test.rs && ./test
inf (bytes = 9218868437227405312)
NaN (bytes = 9221120237041090560)
NaN (bytes = 9221120237041090560)
NaN (bytes = 9221120237041090560)
Equal	Less	Less	Less
Greater	Equal	Equal	Equal
Greater	Equal	Equal	Equal
Greater	Equal	Equal	Equal

(not exactly sure how x86_64 gets executed on arm but i guess it's transparent thru rosetta, i get same result if do arch --x86_64 ./test which seems to the explicit way of doing it)

@01mf02
Copy link
Owner

01mf02 commented Dec 17, 2024

@wader: YES! I think we've got it. Just to be sure, can you check one last thing, namely whether replacing t and z with 2.0 and 0.0 yields the same results?

@wader
Copy link
Contributor

wader commented Dec 17, 2024

You mean like this?

    let t: f64 = 2.0;
    let z: f64 = 0.0;

hmm then i get:

$  rust_float rustc --target x86_64-apple-darwin  -o test test.rs && ./test
inf (bytes = 9218868437227405312)
NaN (bytes = 9221120237041090560)
NaN (bytes = 9221120237041090560)
NaN (bytes = 9221120237041090560)
Equal	Less	Less	Less
Greater	Equal	Equal	Equal
Greater	Equal	Equal	Equal
Greater	Equal	Equal	Equal

$  rust_float rustc --target aarch64-apple-darwin  -o test test.rs && ./test
inf (bytes = 9218868437227405312)
NaN (bytes = 9221120237041090560)
NaN (bytes = 9221120237041090560)
NaN (bytes = 9221120237041090560)
Equal	Less	Less	Less
Greater	Equal	Equal	Equal
Greater	Equal	Equal	Equal
Greater	Equal	Equal	Equal

Now i'm confused. After some poking around it seems this is what differs:

$ rustc --target x86_64-apple-darwin -o test <(echo 'fn main() { dbg!(("0".parse::<f64>().unwrap() / "0".parse::<f64>().unwrap()).to_bits()); }') && ./test
[/dev/fd/11:1:13] ("0".parse::<f64>().unwrap() / "0".parse::<f64>().unwrap()).to_bits() = 18444492273895866368

$ rustc --target x86_64-apple-darwin -o test <(echo 'fn main() { dbg!((0.0f64 / 0.0f64).to_bits()); }') && ./test
[/dev/fd/11:1:13] (0.0f64 / 0.0f64).to_bits() = 9221120237041090560

but i don't see any difference in float etc:

$ rustc --target aarch64-apple-darwin -o test <(echo 'fn main() { dbg!("0".parse::<f64>().unwrap().to_bits()); }') && ./test
[/dev/fd/11:1:13] "0".parse::<f64>().unwrap().to_bits() = 0

$t rustc --target aarch64-apple-darwin -o test <(echo 'fn main() { dbg!(0.0f64.to_bits()); }') && ./test
[/dev/fd/11:1:13] 0.0f64.to_bits() = 0

Hmm is this the compiler doing some transformations etc?

@01mf02
Copy link
Owner

01mf02 commented Dec 17, 2024

@wader, great! So parsing seems to have an impact here.

I opened a bug report on rust-lang. Let's see what people have to say there.

@workingjubilee
Copy link

The aarch64 and x86-64 CPUs use different NaNs.

@workingjubilee
Copy link

workingjubilee commented Dec 17, 2024

This is hidden from you by LLVM doing constant propagation, as LLVM normalizes to a particular NaN, irrespective of architecture, if you use the literal float. Otherwise, it is not.

@saethlin
Copy link

saethlin commented Dec 17, 2024

I think this was misdiagnosed/mis-minimized, and the actual bug here is that jaq is using total_cmp (not that parsing is funky, the examples here are just poking codegen in ways that induce different NaN generation). NaN generation in Rust is effectively nondeterminstic (https://rust-lang.github.io/rfcs/3514-float-semantics.html or https://doc.rust-lang.org/stable/std/primitive.f32.html#nan-bit-patterns) and since total_cmp cares about which NaN is being compared, you're going to have problems like this one.

I don't know what your floating point comparison semantics are supposed to be, but I think you want all NaNs to compare less than all NaNs, or something like that.

@workingjubilee
Copy link

workingjubilee commented Dec 17, 2024

It seems you attempted to use total_cmp as a way of getting out of checking for NaN, but you should probably just handle cases explicitly. It's possible there's some other handling of the following missing truth table entries that you may want, but I don't know what they would be without knowing your other constraints.

fn float_cmp(left: f64, right: f64) -> Ordering {
    match (left.partial_cmp(&right), left.is_nan(), right.is_nan()) {
        (None, true, true) => Ordering::Equal,
        (None, false, true) => todo!(),
        (None, true, false) => todo!(),
        (Some(v), false, false) => v,
    }
}

@01mf02
Copy link
Owner

01mf02 commented Dec 19, 2024

I don't know what your floating point comparison semantics are supposed to be, but I think you want all NaNs to compare less than all NaNs, or something like that.

This is actually what happens in jq. I have just made a PR (#248) that proposes to adopt this behaviour after all.

@01mf02 01mf02 closed this as completed in ba24e38 Dec 19, 2024
@01mf02
Copy link
Owner

01mf02 commented Dec 19, 2024

@workingjubilee, thanks for your suggestion, that's what I settled on in the end.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants