Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ExplicitConversion_FromSingle failing due to NaN != NaN #103347

Open
stephentoub opened this issue Jun 12, 2024 · 14 comments
Open

ExplicitConversion_FromSingle failing due to NaN != NaN #103347

stephentoub opened this issue Jun 12, 2024 · 14 comments
Assignees
Labels
arch-wasm WebAssembly architecture area-System.Numerics blocking-clean-ci Blocking PR or rolling runs of 'runtime' or 'runtime-extra-platforms' Known Build Error Use this to report build issues in the .NET Helix tab
Milestone

Comments

@stephentoub
Copy link
Member

stephentoub commented Jun 12, 2024

Build Information

Build: https://dev.azure.com/dnceng-public/public/_build/results?buildId=768837&view=results
Build error leg or test failing: System.Tests.HalfTests.ExplicitConversion_FromSingle
Pull request:

Error Message

Fill the error message using step by step known issues guidance.

{
  "ErrorMessage": "",
  "ErrorPattern": "FAIL.*System.Tests.HalfTests.ExplicitConversion_FromSingle",
  "BuildRetry": false,
  "ExcludeConsoleLog": false
}

Known issue validation

Build: 🔎 https://dev.azure.com/dnceng-public/public/_build/results?buildId=768837
Error message validated: [FAIL.*System.Tests.HalfTests.ExplicitConversion_FromSingle]
Result validation: ✅ Known issue matched with the provided build.
Validation performed at: 8/8/2024 7:39:18 PM UTC

[14:40:21] info: [FAIL] System.Tests.HalfTests.ExplicitConversion_FromSingle(f: NaN, expected: NaN)
[14:40:21] info: Assert.Equal() Failure: Values differ
[14:40:21] info: Expected:   NaN
[14:40:21] info: Actual:     NaN
[14:40:21] info:    at System.AssertExtensions.Equal(Half expected, Half actual)
[14:40:21] info:    at System.Tests.HalfTests.ExplicitConversion_FromSingle(Single f, Half expected)
[14:40:21] info:    at System.Object.InvokeStub_HalfTests.ExplicitConversion_FromSingle(Object , Span`1 )
[14:40:21] info:    at System.Reflection.MethodBaseInvoker.InvokeWithFewArgs(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
[14:40:21] info: [FAIL] System.Tests.HalfTests.ExplicitConversion_FromSingle(f: NaN, expected: NaN)
[14:40:21] info: Assert.Equal() Failure: Values differ
[14:40:21] info: Expected:   NaN
[14:40:21] info: Actual:     NaN
[14:40:21] info:    at System.AssertExtensions.Equal(Half expected, Half actual)
[14:40:21] info:    at System.Tests.HalfTests.ExplicitConversion_FromSingle(Single f, Half expected)
[14:40:21] info:    at System.Object.InvokeStub_HalfTests.ExplicitConversion_FromSingle(Object , Span`1 )
[14:40:21] info:    at System.Reflection.MethodBaseInvoker.InvokeWithFewArgs(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)

Report

Build Definition Test Pull Request
868508 dotnet/runtime WasmTestOnChrome-ST-System.Runtime.Tests.WorkItemExecution #109602
865074 dotnet/runtime WasmTestOnChrome-ST-System.Runtime.Tests.WorkItemExecution #107230
864857 dotnet/runtime WasmTestOnChrome-ST-System.Runtime.Tests.WorkItemExecution #109645
862539 dotnet/runtime WasmTestOnChrome-ST-System.Runtime.Tests.WorkItemExecution #109501
858442 dotnet/runtime WasmTestOnChrome-ST-System.Runtime.Tests.WorkItemExecution #109261
855817 dotnet/runtime WasmTestOnChrome-ST-System.Runtime.Tests.WorkItemExecution #108992
854066 dotnet/runtime WasmTestOnChrome-ST-System.Runtime.Tests.WorkItemExecution
853397 dotnet/runtime WasmTestOnChrome-ST-System.Runtime.Tests.WorkItemExecution #109229
843247 dotnet/runtime WasmTestOnChrome-ST-System.Runtime.Tests.WorkItemExecution #108788
842800 dotnet/runtime WasmTestOnChrome-ST-System.Runtime.Tests.WorkItemExecution #105206

Summary

24-Hour Hit Count 7-Day Hit Count 1-Month Count
1 3 10
@stephentoub stephentoub added blocking-clean-ci Blocking PR or rolling runs of 'runtime' or 'runtime-extra-platforms' Known Build Error Use this to report build issues in the .NET Helix tab labels Jun 12, 2024
@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Jun 12, 2024
Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-infrastructure-libraries
See info in area-owners.md if you want to be subscribed.

Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-numerics
See info in area-owners.md if you want to be subscribed.

@lewing lewing added this to the 9.0.0 milestone Jun 19, 2024
@lewing lewing removed the untriaged New issue has not been triaged by the area owner label Jun 19, 2024
@ericstj
Copy link
Member

ericstj commented Jul 30, 2024

This issue is matching to tests jobs which log test names. Need to further constrain the match.

@jeffhandley jeffhandley added the arch-wasm WebAssembly architecture label Aug 6, 2024
@mkhamoyan mkhamoyan removed the blocking-clean-ci Blocking PR or rolling runs of 'runtime' or 'runtime-extra-platforms' label Aug 7, 2024
@mdh1418
Copy link
Member

mdh1418 commented Aug 7, 2024

Just hit this in #106040 for WasmTestOnChrome-ST-System.Runtime.Tests
config: net9.0-browser-Release-wasm-Mono_Release-WasmTestOnChrome

[18:27:21] info: [STRT] System.Tests.HalfTests.ExplicitConversion_FromSingle(f: NaN, expected: NaN)
[18:27:21] info: [FAIL] System.Tests.HalfTests.ExplicitConversion_FromSingle(f: NaN, expected: NaN)
[18:27:21] info: Assert.Equal() Failure: Values differ
[18:27:21] info: Expected:   NaN
[18:27:21] info: Actual:     NaN
[18:27:21] info:    at System.AssertExtensions.Equal(Half expected, Half actual)
[18:27:21] info:    at System.Tests.HalfTests.ExplicitConversion_FromSingle(Single f, Half expected)
[18:27:21] info:    at System.Object.InvokeStub_HalfTests.ExplicitConversion_FromSingle(Object , Span`1 )
[18:27:21] info:    at System.Reflection.MethodBaseInvoker.InvokeWithFewArgs(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
[18:27:21] info: [STRT] System.Tests.HalfTests.ExplicitConversion_FromSingle(f: NaN, expected: NaN)
[18:27:21] info: [FAIL] System.Tests.HalfTests.ExplicitConversion_FromSingle(f: NaN, expected: NaN)
[18:27:21] info: Assert.Equal() Failure: Values differ
[18:27:21] info: Expected:   NaN
[18:27:21] info: Actual:     NaN
[18:27:21] info:    at System.AssertExtensions.Equal(Half expected, Half actual)
[18:27:21] info:    at System.Tests.HalfTests.ExplicitConversion_FromSingle(Single f, Half expected)
[18:27:21] info:    at System.Object.InvokeStub_HalfTests.ExplicitConversion_FromSingle(Object , Span`1 )
[18:27:21] info:    at System.Reflection.MethodBaseInvoker.InvokeWithFewArgs(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)

Not sure why it didn't match in build analysis

@mdh1418 mdh1418 added the blocking-clean-ci Blocking PR or rolling runs of 'runtime' or 'runtime-extra-platforms' label Aug 7, 2024
@tannergooding
Copy link
Member

The test here should likely be filtered out on WASM for the time being, with a tracking issue against it.

WASM isn't technically doing anything wrong here, normalizing NaN is fully allowed by the IEEE 754 spec. However, it is undesirable and not recommended for most cases and should typically be avoided if possible.

WASM also has instructions that should guarantee the underlying bits are preserved, so it should be possible for it to be preserved and match the behavior of other targets (both for RyuJIT and Mono).

@akoeplinger
Copy link
Member

Hm the weird thing here is that this seems to be an intermittent failure. I've checked and we definitely have passing runs of the test on the exact same config.

@tannergooding
Copy link
Member

Is there perhaps a difference in the WASM version or configuration options for the failing platform as compared to others?

Maybe some machine specific issue that's only triggering in some scenarios?

@akoeplinger
Copy link
Member

akoeplinger commented Aug 8, 2024

the tests are running in containers on the same helix queue so even the VM/machine should be the same...

Passing run: https://helixre107v0xdcypoyl9e7f.blob.core.windows.net/dotnet-runtime-refs-pull-106013-merge-6f4bade3cf974edcb8/WasmTestOnChrome-ST-System.Runtime.Tests/1/console.3fb2056a.log?helixlogtype=result

Failing run: https://helixre107v0xdcypoyl9e7f.blob.core.windows.net/dotnet-runtime-refs-pull-106040-merge-0b77eabb49f848cab5/WasmTestOnChrome-ST-System.Runtime.Tests/1/console.5edeb661.log?helixlogtype=result

The only difference is the order of the tests, but I'm having a hard time imagining how that could be relevant

@akoeplinger
Copy link
Member

akoeplinger commented Aug 8, 2024

hm one interesting thing is that the failing run reports two tests that failed i.e. five total, but the passing run only ran four NaN tests... that seems just a log quirk, the testResult.xml only contains four NaN tests and two of them failing.

@akoeplinger
Copy link
Member

akoeplinger commented Aug 9, 2024

According to Kusto test data this started happening on or before 2024-06-06.
I also found that it's happening on Windows-based containers too so it's not related to the underlying OS.

Given it's only happening intermittently and doesn't appear to be a blocker for 9.0 I'm moving it to 10.0

@akoeplinger akoeplinger modified the milestones: 9.0.0, 10.0.0 Aug 9, 2024
@lambdageek
Copy link
Member

Couple of things could be responsible which might be explained by the test order:

  1. jiterp might be kicking in
  2. browser JIT might be kicking in

But I'd like to push back against hte idea that bitwise comparison of two NaNs is a good idea. there are many NaN bitpatterns; all are semantically indistinguishable. Reasonable code should not depend on the bit pattern staying identical after computation

@tannergooding
Copy link
Member

tannergooding commented Aug 23, 2024

But I'd like to push back against hte idea that bitwise comparison of two NaNs is a good idea. there are many NaN bitpatterns; all are semantically indistinguishable. Reasonable code should not depend on the bit pattern staying identical after computation

This is a core thing frequently depended upon for SIMD, NaN boxing, etc. It also follows the IEEE 754 "recommended" guidelines, even if it's not strictly required by the IEEE 754 spec.

The WASM 1.0 and 2.0 specs similarly:

  • distinguish canonical NaN from arithmetic NaN (a nan with payload n)
  • provide explicit APIs for getting the bits of any NaN
    • fbits(+/-nan(n)) is defined as fsign(+/-) * 1^expon(n) * ibits(sinif(n))
    • which is to say, its doing BitConverter.SingleToInt32Bits or BitConverter.DoubleToInt64Bits or vice-versa
  • recommend that operators propagate NaN payloads (matching the IEEE 754 recommendation)
  • provides explicit support for specifying the payload of a NaN

However, the WASM specs do notably explicitly allow non-determinism for fneg, fabs, and fcopysign; explicitly specifying that the sign and payload of the result are non-deterministic

So while a given implementation is technically allowed to do "other things" (such as always canonicalizing NaN), its highly irregular to do so and that's why we have tests that validate such conversions are preserving bits and propagating payloads as expected. -- It's also notably cheaper to propagate and to preserve NaN as is then it is to canonicalize (particularly for things like fbits), so there's really no reason for a browser to deviate here

@pavelsavara
Copy link
Member

@tannergooding @lambdageek what are the next steps here ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arch-wasm WebAssembly architecture area-System.Numerics blocking-clean-ci Blocking PR or rolling runs of 'runtime' or 'runtime-extra-platforms' Known Build Error Use this to report build issues in the .NET Helix tab
Projects
None yet
Development

No branches or pull requests