-
-
Notifications
You must be signed in to change notification settings - Fork 853
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Failing tests on .NET7 #2117
Comments
Do we have the output images? Are they badly off? If so that would likely indicate a JIT error which we can report upstream. |
Visually, I cant tell the difference, but the report says 30% of the pixels are different. I could not pinpoint whats going wrong, since it only happens on Release mode. |
I cannot replicate this locally. 😔 I had a look at the latest preview release notes and it would take a smarter person than I to narrow down a potential cause without replication. |
I can replicate it locally, but it seems not to happen always (but very often). |
Intermittent! Whoa that’s weird! |
To add another weird observation to the list: When executing the test via visual studio they always pass. Only executing the tests via command line fails occasionally. |
I think it's got something to do with the way we unpremultiply We actually get 1 of two possible values from both the scalar and simd versions. If the input Now what I think is happening is that when we are converting the values to bytes on some chipsets |
Maybe handling of NaN/Infinite has changed in .Net7.0? It's still weird that this does not always happens. edit: the only thing I could find NaN related changed in .Net7.0 seems to be the I cant see, how this could affect the Skew Processor, though. |
We need to add a bunch of diagnostic information to the output when something fails. I'm going to write some code in our tests to do this. |
@brianpopow I managed to capture the environmental values for the Windows failure.
|
Yep. Definitely different. What I don't understand is why |
The |
@EgorBo thanks for the feedback, happy to see the I think there is another issue with the two |
Saw this on Twitter, gave some replies there going to paste them here as well... At first glance, I don't see anything obvious. The algorithm for division hasn't changed here in a long time. Conversion of Could I get pointers to the test and code that's causing issues and I can look at disassembly dumps and try to reproduce locally? I have a few different machines I can test on across Linux, MacOS, and Windows; just not intimately familiar with the best way to test just the item failing here. |
Hi @tannergooding, thanks for trying to help here, I am out of idea's here and any suggestion on how to find this issue is very appreciated. The failing test is
note: You need to add net7.0 to the test project I can see this happening on my machine, but its very rare. Maybe 1 out of 10 run's this occurs. If you want, I can create a branch with a new sub test project with just the failing test. Not sure If this would be more helpful then just executing the test via dotnet test. Let me know if that would be helpful. edit: also note: this only seems to happen in Release mode and also we are not targeting Arm64 in the CI yet. |
Thanks @tannergooding for having a look at this! @brianpopow The intermittent replication on your machine has me thoroughly stumped. I cannot understand why would be the case at all. Setting the environmental setting
That's a surprise! Perhaps we should be looking at a high performance strategy that allows us to avoid |
I have updated to |
Well that's interesting! And it only happens during release? |
Yes, only in Release mode. |
A bug was found and fixed for floating-point corruption on MacOS: dotnet/runtime#75440 But this was reproing on Windows as well, so its likely not the root cause. |
|
I have tried |
Ok, so I could reliably repro on RC1, but I cannot repro on RC2 (nightly build There were ~161 commits between these two: dotnet/runtime@release/7.0-rc1...release/7.0-rc2 Of those commits, the most likely to impact this would have been:
Both of these should only have impacted It would be great if someone else could validate that it is fixed as well, and if so I can dig a little bit deeper to finalize the root cause. -- Noting that the latest .NET 7 RC2 nightly build may also require the 6.0.10 SDK which I'm not sure where to get atm. I ended up changing the |
Hmmm, maybe I spoke too soon. I'm still seeing the following in a clean build, but it won't repro after having been hit once:
|
I've root caused the bug. There looks to be a bug with This is the disassembly for push rdi
push rsi
sub rsp,38h
vzeroupper
mov rdi,rcx
mov rsi,rdx
vmovupd xmm0,xmmword ptr [rsi]
vmovupd xmmword ptr [rsp+28h],xmm0
mov rcx,7FF8BAA93EA8h
mov edx,156h
call CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE (07FF919E3B470h)
mov rax,1A9DCFC5410h
mov rax,qword ptr [rax]
vmovupd xmm0,xmmword ptr [rsp+28h]
vmulps xmm0,xmm0,xmmword ptr [rax+8]
vmovupd xmmword ptr [rsi],xmm0
vmovupd xmm0,xmmword ptr [rsi]
mov rax,1A9DCFC5418h
mov rax,qword ptr [rax]
vaddps xmm0,xmm0,xmmword ptr [rax+8]
vmovupd xmmword ptr [rsi],xmm0
vmovupd xmm0,xmmword ptr [rsi]
vxorps xmm1,xmm1,xmm1
mov rax,1A9DCFC5410h
mov rax,qword ptr [rax]
vmovupd xmm2,xmmword ptr [rax+8]
vmaxps xmm0,xmm0,xmm1
vminps xmm0,xmm0,xmm2
vmovupd xmmword ptr [rsi],xmm0
vcvttss2si eax,dword ptr [rsi]
mov byte ptr [rdi+2],al
vcvttss2si eax,dword ptr [rsi+4]
mov byte ptr [rdi+1],al
vcvttss2si eax,dword ptr [rsi+8]
mov byte ptr [rdi],al
vcvttss2si eax,dword ptr [rsi+0Ch]
mov byte ptr [rdi+3],al
add rsp,38h
pop rsi
pop rdi
ret This is the codegen for the same method in .NET 7: push rdi
push rsi
sub rsp,38h
vzeroupper
mov rdi,rcx
mov rsi,rdx
vmovupd xmm0,xmmword ptr [rsi]
vmovupd xmmword ptr [rsp+28h],xmm0
mov rcx,7FF887B04688h
mov edx,155h
call CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE (07FF8E6C4C890h)
mov rax,22D8700EF80h
mov rax,qword ptr [rax]
add rax,8
vmovupd xmm0,xmmword ptr [rsp+28h]
vmulps xmm0,xmm0,xmmword ptr [rax]
vmovupd xmmword ptr [rsi],xmm0
vmovupd xmm0,xmmword ptr [rsi]
mov rdx,22D8700EF88h
mov rdx,qword ptr [rdx]
vaddps xmm0,xmm0,xmmword ptr [rdx+8]
vmovupd xmmword ptr [rsi],xmm0
vxorps xmm0,xmm0,xmm0
vmaxps xmm0,xmm0,xmmword ptr [rsi]
vminps xmm0,xmm0,xmmword ptr [rax]
vmovupd xmmword ptr [rsi],xmm0
vmovss xmm0,dword ptr [rsi]
vcvttss2si eax,xmm0
mov byte ptr [rdi+2],al
vmovss xmm0,dword ptr [rsi+4]
vcvttss2si eax,xmm0
mov byte ptr [rdi+1],al
vmovss xmm0,dword ptr [rsi+8]
vcvttss2si eax,xmm0
mov byte ptr [rdi],al
vmovss xmm0,dword ptr [rsi+0Ch]
vcvttss2si eax,xmm0
mov byte ptr [rdi+3],al
add rsp,38h
pop rsi
pop rdi
ret You'll note that these are basically identical (you can ignore the vmovupd xmm0, [vector4] ; read vector4 into xmm0
vxorps xmm1, xmm1, xmm1 ; zero xmm1
vmovupd xmm2, [maxBytes] ; read maxBytes into xmm2
vmaxps xmm0, xmm0, xmm1 ; vector4 = max(vector4, zero)
vminps xmm0, xmm0, xmm2 ; vector4 = min(vector4, maxBytes) But .NET 7 is doing (simplified): vxorps xmm0, xmm0, xmm0 ; zero xmm0
vmaxps xmm0, xmm0, [vector4] ; vector4 = max(zero, vector4)
vminps xmm0, xmm0, [maxBytes] ; vector4 = min(vector4, maxBytes) This might not seem like much, but it has big impact for I believe this is non-deterministic because it somewhat depends on TieredCompilation and when the method becomes optimized. It more reliably reproduces if Going to see if I can figure out why the JIT is deciding to swap operands here and will try to get a fix up. In the interim, the simple workaround here should be to change |
@tannergooding: very happy to see that you found the root cause of this. Thanks a lot for working on this issue and providing a fix! edit:
I will make a PR for that. |
…tative (#75683) * Adding a regression test for SixLabors/ImageSharp#2117 * Ensure Max/Min for floating-point on x86/x64 are not handled as commutative * Applying formatting patch
…tative (dotnet#75683) * Adding a regression test for SixLabors/ImageSharp#2117 * Ensure Max/Min for floating-point on x86/x64 are not handled as commutative * Applying formatting patch
…t handled as commutative (#75761) * Adding a regression test for SixLabors/ImageSharp#2117 * Ensure Max/Min for floating-point on x86/x64 are not handled as commutative * Applying formatting patch Co-authored-by: Tanner Gooding <tagoo@outlook.com>
closing this now with #2230 merged |
…ndled as commutative (#75772) * Ensure Max/Min for floating-point on x86/x64 are not handled as commutative (#75683) * Adding a regression test for SixLabors/ImageSharp#2117 * Ensure Max/Min for floating-point on x86/x64 are not handled as commutative * Applying formatting patch * Directly access the gtHWIntrinsicId field since the helper method doesn't exist in .NET 6 * Fixing a regression test to use the GetElement API available in .NET 6
Prerequisites
DEBUG
andRELEASE
modeImageSharp version
Current main branch
Other ImageSharp packages and versions
none
Environment (Operating system, version and so on)
Windows 10
.NET Framework version
Description
The following tests fail on .NET 7.0 Windows only. Introduced in 7.0.100-preview.4.22252.9:
Skew_IsNotBoundToSinglePixelType<Bgra32>(provider: TestPattern100x50[Bgra32], x: 20, y: 10)
Skew_IsNotBoundToSinglePixelType<Bgra32>(provider: TestPattern100x50[Bgra32], x: -20, y: -10)
The issue seems to be only occurring in Release mode. Also it seems somehow related to the PixelFormat
Bgra32
Environment Info
In addition on Mac .NET 7.0 we are seeing the following triggered by a a call to `System.Runtime.Intrinsics.Vector256.Create(Single value)`
Environment Info
Steps to Reproduce
Run the tests with .NET 7.0 in Release mode.
Images
No response
The text was updated successfully, but these errors were encountered: