Bug in glm_mat4_inv_sse2 #289
Thanks for the feedback, I'll take a look ASAP, many thanks. If anyone finds the issue before me, a PR would also be helpful |
This is ARM64 output (macOS M1 Max and Virtual Windows11_ARM MSVC are similar):
Identity matrix:
Matrix (float4x4):
| 1.00000 0.00000 0.00000 0.00000 |
| 0.00000 1.00000 0.00000 0.00000 |
| 0.00000 0.00000 1.00000 0.00000 |
| 0.00000 0.00000 0.00000 1.00000 |
Translated matrix:
Matrix (float4x4):
| 1.00000 0.00000 0.00000 2.00000 |
| 0.00000 1.00000 0.00000 5.00000 |
| 0.00000 0.00000 1.00000 -15.00000 |
| 0.00000 0.00000 0.00000 1.00000 |
Model matrix:
Matrix (float4x4):
| 0.94142 0.11716 0.31623 2.00000 |
| 0.11716 0.76569 -0.63246 5.00000 |
| -0.31623 0.63246 0.70711 -15.00000 |
| 0.00000 0.00000 0.00000 1.00000 |
Inverse matrix:
Matrix (float4x4):
| 0.94142 0.11716 -0.31623 -7.21205 |
| 0.11716 0.76569 0.63246 5.42409 |
| 0.31623 -0.63246 0.70711 13.13643 |
| -0.00000 0.00000 -0.00000 1.00000 |
Transpose matrix:
Matrix (float4x4):
| 0.94142 0.11716 0.31623 -0.00000 |
| 0.11716 0.76569 -0.63246 0.00000 |
| -0.31623 0.63246 0.70711 -0.00000 |
| -7.21205 5.42409 13.13643 1.00000 |
This is x86 output (Virtual Windows11_ARM MSVC):
Identity matrix:
Matrix (float4x4):
| 1.00000 0.00000 0.00000 0.00000 |
| 0.00000 1.00000 0.00000 0.00000 |
| 0.00000 0.00000 1.00000 0.00000 |
| 0.00000 0.00000 0.00000 1.00000 |
Translated matrix:
Matrix (float4x4):
| 1.00000 0.00000 0.00000 2.00000 |
| 0.00000 1.00000 0.00000 5.00000 |
| 0.00000 0.00000 1.00000 -15.00000 |
| 0.00000 0.00000 0.00000 1.00000 |
Model matrix:
Matrix (float4x4):
| 0.94142 0.11716 0.31623 2.00000 |
| 0.11716 0.76569 -0.63246 5.00000 |
| -0.31623 0.63246 0.70711 -15.00000 |
| 0.00000 0.00000 0.00000 1.00000 |
Inverse matrix:
Matrix (float4x4):
| 0.94142 0.11716 -0.31623 -7.21205 |
| 0.11716 0.76569 0.63246 5.42409 |
| 0.31623 -0.63246 0.70711 13.13643 |
| -0.00000 0.00000 -0.00000 1.00000 |
Transpose matrix:
Matrix (float4x4):
| 0.94142 0.11716 0.31623 -0.00000 |
| 0.11716 0.76569 -0.63246 0.00000 |
| -0.31623 0.63246 0.70711 -0.00000 |
| -7.21205 5.42409 13.13643 1.00000 |
I got similar results. I'll try to test on my Intel Mac later. May I ask what your environment is? Also, do you have the latest cglm? |
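The printouts above can be sanity-checked without cglm: a correct inverse multiplied by the model matrix should give the identity, up to the rounding of the printed values. The sketch below is plain C and does not use any cglm API; the names MODEL, INVERSE, mat4_mul and max_identity_error are made up for illustration, and the values are copied row by row from the "Model matrix" and "Inverse matrix" printouts. Whether the printout is row-major or column-major does not matter for this check, as long as both matrices are read the same way.

```c
#include <assert.h>

/* Values copied row by row from the "Model matrix" printout above. */
const float MODEL[16] = {
     0.94142f, 0.11716f,  0.31623f,   2.00000f,
     0.11716f, 0.76569f, -0.63246f,   5.00000f,
    -0.31623f, 0.63246f,  0.70711f, -15.00000f,
     0.00000f, 0.00000f,  0.00000f,   1.00000f
};

/* Values copied row by row from the "Inverse matrix" printout above. */
const float INVERSE[16] = {
    0.94142f,  0.11716f, -0.31623f, -7.21205f,
    0.11716f,  0.76569f,  0.63246f,  5.42409f,
    0.31623f, -0.63246f,  0.70711f, 13.13643f,
    0.00000f,  0.00000f,  0.00000f,  1.00000f
};

/* out = a * b for 4x4 matrices stored row by row (hypothetical helper). */
void mat4_mul(const float a[16], const float b[16], float out[16]) {
    assert(a && b && out);
    for (int r = 0; r < 4; r++) {
        for (int c = 0; c < 4; c++) {
            float sum = 0.0f;
            for (int k = 0; k < 4; k++) {
                sum += a[r * 4 + k] * b[k * 4 + c];
            }
            out[r * 4 + c] = sum;
        }
    }
}

/* Largest absolute difference between a * b and the identity matrix. */
float max_identity_error(const float a[16], const float b[16]) {
    float p[16];
    float worst = 0.0f;
    mat4_mul(a, b, p);
    for (int i = 0; i < 16; i++) {
        float expected = (i / 4 == i % 4) ? 1.0f : 0.0f;
        float d = p[i] - expected;
        if (d < 0.0f) d = -d;
        if (d > worst) worst = d;
    }
    return worst;
}
```

Calling max_identity_error(MODEL, INVERSE) on these values should return a small number, limited by the five printed decimals. A noticeably larger result for the output of one code path would point at that path.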
|
Thanks for the response. Using my example code again, against a fresh pull of the
|
Oh, and I'm on Windows 10 desktop. |
I suspect the compiler is converting mixed |
That was it! I removed /fp:fast, and am now getting identical results on SSE and non-SSE paths. Thank you, @recp and @gottfriedleibniz for all the troubleshooting and help! |
Ideally, functions which mix Given that the variable is used for extracting the sign of a float. I wonder if replacing those constants with its hex equivalent All uses of |
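To make the sign-extraction point concrete: in IEEE-754 single precision, -0.0f and 0.0f compare equal and differ only in the top bit, so a sign mask written as a float literal depends on the compiler preserving signed zeros. The sketch below is plain C, not cglm code, and the helper names (float_bits, float_from_bits, flip_sign, SIGN_BIT) are hypothetical. It assumes a 32-bit IEEE-754 float and builds the mask from its integer bit pattern instead.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Sign bit of a 32-bit IEEE-754 float, written as an integer constant. */
#define SIGN_BIT UINT32_C(0x80000000)

/* Return the raw bit pattern of a float (assumes float is 32 bits wide). */
uint32_t float_bits(float f) {
    uint32_t u;
    assert(sizeof(float) == sizeof(uint32_t));
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Build a float from a raw bit pattern. */
float float_from_bits(uint32_t u) {
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Flip the sign by XOR on the bits, the scalar analogue of xorps with a mask. */
float flip_sign(float f) {
    return float_from_bits(float_bits(f) ^ SIGN_BIT);
}
```

Because the mask is an integer here, no floating-point value comparison can treat it as plain zero. Whether that is the right fix for glm_mat4_inv_sse2 is a separate question that this sketch does not settle.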
@deadwanderer thanks, @gottfriedleibniz good catch, many thanks! I used
+1 for this. It is better to make it work on compilers including msvc :)
Which pragmas, may I ask? How could it be improved with pragmas on MSVC? A PR with the improvements you mentioned would also be awesome :) |
Finally had time to examine this,
x8 = _mm_set_ps(-0.f, 0.f, -0.f, 0.f);
x9 = glmm_shuff1(x8, 2, 1, 2, 1);
Resulting in with
movaps xmm15, XMMWORD PTR __xmm@80000000000000008000000000000000
// ...
xorps xmm4, xmm15
// ...
xorps xmm13, xmm15
// ...
xorps xmm14, xmm15
// ...
xorps xmm12, xmm15
Without
shufps xmm0, xmm10, 153 ; 00000099H
movaps xmm11, xmm4
movaps XMMWORD PTR x9$1$[rsp], xmm0
// ...
xorps xmm4, xmm10
// ...
xorps xmm14, xmm10
// ...
xorps xmm15, XMMWORD PTR x9$1$[rsp]
// ...
xorps xmm13, XMMWORD PTR x9$1$[rsp]
Modifying the function to use
For reference: |
@gottfriedleibniz many thanks
any chance to get a PR? or I'll do it asap
thanks |
x8 = _mm_set_ps(-0.f, 0.f, -0.f, 0.f);
x9 = glmm_shuff1(x8, 2, 1, 2, 1);
also to avoid similar scenarios, any idea why |
You can.
I suppose some byproduct of subexpression elimination with |
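For illustration only, here is one way a per-lane sign mask like x8 can be written with integer constants, so that its bit pattern does not depend on how the compiler treats -0.f. This is a sketch under assumptions, not the change made in cglm: flip_odd_lanes is a hypothetical name, the SSE2 detection macros are a best guess for GCC/Clang/MSVC, and a scalar fallback is used when SSE2 is not detected. The intrinsics used (_mm_set_epi32, _mm_castsi128_ps, _mm_xor_ps, _mm_loadu_ps, _mm_storeu_ps) are standard SSE/SSE2 intrinsics.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define SKETCH_HAVE_SSE2 1
#endif

/* Flip the sign of lanes 1 and 3, the same lanes that
   _mm_set_ps(-0.f, 0.f, -0.f, 0.f) marks with a sign bit. */
void flip_odd_lanes(const float in[4], float out[4]) {
    assert(in && out);
#ifdef SKETCH_HAVE_SSE2
    /* _mm_set_epi32 lists lanes from highest (3) to lowest (0).
       INT32_MIN has only the top bit set, i.e. 0x80000000. */
    __m128i bits = _mm_set_epi32(INT32_MIN, 0, INT32_MIN, 0);
    __m128  mask = _mm_castsi128_ps(bits);   /* reinterpret, no conversion */
    __m128  v    = _mm_loadu_ps(in);
    _mm_storeu_ps(out, _mm_xor_ps(v, mask));
#else
    /* Scalar fallback: XOR the sign bit of lanes 1 and 3 directly. */
    for (int i = 0; i < 4; i++) {
        uint32_t u;
        memcpy(&u, &in[i], sizeof u);
        if (i & 1) u ^= UINT32_C(0x80000000);
        memcpy(&out[i], &u, sizeof u);
    }
#endif
}
```

A shuffled variant of the mask, like x9 above, could be written out as a second integer constant in the same way instead of being derived from the first with a float shuffle.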
@gottfriedleibniz sure, I'll do it ASAP, thanks for the info |
@gottfriedleibniz I created a PR for this, it would be nice to get a review: #291
@deadwanderer is it possible to test that PR with thanks |
Here are my results using that PR's changes:
Looks good to me! Thanks so much, @recp and @gottfriedleibniz! |
Seems good. Passes my tests. |
@deadwanderer @gottfriedleibniz many thanks, that PR is merged. |
PREFACE: I completely lack both the linear algebra and SIMD skills to fix this; I merely note the issue's existence!
I was calculating the inverse transpose of the model matrix to properly transform normals for my lighting shader, and noted that the inverse transpose was wrong when I calculated it on the CPU with cglm. I compared the results with calculating it in the shader (OpenGL) and in GLM, and found that the issue occurred in glm_mat4_inv when using SSE (thus following the glm_mat4_inv_sse2 path).
When I manually commented out the SSE path and forced the non-SIMD path in glm_mat4_inv, the calculations came out correct, matching the OpenGL shader and GLM results.
Here is the code I used to calculate a sample model matrix and its inverse transpose, along with the code to print out the resulting matrices at each step (NOTE: I've flipped/corrected? the mat4 printing (#288), so the matrices print out column-major, to match glm's output which I was comparing. I also increased precision to 6 decimal places, again to match glm's output):
And here are the resulting values.
When I forced the non-SIMD path, I got the following results:
For comparison, here's the glm code I used:
And here are the glm results (which match the non-SIMD path of glm_mat4_inv, slight formatting differences notwithstanding):
So pretty clearly, something goes wonky and wrong in the SSE path for glm_mat4_inv.
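As background for the normal transform mentioned in this report, the inverse transpose of the upper-left 3x3 of a model matrix can be computed in scalar code and used as an independent reference when comparing code paths. The sketch below is plain C and not a cglm function; mat3_inverse_transpose is a hypothetical name, and the matrix is assumed to be stored row by row. It relies on the identity that the cofactor matrix divided by the determinant equals the inverse transpose.

```c
#include <assert.h>

/* Write the inverse transpose of the 3x3 matrix m (stored row by row)
   into out. Returns 1 on success, 0 if the determinant is exactly zero. */
int mat3_inverse_transpose(const float m[9], float out[9]) {
    assert(m && out);

    /* Cofactors of each element. */
    float c00 = m[4] * m[8] - m[5] * m[7];
    float c01 = m[5] * m[6] - m[3] * m[8];
    float c02 = m[3] * m[7] - m[4] * m[6];
    float c10 = m[2] * m[7] - m[1] * m[8];
    float c11 = m[0] * m[8] - m[2] * m[6];
    float c12 = m[1] * m[6] - m[0] * m[7];
    float c20 = m[1] * m[5] - m[2] * m[4];
    float c21 = m[2] * m[3] - m[0] * m[5];
    float c22 = m[0] * m[4] - m[1] * m[3];

    /* Determinant by expansion along the first row. */
    float det = m[0] * c00 + m[1] * c01 + m[2] * c02;
    if (det == 0.0f) {
        return 0;
    }

    /* inverse = transpose(cofactors) / det, so inverse transpose = cofactors / det. */
    float inv_det = 1.0f / det;
    out[0] = c00 * inv_det; out[1] = c01 * inv_det; out[2] = c02 * inv_det;
    out[3] = c10 * inv_det; out[4] = c11 * inv_det; out[5] = c12 * inv_det;
    out[6] = c20 * inv_det; out[7] = c21 * inv_det; out[8] = c22 * inv_det;
    return 1;
}
```

For a pure rotation the result equals the input, and for a non-uniform scale the diagonal is inverted, which is why normals need this matrix and not the model matrix itself.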