SIGFPE on ragdoll physics #784

Closed
yvt opened this issue Oct 28, 2018 · 0 comments

Labels: bug (some feature is broken)

yvt commented Oct 28, 2018

OpenSpades`spades::client::Corpse::Spring:
...
    0x100620aa0 <+400>:  mulss  %xmm6, %xmm2
    0x100620aa4 <+404>:  mulss  %xmm2, %xmm0
    0x100620aa8 <+408>:  mulss  %xmm7, %xmm0
    0x100620aac <+412>:  movss  (%rbx), %xmm2             ; xmm2 = mem[0],zero,zero,zero
    0x100620ab0 <+416>:  addss  %xmm0, %xmm2
    0x100620ab4 <+420>:  movss  %xmm2, (%rbx)
    0x100620ab8 <+424>:  movb   (%rcx), %al
    0x100620aba <+426>:  testb  %al, %al
    0x100620abc <+428>:  jne    0x100620e78               ; <+1384> [inlined] spades::Vector3::operator+=(spades::Vector3 const&) at Corpse.cpp:124
    0x100620ac2 <+434>:  movq   %rbx, -0x38(%rbp)
    0x100620ac6 <+438>:  movb   (%rdx), %al
    0x100620ac8 <+440>:  testb  %al, %al
    0x100620aca <+442>:  jne    0x100620e95               ; <+1413> [inlined] spades::Vector3::operator+=(spades::Vector3 const&) + 29 at Corpse.cpp:124
    0x100620ad0 <+448>:  movaps %xmm5, %xmm3
->  0x100620ad3 <+451>:  divps  %xmm4, %xmm3
    0x100620ad6 <+454>:  movaps %xmm0, %xmm2
    0x100620ad9 <+457>:  divss  %xmm7, %xmm2
(lldb) po (float __attribute__((ext_vector_type(4)))) $xmm4
(0.162500009, 0.162500009, 0, 0)

(lldb) po (float __attribute__((ext_vector_type(4)))) $xmm3
(0.00000334322158, 0.000603287306, 0, 0)

(lldb) po/x $mxcsr
0x00001f21
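
As a reading aid for the mxcsr value printed above, here is a minimal sketch that decodes its exception-mask bits. The bit positions follow the x86 SSE definition of MXCSR (mask bits 7 to 12, where a set bit means the exception is masked). The names `MxcsrMask`, `is_masked`, `unmasked_exceptions` and `kObserved` are illustrative and do not come from OpenSpades.

```cpp
#include <cstdint>

// MXCSR exception-mask bits (bits 7..12). A set bit means the exception
// is masked, i.e. the instruction produces a default result and does not trap.
enum MxcsrMask : std::uint32_t {
    kMaskInvalid   = 0x0080,
    kMaskDenormal  = 0x0100,
    kMaskDivZero   = 0x0200,
    kMaskOverflow  = 0x0400,
    kMaskUnderflow = 0x0800,
    kMaskInexact   = 0x1000,
    kMaskAll       = 0x1f80,
};

// True if the given exception is masked in this MXCSR value.
constexpr bool is_masked(std::uint32_t mxcsr, MxcsrMask bit) {
    return (mxcsr & bit) != 0;
}

// Bitmask of the exceptions that are NOT masked, i.e. that can trap.
constexpr std::uint32_t unmasked_exceptions(std::uint32_t mxcsr) {
    return ~mxcsr & kMaskAll;
}

// The value reported by lldb in this issue.
constexpr std::uint32_t kObserved = 0x00001f21;

// Only the invalid-operation exception is unmasked in 0x1f21.
static_assert(!is_masked(kObserved, kMaskInvalid), "invalid is unmasked");
static_assert(is_masked(kObserved, kMaskDivZero), "divide-by-zero is masked");
static_assert(unmasked_exceptions(kObserved) == kMaskInvalid, "only invalid can trap");
```

The low bits 0x21 are sticky status flags (invalid and inexact), not masks.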

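Apparently, LLVM's SLP vectorizer assumes that all floating-point exceptions are masked, which is the default setting, so FP instructions never trap. Under that assumption it can use packed SIMD instructions even when there are fewer live elements than the native SIMD width, letting the unused lanes compute on whatever padding they hold. That assumption does not hold in this instance: the two unused upper lanes of the divps above compute 0/0, which raises the invalid-operation exception, and mxcsr (0x1f21) has that exception unmasked, hence the SIGFPE.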
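
To illustrate why the padding lanes matter, the following sketch replays the four lane divisions from the lldb dump one at a time and checks the sticky invalid flag through the standard `<cfenv>` interface. It runs with exceptions masked, so it only records the flag and does not trap. The function name `lanes_raise_invalid` is hypothetical, and the `volatile` qualifiers are there only to stop the compiler from folding the divisions away.

```cpp
#include <cfenv>

// Divides the first `lane_count` lanes of the observed xmm3 by the matching
// lanes of xmm4 and reports whether the invalid-operation flag was raised.
bool lanes_raise_invalid(int lane_count) {
    // Lane values copied from the lldb output in this issue.
    volatile float dividend[4] = {0.00000334322158f, 0.000603287306f, 0.0f, 0.0f};
    volatile float divisor[4]  = {0.162500009f, 0.162500009f, 0.0f, 0.0f};
    volatile float sink = 0.0f;

    std::feclearexcept(FE_ALL_EXCEPT);
    for (int i = 0; i < lane_count; ++i) {
        sink = dividend[i] / divisor[i];  // lanes 2 and 3 compute 0/0
    }
    return std::fetestexcept(FE_INVALID) != 0;
}
```

Calling it with 2 covers only the two live lanes and leaves the flag clear. Calling it with 4 includes the padding lanes and sets the flag, which is the condition that becomes a SIGFPE once the invalid exception is unmasked.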

Curiously, when I inspect the value of mxcsr in every thread, only the threads started by GlobalDispatchThreadPool have their _MM_MASK_INVALID (0x80) bit cleared.
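
One possible workaround, sketched here under assumptions and not necessarily what the fixing commit does, is to restore the default floating-point environment at the start of each worker thread. `std::fesetenv(FE_DFL_ENV)` is standard C++11; on the common x86-64 C libraries the default environment has every exception masked, which would set the 0x80 bit again. `reset_fp_environment` and `worker_entry` are hypothetical names, not OpenSpades functions.

```cpp
#include <cfenv>
#include <cmath>

// Hypothetical helper: restore the default floating-point environment
// (exceptions masked, status flags cleared) for the calling thread.
bool reset_fp_environment() {
    return std::fesetenv(FE_DFL_ENV) == 0;
}

// Hypothetical worker entry point showing where the call would go.
// Returns true if 0/0 quietly produced NaN after the reset.
bool worker_entry() {
    if (!reset_fp_environment()) {
        return false;
    }
    volatile float zero = 0.0f;       // volatile prevents constant folding
    volatile float quotient = zero / zero;
    float result = quotient;
    return std::isnan(result);        // no trap: the result is a quiet NaN
}
```

The floating-point environment is per thread, so the call has to run on the worker thread itself and not on the thread that creates the pool.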

Confirmed on macOS 10.14 Mojave.

yvt added the bug label Oct 28, 2018
yvt closed this as completed in cafb664 Oct 28, 2018