
Segfault with CUDA in a sysimage #1314

Closed
ericphanson opened this issue Jan 10, 2022 · 2 comments · Fixed by #1319
Labels: bug (Something isn't working), upstream (Somebody else's problem)

ericphanson commented Jan 10, 2022

Workaround: take CUDA.jl out of the sysimage 😄. A sysimage with CUDA.jl also seems to work fine on Julia 1.6.5 (but not on 1.7.1).
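
(A minimal sketch of that workaround, assuming PackageCompiler.jl's create_sysimage and placeholder package names and paths: CUDA.jl is simply left out of the sysimage package list and loaded as a normal package at runtime.)

using PackageCompiler

# Hypothetical example: build the sysimage from everything except CUDA.jl,
# which is then loaded from its regular precompile cache at runtime.
create_sysimage(
    ["MyApp", "DataFrames"];                      # placeholder packages, CUDA.jl omitted
    sysimage_path = "sys_without_cuda.so",        # placeholder output path
    precompile_execution_file = "precompile.jl",  # placeholder warmup script
)

The resulting image is then used with julia --sysimage sys_without_cuda.so, and using CUDA happens at runtime as usual.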

Cause: during sysimage creation we generate broken code, according to @vchuravy, who helped me try to debug the segfault in the #gpu channel on Slack. He pointed to

async_send(data::Ptr{Cvoid}) = ccall(:uv_async_send, Cint, (Ptr{Cvoid},), data)

and

julia> @code_llvm CUDA.async_send(C_NULL)
;  @ /root/.julia/packages/CUDA/Rhl18/lib/cudadrv/execution.jl:143 within `async_send`
define i32 @julia_async_send_1861(i64 zeroext %0) #0 {
top:
  %1 = call i32 inttoptr (i64 139633100932974 to i32 (i64)*)(i64 %0)
  ret i32 %1
}

and said

You are reading from TLS on a foreign thread which is illegal

I don't understand most of that to be honest.
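
(Some context on what that means: uv_async_send is libuv's mechanism for waking the Julia event loop from a thread that Julia did not create, which is how the CUDA driver appears to be calling back into Julia here, per the pthread_create frames in the backtrace below. The sketch that follows shows that pattern in isolation, assuming a Base.AsyncCondition supplies the handle; it is not CUDA.jl's actual code. The faulting function, jlcapi_async_send_*, is the compiled @cfunction wrapper around async_send, not uv_async_send itself.)

# Minimal sketch only: the usual libuv pattern for waking a Julia task from a
# thread that Julia did not start.
cond = Base.AsyncCondition()    # owns a uv_async_t handle (cond.handle::Ptr{Cvoid})

# The function the foreign thread ends up invoking (through @cfunction in CUDA.jl):
async_send(data::Ptr{Cvoid}) = ccall(:uv_async_send, Cint, (Ptr{Cvoid},), data)

waiter = @async begin
    wait(cond)                  # sleeps until uv_async_send fires on the handle
    @info "callback delivered"
end

# Called from Julia here for illustration; in CUDA.jl the driver calls the
# @cfunction pointer from one of its own threads, and it is that compiled
# wrapper whose thread-local-storage access faults in the sysimage.
async_send(cond.handle)
wait(waiter)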

This is with CUDA#master (or v3.6.1), Julia 1.7.1, PackageCompiler v1.7.7.


Debugger session:

[New Thread 0x7f8ae0a22700 (LWP 660)]
[Thread 0x7f8ae0a22700 (LWP 660) exited]
[New Thread 0x7f8ae0a22700 (LWP 661)]

Thread 20 "julia" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f8ae0a22700 (LWP 661)]
0x00007f8bab618905 in jlcapi_async_send_69899 () from /deps.so
(gdb) bt
#0  0x00007f8bab618905 in jlcapi_async_send_69899 () from /deps.so
#1  0x00007f8b30e2045a in ?? ()
   from /root/.julia/artifacts/b618eadab05abb3bef21da981a51b6f4f0183b27/lib/libcuda.so
#2  0x00007f8b30e88106 in ?? ()
   from /root/.julia/artifacts/b618eadab05abb3bef21da981a51b6f4f0183b27/lib/libcuda.so
#3  0x00007f8bd5628609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#4  0x00007f8bd554f293 in clone () from /usr/lib/x86_64-linux-gnu/libc.so.6
(gdb) disassemble jlcapi_async_send_69899
Dump of assembler code for function jlcapi_async_send_69899:
   0x00007f8bab6188c0 <+0>:     push   %r15
   0x00007f8bab6188c2 <+2>:     push   %r14
   0x00007f8bab6188c4 <+4>:     push   %r13
   0x00007f8bab6188c6 <+6>:     push   %r12
   0x00007f8bab6188c8 <+8>:     push   %rbx
   0x00007f8bab6188c9 <+9>:     sub    $0x20,%rsp
   0x00007f8bab6188cd <+13>:    mov    %rdi,%r14
   0x00007f8bab6188d0 <+16>:    xorps  %xmm0,%xmm0
   0x00007f8bab6188d3 <+19>:    movaps %xmm0,(%rsp)
   0x00007f8bab6188d7 <+23>:    movq   $0x0,0x10(%rsp)
   0x00007f8bab6188e0 <+32>:    mov    0x1e902459(%rip),%rax        # 0x7f8bc9f1ad40 <jl_tls_offset.real>
   0x00007f8bab6188e7 <+39>:    test   %rax,%rax
   0x00007f8bab6188ea <+42>:    je     0x7f8bab618999 <jlcapi_async_send_69899+217>
   0x00007f8bab6188f0 <+48>:    mov    %fs:0x0,%rcx
   0x00007f8bab6188f9 <+57>:    mov    (%rcx,%rax,1),%rbx
   0x00007f8bab6188fd <+61>:    movq   $0x4,(%rsp)
=> 0x00007f8bab618905 <+69>:    mov    (%rbx),%rax
--Type <RET> for more, q to quit, c to continue without paging--
ericphanson changed the title from "During sysimage creation we create broken code" to "Segfault with CUDA in a sysimage" on Jan 10, 2022
vchuravy (Member) commented:

module Reproducer

async_send(data::Ptr{Cvoid}) = ccall(:uv_async_send, Cint, (Ptr{Cvoid},), data) 

function launch()
    callback = @cfunction(async_send, Cint, (Ptr{Cvoid},))
    return callback
end

end # module

With the precompilation script

using Reproducer

Reproducer.launch()

this leads to:

000000000074d350 <jlcapi_async_send_48533>:
  74d350:       41 57                   push   %r15
  74d352:       41 56                   push   %r14
  74d354:       41 55                   push   %r13
  74d356:       41 54                   push   %r12
  74d358:       53                      push   %rbx
  74d359:       48 83 ec 20             sub    $0x20,%rsp
  74d35d:       48 8b 05 9c 07 5c 08    mov    0x85c079c(%rip),%rax        # 8d0db00 <jl_tls_offset>
  74d364:       c5 f8 57 c0             vxorps %xmm0,%xmm0,%xmm0
  74d368:       49 89 fe                mov    %rdi,%r14
  74d36b:       48 c7 44 24 10 00 00    movq   $0x0,0x10(%rsp)
  74d372:       00 00 
  74d374:       c5 f8 29 04 24          vmovaps %xmm0,(%rsp)
  74d379:       48 85 c0                test   %rax,%rax
  74d37c:       0f 84 a9 00 00 00       je     74d42b <jlcapi_async_send_48533+0xdb>
  74d382:       64 48 8b 0c 25 00 00    mov    %fs:0x0,%rcx
  74d389:       00 00 
  74d38b:       48 8b 1c 01             mov    (%rcx,%rax,1),%rbx
  74d38f:       48 c7 04 24 04 00 00    movq   $0x4,(%rsp)
  74d396:       00 
  74d397:       48 8b 0d 12 6c 13 00    mov    0x136c12(%rip),%rcx        # 883fb0 <jl_world_counter>
  74d39e:       48 89 e7                mov    %rsp,%rdi
  74d3a1:       be 70 05 00 00          mov    $0x570,%esi
  74d3a6:       ba 10 00 00 00          mov    $0x10,%edx
  74d3ab:       48 8b 03                mov    (%rbx),%rax

in the sysimage, which I have confirmed is also what gets returned when Reproducer.launch() is invoked from a REPL running that sysimage.
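
(For anyone reproducing this: the sysimage here was presumably built with PackageCompiler along the lines of the sketch below. The package name matches the Reproducer module above; the paths are placeholders.)

using PackageCompiler

# Hypothetical invocation: bake Reproducer into a sysimage and run the
# precompilation script, so that launch() and therefore the @cfunction
# trampoline get compiled at sysimage-build time.
create_sysimage(
    ["Reproducer"];
    sysimage_path = "sys_reproducer.so",          # placeholder output path
    precompile_execution_file = "precompile.jl",  # the two-line script above
)

Starting julia --sysimage sys_reproducer.so and calling Reproducer.launch() should then return a pointer to the trampoline disassembled above.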

For comparison, without that sysimage (running ENABLE_GDBLISTENER=1 julia -g2):

julia> Reproducer.launch()
Ptr{Nothing} @0x00007fff9c9bc9c0

Dump of assembler code for function jlcapi_async_send_15:
   0x00007fff9c9bc9c0 <+0>:     push   %r14
   0x00007fff9c9bc9c2 <+2>:     push   %rbx
   0x00007fff9c9bc9c3 <+3>:     sub    $0x8,%rsp
   0x00007fff9c9bc9c7 <+7>:     movabs $0x7ffff761dfc0,%rdx
   0x00007fff9c9bc9d1 <+17>:    movabs $0x7fffee5cecc8,%rsi
   0x00007fff9c9bc9db <+27>:    mov    %fs:0x0,%rax
   0x00007fff9c9bc9e4 <+36>:    mov    -0x8(%rax),%rax
   0x00007fff9c9bc9e8 <+40>:    mov    %rsp,%rbx
   0x00007fff9c9bc9eb <+43>:    mov    (%rsi),%rcx
   0x00007fff9c9bc9ee <+46>:    mov    (%rdx),%rdx
   0x00007fff9c9bc9f1 <+49>:    lea    0x8(%rax),%r8
   0x00007fff9c9bc9f5 <+53>:    cmp    %rdx,%rcx
   0x00007fff9c9bc9f8 <+56>:    mov    %rcx,%rsi
   0x00007fff9c9bc9fb <+59>:    cmovae %rdx,%rsi
   0x00007fff9c9bc9ff <+63>:    test   %rax,%rax
   0x00007fff9c9bca02 <+66>:    movabs $0x7fff9c9bca40,%rax
   0x00007fff9c9bca0c <+76>:    cmovne %r8,%rbx
   0x00007fff9c9bca10 <+80>:    movabs $0x7fff9c9bc900,%r8
   0x00007fff9c9bca1a <+90>:    cmovne %rdx,%rsi
   0x00007fff9c9bca1e <+94>:    mov    (%rbx),%r14
   0x00007fff9c9bca21 <+97>:    cmove  %r8,%rax
   0x00007fff9c9bca25 <+101>:   cmp    %rdx,%rcx
   0x00007fff9c9bca28 <+104>:   mov    %rsi,(%rbx)
   0x00007fff9c9bca2b <+107>:   cmovae %r8,%rax
   0x00007fff9c9bca2f <+111>:   call   *%rax
   0x00007fff9c9bca31 <+113>:   mov    %r14,(%rbx)
   0x00007fff9c9bca34 <+116>:   add    $0x8,%rsp
   0x00007fff9c9bca38 <+120>:   pop    %rbx
   0x00007fff9c9bca39 <+121>:   pop    %r14
   0x00007fff9c9bca3b <+123>:   ret    
End of assembler dump.

So precompiling a @cfunction goes down a very different codegen path.

cc: @vtjnash

(A later comment from @mashu was marked as a duplicate.)