Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MacOS: use-after-free during debuginfo registration leads to segfault #44562

Closed
gbaraldi opened this issue Mar 11, 2022 · 14 comments · Fixed by #45016
Closed

MacOS: use-after-free during debuginfo registration leads to segfault #44562

gbaraldi opened this issue Mar 11, 2022 · 14 comments · Fixed by #45016
Labels
bug Indicates an unexpected problem or unintended behavior system:mac Affects only macOS

Comments

@gbaraldi
Copy link
Member

gbaraldi commented Mar 11, 2022

When running the read test on the m1 mac get non fatal segfaults like

signal (11): Segmentation fault: 11
in expression starting at none:0

signal (11): Segmentation fault: 11
in expression starting at none:0

I caught it on lldb on a non-debug buid:

in expression starting at none:0
libjulia-internal.1.9.dylib was compiled with optimization - stepping may behave oddly; variables may not be available.
Process 55818 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x102527bc0)
    frame #0: 0x000000010187436c libjulia-internal.1.9.dylib`jl_exit_thread0_cb(exitstate=139) at signals-mach.c:408 [opt]
   405 	}
   406 	
   407 	static void JL_NORETURN jl_exit_thread0_cb(int exitstate)
-> 408 	{
   409 	CFI_NORETURN
   410 	   jl_critical_error(exitstate - 128, NULL, jl_current_task);
   411 	   jl_exit(exitstate);
Target 0: (julia) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x102527bc0)
  * frame #0: 0x000000010187436c libjulia-internal.1.9.dylib`jl_exit_thread0_cb(exitstate=139) at signals-mach.c:408 [opt]
    frame #1: 0x00000001ea3e4b68 libsystem_c.dylib
    frame #2: 0x0000000190c431e4 libsystem_c.dylib`__v2printf + 408
    frame #3: 0x0000000190c72038 libsystem_c.dylib`_vsnprintf + 268
    frame #4: 0x0000000101850cd4 libjulia-internal.1.9.dylib`ijl_safe_printf(fmt=<unavailable>) at jl_uv.c:623:5 [opt]
    frame #5: 0x0000000101874168 libjulia-internal.1.9.dylib`jl_critical_error(sig=11, context=0x0000000000000000, ct=0x000000010907c010) at signal-handling.c:315:9 [opt]
    frame #6: 0x0000000101874394 libjulia-internal.1.9.dylib`jl_exit_thread0_cb(exitstate=139) at signals-mach.c:410:5 [opt]
@giordano giordano added the system:apple silicon Affects Apple Silicon only (Darwin/ARM64) - e.g. M1 and other M-series chips label Mar 11, 2022
@IanButterworth IanButterworth added the system:mac Affects only macOS label Mar 11, 2022
@IanButterworth
Copy link
Member

I see this on x86 MacOS on intel macs too.

Could this be related to #42836

@vtjnash
Copy link
Member

vtjnash commented Mar 11, 2022

very minor part to the fix:

diff --git a/src/signals-mach.c b/src/signals-mach.c
index 130603931c..b8bf25e84e 100644
--- a/src/signals-mach.c
+++ b/src/signals-mach.c
@@ -326,6 +326,7 @@ kern_return_t catch_exception_raise(mach_port_t            exception_port,
         return KERN_SUCCESS;
     }
     else {
+        thread0_exit_count++;
         jl_exit_thread0(128 + SIGSEGV, NULL, 0);
         return KERN_SUCCESS;
     }

I would need to get access to the M1 to see what the disassembly and registers look like at that crash site there.

@gbaraldi
Copy link
Member Author

I am trying to catch it again in lldb but it isn't cooperating :)

@gbaraldi
Copy link
Member Author

libjulia-internal.1.9.dylib was compiled with optimization - stepping may behave oddly; variables may not be available.
Process 956 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x102802f10)
    frame #0: 0x000000010107436c libjulia-internal.1.9.dylib`jl_exit_thread0_cb(exitstate=139) at signals-mach.c:408 [opt]
   405 	}
   406 	
   407 	static void JL_NORETURN jl_exit_thread0_cb(int exitstate)
-> 408 	{
   409 	CFI_NORETURN
   410 	   jl_critical_error(exitstate - 128, NULL, jl_current_task);
   411 	   jl_exit(exitstate);
Target 0: (julia) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x102802f10)
  * frame #0: 0x000000010107436c libjulia-internal.1.9.dylib`jl_exit_thread0_cb(exitstate=139) at signals-mach.c:408 [opt]
    frame #1: 0x00000001eaa7cb68 libsystem_c.dylib
    frame #2: 0x000000019130a038 libsystem_c.dylib`_vsnprintf + 268
    frame #3: 0x0000000101050cd4 libjulia-internal.1.9.dylib`ijl_safe_printf(fmt=<unavailable>) at jl_uv.c:623:5 [opt]
    frame #4: 0x0000000101074168 libjulia-internal.1.9.dylib`jl_critical_error(sig=<unavailable>, context=0x0000000000000000, ct=0x000000010887c010) at signal-handling.c:315:9 [opt]
    frame #5: 0x0000000101074394 libjulia-internal.1.9.dylib`jl_exit_thread0_cb(exitstate=139) at signals-mach.c:410:5 [opt]
(lldb) dis 
libjulia-internal.1.9.dylib`jl_exit_thread0_cb:
->  0x10107436c <+0>:  stp    x20, x19, [sp, #-0x20]!
    0x101074370 <+4>:  stp    x29, x30, [sp, #0x10]
    0x101074374 <+8>:  add    x29, sp, #0x10            ; =0x10 
    0x101074378 <+12>: mov    x19, x0
    0x10107437c <+16>: sub    w20, w0, #0x80            ; =0x80 
    0x101074380 <+20>: bl     0x10105e420               ; ijl_get_pgcstack at threading.c:91:48
    0x101074384 <+24>: sub    x2, x0, #0x60             ; =0x60 
    0x101074388 <+28>: mov    x0, x20
    0x10107438c <+32>: mov    x1, #0x0
    0x101074390 <+36>: bl     0x10107408c               ; jl_critical_error at signal-handling.c:279
    0x101074394 <+40>: mov    x0, x19
    0x101074398 <+44>: bl     0x101051124               ; ijl_exit at jl_uv.c:637
(lldb) register read
General Purpose Registers:
        x0 = 0x000000000000008b
        x1 = 0x00000001eaa7cb68  __global_locale
        x2 = 0x00000001012dc143  "\nsignal (%d): %s\n"
        x3 = 0x00000001028082e0
        x4 = 0x00000001012dc143  "\nsignal (%d): %s\n"
        x5 = 0x00000001028082e0
        x6 = 0x00000001028082e0
        x7 = 0x00000000000007ff
        x8 = 0x00000001028081a8
        x9 = 0x0000000000000003
       x10 = 0x0000000101312268  libjulia-internal.1.9.dylib`ijl_safe_printf.buf
       x11 = 0x00000000ffff0208
       x12 = 0x0000000102808250
       x13 = 0x0000000000000000
       x14 = 0x000000000000000a
       x15 = 0x0000000000000000
       x16 = 0x00000001912de710  libsystem_c.dylib`vsnprintf
       x17 = 0x00000001ebc45fd8  (void *)0x0000000191429da0: _platform_strlcpy
       x18 = 0x0000000000000000
       x19 = 0x00000001eaa7cb68  __global_locale
       x20 = 0x00000001028082e0
       x21 = 0x0000000102808108
       x22 = 0x0000000000000000
       x23 = 0x00000001012dc143  "\nsignal (%d): %s\n"
       x24 = 0xffffffffffffffff
       x25 = 0xffffffff00114a00
       x26 = 0x0000000000000001
       x27 = 0x0000000000114b00
       x28 = 0x00114b00002e6887
        fp = 0x00000001028080f0
        lr = 0x0000000000000000
        sp = 0x0000000102802e20
        pc = 0x000000010107436c  libjulia-internal.1.9.dylib`jl_exit_thread0_cb at signals-mach.c:408
      cpsr = 0x00001000

@vtjnash do you want me to run other commands?

@vtjnash
Copy link
Member

vtjnash commented Mar 12, 2022

Hm, those numbers disagree in lldb. One states the exception happened while writing to 0x102802f10 (to 0x102802f20), the other states it happened while writing to 0x0000000102802e00 (to 0x0000000102802e10). Somehow two of the bits in that stack pointer are getting stuck with the wrong values while entering that function, causing it much unhappiness.

@staticfloat
Copy link
Member

staticfloat commented Apr 5, 2022

It's becoming clear to me that this issue has nothing to do with the actual code being run. E.g. here's an example where it triggers on a worker that is running the p7zip_jll tests: https://buildkite.com/julialang/julia-buildkite/builds/433#c968a246-8054-4784-b501-23bcf78c05ee/326-744

However, that test suite is just about as simple as possible: https://github.com/JuliaLang/julia/blob/master/stdlib/p7zip_jll/test/runtests.jl#L6

My guess is that this is actually happening during process teardown, as the distributed workers exit after finishing their work. Notice in the log above how many different workers all experience the issue at the same time, as they run out of work items to process.

@gbaraldi
Copy link
Member Author

gbaraldi commented Apr 5, 2022

I got it to segfault inside lldb but I forgot to attach the child processes :(

@maleadt
Copy link
Member

maleadt commented Apr 6, 2022

I can trigger this locally by just doing:

while julia
do
;
done

It takes a while though. Now trying to catch it in LLDB using Expect:

#!/usr/bin/expect -f

spawn lldb -- julia
expect "Current executable set to"
expect "(lldb) "

set timeout 2
send -- "run -e 42\r"
expect -re "Process .* launched"

expect -re "Process .* exited with status = 0" {
    exit
}

# probably a segfault, give back c ontrol
interact

@maleadt
Copy link
Member

maleadt commented Apr 6, 2022

Seems like the segfault cascade is caused by the jl_critical_error during segfault handling causing a segfault itself. I'll try now with @vtjnash's patch from above, and a debug build.

* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x1034072e0)
  * frame #0: 0x0000000101c7af08 libjulia-internal.1.8.dylib`jl_exit_thread0_cb
    frame #1: 0x00000001de557828 libsystem_c.dylib
    frame #2: 0x000000018579f758 libsystem_c.dylib`_vsnprintf + 224
    frame #3: 0x0000000101c55d0c libjulia-internal.1.8.dylib`ijl_safe_printf + 72
    frame #4: 0x0000000101c7ad04 libjulia-internal.1.8.dylib`jl_critical_error + 220
    frame #5: 0x0000000101c7af30 libjulia-internal.1.8.dylib`jl_exit_thread0_cb + 40

EDIT: putting a raise(SIGSTOP) in the segfault handler gives me the following backtrace in LLDB:

* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00000001856bcd80 libsystem_malloc.dylib`free + 524
    frame #1: 0x0000000109dadfbc libLLVM.dylib`std::__1::__tree<std::__1::__value_type<llvm::StringRef, llvm::StringRef>, std::__1::__map_value_compare<llvm::StringRef, std::__1::__value_type<llvm::StringRef, llvm::StringRef>, std::__1::less<llvm::StringRef>, true>, std::__1::allocator<std::__1::__value_type<llvm::StringRef, llvm::StringRef> > >::destroy(std::__1::__tree_node<std::__1::__value_type<llvm::StringRef, llvm::StringRef>, void*>*) + 32
    frame #2: 0x0000000109dadfc8 libLLVM.dylib`std::__1::__tree<std::__1::__value_type<llvm::StringRef, llvm::StringRef>, std::__1::__map_value_compare<llvm::StringRef, std::__1::__value_type<llvm::StringRef, llvm::StringRef>, std::__1::less<llvm::StringRef>, true>, std::__1::allocator<std::__1::__value_type<llvm::StringRef, llvm::StringRef> > >::destroy(std::__1::__tree_node<std::__1::__value_type<llvm::StringRef, llvm::StringRef>, void*>*) + 44
    frame #3: 0x0000000109dadfc8 libLLVM.dylib`std::__1::__tree<std::__1::__value_type<llvm::StringRef, llvm::StringRef>, std::__1::__map_value_compare<llvm::StringRef, std::__1::__value_type<llvm::StringRef, llvm::StringRef>, std::__1::less<llvm::StringRef>, true>, std::__1::allocator<std::__1::__value_type<llvm::StringRef, llvm::StringRef> > >::destroy(std::__1::__tree_node<std::__1::__value_type<llvm::StringRef, llvm::StringRef>, void*>*) + 44
    frame #4: 0x0000000109dadfc8 libLLVM.dylib`std::__1::__tree<std::__1::__value_type<llvm::StringRef, llvm::StringRef>, std::__1::__map_value_compare<llvm::StringRef, std::__1::__value_type<llvm::StringRef, llvm::StringRef>, std::__1::less<llvm::StringRef>, true>, std::__1::allocator<std::__1::__value_type<llvm::StringRef, llvm::StringRef> > >::destroy(std::__1::__tree_node<std::__1::__value_type<llvm::StringRef, llvm::StringRef>, void*>*) + 44
    frame #5: 0x0000000109dadfc8 libLLVM.dylib`std::__1::__tree<std::__1::__value_type<llvm::StringRef, llvm::StringRef>, std::__1::__map_value_compare<llvm::StringRef, std::__1::__value_type<llvm::StringRef, llvm::StringRef>, std::__1::less<llvm::StringRef>, true>, std::__1::allocator<std::__1::__value_type<llvm::StringRef, llvm::StringRef> > >::destroy(std::__1::__tree_node<std::__1::__value_type<llvm::StringRef, llvm::StringRef>, void*>*) + 44
    frame #6: 0x000000010b3fd9a4 libLLVM.dylib`std::__1::set<llvm::StringRef, std::__1::less<llvm::StringRef>, std::__1::allocator<llvm::StringRef> >::~set() + 24
    frame #7: 0x0000000185799dc4 libsystem_c.dylib`__cxa_finalize_ranges + 464
    frame #8: 0x0000000185799b68 libsystem_c.dylib`exit + 44
    frame #9: 0x00000001040a7f70 julia-debug`main(argc=3, argv=0x000000016bd5b170) at loader_exe.c:62:5
    frame #10: 0x0000000104119088 dyld`start + 516

So a double free or so in LLVM during process clean-up. Maybe ASAN could help here

@maleadt
Copy link
Member

maleadt commented Apr 6, 2022

Finally got an ASAN build, and it caught the following:

==52538==ERROR: AddressSanitizer: heap-use-after-free on address 0x000102529bd0 at pc 0x000111303908 bp 0x000125efd8e0 sp 0x000125efd8d8

* thread #3, stop reason = breakpoint 1.1
  * frame #0: 0x0000000100456720 libclang_rt.asan_osx_dynamic.dylib`__asan::ReportGenericError(unsigned long, unsigned long, unsigned long, unsigned long, bool, unsigned long, unsigned int, bool)
    frame #1: 0x00000001004578b4 libclang_rt.asan_osx_dynamic.dylib`__asan_report_load8 + 52
    frame #2: 0x0000000111303908 libLLVM.dylib`::LookupBucketFor() at StringMap.cpp:87:38 [opt]
    frame #3: 0x00000001057b53a8 libjulia-codegen-debug.1.9.dylib`std::__1::pair<llvm::StringMapIterator<_jl_code_instance_t*>, bool> llvm::StringMap<_jl_code_instance_t*, llvm::MallocAllocator>::try_emplace<>(this=0x0000000105a1fbc0, Key=(Data = "_julia_NamedTuple_10", Length = 20)) at StringMap.h:316:25
    frame #4: 0x00000001057971c8 libjulia-codegen-debug.1.9.dylib`llvm::StringMap<_jl_code_instance_t*, llvm::MallocAllocator>::operator[](this=0x0000000105a1fbc0, Key=(Data = "_julia_NamedTuple_10", Length = 20)) at StringMap.h:243:47
    frame #5: 0x00000001057970e0 libjulia-codegen-debug.1.9.dylib`jl_add_code_in_flight(name=(Data = "julia_NamedTuple_10", Length = 19), codeinst=0x000000015cf2e980, DL=0x0000000104833110) at debuginfo.cpp:126:5
    frame #6: 0x000000010556a59c libjulia-codegen-debug.1.9.dylib`jl_emit_codeinst(m=0x0000000125efeaa8, codeinst=0x000000015cf2e980, src=0x000000015cf15d40, params=0x0000000125efeae8) at codegen.cpp:7997:17
    frame #7: 0x00000001056e73e4 libjulia-codegen-debug.1.9.dylib`_jl_compile_codeinst(codeinst=0x000000015cf2e980, src=0x000000015cf15d40, world=32564, context=ThreadSafeContext @ 0x0000000125efef88) at jitlayers.cpp:128:37
    frame #8: 0x00000001056e6d30 libjulia-codegen-debug.1.9.dylib`::jl_generate_fptr_impl(mi=0x000000015cf16240, world=32564) at jitlayers.cpp:345:9
    frame #9: 0x00000001037c4078 libjulia-internal-debug.1.9.dylib`jl_compile_method_internal(mi=0x000000015cf16240, world=32564) at gf.c:2081:16
    frame #10: 0x00000001037ca230 libjulia-internal-debug.1.9.dylib`_jl_invoke(F=0x0000000127bc7fb0, args=0x0000000125eff948, nargs=1, mfunc=0x000000015cf16240, world=32564) at gf.c:2359:16
    frame #11: 0x00000001037ca458 libjulia-internal-debug.1.9.dylib`ijl_apply_generic(F=0x0000000127bc7fb0, args=0x0000000125eff948, nargs=1) at gf.c:2549:12
    frame #12: 0x0000000127378f54 sys-debug.dylib`julia_profile_printing_listener_65922 at logging.jl:361
    frame #13: 0x00000001273791c0 sys-debug.dylib`julia_YY.3_66383 at threadingconstructs.jl:258
    frame #14: 0x00000001273791d0 sys-debug.dylib`jfptr_YY.3_66384 + 8
    frame #15: 0x00000001037ca0c8 libjulia-internal-debug.1.9.dylib`_jl_invoke(F=0x0000000127bc7020, args=0x0000000102c10238, nargs=0, mfunc=0x0000000127bc6e60, world=32564) at gf.c:2348:35
    frame #16: 0x00000001037ca458 libjulia-internal-debug.1.9.dylib`ijl_apply_generic(F=0x0000000127bc7020, args=0x0000000102c10238, nargs=0) at gf.c:2549:12
    frame #17: 0x000000010382d260 libjulia-internal-debug.1.9.dylib`jl_apply(args=0x0000000102c10230, nargs=1) at julia.h:1830:12
    frame #18: 0x0000000103832670 libjulia-internal-debug.1.9.dylib`start_task at task.c:931:19

Not sure it's related yet. Would be so much easier to debug with rr...

@maleadt
Copy link
Member

maleadt commented Apr 7, 2022

Another one in the debuginfo registration:

==74124==ERROR: AddressSanitizer: heap-use-after-free on address 0x000101d51940 at pc 0x00010579f5d4 bp 0x000125cf8210 sp 0x000125cf8208
READ of size 8 at 0x000101d51940 thread T2

* thread #3, stop reason = breakpoint 1.1
  * frame #0: 0x0000000100456720 libclang_rt.asan_osx_dynamic.dylib`__asan::ReportGenericError(unsigned long, unsigned long, unsigned long, unsigned long, bool, unsigned long, unsigned int, bool)
    frame #1: 0x00000001004578b4 libclang_rt.asan_osx_dynamic.dylib`__asan_report_load8 + 52
    frame #2: 0x000000010579f5d4 libjulia-codegen-debug.1.9.dylib`revcomp::operator(this=0x0000000105a1b9f0, lhs=0x0000000125cf8778, rhs=0x0000000101d51940)(unsigned long const&, unsigned long const&) const at debuginfo.cpp:186:18
    frame #3: 0x00000001057a123c libjulia-codegen-debug.1.9.dylib`std::__1::__map_value_compare<unsigned long, std::__1::__value_type<unsigned long, ObjectInfo>, revcomp, true>::operator(this=0x0000000105a1b9f0, __x=0x0000000125cf8778, __y=0x0000000101d51940)(unsigned long const&, std::__1::__value_type<unsigned long, ObjectInfo> const&) const at map:537:17
    frame #4: 0x00000001057a096c libjulia-codegen-debug.1.9.dylib`std::__1::__tree_node_base<void*>*& std::__1::__tree<std::__1::__value_type<unsigned long, ObjectInfo>, std::__1::__map_value_compare<unsigned long, std::__1::__value_type<unsigned long, ObjectInfo>, revcomp, true>, std::__1::allocator<std::__1::__value_type<unsigned long, ObjectInfo> > >::__find_equal<unsigned long>(this=0x0000000105a1b9e0, __parent=0x0000000125cf84c0, __v=0x0000000125cf8778) at __tree:1977:17
    frame #5: 0x00000001057a06f4 libjulia-codegen-debug.1.9.dylib`std::__1::pair<std::__1::__tree_iterator<std::__1::__value_type<unsigned long, ObjectInfo>, std::__1::__tree_node<std::__1::__value_type<unsigned long, ObjectInfo>, void*>*, long>, bool> std::__1::__tree<std::__1::__value_type<unsigned long, ObjectInfo>, std::__1::__map_value_compare<unsigned long, std::__1::__value_type<unsigned long, ObjectInfo>, revcomp, true>, std::__1::allocator<std::__1::__value_type<unsigned long, ObjectInfo> > >::__emplace_unique_key_args<unsigned long, std::__1::piecewise_construct_t const&, std::__1::tuple<unsigned long&&>, std::__1::tuple<> >(this=0x0000000105a1b9e0, __k=0x0000000125cf8778, __args=0x0000000105968080, __args=size=1, __args=size=0) at __tree:2093:36
    frame #6: 0x000000010579e6b0 libjulia-codegen-debug.1.9.dylib`std::__1::map<unsigned long, ObjectInfo, revcomp, std::__1::allocator<std::__1::pair<unsigned long const, ObjectInfo> > >::operator[](this=0x0000000105a1b9e0 size=6, __k=0x0000000125cf8778) at map:1560:20
    frame #7: 0x000000010579e314 libjulia-codegen-debug.1.9.dylib`JITObjectRegistry::registerJITObject(this=0x0000000125cf8a80)>, std::__1::function<void* (void*)>)::'lambda'()::operator()() const at debuginfo.cpp:376:21
    frame #8: 0x000000010579b7c0 libjulia-codegen-debug.1.9.dylib`void jl_profile_atomic<JITObjectRegistry::registerJITObject(llvm::object::ObjectFile const&, std::__1::function<unsigned long long (llvm::StringRef const&)>, std::__1::function<void* (void*)>)::'lambda'()>(f=(unnamed class) @ 0x0000000125cf8a80)>, std::__1::function<void* (void*)>)::'lambda'()) at debuginfo.cpp:93:5
    frame #9: 0x00000001057947c4 libjulia-codegen-debug.1.9.dylib`JITObjectRegistry::registerJITObject(this=0x0000000105a1b9e0, Object=0x0000000102b62c80, getLoadAddress= Lambda in File jitlayers.cpp at Line 614)>, std::__1::function<void* (void*)>) at debuginfo.cpp:366:13
    frame #10: 0x0000000105793ed0 libjulia-codegen-debug.1.9.dylib`jl_register_jit_object(Object=0x0000000102b62c80, getLoadAddress=<unavailable>, lookupWriteAddress=function<void *(void *)> @ 0x0000000125cf8e58)>, std::__1::function<void* (void*)>) at debuginfo.cpp:395:28
    frame #11: 0x000000010572a8e0 libjulia-codegen-debug.1.9.dylib`(anonymous namespace)::JLDebuginfoPlugin::notifyEmitted(this=0x0000000101fc3440, MR=0x00000001071b3890) at jitlayers.cpp:626:9
    frame #12: 0x00000001160cd160 libLLVM.dylib`::notifyEmitted() at ObjectLinkingLayer.cpp:678:41 [opt]
    frame #13: 0x00000001160d8958 libLLVM.dylib`::notifyFinalized() at ObjectLinkingLayer.cpp:302:26 [opt]
    frame #14: 0x0000000115f1a7a4 libLLVM.dylib`::linkPhase3() at JITLinkGeneric.cpp:178:8 [opt]
    frame #15: 0x0000000115f1c71c libLLVM.dylib`::operator()() [inlined] operator() at JITLinkGeneric.cpp:164:18 [opt]
    frame #16: 0x0000000115f1c6fc libLLVM.dylib`::operator()() [inlined] __invoke<(lambda at /Users/tim/Julia/src/julia/deps/srccache/llvm-julia-13.0.1-0/llvm/lib/ExecutionEngine/JITLink/JITLinkGeneric.cpp:162:29) &, llvm::Error> at type_traits:3918:1 [opt]
    frame #17: 0x0000000115f1c6f4 libLLVM.dylib`::operator()() [inlined] __call<(lambda at /Users/tim/Julia/src/julia/deps/srccache/llvm-julia-13.0.1-0/llvm/lib/ExecutionEngine/JITLink/JITLinkGeneric.cpp:162:29) &, llvm::Error> at invoke.h:61:9 [opt]
    frame #18: 0x0000000115f1c6f4 libLLVM.dylib`::operator()() [inlined] operator() at function.h:178:16 [opt]
    frame #19: 0x0000000115f1c6d0 libLLVM.dylib`::operator()() at function.h:352:12 [opt]
    frame #20: 0x0000000115f24448 libLLVM.dylib`::finalizeAsync() [inlined] operator() at function.h:505:16 [opt]
    frame #21: 0x0000000115f24418 libLLVM.dylib`::finalizeAsync() [inlined] operator() at function.h:1182:12 [opt]
    frame #22: 0x0000000115f24404 libLLVM.dylib`::finalizeAsync() at JITLinkMemoryManager.cpp:39:7 [opt]
    frame #23: 0x0000000115f187ac libLLVM.dylib`::linkPhase2() at JITLinkGeneric.cpp:167:10 [opt]
    frame #24: 0x0000000115f1bf8c libLLVM.dylib`::run() [inlined] operator() at JITLinkGeneric.cpp:111:32 [opt]
    frame #25: 0x0000000115f1be20 libLLVM.dylib`::run() at JITLink.h:1603:57 [opt]
    frame #26: 0x00000001160da5f8 libLLVM.dylib`::operator()() at ObjectLinkingLayer.cpp:185:29 [opt]
    frame #27: 0x00000001160d9d58 libLLVM.dylib`::CallImpl<(lambda at /Users/tim/Julia/src/julia/deps/srccache/llvm-julia-13.0.1-0/llvm/lib/ExecutionEngine/Orc/ObjectLinkingLayer.cpp:177:22)>() at FunctionExtras.h:216:12 [opt]
    frame #28: 0x0000000115ff1b70 libLLVM.dylib`::run() [inlined] operator() at FunctionExtras.h:378:12 [opt]
    frame #29: 0x0000000115ff1b1c libLLVM.dylib`::run() at Core.cpp:186:27 [opt]
    frame #30: 0x0000000115fefedc libLLVM.dylib`::CallImpl<void (*)(std::__1::unique_ptr<llvm::orc::Task, std::__1::default_delete<llvm::orc::Task> >)>() at FunctionExtras.h:216:12 [opt]
    frame #31: 0x0000000115fa329c libLLVM.dylib`::handleComplete() [inlined] operator() at FunctionExtras.h:378:12 [opt]
    frame #32: 0x0000000115fa327c libLLVM.dylib`::handleComplete() [inlined] dispatchTask at Core.h:1453:5 [opt]
    frame #33: 0x0000000115fa323c libLLVM.dylib`::handleComplete() at Core.cpp:196:6 [opt]
    frame #34: 0x0000000115fe0798 libLLVM.dylib`::OL_completeLookup() at Core.cpp:2662:8 [opt]
    frame #35: 0x00000001160001fc libLLVM.dylib`::complete() at Core.cpp:548:8 [opt]
    frame #36: 0x0000000115fb13a8 libLLVM.dylib`::OL_applyQueryPhase1() at Core.cpp:2421:11 [opt]
    frame #37: 0x0000000115fab568 libLLVM.dylib`::lookup() at Core.cpp:1974:3 [opt]
    frame #38: 0x00000001160d4ed8 libLLVM.dylib`::lookup() at ObjectLinkingLayer.cpp:195:8 [opt]
    frame #39: 0x0000000115f12cc0 libLLVM.dylib`::linkPhase1() at JITLinkGeneric.cpp:106:11 [opt]
    frame #40: 0x0000000115f29efc libLLVM.dylib`::link<std::__1::unique_ptr<llvm::jitlink::JITLinkContext, std::__1::default_delete<llvm::jitlink::JITLinkContext> >, std::__1::unique_ptr<llvm::jitlink::LinkGraph, std::__1::default_delete<llvm::jitlink::LinkGraph> >, llvm::jitlink::PassConfiguration>() at JITLinkGeneric.h:139:10 [opt]
    frame #41: 0x0000000115f271a0 libLLVM.dylib`::link_MachO_arm64() at MachO_arm64.cpp:699:3 [opt]
    frame #42: 0x0000000115f260e8 libLLVM.dylib`::link_MachO() at MachO.cpp:81:12 [opt]
    frame #43: 0x0000000115f06108 libLLVM.dylib`::link() at JITLink.cpp:410:12 [opt]
    frame #44: 0x00000001160cbdc8 libLLVM.dylib`::emit() at ObjectLinkingLayer.cpp:648:5 [opt]
    frame #45: 0x00000001160681fc libLLVM.dylib`::emit() at IRCompileLayer.cpp:40:15 [opt]
    frame #46: 0x00000001160696d8 libLLVM.dylib`::emit() at IRTransformLayer.cpp:25:15 [opt]
    frame #47: 0x00000001056e6490 libjulia-codegen-debug.1.9.dylib`JuliaOJIT::OptSelLayerT::emit(this=0x0000000104b15d70, R=nullptr, TSM=ThreadSafeModule @ 0x0000000125cfc500) at jitlayers.cpp:562:32
    frame #48: 0x000000011607e3f8 libLLVM.dylib`::materialize() at Layer.cpp:152:5 [opt]
    frame #49: 0x0000000115fd40e8 libLLVM.dylib`::run() at Core.cpp:1803:39 [opt]
    frame #50: 0x0000000115fefedc libLLVM.dylib`::CallImpl<void (*)(std::__1::unique_ptr<llvm::orc::Task, std::__1::default_delete<llvm::orc::Task> >)>() at FunctionExtras.h:216:12 [opt]
    frame #51: 0x0000000115fd8490 libLLVM.dylib`::dispatchOutstandingMUs() [inlined] operator() at FunctionExtras.h:378:12 [opt]
    frame #52: 0x0000000115fd8440 libLLVM.dylib`::dispatchOutstandingMUs() [inlined] dispatchTask at Core.h:1453:5 [opt]
    frame #53: 0x0000000115fd8428 libLLVM.dylib`::dispatchOutstandingMUs() at Core.cpp:2136:5 [opt]
    frame #54: 0x0000000115fe07a0 libLLVM.dylib`::OL_completeLookup() at Core.cpp:2665:3 [opt]
    frame #55: 0x00000001160001fc libLLVM.dylib`::complete() at Core.cpp:548:8 [opt]
    frame #56: 0x0000000115fb13a8 libLLVM.dylib`::OL_applyQueryPhase1() at Core.cpp:2421:11 [opt]
    frame #57: 0x0000000115fab568 libLLVM.dylib`::lookup() at Core.cpp:1974:3 [opt]
    frame #58: 0x0000000115fd89fc libLLVM.dylib`::lookup() at Core.cpp:2011:3 [opt]
    frame #59: 0x0000000115fd96dc libLLVM.dylib`::lookup() at Core.cpp:2036:24 [opt]
    frame #60: 0x0000000115fd9eec libLLVM.dylib`::lookup() at Core.cpp:2048:10 [opt]
    frame #61: 0x0000000115fda298 libLLVM.dylib`::lookup() at Core.cpp:2054:10 [opt]
    frame #62: 0x00000001056e21a4 libjulia-codegen-debug.1.9.dylib`JuliaOJIT::addModule(this=0x0000000104b15480, TSM=ThreadSafeModule @ 0x0000000125cfe148) at jitlayers.cpp:1066:21
    frame #63: 0x00000001056f84a4 libjulia-codegen-debug.1.9.dylib`jl_add_to_ee(M=0x00000001049622a8, NewExports=0x0000000125cfe9a0, Queued=0x0000000125cfe288, ToMerge=size=1, depth=1) at jitlayers.cpp:1417:29
    frame #64: 0x00000001056f35d0 libjulia-codegen-debug.1.9.dylib`jl_add_to_ee(M=0x00000001049622a8, NewExports=0x0000000125cfe9a0) at jitlayers.cpp:1439:5
    frame #65: 0x00000001056e4998 libjulia-codegen-debug.1.9.dylib`_jl_compile_codeinst(codeinst=0x000000015cf2e980, src=0x000000015cf15d40, world=32564, context=ThreadSafeContext @ 0x0000000125cfef88) at jitlayers.cpp:163:13
    frame #66: 0x00000001056e3b00 libjulia-codegen-debug.1.9.dylib`::jl_generate_fptr_impl(mi=0x000000015cf16240, world=32564) at jitlayers.cpp:345:9
    frame #67: 0x00000001037c4078 libjulia-internal-debug.1.9.dylib`jl_compile_method_internal(mi=0x000000015cf16240, world=32564) at gf.c:2081:16
    frame #68: 0x00000001037ca230 libjulia-internal-debug.1.9.dylib`_jl_invoke(F=0x0000000127bbffb0, args=0x0000000125cff948, nargs=1, mfunc=0x000000015cf16240, world=32564) at gf.c:2359:16
    frame #69: 0x00000001037ca458 libjulia-internal-debug.1.9.dylib`ijl_apply_generic(F=0x0000000127bbffb0, args=0x0000000125cff948, nargs=1) at gf.c:2549:12
    frame #70: 0x0000000127370f54 sys-debug.dylib`julia_profile_printing_listener_65922 at logging.jl:361
    frame #71: 0x00000001273711c0 sys-debug.dylib`julia_YY.3_66383 at threadingconstructs.jl:258
    frame #72: 0x00000001273711d0 sys-debug.dylib`jfptr_YY.3_66384 + 8
    frame #73: 0x00000001037ca0c8 libjulia-internal-debug.1.9.dylib`_jl_invoke(F=0x0000000127bbf020, args=0x0000000102c10238, nargs=0, mfunc=0x0000000127bbee60, world=32564) at gf.c:2348:35
    frame #74: 0x00000001037ca458 libjulia-internal-debug.1.9.dylib`ijl_apply_generic(F=0x0000000127bbf020, args=0x0000000102c10238, nargs=0) at gf.c:2549:12
    frame #75: 0x000000010382d260 libjulia-internal-debug.1.9.dylib`jl_apply(args=0x0000000102c10230, nargs=1) at julia.h:1830:12
    frame #76: 0x0000000103832670 libjulia-internal-debug.1.9.dylib`start_task at task.c:931:19

Does anybody know why the ASAN reports here are truncated?

@maleadt
Copy link
Member

maleadt commented Apr 7, 2022

So I'm not sure what's wrong, but it's definitely related to the two use-after-frees reported by ASAN in the debuginfo registration code. Disabling it, I can't reproduce the segfaults:

diff --git a/src/debuginfo.cpp b/src/debuginfo.cpp
index 0e246160a3..aabd6c96c0 100644
--- a/src/debuginfo.cpp
+++ b/src/debuginfo.cpp
@@ -123,7 +123,7 @@ static std::string mangle(StringRef Name, const DataLayout &DL)
 }
 void jl_add_code_in_flight(StringRef name, jl_code_instance_t *codeinst, const DataLayout &DL)
 {
-    codeinst_in_flight[mangle(name, DL)] = codeinst;
+    //codeinst_in_flight[mangle(name, DL)] = codeinst;
 }


@@ -363,20 +363,20 @@ public:
                 codeinst = codeinst_it->second;
                 codeinst_in_flight.erase(codeinst_it);
             }
-            jl_profile_atomic([&]() {
-                if (codeinst)
-                    linfomap[Addr] = std::make_pair(Size, codeinst->def);
-                if (first) {
-                    ObjectInfo tmp = {&Object,
-                        (size_t)SectionSize,
-                        (ptrdiff_t)(SectionAddr - SectionLoadAddr),
-                        *Section,
-                        nullptr,
-                        };
-                    objectmap[SectionLoadAddr] = tmp;
-                    first = false;
-                }
-            });
+            // jl_profile_atomic([&]() {
+            //     if (codeinst)
+            //         linfomap[Addr] = std::make_pair(Size, codeinst->def);
+            //     if (first) {
+            //         ObjectInfo tmp = {&Object,
+            //             (size_t)SectionSize,
+            //             (ptrdiff_t)(SectionAddr - SectionLoadAddr),
+            //             *Section,
+            //             nullptr,
+            //             };
+            //         objectmap[SectionLoadAddr] = tmp;
+            //         first = false;
+            //     }
+            // });
         }
         jl_gc_safe_leave(ptls, gc_state);
     }

@maleadt maleadt changed the title Segfault when running read test on apple-arm M1: use-after-free during debuginfo registration leads to segfault Apr 7, 2022
@IanButterworth
Copy link
Member

Regarding the title change, I see the repeat empty segfaults on my intel mac. I don't believe this is M1 only

@maleadt maleadt changed the title M1: use-after-free during debuginfo registration leads to segfault MacOS: use-after-free during debuginfo registration leads to segfault Apr 7, 2022
@staticfloat
Copy link
Member

Yeah, we see this on both x86_64 and aarch64 mac on CI pretty often as well.

@giordano giordano removed the system:apple silicon Affects Apple Silicon only (Darwin/ARM64) - e.g. M1 and other M-series chips label Apr 7, 2022
@DilumAluthge DilumAluthge added the bug Indicates an unexpected problem or unintended behavior label Apr 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Indicates an unexpected problem or unintended behavior system:mac Affects only macOS
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants