Preloading problem with jemalloc on glibc ≥ 2.34 #46298

Open
kpamnany opened this issue Aug 9, 2022 · 12 comments

@kpamnany
Contributor

kpamnany commented Aug 9, 2022

We have been preloading jemalloc when running Julia. When we tried to upgrade GLIBC from 2.33, we found:

julia> exit()
munmap_chunk(): invalid pointer

signal (6): Aborted
in expression starting at REPL[2]:1
pthread_kill at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
raise at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
abort at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x7ff62bef66f5)
unknown function (ip: 0x7ff62bf0dd7b)
unknown function (ip: 0x7ff62bf0e05b)
free at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
close_unit_1 at /workspace/srcdir/gcc-11.1.0/libgfortran/io/unit.c:742
close_units at /workspace/srcdir/gcc-11.1.0/libgfortran/io/unit.c:800
unknown function (ip: 0x7ff62c37624d)
unknown function (ip: 0x7ff62beb2494)
exit at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
jl_exit at /buildworker/worker/package_linux64/build/src/jl_uv.c:634
exit at ./initdefs.jl:28 [inlined]
exit at ./initdefs.jl:29
jfptr_exit_22756.clone_1 at /home/kpamnany/julia-1.7.1/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2247 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2429

It looks like an atexit() registered function is calling free() in libc rather than in the (preloaded) jemalloc library. I'm not sure whether this is related to RTLD_DEEPBIND: Julia appears to have used that flag for a while, but this problem has only appeared now.
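One way to check which library a raw frame address (such as the `unknown function (ip: ...)` entries above) belongs to is to look it up in the process's memory map. The following is a minimal sketch, assuming the Linux `/proc/<pid>/maps` text format; `find_mapping` and `SAMPLE_MAPS` are hypothetical names, and the sample layout is made up for illustration rather than taken from the crashing process.

```python
from typing import Optional

# Hypothetical memory map, invented for illustration only.
SAMPLE_MAPS = """\
7ff62be70000-7ff62c030000 r-xp 00028000 08:01 1234 /lib/x86_64-linux-gnu/libc.so.6
7ff62c400000-7ff62c480000 r-xp 00000000 08:01 5678 /usr/local/lib/libjemalloc.so.2
7ff62d000000-7ff62d100000 rw-p 00000000 00:00 0
"""


def find_mapping(maps_text: str, address: int) -> Optional[str]:
    """Return the pathname of the mapping that contains `address`.

    `maps_text` is expected to use the /proc/<pid>/maps layout:
    start-end perms offset dev inode [pathname]
    Returns None when no mapping contains the address.
    """
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue  # not a mapping line
        start_hex, _, end_hex = fields[0].partition("-")
        try:
            start, end = int(start_hex, 16), int(end_hex, 16)
        except ValueError:
            continue  # malformed address range
        if start <= address < end:
            # Anonymous mappings have no pathname field.
            return fields[5].strip() if len(fields) == 6 else "[anonymous]"
    return None
```

For a live process, the same function can be fed the contents of `/proc/<pid>/maps`. If the address passed to `free()` falls inside the preloaded allocator's arena while the `free` frame itself falls inside libc, that is consistent with the mismatch described above.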

The GLIBC 2.34 release notes say:

  • In order to support smoother in-place-upgrades and to simplify
    the implementation of the runtime all functionality formerly
    implemented in the libraries libpthread, libdl, libutil, libanl has
    been integrated into libc. New applications do not need to link with
    -lpthread, -ldl, -lutil, -lanl anymore. For backwards compatibility,
    empty static archives libpthread.a, libdl.a, libutil.a, libanl.a are
    provided, so that the linker options keep working. Applications which
    have been linked against glibc 2.33 or earlier continue to load the
    corresponding shared objects (which are now empty). The integration
    of those libraries into libc means that additional symbols become
    available by default. This can cause applications that contain weak
    references to take unexpected code paths that would only have been
    used in previous glibc versions when e.g. preloading libpthread.so.0,
    potentially exposing application bugs.

This is the likely source of the problem, but again, I'm not sure.
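Since the behavior change is tied to the glibc version, a launcher script could check the running version before deciding whether to preload an allocator. The sketch below is one possible way to do that, assuming a Linux system where `os.confstr("CS_GNU_LIBC_VERSION")` is available; the helper names are hypothetical, and the 2.34 threshold comes from the release notes quoted above.

```python
import os
from typing import Optional, Tuple


def parse_glibc_version(text: str) -> Optional[Tuple[int, int]]:
    """Parse strings such as 'glibc 2.35' or '2.34' into (major, minor)."""
    parts = text.split()
    if not parts:
        return None
    pieces = parts[-1].split(".")
    try:
        return int(pieces[0]), int(pieces[1])
    except (IndexError, ValueError):
        return None


def running_glibc_version() -> Optional[Tuple[int, int]]:
    """Return the running glibc version, or None if it cannot be determined."""
    try:
        return parse_glibc_version(os.confstr("CS_GNU_LIBC_VERSION") or "")
    except (AttributeError, ValueError, OSError):
        return None  # not glibc, or this confstr name is unsupported


def merged_libpthread_into_libc(version: Tuple[int, int]) -> bool:
    """True for glibc 2.34 and later, where libpthread, libdl, etc. live in libc."""
    return version >= (2, 34)
```

A caller might log a warning when `merged_libpthread_into_libc(running_glibc_version())` is true and `LD_PRELOAD` is set, rather than refusing to start.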

I'm opening this issue for 3 reasons:

  1. In case someone has encountered this problem and has a solution.
  2. To inform other folks who preload libraries.
  3. Julia's build+linking should probably be updated to reflect the above at some point.
@gbaraldi
Member

gbaraldi commented Aug 10, 2022

I encountered the same thing while doing something similar. It seems libgfortran calls libc's free on memory that may not have come from libc's malloc, which would lead to this abort. I'm not sure why that happens; maybe the change means RTLD_DEEPBIND now finds the malloc functions.

https://discourse.julialang.org/t/discussion-of-ccall-function-library/53887

http://carlsonj.blog.workingcode.com/2018/11/rtlddeepbind-has-deep-surprises.html?m=1

@staticfloat any ideas?

@DilumAluthge
Member

Would an rr trace be helpful here?

@gbaraldi
Member

It might, but I believe the issue is that RTLD_DEEPBIND doesn't play well with LD_PRELOAD.
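To make the two lookup scopes concrete: LD_PRELOAD places a library early in the global symbol scope, while a library opened with RTLD_DEEPBIND prefers symbols from its own dependency chain (which usually includes libc) over the global scope. The sketch below only approximates this from Python by comparing the address of `free` found through the global scope with the one found through libc's own handle. It is a rough illustration under those assumptions, not a reproduction of the crash, and the function names are hypothetical.

```python
import ctypes
import ctypes.util
from typing import Optional, Tuple


def symbol_address(lib: ctypes.CDLL, name: str) -> Optional[int]:
    """Return the address that `name` resolves to through `lib`, or None."""
    try:
        return ctypes.cast(getattr(lib, name), ctypes.c_void_p).value
    except AttributeError:
        return None  # symbol not found through this handle


def compare_free_lookup() -> Tuple[Optional[int], Optional[int]]:
    """Return (address of free in the global scope, address of free in libc)."""
    # dlopen(NULL): the main program's scope, where preloaded libraries appear.
    global_scope = ctypes.CDLL(None)
    libc_name = ctypes.util.find_library("c")
    try:
        libc = ctypes.CDLL(libc_name) if libc_name else global_scope
    except OSError:
        libc = global_scope  # fall back if libc cannot be opened by name
    return symbol_address(global_scope, "free"), symbol_address(libc, "free")
```

Run in a plain environment, the two addresses would normally be the same. Run with an allocator in LD_PRELOAD, they would be expected to differ, which is the situation in which memory handed out by one allocator can end up being released by the other.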

@kpamnany
Contributor Author

I have an rr trace:

(rr) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140065906189760)
    at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140065906189760) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140065906189760, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007f63a2cef476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007f63a2cd57f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007f63a2d366f6 in __libc_message (action=action@entry=do_abort,
    fmt=fmt@entry=0x7f63a2e88b8c "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#6  0x00007f63a2d4dd7c in malloc_printerr (
    str=str@entry=0x7f63a2e8b230 "munmap_chunk(): invalid pointer") at ./malloc/malloc.c:5664
#7  0x00007f63a2d4e05c in munmap_chunk (p=<optimized out>) at ./malloc/malloc.c:3060
#8  0x00007f63a2d5251a in __GI___libc_free (mem=<optimized out>) at ./malloc/malloc.c:3381
#9  0x00007f638a3a1cdc in close_unit_1 (u=0x5561d496c5d0, locked=locked@entry=1)
    at /workspace/srcdir/gcc-11.1.0/libgfortran/io/unit.c:742
#10 0x00007f638a3a1e7a in _gfortrani_close_units ()
    at /workspace/srcdir/gcc-11.1.0/libgfortran/io/unit.c:800
#11 0x00007f63a31be24e in _dl_fini () at ./elf/dl-fini.c:142
#12 0x00007f63a2cf2495 in __run_exit_handlers (status=0, listp=0x7f63a2ec6838 <__exit_funcs>,
    run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at ./stdlib/exit.c:113
#13 0x00007f63a2cf2610 in __GI_exit (status=<optimized out>) at ./stdlib/exit.c:143
#14 0x00005561d37ce094 in main (argc=<optimized out>, argv=<optimized out>)
    at /home/kpamnany/julia.master/cli/loader_exe.c:62

Not sure if it's helpful though.

@DilumAluthge
Member

Can you upload the rr trace files somewhere and share them?

If this was a private code base, or if the rr trace might contain non-public information (e.g. sensitive environment variables), we can have you privately send the rr trace files to a small group of people.

@kpamnany
Contributor Author

I'll send it over @DilumAluthge, thanks!

@jakebolewski
Member

jakebolewski commented Aug 10, 2022

We also tried disabling libblastrampoline's use of deepbind by setting LBT_USE_RTLD_DEEPBIND=0, as suggested here, but that was no dice: https://github.com/JuliaLinearAlgebra/libblastrampoline/blob/2820b41baf6587dedc5258e82ae73e7c05e56668/src/deepbindless.c#L4-L7
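An experiment like this can be scripted so that each environment setting is tried the same way and the abort is detected from the exit status. The following is a small sketch, assuming a POSIX system where a child killed by a signal reports a negative return code; `run_with_env` and `died_from_sigabrt` are hypothetical helper names, and the Julia command in the comment is an assumed way to trigger the exit path, not something confirmed in this thread.

```python
import os
import signal
import subprocess
from typing import Dict, List


def died_from_sigabrt(returncode: int) -> bool:
    """True if a child process was terminated by SIGABRT (signal 6)."""
    return returncode == -signal.SIGABRT


def run_with_env(cmd: List[str], extra_env: Dict[str, str]) -> subprocess.CompletedProcess:
    """Run `cmd` with the current environment plus the given overrides."""
    env = dict(os.environ)
    env.update(extra_env)
    return subprocess.run(cmd, env=env, capture_output=True, text=True, check=False)


# Hypothetical usage (assumes `julia` is on PATH and LD_PRELOAD is already set
# in the calling environment):
#
#   result = run_with_env(["julia", "-e", "exit()"], {"LBT_USE_RTLD_DEEPBIND": "0"})
#   print("aborted" if died_from_sigabrt(result.returncode) else "clean exit")
```

Capturing stderr also makes it easy to confirm that the failure is the same `munmap_chunk(): invalid pointer` message rather than a different crash.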

@kpamnany changed the title from "Preloading problem with glibc ≥ 2.34" to "Preloading problem with jemalloc on glibc ≥ 2.34" on Aug 10, 2022
@gbaraldi
Member

This isn't specific to jemalloc, if you were wondering. I can reproduce it with mimalloc as well.

@kpamnany
Contributor Author

Yes, we suspected that this could happen with any preloaded library, which is part of the reason I opened the issue -- to let other folks know about it.

At this point we're not very hopeful of finding a solution so I updated the detail in the title for posterity.

@gbaraldi
Member

While working through this, I couldn't get it to work anywhere, even on glibc 2.31 and earlier. What was the setup where it was working?

@gbaraldi
Member

@kpamnany a workaround that might work is to also preload libgfortran.so. That makes it work for me, i.e.:
export LD_PRELOAD="/usr/local/lib/libmimalloc.so <path/to/julia>/usr/lib/libgfortran.so"
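When this kind of setting is built by a launcher script rather than typed by hand, the order of entries matters because earlier libraries are searched first. Below is a minimal sketch of composing the variable, assuming the dynamic loader's documented behavior of accepting either spaces or colons as separators; `with_preload` is a hypothetical helper, and the paths passed to it are up to the caller.

```python
from typing import Dict, Iterable, Mapping


def with_preload(base_env: Mapping[str, str], libraries: Iterable[str]) -> Dict[str, str]:
    """Return a copy of `base_env` with `libraries` placed first in LD_PRELOAD.

    Entries already present in LD_PRELOAD are kept after the new ones,
    without duplicates. The result uses ':' as the separator.
    """
    env = dict(base_env)
    entries = list(libraries)
    # Existing values may be separated by spaces or colons.
    existing = env.get("LD_PRELOAD", "").replace(":", " ").split()
    for lib in existing:
        if lib not in entries:
            entries.append(lib)
    env["LD_PRELOAD"] = ":".join(entries)
    return env
```

The resulting dictionary can be passed as `env=` to `subprocess.run` when starting Julia, with the allocator listed before libgfortran to mirror the order in the command above.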

@NHDaly
Member

NHDaly commented Mar 7, 2024

(sorry for the delay; we tried that and it didn't help either, fyi)
