On Linux this shims the malloc
family of functions as well as most of the pthread_
functions for actual thread handling and redirects them to their equivalents from bdwgc.
On other platforms this does nothing.
Looking at bdwgc's README.linux
, we can find the following sentence:
Every file that makes thread calls should define GC_THREADS, and then include gc.h. The latter redefines some of the pthread primitives as macros which also provide the collector with information it requires.
On Darwin and Windows bdwgc can and does utilize OS APIs to the necessary tracking, but Linux does not provide these.
But what do you do if a shared library spawns threads? The Crystal compiler does not make any effort to redirect those calls to bdwgc. This can lead bdwgc to clobbering the memory allocated there, since it allocates memory using brk(2)
/sbrk(2)
rather than the malloc interface and therefore needs to made aware of memory allocated otherwise. README.linux
continues with:
A new alternative to (3a) is to build the collector and compile GC clients with -DGC_USE_LD_WRAP, and to link the final program with [...]
Which isn't very practical, especially in the Crystal ecosystem. This shard presents an alternative.
Make sure you're using Crystal 0.35 or later. Then:
-
Add the dependency to your
shard.yml
:dependencies: malloc_pthread_shim: github: jhass/crystal-malloc_pthread_shim version: 0.1.0
-
Run
shards install
. -
Add
require "malloc_pthread_shim"
to your project. -
Ensure to link your program against glibc.
If you're unsure on how to do that, you're most likely already doing it. In doubt, just try to ignore this step.
That's it!
Luckily for us the linker looks up symbols starting in the main binary, so any function we define there wins over the one with the same name in any shared libraries. This allows us to override these functions and redirect them to bdwgc. When bdwgc then continues to call the originals, we use dlopen(3)
to call the unwrapped functions. However for malloc
and a few related functions we cannot do this, as dlopen
itself calls them and ends up at our shim function again, resulting in infinite recursion. Here the dependency on glibc comes into play, it provides alternative symbols for those essential functions that we can use to call the originals.
Currently this only handles glibc on Linux. Darwin and Windows seem to be immune to this issue. If you encounter this issue on any other platform or libc, collecting solutions here is welcome!