Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: c-shared builds fail with musllibc #13492

Open
mdempsky opened this issue Dec 5, 2015 · 51 comments · May be fixed by #69325
Open

runtime: c-shared builds fail with musllibc #13492

mdempsky opened this issue Dec 5, 2015 · 51 comments · May be fixed by #69325
Labels
compiler/runtime Issues related to the Go compiler and/or runtime.
Milestone

Comments

@mdempsky
Copy link
Contributor

mdempsky commented Dec 5, 2015

Currently, some of the init_array functions provided by a c-shared build expect to be called with (argc, argv, envp) arguments as is done by glibc, but this isn't specified by the ELF gABI for DT_INIT_ARRAY (http://www.sco.com/developers/gabi/latest/ch5.dynamic.html#init_fini), and isn't done with other libc implementations like musllibc.

CC @ianlancetaylor @mwhudson

@jamesr
Copy link

jamesr commented Dec 5, 2015

musl does expose getauxval() for querying the auxiliary vector but does not provide any way to get the main program's argc/argv. The musl maintainers argue that this isn't something a shared library should have access to anyway.

@mdempsky
Copy link
Contributor Author

mdempsky commented Dec 5, 2015

Somewhat relatedly, C99 says [section 5.1.2.2.1]:

The parameters argc and argv and the strings pointed to by the argv array shall be modifiable by the program, and retain their last-stored values between program startup and program termination.

So in the shared library cases where a C main function runs, the Go runtime should conservatively assume the C program may legitimately mutate the argv strings. In particular, it can't use gostringnocopy on them like it currently does in goargs.

@mdempsky
Copy link
Contributor Author

mdempsky commented Dec 5, 2015

So I think in the cases where we need to play nicely with arbitrary C code:

  • syscall's {Clear,Get,Put,Set}env and Environ functions should all treat C as the source of truth for the environment, rather than keeping a local copy of the environment to manipulate and just trying to copy mutations to C. (E.g., even today, if cgo code calls setenv, it won't be visible to os.Getenv; whereas os.Setenv is visible to getenv.)
  • package runtime should use getauxval for accessing the aux value array, to avoid needing to locate the array in memory.

That would just leave needing to figure out a solution for os.Args. It would kinda suck, but maybe we could just leave it nil in the case where Go isn't acting as the program's entry point?

@ianlancetaylor
Copy link
Member

If we can't get os.Args, then we have to leave it nil. But I think we should only do that for environments where we can't get it. When we can get it, as for glibc, we should.

@ianlancetaylor ianlancetaylor added this to the Unplanned milestone Dec 5, 2015
@mdempsky
Copy link
Contributor Author

mdempsky commented Dec 5, 2015

I just looked at glibc, Bionic, uClibc, musl, and dietlibc, as well as the dynamic linkers for FreeBSD, NetBSD, and OpenBSD. It looks like only glibc passes (argc, argv, envp) to the DSO init functions.

If we want to detect glibc at build time, it seems like we can include <limits.h> and test for defined(__GLIBC__) && !defined(__UCLIBC__). (The User-Agent saga continues as uClibc claims to be glibc.)

To detect at runtime, we could use a weak reference to a glibc-only symbol like gnu_get_libc_version and see if it resolves. I'm worried about making sure we pick a symbol that won't later be implemented by other C libraries though. (E.g., I found a thread where someone suggested adding a gnu_get_libc_version function to musl to make it compatible with Nvidia's binary drivers.)

Any other suggestions/ideas for detecting glibc?

@mdempsky mdempsky self-assigned this Dec 7, 2015
@mdempsky
Copy link
Contributor Author

mdempsky commented Dec 7, 2015

For what it's worth, I tracked down that glibc started passing (argc, argv, envp) to the DSO init functions in 1996: https://sourceware.org/git/?p=glibc.git;a=blobdiff;f=elf/dl-open.c;h=76f6329762308de4ba1620c50ff32d2c02359766;hp=40b52247253cf045498761342afd09ba3c7e1187;hb=dcf0671d905200c449f92ead6cf43c184637a0d5;hpb=4884d0f03c5a3b3d2459655e76fa2d0684d389dc

So at least we don't need to worry about glibc version detection.

benoit-canet pushed a commit to benoit-canet/osv that referenced this issue Nov 15, 2016
As explained nicely in golang/go#13492, since 1996
glibc's dlopen() passes argv and a bunch of other
stuff to the initialization functions of shared
objects, while we don't pass any argument
(see object::run_init_funcs()). Golang code
(ref cloudius-systems#522), in particular, assumes it gets argv,
auxv, etc., this way.

Fixes: cloudius-systems#795

Signed-off-by: Benoît Canet <benoit@scylladb.com>
@inolen
Copy link

inolen commented Dec 29, 2016

I just ran into this issue while trying to help my partner debug a crash when using a golang plugin they'd written for fluentbit (https://github.com/fluent/fluent-bit).

When running on Alpine Linux which uses musl, I get a segfault inside of runtime.sysargs due to a bad argv pointer. The cause of this is as mentioned above, when the plugin is dlopened, musl does not pass any arguments to the members of DT_INIT_ARRAY. The platform-specific init function (_rt0_amd64_linux_lib in my case) assumes it's being passed a valid argc / argv, and eventually segfaults as they are not.

After finding this issue, it seems that the way forward is to:

  • update each platform-specific init file to contain an empty argument vector, and update _rt0_*_lib_argv to point at this empty vector
  • update each platform-specific init function to only overwrite the default argv pointer with the incoming arguments if glibc is detected

Can anyone comment if testing for glibc would still be desired or not?

@rrozestw
Copy link

The same problem appears when using c-archive build mode with musl.

@ianlancetaylor
Copy link
Member

If there is a reliable way to test for glibc, then I think it would be perfectly reasonable to do that. @mdempsky 's comment above suggests a way.

@mdempsky mdempsky removed their assignment Jan 9, 2017
@mdempsky
Copy link
Contributor Author

mdempsky commented Jan 9, 2017

Unassigning because I'm not planning to work on this, but still happy to review if anyone has suggestions.

@inolen
Copy link

inolen commented Jan 9, 2017

@mdempsky I started work on the x64 / x86 / arm / arm64 versions the other night, but it was becoming quite time consuming to test. I have docker images running qemu, which are bootstrapped with a cross-compiled toolchain from my host using bootstrap.sh, but then recompile the toolchain locally inside of qemu for CGO support. Is there a better / faster way to test across the various targets?

Also, do you have any recommendations for getting access to the above pre-processor defines from the platform-specific assembly files? Is it possible, or would they need to call into some C function to do the work, e.g.:

void override_args(int argc, char *argv, int *argc_out, char *argv_out) {
#if GLIBC
  *argc_out = argc;
  *argv_out = argv;
#endif
}

@mdempsky
Copy link
Contributor Author

mdempsky commented Jan 9, 2017

@inolen I think calling out to a C function (or Go function using cgo) in runtime/cgo is probably simplest/cleanest. Even if POSIX guaranteed there was a system header that could be safely #include'd into non-C files, cmd/asm doesn't support the full C preprocessor language.

We could potentially have cmd/dist detect glibc and generate cmd/asm-compatible .h files, but then we need to re-run make.bash depending on target libc, which seems unfortunate.

Lastly, sorry, I don't have any good solution to efficiently testing either. That's a contributing factor to why I haven't gotten around to it yet. :/

benoit-canet pushed a commit to benoit-canet/osv that referenced this issue Jan 13, 2017
As explained nicely in golang/go#13492, since 1996
glibc's dlopen() passes argv and a bunch of other
stuff to the initialization functions of shared
objects, while we don't pass any argument
(see object::run_init_funcs()). Golang code
(ref cloudius-systems#522), in particular, assumes it gets argv,
auxv, etc., this way.

Fixes: cloudius-systems#795

Signed-off-by: Benoît Canet <benoit@scylladb.com>
benoit-canet pushed a commit to benoit-canet/osv that referenced this issue Feb 8, 2017
As explained nicely in golang/go#13492, since 1996
glibc's dlopen() passes argv and a bunch of other
stuff to the initialization functions of shared
objects, while we don't pass any argument
(see object::run_init_funcs()). Golang code
(ref cloudius-systems#522), in particular, assumes it gets argv,
auxv, etc., this way.

Fixes: cloudius-systems#795

Signed-off-by: Benoît Canet <benoit@scylladb.com>
benoit-canet pushed a commit to benoit-canet/osv that referenced this issue Feb 8, 2017
As explained nicely in golang/go#13492, since 1996
glibc's dlopen() passes argv and a bunch of other
stuff to the initialization functions of shared
objects, while we don't pass any argument
(see object::run_init_funcs()). Golang code
(ref cloudius-systems#522), in particular, assumes it gets argv,
auxv, etc., this way.

Fixes: cloudius-systems#795

Signed-off-by: Benoît Canet <benoit@scylladb.com>
benoit-canet pushed a commit to benoit-canet/osv that referenced this issue Feb 9, 2017
As explained nicely in golang/go#13492, since 1996
glibc's dlopen() passes argv and a bunch of other
stuff to the initialization functions of shared
objects, while we don't pass any argument
(see object::run_init_funcs()). Golang code
(ref cloudius-systems#522), in particular, assumes it gets argv,
auxv, etc., this way.

Fixes: cloudius-systems#795

Signed-off-by: Benoît Canet <benoit@scylladb.com>
@gopherbot
Copy link
Contributor

CL https://golang.org/cl/37868 mentions this issue.

@inolen
Copy link

inolen commented Mar 9, 2017

@mdempsky pushed review to https://go-review.googlesource.com/c/37868/

Initially, I tried the approach I mentioned above, but setting up the default arguments / calling into the cgo function in each platform-specific assembler function became a lot of code. After digging around the runtime code more, I found the islibrary and isarchive bools which let me fix this outside of each platform's library init routines.

I'm not sure if the code allocating the default arguments is correct. I read through https://github.com/golang/go/blob/master/src/runtime/HACKING.md and I think using persitentalloc is sane for this case, but perhaps I need to use sysAlloc and setup the appropriate terminate functions to free it back up.

No new tests were added, as the old testcarchive / testcshared tests failed when using musl. However I did setup a few scripts which ran the golang library tests for various os / arch combinations through docker / binfmt_misc / qemu here:
https://gist.github.com/inolen/499da4e40a866b3f8fa5be3635d78721

@richfelker
Copy link

While the current situation is not ideal

The current situation is that it crashes dereferencing a pointer that came from interpreting garbage on the stack or in a register as a pointer, so anything that doesn't crash from that seems like an improvement.

If there are libraries written in Go that are trying to interpret the main program's initial arguments, or other random data left there after the main program overwrote that storage, this is surely a bug that needs to be identified and fixed. It's very intentional that musl does not provide these arguments to ctors because (1) it's nonstandard functionality with no means to detect its presence, meaning the only way you can use it is by writing nonportable code that commits UB when the functionality isn't available, and (2) it's functionality whose only purpose is to write library-unsafe library code that peeks (or worse, pokes) at data that doesn't belong to it.

@ianlancetaylor
Copy link
Member

From my perspective, the current situation is that these libraries work as expected when using glibc, which is the majority of Linux systems. I respect that musl has adopted a different approach, and I certainly think we should support that if we can figure out how. But while I don't know of any good answer, I don't think the approach of breaking existing code that works when using glibc is the best one.

@donob4n
Copy link

donob4n commented Sep 30, 2022

If the problem is breaking working code for glibc it could only skip 'goargs()' when no-glibc lib is detected (or at least musl).

This way the affected apps will only fail when someone tries to build them with a non-glibc and hopefully he will found the problem and report to upstrream.

Since it needs some hooks on sysargs() and other funcs, it could also write some debug warning like "Trying to read args but built as c-xxxx".

@ianlancetaylor
Copy link
Member

Is there a reasonable way for a Go archive to know whether it is being run on a glibc or a musl system?

@donob4n
Copy link

donob4n commented Oct 2, 2022

Do you mean in runtime? There is a GLIBC macro that can be used when building.

@ianlancetaylor
Copy link
Member

The runtime is not written in C. Building a pure Go program must not require a C compiler.

@richfelker
Copy link

I don't understand the question. If you're building a Go program which is pure Go interfacing with the underlying target's syscall layer only, with no C libs, and providing its own entry point, then of course you can do whatever you want.

My understanding is that we're talking about build modes where code is being built to be linked with C library code, or as a library that's loadable into C programs. (Here by "C" I'm being a little bit sloppy, but the meaning is programs built on the underlying host C implementation and that need to be compatible with it.)

In that case, the only place the arguments are available is via a local pointer object belonging to main, and only the current environment (via extern char **environ or getenv etc.), not the initial environment with aux vector attached, is available to inspect. So Go code running in a context where it was not in a position to provide main and save the args, and where no contract was made with the calling code to pass along args, simply does not have access to the args. If glibc and some other environments provide a way to get this (but note: they don't actually document what they provide, and since these objects "belong to" main, it's possible that a glibc-linked program will already have clobbered them by the time the go library code tries to inspect them) then I guess you can access that conditional on knowing you're on an implementation that provides that. musl specifically does not do this.

The question is not whether you're running on a musl-based host but whether you're linked to a host libc environment that provides these nonstandard interfaces.

@ianlancetaylor
Copy link
Member

@richfelker You probably know this, but just for clarity for all. We are talking about -buildmode=c-shared. The resulting shared library will be linked with a C program. But the shared library itself may be pure Go code. (Sorry for saying "a pure Go program" above; that was a misleading way to describe the scenario.)

I agree that what really matters is some way for the shared library to determine whether the dynamic linker is passing the argc/argv values to the DT_INIT_ARRAY constructor functions.

It is not my place to suggest musl changes, but given the existing glibc behavior, behavior that some projects other than Go also expect, would it be completely unreasonable for musl to clear the first three arguments passed to DT_INIT_ARRAY functions, so that those functions would at least see zero values rather than garbage?

@richfelker
Copy link

would it be completely unreasonable for musl to clear the first three arguments passed to DT_INIT_ARRAY functions, so that those functions would at least see zero values rather than garbage?

Yes. Specifically, it would invoke UB. The specification for the ctor functions in the init array is that they take no arguments. If we called them with arguments, that's a call where the signature of the callee mismatches the function type used to call it, and the behavior is undefined. This actually matters if you want to someday support fancy ABIs that can do runtime CFI-like checks. Establishing a norm that we support this usage, even just by passing zeroed args, precludes the possibility to do anything like that.

@jgowdy
Copy link

jgowdy commented Oct 11, 2022

Given the popularity and benefits of Alpine Linux in containerized applications, the dependency on non-standard / glibc-specific behaviors makes advocating for the use of Golang difficult. If we're to describe these issues in the terms of "first class ports" versus non-first class ports, it might make more sense to rename the ports linux/* to something like linux/glibc/* to indicate that it's only glibc based Linux distros that are first class supported.

@ianlancetaylor
Copy link
Member

The ports page https://go.dev/wiki/PortingPolicy does already say that only glibc ports are considered first class.

@ansiwen
Copy link
Contributor

ansiwen commented Oct 12, 2022

I think @jgowdy's subtext was: please reconsider that musl is not first class. It's kind of ironic that the main drivers for container deployments and the popularity of Go (docker, kubernetes, ...) are written in Go, but the the majority(?) of their payloads run on a "non-first-class" operating system (alpine linux).

@ianlancetaylor
Copy link
Member

Fair enough, sorry for the misunderstanding. This issue is not the place for that discussion, though.

@NicholasLYang
Copy link

Running into this issue as well when trying to build a Go library that's statically linked. It'd be nice if musl could be considered a first-class platform because Alpine Linux is an extremely common OS for cloud purposes.

tknickman added a commit to vercel/turborepo that referenced this issue Jan 6, 2023
Because of golang/go#13492, we cannot use Go as a shared library that we
link to Rust. Instead we are now building the Go code as a separate
binary that is executed by the Rust one.

Co-authored-by: Greg Soltis <greg.soltis@vercel.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Co-authored-by: Greg Soltis <greg@goldfiglabs.com>
Co-authored-by: Thomas Knickman <tom.knickman@vercel.com>
@rolandshoemaker
Copy link
Member

Randomly came across this. In 2015 @mdempsky suggested attempting to resolve gnu_get_libc_version (#13492 (comment)), but with the concern that musl might implement this, breaking this detection method. In the intervening 9 years, it appears as if musl has not done this (as far as I can tell). Should we just to use this method?

adambenhassen added a commit to adambenhassen/go that referenced this issue Sep 6, 2024
…rchive

This change fixes a segmentation fault that occurs on musl-based systems
when using c-archive (or c-shared with LD_PRELOAD). The issue
was caused by uninitialized argv and envp, which led to crashes at runtime.

This fix ensures that both argv and envp are correctly initialized,
preventing the segmentation fault.

While this fix addresses crashes due to missing argv and envp, it does
not address the error described in issue golang#54805, where a c-shared library
loaded with dlopen triggers an error related initial-exec access to
TLS objects [1] in a dynamically loaded library.

[1]: http://git.musl-libc.org/cgit/musl/commit/?id=5c2f46a214fceeee3c3e41700c51415e0a4f1acd

Fixes golang#13492
adambenhassen added a commit to adambenhassen/go that referenced this issue Sep 6, 2024
This change fixes a segmentation fault that occurs on musl-based systems
when using c-archive (or c-shared with LD_PRELOAD). The issue
was caused by uninitialized argv and envp, which led to crashes at runtime.

This fix ensures that both argv and envp are correctly initialized,
preventing the segmentation fault.

While this fix addresses crashes due to missing argv and envp, it does
not address the error described in issue golang#54805, where a c-shared library
loaded with dlopen triggers an error related initial-exec access to
TLS objects [1] in a dynamically loaded library.

[1]: http://git.musl-libc.org/cgit/musl/commit/?id=5c2f46a214fceeee3c3e41700c51415e0a4f1acd

Fixes golang#13492
@gopherbot
Copy link
Contributor

Change https://go.dev/cl/610837 mentions this issue: runtime: fix segfault due to missing argv on musl-linux c-archive

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime.
Projects
None yet
Development

Successfully merging a pull request may close this issue.