cmd/link: segfault with statically linked binaries on linux #13470
Comments
I can't reproduce this. When using glibc, statically linking calls to getpwuid only works if the system has the exact shared libraries available when the program is run as were used when the program was built. If you build your program with If there is any discrepancy there--if, for example, you are building on one system and running on a different one--that could be the cause of your problem. |
I've updated the description some - the suggested reproduction is now a Interestingly, neither of the following invocations produce an error on this machine:
EDIT: ah, the FWIW, the external linker being used:
|
@tamird what's the output with |
Output with
|
verified on an I'll try a source install next, just for kicks. |
Do you have LD_LIBRARY_PATH or LD_PRELOAD set in the environment? |
No.
|
ditto for the ec2 instance. Source install behaves the same, fwiw. |
I'm out of ideas. What version of glibc are you using? What does |
Output below. I can give you root on the AWS box if that helps? root@ip-172-31-48-75:/go# /lib/x86_64-linux-gnu/libc.so.6 On Thu, Dec 3, 2015 at 6:31 PM Ian Lance Taylor notifications@github.com
|
I can reproduce this (on Ubuntu wily, which I guess you are too, judging by glibc version). I poked with gdb a bit. It's crashing here https://sourceware.org/git/?p=glibc.git;a=blob;f=nis/nss_compat/compat-pwd.c;h=e3e3dbb308c2cca45fa26a2631dd6deaf9ee3efd;hb=4e42b5b8f89f0e288e68be7ad70f9525aebc2cff#l555 because the second time it's called I think __ctype_b_loc (called from inside the expansion of isspace) returns the wrong value. The second time it's called is from a different thread and the value of $fs is different and maybe that's relevant? I don't know what's changed in glibc that might have caused this. |
The AWS box was indeed Ubuntu Wily. The Docker image uses this Dockerfile (run using |
and, FWIW, running this a bunch of times it seems like there's always at least a call to |
The offending instruction is
(of course,
So to my untrained eye it looks like
And here's what happens inside (at this point, I think I've caught up with @mwhudson), note the
From the looks of it |
Yes, I think you've followed the same threads as me :-) %fs is definitely related to thread local storage, but I think you've got the cases flipped around: it seems to me it crashes when $fs is 0 and works when it is 99. The thing is (AIUI), when cgo is involved, the C library is responsible for setting TLS up, so I don't know what's going on. It might even be a glibc bug I guess. |
Can you send me a static binary build on Ubuntu Wily? |
@mwhudson you're right. After the fatal thread switch, it's zero.
|
I can't recreate it on my system but I'm fairly certain it's a glibc bug. The ctype code relies on TLS variables initialized by a call to __ctype_init. When you call getpwuid_r in a statically linked program, then, depending on the contents of /etc/nsswitch.conf, in some cases the program will dlopen a supporting shared library. Since the main executable is statically linked and has no dynamic symbol table, the supporting shared library can not refer to the same TLS variables. It has its own TLS variables, and when the library is loaded it will call __ctype_init to initialize them. However, as far as I can tell there is no code to call __ctype_init on any existing threads. If you then call into the shared library on an existing thread, then any references from the shared library to the TLS ctype variables will crash. Please try compiling this C program with -static and see what happens. I expect it to crash.
|
dice (@tamird you're missing the
|
Oops, yeah, missed the |
Thanks, I will open a glibc bug. Can you append the contents of /etc/nsswitch.conf on your system? |
Thanks for taking this upstream. I did a cursory check of the glibc bug tracker and couldn't find an issue for this (I checked mostly for various combinations of
|
Filed as https://sourceware.org/bugzilla/show_bug.cgi?id=19341. I can see one way to fix this in the Go code: arrange for all calls to getpwuid_r to go through a single goroutine, and have that goroutine call runtime.LockOSThread. That should ensure that that goroutine will always see the correct TLS values. However, I can't see a compelling reason to penalize all Go programs that use os/user in order to work around a glibc bug that only occurs when linking with -static. Since you want to use -static, I'm going to have to recommend that you use that workaround yourself: make all your calls to os/user.Lookup from a single goroutine that calls runtime.LockOSThread. You can drop that workaround when you get a fixed version of glibc. |
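The workaround described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the names `lookupRequest`, `lookupResult`, `startLookupWorker`, and `lookupUser` are hypothetical and do not come from the issue or from os/user. The idea is that one goroutine pins itself to an OS thread with runtime.LockOSThread and performs every user.Lookup there, so the glibc lookups always run on that one thread.

```go
package main

import (
	"fmt"
	"os/user"
	"runtime"
)

// lookupResult and lookupRequest are hypothetical helper types for this sketch.
type lookupResult struct {
	u   *user.User
	err error
}

type lookupRequest struct {
	name  string
	reply chan lookupResult
}

// requests feeds the single goroutine that is allowed to call os/user.
var requests = make(chan lookupRequest)

// startLookupWorker starts the one goroutine that performs all lookups.
// It locks itself to its OS thread and never unlocks, so every call into
// os/user from this program happens on the same thread.
func startLookupWorker() {
	go func() {
		runtime.LockOSThread()
		for req := range requests {
			u, err := user.Lookup(req.name)
			req.reply <- lookupResult{u: u, err: err}
		}
	}()
}

// lookupUser can be called from any goroutine; the actual lookup is
// forwarded to the locked worker goroutine.
func lookupUser(name string) (*user.User, error) {
	reply := make(chan lookupResult, 1)
	requests <- lookupRequest{name: name, reply: reply}
	res := <-reply
	return res.u, res.err
}

func main() {
	startLookupWorker()
	u, err := lookupUser("root")
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Println("uid:", u.Uid, "home:", u.HomeDir)
}
```

Callers elsewhere in the program would use lookupUser instead of calling user.Lookup directly. Whether this fully avoids the crash depends on the glibc behaviour discussed in this thread, so treat it as a starting point rather than a verified fix.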
names. The use of os/user causes trouble with statically linked binaries (see golang/go#13470 and https://sourceware.org/bugzilla/show_bug.cgi?id=19341). The default blessing name generated doesn't really need os/user, so remove it. Change-Id: I7105a269f63c855483c0296ac2919a50dff1e7ac
Here's another way that people deal with this: https://github.com/tamird/cockroach/commit/9c93044ce7d3283e78f5941b8b9bcd836f80a7ef |
CL https://golang.org/cl/34175 mentions this issue. |
…d SEGV Due to an issue in handling thread-local storage, os/user can lead to SEGV when glibc is statically linked. So we prefer os.Getenv("HOME") for guessing where the home directory is. See also: golang/go#13470 Change-Id: I1046ff93a71aa3b11299f7e6cf65ff7b1fb07eb9 Reviewed-on: https://go-review.googlesource.com/34175 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Fix moby#29344 If HOME is not set, the gcplogs logging driver will call os/user.Current() via oauth2/google. However, in a static binary, os/user.Current() leads to a segfault due to a glibc issue that won't be fixed in the short term. (golang/go#13470, https://sourceware.org/bugzilla/show_bug.cgi?id=19341) So we forcibly set HOME so as to avoid calling os/user.Current(). Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
Static linking with gcc may cause segfaults at runtime on some systems (see golang/go#13470)
Because of issues with glibc, using the `os/user` package can cause a segfault when calling `user.Current()`. Neither the Go maintainers nor glibc developers could be bothered to fix it, so we have to work around it by calling the uid and gid functions directly. This is probably better because we don't actually use much of the data provided in the `user.User` struct. This required some refactoring to have better control over when the uid and gid are resolved. Rather than checking the current user on every connection, we now resolve it once at initialization. To test that this provided an improvement in performance, a benchmark was added. Unfortunately, this exposed a regression in the performance of unix sockets in Go when `(*UnixConn).File` is called. The underlying culprit of this performance regression is still at large. The following open issues describe the underlying problem in more detail: golang/go#13470 https://sourceware.org/bugzilla/show_bug.cgi?id=19341 In better news, I now have an entire herd of shaved yaks. Signed-off-by: Stephen J Day <stephen.day@docker.com>
Discovered with @tschottdorf.
Note the "C" import is required, otherwise the go tool does not build a real static binary.
This was discovered in a docker image based on golang:1.5.1, but also tested against go1.5.2 and 606d9a7 (tip at the time of writing), both built from source in the container. The segfault reproduces in all three. The docker image was running in a virtualbox VM.
Output of go env: