ENV is not thread safe with glibc #34726
Comments
Does musl also have the same issue? |
It certainly seems so. In fact there's even less locking: I should point out that independently of any implementation this is a POSIX issue which doesn't require these functions to be threadsafe. See notes at http://man7.org/linux/man-pages/man3/setenv.3.html for example. |
Having said that, I feel it's reasonable for libc implementations to mitigate this, because regardless of the POSIX standard, the way that C libraries use these functions in the wild has proven to be unreliable. But fixing this is not at all easy, even inside glibc. As described here https://sourceware.org/bugzilla/show_bug.cgi?id=15607#c4 it may be necessary for glibc to leak some memory every time Amusingly Windows gets all this right and |
There's been some discussion on musl mailing list back in 2015 about getenv and thread-safety. |
Putting locks around them does not make thread-unsafe functions thread-safe. A fairly large number of libc functions are specified to use the environment, and if you modify the environment, via The right answer is simply that you don't modify the environment; it's effectively immutable input state of the program. To execute another program with changes to the environment, you use one of the The only problem this does not solve is a need to set environment variables that change behavior of library functions within the current process, like |
Our process spawning already operates on copies of the environment, so that's not the issue (though as mentioned we do still have
C libraries (not just the libc) unfortunately often like to use the environment for configuration options that we dynamically need to change. Where we have influence over the C library in question, I think we should make it a policy for these libraries to have an API other than the environment for setting configuration options. For other libraries that we don't control and that are unwilling to take such patches, I'm not sure what to do though. Perhaps the only thing we can do is put a big scary warning in the docs like Rust did. There's also the question what to do with |
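For illustration, here is a minimal sketch of spawning a child process on a copy of the environment, so the parent's environment is never written. It is in Python rather than Julia, and the helper `run_with_env` and the variable name `FOO_EXAMPLE` are made up for this example.

```python
import os
import subprocess
import sys

def run_with_env(overrides, code):
    """Run a child Python snippet with extra variables (hypothetical helper).

    The parent's environment is only read, never modified.
    """
    child_env = dict(os.environ)   # private copy of the parent's environment
    child_env.update(overrides)    # changes are visible to the child only
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=child_env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()

# FOO_EXAMPLE is an assumed, illustrative variable name.
child_value = run_with_env(
    {"FOO_EXAMPLE": "BAR"},
    "import os; print(os.environ['FOO_EXAMPLE'])",
)
```

Since nothing here writes to the process-wide environment, there is nothing for a concurrent `getenv` in another thread to race with.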
Thanks @richfelker for your perspective! You're right; the existence of direct access to
We can somewhat-mitigate this from within julia by using a lock and copying the result of Also thanks for pointing out something which I hadn't considered; that
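A rough sketch of the "lock and copy" mitigation, written in Python for illustration. The lock and the wrapper names (`locked_getenv`, `locked_setenv`) are hypothetical, and the protection only covers callers that go through the wrappers.

```python
import os
import threading

# Hypothetical process-wide lock; every wrapper below must go through it.
_env_lock = threading.Lock()

def locked_getenv(name, default=None):
    """Read a variable while holding the lock and return a private value."""
    with _env_lock:
        # Python strings are immutable, so the returned value is already safe
        # to use after unlocking. In C, the getenv() result would have to be
        # duplicated here, before the lock is released.
        return os.environ.get(name, default)

def locked_setenv(name, value):
    """Write a variable while holding the lock."""
    with _env_lock:
        os.environ[name] = value  # also updates the C-level environment

def worker(index, results):
    key = f"LOCKED_ENV_DEMO_{index}"  # illustrative variable name
    for i in range(100):
        locked_setenv(key, str(i))
    results[index] = locked_getenv(key)

results = {}
threads = [threading.Thread(target=worker, args=(i, results)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

A C library that calls `getenv` directly never takes this lock, which is why this can only be a partial mitigation.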
Exactly. I imagine this is why Julia's If we'd prefer the leaky approach in julia, another nasty idea is to take matters into our own hands on linux and point |
I'm feeling like this is the way to go. In addition, make |
What do people think about compatibility here? I'm happy to do the work to fix this, but I'm fairly sure we can't fix it completely without breaking compatibility, so we'll have to decide which trade-off to take. Here are several possible plans:
Plan 1
This will avoid julia-only crashes but will do nothing for
Plan 2
As in Plan 1, but essentially make
This will fix the data races and preserve normal within-julia use of
Plan 3
Make
This may be conceptually clean but it will break so many scripts and startup files that I think it's Julia 2.0 material. It has the benefit of preventing confusion for those who expect that |
Perhaps I'm being naive, but this doesn't seem so bad to me. We don't use a whole lot of libc functions and we can use even fewer pretty easily. Auditing which glibc calls we use that depend on the environment doesn't seem too bad. I guess the bad/intractable thing is that others may well write Julia code that uses libc directly via ccall or indirectly via calls to C that use libc, none of which would do the appropriate locking. |
Exactly. If it's intended that programs be able to call C library functions, then the problem is not tractable. |
Regarding #34726 (comment), I'm not a julia user, but I would think plan 2 is the best solution. |
Here's an alternative. Since calling setenv is rare and slow anyway, halt all threads but the currently executing one while doing it. I.e. not just holding a lock which other threads have to know about: shut them down entirely and only continue them after the environment is modified. |
Yeah, we can't entirely avoid them: we ship with e.g. BLAS which uses them, so working around this is problematic. It seems like it'd have to be mitigated at the libc level, but only Win32 exposes a thread-safe API. As c42f said above, it seems that libc maintainers could have chosen to mitigate this, though not required by posix to do so, but have apparently generally opted not to. I'd suggest a plan 4: |
Stopping the thread won’t work. You either cannot guarantee that the code will ever finish or you can’t know you didn’t interrupt in the middle of a critical region. |
@StefanKarpinski: Exactly what @yuyichao just said - that can't work unless you have a way to wait for each thread to reach a "quiescent" waypoint where you know it can't be in a critical section. |
I suppose this breaks processes launched in sub-tasks?
withenv("FOO" => "BAR") do
    @sync @async run(`sh -c 'echo $FOO'`)
end
For this to work, I suppose we need something like Context Variables in Python (ref PEP 567)? For running commands, why not just encourage the pure API |
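To make the context-variable idea concrete, here is a small Python sketch using the standard `contextvars` module from PEP 567. The names `scoped_env`, `effective_env` and `FOO_EXAMPLE` are invented for this example; the overlay lives in a context variable and the process environment is never written.

```python
import asyncio
import contextlib
import contextvars
import os

# Per-context overlay of environment overrides (hypothetical design).
# The default dict is never mutated; each change installs a new dict.
_env_overlay = contextvars.ContextVar("env_overlay", default={})

@contextlib.contextmanager
def scoped_env(**overrides):
    """Apply overrides for the current context only, then restore."""
    token = _env_overlay.set({**_env_overlay.get(), **overrides})
    try:
        yield
    finally:
        _env_overlay.reset(token)

def effective_env():
    """Environment to hand to a spawned process: real env plus overlay."""
    return {**os.environ, **_env_overlay.get()}

async def read_foo():
    return effective_env().get("FOO_EXAMPLE")

async def main():
    with scoped_env(FOO_EXAMPLE="BAR"):
        # A task created here copies the current context, overlay included.
        inside = await asyncio.create_task(read_foo())
    outside = await asyncio.create_task(read_foo())
    return inside, outside

inside, outside = asyncio.run(main())
```

The sub-task started inside the block sees the override because tasks inherit a copy of the creating context, while a task started after the block does not.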
I was thinking more about this, especially about what happens when a library calls the C standard
It's interesting that FreeBSD seems to have chosen to leak memory in this case - see BUGS section in https://www.freebsd.org/cgi/man.cgi?query=setenv&sektion=3&manpath=freebsd-release-ports. Though the man pages don't say what happens when using direct access to
Right,
I feel like this is widely used and will break a fair few things if it's removed during Julia 1.x. Plan 2 gives us a way of not breaking most uses of this syntax (at the cost of a confusing decoupling between the Julia |
Yes, |
Yes, I agree that context variables support should be in a separate discussion. I rather wanted to mention that, because it would take some time to get context variables machinery, plan 1 is probably not the best solution.
How about having a deprecation period (a few minor releases) where |
I don't believe that we can just deprecate
We can make (1) threadsafe by maintaining our own shadow |
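As a sketch of what a shadow environment could look like (in Python for illustration; the class `ShadowEnv` and the variable name `SHADOW_ENV_DEMO` are hypothetical): take one snapshot at startup, serve all reads and writes from it under a lock, and pass a copy to spawned processes.

```python
import os
import threading

class ShadowEnv:
    """Lock-protected copy of the environment (hypothetical design).

    Writes never reach the C-level environment, so C libraries calling
    getenv() will not observe them.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._vars = dict(os.environ)  # one-time snapshot at startup

    def get(self, name, default=None):
        with self._lock:
            return self._vars.get(name, default)

    def set(self, name, value):
        with self._lock:
            self._vars[name] = value

    def unset(self, name):
        with self._lock:
            self._vars.pop(name, None)

    def snapshot(self):
        """Copy suitable for passing as the environment of a child process."""
        with self._lock:
            return dict(self._vars)

shadow = ShadowEnv()
shadow.set("SHADOW_ENV_DEMO", "BAR")
```

The trade-off is the decoupling discussed in this thread: code that reads the real environment directly sees the startup values, not the shadow ones.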
I think that pretty well covers it. I'm confused how (3) is actually a major issue though. Generally people don't do this ( |
Right, so in that case, we could just break (3) without breaking every single other use of |
Yes, I think that sounds very reasonable. |
Rust wound up making My extrapolation is that if any libraries read environment for configuration that is expected to be set at runtime, it should be considered a bug in that library. Instead the library needs to manage and lock its own state, read the environment once for defaults, and then provide API to update the state. It sounds like OpenBLAS may fall into this category?
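A minimal sketch of that library-side pattern, in Python for illustration: read the environment once for a default, keep the setting in lock-protected state, and expose an API to change it at runtime. The class `ThreadSettings` and the variable `EXAMPLE_NUM_THREADS` are invented here and do not correspond to any real library.

```python
import os
import threading

class ThreadSettings:
    """Hypothetical library configuration holder."""

    def __init__(self, env_var="EXAMPLE_NUM_THREADS", fallback=1):
        # The environment is consulted exactly once, at construction time.
        raw = os.environ.get(env_var, "")
        self._lock = threading.Lock()
        if raw.isdecimal() and int(raw) > 0:
            self._num_threads = int(raw)
        else:
            self._num_threads = fallback

    def num_threads(self):
        with self._lock:
            return self._num_threads

    def set_num_threads(self, count):
        """Runtime changes go through this call, not through setenv."""
        if count < 1:
            raise ValueError("thread count must be at least 1")
        with self._lock:
            self._num_threads = count

settings = ThreadSettings()
settings.set_num_threads(4)
```

With this shape, a host program never needs to modify the environment after startup to reconfigure the library.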
glibc also recently merged bminor/glibc@7a61e7f, which improves the situation for existing code but doesn't change that the "correct" thing is to just never write the environment. |
For some reason I was reading our ENV code, and I noticed we don't have a lock around our calls to getenv and setenv on linux, or for iterating the environment via the environ variable.

This makes ENV unsafe for use with multithreaded julia, as getenv and setenv are not mutually threadsafe in glibc. (See https://github.com/bminor/glibc/blob/master/stdlib/getenv.c and https://github.com/bminor/glibc/blob/master/stdlib/setenv.c. Unsurprisingly getenv is safe by itself, but mysteriously setenv is protected by a lock which getenv ignores! So you can mutually setenv safely, but nobody can presume to getenv elsewhere in the same multithreaded code.)

We can easily add some more locks on our side as mitigation, but unfortunately we can't really fix setenv without fixing glibc; a random C library we link against may decide to call getenv at any point. This definitely happens in practice and I've experienced it personally: OpenMathLib/OpenBLAS#716 (comment). There's a nice discussion on the rust issue tracker including this gem: rust-lang/rust#24741 (comment).

Some options:
1. Add locking around our getenv/setenv/environ access, and just hope that no C library we link against calls these functions itself. Add a big warning to the docs. This is the current rust approach though it's fragile.
2. Avoid setenv entirely; create a shadow environment which is a copy of the system environment. This is the C# approach but with Julia's tradition of calling into C libraries I doubt that will work out (it creates surprises even in C#; see https://yizhang82.dev/set-environment-variable).

For now the only easy / possible thing to do is clearly option (1): add some locking and a big warning. If glibc could be fixed it could morph into a long term plan. More recent bugs suggest the glibc maintainers may possibly accept a patch.

As a side note, I feel we should consider removing withenv in the future because the shape of that API misrepresents reality. On the surface withenv appears to offer dynamic scoping, but ENV is global state. So withenv can never work reliably for concurrent code. That is, unless we took option (2) and avoided the C environment completely.
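The withenv concern can be shown with a small Python sketch (Python used for illustration; this `withenv` is a naive stand-in written for the example, not Julia's implementation, and `WITHENV_DEMO` is a made-up variable name). Two events force the interleaving so the outcome does not depend on timing.

```python
import contextlib
import os
import threading

@contextlib.contextmanager
def withenv(name, value):
    """Naive scoped setter: writes process-wide state, restores it on exit."""
    old = os.environ.get(name)
    os.environ[name] = value
    try:
        yield
    finally:
        if old is None:
            os.environ.pop(name, None)
        else:
            os.environ[name] = old

entered = threading.Event()        # set once the scoped block is active
observed_done = threading.Event()  # set once the other thread has looked
seen_by_other = []

def scoped_worker():
    with withenv("WITHENV_DEMO", "scoped"):
        entered.set()
        observed_done.wait(timeout=5)

def unrelated_worker():
    # This thread never asked for the variable, yet it observes it.
    entered.wait(timeout=5)
    seen_by_other.append(os.environ.get("WITHENV_DEMO"))
    observed_done.set()

threads = [
    threading.Thread(target=scoped_worker),
    threading.Thread(target=unrelated_worker),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The unrelated thread sees the "scoped" value because the block only looks lexically scoped; the state it changes is shared by the whole process.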