ENV is not thread safe with glibc #34726

c42f · 2020-02-11T11:56:38Z

For some reason I was reading our ENV code, and I noticed we don't have a lock around our calls to getenv and setenv on linux, or for iterating the environment via the environ variable.

This makes ENV unsafe for use with multithreaded julia, as getenv and setenv are not mutually threadsafe in glibc. (See https://github.com/bminor/glibc/blob/master/stdlib/getenv.c and https://github.com/bminor/glibc/blob/master/stdlib/setenv.c. Unsurprisingly getenv is safe by itself, but mysteriously setenv is protected by a lock which getenv ignores! So you can mutually setenv safely, but nobody can presume to getenv elsewhere in the same multithreaded code.)

We can easily add some more locks on our side as mitigation, but unfortunately we can't really fix setenv without fixing glibc; a random C library we link against may decide to call getenv at any point. This definitely happens in practice and I've experienced it personally: OpenMathLib/OpenBLAS#716 (comment). There's a nice discussion on the rust issue tracker including this gem: rust-lang/rust#24741 (comment).

Some options:

Mitigation: Put our own lock around getenv/setenv/environ access, and just hope that no C library we link against calls these functions itself. Add a big warning to the docs. This is the current rust approach though it's fragile.
Avoidance: Ban using libc setenv entirely; create a shadow environment which is a copy of the system environment. This is the C# approach but with Julia's tradition of calling into C libraries I doubt that will work out (it creates surprises even in C#; see https://yizhang82.dev/set-environment-variable)
Fix glibc??: Every portable language runtime has this exact problem, so people have tried fixing this in the past. In older times they were met with surprising hostility. The seemingly-current bug is marked suspended https://sourceware.org/bugzilla/show_bug.cgi?id=15607#c2 whatever that means.

For now the only easy / possible thing to do is to clearly option (1): add some locking and a big warning. If glibc could be fixed it could morph into a long term plan. More recent bugs suggest the glibc maintainers may possibly accept a patch.

As a side note, I feel we should consider removing withenv in the future because the shape of that API misrepresents reality. On the surface withenv appears to offer dynamic scoping, but ENV is global state. So withenv can never work reliably for concurrent code. That is, unless we took option (2) and avoid the C environment completely.

The text was updated successfully, but these errors were encountered:

ViralBShah · 2020-02-11T12:06:46Z

Does musl also have the same issue?

c42f · 2020-02-11T12:19:41Z

Does musl also have the same issue?

It certainly seems so. In fact there's even less locking:
https://git.musl-libc.org/cgit/musl/tree/src/env/getenv.c
https://git.musl-libc.org/cgit/musl/tree/src/env/setenv.c

I should point out that independently of any implementation this is a POSIX issue which doesn't require these functions to be threadsafe. See notes at http://man7.org/linux/man-pages/man3/setenv.3.html for example.

c42f · 2020-02-11T12:34:01Z

this is a POSIX issue

Having said that, I feel it's reasonable for libc implementations to mitigate this, because regardless of the posix standard, the way that C libraries use these functions in the wild has proven to be unreliable. But fixing this is not at all easy even inside glibc. As described here https://sourceware.org/bugzilla/show_bug.cgi?id=15607#c4 it may be necessary for glibc to leak some memory every time setenv is called because some other part of the process may be holding onto the result of a previous getenv.

Amusingly windows gets all this right and GetEnvironmentVariableW and friends just do what you'd expect and want with no fuss.

ecsx1 · 2020-02-11T15:30:08Z

There's been some discussion on musl mailing list back in 2015 about getenv and thread-safety.
https://www.openwall.com/lists/musl/2015/03/04/11 cc @richfelker

richfelker · 2020-02-11T16:39:44Z

Putting locks around them does not make thread-unsafe functions thread-safe. A fairly large number of libc functions are specified to use the environment, and if you modify the environment, via setenv or direct manipulation of environ, any concurrent call to one of them produces a data race and thereby undefined behavior. musl is pretty strict about not introducing environment access into functions not specified to depend on the environment, but glibc is not, so it's hard to even make a viable approach of adding locks all the functions that might use the environment.

The right answer is simply that you don't modify the environment; it's effectively immutable input state of the program. To execute another program with changes to the environment, you use one of the exec/posix_spawn family functions that takes a complete new environment as an argument rather than editing your own environment first. If you have a language interpreter/runtime that allows logical modification to the environment (like the shell does), you don't actually modify your own environment, but keep a separate logical copy that's accessed under appropriate locks that you expose back to the hosted program and export to child processes when execing them.

The only problem this does not solve is a need to set environment variables that change behavior of library functions within the current process, like TZ.

Keno · 2020-02-12T01:11:49Z

Our process spawning already operates on copies of the environment, so that's not the issue (though as mentioned we do still have withenv and of course the ENV["FOO"] = "BAR" accessors that do call the underlying libc setenv). The big problem is the last issue mentioned:

The only problem this does not solve is a need to set environment variables that change behavior of library functions within the current process, like TZ.

C libraries (not just the libc) unfortunately often like to use the environment for configuration options that we dynamically need to change. Where we have influence over the C library in question, I think we should make it a policy for these libraries to have an API other than the environment for setting configuration options.

For other libraries that we don't control and that are unwilling to take such patches, I'm not sure what to do though. Perhaps the only thing we can do is put a big scary warning in the docs like Rust did.

There's also the question what to do with withenv. In theory, we could just have that be a task-local immutable overlay over the actual process environment. That would achieve the dynamically-scoped environment manipulation that I do think is probably useful for people using Julia for shell-like tasks. Of course at that point, we'd probably still need a way to modify the global environment (unsafe_setenv!?) for aforementioned miscellaneous C libraries, but perhaps that's a better situation in that it signals to users to avoid this interface if possible.

c42f · 2020-02-12T01:14:17Z

Thanks @richfelker for your perspective! You're right; the existence of direct access to environ and the fact that getenv returns a pointer to memory owned by libc (or the system) leaves no reasonable solution in sight for *libc. Either

Using the result of getenv in the presence of setenv leads to a data race or
setenv would have to permanently leak the memory from the previous environment.

We can somewhat-mitigate this from within julia by using a lock and copying the result of getenv into our own buffer before releasing the lock, but in no way does that fix the problem globally.

Also thanks for pointing out something which I hadn't considered; that setenv has an implicit data race with several libc functions. (The list for musl looks very short: https://wiki.musl-libc.org/environment-variables.html)

The only problem this does not solve is a need to set environment variables that change behavior of library functions within the current process, like TZ.

Exactly. I imagine this is why Julia's ENV currently uses setenv the way it does. We're now in a sticky situation because changing to an immutable environment model would leave no way to control libraries which rely on the environment. Perhaps we could backpedal a bit in future and expose an unsafe_setenv to change the C environment, but make ENV modify a copy of it.

If we'd prefer the leaky approach in julia, another nasty idea is to take matters into our own hands on linux and point environ to some memory we own. Of course then it's not clear what to do if some C library calls setenv.

c42f · 2020-02-12T01:25:01Z

There's also the question what to do with withenv. In theory, we could just have that be a task-local immutable overlay over the actual process environment. That would achieve the dynamically-scoped environment manipulation that I do think is probably useful for people using Julia for shell-like tasks. Of course at that point, we'd probably still need a way to modify the global environment (unsafe_setenv!?) for aforementioned miscellaneous C libraries, but perhaps that's a better situation in that it signals to users to avoid this interface if possible.

I'm feeling like this is the way to go. In addition, make ENV lookup also access that task-local immutable overlay of the process environment. At the moment, I feel like I see various julia libraries recommending ENV for configuration, and having syntax as simple and enticing as ENV["FOO"] = "BAR" as a surprising alias for please_create_a_data_race_with_some_random_c_library() doesn't seem great :-)

c42f · 2020-02-14T07:46:38Z

What do people think about compatibility here? I'm happy to do the work to fix this but I'm fairly sure we can't fix it completely without breaking compatibility. So we'll have to decide which trade off to take.

Here's several possible plans:

Plan 1

Make withenv add a Dict or possibly ImmutableDict{String,String} to task_local_storage (using :ENV_overlay as key?)
Make all ENV access first look that up before falling back to the global ccall to getenv. Including getindex and iteration.
Add locks around all the internal access to environ via getenv etc
Soft-deprecate setindex!(ENV) by adding a big warning discouraging it.
Add unsafe_setenv! and document that it's inherently unsafe, but is for configuring C libraries which insist on using environment variables for config.

This will avoid julia-only crashes but will do nothing for getenv data races in C libraries. It should keep most things working though there may be some edge cases with withenv and C libraries (which while they may work have no hope of ever working concurrently).

Plan 2

As in Plan 1, but essentially make ENV a normal Dict which is a copy of the environ at julia initialization. Add locks around it so that modifying it is threadsafe.

This will fix the data races and preserve normal within-julia use of ENV as it's done currently. However, it will break any code which is relying on C libraries seeing the updated ENV and those would have to transition to unsafe_setenv!. However doing that via Compat would be simple.

Plan 3

Make ENV an immutable copy of the environment. Encourage use of withenv (implemented as above) for convenience in launching processes. Provide unsafe_setenv! for desperate situations and discourage its use.

This may be conceptually clean but it will break so many scripts and startup files that I think it's Julia 2.0 material.

It has the benefit of preventing confusion for those who expect that ENV can be "truly" modified. On the other hand it would be less natural for simple imperative-style scripting which wants to do a lot of process spawning.

StefanKarpinski · 2020-02-14T16:28:59Z

musl is pretty strict about not introducing environment access into functions not specified to depend on the environment, but glibc is not, so it's hard to even make a viable approach of adding locks all the functions that might use the environment.

Perhaps I'm being naive, but this doesn't seem so bad to me. We don't use a whole lot of libc functions and we can use even fewer pretty easily. Auditing which glibc calls we use that depend on the environment doesn't seem too bad. I guess the bad/intractable thing is that others may well write Julia code that uses libc directly via ccall or indirectly via calls to C that use libc, none of which would do the appropriate locking.

richfelker · 2020-02-14T16:36:45Z

I guess the bad/intractable thing is that others may well write Julia code that uses libc directly via ccall or indirectly via calls to C that use libc, none of which would do the appropriate locking.

Exactly. If it's intended that programs be able to call C library functions, then the problem is not tractable.

richfelker · 2020-02-14T16:37:30Z

Regarding #34726 (comment), I'm not a julia user, but I would think plan 2 is the best solution.

StefanKarpinski · 2020-02-14T16:43:56Z

Here's an alternative. Since calling setenv is rare and slow anyway, halt all threads but the currently executing one while doing it. I.e. not just holding a lock which other threads have to know about: shut them down entirely and only continue them after the environment is modified.

vtjnash · 2020-02-14T17:11:56Z

Yeah, we can't entirely avoid them: we ship with e.g. BLAS which uses them, so working around this is problematic. It seems like it'd have to be mitigated at the libc level, but only Win32 exposes a thread-safe API. As c42f said above, it seems that libc maintainers could have chosen to mitigate this, though not required by posix to do so, but have apparently generally opted not to.

I'd suggest a plan 4:
Eliminate withenv, rename setenv! to unsafe_setenv!, use (and improve) setenv for launching processes.

yuyichao · 2020-02-14T17:27:20Z

Stopping the thread won’t work. You either cannot guarantee that the code will ever finish or you can’t know you didn’t interrupt in the middle of a critical region.

richfelker · 2020-02-14T17:31:21Z

@StefanKarpinski: Exactly what @yuyichao just said - that can't work unless you have a way to wait for each thread to reach a "quiescent" waypoint where you know it can't be in a critical section.

tkf · 2020-02-15T00:34:03Z

Make withenv add a Dict or possibly ImmutableDict{String,String} to task_local_storage (using :ENV_overlay as key?)

I suppose this breaks processes launched in sub-tasks?

withenv("FOO" => "BAR") do
    @sync @async run(`sh -c 'echo $FOO'`)
end

For this to work, I suppose we need something like Context Variables in Python (ref PEP 567)?

For running commands, why not just encourage the pure API setenv(cmd, pairs...)? Maybe it's worth adding a convenience API setenv(cmd, ::AbstractDict, ::Pair...) as it's rather inconvenient to modify existing environment variables using setenv.

c42f · 2020-02-17T04:10:49Z

It seems like it'd have to be mitigated at the libc level, but only Win32 exposes a thread-safe API.

I was thinking more about this, especially about what happens when a library calls the C standard getenv on windows instead of directly calling the Win32 API. Well of course libraries may be likely to do this, and of course (in hindsight) it doesn't work. So even on windows the average C library will have exactly the same bug. For example I found this old issue from 2014 OpenMathLib/OpenBLAS#394 ... and to my surprise it's @vtjnash reporting this exact issue to OpenBLAS.

As c42f said above, it seems that libc maintainers could have chosen to mitigate this, though not required by posix to do so, but have apparently generally opted not to.

It's interesting that FreeBSD seems to have chosen to leak memory in this case - see BUGS section in https://www.freebsd.org/cgi/man.cgi?query=setenv&sektion=3&manpath=freebsd-release-ports. Though the man pages don't say what happens when using direct access to environ.

I'd suggest a plan 4:
Eliminate withenv, rename setenv! to unsafe_setenv!, use (and improve) setenv for launching processes.

Right, withenv seems pretty redundant with setenv. But what should we do about our current attractive-but-dangerous support for C setenv via the syntax

ENV["A"] = "B"

I feel like this is widely used and will break a fair few things if it's removed during Julia 1.x.

Plan 2 gives us a way of not breaking most uses of this syntax (at the cost of a confusing decoupling between the Julia ENV and underlying C env).

c42f · 2020-02-17T04:17:07Z

I suppose this breaks processes launched in sub-tasks?
For this to work, I suppose we need something like Context Variables in Python (ref PEP 567)?

Yes, task_local_storage doesn't play nicely with @async at all, but this is "just another" case of it. So I'm kind of presuming any solution to that general problem will also fix withenv as a special case. (I definitely think we need to sort this out and I think we need "some kind" of context inheritance from the parent task. I've thought a bit about it but haven't written anything yet. I'm hoping we may have some type-based equivalent of that PEP ContextVar which is arranged so that users can access task local storage in a more type stable way. But that's a whole separate discussion we can have elsewhere.)

tkf · 2020-02-17T05:44:30Z

Yes, I agree that context variables support should be in a separate discussion. I rather wanted to mention that, because it would take some time to get context variables machinery, plan 1 is probably not the best solution.

Plan 2 gives us a way of not breaking most uses of this syntax (at the cost of a confusing decoupling between the Julia ENV and underlying C env).

How about having a deprecation period (a few minor releases) where ENV[k] = v still invokes unsafe_setenv! with a deprecation warning? During this period, users would do using Future: ENV to use plan 2 ENV.

StefanKarpinski · 2020-02-17T18:12:24Z

I don't believe that we can just deprecate ENV[k] = v without causing massive havoc in the ecosystem, so I'm trying hard to think of alternatives. There are basically three use cases for ENV[k] = v:

communicate to Julia code in the same process that looks at ENV;
spawn a subprocess with this changed environment;
affect libraries via the libc environment.

We can make (1) threadsafe by maintaining our own shadow ENV dict that is separate from the libc environment. This shadow ENV can also be passed to subprocesses, which addresses (2) as well. It seems that (3) causes all the problems. Am I missing any cases?

richfelker · 2020-02-17T18:55:37Z

I think that pretty well covers it. I'm confused how (3) is actually a major issue though. Generally people don't do this (setenv) in C when calling libraries that use the environment to get defaults/settings, because it's not safe to do so there either. Instead, most libraries just use the environment for defaults and let you set the same parameters globally or in a context via some sort of runtime mechanism too. Using the environment for runtime configuration after process lifetime has started is anti-idiomatic in C to begin with.

StefanKarpinski · 2020-02-17T19:07:50Z

Right, so in that case, we could just break (3) without breaking every single other use of ENV in the ecosystem. When people complain about their C libraries no longer picking up changes to ENV, we can say "don't do that, use the library's configuration API"; and we can also have an explicitly thread-unsafe unsafe_flush_env() function that copies the shadow ENV back to libc's environment if someone really just wants to make this work and doesn't care about thread safety.

richfelker · 2020-02-17T19:30:35Z

Yes, I think that sounds very reasonable.

tgross35 · 2024-12-23T04:14:36Z

Rust wound up making env::set_var unsafe starting with the 2024 edition, the discussion is mostly at rust-lang/rust#27970 (that is probably the most extensive discussion on this topic). The gist is that there is no way to lock environment in a way that is effective across ecosystems, and that libcs probably won't be fixed because POSIX doesn't specify this (and shouldn't need to). Instead, setenv should be discouraged in almost all circumstances, which is about the same conclusion in this thread.

My extrapolation is that if any libraries read environment for configuration that is expected to be set at runtime, it should be considered a bug in that library. Instead the library needs to manage and lock its own state, read the environment once for defaults, and then provide API to update the state. It sounds like OpenBLAS may fall into this category?

TZ seems to be the common exception here. Rich mentioned that there has been discussion about eventually replacing this but it sounds like this is unlikely to be usable anytime soon.

glibc also recently merged bminor/glibc@7a61e7f, which improves the situation for existing code but doesn't change that the "correct" thing is to just never write the environment.

KristofferC mentioned this issue Feb 17, 2020

be more correct when it comes to env handling JuliaLang/PackageCompiler.jl#338

Merged

KristofferC mentioned this issue Apr 2, 2020

Unset TMPDIR before calling tempnam in tempname #35342

Closed

This was referenced May 8, 2020

Test suite tweaks for rr #35539

Merged

RFC: Context variables #35833

Closed

tkf mentioned this issue Jun 30, 2020

ENV as a normal Dict: populate at boot, copy when spawning children #36494

Closed

tkf mentioned this issue Sep 23, 2020

Add opt-in auto precompilation after Pkg.add, instantiate, update JuliaLang/Pkg.jl#2033

Merged

giordano mentioned this issue Mar 21, 2021

Make running tokei thread-safe JuliaEcosystem/PackageAnalyzer.jl#42

Merged

mfsch mentioned this issue Dec 16, 2021

Segfault when running dlopen/dlclose (on NixOS 21.11) #43434

Closed

josephmckinsey mentioned this issue Mar 27, 2023

Support alternate color palette NREL-Sienna/PowerGraphics.jl#106

Merged

staticfloat mentioned this issue Oct 4, 2023

feat: add JULIA_PKG_GC_AUTO env var JuliaLang/Pkg.jl#3642

Merged

evanj mentioned this issue Oct 31, 2023

Consider deprecating and/or modifying behavior of std::env::set_var rust-lang/rust#90308

Closed

schveiguy mentioned this issue Dec 5, 2024

Fix #10574 - environment pointer sometimes changes before exec, causing error in child process. dlang/phobos#10575

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ENV is not thread safe with glibc #34726

ENV is not thread safe with glibc #34726

c42f commented Feb 11, 2020 •

edited

Loading

ViralBShah commented Feb 11, 2020

c42f commented Feb 11, 2020

c42f commented Feb 11, 2020

ecsx1 commented Feb 11, 2020

richfelker commented Feb 11, 2020

Keno commented Feb 12, 2020

c42f commented Feb 12, 2020

c42f commented Feb 12, 2020 •

edited

Loading

c42f commented Feb 14, 2020

StefanKarpinski commented Feb 14, 2020 •

edited

Loading

richfelker commented Feb 14, 2020

richfelker commented Feb 14, 2020

StefanKarpinski commented Feb 14, 2020

vtjnash commented Feb 14, 2020

yuyichao commented Feb 14, 2020

richfelker commented Feb 14, 2020

tkf commented Feb 15, 2020 •

edited

Loading

c42f commented Feb 17, 2020

c42f commented Feb 17, 2020

tkf commented Feb 17, 2020

StefanKarpinski commented Feb 17, 2020 •

edited

Loading

richfelker commented Feb 17, 2020

StefanKarpinski commented Feb 17, 2020 •

edited

Loading

richfelker commented Feb 17, 2020

tgross35 commented Dec 23, 2024

ENV is not thread safe with glibc #34726

ENV is not thread safe with glibc #34726

Comments

c42f commented Feb 11, 2020 • edited Loading

ViralBShah commented Feb 11, 2020

c42f commented Feb 11, 2020

c42f commented Feb 11, 2020

ecsx1 commented Feb 11, 2020

richfelker commented Feb 11, 2020

Keno commented Feb 12, 2020

c42f commented Feb 12, 2020

c42f commented Feb 12, 2020 • edited Loading

c42f commented Feb 14, 2020

Plan 1

Plan 2

Plan 3

StefanKarpinski commented Feb 14, 2020 • edited Loading

richfelker commented Feb 14, 2020

richfelker commented Feb 14, 2020

StefanKarpinski commented Feb 14, 2020

vtjnash commented Feb 14, 2020

yuyichao commented Feb 14, 2020

richfelker commented Feb 14, 2020

tkf commented Feb 15, 2020 • edited Loading

c42f commented Feb 17, 2020

c42f commented Feb 17, 2020

tkf commented Feb 17, 2020

StefanKarpinski commented Feb 17, 2020 • edited Loading

richfelker commented Feb 17, 2020

StefanKarpinski commented Feb 17, 2020 • edited Loading

richfelker commented Feb 17, 2020

tgross35 commented Dec 23, 2024

c42f commented Feb 11, 2020 •

edited

Loading

c42f commented Feb 12, 2020 •

edited

Loading

StefanKarpinski commented Feb 14, 2020 •

edited

Loading

tkf commented Feb 15, 2020 •

edited

Loading

StefanKarpinski commented Feb 17, 2020 •

edited

Loading

StefanKarpinski commented Feb 17, 2020 •

edited

Loading