-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix:Introduce an interface to expose the current Command
captured env var logic
#194
Comments
To add to my deprecation section: I feel that the interface with
I think that If I was re-doing this same interface from scratch and wanted to expose "what env var state was explicitly set on this object," I might use a custom enum with two states, I think naming is hard, though. I think it needs to start with
Split into two methods is an option:
|
For what it's worth, I prefer Of course it can be changed if necessary, but if it's already two collections that get chained together, then that's all the more reason to split up the methods. If we do split them up, then I'd like to expose |
This is the internal struct: https://github.com/rust-lang/rust/blob/e10eab5956f91c26dcc5ae29a19cfcd747047e4d/library/std/src/sys_common/process.rs#L14-L18. It's a single collection that stores env state pretty much the same way it's presented in the public API. It's a Then right before the command executes, this is how the env vars are resolved: https://github.com/rust-lang/rust/blob/e10eab5956f91c26dcc5ae29a19cfcd747047e4d/library/std/src/sys_common/process.rs#L36-L51. |
I'm unsure of what happens next in this process so I made a reference implementation for "[sketch] Expose envs via a new method on Command" its at rust-lang/rust#110327. I would like feedback there or here Ultimately I think I still prefer this interface: let envs: HashMap<OsString, OsString> = command.get_envs().with_inherited(); Though it will require a bit of refactoring. If it helps, I can code up that implementation too. |
I think returning an iterator makes a lot more sense. You can always collect it into a map. Plus that way you get to choose the kind of map you want, if you want the results sorted you can use BTreeMap for instance. |
Note that talking about inherited values only makes limited sense because those are only snapshotted at process creation time and may be changed before or after via If you want to inspect the environment of a process with you can, at least on unix, read Also note that for debugging purposes the state already is available via the alternate formatting |
Conceptually this limitation is no different than any other command that interacts with the system. For example: After a file is read from the disk, it could be mutated or modified. It's worth noting this caveat in the documentation: the developer is responsible for timing and isolation.
We can access that information via https://doc.rust-lang.org/std/env/fn.vars_os.html, though without a stable/consistent way to determine if clear has been called, developers can't resolve the actual values used.
I didn't note that here. But I did find that in my original investigation https://www.reddit.com/r/learnrust/comments/11sgsqt/detect_if_clear_env_has_been_called_on_a_command/. It's good to note that capability here for future developers who stumble on this issue. For my main use case: I need to be able to programmatically determine what PATH the command was run with. I don't want to have to resort to parsing the debug output. I worry that the output format is fragile, and I don't think we want to write documentation that stabilizes the debug output format. |
I wasn't suggesting and wouldn't encourage that. I just noted it since one of the stated motivations was debugging which can be covered by that.
I was talking about the environment of the child process.
Well, looking at the environment that a child process was spawned with is less susceptible to this than trying to reassemble that state from different parts within the parent. What I'm saying is that if what you want to debug a child process then maybe it would be better to actually inspect the child process, not the |
&Command
Command
captured env var logic
Thanks a ton for the feedback. I misunderstood your original comment. Special thanks to this comment:
I realized I wasn't giving this context in the issue: Rust implements So, while I wrote the issue stating I've got one issue, I'm actually talking about two problems:
Even if I could observe the Based on our conversation, I went back and updated my issue (added the bit about debug formatting, mentioned the TOCTOU issue is out of scope, added a second problem statement, added the Reddit link, updated some wording, and updated the title). I still think reading from
Ultimately we don't need to depend on the OS since Rust is already owning the logic of assembling the different env parts and passing them to the child. |
Yeah, that seems like a simple thing we can do. Though note that technically it's not quite sufficient since a Accepting a sort of
The most reliable way to debug those is to trace the exec syscalls of the forked child. Everything else is only a proxy measure.
A zombie will stay around until you call |
This comment was marked as resolved.
This comment was marked as resolved.
man pages disagree
|
I think I got most of the way there with my linked PR, however tests were failing on windows CI and now there’s an issue rebasing that branch with HEAD. If someone wants to take a stab at it, I would be grateful 🙏. Even if there are edge cases, I would still like to be able to read the config value that was written. |
Proposal
Context
The
Command
struct allows developers to execute system commands (such asbundle install
through a Rust interface). It includes modifying (set) and reading (get): env, current working directory, args, and program name.When you execute a command, it will resolve both "explicit" environment variables (set via
env
,envs
,env_remove
) and implicit "inherited" environment variables (modifiable byenv_clear
) via this logic https://github.com/rust-lang/rust/blob/3a5c8e91f094bb1cb1346651fe3512f0b603d826/library/std/src/sys_common/process.rs#L36-L51 and then pass the result to the operating system. I will refer to this code as environment variable "capture" logic.There are two problems with this process, detailed in the next section.
When a
Command
runs, the environment variables it uses are affected by:Command::envs
andCommand::env
)Command::remove_env
)Command::env_clear
)To reliably determine the exact environment variables a child process used when it ran, we must have those four pieces of information. There is a public API to determine the explicit mappings
Command::get_envs
. This API also tells us what values were explicitly removed viaCommand::remove_env
. However, it does not include information about inherited environment variables. From the docs:That means we've got four possible states an environment variable can be in:
Without knowing if
env_clear
was called, states A & B and B & D are indistinguishable. This lack of information prevents us from determining the exact environment variables aCommand
used when it ran.Problem statement(s)
Command::env_clear
Problem 1 is a capability problem (I can't work around this). Problem 2 is a usability and stability problem.
Problem 1: Cannot observe effects of
Command::env_clear
As documented in my merged PR, calling
env_clear
will cause the "capture" logic to skip inheriting from the parent process. While a developer can set this value, they cannot programmatically observe if it has been set (though they can manually observe it via "alternative" debug formatting{:#?}
).The end goals of programmatically observing that effect is listed below in "Motivation".
This problem prevents me from reproducing the capture logic in a library where I cannot observe if
env_clear
has been called. Even if this problem is solved, a related problem is detailed below.Problem 2: The "capture" logic has no exposed single source of truth
Even if we could directly observe the effects of calling
env_clear
, every developer who needed this information would need to reproduce this "capture" logic https://github.com/rust-lang/rust/blob/3a5c8e91f094bb1cb1346651fe3512f0b603d826/library/std/src/sys_common/process.rs#L36-L51 and hope that it does not change or that their implementation does not diverge.If the outcome of this logic is exposed, then it will:
Out of scope
Like other system resources (such as files on disk), environment variables may be modified between the time of check and use. This TOCTOU problem is out of the scope of my motivation for opening this issue. If you have that problem, please open a separate issue.
Motivation, use-cases
I want to use a
Command
struct as the single point of truth for what will run in the future or what already ran, including environment variables.Motivation: Context
I maintain the Ruby buildpack, written in Rust. The job of a buildpack is to take in a user's application, perform work, and then output an OCI image. A core part of this work is running system commands. Such as
bundle install
andrake assets:precompile
. Other buildpacks target other ecosystems: nodejs, python, PHP, go, and Java.As part of providing this experience, I need to:
To do that with Rust, I want to:
I cannot accomplish these while using
& mut Command
as a single point of truth today. See below for more details.Motivation: Programmatically displaying a command's environment variables before it executes for an end-user.
Commands like
rake assets:precompile
depend on environment variables for configuration (likeRAILS_ENV=production
). It's important that I can display the command before it executes because:To that end, I want my user to see an output in their build log like:
While I can provide this type of output for my users today, I cannot generalize it into a library for other buildpacks while only consuming
&mut Command
without introducing another point of truth.Here's an example function:
You could call it like:
Motivation: Programmatically adding system information after a failed command execution for an end-user (e.g. which_problem)
I want to be able to debug
PATH
issues on aCommand
. The developer may accidentally callenv_clear
and fail to realize that it also removesPATH
, or they might think they added something to thePATH
but had a typo.To help debug those cases, I wrote a library that performs a
which
lookup and adds debugging information which_problem crate. I want to use it to debug when a command has failed.Something like this:
However, the code must assume something about the state of
Command
as it cannot be directly observed. Giving the wrong path information to the person debugging will actively harm the debugging experience.Solution sketches
A quick note on naming things: I've stubbed in names for methods, but I'm not tied to any of them.
Internally the combination of explicit and implicit env vars are retrieved via a
capture
orcaptured
method. These are described as the env vars that will be captured (or were previously) by the command.I've also considered the following:
canonical
(like used for paths).resolved
finalized
They're used interchangeably below.
[sketch] Expose captured env logic via a new
captured_envs
method onCommand
We could add a method that exposes that information directly:
This sketch would solve problems 1 and 2. I'm open to naming suggestions.
[sketch] Expose captured env logic via a pure function that takes
Command
andVarOs
or an iteratorFor example:
The benefit of this approach is that it's now a pure function and does not rely on
env::vars_os()
call to the current environment. For this to work, we would also have to exposeCommand.is_env_clear
internal state.This sketch would solve problems 1 and 2. I'm open to naming suggestions.
[sketch] Expose the clear boolean state
If I had a flag on whether or not clear had been called on the
Command
struct, I would have enough information to resolve environment variables manually. Something like:Internally the state is preserved on
CommandEnvs
, so another option could be to mirror the internal state and expose it on the iterator:This sketch would solve problem 1 but not 2. The end programmer must still reimplement the environment variable capture logic, which may diverge. I'm open to naming suggestions.
Even if we choose another solution, we may want to consider exposing this information anyway, as all other explicitly set states on
Command
can be queried except for this one.[sketch] Expose captured env logic via method chain method on
Command
As an alternative to the above
Command::captured_envs
method, we could make the information chainable:Like the above chained
command.get_envs().is_clear()
, this maps somewhat to the internal representation.Benefits: The methods of
Command
are not changed.Downsides: More OOP than functional style. It removes the ability to use
Command::captured_envs
in a chain, like callingmap
on an option.This sketch solves problems 1 and 2. I'm open to naming suggestions.
[sketch] Something else
Any ideas?
[sketch] Deprecate
get_envs
and introduce a new interface.Deprecate
get_envs
and introduce other APIs. Motivation is provided below. An example could be:While this idea is a stretch, I wanted to document that it was possible (in brainstorming, quantity leads to quality) and give some additional context on the existing interface.
When I first saw
get_envs
, I assumed it was referring to getting the state that would be set when it was run rather than the explicit state that must be resolved.I was not paying close attention when I first encountered
get_envs
. I guessed what I thought it meant (and was REALLY wrong).In the case of a
None
value, it doesn't mean:It means:
That behavior is the opposite of what I assumed.
While it's ultimately my mistake to own, I want to explore how we could improve that experience and provide a new interface for the captured/canonical/resolved envs.
I introduced a documentation PR and added some wording in the header, specifically "explicitly," which I hope will catch people's attention. rust-lang/rust#109272, though there's still behavior inconsistency. Of the
get
methods, these have no ambiguity:While these have some:
I've talked at length about
get_envs
. The other,get_current_dir
, returns eitherSome
if an explicit value was set orNone
if it wasn't. I'm not personally surprised by what happens when aNone
is seen, but I'm not all users. It is confusing thatNone
forget_current_dir
andNone
forget_envs
means opposite things.Links and related work
&mut Command
.Command
(written when I knew much less about Rust, so the design and implementation aren't the greatest) https://github.com/heroku/buildpacks-ruby/blob/a63337c764a816122e7ccaea50005cfa85482e90/commons/src/env_command.rsCommand
issue: A mix of structs to hold state transitions but relying on&mut Command
to hold canonical data: heroku/buildpacks-ruby@cad093a (this commit message may also help you to understand my design goals and needs when working withCommand
)What happens now?
This issue is part of the libs-api team API change proposal process. Once this issue is filed the libs-api team will review open proposals in its weekly meeting. You should receive feedback within a week or two.
The text was updated successfully, but these errors were encountered: