
flux-mini: controlling environment variables included #3141

Closed
SteVwonder opened this issue Aug 13, 2020 · 16 comments

Comments

@SteVwonder
Member

AFAICT, we don't have a way to control which environment variables get captured by flux mini into the jobspec and passed on to the job.

Slurm uses the --export=<[ALL,]environment variables|ALL|NONE> argument. Do we want to adopt something similar? Any ideas for a better interface?

@grondo
Contributor

grondo commented Aug 14, 2020

It feels like we could do better.

@trws added something like the following to flux-run

It would be nice to support functionality like the --export=NONE option in jobspec srun, even if it's named differently or has different semantics.

Agreed. I added something, let me know what you think. This one works like this:

1. Create the initial dictionary: if nothing is specified about the environment, or if --env-all is specified, initialize it to the current environment; otherwise, initialize it to an empty dict.
2. Add environment entries from the command line, with repeatable --env=key[=val] or -e key[=val] arguments.

Also considering an --env-file argument that would be applied between step 1 and step 2, but only did the above for now.
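A minimal Python sketch of the two-step scheme described above. The function name build_job_environment is hypothetical, not flux-run's actual code. It assumes that any --env argument counts as "something specified about the environment", so the dict starts empty unless --env-all is also given, and that a bare name missing from the current environment is silently skipped.

```python
from typing import Mapping, Sequence


def build_job_environment(
    current_env: Mapping[str, str],
    env_args: Sequence[str],
    env_all: bool = False,
) -> dict:
    """Hypothetical sketch of the two-step environment construction."""
    # Step 1: start from the current environment when no --env arguments were
    # given or --env-all was requested; otherwise start from an empty dict.
    if env_all or not env_args:
        env = dict(current_env)
    else:
        env = {}
    # Step 2: apply each --env=key[=val] / -e key[=val] argument in order.
    for arg in env_args:
        key, sep, val = arg.partition("=")
        if sep:
            env[key] = val  # key=val: set explicitly
        elif key in current_env:
            env[key] = current_env[key]  # bare key: copy from the current environment
    return env
```

With this reading, `--env=PATH --env=FOO=bar` yields a job environment containing only PATH and FOO, while adding --env-all keeps everything else as well.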

This seems preferable to Slurm's approach. For sophisticated users, perhaps an option could be used to supply a Python function that could act as a filter or would be called in place of the default method to fetch the environment?
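One way the "supply a Python function" idea could look, as a sketch only: capture_environment, EnvHook, and drop_spack_vars are hypothetical names, not an existing flux-mini API. The same hook shape covers both uses mentioned above: a filter (returns a subset of its input) and a replacement for the default method (ignores its input and returns its own dict).

```python
import os
from typing import Callable, Mapping, Optional

# Hypothetical hook type: receives the environment flux-mini would capture
# and returns the environment that should go into the jobspec.
EnvHook = Callable[[Mapping[str, str]], dict]


def capture_environment(hook: Optional[EnvHook] = None,
                        source: Optional[Mapping[str, str]] = None) -> dict:
    """Capture os.environ (or an explicit source) and pass it through hook."""
    env = dict(os.environ if source is None else source)
    if hook is not None:
        env = hook(env)
    return env


def drop_spack_vars(env: Mapping[str, str]) -> dict:
    """Example user-supplied filter: drop variables whose names start with SPACK_."""
    return {k: v for k, v in env.items() if not k.startswith("SPACK_")}
```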

One question though: what is the use case for no environment (i.e. --export=NONE)? Can anything run with an empty environment?

@SteVwonder
Member Author

This seems preferable to Slurm's approach.

Yeah. I like that suggestion!

One question though: what is the use case for no environment (i.e. --export=NONE)?

My personal use case was to use it in combination with --dry-run. It makes the JSON output much more concise and readable (especially when using Spack). Something similar can be achieved with flux mini run --dry-run ... | jq '.attributes.system.environment = {}', but --export=NONE (or similar) is more convenient. Not sure how frequently users will reach for --dry-run, but if we expect them to, I suspect they will want to optionally drop the env too.
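For reference, the jq workaround above can be expressed in Python as a small transformation on the dry-run jobspec. This is a sketch with a hypothetical function name; like jq's assignment, it creates the attributes.system path if it is missing.

```python
import copy


def strip_environment(jobspec: dict) -> dict:
    """Return a copy of jobspec with attributes.system.environment set to {},
    mirroring jq '.attributes.system.environment = {}'."""
    result = copy.deepcopy(jobspec)  # do not modify the caller's jobspec
    system = result.setdefault("attributes", {}).setdefault("system", {})
    system["environment"] = {}
    return result
```

The input could come from `json.load(sys.stdin)` when piping `flux mini run --dry-run ...` into a script.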

Can anything run with an empty environment?

That's a great question. I don't know. What is the minimal set of env vars that you need? Would anyone ever try flux mini run --export=NONE --env=PATH --env=HOME ...? That's probably a less common potential use case.

According to the srun docs, it looks like --export=NONE is incompatible with specifying individual env vars, but it is "particularly important for jobs that are submitted on one cluster and execute on a different cluster (e.g. with different paths)." That could be useful when using something like flux proxy.

For sophisticated users, perhaps an option could be used to supply a Python function that could act as a filter

Maybe a good starting point is supporting regex? Strawman: flux mini run --env-filter-regex='OMPI_.*' .... If we went that route, we wouldn't need an explicit --export=NONE since --env-filter-regex='.*' would cover it.
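A sketch of the --env-filter-regex strawman, assuming each regex must match the whole variable name (re.fullmatch), so 'OMPI_.*' drops OMPI_COMM_WORLD_RANK but not MY_OMPI_FLAG. Whether the real option would use full or partial matching is an open choice; filter_env_regex is a hypothetical name.

```python
import re
from typing import Iterable, Mapping


def filter_env_regex(env: Mapping[str, str], patterns: Iterable[str]) -> dict:
    """Drop every variable whose name fully matches any of the regexes."""
    compiled = [re.compile(p) for p in patterns]
    return {
        name: value
        for name, value in env.items()
        if not any(rx.fullmatch(name) for rx in compiled)
    }
```

As noted above, passing '.*' removes everything, which covers the --export=NONE case without a separate flag.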

@trws
Member

trws commented Aug 15, 2020

One question though: what is the use case for no environment (i.e. --export=NONE)? Can anything run with an empty environment?

I actually do this, or really the moral equivalent env -i, relatively frequently when I’m trying to debug crazy compute environments at HPC centers. There tends to be a lot of crud in the environment, so I pretty frequently want to clear all of that out and start a shell or job with a script that sets only what I want. Part of the key there is setting what I want, which is easier when you can add things on the same command, and using something that can run my controlled init scripts without having to deal with system files. It’s not something I think most users would use a lot, but being able to clear all current state and have a job kinda “sealed” with only what it specifies can be useful when you want to be sure you can reproduce things.

@grondo
Contributor

grondo commented Aug 16, 2020

Thanks @SteVwonder and @trws - that makes total sense.

I like the idea of --env-filter-regex as a starting point. For a simpler interface, a glob might work as well:

 $ flux mini run --env-filter="*" ...
 $ flux mini run --env-filter="OMPI_*" ...
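The glob variant maps naturally onto Python's fnmatch module. This sketch (filter_env_glob is a hypothetical name) uses fnmatchcase, which always compares case-sensitively, whereas fnmatch.fnmatch normalizes case on case-insensitive platforms.

```python
from fnmatch import fnmatchcase
from typing import Iterable, Mapping


def filter_env_glob(env: Mapping[str, str], patterns: Iterable[str]) -> dict:
    """Drop variables whose names match any shell-style glob (case-sensitive)."""
    patterns = list(patterns)
    return {
        name: value
        for name, value in env.items()
        if not any(fnmatchcase(name, pat) for pat in patterns)
    }
```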

It’s not something I think most users would use a lot, but being able to clear all current state and have a job kinda “sealed” with only what it specifies can be useful when you want to be sure you can reproduce things.

Great point!

In Slurm we added a use-env plugin which allowed flexible control over the environment via "named" configurations (e.g. you could run a job with srun --use-env=normal ... or srun --use-env=testing to set up a "normal" vs. "testing" environment).

It occurs to me that we could do something even more powerful by optionally passing the formed jobspec object to a plugin or set of plugins after it is created in flux-mini. Not only could these plugins set, unset, or modify the jobspec environment, but they could modify anything in the jobspec. Perhaps this could be part of a solution for #3143.

@grondo
Contributor

grondo commented Aug 17, 2020

It occurs to me that we could do something even more powerful by optionally passing the formed jobspec object to a plugin or set of plugins after it is created in flux-mini

Oh duh, @SteVwonder already brought this one up in #2875

@grondo
Contributor

grondo commented Aug 18, 2020

How about this?

 --env=RULE

Where RULE is a generic environment modification RULE with syntax like:

  • -<pattern>: filter out environment variables matching <pattern>. The pattern uses shell glob syntax for simplicity, unless it is prefixed with /, in which case it is a regex (with an optional trailing '/')
  • VAR: set environment variable VAR from the current environment
  • VAR=VAL: set environment variable VAR explicitly to VAL
  • VAR+=VAL: prepend VAL to colon-separated environment variable VAR (e.g. PATH)
  • VAR=+VAL: append VAL to colon-separated environment variable VAR (e.g. PATH)

As a convenience, the --env-filter=PATTERN option can still be offered, and is equivalent to --env=-PATTERN.

For example, to clear the environment:

$ flux mini run --env="-*" --dry-run hostname | jq .attributes.system.environment
{}

To only propagate the current PATH:

$ flux mini run --env="-*" --env=PATH --dry-run hostname | jq .attributes.system.environment
{
  "PATH": "/home/grondo/git/flux-core.git/src/cmd:/usr/bin:/bin"
}

To propagate PATH, appending or prepending a path element:

$ flux mini run --env="-*" --env="PATH=+/foo" --dry-run hostname | jq .attributes.system.environment
{
  "PATH": "/home/grondo/git/flux-core.git/src/cmd:/usr/bin:/bin:/foo"
}
$ flux mini run --env="-*" --env="PATH+=/foo" --dry-run hostname | jq .attributes.system.environment
{
  "PATH": "/foo:/home/grondo/git/flux-core.git/src/cmd:/usr/bin:/bin"
}

The env RULE prefix could later be expanded, for example a ^file or |program to read env vars from a file or external program.

Just throwing this approach out there. I have a working prototype.
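To make the RULE semantics concrete, here is a small Python sketch of an in-order rule engine. It is not the prototype mentioned above; apply_env_rules and its helpers are hypothetical. Assumptions: regex filters must match the whole name, and append/prepend fall back to the submitting environment when an earlier filter removed the variable, which is what the "-*" followed by "PATH=+/foo" example requires.

```python
import re
from fnmatch import fnmatchcase
from typing import Iterable, Mapping


def _matches(pattern: str, name: str) -> bool:
    """Glob by default; '/regex' or '/regex/' selects a regular expression."""
    if pattern.startswith("/"):
        body = pattern[1:]
        if body.endswith("/"):
            body = body[:-1]  # optional trailing '/'
        return re.fullmatch(body, name) is not None
    return fnmatchcase(name, pattern)


def _prior(name: str, env: Mapping[str, str], current: Mapping[str, str]) -> str:
    # Value to extend: the working env first, else the submitting environment.
    return env.get(name, current.get(name, ""))


def apply_env_rules(rules: Iterable[str], current: Mapping[str, str]) -> dict:
    """Apply RULEs in command-line order, starting from the current environment."""
    env = dict(current)
    for rule in rules:
        if rule.startswith("-"):  # -PATTERN: filter out matching names
            env = {k: v for k, v in env.items() if not _matches(rule[1:], k)}
            continue
        name, sep, value = rule.partition("=")
        if not sep:  # VAR: copy from the current environment
            if name in current:
                env[name] = current[name]
        elif name.endswith("+"):  # VAR+=VAL: prepend
            name = name[:-1]
            old = _prior(name, env, current)
            env[name] = f"{value}:{old}" if old else value
        elif value.startswith("+"):  # VAR=+VAL: append
            old = _prior(name, env, current)
            env[name] = f"{old}:{value[1:]}" if old else value[1:]
        else:  # VAR=VAL: set explicitly
            env[name] = value
    return env
```

Because rules are applied strictly in order, `["PATH", "-*"]` ends with an empty environment while `["-*", "PATH"]` keeps only PATH.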

@SteVwonder
Member Author

I like that a lot!

The one question I have is w.r.t. ordering of the filters. Do the filters get applied in the order they are provided on the command line? If so, presumably this would produce an empty environment: flux mini run --env=PATH --env="-*" hostname? I think that is fine, I just want to make sure I have the semantics straight in my head.

One alternate semantic would be to sort the modifications and then apply them in the order you have above:

  • Removing with glob/regex
  • Copying from current environment
  • Explicitly setting
  • Prepending
  • Appending
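The sorted alternative amounts to a stable sort of the rules by category. A sketch, with hypothetical names and the category order taken from the list above; Python's sorted() is stable, so rules within one category keep their command-line order.

```python
from typing import List

# Category ranks for the alternative "sorted" semantics, lowest applied first.
REMOVE, COPY, SET, PREPEND, APPEND = range(5)


def rule_category(rule: str) -> int:
    """Classify a RULE using the prefix/suffix syntax proposed above."""
    if rule.startswith("-"):
        return REMOVE
    name, sep, value = rule.partition("=")
    if not sep:
        return COPY
    if name.endswith("+"):
        return PREPEND
    if value.startswith("+"):
        return APPEND
    return SET


def sorted_rules(rules: List[str]) -> List[str]:
    # Stable sort: same-category rules stay in command-line order.
    return sorted(rules, key=rule_category)
```

Under this scheme `--env=PATH --env="-*"` would be reordered so the filter runs first and PATH survives.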

@grondo
Contributor

grondo commented Aug 19, 2020

The one question I have is w.r.t. ordering of the filters. Do the filters get applied in the order they are provided on the command line? If so, presumably this would produce an empty environment: flux mini run --env=PATH --env="-*" hostname? I think that is fine, I just want to make sure I have the semantics straight in my head.

Yeah, the filters would be applied in the order they are specified on the command line. Not only might this be slightly more predictable, it would also be much easier to implement. However, as you note above, this does allow you to undo something you've just done.

The alternate semantics you proposed above seem like they would work as well. I can't think of a use case they would not handle. Obviously, if you added features to get the environment from files and/or filter or fetch the environment with a program, you'd have to be explicit about the order those features are processed in and keep that documented. It might be easier in the long run to just state once that "--env and --env-filter options are processed in the order they are given on the command line".

@SteVwonder
Member Author

It might be easier in the long run to just state once that "--env and --env-filter options are processed in the order they are given on the command line".

Sounds good to me!

@trws
Member

trws commented Aug 19, 2020

I'd probably go with in order, FWIW. If some further ordering needs to be applied, perhaps the way gcc handles the order of arguments would help: it takes each type of argument in order but does not interleave them. For example, all -I arguments are processed in order but apply to all files specified, and all -L arguments are applied before -l. So we could say "all filters, then all env" or something if we had to, but just having it be in order seems simplest and probably least surprising for now.

As to the rules, I like the concept of each of those, but I'm not sure I'm on board with making them implicit based on a prefix, especially the =+ variant. Environment variables in shells and classic utilities are relatively kind things, but the rules are a great deal looser for non-standard utilities' use of them. From The Open Group:

"These strings have the form name=value; names shall not contain the character '='. For values to be portable across systems conforming to IEEE Std 1003.1-2001, the value shall be composed of characters from the portable character set (except NUL and as indicated below)."

So... yeah, this is valid:

~/build master*
127 ❯ env -i - '--meh=bah' bash -c 'env'
--meh=bah
PWD=/Users/scogland1/build
SHLVL=1
_=/usr/bin/env

As is this:

~/build master*
❯ env -i - 'meh=+bah' bash -c 'env'
meh=+bah
PWD=/Users/scogland1/build
SHLVL=1
_=/usr/bin/env

And meh+=bah parses as variable meh+ being assigned bah.

Most shells don't let you use environment variables like this, so we might get away with just saying "you shall not pass!" to these environment variables in this interface, but the chance we'll get someone who wants to set an environment variable to a value starting with + seems relatively high.
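The ambiguity can be shown side by side: a POSIX-style split of a name=value string (the name ends at the first '=') versus a rule parser that gives '+' special meaning. Both functions are illustrative only; naive_rule is a hypothetical stand-in for the prefix/suffix syntax, not flux code.

```python
def posix_split(entry: str) -> tuple:
    """Split 'name=value' as an environment string: the name ends at the
    first '=', and the value may contain anything, including a leading '+'."""
    name, _, value = entry.partition("=")
    return name, value


def naive_rule(entry: str) -> tuple:
    """A rule parser that treats a trailing '+' on the name, or a leading
    '+' on the value, as prepend/append operators."""
    name, _, value = entry.partition("=")
    if name.endswith("+"):
        return ("prepend", name[:-1], value)
    if value.startswith("+"):
        return ("append", name, value[1:])
    return ("set", name, value)
```

For "meh=+bah", posix_split gives the name "meh" with the value "+bah", while naive_rule reports an append of "bah" to "meh"; the two readings of "meh+=bah" diverge the same way.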

@grondo
Contributor

grondo commented Aug 19, 2020

Good points. I will back off the append/prepend variants.
Especially since a user could do:

 --env=PATH=$PATH:/foo

Sorry, got carried away there.

@trws
Member

trws commented Aug 19, 2020

FWIW, I really like the idea of having easy access to that functionality, especially since it composes multiple times on the same command line, which you can't really do otherwise; just perhaps expressed with separate flags?

@grondo
Contributor

grondo commented Aug 19, 2020

Yeah, the only reason I was using the admittedly naive syntax was to be able to append all rules to the same internal list, then allow a special "read from file" syntax which allowed the same "rules" to be listed in a file, e.g.:

PATH+=/foo
DEBUG=t
TERM

This can't be accomplished if you require a separate option for each environment manipulation rule.

However, there's probably a different, more sophisticated way to handle env files.

@grondo
Contributor

grondo commented Aug 19, 2020

Hm, I just discovered os.path.expandvars() which could be applied to an environment file, which allows:

PATH=/foo:$PATH

🤷

For now we can leave the append/prepend off and a specific option can be added later as @trws suggests.
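A short sketch of the two expansion options in Python's standard library. os.path.expandvars() always reads os.environ and leaves references to unset variables unchanged; string.Template.safe_substitute() does the same kind of $VAR / ${VAR} expansion against an explicit mapping, which may matter if an env file should be expanded against something other than the current process environment. The wrapper function names are hypothetical.

```python
import os
from string import Template
from typing import Mapping


def expand_with_os_environ(line: str) -> str:
    # Reads os.environ; unset variables are left as-is rather than raising.
    return os.path.expandvars(line)


def expand_with_mapping(line: str, env: Mapping[str, str]) -> str:
    # Expands $VAR and ${VAR} from env; unknown names are left as-is.
    return Template(line).safe_substitute(env)
```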

@grondo
Contributor

grondo commented Aug 19, 2020

OK, here's where I ended up with the prototype for now:

      --env=RULE            Control how environment variables are exported. If
                            RULE starts with '-' apply rest of RULE as a
                            filter (see --env-filter), if '^' then read rules
                            from a file (see --env-file). Otherwise, set an
                            environment variable from the current environment
                            (--env=VAR) or set a value explicitly
                            (--env=VAR=VALUE). Rules are applied in the order
                            they are used
      --env-filter=PATTERN  Filter environment variables matching PATTERN. If
                            PATTERN starts with a '/', then it is matched as a
                            regular expression, otherwise PATTERN is a shell
                            glob expression. (multiple use OK)
      --env-file=FILE       Read a set of environment rules from FILE
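One way to get the "Rules are applied in the order they are used" behavior with argparse is to route all three options into one shared list, rewriting --env-filter and --env-file into their '-' and '^' rule forms. This is a sketch under that assumption; EnvRuleAction, make_parser, and the env_rules destination are hypothetical, not the prototype's code.

```python
import argparse


class EnvRuleAction(argparse.Action):
    """Append a RULE to a single shared list, adding a prefix so that
    --env-filter and --env-file land in the same ordered list as --env."""

    def __init__(self, option_strings, dest, prefix="", **kwargs):
        self.prefix = prefix
        super().__init__(option_strings, dest, **kwargs)

    def __call__(self, parser, namespace, values, option_string=None):
        rules = getattr(namespace, self.dest, None) or []
        rules.append(self.prefix + values)
        setattr(namespace, self.dest, rules)


def make_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(prog="flux-mini-sketch")
    p.add_argument("--env", action=EnvRuleAction, dest="env_rules", metavar="RULE")
    p.add_argument("--env-filter", action=EnvRuleAction, dest="env_rules",
                   prefix="-", metavar="PATTERN")
    p.add_argument("--env-file", action=EnvRuleAction, dest="env_rules",
                   prefix="^", metavar="FILE")
    return p
```

Parsing `--env=PATH --env-filter=DBUS* --env-file=envfile` yields `["PATH", "-DBUS*", "^envfile"]`, preserving command-line order across the three options.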

Allowing "rules" like -PATTERN and ^FILE allows a user to have a file like:

$ cat envfile 
-FLUX_URI
-LS_COLORS
-DBUS*
PATH=$PATH:/foo

Then

$ flux mini run --env-file=envfile hostname
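A sketch of how an env file like the one above might be read and applied, with hypothetical function names. Assumptions: one rule per line with blank lines skipped, only glob filters (no /regex/ or nested ^FILE support), and $VAR references expanded with string.Template against the submitting environment rather than os.path.expandvars(), so the mapping can be passed in explicitly.

```python
from fnmatch import fnmatchcase
from string import Template
from typing import Iterable, List, Mapping


def read_env_rules(text: str) -> List[str]:
    """One rule per line; blank lines are skipped."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def apply_env_file_rules(rules: Iterable[str], current: Mapping[str, str]) -> dict:
    """Apply file rules in order, starting from the current environment."""
    env = dict(current)
    for rule in rules:
        if rule.startswith("-"):  # -PATTERN: drop names matching the glob
            env = {k: v for k, v in env.items() if not fnmatchcase(k, rule[1:])}
            continue
        name, sep, value = rule.partition("=")
        if sep:  # VAR=VALUE, with $VAR references expanded from `current`
            env[name] = Template(value).safe_substitute(current)
        elif name in current:  # bare VAR: copy from the current environment
            env[name] = current[name]
    return env
```

Reading the file itself could be as simple as `read_env_rules(pathlib.Path("envfile").read_text())`.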

Prepend/append on the command line would have to be handled separately since they don't have a "rule" syntax in this scheme.

This was just meant as an experiment. I'm willing to go a different direction if there is a perceived need for users to set environment variables with leading ^ and - characters on the command line.

@grondo
Contributor

grondo commented Aug 22, 2020

This should have been closed by #3150

@grondo grondo closed this as completed Aug 22, 2020