MPMD Mode: Environment propagation per app context #607

Closed
jjhursey opened this issue Jun 19, 2020 · 4 comments
@jjhursey
Member

It would be useful to have a per-app-context version of -x that lets each app-context in an MPMD launch set its own value for an environment variable.

For example, the CUDA_VISIBLE_DEVICES environment variable is used to restrict the set of GPUs visible to a process. An application may want different processes in the same namespace assigned to different GPUs. Currently, users need to create a custom wrapper script to set this environment variable after launch. But it would be nice if we had a version of -x that allowed them to set it on the command line.

For example (shown here with mpirun, but it can be translated to prun in connection with Issue #605):

# current behavior (same behavior with -x and -mca mca_base_env_list)
mpirun -np 1 -x CUDA_TEST=1 ./test-env.sh  : -np 1 -x CUDA_TEST=2 ./test-env.sh 
0 [node1]:CUDA_TEST=1
1 [node1]:CUDA_TEST=1
# desired behavior
mpirun -np 1 --gx CUDA_TEST=1 ./test-env.sh  : -np 1 --gx CUDA_TEST=2 ./test-env.sh 
0 [node1]:CUDA_TEST=1
1 [node1]:CUDA_TEST=2
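
The wrapper-script workaround mentioned above could look roughly like the sketch below. The script layout, the `gpus_for_rank` helper, and the rank-to-GPU mapping are all hypothetical and chosen only for illustration; the rank variables `OMPI_COMM_WORLD_RANK` and `PMIX_RANK` are the ones the launcher exports.

```shell
#!/bin/bash
# Hypothetical wrapper: choose CUDA_VISIBLE_DEVICES from the process rank,
# then run the real program with that value in its environment.

# Print the GPU list for a given rank.
# Assumption for illustration: ranks 0-1 use GPUs 0,1; all others use 2,3.
gpus_for_rank() {
    local rank="$1"
    if [ "$rank" -lt 2 ]; then
        echo "0,1"
    else
        echo "2,3"
    fi
}

# Use the Open MPI rank if set, else the PMIx rank, else 0.
rank="${OMPI_COMM_WORLD_RANK:-${PMIX_RANK:-0}}"

CUDA_VISIBLE_DEVICES="$(gpus_for_rank "$rank")"
export CUDA_VISIBLE_DEVICES

# Replace this shell with the real application, if one was given.
if [ "$#" -gt 0 ]; then
    exec "$@"
fi
```

Each app-context would then launch the wrapper followed by the real executable and its arguments, which is the extra step a command-line option would remove.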
@rhc54
Contributor

rhc54 commented Jun 19, 2020

Should be trivial to do. However, we normally do this the other way around - our convention is that the "g" prefix means "global" and applies to all app-contexts. I see no issue making that change given that we haven't released PRRTE v2 yet - it's a good time to do it.

@jjhursey
Member Author

Yeah. We'd need to think of a good name for the option (since -x already has wide usage). I figured it would be relatively easy, and it might be possible to knock it out together with the other open issue on the -x option.

@rhc54
Contributor

rhc54 commented Jun 19, 2020

IIRC, our convention is to apply anything given to the first app-context to all app-contexts unless directed otherwise (i.e., by including it again for the second app-context). This is how we treat the --hostfile and --host options, and I think it applies in general. So the issue may really be: how do I indicate this only applies for the first app-context?

In other words, it sounds to me like the current implementation of -x is incorrect. It should apply to all app-contexts if it is given in the first one, but it should be overridden for later app-contexts if it is also given there. So this might be nothing more than fixing a bug as opposed to adding something new.
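
As a rough illustration of the convention described here (this is not PRRTE code, and the `resolve_value` helper is hypothetical), the value a later app-context sees for one variable could be resolved like this. Note the sketch treats an empty value the same as "not given", which is a simplifying assumption.

```shell
#!/bin/bash
# Sketch of the convention only; not taken from the PRRTE sources.
# resolve_value FIRST_CTX_VALUE THIS_CTX_VALUE
# Prints the value an app-context would see for a single variable.
resolve_value() {
    local first_ctx_value="$1"
    local this_ctx_value="$2"
    # A value given in this app-context wins; otherwise inherit the
    # value given in the first app-context.
    echo "${this_ctx_value:-$first_ctx_value}"
}

resolve_value "1" ""    # later context did not set it: inherits 1
resolve_value "1" "2"   # later context set it again: overridden to 2
```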

@jjhursey jjhursey added this to the v2.0.0 milestone Mar 25, 2021
@jjhursey jjhursey added bug and removed enhancement labels Mar 25, 2021
@jjhursey jjhursey assigned jjhursey and unassigned jjhursey Mar 25, 2021
@jjhursey jjhursey added enhancement and removed bug labels Mar 25, 2021
@jjhursey jjhursey modified the milestones: v2.0.0, Future Mar 25, 2021
@jjhursey
Member Author

I tested again today (since we made fixes to the -x option) and Ralph's comment is correct. The -x option is global, and can be overridden if passed multiple times. I tested the current PRRTE master:

shell$ cat test-env.sh 
#!/bin/bash

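# Prefer the Open MPI rank, fall back to the PMIx rank, else -1.
if [ -z "$OMPI_COMM_WORLD_RANK" ] ; then
    if [ -z "$PMIX_RANK" ] ; then
        rank="-1"
    else
        rank="$PMIX_RANK"
    fi
else
    rank="$OMPI_COMM_WORLD_RANK"
fi
thost=$(hostname)

value=$(env | grep "^CUDA_VISIBLE_DEVICES")

printf "%s [%s]:%s\n" "$rank" "$thost" "$value"

# Run any trailing command in place of this script.
exec "$@"

exit 0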
shell$ prterun -np 2 -x CUDA_VISIBLE_DEVICES="0,1" ./test-env.sh : -np 2 -x CUDA_VISIBLE_DEVICES="2,3" ./test-env.sh 
0 [c712f6n01]:CUDA_VISIBLE_DEVICES=2,3
1 [c712f6n01]:CUDA_VISIBLE_DEVICES=2,3
2 [c712f6n01]:CUDA_VISIBLE_DEVICES=2,3
3 [c712f6n01]:CUDA_VISIBLE_DEVICES=2,3

It would be nice to have a (new) per-app context version of this option so you could have different values for the same envar in different app contexts:

shell$ prterun -np 2 -ax CUDA_VISIBLE_DEVICES="0,1" ./test-env.sh : -np 2 -ax CUDA_VISIBLE_DEVICES="2,3" ./test-env.sh 
0 [c712f6n01]:CUDA_VISIBLE_DEVICES=0,1
1 [c712f6n01]:CUDA_VISIBLE_DEVICES=0,1
2 [c712f6n01]:CUDA_VISIBLE_DEVICES=2,3
3 [c712f6n01]:CUDA_VISIBLE_DEVICES=2,3

Adding a new option like -ax (or something better named) would be useful, and probably a good first issue for someone.
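
Until such an option exists, one possible stopgap is to put the standard `env` utility in front of the program in each app-context, so the variable is set by the launched command rather than by the launcher. The sketch below demonstrates only the `env` behavior locally; the `run_ctx` helper is hypothetical, and the prterun form in the comment is an untested assumption, not something verified in this thread.

```shell
#!/bin/bash
# Local demonstration: `env VAR=value program` gives each command its own
# value for the variable, independent of the caller's environment.
# run_ctx is a hypothetical helper that stands in for one app-context.
run_ctx() {
    env CUDA_VISIBLE_DEVICES="$1" sh -c 'echo "$CUDA_VISIBLE_DEVICES"'
}

run_ctx "0,1"
run_ctx "2,3"

# Assumed (untested) launcher form, one `env` per app-context:
#   prterun -np 2 env CUDA_VISIBLE_DEVICES="0,1" ./test-env.sh : \
#           -np 2 env CUDA_VISIBLE_DEVICES="2,3" ./test-env.sh
```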

@rhc54 rhc54 closed this as completed May 24, 2021