Support empty relative node specifier (+e) in rankfiles #807

Closed
acolinisi opened this issue Mar 5, 2021 · 1 comment

This is a feature request to add support for +e (empty relative node) to rankfiles. PR #720 (Rankfile per prun) does not cover the use case of multiple concurrent prun invocations, where you want to run multiple independent jobs in one DVM without sharing any nodes.

I think this use case needs support for the +e relative node specifier. With +n, separate jobs end up allocated onto different slots on the same nodes.

Since the jobs are independent, it doesn't make sense to require constructing a set of rankfiles (one rankfile per job) that are aware of each other, i.e. one set of rankfiles per particular set of jobs.
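
To make the difference between the two specifiers concrete, here is a minimal Python sketch of the semantics being requested. It is not PRRTE code: `resolve_relative` and `map_job` are hypothetical names, and it assumes that +eX indexes only nodes with no processes on them at the moment the job is mapped, while +nX indexes the full node list. It also assumes all ranks of one job are resolved against the same snapshot of node usage, so that +e1 still means "second empty node" after rank 0 has been placed.

```python
from typing import Dict, List


def resolve_relative(spec: str, nodes: List[str], used: Dict[str, int]) -> str:
    """Resolve a relative node spec such as '+n1' or '+e0' to a node name.

    '+nX' indexes every node in the allocation; '+eX' indexes only nodes
    that currently host no processes (assumed meaning of "empty").
    """
    if len(spec) < 3 or spec[0] != "+" or not spec[2:].isdigit():
        raise ValueError(f"malformed relative node spec: {spec!r}")
    kind, index = spec[1], int(spec[2:])
    if kind == "n":
        pool = nodes
    elif kind == "e":
        pool = [name for name in nodes if used.get(name, 0) == 0]
    else:
        raise ValueError(f"unknown relative node kind: {kind!r}")
    if index >= len(pool):
        raise IndexError(f"{spec} is out of range: only {len(pool)} candidate nodes")
    return pool[index]


def map_job(specs: List[str], nodes: List[str], used: Dict[str, int]) -> List[str]:
    """Place one job; specs[i] is the node spec for rank i."""
    snapshot = dict(used)  # usage as seen when the job starts mapping
    placement = [resolve_relative(spec, nodes, snapshot) for spec in specs]
    for name in placement:
        used[name] = used.get(name, 0) + 1  # record the new processes
    return placement


if __name__ == "__main__":
    nodes = ["b07n13", "b07n14", "b07n15", "b07n16"]
    used: Dict[str, int] = {}
    print(map_job(["+e0", "+e1"], nodes, used))  # first job
    print(map_job(["+e0", "+e1"], nodes, used))  # second job, same rankfile
```

With +e, the second job lands on the two nodes the first job left untouched. With +n, both jobs resolve to the first two nodes regardless of what is already running there.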

Desired:

cat arankfile 
rank 0=+e0 slot=0
rank 1=+e1 slot=0
# ^ one rankfile, re-used for each job, by means of relative node specs

prte --daemonize

prun -n 2  --map-by rankfile:FILE=arankfile:NOLOCAL bash hostname-sleep.sh 10 &
b07n13
b07n14

prun -n 2  --map-by rankfile:FILE=arankfile:NOLOCAL bash hostname-sleep.sh 10 &
b07n15
b07n16

pterm

Actual:
The above syntax is rejected (only +nX is supported).
With the following rankfile, both jobs are allocated onto the same two nodes (as expected):

rank 0=+n0 slot=0
rank 1=+n1 slot=0

rhc54 commented Mar 5, 2021

@acolinisi It will probably be a while before I can get to this. You are pretty savvy, so perhaps you might want to take a crack at it? The rankfile code is in the src/mca/rmaps/rank_file directory. The file itself gets parsed using flex, and the lexical directives are in rmaps_rank_file_lex.l. Currently, it only picks up +n as a "relative node syntax" directive, so you'd need to add +e to that one.

You then need to extend the code in rmaps_rank_file.c starting at line 271 to account for the +e option. You can see how +n was handled, so it shouldn't be too difficult (I think).
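
As a rough illustration of the lexer change described above, the following Python sketch recognizes both +n and +e in a rankfile line. It is not the flex grammar from rmaps_rank_file_lex.l. The regular expression, `RankEntry`, and `parse_rank_line` are hypothetical, and the sketch only handles the `rank N=+nX slot=S` form shown in this issue. Absolute hostnames and other forms a real rankfile may use are left out.

```python
import re
from typing import NamedTuple

# Matches lines such as "rank 0=+e0 slot=0"; the [ne] class is the only
# place where the new +e kind has to be admitted alongside +n.
RANK_LINE = re.compile(r"^rank\s+(\d+)\s*=\s*\+([ne])(\d+)\s+slot\s*=\s*(\S+)$")


class RankEntry(NamedTuple):
    rank: int    # MPI rank number
    kind: str    # "n" (any node) or "e" (empty node)
    index: int   # position within the candidate node list
    slot: str    # slot specification, kept as text


def parse_rank_line(line: str) -> RankEntry:
    """Parse one rankfile line that uses a relative node specifier."""
    text = line.split("#", 1)[0].strip()  # drop trailing comments
    match = RANK_LINE.match(text)
    if match is None:
        raise ValueError(f"unsupported rankfile line: {line!r}")
    rank, kind, index, slot = match.groups()
    return RankEntry(int(rank), kind, int(index), slot)


if __name__ == "__main__":
    for line in ("rank 0=+e0 slot=0", "rank 1=+n1 slot=0"):
        print(parse_rank_line(line))
```

The parsed `kind` field is what the mapping code would then branch on, in the same place where +n is handled today.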

@jjhursey added this to the Future milestone Mar 25, 2021
@rhc54 closed this as completed May 24, 2021