Rankfile per prun #720

Closed · jjhursey opened this issue Jan 11, 2021 · 2 comments · Fixed by #750
@jjhursey (Member):

From the mailing list here:

The --mca prte_rankfile arf.txt option seems to apply to the DVM as a whole rather than per-job.

So prterun --mca prte_rankfile arf.txt ./a.out will work, but prun --mca prte_rankfile arf.txt ./a.out will not.
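
(For context, a rankfile assigns each rank to a node and slot. The contents of arf.txt are not shown in the report; a minimal sketch, using hypothetical node names and the same rank/slot syntax as the examples later in this thread:)

# Hypothetical contents of arf.txt: pin rank 0 to nodeA and
# rank 1 to nodeB, each on slot 0 of its node.
rank 0=nodeA slot=0
rank 1=nodeB slot=0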

@acolinisi (Contributor):

Thank you for implementing this. It works for the use case of a single prun invocation running in one DVM, but not for the use case of multiple concurrent prun invocations, where the goal is to run multiple independent jobs in one DVM without sharing any nodes.

I think the latter use case needs support for the +e relative node specifier. With +n, separate jobs end up allocated onto different slots on the same nodes.

Since the jobs are independent, it doesn't make sense to require constructing a set of rankfiles (one rankfile per job) that are aware of each other, i.e. a separate set of rankfiles for each particular set of jobs.

Desired:

cat arankfile 
rank 0=+e0 slot=0
rank 1=+e1 slot=0
# ^ one rankfile, re-used for each job, by means of relative node specs

prte --daemonize

prun -n 2  --map-by rankfile:FILE=arankfile:NOLOCAL bash hostname-sleep.sh 10 &
b07n13
b07n14

prun -n 2  --map-by rankfile:FILE=arankfile:NOLOCAL bash hostname-sleep.sh 10 &
b07n15
b07n16

pterm
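
(hostname-sleep.sh is not included in the report; a minimal sketch of such a helper, assuming it simply reports the node it landed on and then holds its slot for a while:)

#!/bin/sh
# Hypothetical helper: print this node's hostname, then keep the job
# alive (and its slot occupied) by sleeping for the given number of seconds.
hostname
sleep "$1"

(Each rank runs its own copy, which is why the prun invocations above print one hostname per rank.)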

Actual:
The above syntax is rejected (only +nX is supported).
With the following rankfile, both jobs are allocated onto the same two nodes (as expected):

rank 0=+n0 slot=0
rank 1=+n1 slot=0

Should I open a new issue, or am I missing something?

@rhc54 (Contributor) commented Mar 5, 2021:

Hmmm...yeah, that would be something we haven't supported before, so best to open a new issue. I don't see any reason why we couldn't do it - just not something that was previously requested 😄
