Add flux-hostlist command and initialize resource.hosts in rc1 #1499
Codecov Report

Coverage diff for #1499 against master:

| Metric | master | #1499 | +/- |
|---|---|---|---|
| Coverage | 78.76% | 78.67% | -0.1% |
| Files | 164 | 164 | |
| Lines | 30673 | 30673 | |
| Hits | 24160 | 24131 | -29 |
| Misses | 6513 | 6542 | +29 |
|
Here are references for the Open MPI and MPICH/Hydra host file formats: https://www.open-mpi.org/faq/?category=running#mpirun-hostfile and https://wiki.mpich.org/mpich/index.php/Using_the_Hydra_Process_Manager. Would it make sense to try to emit in one of these formats, selectable by option? |
Yeah, that's a question for @trws (though I will note that the default The |
Sorry, I don't think my comment above is clear. The open question in my mind is whether it is useful/necessary to add an option to generate hostfiles that contain extra information supported by the hostfile formats @garlick mentioned above (e.g. "slots" "max-slots" for OpenMPI).
It occurred to me that this information could be provided from |
Sorry for the delay responding - if this works for @trws I'm certainly good! Just throwing out ideas. |
It might be handy to emit things with slots or replicate a hostname once per slot under some circumstances, but even just having a way to get one hostname per line is enough to get the job done in the majority of cases. |
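To make the options in this thread concrete, here is a minimal sketch of the three output shapes being discussed: Open MPI style `host slots=N` lines, Hydra style `host:N` lines, and a hostname replicated once per slot. The function names and the `(host, slots)` input shape are hypothetical and are not part of `flux hostlist`; where the slot counts come from is left open.

```python
# Hypothetical helpers: none of these names exist in flux-core.
# Input is a list of (hostname, slot_count) pairs in rank order.

def openmpi_hostfile(slots_by_host):
    """One 'host slots=N' line per host (Open MPI hostfile style)."""
    return "\n".join(f"{host} slots={n}" for host, n in slots_by_host)


def hydra_hostfile(slots_by_host):
    """One 'host:N' line per host (MPICH Hydra host file style)."""
    return "\n".join(f"{host}:{n}" for host, n in slots_by_host)


def replicated_hostfile(slots_by_host):
    """Repeat each hostname once per slot, one hostname per line."""
    return "\n".join(host for host, n in slots_by_host for _ in range(n))


if __name__ == "__main__":
    example = [("node1", 2), ("node2", 1)]  # made-up hosts and slot counts
    print(openmpi_hostfile(example))
    print(hydra_hostfile(example))
    print(replicated_hostfile(example))
```

The plain one-hostname-per-line output is the special case of the replicated form where every host has exactly one slot.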
This could be done now for sub-instances by propagating the job Eventually of course @garlick also had the idea to emit rank list instead of host list, which could then be passed to |
That could certainly be handy, in fact I would definitely have used that
had it existed a couple of weeks ago.
On 2 May 2018, at 20:47, Mark Grondona wrote:

> It might be handy to emit things with slots or replicate a hostname once per slot under some circumstances,
>
> This could be done now for sub-instances by propagating the job `R_lite` to a well known key in the child kvs. The slots could be assumed to be one per core of the `R_lite` `core` list for each rank. Eventually of course `R` would certainly be available in an instances...
>
> @garlick also had the idea to emit rank list instead of host list, which could then be passed to `flux-exec`, e.g. `flux exec -r $(flux hostlist --ranks 123) COMMAND`
|
Ok, the |
Move unsetenv() of FLUX_JOB_ID, FLUX_JOB_SIZE, and FLUX_JOB_NNODES until after runlevel initialization so that these variables are available to rc1 scripts.
Problem: wreckrun generates fake 'R_lite' for testing purposes when run in an instance with no sched module loaded, however because of Lua array indexing, rank 0 is always placed in the last position of the 'R_lite' array. This doesn't cause a problem, but may cause confusion for tools or testing workloads examining 'R_lite'. Fix by indexing the assigned resource array in wreckrun from 1 origin, and calculate rank as index - 1.
Add new flux-hostlist command to print list of hosts for the current instance or optionally a set of wreck job(s).
Add a new rc1 script to populate resource.hosts with a list of hosts (in hostlist format for now), 1 per rank, initialized either via the job entry of an enclosing instance if FLUX_JOB_ID is set, or via flux-hostlist fallback to `flux exec hostname` when not running in a sub-instance.
Add a small suite of tests to check flux-hostlist and corresponding rc1 script functionality.
Add hostlist, HOSTLIST, and hostnames to valid dictionary words for the spellcheck tests.
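Since the commits above store `resource.hosts` in hostlist format while `flux hostlist` prints one hostname per line, a rough sketch of expanding a compact hostlist may help readers unfamiliar with the notation. This is an illustration only: it assumes the common `prefix[lo-hi,n]suffix` bracket form with at most one bracket group per entry, and `expand_hostlist` is a hypothetical name, not the implementation used in this PR.

```python
import re


def split_top_level(hostlist):
    """Split on commas that are not inside a [...] group."""
    parts, depth, cur = [], 0, []
    for ch in hostlist:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(cur))
            cur = []
        else:
            cur.append(ch)
    if cur:
        parts.append("".join(cur))
    return parts


def expand_hostlist(hostlist):
    """Expand e.g. 'node[1-3,5],login1' into a list of hostnames."""
    hosts = []
    for part in split_top_level(hostlist):
        m = re.fullmatch(r"([^\[\]]*)\[([^\[\]]+)\]([^\[\]]*)", part)
        if not m:
            hosts.append(part)  # plain hostname, no bracket group
            continue
        prefix, ranges, suffix = m.groups()
        for r in ranges.split(","):
            lo, _, hi = r.partition("-")
            hi = hi or lo
            width = len(lo)  # preserve zero padding such as 'n[08-10]'
            for i in range(int(lo), int(hi) + 1):
                hosts.append(f"{prefix}{i:0{width}d}{suffix}")
    return hosts


if __name__ == "__main__":
    print("\n".join(expand_hostlist("node[1-3,5],login1")))
```

With one entry per rank, the position of each hostname in the expanded list is its rank.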
FWIW, rebased and added support for |
Shall we go ahead and merge this so we can move on, and save any future enhancements for another time? |
I'm not sure what enhancements, if any, are possible or required so we can close #1489. I'm willing to close and rework this PR if the wrong approach was taken. |
Sounds good! |
Thanks! |
Works for me, sorry for the long reply latency and thanks for getting this together! |
This is somewhat experimental, but I'm posting for feedback on the general approach. This PR adds a simple `flux hostlist` command that prints the list of hostnames, one per line in rank order, for the instance.

If an optional `jobid` is provided, it will print the list of hostnames for that job (again one per node, not per task).

The script will look for `resource.hosts` and, if set, will assume this is the list of hostnames for the instance (one per rank). `resource.hosts` is kept in "hostlist" format.

If `resource.hosts` is not set, `flux hostlist` will fall back to `flux exec hostname`.

For jobs, `flux hostlist` will attempt to read the hostnames directly out of `R_lite` if the `node` field exists, otherwise it will just look up the hostname from `resource.hosts` by rank.

A new `rc1` script is added to set the initial `resource.hosts` via `FLUX_URI=$(parent_uri) flux hostlist ${FLUX_JOB_ID}`. In order for this to be possible, the `unsetenv ("FLUX_JOB_ID")` done in the broker had to be moved until after runlevel initialization was complete. (@garlick, need your ok on that one)

Fixes #1489
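The lookup order described in this PR can be summarized with a small model. This is a sketch under stated assumptions, not the actual script: the KVS is modeled as a plain dict, `resource.hosts` is stored as an already-expanded list rather than in hostlist format, each `R_lite` entry is assumed to be a dict with a `rank` key and an optional `node` key, and `exec_hostname` is a hypothetical callable standing in for `flux exec hostname`.

```python
def instance_hosts(kvs, exec_hostname):
    """Hostnames for the instance, one per rank.

    Prefer resource.hosts when it is set; otherwise fall back to
    exec_hostname(), a stand-in for running `flux exec hostname`.
    """
    hosts = kvs.get("resource.hosts")
    if hosts is not None:
        return list(hosts)
    return exec_hostname()


def job_hosts(kvs, r_lite):
    """Hostnames for a job, one per node.

    Use the 'node' field of each R_lite entry when present; otherwise
    look the hostname up in resource.hosts by rank.
    """
    result = []
    for entry in r_lite:
        if "node" in entry:
            result.append(entry["node"])
        else:
            result.append(kvs["resource.hosts"][entry["rank"]])
    return result


if __name__ == "__main__":
    kvs = {"resource.hosts": ["hostA", "hostB", "hostC"]}  # made-up data
    print(instance_hosts(kvs, lambda: []))
    print(job_hosts(kvs, [{"rank": 2}, {"rank": 0, "node": "hostZ"}]))
```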