Support host count in machinefile #7616
- Support the syntax `[user@]host[:port][*count] [bind_addr]` in the machinefile.
- Initial code just re-writes the list fed to addprocs() by repeating the host as specified, so there is no user-visible change.
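As a rough illustration of the rewriting step described above, the following Python sketch expands machinefile lines of the form `[user@]host[:port][*count] [bind_addr]` into a flat list with each host repeated `count` times. It is not the PR's Julia code: the function name `expand_machinefile`, the regular expression, and the choice to skip blank lines are all assumptions made for this example.

```python
import re
from typing import List

# Hypothetical pattern for one line: [user@]host[:port][*count] [bind_addr]
_LINE = re.compile(
    r"^(?P<target>(?:[^@\s*]+@)?[^@:\s*]+(?::\d+)?)"  # [user@]host[:port]
    r"(?:\*(?P<count>\d+))?"                           # optional *count
    r"(?:\s+(?P<bind>\S+))?$"                          # optional bind_addr
)


def expand_machinefile(lines: List[str]) -> List[str]:
    """Repeat each host entry `count` times (count defaults to 1)."""
    expanded = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        match = _LINE.match(line)
        if match is None:
            raise ValueError(f"unrecognized machinefile line: {raw!r}")
        count = int(match.group("count") or 1)
        entry = match.group("target")
        if match.group("bind"):
            # Keep the bind address attached to every repeated entry.
            entry += " " + match.group("bind")
        expanded.extend([entry] * count)
    return expanded
```

Because the count is expanded before the list is used, a consumer that only understands one host per entry would need no changes, which matches the "no user-visible change" note in the description.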
This is good but I feel like having the count in front might be clearer. @JeffBezanson, any opinions?
As in
no strong preference.
Yeah, I prefer that. There's too much trailing junk in this format as it is and the number of processes seems like kind of an important number :-)
Agreed. I'll revise the syntax.
6c7c7e3 to 1a4c02f
Is there anything preventing this PR from being merged? Can someone who uses the machinefile stuff more than I do take a look? @JeffBezanson? @amitmurthy?
The usual format (used by several MPI implementations) is "hostname count", i.e. host name first.
On 10/30/2014 04:08 PM, Erik Schnetter wrote:
This was in the initial proposal. I'm agnostic to the actual format. Is there a machinefile spec that we can follow?
See e.g. https://www.myricom.com/software/mpi-implementations-with-mx/637-how-do-i-create-a-machine-file-for-spawning-an-mpich-mx-job.html for a brief tutorial that includes an example for this format. Open MPI is a modern MPI implementation; the relevant part of its documentation is here: http://www.open-mpi.org/doc/v1.8/man1/mpirun.1.php#sect6. This extends the syntax above (which is probably still accepted for backward compatibility) to be more generic, allowing for additional information as well. The syntax is then "HOSTNAME slots=COUNT".
On 10/30/2014 04:13 PM, Erik Schnetter wrote:
I could work on reimplementing the parser if that's considered acceptable. We would probably need to support user=/bind_addr= keywords, because I This would break the existing machine file, but on the other hand at
If you use ssh to access the hosts, then using ssh's syntax may be convenient: USER@HOSTNAME:PORT instead of just HOSTNAME. On the other hand, explicit keywords for user, port, bind_address would also be good. In this case I'd use the same spelling as ssh (i.e. bind_address instead of bind_addr).
Since the hostfile is meant for accessing hosts via ssh, I think the current implementation in this PR, i.e.
AFAIK, MPICH uses the "user=" keyword when using rsh/ssh.
On 10/30/2014 04:30 PM, Amit Murthy wrote:
Having to deal with 3/4 different hostfiles for each tool, I definitely support the idea of using an existing file format for this.
Stefan / @JeffBezanson should take a decision and close this out. My preference is still for
Me too; we can have a translation or compatibility mode for other formats, which are IMO less nice, and since there's more than one of them there's no way to make everyone happy.
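To make the "translation or compatibility mode" idea concrete, here is a minimal Python sketch that converts the MPI-style lines mentioned earlier in the thread ("hostname count" and "HOSTNAME slots=COUNT") into the `host*count` form from the PR description. The helper name `translate_mpi_line` is hypothetical, other keywords are simply ignored, and the thread agreed to move the count in front without showing the revised syntax here, so the originally proposed trailing form is used.

```python
def translate_mpi_line(line: str) -> str:
    """Translate 'host count' or 'host slots=count' into 'host*count'."""
    fields = line.split()
    if not fields:
        raise ValueError("empty hostfile line")
    host, count = fields[0], 1
    for field in fields[1:]:
        if field.startswith("slots="):
            # Open MPI style keyword, e.g. "node1 slots=4"
            count = int(field[len("slots="):])
        elif field.isdigit():
            # Bare count, e.g. "node1 4"
            count = int(field)
        # Assumption: any other keyword is ignored in this sketch.
    return host if count == 1 else f"{host}*{count}"
```

For example, `translate_mpi_line("node1 slots=4")` yields `node1*4`, while a line with no count is passed through unchanged.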
I'm doing some work on a branch and I'd rather not futz with it – would you mind resolving the conflict here and merging this?
OK. Will do.
Thank you :-)
Thanks @amitmurthy and @wavexx – sorry that took so long to merge.
This is the initial support for an improved machinefile, as described in #7589.
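- Support the syntax `[user@]host[:port][*count] [bind_addr]` in the machinefile.
- Initial code just re-writes the list fed to addprocs() by repeating the host as specified, so there is no user-visible change.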
Includes basic documentation.