Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support host count in machinefile #7616

Merged
merged 2 commits into from
Oct 31, 2014

Conversation

wavexx
Copy link
Contributor

@wavexx wavexx commented Jul 15, 2014

This is the initial support for an improved machinefile, as described in #7589.

  • Support the syntax [user@]host[:port][*count] [bind_addr] in the
    machinefile.
  • Initial code just re-writes the list fed to addprocs() by repeating the host
    as specified, so there is no user-visible change.

Includes basic documentation.

- Support the syntax `[user@]host[:port][*count] [bind_addr]` in the
  machinefile.
- Initial code just re-writes the list fed to addprocs() by repeating the host
  as specified, so there is no user-visible change.
@StefanKarpinski
Copy link
Member

This is good but I feel like having the count in front might be clearer. @JeffBezanson, any opinions?

@wavexx
Copy link
Contributor Author

wavexx commented Jul 15, 2014

As in [count*][user@]... ?

  2*host
  2*user@host

no strong preference.

@StefanKarpinski
Copy link
Member

Yeah, I prefer that. There's too much trailing junk in this format as it is and the number of processes seems like kind of an important number :-)

@wavexx
Copy link
Contributor Author

wavexx commented Jul 15, 2014

Agreed. I'll revise the syntax.

@StefanKarpinski
Copy link
Member

Is there anything preventing this PR from being merged? Can someone who uses the machinefile stuff more than I do take a look? @JeffBezanson? @amitmurthy?

@eschnett
Copy link
Contributor

The usual format (used by several MPI implementations) is "hostname count", i.e. host name first.

@wavexx
Copy link
Contributor Author

wavexx commented Oct 30, 2014

On 10/30/2014 04:08 PM, Erik Schnetter wrote:

The usual format (used by several MPI implementations) is "hostname count", i.e. host name first.

This was in the initial proposal. I'm agnostic to the actual format.

If there a machinefile spec that we can follow?

@eschnett
Copy link
Contributor

See e.g. https://www.myricom.com/software/mpi-implementations-with-mx/637-how-do-i-create-a-machine-file-for-spawning-an-mpich-mx-job.html for a brief tutorial that includes an example for this format.

Open MPI is a modern MPI implementation; the relevant part of its documentation is here http://www.open-mpi.org/doc/v1.8/man1/mpirun.1.php#sect6. This extends the syntax above (which is probably still accepted for backward compatibility) to be more generic, allowing for additional information as well. The syntax is then "HOSTNAME slots=COUNT".

@wavexx
Copy link
Contributor Author

wavexx commented Oct 30, 2014

On 10/30/2014 04:13 PM, Erik Schnetter wrote:

See e.g. https://www.myricom.com/software/mpi-implementations-with-mx/637-how-do-i-create-a-machine-file-for-spawning-an-mpich-mx-job.html for a brief tutorial that includes an example for this format.

Open MPI is a modern MPI implementation; the relevant part of its documentation is here http://www.open-mpi.org/doc/v1.8/man1/mpirun.1.php#sect6. This extends the syntax above (which is probably still accepted for backward compatibility) to be more generic, allowing for additional information as well. The syntax is then "HOSTNAME slots=COUNT".

I could work on reimplementing the parser if that's considered acceptable.

We would probably need to support user=/bind_addr= keywords, because I
don't see any explicit mention of username for login access. I also
don't see a list of standard keywords beyond slots.

This would break the existing machine file, but on the other hand at
least it would be extensible without breaking at each update.

@eschnett
Copy link
Contributor

If you use ssh to access the hosts, then using ssh's syntax may be convenient: USER@HOSTNAME:PORT instead of just HOSTNAME. On the other hand, explicit keywords for user, port, bind_address would also be good. In this case I'd use the same spelling as ssh (i.e. bind_address instead of bind_addr).

@amitmurthy
Copy link
Contributor

Since the hostfile is meant for accessing hosts via ssh, I think the current implementation in this PR, i.e. count*user@host:port works.

@wavexx
Copy link
Contributor Author

wavexx commented Oct 30, 2014

AFAIK, MPICH uses the "user=" keyword when using rsh/ssh.
In this case, it would be better to stick to the same format, so that we can reuse the same hostfile.

@wavexx
Copy link
Contributor Author

wavexx commented Oct 30, 2014

On 10/30/2014 04:30 PM, Amit Murthy wrote:

Since the hostfile is meant for accessing hosts via ssh, I think the
current implementation in this PR, i.e. count*user@host:port
works.

Having to deal with 3/4 different hostfiles for each tool
(dsh/mpi/julia) is a bit annoying.

I definitely support the idea of using an existing file format for this.
MPICH/OpenMPI formats seem reasonable and widely used.

@amitmurthy
Copy link
Contributor

Stefan / @JeffBezanson should take a decision and close this out. My preference is still for count*user@host:port, purely from a compactness point of view.

@StefanKarpinski
Copy link
Member

My preference is still for count*user@host:port, purely from a compactness point of view.

Me too — we can have a translation or compatibility mode for other formats, which are IMO less nice – and since there's more than one of them there's no way to make everyone happy.

@StefanKarpinski
Copy link
Member

I'm doing some work on a branch and I'd rather not futz with it – would you mind resolving the conflict here and merging this?

@amitmurthy
Copy link
Contributor

OK. Will do.

@StefanKarpinski
Copy link
Member

Thank you :-)

@amitmurthy amitmurthy merged commit 6be66d7 into JuliaLang:master Oct 31, 2014
@StefanKarpinski
Copy link
Member

Thanks @amitmurthy and @wavexx – sorry that took so long to merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants