Show MPI connectivity map during MPI_INIT #30

Closed

Description

@ompiteam

It has long been discussed, and I swear there was a ticket about this
at some point but I can't find it now. So I'm filing a new one --
close this as a dupe if someone can find an older one.


OMPI currently uses a negative ACK system to indicate if high-speed
networks are not used for MPI communications. For example, if you
have the openib BTL available but it can't find any active ports in a
given MPI process, it'll display a warning message.

But some users want a "positive" acknowledgement of what networks
are being used for MPI communications (this can also help with
regression testing, per a thread on the MTT mailing list). HP MPI
offers this feature, for example. It would be nice to have a simple
MCA parameter that will cause MCW rank 0 to output a connectivity map
during MPI_INIT.

Complications:

  • In some cases, OMPI doesn't know which networks will be used for
    communications with each MPI process peer; we only know which ones
    we'll try to use when connections are actually established (per
    OMPI's lazy connection model for the OB1 PML). But I think that
    even outputting this information would be useful.
  • Connectivity between MPI processes is likely to be non-uniform.
    E.g., MCW rank 0 may use the sm BTL to communicate with some MPI
    processes, but a different BTL to communicate with others. This is
    almost certainly a different view than other processes have. The
    connectivity information needs to be conveyed on a process-pair
    basis (e.g., a 2D chart).
  • Since we have to span multiple PMLs, this may require an addition
    to the PML API (see the sketch after this list).
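
For the last complication, one possible shape for a PML-level query is
sketched below. This is purely hypothetical: the type name, field, and
signature are invented for illustration and are not part of the real
mca_pml_base_module_t.

#include <stddef.h>

/* Hypothetical sketch only: a query entry point that each PML (OB1, CM, ...)
 * could implement so a generic layer can ask "how would you reach this
 * peer?" without knowing about BTLs vs. MTLs.  Name and signature invented. */
struct ompi_proc_t;    /* existing OMPI process handle, opaque here */

typedef int (*mca_pml_base_module_get_transport_fn_t)(
    struct ompi_proc_t *peer,    /* which MCW peer we are asking about    */
    char *name,                  /* out: e.g. "sm", "tcp:eth0", "MX MTL"  */
    size_t name_len);            /* size of the caller-supplied buffer    */

/* The function pointer would be added to the PML module struct.  OB1 would
 * answer with the BTL(s) selected for that peer's endpoint, while CM would
 * answer with the MTL name (which is the same for every peer). */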

A first cut could display a simple 2D chart of how OMPI thinks it may
send MPI traffic from each process to each process. Perhaps something
like the following (a 6-process OB1 job, 2 processes on each of 3 hosts):

MCW rank 0     1     2     3     4     5
0        self  sm    tcp   tcp   tcp   tcp
1        sm    self  tcp   tcp   tcp   tcp
2        tcp   tcp   self  sm    tcp   tcp
3        tcp   tcp   sm    self  tcp   tcp
4        tcp   tcp   tcp   tcp   self  sm
5        tcp   tcp   tcp   tcp   sm    self
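
For illustration, here is a rough user-level sketch of how such a chart
could be assembled and printed from MCW rank 0. It is only a sketch: a real
implementation would run inside MPI_INIT and use the modex rather than an
MPI collective, and query_transport_to() below is an invented placeholder
for asking the PML which transport it expects to use for a peer.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NAME_LEN 16    /* fixed width of one transport-name entry */

/* Invented placeholder: ask which transport we expect to use to reach
 * 'peer'.  In a real implementation this would come from the PML (see the
 * hypothetical pml_get_transport sketch above); here it fakes answers so
 * the example is runnable. */
static void query_transport_to(int myrank, int peer, char *name)
{
    strncpy(name, (peer == myrank) ? "self" : "tcp", NAME_LEN);
    name[NAME_LEN - 1] = '\0';
}

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each process fills in one row: the transport it would use to reach
     * every peer in MPI_COMM_WORLD. */
    char *row = malloc((size_t)size * NAME_LEN);
    for (int peer = 0; peer < size; peer++) {
        query_transport_to(rank, peer, row + peer * NAME_LEN);
    }

    /* Gather all rows at MCW rank 0 and print the 2D chart. */
    char *map = (0 == rank) ? malloc((size_t)size * size * NAME_LEN) : NULL;
    MPI_Gather(row, size * NAME_LEN, MPI_CHAR,
               map, size * NAME_LEN, MPI_CHAR, 0, MPI_COMM_WORLD);

    if (0 == rank) {
        printf("MCW rank ");
        for (int c = 0; c < size; c++) printf("%-10d", c);
        printf("\n");
        for (int r = 0; r < size; r++) {
            printf("%-9d", r);
            for (int c = 0; c < size; c++) {
                printf("%-10s", map + (r * size + c) * NAME_LEN);
            }
            printf("\n");
        }
        free(map);
    }

    free(row);
    MPI_Finalize();
    return 0;
}

The fixed-width strings keep the gather trivial; a real implementation would
presumably exchange something more structured via the modex.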

Note that the upper and lower triangular portions of the map are the
same, but it's probably more human-readable if both are output.
However, multiple built-in output formats could be useful, such as:

  • Human readable, full map (see above)
  • Human readable, abbreviated (see below for some ideas on this)
  • Machine parsable, full map (a small sketch follows this list)
  • Machine parsable, abbreviated
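
For the machine-parsable full map, one record per process pair is probably
the easiest thing for MTT or other scripts to diff between runs. A small
sketch, reusing the gathered fixed-width map buffer from the example above;
the "conn,src,dst,transport" record layout is just an illustration, not a
proposed format.

#include <stdio.h>

#define NAME_LEN 16    /* must match the width used when the map was gathered */

/* Machine parsable, full map: one comma-separated record per process pair,
 * printed by MCW rank 0 from the gathered size*size buffer of fixed-width
 * transport names. */
static void print_machine_parsable(const char *map, int size)
{
    for (int src = 0; src < size; src++) {
        for (int dst = 0; dst < size; dst++) {
            printf("conn,%d,%d,%s\n",
                   src, dst, map + (src * size + dst) * NAME_LEN);
        }
    }
}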

It may also be worthwhile to investigate a few heuristics to compress
the map where possible. Some random ideas in this direction:

  • The above example could be represented as:
    MPI connectivity map, listed by process:
    X->X: self
    X<->X+1, X in {0,2,4}: sm
    other: tcp
  • Another example:
    MPI connectivity map, listed by process:
    X->X: self
    other: tcp
  • Another example:
    MPI connectivity map, listed by process:
    all: CM PML, MX MTL
  • Perhaps something could be done with "exceptions" -- e.g., where
    the openib BTL is being used for inter-node connectivity except
    for one node (where IB is malfunctioning, and OMPI fell back to
    TCP) -- this is a common case that users/sysadmins want to detect.
    A rough sketch of this heuristic follows the list.
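
The "exceptions" idea could be prototyped with a very naive heuristic: pick
the most common off-diagonal entry in the gathered map, report it once, and
list only the pairs that differ. A rough sketch under those assumptions
(note that in the sm/tcp example above the on-node sm pairs would be listed
as exceptions too, so a smarter version would probably group by node first):

#include <stdio.h>
#include <string.h>

#define NAME_LEN 16    /* must match the width used when the map was gathered */

/* Abbreviated output with "exceptions": report the dominant off-diagonal
 * transport once, then list only the process pairs that deviate from it
 * (e.g. one node whose IB ports are down and fell back to TCP).  Naive
 * O(n^4) frequency scan; fine for a sketch, not for production. */
static void print_with_exceptions(const char *map, int size)
{
    const char *common = NULL;
    int best = 0;

    for (int i = 0; i < size * size; i++) {
        if (i / size == i % size) continue;            /* skip the diagonal */
        const char *cand = map + i * NAME_LEN;
        int count = 0;
        for (int j = 0; j < size * size; j++) {
            if (j / size == j % size) continue;
            if (0 == strncmp(cand, map + j * NAME_LEN, NAME_LEN)) count++;
        }
        if (count > best) { best = count; common = cand; }
    }

    printf("MPI connectivity map, listed by process:\n");
    printf("X->X: self\n");
    printf("other: %s\n", common ? common : "(none)");

    /* List only the pairs that differ from the dominant transport. */
    for (int src = 0; src < size; src++) {
        for (int dst = 0; dst < size; dst++) {
            if (src == dst) continue;
            const char *entry = map + (src * size + dst) * NAME_LEN;
            if (common && 0 != strncmp(entry, common, NAME_LEN)) {
                printf("exception: %d -> %d: %s\n", src, dst, entry);
            }
        }
    }
}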

Another useful concept might be to show some information about each
endpoint in the connectivity map. E.g., show a list of TCP endpoints
on each process, by interface name and/or IP address. Similar for
other transports. This kind of information can show when/if
multi-rail scenarios are active, etc. For example:

MCW rank 0         1         2         3         4         5
0        self      sm        tcp:eth0  tcp:eth0  tcp:eth0  tcp:eth0
1        sm        self      tcp:eth0  tcp:eth0  tcp:eth0  tcp:eth0
2        tcp:eth0  tcp:eth0  self      sm        tcp:eth0  tcp:eth0
3        tcp:eth0  tcp:eth0  sm        self      tcp:eth0  tcp:eth0
4        tcp:eth0  tcp:eth0  tcp:eth0  tcp:eth0  self      sm
5        tcp:eth0  tcp:eth0  tcp:eth0  tcp:eth0  sm        self
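
One way to carry this extra detail is to gather a small per-pair endpoint
descriptor instead of a bare transport-name string. A hypothetical layout
(every field name and size here is invented for illustration):

#include <sys/socket.h>

/* Hypothetical per-pair entry for the connectivity map, gathered to MCW
 * rank 0 in place of the fixed-width name strings used in the sketches
 * above. */
struct conn_map_entry {
    char component[8];            /* "self", "sm", "tcp", "openib", ...     */
    char ifname[16];              /* "eth0", "ib0", ... (empty if N/A)      */
    struct sockaddr_storage addr; /* endpoint address, when one exists      */
    int  num_rails;               /* >1 when multi-rail striping is active  */
};

Rank 0 could then render either the tcp:eth0-style chart above or the
compressed forms from the gathered descriptors.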

With more information such as interface names, compression of the
output becomes much more important; for example:

MPI connectivity map, listed by process:
X->X: self
X<->X+1, X in {0,2,4}: sm
other: tcp:eth0,eth1

Note that these ideas can certainly be implemented in stages; there's
no need to do everything at once.
