Description
Background information
What version of Open MPI are you using? (e.g., v1.10.3, v2.1.0, git branch name and hash, etc.)
v3.1.3
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
built from source
Please describe the system on which you are running
- Operating system/version: CentOS 7
- Computer hardware: Intel CPUs
- Network type: TCP
Details of the problem
I have a TCP network of two nodes that have different network interface configurations. The output of `ip addr` on each node is as follows:
node1
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 18:66:da:2e:43:4f brd ff:ff:ff:ff:ff:ff
inet 146.122.240.139/23 brd 146.122.241.255 scope global dynamic eth0
valid_lft 5066sec preferred_lft 5066sec
inet6 fe80::1a66:daff:fe2e:434f/64 scope link
valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN
link/ether 02:42:5c:0f:85:a0 brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 scope global docker0
valid_lft forever preferred_lft forever
node2
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 18:66:da:2e:43:ae brd ff:ff:ff:ff:ff:ff
inet 146.122.240.138/23 brd 146.122.241.255 scope global dynamic eth0
valid_lft 3541sec preferred_lft 3541sec
inet6 fe80::1a66:daff:fe2e:43ae/64 scope link
valid_lft forever preferred_lft forever
3: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
link/ether 52:54:00:1e:69:de brd ff:ff:ff:ff:ff:ff
inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0
valid_lft forever preferred_lft forever
4: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast master virbr0 state DOWN group default qlen 1000
link/ether 52:54:00:1e:69:de brd ff:ff:ff:ff:ff:ff
Running a simple MPI application (Init + Allreduce + Finalize) with `mpirun -np 2 -H node1,node2 -mca orte_base_help_aggregate 0 ./a.out` hangs for a while and eventually fails with:
--------------------------------------------------------------------------
WARNING: Open MPI failed to TCP connect to a peer MPI process. This
should not happen.
Your Open MPI job may now fail.
Local host: node1
PID: 15830
Message: connect() to 192.168.122.1:1040 failed
Error: Operation now in progress (115)
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: Open MPI failed to TCP connect to a peer MPI process. This
should not happen.
Your Open MPI job may now fail.
Local host: node2
PID: 25833
Message: connect() to 172.17.0.1:1040 failed
Error: Operation now in progress (115)
--------------------------------------------------------------------------
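The test program itself does nothing beyond Init + Allreduce + Finalize; a minimal version looks roughly like this (a sketch, the exact values are irrelevant to the problem):

```c
/* Minimal reproducer: MPI_Init + MPI_Allreduce + MPI_Finalize.
 * Build with e.g.: mpicc reproducer.c -o a.out */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, value, sum;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    value = rank + 1;   /* arbitrary per-rank contribution */
    MPI_Allreduce(&value, &sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
    printf("rank %d: sum = %d\n", rank, sum);

    MPI_Finalize();
    return 0;
}
```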
It looks like each node tries to reach its peer via the peer's virtual interface: node1 tries node2's virbr0 address (192.168.122.1) and node2 tries node1's docker0 address (172.17.0.1), neither of which is reachable from the other host. I have seen that Open MPI ignores all `vir*` interfaces, but that's only the case in `oob/tcp` and not in `btl/tcp`, right?
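As far as I can tell, the effective interface defaults of both components can be inspected with `ompi_info` (the grep pattern is just for illustration):

```sh
# List the interface include/exclude parameters of the TCP BTL and the TCP OOB
ompi_info --param btl tcp --level 9 | grep if_
ompi_info --param oob tcp --level 9 | grep if_
```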
Adding `-mca btl_tcp_if_include eth0` to the command line makes the program finish successfully. The same can be achieved with `-mca btl_tcp_if_exclude virbr0,docker0,lo`.
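For completeness, the same exclusion can of course also be set outside the command line, e.g. via the environment or a parameter file (using the interface names from my setup):

```sh
# Environment variable form:
export OMPI_MCA_btl_tcp_if_exclude=virbr0,docker0,lo

# Or in $HOME/.openmpi/mca-params.conf (or <prefix>/etc/openmpi-mca-params.conf):
# btl_tcp_if_exclude = virbr0,docker0,lo
```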
However, this is not very user-friendly (it requires knowledge of the available network interfaces), and we cannot anticipate every network configuration we might come across in the future, so we would rather not hard-code interface names in `openmpi-mca-params.conf` or the like. We are therefore wondering: is there any chance that Open MPI could handle this case transparently?
Thanks,
Moritz