
TCP BTL fails in the presence of virbr0/docker0 interfaces #6377

Closed

Description

@mkre

Background information

What version of Open MPI are you using? (e.g., v1.10.3, v2.1.0, git branch name and hash, etc.)

v3.1.3

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

built from source

Please describe the system on which you are running

  • Operating system/version: CentOS 7
  • Computer hardware: Intel CPUs
  • Network type: TCP

Details of the problem

I have a TCP network of two nodes with different network interfaces. The output of ip addr is as follows:
node1

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 18:66:da:2e:43:4f brd ff:ff:ff:ff:ff:ff
    inet 146.122.240.139/23 brd 146.122.241.255 scope global dynamic eth0
       valid_lft 5066sec preferred_lft 5066sec
    inet6 fe80::1a66:daff:fe2e:434f/64 scope link 
       valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN 
    link/ether 02:42:5c:0f:85:a0 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 scope global docker0
       valid_lft forever preferred_lft forever

node2

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 18:66:da:2e:43:ae brd ff:ff:ff:ff:ff:ff
    inet 146.122.240.138/23 brd 146.122.241.255 scope global dynamic eth0
       valid_lft 3541sec preferred_lft 3541sec
    inet6 fe80::1a66:daff:fe2e:43ae/64 scope link 
       valid_lft forever preferred_lft forever
3: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 52:54:00:1e:69:de brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0
       valid_lft forever preferred_lft forever
4: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast master virbr0 state DOWN group default qlen 1000
    link/ether 52:54:00:1e:69:de brd ff:ff:ff:ff:ff:ff

Running a simple MPI application (Init + Allreduce + Finalize; a minimal sketch is shown after the error output below) with mpirun -np 2 -H node1,node2 -mca orte_base_help_aggregate 0 ./a.out hangs for a while and eventually fails with:

--------------------------------------------------------------------------
WARNING: Open MPI failed to TCP connect to a peer MPI process.  This
should not happen.

Your Open MPI job may now fail.

  Local host: node1
  PID:        15830
  Message:    connect() to 192.168.122.1:1040 failed
  Error:      Operation now in progress (115)
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: Open MPI failed to TCP connect to a peer MPI process.  This
should not happen.

Your Open MPI job may now fail.

  Local host: node2
  PID:        25833
  Message:    connect() to 172.17.0.1:1040 failed
  Error:      Operation now in progress (115)
--------------------------------------------------------------------------
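
For reference, the test program is essentially the following minimal sketch of the Init + Allreduce + Finalize sequence mentioned above (the actual source is not part of this report, so file and variable names are illustrative):

/* Minimal sketch of the test program (Init + Allreduce + Finalize).
 * Names are illustrative, not taken from the original report. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, value, sum;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Each rank contributes its rank number; the Allreduce forces the
     * TCP BTL to open connections between the two processes. */
    value = rank;
    MPI_Allreduce(&value, &sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

    printf("rank %d: sum = %d\n", rank, sum);

    MPI_Finalize();
    return 0;
}

It is compiled with mpicc and launched with the mpirun command shown above.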

It looks like each process tries to reach its peer through the other node's virtual bridge address (node1 connects to node2's virbr0 at 192.168.122.1, node2 connects to node1's docker0 at 172.17.0.1), which is not routable between the hosts. I have seen that Open MPI ignores all vir* interfaces, but that is only the case in oob/tcp and not in btl/tcp, right?
Adding -mca btl_tcp_if_include eth0 to the command line makes the program finish successfully. The same can be achieved with -mca btl_tcp_if_exclude virbr0,docker0,lo.

However, this is not very user-friendly (it requires knowledge of the available network interfaces), and we cannot anticipate the network configurations we will come across in the future, so we don't want to hard-code such settings in openmpi-mca-params.conf or the like. We are therefore wondering: is there any chance of having this case handled transparently by Open MPI?

Thanks,
Moritz
