
TCP: unexpected process identifier in connect_ack #6240

Open · abouteiller opened this issue Jan 5, 2019 · 19 comments

@abouteiller (Member) commented Jan 5, 2019

Upon job startup, the program deadlocks with the following output. A quick investigation shows that the opal_proc name is valid, but it does not match the identifier that came over the socket (same jobid, but a different, valid rank).

What version of Open MPI are you using? (e.g., v1.10.3, v2.1.0, git branch name and hash, etc.)

3a4a1f93 (HEAD -> master, origin/master, origin/HEAD) Merge pull request #6239 from hppritcha/topic/swat_orte_shutdown.... Pritchard  2 hours ago

Please describe the system on which you are running

  • Operating system/version: CentOS7
  • Computer hardware: x86_64
  • Network type: TCP

Details of the problem

 salloc -N4 -Ccauchy /home/bouteill/ompi/master.debug/bin/mpirun -mca btl tcp,self IMB-MPI1 pingpong
salloc: Granted job allocation 245932
salloc: Waiting for resource configuration
salloc: Nodes c[00-03] are ready for job
#------------------------------------------------------------
#    Intel(R) MPI Benchmarks 2019 Update 1, MPI-1 part
#------------------------------------------------------------
# Date                  : Fri Jan  4 18:52:06 2019
# Machine               : x86_64
# System                : Linux
# Release               : 3.10.0-514.26.1.el7.x86_64
# Version               : #1 SMP Wed Jun 28 15:10:01 CDT 2017
# MPI Version           : 3.1
# MPI Thread Environment:
[...]

# PingPong
[c01][[19494,1],8][../../../../../master/opal/mca/btl/tcp/btl_tcp_endpoint.c:630:mca_btl_tcp_endpoint_recv_connect_ack] received unexpected process identifier [[19494,1],13]

The same run with options -mca btl openib,vader,self completes successfully.
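
For anyone trying to narrow this down, a hedged debugging sketch (assuming the standard btl_base_verbose MCA verbosity level; the command mirrors the failing run above):

salloc -N4 -Ccauchy /home/bouteill/ompi/master.debug/bin/mpirun -mca btl tcp,self \
    -mca btl_base_verbose 100 IMB-MPI1 pingpong

The verbose output should show which interfaces each endpoint advertises and tries to pair, which is where the mismatched process identifier presumably originates.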

@abouteiller (Member Author)

Another defective behavior observed

salloc -N2 -Ccauchy /home/bouteill/ompi/master.debug/bin/mpirun -npernode 1 -display-map   -mca btl tcp,self IMB-MPI1 pingpong
salloc: Granted job allocation 245942
salloc: Waiting for resource configuration
salloc: Nodes c[00-01] are ready for job
 Data for JOB [8161,1] offset 0 Total slots allocated 16

 ========================   JOB MAP   ========================

 Data for node: c00     Num slots: 8    Max slots: 0    Num procs: 1
        Process OMPI jobid: [8161,1] App: 0 Process rank: 0 Bound: N/A

 Data for node: c01     Num slots: 8    Max slots: 0    Num procs: 1
        Process OMPI jobid: [8161,1] App: 0 Process rank: 1 Bound: N/A

 =============================================================
[...]
# PingPong
--------------------------------------------------------------------------
WARNING: Open MPI accepted a TCP connection from what appears to be a
another Open MPI process but cannot find a corresponding process
entry for that peer.

This attempted connection will be ignored; your MPI job may or may not
continue properly.

  Local host: c01
  PID:        22596
--------------------------------------------------------------------------

@abouteiller (Member Author)

The issue looks related to multi-rail setups and non-routable interfaces. Forcing the if_include list gets it back to working. This is a regression compared to 3-4 weeks ago, when it would 'just run'.

salloc -N2 -Ccauchy /home/bouteill/ompi/master.debug/bin/mpirun -npernode 1 -display-map   -mca btl_tcp_if_include eth0,ib1 -mca btl tcp,self IMB-MPI1 pingpong
# NORMAL
 salloc -N2 -Ccauchy /home/bouteill/ompi/master.debug/bin/mpirun -npernode 1 -display-map   -mca btl_tcp_if_include eth0,virbr0 -mca btl tcp,self IMB-MPI1 pingpong
# FAIL
 salloc -N2 -Ccauchy /home/bouteill/ompi/master.debug/bin/mpirun -npernode 1 -display-map   -mca btl_tcp_if_include virbr0 -mca btl tcp,self IMB-MPI1 pingpong
# FAIL (as expected, but not with the correct error message).
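
A hedged sketch for picking a usable if_include list (standard iproute2 commands; <peer-node-ip> is a placeholder for another node's address):

# On each node, list configured IPv4 interfaces, then check which one routes to a peer
ip -o -4 addr show
ip route get <peer-node-ip>

Interfaces that only exist locally (e.g. virbr0) are the ones to leave out of btl_tcp_if_include.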

@AnkBurov commented Feb 22, 2019

Any progress? With version 2.1.1:

mpirun --host server_name hostname

tcp_peer_recv_connect_ack: received unexpected process identifier [[0,37974],0] from [[22164,0],0]

@Akshay-Venkatesh (Contributor)

(Quoting @abouteiller's "Another defective behavior observed" report above.)

Hi @abouteiller, has there been any progress on fixing this bug? I just checked out master and I'm still facing issues with the OSU benchmarks (although the problems go away if I use pml UCX).

Do you happen to know of a workaround for the non-UCX case (--mca pml ^ucx)?

cc @jsquyres

@mkre commented Jun 25, 2019

I'm seeing an issue with BTL/tcp and Open MPI 4.0.1 when using three hosts with the following network interfaces:

  • host1: eno1, lo, virbr0
  • host2: eth0, lo, virbr0
  • host3: eno1, eth0, lo, vboxnet0, vboxnet1, virbr0

With Open MPI 3.1.3, I can make it work with -mca btl_tcp_if_exclude lo,virbr0. However, this doesn't help with Open MPI 4.0.1. Specifying an if_include list (as suggested by @abouteiller above) doesn't help either, because there is no common network interface available on all three hosts. I was able to get it working with Open MPI 4.0.1 by using btl_tcp_if_include 192.168.37.0/24, but that is somewhat cumbersome.
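
For illustration, a hedged sketch of the CIDR-based workaround (the host names are placeholders; the subnet is the one mentioned above):

mpirun --host host1,host2,host3 -np 3 \
    --mca btl tcp,self --mca btl_tcp_if_include 192.168.37.0/24 \
    hostname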

Could my observed behavior be related to this issue?

@abouteiller (Member Author) commented Jun 25, 2019

@Akshay-Venkatesh try selecting only routable interfaces with -mca btl_tcp_if_include ethX,ethY.
@mkre your IP-based solution looks correct.
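
For example, a hedged sketch for the non-UCX OSU case (eth0/eth1 stand in for whichever interfaces are actually routable between the nodes; the benchmark path is illustrative):

mpirun -np 2 --mca pml ^ucx --mca btl tcp,self \
    --mca btl_tcp_if_include eth0,eth1 ./osu_latency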

@mkre commented Jun 26, 2019

@abouteiller, our issue is that this is not a general workaround. We are packaging a binary distribution of Open MPI and would love to see it work on all kinds of systems without requiring manual user intervention such as special command-line flags. With Open MPI 3, ignoring virbr0 (and other interfaces like docker0) by default was a viable solution, but this no longer works with Open MPI 4.

Would you consider this a bug/regression in Open MPI 4?

On a side note, a while ago I opened a ticket regarding a similar issue with Open MPI 3 (#6377), and @bwbarrett chimed in there saying that BTL/tcp has known issues with interface pairing that would be addressed at some point.
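
One way a binary distribution could ship such a default (a hedged sketch, assuming the standard $prefix/etc/openmpi-mca-params.conf mechanism; the exclude list itself is only an example):

# $prefix/etc/openmpi-mca-params.conf -- read by every Open MPI process at startup
btl_tcp_if_exclude = lo,virbr0,docker0

Note that overriding btl_tcp_if_exclude replaces the built-in default, so the loopback interface has to be excluded explicitly again.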

@bwbarrett (Member)

@mkre #7134 should fix the pairing issue with TCP; can you see if master fixes your issue? We likely can't backport to 4.0, but would like to make sure we have it right on 5.0.

@mkre commented Feb 20, 2020

@bwbarrett, it seems like many things related to PMIx have changed since 4.x. It took me a while to get anything working at all in my heterogeneous TCP environment (playing with RPATHs, PMIX_PREFIX, PMIX_INSTALL_PREFIX, etc.), and now I'm getting a pmix_init:startup:internal-failure error. I'm not sure where to go from this point without putting too much effort into adjusting our build and startup scripts for Open MPI master.

@rhc54 (Contributor) commented Feb 21, 2020

I'm not sure what PMIx has to do with your situation. PMIx communications are purely node-local and never cross between nodes. Or are you saying you have different OMPI prefix locations on each node (or type of node)? I can see where that would be a problem on OMPI master right now.

@mkre commented Feb 24, 2020

@rhc54 I managed to get it working.

I am testing on three hosts, each having the following network interfaces:

  1. lo, eth0
  2. lo, eth0, docker0
  3. lo, eth0, docker0

@bwbarrett Looks like I still have to specify --mca btl_tcp_if_exclude docker0,127.0.0.0/8 to get it working properly. Without any exclude list, it will hang at startup, just as it did with previous Open MPI versions.
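
For completeness, a hedged sketch of the kind of invocation used here (host names and the benchmark binary are placeholders):

mpirun --host host1,host2,host3 -np 3 --mca btl tcp,self \
    --mca btl_tcp_if_exclude docker0,127.0.0.0/8 IMB-MPI1 pingpong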

@wckzhang (Contributor)

@mkre Do you know whether the netlink or the weighted reachable component is being used?

@rhc54 (Contributor) commented Feb 25, 2020

PRRTE has not been updated to use reachable, @wckzhang, and so the RTE startup could well be hanging in these scenarios.

@wckzhang (Contributor)

Hmm, I don't think I considered what would happen in that case. That might explain it.

@mkre commented Feb 26, 2020

@wckzhang, from the output of --mca reachable_base_verbose 100 it seems like the weighted component is being used.

@wckzhang (Contributor)

The weighted reachable component is less intelligent than the netlink component. It weights connections based on public IP, same network > public IP, different network > private IP, same network > private IP, different network, as well as on bandwidth. I believe it can report two interfaces on different subnets as reachable, just with a low weight. Try using netlink; you may have greater success with that component. (The netlink component uses libnl to determine connectivity between pairs of interfaces. It is functional for both IPv4 and IPv6. The main downside of this component is that it requires libnl, a Linux-only library that is not installed by default on many systems.)
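
If netlink support is compiled in, it can presumably be forced through the usual MCA component selection syntax, e.g. (host names are placeholders):

mpirun -np 2 --host host1,host2 --mca btl tcp,self \
    --mca reachable netlink --mca reachable_base_verbose 100 hostname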

@mkre commented Mar 3, 2020

@wckzhang, it seems like our Open MPI distribution does not support reachable/netlink. I have tried for some time now to configure Open MPI with support for this component, but I keep getting "checking if MCA component reachable:netlink can compile... no" in my ./configure output, even after adding my libnl3 installation directory via --with-libnl. Maybe someone can advise on what I'm doing wrong here?
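
For reference, a hedged sketch of the configure-and-check cycle being attempted (the libnl path is a placeholder; other configure options are elided):

./configure --with-libnl=/path/to/libnl3 ...
grep -A 20 "reachable:netlink" config.log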

@bwbarrett (Member)

@mkre can you include the output of configure and/or the generated config.log?

@mkre commented Mar 4, 2020

@bwbarrett, sure:
configure.txt
config.log
