OpenMPI hangs on MPI_Test with InfiniBand and high message rate #4863
Comments
@Noxoomo, could you include your command line, please? How many threads?
For the simplified example I just run /usr/local/mpi/bin/mpirun --prefix /usr/local/mpi/ --bind-to none --host host1,host2 binary. In my application I use 1 thread per GPU device plus 1 thread on each machine to route commands (I tried to serialize all calls to MPI functions via a global mutex, but this only reduced the frequency of the freezes). The issue was reproduced even with 1 active device per machine (so 2 hosts, each with 2 working threads).
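For reference, a minimal sketch (my own illustration, not the reporter's code; the wrapper names are made up) of what "serialize all MPI calls via a global mutex" can look like:

```c
#include <mpi.h>
#include <pthread.h>

/* One global lock guarding every MPI call made by the worker threads. */
static pthread_mutex_t mpi_lock = PTHREAD_MUTEX_INITIALIZER;

/* Threads call MPI only through wrappers like these, so at most one
 * thread is ever inside the MPI library at a time. */
static int locked_isend(const void *buf, int count, MPI_Datatype type,
                        int dest, int tag, MPI_Comm comm, MPI_Request *req)
{
    pthread_mutex_lock(&mpi_lock);
    int rc = MPI_Isend(buf, count, type, dest, tag, comm, req);
    pthread_mutex_unlock(&mpi_lock);
    return rc;
}

static int locked_test(MPI_Request *req, int *done, MPI_Status *status)
{
    pthread_mutex_lock(&mpi_lock);
    int rc = MPI_Test(req, done, status);
    pthread_mutex_unlock(&mpi_lock);
    return rc;
}
```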
This might be because the openib BTL doesn't handle high injection rates very well. @Noxoomo, can you try this?
That does not help; it still hangs. The simple example from the first post hangs the same way.
This one does not hang on the example from the first post, but it might only work well for the constants selected for the test, so I'll check it in more detail later. UPD: I built the application to check from the wrong revision; I will provide results about it later.
@jladd-mlnx, the attached example code does not have any GPU-specific buffers. @Noxoomo, can you give it a try with the UCX PML?
@bureddy, @jladd-mlnx Today I built OpenMPI from fresh master with the latest commits (merged PR #4852) and I currently can't reproduce the hang. I'll run more stress tests later and will write about the results.
I ran more tests with merged PR #4852:
I currently can't run my application with OpenMPI + UCX; I get an exception with this stack trace: I assume this is caused by some compatibility issues with compilers and/or system libraries, but I haven't figured out what's going wrong yet.
Thank you very much for reporting back.
You should not have to move all comms to a single thread. This might still be the same problem that #4852 tries to fix. We would like to understand more about this hang, but it seems very difficult to track (appearing only after 10 hours of running). When it hangs, can you get the stack trace for us?
AFAIK, OpenMPI + UCX doesn't support MPI_THREAD_MULTIPLE (at least with the default compile instructions). BTW, a separate thread for all MPI calls is good for my application design; I just didn't have time to implement it before.
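As a side note, a minimal sketch of how a program can verify which thread level the MPI library actually provides, since a requested MPI_THREAD_MULTIPLE may be silently downgraded:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided = MPI_THREAD_SINGLE;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE) {
        /* The library downgraded the request: concurrent MPI calls from
         * several threads are not safe with this build/configuration. */
        printf("MPI_THREAD_MULTIPLE not provided (got level %d)\n", provided);
    }
    MPI_Finalize();
    return 0;
}
```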
OK, I'll provide a stack trace on the next run. It'll be at the end of the week; GPUs are a scarce resource and I can't take them for too much time during working days.
Just one more thing: I ran benchmarks with merged PR #4852 before I changed the logic to route comms through a single thread; there the hangs were also rare (appearing only after several hours).
Just for your information: I haven't forgotten about the issue, but I currently can't use the machines for long enough to reproduce the problem.
Thank you for the update.
Closing due to no update. Can be reopened.
Background information
I have a GPU-based application which uses MPI to transfer messages and data between several nodes.
It works fine on 1 GB/s networks, but gets stuck in a deadlock if I switch to InfiniBand.
System description, OpenMPI details, etc.
I've reproduced the issue on two clusters with InfiniBand networks. I've used several OpenMPI versions built from source for all runs (see details below).
The first cluster consists of dual-socket Intel servers with Ubuntu 12.04 and NVIDIA GPUs, with Mellanox Technologies MT27500 Family [ConnectX-3] adapters for InfiniBand. On this cluster I've tried OpenMPI 2.1.2 and OpenMPI 3.0.
The second one has dual-socket Intel servers with Ubuntu 16.04 and NVIDIA GPUs, with Mellanox Technologies MT27700 Family [ConnectX-4] adapters for InfiniBand. This cluster is IPv6-only, so on it OpenMPI was built from master with several patches to fix IPv6 compatibility.
All MPI builds were made with CUDA support.
I can provide more information on request.
Details of the problem
My application sends a lot of small messages. All sends and receives are asynchronous (MPI_Isend, MPI_Irecv), and I use MPI_Test in a loop to check the operations for completion. On InfiniBand networks, MPI_Test would not "return true" for an MPI_Isend whose message had already been received by the other host (communications in my application usually use unique tags, so I dumped every receive and send request and every successful call to MPI_Test, and from the logs I saw that some MPI_Isend messages were received but the sender was never notified of their completion). As a result, my application waits forever.
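For illustration only (this is not the attached reproducer), a minimal sketch of the Isend/Test polling pattern described above; the function name and message size are placeholders:

```c
#include <mpi.h>

/* Post a small non-blocking send, then poll it with MPI_Test until the
 * library reports completion. The reported hang is MPI_Test never
 * setting the flag even though the peer has received the message. */
static void send_and_poll(const char *buf, int len, int dest, int tag,
                          MPI_Comm comm)
{
    MPI_Request req;
    int done = 0;
    MPI_Isend(buf, len, MPI_CHAR, dest, tag, comm, &req);
    while (!done) {
        MPI_Test(&req, &done, MPI_STATUS_IGNORE);
        /* the application does other work here */
    }
}
```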
I checked the same application code with MVAPICH2 and everything works fine.
My application is complex, so I've reproduced the same issue with far simpler code, attached below:
Backtrace: