Can not return in multi-node MPI applications #429
Comments
Hello @parrotsky, this is unusual: Caliper shouldn't affect MPI progress when going from intra- to inter-node communication. Does this only happen when Caliper is enabled? It's possible the issue is in the underlying program. In particular, pay close attention to the order of communications between the processes and make sure you're not stuck in a blocking …
Hi @daboehme, thanks for your reply. …
Hi, first I would like to thank the contributors for providing such an elegant and easy-to-use library for profiling MPI programs.
My problem:
I built an MPI cluster on a LAN with up to 8 devices (Linux, Ubuntu 20.04), following the MPI tutorial.
I want to use Caliper to profile my applications across multiple devices. Before doing that, I wrote a simple hello world program to test whether it works.
The code is as follows:
The program works perfectly with multiple threads on a single device.
When I test it across two devices (nodes), the program does not return normally and gets stuck somewhere.
Has anybody encountered the same issue or figured out where the bug is?
Thanks a lot for answering.
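
For anyone debugging a similar hang: the reply in this thread suggests checking the order of communications between processes. The sketch below is only an illustration of that idea. It is not the program from this issue and it does not use real MPI or Caliper. It simulates two ranks with Python threads and a hypothetical rendezvous `Channel` class (an assumption made for this example) whose `send` blocks until the peer receives. In real MPI, a standard-mode send may or may not buffer a message, so a pattern like this can appear to work in one setup and hang in another.

```python
import queue
import threading


class Channel:
    """Hypothetical rendezvous channel: send() blocks until recv() takes the message."""

    def __init__(self):
        self._data = queue.Queue(maxsize=1)
        self._ack = queue.Queue(maxsize=1)

    def send(self, msg, timeout):
        self._data.put(msg)
        # Wait for the receiver's acknowledgement; raises queue.Empty on timeout.
        self._ack.get(timeout=timeout)

    def recv(self, timeout):
        msg = self._data.get(timeout=timeout)
        self._ack.put(None)
        return msg


def exchange(send_first_on, timeout=0.5):
    """Run a two-rank exchange. Ranks listed in send_first_on send before receiving."""
    inbox = [Channel(), Channel()]  # inbox[r] carries messages addressed to rank r
    results = [None, None]

    def rank(me):
        peer = 1 - me
        try:
            if me in send_first_on:
                inbox[peer].send(f"hello from {me}", timeout)
                results[me] = inbox[me].recv(timeout)
            else:
                results[me] = inbox[me].recv(timeout)
                inbox[peer].send(f"hello from {me}", timeout)
        except queue.Empty:
            # The timeout stands in for a hang so the demonstration terminates.
            results[me] = "stuck"

    threads = [threading.Thread(target=rank, args=(r,)) for r in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    # Both ranks send first: each waits for a receive that is never posted.
    print(exchange({0, 1}))
    # Rank 0 sends first, rank 1 receives first: the exchange completes.
    print(exchange({0}))
```

When both ranks send first, each one waits on a receive that the other never reaches; when only one rank sends first, the exchange completes. In an actual MPI program, the usual ways to avoid this are to order sends and receives consistently or to use a combined call such as `MPI_Sendrecv`.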