Add fence_nb to flux pmix #8380
The opal_common_ucx_del_proc call fails if the pmix component does not implement fence_nb.

Signed-off-by: Sami Ilvonen <sami.ilvonen@csc.fi>
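The failure comes from calling a function pointer that a component may leave unset. Below is a minimal sketch of a caller-side guard that is an alternative to the fix in this PR, not the PR's change. It assumes a hypothetical, simplified module table (`pmix_module_sketch_t`) with `fence` and `fence_nb` pointers; the real `opal_pmix` signatures take more arguments. `progress_stub` stands in for the UCX worker progress call the real code spins on.

```c
#include <stddef.h>

#define SKETCH_SUCCESS 0
#define SKETCH_ERR_NOT_SUPPORTED (-2) /* hypothetical error code */

typedef void (*op_cbfunc_t)(int status, void *cbdata);

/* Hypothetical, simplified module table: a component may leave
 * fence_nb NULL if it only implements the blocking fence. */
typedef struct {
    int (*fence)(int collect_data);
    int (*fence_nb)(int collect_data, op_cbfunc_t cbfunc, void *cbdata);
} pmix_module_sketch_t;

/* Completion callback: flips the flag the caller is waiting on. */
static void fence_complete_cb(int status, void *cbdata)
{
    (void)status;
    *(volatile int *)cbdata = 1;
}

/* Stand-in for the worker progress call made while waiting. */
static void progress_stub(void) { }

static int guarded_fence(const pmix_module_sketch_t *mod)
{
    volatile int fenced = 0;
    int rc;

    if (NULL == mod->fence_nb) {
        /* No nonblocking fence: fall back to the blocking one instead
         * of dereferencing a NULL function pointer. */
        return (NULL != mod->fence) ? mod->fence(0) : SKETCH_ERR_NOT_SUPPORTED;
    }

    rc = mod->fence_nb(0, fence_complete_cb, (void *)&fenced);
    if (SKETCH_SUCCESS != rc) {
        return rc; /* call was rejected; the callback will not fire */
    }
    while (!fenced) {
        progress_stub();
    }
    return SKETCH_SUCCESS;
}
```

One trade-off of the blocking fallback: it does not progress the UCX worker while waiting, which the nonblocking path presumably exists to allow.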
Can one of the admins verify this patch?
ok to test
Do you also need to apply this to
I don't have a test setup for v4.1.x yet, but opal_common_ucx_mca_pmix_fence seems to be identical in v4.1.x, so the fix is probably needed there too.
This seems ok. It does make the nonblocking call (… It might be useful to fix …
Is this needed on master? If so, I'd prefer if it went to master first, and then to v4.0.x and v4.1.x. Thanks!
There is no flux plugin on master, Jeff: it's strictly PMIx.
Problem: the OpenMPI packaged with CentOS 7 does not include the PMI plugin, which is required to boot under Flux. The OpenMPI version packaged with CentOS 8 (4.0.5) has a bug that causes a segfault on finalization, which is fixed in the next patch version. PR that fixes the bug: open-mpi/ompi#8380.

Solution: for CentOS 7, hand-compile the same version of OpenMPI (1.10), but with the `--with-pmi` flag. For CentOS 8, hand-compile the OpenMPI version that includes the patch (4.0.6). Since this is an involved process, add a helper script that will also make adding other MPIs much easier.
Flux jobs segfault in MPI_Finalize when the UCX PML is used. It seems that opal_common_ucx_mca_pmix_fence calls opal_pmix.fence_nb without checking whether fence_nb is implemented. This patch adds a "fake" fence_nb call that uses PMI_Barrier, in the same way as the fence call.
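The idea of a "fake" nonblocking fence can be sketched as: run the blocking barrier, then report completion through the callback before returning. This is a minimal illustration under stated assumptions, not the patch itself. The simplified signatures and the names `sketch_fence`, `sketch_fence_nb`, and `stand_in_pmi_barrier` are hypothetical; `stand_in_pmi_barrier` only mimics PMI-1's `PMI_Barrier()` returning 0 (`PMI_SUCCESS`) so the example compiles on its own.

```c
#include <stddef.h>

#define SKETCH_SUCCESS 0
#define SKETCH_ERROR (-1)

typedef void (*op_cbfunc_t)(int status, void *cbdata);

/* Hypothetical stand-in for PMI-1's PMI_Barrier(), which returns
 * PMI_SUCCESS (0) on success; here it only counts calls. */
static int barrier_calls = 0;
static int stand_in_pmi_barrier(void)
{
    barrier_calls++;
    return 0;
}

/* Blocking fence as a plain barrier (collect_data ignored in this sketch). */
static int sketch_fence(int collect_data)
{
    (void)collect_data;
    return (0 == stand_in_pmi_barrier()) ? SKETCH_SUCCESS : SKETCH_ERROR;
}

/* "Fake" nonblocking fence: complete the barrier synchronously, then
 * invoke the callback before returning. On failure the callback is not
 * invoked, so the caller must not wait for it. */
static int sketch_fence_nb(int collect_data, op_cbfunc_t cbfunc, void *cbdata)
{
    int rc = sketch_fence(collect_data);
    if (SKETCH_SUCCESS != rc) {
        return rc;
    }
    if (NULL != cbfunc) {
        cbfunc(rc, cbdata);
    }
    return SKETCH_SUCCESS;
}
```

Because the callback fires before `sketch_fence_nb` returns, a caller that sets a flag in the callback and then spins on it (as the UCX fence helper does while progressing its worker) finds the flag already set and exits the loop immediately.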