Potential client crash at MPI_Finalize() time #777

Closed
wangvsa opened this issue Jun 21, 2023 · 1 comment


@wangvsa
Collaborator

wangvsa commented Jun 21, 2023

When an MPI attribute is created with an MPI_Comm_delete_attr_function callback, MPI_Finalize() invokes that callback so the user can clean up resources.
HDF5 uses such a callback to write cached data to the file at MPI_Finalize() time.

Example found from HDF5 source code:

MPI_Comm_create_keyval(MPI_COMM_NULL_COPY_FN, (MPI_Comm_delete_attr_function *)H5_mpi_delete_cb, &key_val, NULL);
MPI_Comm_set_attr(MPI_COMM_SELF, key_val, NULL);
MPI_Comm_free_keyval(&key_val);

However, when we intercept the MPI_Finalize() call, we unmount the client before calling PMPI_Finalize(). HDF5's callback then tries to write to files that no longer exist, which may crash the client.
I noticed this issue when running qmcpack (it uses HDF5).

Fix: call PMPI_Finalize() before unmounting the client, provided unifyfs_unmount() does not use MPI.
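
To illustrate the ordering problem without depending on MPI or UnifyFS, here is a minimal model of both orderings. Every name in it (`model_delete_cb`, `model_pmpi_finalize`, `model_unmount`, `finalize_unmount_first`, `finalize_pmpi_first`) is a hypothetical stand-in, not a real MPI, HDF5, or UnifyFS function, and it assumes the unmount step itself needs no MPI calls.

```c
/* Hypothetical stand-ins only: nothing here is a real MPI, HDF5,
 * or UnifyFS call. The model tracks whether the client is mounted
 * and counts writes attempted after it has been unmounted. */
static int client_mounted = 1;  /* models the client mount state */
static int failed_writes = 0;   /* writes attempted after unmount */

/* Models HDF5's delete callback, which flushes cached data. */
static void model_delete_cb(void)
{
    if (!client_mounted) {
        failed_writes++;        /* the real client may crash here */
        return;
    }
    /* The flush succeeds while the client is still mounted. */
}

/* Models PMPI_Finalize(): delete callbacks run inside finalize. */
static void model_pmpi_finalize(void)
{
    model_delete_cb();
}

/* Models unifyfs_unmount(), assumed here not to use MPI. */
static void model_unmount(void)
{
    client_mounted = 0;
}

/* Restores the initial state so each ordering starts fresh. */
static void model_reset(void)
{
    client_mounted = 1;
    failed_writes = 0;
}

/* Current ordering: unmount, then finalize. Returns failed writes. */
int finalize_unmount_first(void)
{
    model_reset();
    model_unmount();
    model_pmpi_finalize();
    return failed_writes;
}

/* Proposed ordering: finalize, then unmount. Returns failed writes. */
int finalize_pmpi_first(void)
{
    model_reset();
    model_pmpi_finalize();
    model_unmount();
    return failed_writes;
}
```

In this model, finalize_unmount_first() reports one failed write because the callback runs after the unmount, while finalize_pmpi_first() reports none.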

@adammoody
Collaborator

Great debugging, @wangvsa! Your proposed fix also sounds good to me.
