
Update AIORI HDF5 for better compatibility with VOL connectors #441

Merged (5 commits, Nov 8, 2022)

Conversation

jhendersonHDF (Contributor, author):

No description provided.

soumagne and others added 4 commits (October 26, 2022):

- Fix IOR to work with native and non-native VOL connectors
- Add support for HDF5_Access with H5Fis_accessible if available
- Clean up HDF5 backend and move MPI_Init call
- Add support for H5Fdelete
- Retrieve HDF5 dataset's file dataspace upon dataset setup rather than dataset data transfer
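The H5Fis_accessible-based access check named in the commit list could look roughly like the sketch below. This is a hedged illustration, not the PR's actual code: the function name `hdf5_file_accessible` is invented here, and the version guard assumes H5Fis_accessible first shipped in HDF5 1.12.

```c
#include <hdf5.h>

/* Hypothetical sketch: prefer H5Fis_accessible (VOL-aware, so it also
 * works with non-native connectors) when the library provides it, and
 * fall back to an open/close probe otherwise. */
static int hdf5_file_accessible(const char *name, hid_t fapl_id)
{
#if H5_VERSION_GE(1, 12, 0)
    htri_t ret = H5Fis_accessible(name, fapl_id);
    return ret > 0;
#else
    /* Fallback: try to open the file read-only. */
    hid_t fid = H5Fopen(name, H5F_ACC_RDONLY, fapl_id);
    if (fid < 0)
        return 0;
    H5Fclose(fid);
    return 1;
#endif
}
```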
src/ior.c (outdated):
@@ -196,14 +196,14 @@ int ior_main(int argc, char **argv)
out_logfile = stdout;
out_resultfile = stdout;

/* start the MPI code */
MPI_CHECK(MPI_Init(&argc, &argv), "cannot initialize MPI");
jhendersonHDF (author) commented:
With parallel-enabled HDF5 VOL connectors, we've found that some command-line options may initialize HDF5 before MPI has been initialized, causing the connectors to believe they're running in serial mode. I see the comment below about "check -h option from commandline without starting MPI", but it seems like it would be OK to start MPI first, even if all the user is doing is printing the help summary. I'm looking for feedback on whether this is acceptable or whether there might be a better approach.

JulianKunkel (Collaborator) commented:
Some MPI implementations fail if they aren't run inside a submitted batch job. That's a nuisance for users, as they would then have to run a parallel job with one process just to print the help. Isn't there a proper way inside the HDF5 VOL or plugin to fix this?

jhendersonHDF (author) replied:

I see; thanks for the info! I'll restore the original behavior and revisit.

> Isn't there a proper way inside the HDF5 VOL or plugin to fix this?

Unfortunately, there's an ordering problem right now: we assume that MPI has been initialized before HDF5. Some VOL connectors may attempt to bring up MPI themselves if it isn't up when they initialize, but that would break IOR here, since IOR would then fail when it calls MPI_Init. This ordering issue was initially a problem with the DAOS VOL connector, but I believe that occurred because the "get version" callback was invoked and initialized HDF5 and the connector before MPI was initialized. I don't know if that particular situation is still an issue; I would need to investigate.

JulianKunkel (Collaborator) replied:

I'm actually not quite sure how it happens that they try to initialize themselves. Is that triggered by H5get_libversion()? Which call does it?

The best strategy would be to initialize HDF5 in the plugin by implementing the AIORI initialize function in the ior_aiori_t hdf5_aiori struct.

A prospective strategy could be:

1. Initialize options (doesn't require bringing up any VOL connector and such).
2. Initialize the plugin => initialize anything necessary.

jhendersonHDF (author) replied:

Yes, I believe the H5get_libversion call was the routine bringing everything up the last time I tried this. I don't know if that's still the case, since it's been a while, but I carried that change over because I know it was needed previously. I can certainly try implementing the initialization callback to just call H5open, then see whether any other paths prematurely bring up the library and address those.

@JulianKunkel left a review comment:

The patch looks alright except for the MPI_Init situation.

@jhendersonHDF (author) commented:

Hi @JulianKunkel, if the PR looks okay in its current form, can it be merged? I believe that at this point the MPI_Init change may no longer be needed, and if I still run into issues related to HDF5 initialization (I haven't had much time to test that yet), I can open a separate PR for discussion.

@JulianKunkel JulianKunkel merged commit 970e25c into hpc:main Nov 8, 2022
@JulianKunkel commented:

Yep, great, thanks for the reminder; I didn't see the update in the PR before.
