[Feature Request] HDF5 parallel compression feature improvements #3348
jhendersonHDF started this conversation in Ideas
The following items are possible optimizations and improvements for the parallel compression feature that were noted at the time of the feature's implementation:
Implement a better check for whether particular dataset chunks are being completely overwritten, so that the library can avoid reading them from the HDF5 file first. Because coordinating this information between MPI processes is complex, the library currently recognizes a full overwrite only when a single process overwrites the entire chunk. A better check would determine when the combined selection of all MPI processes writing to a particular chunk covers the entire chunk.
Implement an API routine that lets the user hint to the library that no chunk in a write operation will be written to by more than one process. This would let the library skip a significant portion of the pre-I/O setup code that involves MPI overhead, since it would not need to determine chunk ownership for the write operation.
Implement a new internal function that accepts a vector of chunks for which to reallocate space in the file. The chunk index code currently reallocates file space for each chunk individually; the new function could let the file free space management code allocate space for the chunks more efficiently and would also save some function call overhead.
Implement a new internal function that accepts a vector of chunks for which to query information, such as the chunk address and any filter masks. The chunk index code currently queries this information for each chunk individually; the new function could reduce metadata I/O overhead by avoiding a separate lookup in the chunk index structure for every chunk.
Implement a new internal function that accepts a vector of chunks to re-insert into the chunk index structure. The chunk index code currently re-inserts each chunk individually; the new function could reduce metadata I/O by eliminating common metadata operations that are repeated for every chunk.
Add support for the parallel compression feature to h5perf.
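To make the first item concrete, here is a rough sketch of what a combined-coverage check could look like once every process's selection within a chunk is known in one place. It is not taken from the HDF5 source. The `block_sel_t` type and the `chunk_fully_covered` function are hypothetical names, the chunk is assumed to be 2-D, and each process is assumed to select a single rectangular block, whereas real HDF5 selections can be arbitrary. The MPI step that would gather or reduce the per-process information (for example, a bitwise-OR reduction of per-rank coverage bitmaps) is left out so the example stays self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical description of one process's selection inside a 2-D chunk:
 * a single rectangular block given by its start offset and extent. */
typedef struct {
    size_t start[2];
    size_t count[2];
} block_sel_t;

/* Returns true if the union of all blocks covers every element of a chunk
 * with dimensions chunk_dims[0] x chunk_dims[1]. Overlapping selections are
 * counted only once. On allocation failure the function returns false, which
 * conservatively means "read the chunk before modifying it". */
bool chunk_fully_covered(const block_sel_t *sels, size_t nsels,
                         const size_t chunk_dims[2])
{
    size_t nelems = chunk_dims[0] * chunk_dims[1];
    size_t ncovered = 0;
    unsigned char *covered;

    assert(sels != NULL || nsels == 0);

    if (nelems == 0)
        return false;

    /* One flag per chunk element, initially all zero (not covered). */
    covered = calloc(nelems, sizeof *covered);
    if (covered == NULL)
        return false;

    for (size_t s = 0; s < nsels; s++) {
        /* Clip each block to the chunk bounds. */
        size_t row_end = sels[s].start[0] + sels[s].count[0];
        size_t col_end = sels[s].start[1] + sels[s].count[1];
        if (row_end > chunk_dims[0])
            row_end = chunk_dims[0];
        if (col_end > chunk_dims[1])
            col_end = chunk_dims[1];

        for (size_t i = sels[s].start[0]; i < row_end; i++) {
            for (size_t j = sels[s].start[1]; j < col_end; j++) {
                size_t idx = i * chunk_dims[1] + j;
                if (!covered[idx]) {
                    covered[idx] = 1;
                    ncovered++;
                }
            }
        }
    }

    free(covered);
    return ncovered == nelems;
}
```

With a 4 x 4 chunk, two processes that each select one half of the rows would yield true, so the chunk would not need to be read before being rewritten. If one of them left out a column, the result would be false and the existing read-modify-write path would still apply. A per-element flag array is only the simplest way to handle overlapping selections; a production check would likely need something cheaper for large chunks.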