Other ZFP features to consider supporting #32
Comments
Hi Mark, I have a branch that adds multi-threaded compression to the plugin. It relies on the OpenMP parallel execution policy of the ZFP library. |
Cool. Can you send me the link(s) to the branch? I am pretty much a thread newbie. I know MPI well, but don't have much experience with threads or OpenMP. I am guessing one parameter to control needs to be the number of threads the caller wants the plugin to be allowed to use. Another may be a way to avoid thread creation/destruction overheads across multiple instantiations of the plugin for different datasets, maybe by letting the caller give the plugin the specific, already created, threads to use? I dunno. |
Hi Mark,
In the current implementation, the number of spawned threads is controlled by the OMP_NUM_THREADS environment variable, but the number of threads actually involved in the compression is reduced until each thread has a long enough chunk to work on. This choice, which can be modified, prevents the overhead of using too many threads on buffers that are too small. The parameter min_stream_size_per_thread is currently set to 256K bytes, but it can be changed and set during plugin initialization.
As far as I know, HDF5 does not really support multi-threaded operations (if compiled with thread-safe support, it just serializes operations called from different threads). If I understand your point correctly, multiple instantiations of the plugin will probably be made by different MPI processes or, if in a multi-threaded region, serialized by the library itself. |
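For readers following along, the thread-capping rule described above can be sketched roughly as below. This is only an illustration, not the code in the branch: the function name `effective_threads` is hypothetical, and "256K bytes" is assumed here to mean 256 * 1024 bytes.

```c
#include <stddef.h>

/* Assumed reading of the "256K bytes" default mentioned in the thread. */
#define MIN_STREAM_SIZE_PER_THREAD ((size_t)256 * 1024)

/*
 * Hypothetical helper: how many threads to actually use for a buffer of
 * `stream_size` bytes when the caller allows up to `requested` threads
 * (e.g. the value taken from OMP_NUM_THREADS).
 * The result is always in the range [1, max(requested, 1)].
 */
size_t effective_threads(size_t requested, size_t stream_size,
                         size_t min_size_per_thread)
{
    size_t by_size;

    if (requested == 0)
        requested = 1;               /* always run at least one thread */
    if (min_size_per_thread == 0)
        return requested;            /* no lower bound configured */

    /* Largest thread count that still gives each thread a full chunk. */
    by_size = stream_size / min_size_per_thread;
    if (by_size < 1)
        by_size = 1;

    return by_size < requested ? by_size : requested;
}
```

With these assumptions, a 1 MiB buffer and 8 allowed threads would use 4 threads, while a 100 KiB buffer would fall back to a single thread.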
Great. Thanks @lferraro. Just took a peek. Looks like you currently have implemented the compression (write) only. Is that correct? Have you tested/played with it much? Can you propose a test mode we can add to H5Z-ZFP's test suite? If I am jumping the gun, lemme know. Just interested in understanding the work ahead. |
@markcmiller86 zfp does not (yet) support OpenMP decompression. For other than fixed-rate mode, this will require encoding additional information on where in the variable-rate stream blocks reside. We are actively working on parallel decompression, but we likely won't see support for this for another year. |
Thanks @lindstro for explanation. Regarding plugin properties to control threading. From the code, it looks like there are potentially two or three new controls...
A challenge here is that up to this point, the filter controls have been mutually exclusive. So, the generic interface which uses CPP macros to set We could define
which will insert values for For the real properties, this is more easily handled by extending the definition of |
As @lindstro clarified, decompression can be performed in parallel with fixed rate mode only.
I developed this branch last week. I've tested this solution with some of our applications in our HPC environment and saw scaling speedups that are good for our needs. It was during those tests that I decided to go with the minimum stream size per thread control, which prevents misuse of the feature in an easy, automated way.
Yes, sure. I can add a regression test for this in my next commit. The final compressed stream is independent of execution policy or the number of used threads. We can enforce some tests for this, reporting also achieved speedups.
The next step is the implementation of the execution policy control in the filter setup, for both the generic and properties interfaces. The execution policy (serial, OpenMP or CUDA) is independent of the selected compression mode. This should be stressed and enforced in some way by the plugin initialization API. What about something like the following for the generic interface ...
H5Pset_zfp_exec_serial(size_t cd_nelmts, unsigned int *cd_vals); // the default, no required parameter
H5Pset_zfp_exec_openmp(size_t num_threads, size_t min_size_per_thread, size_t cd_nelmts, unsigned int *cd_vals);
H5Pset_zfp_exec_cuda(size_t cd_nelmts, unsigned int *cd_vals); // no required parameter
We can safely drop the chunk size and scheduling control execution parameters for the moment, since I consider them too fine-grained for an HDF5 user in this first release. We can add support for them in the future if users ask for it. |
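One way to picture what an OpenMP variant of the generic interface might do is to append the execution settings to the `cd_vals` array handed to the filter. The sketch below is purely illustrative: the slot layout, the `exec_policy` codes, the `pack_exec_openmp` name, the extra `cap` argument and passing `cd_nelmts` by pointer are all assumptions made for this example, not the H5Z-ZFP layout or the signatures proposed above.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical execution-policy codes for this sketch; not H5Z-ZFP constants. */
enum exec_policy { EXEC_SERIAL = 0, EXEC_OPENMP = 1, EXEC_CUDA = 2 };

/* Number of cd_vals slots this sketch appends (policy, threads, min size). */
#define EXEC_NVALS 3

/*
 * Append OpenMP execution settings after the *cd_nelmts values already in
 * cd_vals, which has room for `cap` entries in total.
 * Returns 0 on success; returns -1 and leaves *cd_nelmts unchanged if there
 * is not enough room or a value does not fit in an unsigned int.
 */
int pack_exec_openmp(size_t num_threads, size_t min_size_per_thread,
                     size_t cap, size_t *cd_nelmts, unsigned int *cd_vals)
{
    size_t n = *cd_nelmts;

    if (n > cap || cap - n < EXEC_NVALS)
        return -1; /* not enough room left in cd_vals */
    if (num_threads > UINT_MAX || min_size_per_thread > UINT_MAX)
        return -1; /* value would be truncated */

    cd_vals[n + 0] = (unsigned int)EXEC_OPENMP;
    cd_vals[n + 1] = (unsigned int)num_threads;
    cd_vals[n + 2] = (unsigned int)min_size_per_thread;
    *cd_nelmts = n + EXEC_NVALS;
    return 0;
}
```

Keeping the execution settings in their own slots, separate from the compression-mode values, mirrors the point made above that the execution policy is independent of the compression mode.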
Since this makes sense only for rate mode, I wonder if the relevant params here should be folded into existing rate setting interface or a new interface for rate defined like so...
With the above approach, we can make the macro a varargs macro and detect the old interface and new interface users because the second arg will either be a small positive number (for old interface) or a large positive number (for new interface). Alternatively, I'd just define a new interface...
|
@markcmiller86 Just to be sure we're on the same page, parallel (de)compression makes sense for all compression modes, although currently not all combinations are supported. Thanks to @LennartNoordsij, we now have an OpenMP implementation of variable-rate decompression also, but this mode requires storage of an additional (possibly very small) index that records offsets into the stream. You and I should discuss how to record this additional metadata in H5Z-ZFP without breaking backward compatibility. I wanted to point this out so that the design you and @lferraro agree on does not fundamentally limit parallel (de)compression support to fixed-rate mode. In particular, zfp 0.5.5 already supports OpenMP compression (but not decompression) for all compression modes. |
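To illustrate why such an index helps, here is a minimal sketch of an offset table built from per-chunk compressed sizes. It is not zfp's actual index format: the function name `build_offset_index`, the chunk granularity and the use of `size_t` units are all assumptions for the example. The idea is only that, once each chunk's starting offset is recorded, independent threads can each seek to their own chunk in a variable-rate stream.

```c
#include <stddef.h>

/*
 * Hypothetical helper: fill offsets[0..n] with the exclusive prefix sum of
 * sizes[0..n-1]. offsets[i] is where chunk i starts in the stream and
 * offsets[n] is the total stream length. `offsets` must hold n + 1 entries.
 */
void build_offset_index(const size_t *sizes, size_t n, size_t *offsets)
{
    size_t pos = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        offsets[i] = pos;   /* chunk i begins where chunk i-1 ended */
        pos += sizes[i];
    }
    offsets[n] = pos;       /* end of the last chunk */
}
```

In fixed-rate mode every chunk has the same size, so these offsets can be computed on the fly, which is why no stored index is needed in that case.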
Ah, ok. Now I understand. So, these new parameters really do need to be wholly split out from other parts of the interface for setting compression params. |
Exactly. That's why I suggested adding interfaces that are independent of the compression mode.
@lindstro do you have a working branch with this feature? When do you think this new feature will be released? What is required to implement in the HDF5-ZFP plugin to support it? @markcmiller86 what do you think of my suggested global interface to select the execution mode in plugin setup? |
@lferraro We do have such an experimental branch, though we're still iterating with @LennartNoordsij on the API. It's possible/likely that the API will take a different form when this capability is eventually released. Supporting block indexing in the zfp command-line tool will require underlying changes to the zfp compressed format. It would make sense to bundle those changes with others that we have planned, but those will take at least another year to implement. That said, I think we have some flexibility in how we incorporate the changes necessary for parallel decompression in H5Z-ZFP, so that could likely happen much sooner. |
@markcmiller86 do you agree with the following APIs to select the execution mode?
Global interface for dynamically loaded HDF5 plugin:
H5Pset_zfp_exec_serial_cdata(size_t cd_nelmts, unsigned int *cd_vals); // the default, no required parameter
H5Pset_zfp_exec_openmp_cdata(size_t num_threads, size_t min_size_per_thread, size_t cd_nelmts, unsigned int *cd_vals);
H5Pset_zfp_exec_cuda_cdata(size_t cd_nelmts, unsigned int *cd_vals); // no required parameter
Properties interface for dataset creation property list:
herr_t H5Pset_zfp_exec_serial(hid_t dcpl_id); // the default, no required parameter
herr_t H5Pset_zfp_exec_openmp(size_t num_threads, size_t min_size_per_thread, hid_t dcpl_id);
herr_t H5Pset_zfp_exec_cuda(hid_t dcpl_id); // no required parameter
How can we proceed? Do you want me to implement these interfaces in my branch or do you prefer to work on it? |
Yes
Sure |
@lferraro apologies for letting this languish. I guess I either didn't close the loop here or didn't continue tracking work on your branch. Can you update me as to status at this point? |
Dear Mark, I'm really sorry. I didn't make any progress on my branch because of the amount of work I had in the last few months. Currently, I'm about to finish, so I hope I can get back to the branch and finalize it. If I am right, in order to complete the work, we need to implement the APIs to select the execution mode and add some regression tests for the multi-threaded plugin. I think I can start working on it from the 20th of April, but if you are in a hurry or want to implement this yourself, of course you can go ahead.
|
@lferraro ... absolutely no worries! I just wanted to check in and see if you still plan/want to continue work on this. It sounds like you do and I fully welcome the help 😄 . I will try to remember to touch base in another few weeks. Take care and stay safe. |