Check to see if our HDF5 files have checksums turned on #1525
This came up at the "HDF formats" discussion. HDF does not have a top-level checksum à la frame files, but it does have a per-dataset checksum feature (fletcher32): http://docs.h5py.org/en/latest/high/dataset.html#fletcher32-filter We should probably just turn this on.
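For reference, a minimal sketch of what enabling the fletcher32 filter looks like in h5py (the file and dataset names here are made up for illustration). h5py verifies the per-chunk checksum automatically on every read and raises an error if the stored bytes are corrupted:

```python
import h5py
import numpy as np

# Write a dataset with the Fletcher-32 checksum filter enabled.
# Requesting any filter implies chunked storage, which h5py sets up
# automatically when no explicit chunk shape is given.
with h5py.File("fletcher_example.hdf", "w") as f:
    f.create_dataset("snr", data=np.arange(1000, dtype=np.float64),
                     fletcher32=True)

with h5py.File("fletcher_example.hdf", "r") as f:
    dset = f["snr"]
    assert dset.fletcher32  # the filter is recorded in the dataset properties
    data = dset[:]          # checksum verified transparently on read
```

Reads of an intact file are unchanged from the caller's point of view; only corrupted data turns into a read error instead of silently wrong values.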
BTW, if someone wants to turn this on globally, the best way is to add a method to this class here (a wrapper of the h5py.File class):
https://github.com/ligo-cbc/pycbc/blob/master/pycbc/io/hdf.py#L24
and simply modify `__setitem__`/`__getitem__`. We've done a similar thing in the statmap code, which could be moved here to also enable a certain compression level by default. Then everything just uses the interface as normal, without having to worry about manually setting a lot of these things (which makes for a worse user interface). We can then transition the file-open calls to this class as we find the time.
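A rough sketch of the wrapper idea described above, assuming a subclass of `h5py.File`. The class name `ChecksummedFile` and the chosen defaults are illustrative only; the actual PyCBC class lives in `pycbc/io/hdf.py` and may differ:

```python
import h5py
import numpy as np

class ChecksummedFile(h5py.File):
    """Sketch of the proposed wrapper: datasets created through this
    class get fletcher32 checksums and gzip compression by default."""

    def create_dataset(self, name, **kwargs):
        data = kwargs.get("data")
        if data is not None:
            data = np.asarray(data)
            kwargs["data"] = data
        shape = kwargs.get("shape", data.shape if data is not None else None)
        # Filters require chunked storage, which scalar datasets cannot
        # use, so only apply the defaults to non-scalar data.
        if shape:
            kwargs.setdefault("fletcher32", True)
            kwargs.setdefault("compression", "gzip")
        return super().create_dataset(name, **kwargs)

    def __setitem__(self, name, value):
        # Route `f[name] = value` through create_dataset so the
        # defaults above apply to dict-style assignment too.
        self.create_dataset(name, data=value)

# Callers use the wrapper exactly like h5py.File:
with ChecksummedFile("wrapper_example.hdf", "w") as f:
    f["triggers"] = np.arange(100.0)

with h5py.File("wrapper_example.hdf", "r") as f:
    assert f["triggers"].fletcher32
    assert f["triggers"].compression == "gzip"
```

Because callers go through the normal `h5py.File` interface, no call sites need to change beyond swapping the class used to open files.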
…On Thu, Mar 16, 2017 at 10:22 PM, Ian Harry wrote: [quoted message above]
--
Dr. Alexander Nitz
Max Planck Institute for Gravitational Physics (Albert Einstein Institute)
Callinstrasse 38
D-30167 Hannover, Germany
Tel: +49 511 762-17097
Did anyone give this a try, and if so did it "just work"?
@ahnitz @spxiwh @titodalcanton did
We did not implement the "per dataset checking feature" within PyCBC.

However, Pegasus implemented checksum testing on all data files (similar to our own checks on frame files). After we added an option to stop it testing symlinked files, this has been working nicely. We often disable this for development runs, so we must remember to re-enable it for production runs! This doesn't help if the files are used outside of Pegasus, though, unless one also extracts the checksums for all files (which is possible).

We might consider enabling this feature, but it would not be ideal to have to enable it in every HDF call, as there are a lot of these throughout PyCBC. Some sort of environment variable to make this the default would be much easier.

@GarethCabournDavies A point of discussion for the face-to-face tomorrow(?)
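The environment-variable idea could look something like the sketch below. The variable name `PYCBC_HDF_CHECKSUM` and the helper `create_checked_dataset` are purely hypothetical, nothing in PyCBC defines them; this only illustrates reading a switch once and applying it as a default:

```python
import os
import h5py
import numpy as np

def create_checked_dataset(group, name, data, **kwargs):
    """Create a dataset, turning on fletcher32 when the (hypothetical)
    PYCBC_HDF_CHECKSUM environment variable is set to "1"."""
    if os.environ.get("PYCBC_HDF_CHECKSUM", "") == "1":
        kwargs.setdefault("fletcher32", True)
    return group.create_dataset(name, data=np.asarray(data), **kwargs)

# Simulate a production profile where the switch is enabled:
os.environ["PYCBC_HDF_CHECKSUM"] = "1"

with h5py.File("env_example.hdf", "w") as f:
    dset = create_checked_dataset(f, "times", np.arange(10.0))
    assert dset.fletcher32
```

Since the check happens inside the helper (or, equivalently, inside the wrapper class's `create_dataset`), every call site gets the behaviour for free, and development runs can leave the variable unset.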
Are the Pegasus checksums maintained in a database, or calculated on the fly before/after each file transfer? Note that in either case I think it would be worthwhile to also add internal checksums to the critical files. Thanks.
For enabling the h5py dataset checksum: could we insist that all calls go through the wrapper class mentioned above? That could add the environment variable check into the wrapper itself.
I also would need to check that e.g.
Can be closed by #4831