Test suite errors on s390x #95
It's not just …

Is the zfp library built with …?
@lindstro I think if the problem was compilation withOUT …, the check at Lines 155 to 159 in e81aec1 would have caught it, and we would see an error message to that effect in the HDF5-DIAG call trace output.
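For context on the check being quoted: zfp exposes its bit stream word size at run time, and the flag under discussion controls exactly that. Below is a minimal sketch of this kind of check, assuming zfp's public `stream_word_bits` global from bitstream.h (included by zfp.h); it is not the exact code at the cited lines.

```c
/*
 * Sketch only: the kind of word-size sanity check being referenced.
 * Assumes zfp's public `stream_word_bits` global; NOT the exact
 * code at Lines 155 to 159 in e81aec1.
 */
#include <stdio.h>
#include "zfp.h"

int check_zfp_stream_word_size(void)
{
    /* H5Z-ZFP's documentation asks for zfp built with
     * -DBIT_STREAM_WORD_TYPE=uint8, i.e. 8-bit stream words, so that the
     * encoded stream layout does not depend on the host's endianness. */
    if (stream_word_bits != 8) {
        fprintf(stderr,
                "zfp built with %u-bit stream words; "
                "rebuild zfp with -DBIT_STREAM_WORD_TYPE=uint8\n",
                (unsigned)stream_word_bits);
        return -1;
    }
    return 0;
}
```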
Rather than muse, we can simply look into the build log and discover that neither flag mentioned by @lindstro is passed. We should therefore treat this report as a user error for now and investigate the situation on the Debian side.
@helmutg the flag needs to be passed to zfp's build, not h5z-zfp, and it was added to zfp in response to Debian bug#1023821. The h5z-zfp build log mentions using …

@markcmiller86 has a point: the reason the flag was added to zfp was precisely because of the h5z-zfp check he quoted.
@helmutg I haven't read your whole post here in detail, but is your basic conclusion one where things work when everything is BIG endian but fail with LITTLE endian (or vice versa), or do things fail only with MIXED endian (between writer and reader)? Reason I ask is that the filter is supposed to have logic to handle HDF5 internal endian swapping (for MIXED endian cases) and we do test this with the …
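For readers trying to classify their own setup, a trivial run-time probe of the host byte order (plain C, not filter code) is enough to tell an all-big-endian build from a genuinely mixed-endian writer/reader scenario:

```c
/* Sketch: probe the host byte order, to tell an "all big endian" build
 * from a mixed-endian writer/reader scenario. Not filter code. */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    const uint32_t probe = 0x01020304u;
    const unsigned char *p = (const unsigned char *)&probe;

    if (p[0] == 0x01)
        printf("host is big endian (e.g. s390x)\n");
    else
        printf("host is little endian (e.g. x86_64)\n");
    return 0;
}
```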
Ok, did a little more thinking about this. The failure in the H5Z-ZFP filter is occurring here: Lines 462 to 465 in e81aec1
The error message emitted is a bit misleading. What's failing is an attempt to read ZFP's header. The error message should read something like …
We observe that builds on little endian succeed and builds on big endian fail. I would hope to resolve the mixed-endian question with your expertise: if the test suite contains little endian data files to be parsed as a test case, then a big endian build will exercise mixed-endian features. Am I right in assuming that most …
I looked into the zfp source code, and all that it does there is verify the 4 octets 'z', 'f', 'p', ZFP_CODEC == 5. I can actually locate this sequence in the affected test file, so it seems likely that the outer-layer parsing code gets the offset wrong somehow. This is a bit tricky, as we're dealing with libhdf5 and libzfp and seeing them all in action at the same time. It could in principle be a bug in either of them. The libhdf5 test suite seems to succeed on s390x. I suppose we're not going to get far without debugging this on an actual big endian machine.
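For illustration, verifying or locating that magic in a raw byte buffer amounts to a simple scan; a minimal sketch (my own code, not zfp's, with the codec value passed in as a parameter):

```c
/* Sketch: locate the zfp magic ('z', 'f', 'p', codec version) in a raw
 * byte buffer, e.g. a dump of the affected test file. Illustration only;
 * the real validation is done inside zfp_read_header(). */
#include <stddef.h>

/* Returns the offset of the first match, or -1 if not found.
 * `codec` would be 5 for the zfp release discussed above. */
long find_zfp_magic(const unsigned char *buf, size_t len, unsigned char codec)
{
    for (size_t i = 0; i + 4 <= len; ++i) {
        if (buf[i] == 'z' && buf[i + 1] == 'f' &&
            buf[i + 2] == 'p' && buf[i + 3] == codec)
            return (long)i;
    }
    return -1;
}
```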
Well, what is failing here happens before any raw-data reading (where endianness could be an issue) is involved. That said, the filter does write ZFP's header bytes to the HDF5 dataset's …
Ok, I think I may know the issue now that I've written the above description. I think the issue is that when the filter writes ZFP's header to the dataset's … I think that is what is going on here.
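If that hypothesis is right, its effect on the first header word is easy to illustrate: when HDF5 byte-swaps each 4-byte integer of cd_values, the byte sequence 'z' 'f' 'p' 0x05 arrives as 0x05 'p' 'f' 'z'. A small stand-alone sketch of that transformation (illustration only, not filter code):

```c
/* Sketch: show how a per-word (4-byte) endian swap, like the one HDF5
 * applies to integer cd_values, rearranges the zfp magic bytes. */
#include <stdio.h>
#include <string.h>
#include <stdint.h>

static uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

int main(void)
{
    unsigned char magic[4] = { 'z', 'f', 'p', 0x05 };
    uint32_t word;

    memcpy(&word, magic, sizeof word);   /* the first cd_values word    */
    word = swap32(word);                 /* what the per-word swap does */
    memcpy(magic, &word, sizeof word);

    /* Prints "05 70 66 7a", i.e. 0x05 'p' 'f' 'z': the magic is no
     * longer at the expected position. */
    printf("%02x %02x %02x %02x\n",
           (unsigned)magic[0], (unsigned)magic[1],
           (unsigned)magic[2], (unsigned)magic[3]);
    return 0;
}
```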
That seems like a plausible explanation. Just to be clear, when …
To try and confirm these hypotheses, I added this instrumentation to src/H5Zzfp.c:

```diff
--- src/orig	2022-12-22 11:09:49.215971341 +0000
+++ src/H5Zzfp.c	2022-12-22 11:15:44.563971308 +0000
@@ -448,6 +448,15 @@
     /* make a copy of cd_values in case we need to byte-swap it */
     memcpy(cd_values_copy, cd_values, cd_nelmts * sizeof(cd_values[0]));
 
+    for (int i = 0; i < (cd_nelmts > 10 ? 10 : cd_nelmts); ++i)
+        fprintf(stderr, "cd_values[%d] = %x\n", i, cd_values[i]);
+
+    for (int i = 0; i < (cd_nelmts * sizeof(cd_values[0]) > 16 ? 16 : cd_nelmts * sizeof(cd_values[0])); ++i)
+    {
+        unsigned char* buf = (unsigned char*)cd_values;
+        fprintf(stderr, "buffer[%d] = '%c' %02x\n", i, buf[i], (unsigned int)(buf[i]) & 0xff);
+    }
+
     /* treat the cd_values as a zfp bitstream buffer */
     if (0 == (bstr = B stream_open(&cd_values_copy[0], sizeof(cd_values_copy[0]) * cd_nelmts)))
         H5Z_ZFP_PUSH_AND_GOTO(H5E_RESOURCE, H5E_NOSPACE, 0, "opening header bitstream failed");
```

The resulting output is: …
At a quick glance, it does seem that …
@spanezz ok, thanks for that experiment. When I am back from holidays, I will work on a fix.

And I think the right answer will be to read it "normally", check for the possibility of it having been byte-swapped and, if so, un-byte-swap it before proceeding.
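A rough sketch of that approach (my illustration, not the eventual fix): key the detection off the possibly byte-reversed zfp magic in the first cd_values word, and if it is reversed, swap every 4-byte word back before handing the buffer to zfp.

```c
/* Sketch: if the first cd_values word carries a byte-reversed zfp magic,
 * swap every 4-byte word back before handing the buffer to zfp.
 * Illustration of the proposed approach only, not the shipped fix. */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* `codec` is the expected ZFP_CODEC value (5 in the release discussed). */
void unswap_cd_values_if_needed(unsigned int *cd_values, size_t cd_nelmts,
                                unsigned char codec)
{
    unsigned char b[4];

    if (cd_nelmts == 0 || sizeof(cd_values[0]) != 4)
        return;

    memcpy(b, &cd_values[0], 4);
    /* Byte-reversed magic: codec, 'p', 'f', 'z' instead of 'z', 'f', 'p', codec. */
    if (b[0] == codec && b[1] == 'p' && b[2] == 'f' && b[3] == 'z') {
        for (size_t i = 0; i < cd_nelmts; ++i) {
            uint32_t w;
            memcpy(&w, &cd_values[i], 4);
            w = swap32(w);
            memcpy(&cd_values[i], &w, 4);
        }
    }
}
```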
Hi, were you able to make any progress? Let me know if/how I can help.
When reading a byte-swapped file, the input is grouped into 4-byte words and each of them is swapped individually. When we try to read such a file, we first validate its header using zfp_read_header with the ZFP_HEADER_MAGIC flag. This flag causes it to validate only that the first word is "zfp\x05"; if it is not exactly that, it gives up. Unfortunately, this magic word can itself already be swapped. The actual byte-swapping code would only be tried after reading the full header had failed, so automatic byte swapping never worked. Instead, when encountering a header with bad magic, try swapping it right away, and only try reading the full header once the magic (normal or swapped) has been read successfully. Thanks to Mark C. Miller, Peter Lindstrom and Enrico Zini for doing most of the debugging to get here.
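For completeness, the shape of that fix can be sketched with zfp's public API roughly as follows. This is a simplified illustration, not the patch that was actually merged; `swap_words` is the same per-word swap idea as in the earlier sketches.

```c
/*
 * Sketch of the fix described in the commit message above (simplified,
 * not the merged patch): validate just the magic first; if that fails,
 * byte-swap the header words and try the magic again; only read the
 * full header once the magic (normal or swapped) has been accepted.
 */
#include <stddef.h>
#include <stdint.h>
#include "zfp.h"

static void swap_words(uint32_t *buf, size_t nwords)
{
    for (size_t i = 0; i < nwords; ++i) {
        uint32_t w = buf[i];
        buf[i] = (w >> 24) | ((w >> 8) & 0x0000ff00u) |
                 ((w << 8) & 0x00ff0000u) | (w << 24);
    }
}

/* Returns the number of header bits read, or 0 on failure.
 * `hdr` holds `nwords` 4-byte words and may be un-swapped in place. */
static size_t read_header_maybe_swapped(zfp_stream *zfp, zfp_field *field,
                                        uint32_t *hdr, size_t nwords)
{
    bitstream *bstr = stream_open(hdr, nwords * sizeof(hdr[0]));
    size_t bits;

    if (!bstr)
        return 0;
    zfp_stream_set_bit_stream(zfp, bstr);

    /* First pass: does the magic word look right as-is? */
    zfp_stream_rewind(zfp);
    if (!zfp_read_header(zfp, field, ZFP_HEADER_MAGIC)) {
        /* Maybe the producer had the opposite endianness: un-swap and retry. */
        swap_words(hdr, nwords);
        zfp_stream_rewind(zfp);
        if (!zfp_read_header(zfp, field, ZFP_HEADER_MAGIC)) {
            stream_close(bstr);
            return 0;
        }
    }

    /* Magic accepted (possibly after un-swapping); now read the full header. */
    zfp_stream_rewind(zfp);
    bits = zfp_read_header(zfp, field, ZFP_HEADER_FULL);
    stream_close(bstr);
    return bits;
}
```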
I believe this issue has been resolved by prior work. Capturing here results from running the most up-to-date code on an s390x system with HDF5-1.10.7:

ZFP-0.5.0 …

ZFP-0.5.5 …

ZFP-1.0.0 with …
I've uploaded the git version to Debian experimental and confirm that it builds on s390x. Thank you.
Debian build machines report a failure of the test suite on s390x.
The test suite suppresses output, so I ran the failing tests again manually: …

Same for `ifile=test_zfp_030235.h5 max_reldiff=0.025` and `ifile=test_zfp_110050.h5 max_reldiff=0.025`.

For the `h5repack -f UD=32013,0,4,3,0,3539053052,1062232653` test, I get a ratio of 99.

I tried to extract all the information I could think of; let me know if I can help debug this further.