Support different ZFP stream word sizes #133
If we store an additional bit of info in the H5Z-ZFP header, then I think we should design things to require 8-bit streams on writes by default but allow the user to override this default, permitting non-8-bit word streams on writes. This would mean adding a property to the properties interface and utilizing a currently unused bit in the generic interface. By default, H5Z-ZFP would behave as it currently does, erroring in that case. But we can add the ability for users to disable this behavior, allowing any stream word size. On the read end, it is probably best for the ZFP library to always be configured for 8-bit streams. That way, it will read correctly, I think, regardless of the stream word size used by the writer. We need to understand what will happen, and how we'll detect the situation, when we read non-8-bit-stream compressed data with a non-8-bit-stream reader. Do we know? Will it just sort of work most of the time (e.g., little endian) and fail only in cross-endian situations, or only on big endian, or both?
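The default-with-override behavior described here could be sketched as below. The function and the `allow_any_word` flag are illustrative, standing in for the not-yet-existing property; they are not part of H5Z-ZFP today.

```c
#include <stddef.h>

/* Hypothetical sketch of how the filter might validate the ZFP stream
   word size on write.  'allow_any_word' stands in for a proposed (not yet
   existing) property that a user could set to lift the default 8-bit
   requirement. */
static int
word_size_acceptable(size_t stream_word_bits, int allow_any_word)
{
    if (stream_word_bits == 8)
        return 1;   /* always fine: the current required configuration */
    if (!allow_any_word)
        return 0;   /* default: error out, as H5Z-ZFP does today */
    /* user opted in: accept any word size ZFP supports */
    return stream_word_bits == 16 || stream_word_bits == 32 ||
           stream_word_bits == 64;
}
```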
Maybe I'm being confused by HDF5 lingo, but requiring 8-bit writes while allowing non-8-bit writes seems contradictory. I would advocate for encoding which word size was used in the header. It is not true in general that an 8-bit reader can correctly process streams written as N-bit words, with N > 8. This works only on little-endian machines, and there are still issues with alignment. I would propose that H5Z-ZFP perform explicit 64-bit alignment by padding the end of the stream. In the case of big-endian machines, we cannot mix word sizes. The H5Z-ZFP reader should just fail if libzfp was built with a different word size, unless we have the opportunity to manually byte swap the compressed data first.
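The explicit 64-bit alignment proposed here amounts to rounding the compressed size up to a multiple of 8 bytes and zero-filling the tail. A minimal sketch (the function name is illustrative, and it assumes the buffer has room for the padded size):

```c
#include <stddef.h>
#include <string.h>

/* Round the compressed stream size up to the next multiple of 8 bytes
   and zero-fill the tail so the padding is deterministic. */
static size_t
pad_stream_to_64bit(unsigned char *buf, size_t nbytes)
{
    size_t padded = (nbytes + 7) & ~(size_t)7;  /* next multiple of 8 */
    memset(buf + nbytes, 0, padded - nbytes);   /* zero the pad bytes */
    return padded;
}
```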
I think we do have the ability to byte-swap data as desired before (or after) ZFP operates on it. Would it make sense to always have the filter deliver (for compression on write) ZFP little-endian data, regardless of the host's native endianness? That way, we'd always be living in a world where an 8-bit reader can correctly process N-bit word streams with N > 8. We'd just have to do some additional endian gymnastics when running on big-endian systems.
This seems like a reasonable approach, in particular since big-endian machines are quickly falling out of favor. But let me ask this first: is there any situation where H5Z-ZFP might call the compressor more than once on the same data? I would propose that we add a 2-bit code, n, to the stored header. The only change going forward is that we would add some padding to the compressed stream on write. On read, we would ideally check the number of compressed bytes processed to make sure it matches expectations, and we'd then have to be a little careful when dealing with mismatched word sizes between the written file and the compiled filter. It appears that the filter currently checks only whether the return value is zero, indicating failure.
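The 2-bit code proposed here (clarified later in the thread as n encoding a word size of 2^n bytes) maps between codes {0, 1, 2, 3} and word sizes {8, 16, 32, 64} bits. A small sketch of the two directions (function names are illustrative):

```c
/* Proposed 2-bit word-size code: n encodes a word size of 1 << n bytes,
   so n = 0, 1, 2, 3 maps to 8-, 16-, 32-, 64-bit words. */
static unsigned
word_bits_to_code(unsigned bits)   /* 8 -> 0, 16 -> 1, 32 -> 2, 64 -> 3 */
{
    unsigned n = 0;
    unsigned b;
    for (b = bits / 8; b > 1; b >>= 1)
        n++;
    return n;
}

static unsigned
code_to_word_bits(unsigned n)      /* inverse: 0..3 -> 8..64 bits */
{
    return 8u << n;
}
```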
Hmm... remember HDF5 compresses individual blocks, so strictly speaking, I think you are asking whether it would call the compressor on the same block more than once. However, with partial I/O, you can have a situation where a block is only partially written (remaining parts of the block are treated as a fill value) and then a later write completes it, causing the block to be decompressed, updated, and re-compressed. Hmm... now that I hear myself describing that, something just occurred to me having to do with the possibility of compounding loss with lossy compression. If I write a partial block, the whole block (partial + fill) gets compressed.
To me it is clear that any lossy compression may have issues when incomplete chunks are allowed to be written to the HDF5 file, and therefore chunks should be written only once. "Write once, read many" is the responsibility of the ZFP user, not the ZFP developers. The HDF5 chunk cache size must be set big enough to prevent uncontrolled flushing when using lossy compression filters.
Yes, losses can certainly compound. With some arbitrary fill value (instead of zfp's specialized padding), compression accuracy will in general be negatively impacted in zfp's fixed-rate mode. Those errors will then persist (unless they're canceled by pure luck) over subsequent re-compression calls. Moreover, those errors could also hurt decorrelation and then degrade compression once a block is filled. I don't have a good sense of how severe this problem is, however.
I will have to inquire with the HDF Group, but I would think it should be possible, maybe, for H5Z-ZFP to use ZFP's "specialized padding" in most circumstances. Also, I may be confused by the word "persist", but I've always thought that once ZFP has compressed some data, there is some loss that can never be recovered, and so in that sense, any ZFP compression results in some persistent errors. The question I had is whether they may grow with subsequent ZFP compression calls.
The point I was making about persistent errors is exactly the one you make. In other words, compressing half a block padded with fill values will introduce some irrevocable loss. When you are later given the rest of the block and re-compress, it's not the same as if you compressed the whole block only once, as you've now lost information, and the errors that were introduced in the first compression step persist. There's also a second form of error that results from injecting "noise" in the data during the first round of compression that could then hurt zfp's "predictor" (really, a transform) and cause compression vs. accuracy to suffer in the second round of compression.

To illustrate this point, consider a simpler compressor that predicts and represents linear functions perfectly. Suppose we have a block of data (2, 4, 6, 8). The predictor would result in a perfect fit to this data. But suppose that we're initially given only a partial block (2, 4, *, *), where * denotes a not-yet transmitted value that is replaced by a fill value of 0. The best linear fit (in the least-squares sense) to this padded block (2, 4, 0, 0) is then (3, 2, 1, 0). When we later receive the last two samples, we're asked to compress (3, 2, 6, 8) instead of (2, 4, 6, 8), because of lossy compression in the first step. The best linear fit to this modified block is then (1.9, 3.8, 5.7, 7.6), even though the original block could be represented exactly. We then have to spend precious additional bits if we want to (perhaps partially) correct this error. Worse yet, we have to make large corrections of values (1.9, 3.8) to (3, 2) that have already been contaminated with error, and those are relatively more costly than the smaller corrections of values (5.7, 7.6) to (6, 8). A similar issue occurs with zfp when a block is compressed in multiple stages.
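The worked example above can be checked numerically. The sketch below computes the best least-squares linear fit y = a + b*i over i = 0..3 and reproduces the fits (3, 2, 1, 0) and (1.9, 3.8, 5.7, 7.6) quoted in the example (the function is purely illustrative, not part of any compressor):

```c
/* Best least-squares linear fit y = a + b*i for a 4-sample block,
   sampled at i = 0, 1, 2, 3. */
static void
linear_fit4(const double y[4], double fit[4])
{
    /* for x = 0,1,2,3: mean(x) = 1.5, sum of squared deviations = 5 */
    double my  = (y[0] + y[1] + y[2] + y[3]) / 4.0;
    double sxy = 0.0;
    int i;
    for (i = 0; i < 4; i++)
        sxy += ((double)i - 1.5) * (y[i] - my);
    {
        double b = sxy / 5.0;        /* slope */
        double a = my - 1.5 * b;     /* intercept */
        for (i = 0; i < 4; i++)
            fit[i] = a + b * (double)i;
    }
}
```

Fitting the padded block (2, 4, 0, 0) yields (3, 2, 1, 0); re-fitting the contaminated block (3, 2, 6, 8) yields (1.9, 3.8, 5.7, 7.6), while the pristine block (2, 4, 6, 8) is reproduced exactly.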
@lindstro... OK, thanks for that detailed explanation 💪🏻 I agree with @vasole that maybe we should include in the H5Z-ZFP docs some advice on this... perhaps avoid combining partial I/O with lossy compression. @vasole, do you happen to have any other refs in the literature about this issue in general? I also think it is an unusual use case that is unlikely in practice. But I may be biased by my experiences so far. I do not think we can easily detect partial I/O or blocks with fill values inside the H5Z-ZFP filter functions themselves. We might be able to interrogate HDF5 for those details... I honestly don't know.
Perhaps this page: https://en.wikipedia.org/wiki/Generation_loss I do not know if it is what you are asking for, but the situation most users should be familiar with is the degradation of JPEG images when edited and saved again in JPEG format instead of using a lossless format after editing. This web page calls it simply JPEG degradation: https://imagekit.io/blog/jpeg-image-degradation/ I particularly like their analogy with the "Photocopier Effect". When doing a photocopy, something is lost. If you do a photocopy of the photocopy, things degrade more, and so on.
This issue of "generation loss" is one I've pondered for a long time and hypothesized about but never had much time to investigate further. We've conjectured that starting from some arbitrary input x that is compressed-then-decompressed as D(C(x)), another round-trip of compression + decompression should not change the result if the same compression parameters are used. And we've had some reasonable arguments for why that should be the case.

However, I just ran some experiments with real data and very low rates and precisions, e.g., a rate of 22 bits/block in 2D, translating to a rate of 22/16 = 1.375 bits/value. Note that each 2D zfp block requires 12 bits of header to represent the common exponent and whether the block is all-zero, so coefficients of such blocks are allocated only (22 - 12) / 16 = 0.625 bits/value. In such settings, it seems that drift can occur to the point that repeated D(C(x)) not only fails to converge but actually diverges and blows up. This seems to be the case only in fixed-rate or -precision modes, but I have not rigorously verified this.

The gist of it is that a real input value of 1 decompresses to a value somewhat different from 1 and at extreme compression gets reconstructed and rounded to a value of 2 (recall that the rate here is less than one compressed bit/value, and precision may be even lower than one uncompressed bit/value). Fed back into compression, the only difference between 1 and 2 as input is the exponent, which zfp factors out, so 2 gets reconstructed as 4, 4 gets reconstructed as 8, and so on. Hence, each application of D(C(x)) doubles the input value, until we eventually blow up. Now, such extremely low precision, where values are not even accurate to a single bit, is of course not practically useful. And if you bump up precision or rate slightly, e.g., from 22 to 23 bits per block of 16 values, the repeated application of D(C(x)) converges quickly after 1 or 2 iterations.
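The doubling feedback described above can be captured in a deliberately crude toy model (this is not zfp, just an illustration of the mechanism): a "compressor" so starved for bits that it keeps only the exponent and, after rounding, reconstructs each value one power of two too large. Iterating the round-trip then doubles the value each time.

```c
/* Toy model of the divergence described above: D(C(x)) keeps only the
   exponent of x and rounds the reconstruction up to the next power of
   two.  Repeated round-trips double the value each time. */
static double
toy_roundtrip(double x)
{
    double p = 1.0;
    while (p * 2.0 <= x)   /* p = largest power of two <= x (for x >= 1) */
        p *= 2.0;
    return 2.0 * p;        /* "reconstruction" lands one exponent higher */
}
```

Starting from 1, three round-trips yield 2, then 4, then 8: the blow-up is driven entirely by the factored-out exponent, exactly as in the real observation.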
Again, I have only anecdotal data to support this, and it would be useful to analyze this issue more rigorously. Let me also add that errors occur both in the "conversion" (compression) of IEEE floating-point values to zfp and in conversion in the opposite direction, from zfp to IEEE floating-point values (decompression). This is because the two number systems fundamentally represent different subsets of the reals (or tensors of reals) that have a large intersection, but with neither being a subset of the other. In the case I'm describing above, it seems clear that the issue is a lack of zfp precision, i.e., errors are incurred on compression but not decompression. But I thought I'd point out that loss may also be due to "limitations" of IEEE, as zfp uses 62 mantissa bits while IEEE double precision uses only 53 mantissa bits.
Turns out there is an HDF5 method to control whether the library compresses partial chunks.
@markcmiller86 What are your thoughts on this proposal of mine:
Just to clarify, I meant a 2-bit code n to represent a word size of 2^n bytes, so valid values of n are {0, 1, 2, 3}, representing word sizes of {8, 16, 32, 64} bits. When dealing with files already generated with n = 0 (8-bit word size), you would not be able to decompress with larger word sizes unless the stream size is a multiple of the word size, even though in practice any stream is overwhelmingly likely to satisfy this. I'm sure that on decompression in H5Z-ZFP, we are already given the stream length in bytes, so we can test whether that's a multiple of the word size and bail otherwise. We'd also have to manually do some byte swapping on big-endian machines. I think we want to prioritize this capability, as zfp's CUDA and HIP support currently requires 64-bit words, so without this fix you can't install zfp with both GPU and HDF5 support. And as mentioned, zfp tests pass only for 64-bit words.
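The read-side sanity check suggested here is a one-liner: given the stream length in bytes (which the filter already receives), bail unless it divides evenly by the word size. A sketch (the function name is illustrative):

```c
#include <stddef.h>

/* Proposed read-side check: accept the stream only if its byte length is
   a whole number of words of the configured size. */
static int
stream_length_ok(size_t nbytes, size_t word_bits)
{
    size_t word_bytes = word_bits / 8;
    return word_bytes != 0 && nbytes % word_bytes == 0;
}
```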
We talked about this more and decided to focus primarily on little-endian machines/workflows first. To address this, we decided the right things to do are...
@markcmiller86 Thanks for the summary. I think it's safe to assume that the buffer is word aligned if it was produced by zfp itself. Regarding padding the output stream to a multiple of 8 bytes, it may make sense to provide a compile-time macro to disable such new (but now default) behavior for applications that might be sensitive to such a change.
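The compile-time opt-out could look something like the sketch below. The macro name `H5Z_ZFP_NO_STREAM_PADDING` is hypothetical, chosen here just to illustrate the idea of padding-by-default with a build-time escape hatch:

```c
#include <stddef.h>

/* Hypothetical opt-out for the proposed default padding behavior.
   Define H5Z_ZFP_NO_STREAM_PADDING (a made-up name) at build time to
   restore the old unpadded stream sizes. */
#ifdef H5Z_ZFP_NO_STREAM_PADDING
#define STREAM_PAD_BYTES 1   /* effectively no padding */
#else
#define STREAM_PAD_BYTES 8   /* default: pad to whole 64-bit words */
#endif

static size_t
padded_stream_size(size_t nbytes)
{
    return (nbytes + STREAM_PAD_BYTES - 1) / STREAM_PAD_BYTES
           * STREAM_PAD_BYTES;
}
```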
@lindstro, we were talking about big-endian systems, and that issue came up in a recent YouTube video I watched regarding the Linux kernel.
Currently, H5Z-ZFP requires that ZFP be configured with 8-bit stream words. It is an outright error (not just a silent skip) to attempt to use H5Z-ZFP if the ZFP library is configured otherwise. This makes it impossible to overlook cases where the ZFP library is incorrectly configured for stream word size.
But ZFP can actually be configured with 16-, 32-, and 64-bit word sizes as well. Larger word sizes mean faster compression/decompression, which is important for in-memory ZFP arrays. Why can't we support these various word sizes in H5Z-ZFP? Well, the answer is that we can, but what do we do when the resulting data is read in a cross-endian context?
If we store additional information in the dataset header about the ZFP stream word size used, we can detect the combination of >8-bit stream word size AND a cross-endian context and fail the read. Then again, it may be that a ZFP configured for 8-bit streams on the read half of the operation will be able to work with any ZFP word stream size on the write half. So maybe we only need to make sure that when reading ZFP-compressed data, the ZFP library is configured with an 8-bit stream word size.
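If the cross-endian case is handled by swapping rather than failing, the fix-up discussed in the thread is a per-word byte reversal. A sketch for 64-bit words (the function name is illustrative; it assumes the stream has already been padded to a whole number of 64-bit words, as proposed earlier):

```c
#include <stddef.h>
#include <stdint.h>

/* Reverse the bytes of each 64-bit word in a compressed stream so a
   big-endian reader can consume a little-endian stream (or vice versa).
   Assumes nbytes is a multiple of 8, per the proposed padding. */
static void
byteswap_words64(uint8_t *buf, size_t nbytes)
{
    size_t w, i;
    for (w = 0; w + 8 <= nbytes; w += 8)
        for (i = 0; i < 4; i++) {
            uint8_t t      = buf[w + i];
            buf[w + i]     = buf[w + 7 - i];
            buf[w + 7 - i] = t;
        }
}
```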
@lindstro identified a situation in which the current filter behavior of requiring an 8-bit stream word size is helpful: ZFP is installed at a facility, apps use that install, and then someone accidentally installs ZFP with a non-8-bit stream word size. Currently, the mistake would be quite obvious because data writes would fail. If, however, we make the filter silently skip ZFP compression in that case, that could present serious issues. It's just a use case to keep on our radar, to be sure we don't somehow increase the likelihood of this outcome.