Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refuse to read future format versions #217

Closed
wants to merge 2 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion README_HEADER.rst
Original file line number Diff line number Diff line change
Expand Up @@ -31,7 +31,7 @@ All entries are little endian.
:bit 2 (``0x04``):
Whether the bit-shuffle filter has been applied or not.
:bit 3 (``0x08``):
Reserved
Reserved, must be zero.
:bit 4 (``0x10``):
If set, the blocks will not be split in sub-blocks during compression.
:bit 5 (``0x20``):
Expand Down
73 changes: 41 additions & 32 deletions blosc/blosc.c
Original file line number Diff line number Diff line change
Expand Up @@ -90,10 +90,8 @@ struct blosc_context {

const uint8_t* src;
uint8_t* dest; /* The current pos in the destination buffer */
uint8_t* header_flags; /* Flags for header. Currently booked:
- 0: byte-shuffled?
- 1: memcpy'ed?
- 2: bit-shuffled? */
uint8_t* header_flags; /* Flags for header */
int compressor_version; /* Compressor version byte, only used during decompression */
int32_t sourcesize; /* Number of bytes in source buffer (or uncompressed bytes in compressed file) */
int32_t nblocks; /* Number of total blocks in buffer */
int32_t leftover; /* Extra bytes at end of buffer */
Expand Down Expand Up @@ -691,6 +689,7 @@ static int blosc_d(struct blosc_context* context, int32_t blocksize,
int doshuffle = (header_flags & BLOSC_DOSHUFFLE) && (typesize > 1);
int dobitshuffle = ((header_flags & BLOSC_DOBITSHUFFLE) &&
(blocksize >= typesize));
int compressor_version = context->compressor_version;

if (doshuffle || dobitshuffle) {
_tmp = tmp;
Expand Down Expand Up @@ -719,28 +718,38 @@ static int blosc_d(struct blosc_context* context, int32_t blocksize,
}
else {
if (compformat == BLOSC_BLOSCLZ_FORMAT) {
if (compressor_version != BLOSC_BLOSCLZ_VERSION_FORMAT)
return -9;
nbytes = blosclz_decompress(src, cbytes, _tmp, neblock);
}
#if defined(HAVE_LZ4)
else if (compformat == BLOSC_LZ4_FORMAT) {
if (compressor_version != BLOSC_LZ4_VERSION_FORMAT)
return -9;
nbytes = lz4_wrap_decompress((char *)src, (size_t)cbytes,
(char*)_tmp, (size_t)neblock);
}
#endif /* HAVE_LZ4 */
#if defined(HAVE_SNAPPY)
else if (compformat == BLOSC_SNAPPY_FORMAT) {
if (compressor_version != BLOSC_SNAPPY_VERSION_FORMAT)
return -1;
nbytes = snappy_wrap_decompress((char *)src, (size_t)cbytes,
(char*)_tmp, (size_t)neblock);
}
#endif /* HAVE_SNAPPY */
#if defined(HAVE_ZLIB)
else if (compformat == BLOSC_ZLIB_FORMAT) {
if (compressor_version != BLOSC_ZLIB_VERSION_FORMAT)
return -1;
nbytes = zlib_wrap_decompress((char *)src, (size_t)cbytes,
(char*)_tmp, (size_t)neblock);
}
#endif /* HAVE_ZLIB */
#if defined(HAVE_ZSTD)
else if (compformat == BLOSC_ZSTD_FORMAT) {
if (compressor_version != BLOSC_ZSTD_VERSION_FORMAT)
return -1;
nbytes = zstd_wrap_decompress((char*)src, (size_t)cbytes,
(char*)_tmp, (size_t)neblock);
}
Expand Down Expand Up @@ -1355,18 +1364,21 @@ int blosc_run_decompression_with_context(struct blosc_context* context,

/* Read the header block */
version = context->src[0]; /* blosc format version */
versionlz = context->src[1]; /* blosclz format version */
context->compressor_version = context->src[1];

context->header_flags = (uint8_t*)(context->src + 2); /* flags */
context->typesize = (int32_t)context->src[3]; /* typesize */
context->sourcesize = sw32_(context->src + 4); /* buffer size */
context->blocksize = sw32_(context->src + 8); /* block size */
ctbytes = sw32_(context->src + 12); /* compressed buffer size */

/* Unused values */
version += 0; /* shut up compiler warning */
versionlz += 0; /* shut up compiler warning */
ctbytes += 0; /* shut up compiler warning */
if (version != BLOSC_VERSION_FORMAT) {
/* Version from future */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May it be worth to print some informative message before giving up? Something like:

printf("Unrecognized header version %d\n", version);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like libraries that print messages, that descision should be given to the caller. Before we have a way to return error messages, I prefer silent errors. Maybe we can return -10 and document what it means in the API documentation?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. However, the current approach is to print an informative message before returning a negative number. I don't think that changing this behavior in this PR in just a couple of places is a good idea. Probably the good thing to do is to detect all the places where a negative code is returned, make sure that there are no repetitions and document these codes. One can even think on creating a new API (`blosc_print_error(int)?) for more verbose output. But this would be another PR.

Copy link
Contributor Author

@kalvdans kalvdans Feb 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we need at least to move the error printing out of the helper threads to avoid printing garbled output. I can do that refactor in a separate pull request with the existing error messages, and push that first, if you want.
[EDIT: I think printf has a mutex, so the output won't be garbled but it will be duplicates]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, maybe a printf is not strictly necessary. Perhaps a char* blosc_get_error(int) would be enough. A better approach would be to define an enumerated for the different errors and some functions for returning proper messages. The Zstandard approach could be a good example to follow.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds like a good idea to me, the Zstd approach resembles what I've seen in other libraries. I also don't think a library should print things unless told to - what to do with the error should be the user's decision (where the user here is a programmer).

return -1;
}
if (*context->header_flags & 0x08) {
/* compressor flags from the future */
return -1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

}

context->bstarts = (uint8_t*)(context->src + 16);
/* Compute some params */
Expand Down Expand Up @@ -1455,7 +1467,7 @@ int blosc_decompress(const void *src, void *dest, size_t destsize)
int blosc_getitem(const void *src, int start, int nitems, void *dest)
{
uint8_t *_src=NULL; /* current pos for source buffer */
uint8_t version, versionlz; /* versions for compressed header */
uint8_t version, compressor_version; /* versions for compressed header */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For being consistent with blosc convention for names(e.g. compname, compformat), I'd prefer using compversion for this variable.

uint8_t flags; /* flags for header */
int32_t ntbytes = 0; /* the number of uncompressed bytes */
int32_t nblocks; /* number of total blocks in buffer */
Expand All @@ -1474,22 +1486,21 @@ int blosc_getitem(const void *src, int start, int nitems, void *dest)

/* Read the header block */
version = _src[0]; /* blosc format version */
versionlz = _src[1]; /* blosclz format version */
compressor_version = _src[1];
flags = _src[2]; /* flags */
typesize = (int32_t)_src[3]; /* typesize */
nbytes = sw32_(_src + 4); /* buffer size */
blocksize = sw32_(_src + 8); /* block size */
ctbytes = sw32_(_src + 12); /* compressed buffer size */

if (version != BLOSC_VERSION_FORMAT)
return -9;

ebsize = blocksize + typesize * (int32_t)sizeof(int32_t);
tmp = my_malloc(blocksize + ebsize + blocksize);
tmp2 = tmp + blocksize;
tmp3 = tmp + blocksize + ebsize;

version += 0; /* shut up compiler warning */
versionlz += 0; /* shut up compiler warning */
ctbytes += 0; /* shut up compiler warning */

_src += 16;
bstarts = _src;
/* Compute some params */
Expand Down Expand Up @@ -1540,10 +1551,11 @@ int blosc_getitem(const void *src, int start, int nitems, void *dest)
cbytes = bsize2;
}
else {
struct blosc_context context;
/* blosc_d only uses typesize and flags */
struct blosc_context context = {0};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 Nice way to initialize struct to 0.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! It is makes the code slightly slower, but so much easier to debug!

/* Only initialize the fields blosc_d uses */
context.typesize = typesize;
context.header_flags = &flags;
context.compressor_version = compressor_version;

/* Regular decompression. Put results in tmp2. */
cbytes = blosc_d(&context, bsize, leftoverblock,
Expand Down Expand Up @@ -1969,14 +1981,12 @@ void blosc_cbuffer_sizes(const void *cbuffer, size_t *nbytes,
size_t *cbytes, size_t *blocksize)
{
uint8_t *_src = (uint8_t *)(cbuffer); /* current pos for source buffer */
uint8_t version, versionlz; /* versions for compressed header */

/* Read the version info (could be useful in the future) */
version = _src[0]; /* blosc format version */
versionlz = _src[1]; /* blosclz format version */
uint8_t version = _src[0]; /* version of header */

version += 0; /* shut up compiler warning */
versionlz += 0; /* shut up compiler warning */
if (version != BLOSC_VERSION_FORMAT) {
*nbytes = *blocksize = *cbytes = 0;
return;
}

/* Read the interesting values */
*nbytes = (size_t)sw32_(_src + 4); /* uncompressed buffer size */
Expand All @@ -1990,17 +2000,16 @@ void blosc_cbuffer_metainfo(const void *cbuffer, size_t *typesize,
int *flags)
{
uint8_t *_src = (uint8_t *)(cbuffer); /* current pos for source buffer */
uint8_t version, versionlz; /* versions for compressed header */

/* Read the version info (could be useful in the future) */
version = _src[0]; /* blosc format version */
versionlz = _src[1]; /* blosclz format version */
uint8_t version = _src[0]; /* version of header */

version += 0; /* shut up compiler warning */
versionlz += 0; /* shut up compiler warning */
if (version != BLOSC_VERSION_FORMAT) {
*flags = *typesize = 0;
return;
}

/* Read the interesting values */
*flags = (int)_src[2]; /* flags */
*flags = (int)_src[2] & 7; /* first three flags */
*typesize = (size_t)_src[3]; /* typesize */
}

Expand Down
15 changes: 10 additions & 5 deletions blosc/blosc.h
Original file line number Diff line number Diff line change
Expand Up @@ -239,7 +239,8 @@ BLOSC_EXPORT int blosc_compress_ctx(int clevel, int doshuffle, size_t typesize,

If an error occurs, e.g. the compressed data is corrupted or the
output buffer is not large enough, then 0 (zero) or a negative value
will be returned instead.
will be returned instead. If the format version is not supported by
the library, -9 is returned.

Environment variables
---------------------
Expand Down Expand Up @@ -274,7 +275,8 @@ BLOSC_EXPORT int blosc_decompress(const void *src, void *dest, size_t destsize);

If an error occurs, e.g. the compressed data is corrupted or the
output buffer is not large enough, then 0 (zero) or a negative value
will be returned instead.
will be returned instead. If the format version is not supported by
the library, -9 is returned.
*/
BLOSC_EXPORT int blosc_decompress_ctx(const void *src, void *dest,
size_t destsize, int numinternalthreads);
Expand Down Expand Up @@ -401,7 +403,8 @@ BLOSC_EXPORT int blosc_free_resources(void);
You only need to pass the first BLOSC_MIN_HEADER_LENGTH bytes of a
compressed buffer for this call to work.

This function should always succeed.
If the format is not supported by the library, all output arguments will be
filled with zeros.
*/
BLOSC_EXPORT void blosc_cbuffer_sizes(const void *cbuffer, size_t *nbytes,
size_t *cbytes, size_t *blocksize);
Expand All @@ -411,16 +414,18 @@ BLOSC_EXPORT void blosc_cbuffer_sizes(const void *cbuffer, size_t *nbytes,
Return information about a compressed buffer, namely the type size
(`typesize`), as well as some internal `flags`.

The `flags` is a set of bits, where the currently used ones are:
The `flags` is a set of bits, where the used ones are:
* bit 0: whether the shuffle filter has been applied or not
* bit 1: whether the internal buffer is a pure memcpy or not
* bit 2: whether the bit shuffle filter has been applied or not

You can use the `BLOSC_DOSHUFFLE`, `BLOSC_DOBITSHUFFLE` and
`BLOSC_MEMCPYED` symbols for extracting the interesting bits
(e.g. ``flags & BLOSC_DOSHUFFLE`` says whether the buffer is
byte-shuffled or not).

This function should always succeed.
If the format is not supported by the library, all output arguments will be
filled with zeros.
*/
BLOSC_EXPORT void blosc_cbuffer_metainfo(const void *cbuffer, size_t *typesize,
int *flags);
Expand Down