Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refuse to read future format versions #217

Closed
wants to merge 2 commits into from

Conversation

kalvdans
Copy link
Contributor

Adds checks to header format version byte, to compressed data format
version byte, and that reserved fields are zero.

This change was triggered by the format version discussions in #216 .
@FrancescAlted, @estan

Adds checks to header format version byte, to compressed data format
version byte, and that reserved fields are zero.
versionlz += 0; /* shut up compiler warning */
ctbytes += 0; /* shut up compiler warning */
if (version != BLOSC_VERSION_FORMAT) {
/* Version from future */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May it be worth to print some informative message before giving up? Something like:

printf("Unrecognized header version %d\n", version);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like libraries that print messages, that descision should be given to the caller. Before we have a way to return error messages, I prefer silent errors. Maybe we can return -10 and document what it means in the API documentation?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. However, the current approach is to print an informative message before returning a negative number. I don't think that changing this behavior in this PR in just a couple of places is a good idea. Probably the good thing to do is to detect all the places where a negative code is returned, make sure that there are no repetitions and document these codes. One can even think on creating a new API (`blosc_print_error(int)?) for more verbose output. But this would be another PR.

Copy link
Contributor Author

@kalvdans kalvdans Feb 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we need at least to move the error printing out of the helper threads to avoid printing garbled output. I can do that refactor in a separate pull request with the existing error messages, and push that first, if you want.
[EDIT: I think printf has a mutex, so the output won't be garbled but it will be duplicates]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, maybe a printf is not strictly necessary. Perhaps a char* blosc_get_error(int) would be enough. A better approach would be to define an enumerated for the different errors and some functions for returning proper messages. The Zstandard approach could be a good example to follow.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds like a good idea to me, the Zstd approach resembles what I've seen in other libraries. I also don't think a library should print things unless told to - what to do with the error should be the user's decision (where the user here is a programmer).

}
if (*context->header_flags & 0x08) {
/* compressor flags from the future */
return -1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

@@ -1540,10 +1552,11 @@ int blosc_getitem(const void *src, int start, int nitems, void *dest)
cbytes = bsize2;
}
else {
struct blosc_context context;
/* blosc_d only uses typesize and flags */
struct blosc_context context = {0};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 Nice way to initialize struct to 0.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! It is makes the code slightly slower, but so much easier to debug!

blosc/blosc.c Outdated

version += 0; /* shut up compiler warning */
versionlz += 0; /* shut up compiler warning */
assert (version == BLOSC_VERSION_FORMAT); /* Should have been checked earlier */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this comment applicable to a code that is going to be included in the library? As it is I find it misleading.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree the comment is misleading, I will remove the comment. Since there is no way to signal errors I'd like to keep the assert though.

blosc/blosc.c Outdated

version += 0; /* shut up compiler warning */
versionlz += 0; /* shut up compiler warning */
assert (_src[0] == BLOSC_VERSION_FORMAT); /* Should have been checked earlier */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

@FrancescAlted
Copy link
Member

I generally agree with the PR. My only suggestion is to add some explanatory messages for the user understanding the reason why a Blosc buffer cannot be decoded.

blosc/blosc.c Outdated
- 1: memcpy'ed?
- 2: bit-shuffled? */
uint8_t* header_flags; /* Flags for header */
int versionlz; /* Compressor version byte, only used during decompression */
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about calling it just compressor_version? (It looks like it's used for more than just LZ-based compressors?)

@kalvdans
Copy link
Contributor Author

@FrancescAlted, @estan: I have made version mismatch return a special value and documented that value, and renamed the variable to be algorithm-agnostic. Please review again! Also I don't know how to debug the failed test on osx.

@estan
Copy link

estan commented Feb 18, 2018

LGTM. I don't know about the macOS failure, but maybe it's some lingering effect of this incident? https://www.traviscistatus.com/incidents/t68jk6rj92q4 , @FrancescAlted Could you re-try the Travis job perhaps?

@FrancescAlted
Copy link
Member

Yep, I restored the failing travis job and everything passes smoothly now.

Copy link
Member

@FrancescAlted FrancescAlted left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except for the non-printing of the reason for an error (see comment), I am +1 for this one.

versionlz += 0; /* shut up compiler warning */
ctbytes += 0; /* shut up compiler warning */
if (version != BLOSC_VERSION_FORMAT) {
/* Version from future */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. However, the current approach is to print an informative message before returning a negative number. I don't think that changing this behavior in this PR in just a couple of places is a good idea. Probably the good thing to do is to detect all the places where a negative code is returned, make sure that there are no repetitions and document these codes. One can even think on creating a new API (`blosc_print_error(int)?) for more verbose output. But this would be another PR.

@@ -1455,7 +1467,7 @@ int blosc_decompress(const void *src, void *dest, size_t destsize)
int blosc_getitem(const void *src, int start, int nitems, void *dest)
{
uint8_t *_src=NULL; /* current pos for source buffer */
uint8_t version, versionlz; /* versions for compressed header */
uint8_t version, compressor_version; /* versions for compressed header */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For being consistent with blosc convention for names(e.g. compname, compformat), I'd prefer using compversion for this variable.

@FrancescAlted
Copy link
Member

Superseded by #219

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants