-
-
Notifications
You must be signed in to change notification settings - Fork 158
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refuse to read future format versions #217
Conversation
Adds checks to header format version byte, to compressed data format version byte, and that reserved fields are zero.
versionlz += 0; /* shut up compiler warning */ | ||
ctbytes += 0; /* shut up compiler warning */ | ||
if (version != BLOSC_VERSION_FORMAT) { | ||
/* Version from future */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May it be worth to print some informative message before giving up? Something like:
printf("Unrecognized header version %d\n", version);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like libraries that print messages, that descision should be given to the caller. Before we have a way to return error messages, I prefer silent errors. Maybe we can return -10 and document what it means in the API documentation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. However, the current approach is to print an informative message before returning a negative number. I don't think that changing this behavior in this PR in just a couple of places is a good idea. Probably the good thing to do is to detect all the places where a negative code is returned, make sure that there are no repetitions and document these codes. One can even think on creating a new API (`blosc_print_error(int)?) for more verbose output. But this would be another PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we need at least to move the error printing out of the helper threads to avoid printing garbled output. I can do that refactor in a separate pull request with the existing error messages, and push that first, if you want.
[EDIT: I think printf has a mutex, so the output won't be garbled but it will be duplicates]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, maybe a printf is not strictly necessary. Perhaps a char* blosc_get_error(int)
would be enough. A better approach would be to define an enumerated for the different errors and some functions for returning proper messages. The Zstandard approach could be a good example to follow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds like a good idea to me, the Zstd approach resembles what I've seen in other libraries. I also don't think a library should print things unless told to - what to do with the error should be the user's decision (where the user here is a programmer).
} | ||
if (*context->header_flags & 0x08) { | ||
/* compressor flags from the future */ | ||
return -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto.
@@ -1540,10 +1552,11 @@ int blosc_getitem(const void *src, int start, int nitems, void *dest) | |||
cbytes = bsize2; | |||
} | |||
else { | |||
struct blosc_context context; | |||
/* blosc_d only uses typesize and flags */ | |||
struct blosc_context context = {0}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 Nice way to initialize struct to 0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! It is makes the code slightly slower, but so much easier to debug!
blosc/blosc.c
Outdated
|
||
version += 0; /* shut up compiler warning */ | ||
versionlz += 0; /* shut up compiler warning */ | ||
assert (version == BLOSC_VERSION_FORMAT); /* Should have been checked earlier */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this comment applicable to a code that is going to be included in the library? As it is I find it misleading.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree the comment is misleading, I will remove the comment. Since there is no way to signal errors I'd like to keep the assert though.
blosc/blosc.c
Outdated
|
||
version += 0; /* shut up compiler warning */ | ||
versionlz += 0; /* shut up compiler warning */ | ||
assert (_src[0] == BLOSC_VERSION_FORMAT); /* Should have been checked earlier */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto.
I generally agree with the PR. My only suggestion is to add some explanatory messages for the user understanding the reason why a Blosc buffer cannot be decoded. |
blosc/blosc.c
Outdated
- 1: memcpy'ed? | ||
- 2: bit-shuffled? */ | ||
uint8_t* header_flags; /* Flags for header */ | ||
int versionlz; /* Compressor version byte, only used during decompression */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about calling it just compressor_version
? (It looks like it's used for more than just LZ-based compressors?)
@FrancescAlted, @estan: I have made version mismatch return a special value and documented that value, and renamed the variable to be algorithm-agnostic. Please review again! Also I don't know how to debug the failed test on osx. |
LGTM. I don't know about the macOS failure, but maybe it's some lingering effect of this incident? https://www.traviscistatus.com/incidents/t68jk6rj92q4 , @FrancescAlted Could you re-try the Travis job perhaps? |
Yep, I restored the failing travis job and everything passes smoothly now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Except for the non-printing of the reason for an error (see comment), I am +1 for this one.
versionlz += 0; /* shut up compiler warning */ | ||
ctbytes += 0; /* shut up compiler warning */ | ||
if (version != BLOSC_VERSION_FORMAT) { | ||
/* Version from future */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. However, the current approach is to print an informative message before returning a negative number. I don't think that changing this behavior in this PR in just a couple of places is a good idea. Probably the good thing to do is to detect all the places where a negative code is returned, make sure that there are no repetitions and document these codes. One can even think on creating a new API (`blosc_print_error(int)?) for more verbose output. But this would be another PR.
@@ -1455,7 +1467,7 @@ int blosc_decompress(const void *src, void *dest, size_t destsize) | |||
int blosc_getitem(const void *src, int start, int nitems, void *dest) | |||
{ | |||
uint8_t *_src=NULL; /* current pos for source buffer */ | |||
uint8_t version, versionlz; /* versions for compressed header */ | |||
uint8_t version, compressor_version; /* versions for compressed header */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For being consistent with blosc convention for names(e.g. compname
, compformat
), I'd prefer using compversion
for this variable.
Superseded by #219 |
Adds checks to header format version byte, to compressed data format
version byte, and that reserved fields are zero.
This change was triggered by the format version discussions in #216 .
@FrancescAlted, @estan