-
Notifications
You must be signed in to change notification settings - Fork 443
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve error messages for CRAM reference mismatches. #1427
Conversation
cram/cram_decode.c
Outdated
rname, s->ref_start, s->ref_end); | ||
hts_log_error("CRAM : %s", md5_print(s->hdr->md5, M)); | ||
hts_log_error("Ref : %s", md5_print(digest, M)); | ||
kstring_t ks = {0}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might as well use the initialiser since it exists (also I think the kstring is being leaked):
kstring_t ks = {0}; | |
kstring_t ks = KS_INITIALIZE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops forgot about freeing it! Duh. Thanks.
I'll change it to use KS_INITIALIZE. I did consider it, but for some baffling reason it's the "house style" (just 10 out of 150 initialisations are done this way in htslib) so just went with the flow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the house style is mostly { 0, 0, NULL }
, mainly because KS_INITIALIZE
was added so late (PR #879 in 2019).
I expected that {0}
(5 current instances) would lead to a “missing initialiser for field” warning with ‑Wextra
, but it turns out there's a special case for the universal zero initializer. Learning every day… 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, the {0}
case is a wonderfully useful for initialising large local arrays without having to resort to explicit memset calls. I use that quite a bit. :-)
You do need to still be aware of {{0}}
though for 2-dimensional arrays to avoid compiler warnings with older compilers (eg int a[10][20] = {0};
gives a warning with clang 5.0), even though I think initialising 2D arrays with a 1D list is legal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course technically {0}
is not the same as {0, 0, NULL}
as there is no requirement in C for NULL to be all-bits-zero, only to compare to 0 as equality. Blergh!
Practically speaking if we ever found such a system all hell would break loose, as we and others routinely make this assumption (which thankfully holds on all systems built in the last few decades).
Edit: oops after reading more I'm wrong. The missing values in an initialiser aren't set to 0-bits, but to the same value they get set when used in a static allocation. This is explicitly stated to be 0 for numerics and NULL for pointers. Obviously it's the same thing for current implementations anyway, but the pedantry wasn't true. As you say - live and learn every day :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those members without initializers are initialised in a type-appropriate way, so in fact leaving off NULL
is fine even on a weird platform. (C99 §6.7.8/21: “If there are fewer initializers in a brace-enclosed list than there are elements or members of an aggregate […], the remainder of the aggregate shall be initialized implicitly the same as objects that have static storage duration”, which for pointers means initialised to a platform-appropriate null pointer.)
I couldn't find anything special in the standard about {0}
beyond that, so I think “universal zero initializer” is a useful term that common practice and the implementations have come up with for for purposes of this warning.
It's thinking calloc()
or memset()
will adequately zero pointers and floating point values that is theoretically non-portable. But POSIX reassures us, at least about the pointers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep I'd already corrected myself above.
I didn't know POSIX had finally modernised things though and were mandating that NULL can be set using memset/calloc. That's great and about time!
Not that I'm losing sleep over it. :-) So much code would fall over with non-bit-0 NULL pointers that htslib would be the least of our worries!
If the user specifies the wrong reference, the CRAM slice header MD5sum checks fail. We now report the SQ line M5 string too so it is possible to validate against the whole chr in the ref.fa file. The error message has also been improved to report the reference name instead of #num. Finally, we now hint at the likely cause, which counters the misleading samtools supplied error of "truncated or corrupt" file. See samtools/samtools#1640.
It's trying to spot error messages starting with lowercase letters, but in doing so forbids things like "@sq" as it's not capital.
deba481
to
d7cc10d
Compare
If the user specifies the wrong reference, the CRAM slice header
MD5sum checks fail. We now report the SQ line M5 string too so it is
possible to validate against the whole chr in the ref.fa file.
The error message has also been improved to report the reference name
instead of #num.
Finally, we now hint at the likely cause, which counters the
misleading samtools supplied error of "truncated or corrupt" file.
See samtools/samtools#1640.
Note we could go one step further and pull the entire chromosome out, md5sum that, and compare to the SQ line. This would then be direct evidence for the wrong reference being specified, but it seems overkill. I'm sure 99% of times it's justified to simply state it's the wrong reference anyway, or at least the first thing to check.