Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check number syntax is valid in hts_parse_decimal() #1400

Merged
merged 1 commit into from
Feb 16, 2022

Conversation

jmarshall
Copy link
Member

Motivated by #1395, this alters hts_parse_decimal() to validate that the number parsed is indeed a validly presented numeric value and to have a way of reporting failure. What we want to accept and the returned strend value are carefully considered, and test cases are added accordingly.

Follows strtol() et al's conventions, setting strend to the original str (or emitting a warning if strend is NULL) if the input string does not form a valid numeric constant. In particular, require it to have a non-zero number of digits.

Merges E and k/M/G handling, so that at most one can be used. In particular, "1e+3k" is non-standard and ambiguous, and should be rejected.

Adds documentation to htslib/hts.h describing the numeric syntax parsed and the two modes of operation (as determined by whether strend is NULL or not).

Add test cases, in particular verifying that a sole "G" is rejected. Fixes #1395. For simplicity, the tests are added to test/sam.c which already contains similar API tests. If preferred, I could create a new test/hts.c containing these tests instead.

Follow strtol() et al's conventions, setting strend to the original str
(or emitting a warning) if the input string does not form a valid numeric
constant. In particular, require it to have a non-zero number of digits.

Merge E and k/M/G handling, so that at most one can be used. In particular,
"1e+3k" is non-standard and ambiguous, and should be rejected.

Document the numeric syntax parsed and the two modes of operation (as
determined by whether  strend  is non-NULL).

Add test cases, in particular verifying that a sole "G" is rejected.
@jmarshall
Copy link
Member Author

Rebased past PR #1396, which has since been merged.

IMHO this PR further improves hts_parse_decimal() over current develop, by carefully considering how the returned strend value can be used by the caller to detect failure and by considering the diagnostics that should be emitted in the strend == NULL mode of operation. In particular, current develop's early return for the !has_digit case:

  • does not emit a diagnostic (in strend == NULL mode) when no valid number has been parsed, which would be the user's only way of noticing the problem in this mode.
  • does not return a strend value that can easily be used to detect parse failure when the string begins with whitespace (as str has already been incremented past the whitespace, so the returned *strend is not equal to the caller's str).

@daviesrob
Copy link
Member

Looks good, thanks.

@daviesrob daviesrob merged commit fb82bae into samtools:develop Feb 16, 2022
@jmarshall jmarshall deleted the hts_parse_decimal_updates branch February 16, 2022 16:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

hts_parse_decimal() silently fails on non-numeric strings starting with G
2 participants