Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hts_parse_decimal() silently fails on non-numeric strings starting with G #1395

Closed
pd3 opened this issue Feb 11, 2022 · 2 comments · Fixed by #1400
Closed

hts_parse_decimal() silently fails on non-numeric strings starting with G #1395

pd3 opened this issue Feb 11, 2022 · 2 comments · Fixed by #1400
Assignees

Comments

@pd3
Copy link
Member

pd3 commented Feb 11, 2022

The function does not reject strings like G A, thinking it was able to parse a coordinate due to

htslib/hts.c

Line 3513 in 1c8829a

case 'g': case 'G': e += 9; s++; break;

This leads to problems reported in samtools/bcftools#1598 where region format is autodetected by first attempting to read chr beg end and, upon failure, reads chr pos. One relevant part of code that can be mislead by this is

htslib/synced_bcf_reader.c

Lines 1126 to 1130 in 1c8829a

if ( k==ifrom )
*to = hts_parse_decimal(ss, &tmp, 0);
else
*from = hts_parse_decimal(ss, &tmp, 0);
if ( ss==tmp ) return -1;

@pd3 pd3 self-assigned this Feb 11, 2022
@jmarshall
Copy link
Member

Indeed, it shouldn't be recognising G as 0 gigabases/gigabytes; those suffixes should require some digits first, e.g. 1G.

I think it would be appropriate to only do the E and K/M/G processing if a non-zero number of digits have been seen. Or pretty much equivalently, by wrapping if (s > str) around this processing.

1e+3k doesn't seem good to accept, so I would suggest merging the E/e processing into the switch, so that only one instance of any of E/K/M/G is accepted.

@pd3
Copy link
Member Author

pd3 commented Feb 14, 2022

I issued a pull request here #1396.

Although expressions like 1e+3k are ambiguous, (1e3)k vs 1e(3k), I don't mind those as much as they are interpreted the expected way (1e3)k.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants