Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: use correct return value in bcf_get_format and bcf_get_info_values #398

Merged
merged 6 commits into from
Jun 21, 2023

Conversation

Marlin-Na
Copy link
Contributor

According to the notes in htslib vcf header file:

Use the returned number of written values for accessing valid entries of dst, as ndst is only a
watermark that can be higher than the returned value, i.e. the end of dst can contain carry-over
values from previous calls to bcf_get_format_*() on lines with more values per sample.

I had a case that when accessing a format field of string type, even there was only one sample, I got two strings returned (albeit the second one was empty). With this change, the issue was fixed.

@coveralls
Copy link

coveralls commented Jun 20, 2023

Pull Request Test Coverage Report for Build 5327586180

  • 31 of 37 (83.78%) changed or added relevant lines in 1 file are covered.
  • 5 unchanged lines in 2 files lost coverage.
  • Overall coverage increased (+0.4%) to 79.758%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/bcf/record.rs 31 37 83.78%
Files with Coverage Reduction New Missed Lines %
src/bcf/record.rs 1 78.83%
src/bam/record.rs 4 76.31%
Totals Coverage Status
Change from base Build 5324511866: 0.4%
Covered Lines: 2439
Relevant Lines: 3058

💛 - Coveralls

Copy link
Contributor

@johanneskoester johanneskoester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add one or two test cases which show that ret is the right thing to use and also works in case of multiple strings?

@dlaehnemann
Copy link
Member

Another thing I noticed: It seems like we then simply have ret returned once as a usize and once as an i32. And indeed, most of the calls to these internal functions seem to use only one version. And in the other cases, one could simply cast it into the other type. So maybe just return Ok(Some(ret)) as is and then cast it to usize whenever that is necessary?

@Marlin-Na
Copy link
Contributor Author

Looking at the code again. I think maybe an alternative fix is to change

self.data(htslib::BCF_HT_STR).map(|(n, _)| {

in Format reader to use ret instead of n.

The Info reader actually got it correct:

data.map(|(_, ret)| {

@Marlin-Na
Copy link
Contributor Author

But float() and integer() in Format read also got it wrong. Yeah I think we should just use ret instead of n. I will implement the fix @dlaehnemann suggested.

self.data(htslib::BCF_HT_REAL).map(|(n, _)| {

@Marlin-Na
Copy link
Contributor Author

I think this PR is ready for merging.

@johanneskoester johanneskoester merged commit f9a1981 into rust-bio:master Jun 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants