Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Accessing the index file from a bam::IndexedReader #386

Closed
tedil opened this issue Apr 26, 2023 · 4 comments
Closed

Accessing the index file from a bam::IndexedReader #386

tedil opened this issue Apr 26, 2023 · 4 comments

Comments

@tedil
Copy link
Member

tedil commented Apr 26, 2023

I noticed that there was neither a non-unsafe way of accessing the actual index of an IndexedReader nor any "convenience" methods related to the index.

Therefore, I've tried translating samtools idxstats using the htslib FFI interface, but stumbled upon the following issue:

When accessing the index of an IndexedReader, the result is different than when loading the index manually through sam_index_load, as can be reproduced with the following code and the example BAM file and index in data.zip.

use std::ffi;

use rust_htslib::bam;
use rust_htslib::bam::{FetchDefinition, Read};
use rust_htslib::htslib::hts_idx_t;

/// Get length, mapped and unmapped read counts for each contig.
unsafe fn idxstats<R: bam::Read>(
    bam: &mut R,
    idx: *mut hts_idx_t,
) -> Vec<(String, (u64, u64, u64))> {
    bam.header()
        .target_names()
        .iter()
        .enumerate()
        .map(|(tid, tname)| {
            let (mut mapped, mut unmapped) = (0, 0);
            rust_htslib::htslib::hts_idx_get_stat(idx, tid as i32, &mut mapped, &mut unmapped);
            let tname = std::str::from_utf8(tname).unwrap();
            let tlen = bam.header().target_len(tid as u32).unwrap();
            (tname.to_string(), (tlen, mapped, unmapped))
        })
        .collect()
}

fn main() {
    let mut reader = bam::IndexedReader::from_path("data/test.bam").unwrap();
    reader.fetch(FetchDefinition::All).unwrap();

    let stats_from_indexed_reader = unsafe {
        let idx = (*reader.htsfile()).idx;
        idxstats(&mut reader, idx)
    };

    let stats_from_index = unsafe {
        let path = "data/test.bam.bai";
        let c_str = ffi::CString::new(path).unwrap();
        let htsfile = reader.htsfile();
        let idx = rust_htslib::htslib::sam_index_load(htsfile, c_str.as_ptr());
        idxstats(&mut reader, idx)
    };

    assert_eq!(stats_from_indexed_reader, stats_from_index);
}

I assume the two resulting stats being different is due to internal state, but I hoped it would at least error instead of silently always returning 0.
In any case, I think it would be good to have a safe way to retrieve information from the corresponding index (if there already is such a way, please let me know!)

@tedil tedil changed the title Accessing the index file from abam::IndexedReader Accessing the index file from a bam::IndexedReader Apr 26, 2023
@jts
Copy link
Contributor

jts commented Apr 26, 2023

I looked into this a bit and the issue is that IndexedReader has its own pointer to an hts_idx_t rather than setting the idx field of the htsfile member:

https://docs.rs/rust-htslib/latest/src/rust_htslib/bam/mod.rs.html#603

This means that accessing (*reader.htsfile().idx) will always return null. I can see two possible fixes:

  1. make an idx() member of IndexedReader that returns IndexedReader::idx
  2. redesign IndexedReader so the idx pointer is stored within htsfile and remove the idx member

I'm not familiar enough with the code to say whether 2) is a good idea but I quickly tested the first solution and your test case works.

HTH,
Jared

@tedil
Copy link
Member Author

tedil commented Apr 27, 2023

Yes, that helps indeed, thanks a lot!
While 1 is an easy fix, I think 2 is a more "proper" way.
Probably just start with option 1 for now and address option 2 later ;)

@jts
Copy link
Contributor

jts commented Apr 27, 2023

Looking at the htslib code I don't think that the idx field is meant to be accessible: https://github.com/samtools/htslib/blob/develop/htslib/hts.h#L233. I think what the library is doing is fine and option 1 should be preferred.

@tedil
Copy link
Member Author

tedil commented May 25, 2023

The index is now accessible via .index(), see #387

@tedil tedil closed this as completed May 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants