Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: bam::Record:new should return a valid record #361

Merged
merged 13 commits into from
May 22, 2024
57 changes: 56 additions & 1 deletion src/bam/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -2268,7 +2268,7 @@ CCCCCCCCCCCCCCCCCCC"[..],
where
F: Fn(&record::Record) -> Option<bool>,
{
let mut bam_reader = Reader::from_path(bamfile).unwrap(); // internal functions, just unwarp
let mut bam_reader = Reader::from_path(bamfile).unwrap(); // internal functions, just unwrap
dcroote marked this conversation as resolved.
Show resolved Hide resolved
let header = header::Header::from_template(bam_reader.header());
let mut sam_writer = Writer::from_path(samfile, &header, Format::Sam).unwrap();
for record in bam_reader.records() {
Expand Down Expand Up @@ -2818,4 +2818,59 @@ CCCCCCCCCCCCCCCCCCC"[..],
assert_eq!(header_refseqs[0].get("SN").unwrap(), "ref_1",);
assert_eq!(header_refseqs[0].get("LN").unwrap(), "10000000",);
}

#[test]
fn test_bam_new() {
// Create the path to write the tmp test BAM
let tmp = tempfile::Builder::new()
.prefix("rust-htslib")
.tempdir()
.expect("Cannot create temp dir");
//let bampath = tmp.path().join("test.bam");
//let bampath = "/tmp/test.bam";
let bampath = "/tmp/test.sam";

// write an unmapped BAM record (uBAM)
{
// Build the header
let mut header = Header::new();

// Add the version
header.push_record(
HeaderRecord::new(b"HD")
.push_tag(b"VN", &"1.6")
.push_tag(b"SO", &"unsorted"),
);

// Build the writer
//let mut writer = Writer::from_path(&bampath, &header, Format::Bam).unwrap();
let mut writer = Writer::from_path(&bampath, &header, Format::Sam).unwrap();

// Build an empty record
let mut record = Record::new();

// Write the record (this previously seg-faulted)
assert!(writer.write(&record).is_ok());
}

// Read the record
{
// Build th reader
let mut reader = Reader::from_path(&bampath).expect("Error opening file.");

// Read the record
let mut rec = Record::new();
match reader.read(&mut rec) {
Some(r) => r.expect("Failed to read record."),
None => panic!("No record read."),
};

// Check a few things
assert!(rec.is_unmapped());
assert_eq!(rec.tid(), -1);
assert_eq!(rec.pos(), -1);
assert_eq!(rec.mtid(), -1);
assert_eq!(rec.mpos(), -1);
}
}
}
15 changes: 13 additions & 2 deletions src/bam/record.rs
Original file line number Diff line number Diff line change
Expand Up @@ -113,12 +113,23 @@ fn extranul_from_qname(qname: &[u8]) -> usize {
impl Record {
/// Create an empty BAM record.
pub fn new() -> Self {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal in this PR is to that one does not need to call any other methods on Record to have a valid record. Currently, this requires setting the read name and alignment positions (and unmapped flag if the positions are -1). So this PR has an opinion, that a new record is unmapped with empty name/seq/qual/cigar.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for indirectly making me think about Default trait vs new(): https://users.rust-lang.org/t/when-to-use-default-trait/11190

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Record {
let mut record = Record {
inner: unsafe { MaybeUninit::zeroed().assume_init() },
own: true,
cigar: None,
header: None,
}
};
// The read/query name needs to be set as empty to properly initialize
// the record
record.set_qname(b"");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the writer would not write the record unless the query name is set to something (empty is ok)

// Developer note: these are needed so the returned record is properly
// initialized as unmapped.
record.set_unmapped();
record.set_tid(-1);
record.set_pos(-1);
record.set_mpos(-1);
record.set_mtid(-1);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would argue that we should also set the record to be unmapped by default. I mean, how can it be mapped if the tid/pos are -1?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds reasonable but I'd quickly compare third party implementations (htslib/htsjdk/noodles) and/or spec footnotes (lots of gotchas there).

record
}

pub fn from_inner(from: *mut htslib::bam1_t) -> Self {
Expand Down