Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new interfaces for manipulating auxilary BAM fields #17

Open
jakobnissen opened this issue Dec 15, 2018 · 0 comments
Open

Add new interfaces for manipulating auxilary BAM fields #17

jakobnissen opened this issue Dec 15, 2018 · 0 comments

Comments

@jakobnissen
Copy link
Member

Add new interfaces for manipulating auxilary BAM fields

I've been working a bit with BAM files using BioAlignments the past days and have been looking through the source code. During the work, there was a few function I found that I missed from BioAlignments which would have my work easier. They would all be quite easy to implement, and I'd love to pitch a PR with the new interfaces + tests, but being interfaces, they're hard to change once released and very annoying to get wrong. So, let's discuss them first!

Phase out AuxData, or make it view-based.

Currently, an AuxData object can be created from the auxiliary data of a BAM record. In the current implementation It necessitates a copy of the data from the record, meaning that the user can manipulate the AuxData and seemingly succeeding, but without the record actually changing. Furthermore, most interfaces (all except iterate, AFAIK) for aux fields is currently implemented to work on Vector{UInt8} or on the records directly, making the AuxData type mostly obsolete.
I suggest removing the object alltogether to avoid code duplication of functions acting on AuxData and Records and confusion about what the user is mutating. If this is not acceptable, then making the AuxData view-based.

get(record::BAM.Record, key::AbstractString, default)

Records work sort of like AbstractDict. In fact, the implemented AuxData type is a subtype of AbstractDict. But asking for a value with a default is very awkward for the user. I suggest adding this method, which is to completely mirror how it works for Dicts.

delete!(record::BAM.Record, key::AbstractString)

Actually changing the auxiliary fields of BAM.Record objects is difficult. Again, this one should simply mirror the similar method for Dicts. One could also think about adding pop!.

setindex!(record::BAM.Record, key::AbstractString, value)

This would also be useful to have. The value would be encoded and appended to the end of the data array. Probably a few different methods would have to be written, since the type of Value should only be an UInt8, Int32, Float32, String, or Vector{T} where T can be UInt8, Int8, UInt16, Int16, UInt32, Int32 or Float32.

Some new interface for making headers more easy

People can make all kinds of custom headers, but usually they just want the @SQ headers updated plus whatever non-SQ headers the program created. It's worth noting that BioAlignments make no checks for the header and records matching together, gladly writing a healthy-looking but unparsable (by htslib) BAM.file, so making it easy to not mess up the headers is rather important. I suggest making it easy to update the SQ headers. I'd like any suggestion here.
I can imagine two new constructors for SAM.Header, namely SAM.Header(h::Sam.Header, records::Vector{BAM.Record}) and SAM.Header(records::Vector{BAM.Record}), the latter creating a header with the appropriate "SQ" headers, the former with the same, but keeping any non-SQ tags already present.

@CiaranOMara CiaranOMara transferred this issue from BioJulia/BioAlignments.jl Jul 1, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant