Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support optional . and ? in Mm tag #1418

Closed
cjw85 opened this issue Apr 5, 2022 · 6 comments · Fixed by #1423
Closed

Support optional . and ? in Mm tag #1418

cjw85 opened this issue Apr 5, 2022 · 6 comments · Fixed by #1423
Assignees

Comments

@cjw85
Copy link

cjw85 commented Apr 5, 2022

@jts proposed changes to the Mm/Ml tag specification which allow to explicitely record what can be assumed about bases for which no entry in the tag is made. These changes are now in the SAMtags specification, but not supported by the tag parsing code in htslib.

A check needs to be performed around here:

htslib/sam.c

Line 6175 in 9bcb2d2

// List of modification types

to see if a . or ? is present and (minimally) skip to the next character. Currently a file containing this optional information will cause the parser to trip up eventually emitting the error.

"Insufficient number of entries in ML tag"

@jkbonfield
Copy link
Contributor

It appears I started work on this last year and managed to completely forget about it! (https://github.com/jkbonfield/htslib/tree/base_mod%2B is a WIP.)

I'll resurrect it. The plan there was to extend the API to not only not choke on such data, but provide a way of programmatically distinguishing between explicit and implicitly specified base modifications.

@cjw85
Copy link
Author

cjw85 commented Apr 6, 2022

Not choking would be a good start 😄

I was wondering what a useful API should look like. The explicit ? I'd call the simple case: the caller just needs to know the results are explicit and the data is then easy to interpret.

In the case of the implicit . mode, should the API just return a zero probability for every alignment position with the corresponding canonical base in the query? Or should that be left to a caller after inspecting the state->implicit flag? Either of these seems simple enough when traversing alignments, though I'm wondering about the pileup API extension --- seems more fiddly to require a caller to rationalise the absence of data.

@jkbonfield
Copy link
Contributor

Agreed. This should probably be broken into a trivial "make it work" PR and a more complex "make it useful" one.

jkbonfield added a commit to jkbonfield/htslib that referenced this issue Apr 12, 2022
These define explicit vs implicit coordinates.  They are now part of
the MM specification, but we don't do anything with this data yet.
This PR simply permits them to be parsed without choking, and ignores
the additional markup.  A subsequent PR will improve on this.

Fixes samtools#1418
jkbonfield added a commit to jkbonfield/htslib that referenced this issue Apr 12, 2022
These define explicit vs implicit coordinates.  They are now part of
the MM specification, but we don't do anything with this data yet.
This PR simply permits them to be parsed without choking, and ignores
the additional markup.  A subsequent PR will improve on this.

Fixes samtools#1418
jkbonfield added a commit to jkbonfield/htslib that referenced this issue Apr 12, 2022
These define explicit vs implicit coordinates.  They are now part of
the MM specification, but we don't do anything with this data yet.
This PR simply permits them to be parsed without choking, and ignores
the additional markup.  A subsequent PR will improve on this.

Fixes samtools#1418
whitwham pushed a commit that referenced this issue Apr 13, 2022
These define explicit vs implicit coordinates.  They are now part of
the MM specification, but we don't do anything with this data yet.
This PR simply permits them to be parsed without choking, and ignores
the additional markup.  A subsequent PR will improve on this.

Fixes #1418
@assafgrw
Copy link

Hello,
I thinkthis tread deals with an issue i am facing:
I am facing Issues with trying to convert bam to fastq while keeping the Ml tags. Mm tags works good for me but when trying to extract the Ml tags " samtools fastq -T Ml" I get the following error: ""Problem copying aux tags: [@86c81ed2-7d81-4566-9dc0-a4e64aaba3fa Ml:B:*** Unhandled aux type ***]" Any ideas.

@jkbonfield
Copy link
Contributor

That looks like an old issue. samtools fastq in the past didn't deal with the B aux array type, but was fixed in samtools/samtools#1301 (included in version 1.11).

Upgrade and your problem should go away.

@assafgrw
Copy link

@jkbonfield Thanks! I've updated samtools and now it seems to work

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants