-
Notifications
You must be signed in to change notification settings - Fork 277
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for new modified base tag #1123
Comments
Is there any update on support for this feature. The interface in pysam to support the modified base tag is very useful, but the lack of this support makes it very hard to use going forward. |
Is there any update on support for this feature? |
The lack of support for this is causing invalid results for many users attempting to parse modified base tags with pysam. Specifically with the implicit |
The “expected pattern” you pointed to in your initial comment is in HTSlib code. This code was updated to allow the new The question then would be whether the pysam Perhaps you or your employer are offering to sponsor the development of improved pysam basemod functionality. If that is not the case, I am having difficulty interpreting parts of your comment — notably Without support for this we will have to drop pysam usage for modified base analysis — as anything other than the sort of unkind vaguely threatening off-topic comment that causes maintainer demotivation and burnout and will eventually lead to you being banned from this repository. |
I apologizes if this was interpreted as a threat in any way as this was not the intent. I truly appreciate all the hard work that goes into maintaining community tools and do not wish to diminish this effort in any way. We would certainly be happy to look into contributing to the development of this feature. To the issue at hand, I understand that the tag with these specifiers is parsed and does not error given these new format tags. The issue is the returned modified base probabilities. From the SAM specification "When this flag is not present, or it is ‘.’, these bases should be assumed to have low probability of modification." The current implementation in htslib and pysam returns no entry for these skipped positions when the specification states that these positions should be assumed to have low probability of modification. The question now becomes how to implement this assumption. I would personally be happy for these probabilities to be returned as 0 (or equivalent to the 0 bin value where applicable). It would be good to discuss issues that other users or maintainers might have with this choice. Would adding this feature require a new attribute/array of the return value indicating where probabilities were inferred in the returned array? Would a dedicated choice aside from the 0 probability value make more sense? Once we've made these design decisions we can look into how best to look into implement this feature. |
See previous htslib PR here: We (the nebulous conglomerate of @jts, @jkbonfield and I) left the matter at the API not being completely broken, but did not have time to come back and make the API useful in the sense of conveying more information about implied calls. |
In that case, that may mean that establishing conventions in this area is more an htslib issue than a pysam issue.
|
The modified base tags (MM/ML) were updated not too long ago and this new tag format is breaking the current implementation. The update adds a
.
or?
to the end of the MM tag description (see section 1.7 in sam tags spec). To copy the text here the spec, the new tag is formatted asMM:Z:([ACGTUN][-+]([a-z]+|[0-9]+)[.?]?(,[0-9]+)*;)*
. Note the additional[.?]
in this as opposed to the original version. It seems that this should result in an update to this expected pattern in the code.The text was updated successfully, but these errors were encountered: