MM tag preferred format for TAPS data #785

Open
jamesbaye opened this issue Aug 22, 2024 · 1 comment · May be fixed by #799
@jamesbaye

Hello,

TAPS is a methylation sequencing method in which mC is converted to T.

In the SAM spec, most of the mC examples use “C+m”. However, C+m does not work off the shelf with TAPS data, since the mC are represented in SEQ as Ts rather than Cs.

Alternatives for representing the mC mod could include T+m or N+m; however, these are not officially listed in the table of combinations. Is there any recommendation on which format to settle on?

Thanks,
James

@jkbonfield
Contributor

jkbonfield commented Sep 10, 2024

Apologies for the delay. Initially because I simply didn't know, and had never heard of TAPS, but then because I sadly got distracted and forgot to go back to this.

I'd think N+m is probably the way to go. The specification states:

"Note ‘N’ may be used to match any base rather than specifically an ‘N’ call by the sequencing instrument. This may be used in situations where the base modification is not a derivation of a standard base type".

(I wouldn't trust T+m to work, as it could cause problems with validators. I haven't tried it, but I'd be surprised if it worked given the way the base counting works, and there may well be tools that explicitly check compatibility of the original and modified base types.)
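
To illustrate how the base counting differs between the candidates, here is a minimal Python sketch of how the MM skip counts could be built for a forward-strand read. The helper name `mm_tag` and its parameters are hypothetical and not part of any existing tool. It ignores reverse-strand reads and the accompanying ML tag, and assumes 0-based positions in SEQ. With `N` as the fundamental base every SEQ base is counted, whereas with a specific base only matching bases are counted.

```python
def mm_tag(seq, mod_positions, base="N", code="m", strand="+"):
    """Build an MM tag string for a forward-strand read (illustrative only).

    seq           -- read sequence as stored in SEQ
    mod_positions -- 0-based SEQ positions carrying the modification
    base          -- fundamental base to count against; "N" counts every base
    """
    targets = set(mod_positions)

    # A specific fundamental base can only describe positions holding that base.
    for pos in sorted(targets):
        if base != "N" and seq[pos] != base:
            raise ValueError(
                f"position {pos} is {seq[pos]!r}, cannot be recorded as {base}{strand}{code}"
            )

    deltas = []
    skipped = 0  # counted bases seen since the last modified base
    for i, b in enumerate(seq):
        if base != "N" and b != base:
            continue  # this base is not counted at all
        if i in targets:
            deltas.append(skipped)
            skipped = 0
        else:
            skipped += 1

    return f"MM:Z:{base}{strand}{code}" + "".join(f",{d}" for d in deltas) + ";"


# TAPS-style read: the mC at positions 2 and 5 appear as T in SEQ.
seq = "ATTGCTA"
print(mm_tag(seq, [2, 5], base="N"))  # MM:Z:N+m,2,2;
print(mm_tag(seq, [2, 5], base="T"))  # MM:Z:T+m,1,0;
```

Calling `mm_tag(seq, [2, 5], base="C")` raises a `ValueError` in this sketch, which mirrors why C+m cannot describe TAPS data: the modified positions are not Cs in SEQ. Whether a given downstream tool accepts N+m or T+m for these reads is a separate question that this sketch does not answer.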

@jkbonfield jkbonfield self-assigned this Oct 8, 2024
jkbonfield added a commit to jkbonfield/hts-specs that referenced this issue Oct 28, 2024
The text already states that an unmodified base of N means we count
any base type, but base N code N in the table is a little misleading
as to the intention.  It was intended to mean any unspecified
modification, in the same way C+C is any unspecified C mod, but in
this case it's against all bases rather than a specific base type.

However that doesn't solve the issue of whether we can record specific
mods against any "fundamental" source base.  Clarified this by adding
an extra line to the table and some text.  (However note this doesn't
necessarily imply downstream processing tools will not do any
compatibility assessment and reject N+m when the SEQ base is a T.)

Fixes samtools#785
@jkbonfield jkbonfield linked a pull request Oct 28, 2024 that will close this issue
Projects
Status: Progressing

2 participants