Genome Nexus changes the Reference Allele and Tumor_Seq_Allele2 #30

Closed
thomasyu888 opened this issue Jan 14, 2021 · 7 comments

thomasyu888 commented Jan 14, 2021

  • Input
    input.txt

  • annotation-tools intermediate files: I had to add .txt to the end of these filenames or GitHub wouldn't let me upload them. My understanding is that input.txt.temp.annotated.txt is the output from Genome Nexus. Because annotation-tools lets us pass a directory containing a list of MAFs or VCFs, it annotates each of those files separately; processed.txt is all of them merged.
    input.txt.temp.annotated.txt
    input.txt.temp.txt

  • Processed
    processed.txt

inodb (Member) commented Jan 14, 2021

Maybe not related, but it looks like the End_Position column is missing in the input? Not sure if that's the reason things go haywire, though.

thomasyu888 (Author) commented Jan 14, 2021

Interesting. The input file doesn't have End_Position because we don't require centers to include it, but I made an edit to the original post to include End_Position.
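For inputs that omit End_Position, a minimal sketch of deriving it from Start_Position and Reference_Allele is below. It assumes the GDC MAF convention (a "-" reference allele marks an insertion, whose Start/End are the two flanking bases); the function name `derive_end_position` is hypothetical and is not part of annotation-tools or Genome Nexus.

```python
def derive_end_position(start: int, ref: str) -> int:
    """Hypothetical helper: infer a MAF End_Position from Start_Position
    and Reference_Allele, assuming GDC MAF coordinate conventions."""
    if ref == "-":
        # Insertion: Start and End are assumed to be the flanking bases.
        return start + 1
    # SNP, MNP, or deletion: the reference allele spans len(ref) bases.
    return start + len(ref) - 1


if __name__ == "__main__":
    print(derive_end_position(100, "ACG"))  # -> 102
```

This only covers the coordinate arithmetic; it does not validate the alleles against the reference genome.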

sheridancbio (Contributor)

This appears (at first glance) to be related to the issue of a shared common prefix in the alleles. A replacement of "A" with "AT" is (I think) semantically equivalent to an insertion of a "T" after the referenced "A". But the output above looks like a bug of some sort: if a shared prefix had been removed, the offsets should have been adjusted. The first line has a shared prefix of CTTTTTTTTTTTT followed by either a T [reference] or a C [tumor]. Isn't this semantically equivalent to a SNP at the final position? In that case, it should be a T->C SNP at position (170837513+13). I also notice that there are three input records here but only two output records (post-processing); I wonder what happened there. @ao508 may know these rules off the top of her head.
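To make the expected behavior concrete, here is a minimal sketch of shared-prefix trimming with the offset adjustment described above. It is not the Genome Nexus or annotation-tools implementation; the function name `trim_shared_prefix` is hypothetical, and the "-" placeholder for an empty allele is assumed from MAF convention. It deliberately ignores how MAF positions insertions (on flanking bases), which would need extra handling.

```python
def trim_shared_prefix(start: int, ref: str, alt: str) -> tuple[int, str, str]:
    """Hypothetical helper: strip bases shared at the start of both alleles
    and shift the start position by the number of bases removed."""
    n = 0
    # Count leading bases that are identical in both alleles.
    while n < len(ref) and n < len(alt) and ref[n] == alt[n]:
        n += 1
    # An allele trimmed to nothing is written as "-" (assumed MAF-style).
    return start + n, ref[n:] or "-", alt[n:] or "-"


if __name__ == "__main__":
    ref = "C" + "T" * 13        # shared prefix CTTTTTTTTTTTT, then T
    alt = "C" + "T" * 12 + "C"  # shared prefix CTTTTTTTTTTTT, then C
    print(trim_shared_prefix(170837513, ref, alt))  # -> (170837526, 'T', 'C')
```

With 13 shared bases trimmed, the record collapses to a T->C SNP at 170837513+13, which is the normalization one would expect to see instead of the altered alleles in the output.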

thomasyu888 (Author) commented Jan 15, 2021

Thanks @sheridancbio and @inodb. I made edits to my initial post; that is the complete list.

ao508 (Contributor) commented Jan 21, 2021

@thomasyu888 Would you mind slacking me the input MAFs with all of the fields? That would make it easier for me to troubleshoot the issue with the MAF standardization script we provided.

thomasyu888 (Author) commented Jan 23, 2021

@inodb @ao508, this is ready to look at.

I took a quick glance at the annotation-tools file before it gets sent to Genome Nexus, and it appears that this could be a Genome Nexus issue.

inodb (Member) commented Mar 26, 2021

Fixed in #39

inodb closed this as completed Mar 26, 2021