Genome Nexus: 41 variants that have incorrectly annotated ref/alt. #34

Closed
thomasyu888 opened this issue Jan 16, 2021 · 10 comments · Fixed by #39

@thomasyu888

thomasyu888 commented Jan 16, 2021

Uploaded as a file for ease: ref_alt_diff.txt

@thomasyu888 thomasyu888 changed the title 41 variants that have incorrectly annotated ref/alt. Genome Nexus: 41 variants that have incorrectly annotated ref/alt. Jan 16, 2021
@inodb
Member

inodb commented Feb 15, 2021

@thomasyu888 could you elaborate on what exactly these columns represent:

CURRENT.START   CURRENT.REF     CURRENT.ALT     EXPECTED.START  EXPECTED.REF    EXPECTED.ALT

I assume that we need at least these three sets of columns to solve this issue (or something similar):

INPUT
CURRENT_OUTPUT
EXPECTED_OUTPUT

But I only see EXPECTED and CURRENT

In addition, these are missing the end position. As I understand it, annotation-tools converts VCF to MAF and computes End_Position in the process. It would be good to add that as well. Ideally, to make debugging easier, we would have all of these:

INPUT_ANNOTATION_TOOLS
OUTPUT_ANNOTATION_TOOLS
OUTPUT_GENOME_NEXUS_ANNOTATION_PIPELINE
EXPECTED_OUTPUT_GENOME_NEXUS_ANNOTATION_PIPELINE

Could you help provide these?
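
For reference, here is a minimal sketch of how a MAF-style Start_Position and End_Position can be derived from a VCF POS/REF/ALT. It follows the widely used vcf2maf-style convention (strip the shared leading bases, then treat insertions as flanking positions); whether annotation-tools does exactly this is an assumption, and the function name `vcf_to_maf_span` is hypothetical.

```python
from typing import Tuple


def vcf_to_maf_span(pos: int, ref: str, alt: str) -> Tuple[int, int, str, str]:
    """Return (start, end, ref, alt) in MAF style for one VCF record.

    Assumption: vcf2maf-style conventions, not confirmed for annotation-tools.
    """
    # Drop the leading bases that REF and ALT share (e.g. the VCF anchor base).
    while ref and alt and ref[0] == alt[0]:
        ref, alt, pos = ref[1:], alt[1:], pos + 1
    if not ref and not alt:
        raise ValueError("REF and ALT are identical; nothing to annotate")
    if not ref:
        # Insertion: start/end are the two reference bases flanking the insert.
        return pos - 1, pos, "-", alt
    # SNP, MNP, or deletion: the span covers every remaining reference base.
    return pos, pos + len(ref) - 1, ref, alt or "-"


# The VCF record quoted later in this thread (a 3-base substitution).
print(vcf_to_maf_span(7578397, "GAA", "TCC"))  # (7578397, 7578399, 'GAA', 'TCC')
```

Having End_Position next to start/ref/alt in the diff file would make it possible to tell whether a mismatch comes from this conversion step or from the Genome Nexus annotation itself.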

@inodb inodb assigned thomasyu888 and unassigned inodb Feb 15, 2021
@thomasyu888
Author

Ah, @inodb this is only the current output (CURRENT...) and expected output (EXPECTED...)

I didn't pull down the input. One thing to note is that if these come from MAF files, they may not have end positions. I would have to double-check.

@thomasyu888
Author

Sorry for the delay, I'll add resources here gradually. One of the variants comes from a VCF. I won't upload the ref_alt_diff.txt file again, but here are the related files:

center input: input.vcf.txt
Annotation Pipeline initial output: input.vcf.temp.txt
Genome Nexus output: input.vcf.temp.annotated.txt
Annotation Pipeline final: processed.txt

@thomasyu888
Author

More resources - I split these up because some sites have different input files as well.

center input: input_1.txt
Annotation Pipeline initial output: input.txt.temp.txt
Genome Nexus output: input.txt.temp.annotated.txt
Annotation Pipeline final: processed.txt

@thomasyu888
Author

More:

center input: input_2.txt
Annotation Pipeline initial output: input.txt.temp.txt
Genome Nexus output: input.txt.temp.annotated.txt
Annotation Pipeline final: processed.txt

@thomasyu888
Author

Final:

center input: input.txt
Annotation Pipeline initial output: input.txt.temp.txt
Genome Nexus output: input.txt.temp.annotated.txt
Annotation Pipeline final: processed.txt

@ao508
Contributor

ao508 commented Mar 25, 2021

@thomasyu888 I've started looking into this and wanted to follow up on an inconsistency I found. I don't know if there are others, but for the first variant I looked at, the REF/ALT in the input are not what's reported in the diff file you shared above.

These are the POS, REF, ALT in the VCF from the input file in this comment

current pos current ref current alt
7578397 GAA TCC

But the diff file shows

current pos current ref current alt
7578397 TGG TCC

@ao508
Contributor

ao508 commented Mar 25, 2021

I found similar inconsistencies in the other input example files as well. I'm not really sure how to approach this either because of the differing inputs

from the diff file shared:

pos ref alt
7578535 TTGTTGAGGGCAGGGGAGTAC TGAGGGCAGGGGAGTA

from the input file (input2)

pos ref alt
7578533 ATCTTGTTGAGGGCAGGGGAGTA ATTGAGGGCAGGGGAGTA

Just eyeballing what this might resolve to, which still doesn't match what ref_alt_diff.txt has for this variant.

my assumption:

pos ref alt
7578535 CTTGTTGAGGGCAGGGGAGTA TGAGGGCAGGGGAGTA

the expected values reported in the diff file shared

pos ref alt
7578536 TGTTGAGGGCAGGGGAGTAC GAGGGCAGGGGAGTA
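
The eyeballed resolution above can be reproduced by trimming the bases that REF and ALT share at the start and advancing the position by the same amount. This is only a sketch of that one step; the helper name `trim_shared_prefix` is hypothetical, and it is not claimed to be what the annotation pipeline or Genome Nexus actually does.

```python
def trim_shared_prefix(pos, ref, alt):
    """Strip the leading bases common to ref and alt, shifting pos to match."""
    n = 0
    while n < len(ref) and n < len(alt) and ref[n] == alt[n]:
        n += 1
    return pos + n, ref[n:], alt[n:]


# Values taken from the input file (input2) quoted above.
print(trim_shared_prefix(7578533, "ATCTTGTTGAGGGCAGGGGAGTA", "ATTGAGGGCAGGGGAGTA"))
# (7578535, 'CTTGTTGAGGGCAGGGGAGTA', 'TGAGGGCAGGGGAGTA')
```

The shared prefix here is the two bases `AT`, which gives position 7578535 and matches the assumed values, but not the current or expected values in the diff file.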

@thomasyu888
Author

thomasyu888 commented Mar 25, 2021

Sorry about the confusion, @ao508. The diff file doesn't show what's in input.txt. Its CURRENT columns are supposed to match what's in processed.txt. What the collaborator is suggesting is that those values are incorrect and that the corrected ref/alt are the ones in the EXPECTED columns of the diff file.
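
To make that layout concrete, here is a small sketch that splits each diff-file row into its CURRENT (as found in processed.txt) and EXPECTED (collaborator's correction) triples. The column names come from the header quoted earlier in this thread and the single sample row reuses the values quoted above; the tab delimiter and the helper name `split_diff_rows` are assumptions.

```python
import csv
import io

# One row built from values quoted in this thread; tab-delimited is assumed.
SAMPLE = (
    "CURRENT.START\tCURRENT.REF\tCURRENT.ALT\t"
    "EXPECTED.START\tEXPECTED.REF\tEXPECTED.ALT\n"
    "7578535\tTTGTTGAGGGCAGGGGAGTAC\tTGAGGGCAGGGGAGTA\t"
    "7578536\tTGTTGAGGGCAGGGGAGTAC\tGAGGGCAGGGGAGTA\n"
)


def split_diff_rows(handle):
    """Yield (current, expected) pairs of (start, ref, alt) for each row."""
    for row in csv.DictReader(handle, delimiter="\t"):
        current = (int(row["CURRENT.START"]), row["CURRENT.REF"], row["CURRENT.ALT"])
        expected = (int(row["EXPECTED.START"]), row["EXPECTED.REF"], row["EXPECTED.ALT"])
        yield current, expected


for current, expected in split_diff_rows(io.StringIO(SAMPLE)):
    print("processed.txt has:   ", current)
    print("collaborator expects:", expected)
```

The CURRENT triple is the one to look up in processed.txt; the original center input has to be traced back separately from the input files attached above.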

@ao508
Contributor

ao508 commented Mar 25, 2021

I think I've resolved an issue that affected at least some of these variants in #39.
