feat: 438 vep annotation order #441
…notations for VEP
Just a few tiny things, sorry for being picky.
@@ -26,7 +26,31 @@
Step Output
===========

TODO
Annotations can be done on all genes & transcripts overlapping with the variant locus, or
Rephrasing:
Users can annotate all genes ...
so it fits better with the second part of the sentence.
In the latter case, the output vcf file will only contain one annotation per variant, while
in the former case, there might be over 100 annotations for each variant.

The ordering of features drinving the representative annotation choice is under user control.
Typo:
driving
@@ -115,7 +139,7 @@
assembly: GRCh38
cache_version: 102 # WARNING- this must match the wrapper's vep version!
tx_flag: "gencode_basic" # The flag selecting the transcripts. One of "gencode_basic", "refseq", and "merged".
pick: yes # Other option: no (report one or all consequences)
pick_order: ["biotype", "mane", "appris", "tsl", "ccds", "canonical", "rank", "length"]
Is it possible to print a constant here? If yes, one could make PICK_ORDER a global constant and reuse it for the configuration.
This is the default order. In check_config, the code tests the validity of all the options provided by the user in the YAML file.
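A minimal sketch of what the suggestion above could look like, assuming a module-level PICK_ORDER constant that serves both as the default and as the list of accepted criteria. The helper name check_pick_order and its error-message format are hypothetical and not taken from the PR; the criteria are copied from the default shown in the diff.

```python
# Hypothetical module-level constant: the default order from the diff above,
# reused here as the set of accepted criteria.
PICK_ORDER = ("biotype", "mane", "appris", "tsl", "ccds", "canonical", "rank", "length")


def check_pick_order(pick_order):
    """Return a list of error messages; the list is empty when the value is valid.

    Hypothetical helper, not the PR's actual check_config implementation.
    """
    errors = []
    # Any criterion that is not part of the constant is rejected.
    unknown = [c for c in pick_order if c not in PICK_ORDER]
    if unknown:
        errors.append("unknown pick_order criteria: {}".format(unknown))
    # A criterion listed twice is most likely a configuration mistake.
    duplicates = sorted({c for c in pick_order if pick_order.count(c) > 1})
    if duplicates:
        errors.append("duplicated pick_order criteria: {}".format(duplicates))
    return errors
```

With this shape, the default configuration can be generated from PICK_ORDER, so the documentation and the validation cannot drift apart.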
if selected:
    for criterion in criteria:
        codes[criterion] = get_value(criterion, fields[criterion])
Probably a very simple thing: Why is codes changed here? It's not used afterwards.
It's used before... The codes store the current values for the selected (best) annotation:
- The codes are all initialised to their maximum values (so the least likely to be selected).
- For each annotation, we loop over all criteria.
- Get the criterion code for the current annotation.
- If the code is unknown or equal to the stored code of the selected annotation, continue to the next criterion.
- If the code is smaller than the stored code of the selected annotation, then the current annotation is better than the selected one, and should replace it.
- If the current annotation is selected, then set all codes from this annotation, so that the stored values become those of the newly selected annotation.
I hope it makes sense...
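The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the wrapper's actual code: get_value here is a toy encoder with a made-up TOY_RANKS table, an unknown value is represented by None, and a code larger than the stored one is assumed to end the comparison with the annotation rejected (the description above does not spell that case out).

```python
import math

# Toy ranking tables (assumption): a lower code means a preferred value.
TOY_RANKS = {
    "biotype": {"protein_coding": 0, "lncRNA": 1},
    "canonical": {"YES": 0, "": 1},
}


def get_value(criterion, raw):
    """Toy stand-in for the real get_value: returns None for unknown values."""
    return TOY_RANKS.get(criterion, {}).get(raw)


def pick_annotation(annotations, criteria):
    # Start from the worst possible code for every criterion.
    codes = {criterion: math.inf for criterion in criteria}
    best = None
    for fields in annotations:
        selected = False
        for criterion in criteria:
            code = get_value(criterion, fields.get(criterion))
            if code is None or code == codes[criterion]:
                continue  # unknown or tied: let the next criterion decide
            # First decisive criterion: smaller wins, larger loses (assumed).
            selected = code < codes[criterion]
            break
        if selected:
            # Store the codes of the newly selected annotation.
            for criterion in criteria:
                code = get_value(criterion, fields.get(criterion))
                codes[criterion] = math.inf if code is None else code
            best = fields
    return best


annotations = [
    {"id": "a", "biotype": "lncRNA", "canonical": "YES"},
    {"id": "b", "biotype": "protein_coding", "canonical": ""},
]
print(pick_annotation(annotations, ["biotype", "canonical"])["id"])  # b
print(pick_annotation(annotations, ["canonical", "biotype"])["id"])  # a
```

The two calls show why the order of the criteria matters: the same pair of annotations yields a different representative depending on which criterion is consulted first.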
LGTM
Impose user-defined annotation order for VEP.
Previously, the wrapper used the default order, based on the ENSEMBL canonical transcript.
Because of that choice, many variants were not uploaded to cBioPortal.
The default choice now prioritizes protein-coding genes with MANE transcripts.
Many variants are now recovered in cBioPortal views.