Add transcripts level #256
To create a transcript view, some parameters are needed.
{
"transcripts": {
"table": "transcripts",
"struct": {
"from_column_format": [
{
"transcripts_column": "ANN",
"transcripts_infos_column": "Feature_ID"
}
],
"from_columns_map": [
{
"transcripts_column": "Ensembl_transcriptid",
"transcripts_infos_columns": [
"genename",
"Ensembl_geneid",
"LIST_S2_score",
"LIST_S2_pred"
]
},
{
"transcripts_column": "Ensembl_transcriptid",
"transcripts_infos_columns": [
"genename",
"VARITY_R_score",
"Aloft_pred"
]
}
]
}
}
}
This param is used with function
Calculation to add transcripts annotations as a field in INFO in JSON format:
howard calculation --input="tests/data/example.ann.transcripts.vcf.gz" --output="/tmp/output.transcript.vcf" --calculations="TRANSCRIPTS_JSON" --param="config/param.transcripts.json"
Prioritization of transcripts in 'HOWARD' mode with 'transcripts' profiles available in a configuration JSON file, with 'PZT' as prefix:
"transcripts": {
...
"prioritization": {
"profiles": ["transcripts"],
"prioritization_config": "config/prioritization_transcripts_profiles.json",
"pzprefix": "PZT",
"prioritization_score_mode": "HOWARD"
}
}
With prioritization parameters based on 'LIST_S2_score' (file 'config/prioritization_transcripts_profiles.json'):
{
"transcripts": {
"LIST_S2_score": [
{
"type": "gt",
"value": "0.75",
"score": 10,
"flag": "PASS",
"comment": ["Very Good LIST Score"]
},
{
"type": "gt",
"value": "0.50",
"score": 10,
"flag": "PASS",
"comment": ["Good LIST Score"]
}
]
}
}
Command:
howard calculation --input='tests/data/example.dbnsfp.transcripts.vcf.gz' --output='/tmp/example.calculation.transcripts.vcf' --param='config/param.transcripts.json' --calculations='TRANSCRIPTS_PRIORITIZATION'
Output VCF with PZTTranscript, PZTScore and PZTFlag (partial output):
#CHROM POS ID REF ALT QUAL FILTER INFO
chr1 28736 . A C 100 PASS CLNSIG=pathogenic
chr1 35144 . A C 100 PASS CLNSIG=non-pathogenic
chr1 69101 . A G 100 PASS genename=OR4F5;Ensembl_transcriptid=ENST00000641515,ENST00000335137;LIST_S2_score=0.79822,0.716128;PZTTranscript=ENST00000641515;PZTScore=20;PZTFlag=PASS
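The scoring in the example above can be illustrated with a minimal Python sketch. It is not HOWARD's implementation: it assumes that 'HOWARD' score mode sums the scores of every matching criterion (which is consistent with PZTScore=20 for a LIST_S2_score of 0.79822, above both thresholds) and that the transcript with the highest total is kept. The names `OPS`, `score_transcript`, and `PROFILE` are hypothetical, and flag handling is left out.

```python
import operator

# Comparison types used by the profile; only a few are sketched here (assumption).
OPS = {"gt": operator.gt, "ge": operator.ge, "lt": operator.lt, "le": operator.le}

# Same criteria as in 'config/prioritization_transcripts_profiles.json' above.
PROFILE = {
    "LIST_S2_score": [
        {"type": "gt", "value": "0.75", "score": 10, "flag": "PASS"},
        {"type": "gt", "value": "0.50", "score": 10, "flag": "PASS"},
    ]
}


def score_transcript(annotations, profile):
    """Sum the scores of all criteria matched by one transcript (assumed behavior)."""
    total = 0
    for field, criteria in profile.items():
        try:
            observed = float(annotations.get(field))
        except (TypeError, ValueError):
            continue  # missing or non-numeric value (e.g. "."): no criterion applies
        for criterion in criteria:
            if OPS[criterion["type"]](observed, float(criterion["value"])):
                total += criterion["score"]
    return total


# Per-transcript values taken from the chr1:69101 line of the output above.
transcripts = {
    "ENST00000641515": {"LIST_S2_score": "0.79822"},
    "ENST00000335137": {"LIST_S2_score": "0.716128"},
}
scores = {tid: score_transcript(ann, PROFILE) for tid, ann in transcripts.items()}
best = max(scores, key=scores.get)
print(best, scores[best])
```

Under these assumptions the selected transcript and its score agree with the PZTTranscript and PZTScore values shown in the partial output.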
Include transcript annotations, either in JSON format or in a structured format (like 'snpEff'), with the calculation tool. Parameters in JSON file (e.g. 'config/param.transcripts.json'):
{
"transcripts": {
"transcripts_info_field_json": "transcripts_json",
"transcripts_info_field_format": "transcripts_ann",
"table": "transcripts",
"struct": {...}
...
}
}
Command:
howard calculation --input='tests/data/example.ann.transcripts.vcf.gz' --output='/tmp/example.calculation.transcripts.vcf' --param='config/param.transcripts.json' --calculations='TRANSCRIPTS_ANNOTATIONS'
Output VCF with 'transcripts_json' and 'transcripts_ann' INFO fields (partial output):
##INFO=<ID=ANN,Number=.,Type=String,Description="Functional annotations: 'Allele | Annotation | Annotation_Impact | Gene_Name | Gene_ID | Feature_Type | Feature_ID | Transcript_BioType | Rank | HGVS.c | HGVS.p | cDNA.pos / cDNA.length | CDS.pos / CDS.length | AA.pos / AA.length | Distance | ERRORS / WARNINGS / INFO'">
##INFO=<ID=transcripts_json,Number=.,Type=String,Description="Transcripts in JSON format">
##INFO=<ID=transcripts_ann,Number=.,Type=String,Description="Transcripts annotations: 'transcript | VARITY_R_score | transcript_1 | Annotation | FeatureID | Allele | HGVSc | Aloft_pred | HGVSp | TranscriptBioType | Distance | genename | LIST_S2_score | AAposAAlength | GeneID | Ensembl_geneid | Rank | GeneName_1 | ERRORSWARNINGSINFO | FeatureType | LIST_S2_pred | CDSposCDSlength | cDNAposcDNAlength | AnnotationImpact'">
#CHROM POS ID REF ALT QUAL FILTER INFO
chr1 69101 . A G 100 PASS ANN=G|missense_variant|...;genename=OR4F5;Ensembl_transcriptid=ENST00000641515,ENST00000335137;LIST_S2_score=0.79822,0.716128;transcripts_json={"ENST00000335137":{"VARITY_R_score":"0.27627227","transcript_1":"ENST00000335137","Annotation":null,"FeatureID":null,"Allele":null,"HGVSc":null,"Aloft_pred":".","HGVSp":null,"TranscriptBioType":null,"Distance":null,"genename":"OR4F5","LIST_S2_score":"0.716128","AAposAAlength":null,"GeneID":null,"Ensembl_geneid":"ENSG00000186092","Rank":null,"GeneName_1":"OR4F5","ERRORSWARNINGSINFO":null,"FeatureType":null,"LIST_S2_pred":"T","CDSposCDSlength":null,"cDNAposcDNAlength":null,"AnnotationImpact":null},"ENST00000641515":{"VARITY_R_score":".","transcript_1":"ENST00000641515","Annotation":null,"FeatureID":null,"Allele":null,"HGVSc":null,"Aloft_pred":".","HGVSp":null,"TranscriptBioType":null,"Distance":null,"genename":"OR4F5","LIST_S2_score":"0.79822","AAposAAlength":null,"GeneID":null,"Ensembl_geneid":"ENSG00000186092","Rank":null,"GeneName_1":"OR4F5","ERRORSWARNINGSINFO":null,"FeatureType":null,"LIST_S2_pred":"T","CDSposCDSlength":null,"cDNAposcDNAlength":null,"AnnotationImpact":null},"NM_001005484.1":{"VARITY_R_score":null,"transcript_1":"NM_001005484.1","Annotation":"missense_variant","FeatureID":"NM_001005484.1","Allele":"G","HGVSc":"c.11A>G","Aloft_pred":null,"HGVSp":"p.Glu4Gly","TranscriptBioType":"protein_coding","Distance":null,"genename":"OR4F5","LIST_S2_score":null,"AAposAAlength":"4/305","GeneID":"OR4F5","Ensembl_geneid":null,"Rank":"1/1","GeneName_1":"OR4F5","ERRORSWARNINGSINFO":null,"FeatureType":"transcript","LIST_S2_pred":null,"CDSposCDSlength":"11/918","cDNAposcDNAlength":"11/918","AnnotationImpact":"MODERATE"}};transcripts_ann=ENST00000335137|0.27627227|ENST00000335137|||||.||||OR4F5|0.716128|||ENSG00000186092||OR4F5|||T|||,ENST00000641515|.|ENST00000641515|||||.||||OR4F5|0.79822|||ENSG00000186092||OR4F5|||T|||,NM_001005484.1||NM_001005484.1|missense_variant|NM_001005484.1|G|c.11A>G||p.Glu4Gly|protein_coding||OR4F5||4/305|OR4F5||1/1|OR4F5||transcript||11/918|11/918|MODERATE
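For readers who want to consume the structured 'transcripts_ann' field downstream, here is a small Python sketch of how it could be decoded using the field list in the header Description. The function name `parse_transcripts_ann` is hypothetical, the header and value below are shortened illustrations (not the full output), and the sketch assumes that annotation values contain neither ',' nor '|'.

```python
def parse_transcripts_ann(description, value):
    """Decode a snpEff-like 'transcripts_ann' value into one dict per transcript.

    description: the header Description text, with field names between single quotes.
    value: the INFO value, transcripts separated by ',' and fields by '|'.
    """
    # Field names are listed between the single quotes of the Description.
    header = description.split("'")[1]
    fields = [name.strip() for name in header.split("|")]
    records = []
    for entry in value.split(","):
        # Empty strings stand for missing annotations.
        records.append(dict(zip(fields, entry.split("|"))))
    return records


# Shortened example with three fields only (illustration).
description = "Transcripts annotations: 'transcript | genename | LIST_S2_score'"
value = "ENST00000335137|OR4F5|0.716128,ENST00000641515|OR4F5|0.79822"
records = parse_transcripts_ann(description, value)
for record in records:
    print(record["transcript"], record["LIST_S2_score"])
```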
In order to also take variant annotations into account for transcripts prioritization, the INFO column of the VCF is included in the transcripts view/bubble. Prioritization profiles for transcripts can therefore be parameterized with annotations from variants. Here is an example of a parametrization with an annotation from transcripts ('LIST_S2_score') and an annotation from variants ('CLNSIG'):
{
"transcripts": {
"LIST_S2_score": [
{
"type": "gt",
"value": "0.75",
"score": 10,
"flag": "PASS",
"comment": ["Very Good LIST Score"]
},
{
"type": "gt",
"value": "0.50",
"score": 10,
"flag": "PASS",
"comment": ["Good LIST Score"]
}
],
"CLNSIG": [
{
"type": "eq",
"value": "pathogenic",
"score": 100,
"flag": "PASS",
"comment": ["Pathogenic"]
}
]
}
}
…e_view Add INFO annotations for transcripts prioritization #256
New options for transcripts prioritization:
{
"profiles": ["transcripts"],
"prioritization_config": "prioritization_transcripts_profiles.json",
"prioritization_score_mode": "HOWARD",
"pzprefix": "PZT",
"pzfields": ["Score", "Flag", "spliceAI_score", "spliceAI_pred"],
"prioritization_transcripts_order": {
"PZTFlag": "ASC",
"PZTScore": "DESC",
"spliceAI_score": "DESC"
},
"prioritization_transcripts": "transcripts.tsv",
"prioritization_transcripts_force": false,
"prioritization_transcripts_version_force": false
}
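The 'prioritization_transcripts_order' option reads like a multi-column ordering. The Python sketch below shows one way such an ordering could behave, assuming it works like an SQL ORDER BY with the first key having the highest priority. This is an assumption, not HOWARD's code: the function `order_transcripts` and the sample rows are hypothetical, and flags are compared as plain strings because the issue does not specify how flag values are ranked.

```python
def order_transcripts(rows, order):
    """Sort rows by several columns, first key of 'order' having highest priority."""
    result = list(rows)
    # Python's sort is stable, so sorting by the least significant key first
    # and the most significant key last gives a multi-column ordering.
    for column, direction in reversed(list(order.items())):
        descending = direction.upper() == "DESC"
        result.sort(key=lambda row: row[column], reverse=descending)
    return result


# Made-up transcripts and values, for illustration only.
rows = [
    {"transcript": "T1", "PZTFlag": "PASS", "PZTScore": 10, "spliceAI_score": 0.2},
    {"transcript": "T2", "PZTFlag": "PASS", "PZTScore": 20, "spliceAI_score": 0.1},
    {"transcript": "T3", "PZTFlag": "PASS", "PZTScore": 20, "spliceAI_score": 0.9},
]
order = {"PZTFlag": "ASC", "PZTScore": "DESC", "spliceAI_score": "DESC"}
ordered = order_transcripts(rows, order)
print([row["transcript"] for row in ordered])
```

In this sketch T2 and T3 tie on flag and score, so the third key decides between them.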
New options to control transcript struct mapping:
{
"from_column_format": [
{
"transcripts_column": "ANN",
"transcripts_infos_column": "Feature_ID",
"column_rename": {
"Gene_Name": "genename",
"Feature_ID": "THETRANSCRIPTOFSNPEFF"
},
"column_clean": true,
"column_case": null
}
],
"from_columns_map": [
{
"transcripts_column": "Ensembl_transcriptid",
"transcripts_infos_columns": [
"genename",
"Ensembl_geneid",
"LIST_S2_score",
"LIST_S2_pred"
],
"column_rename": {
"LIST_S2_score": "LISTScore",
"LIST_S2_pred": "LISTPred"
},
"column_clean": false,
"column_case": null
},
{
"transcripts_column": "Ensembl_transcriptid",
"transcripts_infos_columns": [
"genename",
"VARITY_R_score",
"Aloft_pred"
],
"column_rename": null,
"column_clean": false,
"column_case": "lower"
}
]
}
These options are useful to control field names, to merge fields from multiple sources (e.g.
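To make the three options concrete, here is a hedged Python sketch of how 'column_rename', 'column_clean' and 'column_case' might transform a field name. The cleaning rule (drop every non-alphanumeric character) is inferred from the 'transcripts_ann' header earlier in this issue, where 'HGVS.c' appears as 'HGVSc' and 'Annotation_Impact' as 'AnnotationImpact'. The order in which the options are applied, and the function name `transform_column`, are assumptions.

```python
import re


def transform_column(name, column_rename=None, column_clean=False, column_case=None):
    """Apply rename, then clean, then case to a field name (assumed order)."""
    if column_rename and name in column_rename:
        # An explicit rename is returned as-is in this sketch.
        return column_rename[name]
    if column_clean:
        # Keep letters and digits only, e.g. 'HGVS.c' -> 'HGVSc'.
        name = re.sub(r"[^0-9A-Za-z]", "", name)
    if column_case == "lower":
        name = name.lower()
    elif column_case == "upper":
        name = name.upper()
    return name


print(transform_column("HGVS.c", column_clean=True))
print(transform_column("Gene_Name", column_rename={"Gene_Name": "genename"}, column_clean=True))
print(transform_column("VARITY_R_score", column_case="lower"))
```

Renaming two sources to the same name (for example 'Gene_Name' to 'genename') is what would let fields from snpEff and dbNSFP land in a single column of the transcripts view.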
…e_view Add options to control transcripts view struct #256
New options to merge and map transcript IDs (e.g. from Ensembl to RefSeq), in order to merge annotations from multiple sources in the transcript view:
{
"table": "transcripts",
"column_id": "transcript",
"transcripts_info_json": "transcripts_json",
"transcripts_info_field": "transcripts_json",
"transcript_id_remove_version": true,
"transcript_id_mapping_file": "My_transcripts_mapping_file.tsv.gz",
"transcript_id_mapping_force": false,
"struct": {...}
}
Beware of transcript versions in the transcript mapping file, to avoid mixing transcripts with and without version (or use the remove version option, 'transcript_id_remove_version', to be consistent). Example of transcripts mapping file:
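The mapping file content is not shown in this thread. As a rough illustration of version removal combined with ID mapping, the Python sketch below assumes a two-column, tab-separated file (source ID, then target ID) without a header line. That layout, the helper names, and the transcript IDs in the example are all made up for the illustration, and the effect of 'transcript_id_mapping_force' is not modeled.

```python
import csv
import io


def remove_version(transcript_id):
    """Drop a trailing '.<digits>' version, e.g. 'NM_001005484.1' -> 'NM_001005484'."""
    base, dot, version = transcript_id.rpartition(".")
    return base if dot and version.isdigit() else transcript_id


def load_mapping(tsv_text, strip_versions=True):
    """Read 'source<TAB>target' lines into a dict (assumed file layout)."""
    mapping = {}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) < 2:
            continue  # skip empty or malformed lines
        source, target = row[0], row[1]
        if strip_versions:
            source, target = remove_version(source), remove_version(target)
        mapping[source] = target
    return mapping


def map_transcript(transcript_id, mapping, strip_versions=True):
    """Return the mapped ID, or the (version-less) input ID when no mapping exists."""
    key = remove_version(transcript_id) if strip_versions else transcript_id
    return mapping.get(key, key)


# Made-up IDs: versions are stripped on both sides so lookups stay consistent.
mapping = load_mapping("ENST00000000001.3\tNM_000000001.2\n")
print(map_transcript("ENST00000000001.5", mapping))
print(map_transcript("NM_001005484.1", mapping))
```

Stripping versions both when loading the file and when looking up an ID is what keeps versioned and unversioned identifiers from being treated as different transcripts.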
…e_view Add trancript mapping and filter, and manage version #256
New option to extract NOMEN from a field (e.g.
"transcripts": {
"table": "transcripts",
"column_id": "transcript",
"transcripts_info_json": "transcripts_json",
"transcripts_info_field": "transcripts_json",
"transcript_id_remove_version": true,
"transcript_id_mapping_file": "transcripts.for_mapping.tsv",
"transcript_id_mapping_force": false,
"struct": {
"from_column_format": [
{
"transcripts_column": "ANN",
"transcripts_infos_column": "Feature_ID",
"column_clean": true
}
],
"from_columns_map": [
{
"transcripts_column": "Ensembl_transcriptid",
"transcripts_infos_columns": [
"genename",
"Ensembl_geneid",
"LIST_S2_score",
"LIST_S2_pred"
]
},
{
"transcripts_column": "Ensembl_transcriptid",
"transcripts_infos_columns": [
"genename",
"VARITY_R_score",
"Aloft_pred"
]
}
]
},
"prioritization": {
"profiles": ["transcripts"],
"prioritization_config": "prioritization_transcripts_profiles_fields_renamed.json",
"pzprefix": "PZT",
"pzfields": ["Score", "Flag", "LIST_S2_score", "LIST_S2_pred"],
"prioritization_score_mode": "HOWARD"
}
},
"calculation": {
"calculations": {
"NOMEN": {
"options": {
"hgvs_field": "snpeff_hgvs",
"transcripts": "transcripts.tsv",
"transcripts_table": "variants",
"transcripts_column": "PZTTranscript",
"transcripts_order": ["column", "file"]
}
}
}
}
Within section
This option is useful to provide a NOMEN corresponding to the "best" transcript of the variant, after transcript prioritization.
In order to extract a specific prioritization column from another prioritization profile (e.g.
"prioritization": {
"profiles": ["transcripts", "transcripts2"],
"prioritization_config": "prioritization_transcripts_profiles_fields_renamed2.json",
"pzprefix": "PZT",
"pzfields": [
"Score",
"Flag",
"LIST_S2_score",
"LIST_S2_pred",
"PZTFlag_transcripts",
"PZTScore_transcripts",
"PZTFlag_transcripts2",
"PZTScore_transcripts2"
],
"prioritization_score_mode": "HOWARD"
}
Example of output:
#CHROM POS REF ALT transcript PZTFlag_transcripts PZTScore_transcripts PZTFlag_transcripts2 PZTScore_transcripts2
0 chr1 69101 A G NM_001005484 PASS 0 PASS 0
1 chr1 69101 A G ENST00000335137 PASS 0 PASS 0
2 chr1 28736 A C NR_036051 PASS 200 PASS 400
3 chr1 28736 A C NR_036266 PASS 200 PASS 400
4 chr1 28736 A C NR_036267 PASS 200 PASS 400
5 chr1 28736 A C NR_036268 PASS 200 PASS 400
6 chr1 28736 A C NR_024540 PASS 200 PASS 400
7 chr1 35144 A C NR_036051 PASS 100 PASS 200
8 chr1 35144 A C NR_036266 PASS 100 PASS 200
9 chr1 35144 A C NR_036267 PASS 100 PASS 200
10 chr1 35144 A C NR_036268 PASS 100 PASS 200
11 chr1 35144 A C NR_026818 PASS 100 PASS 200
12 chr1 35144 A C NR_026820 PASS 100 PASS 200
13 chr1 35144 A C NR_026822 PASS 100 PASS 200
14 chr1 768251 A G NR_047519 PASS 100 PASS 200
15 chr1 768251 A G NR_047526 PASS 100 PASS 200
16 chr1 768251 A G NR_047521 PASS 100 PASS 200
17 chr1 768251 A G NR_047523 PASS 100 PASS 200
18 chr1 768251 A G NR_047524 PASS 100 PASS 200
19 chr1 768251 A G NR_047525 PASS 100 PASS 200
20 chr1 768252 A G NR_047519 PASS 100 PASS 200
21 chr1 768252 A G NR_047526 PASS 100 PASS 200
22 chr1 768252 A G NR_047521 PASS 100 PASS 200
23 chr1 768252 A G NR_047523 PASS 100 PASS 200
24 chr1 768252 A G NR_047524 PASS 100 PASS 200
25 chr1 768252 A G NR_047525 PASS 100 PASS 200
26 chr1 768253 A G NR_047519 PASS 100 PASS 200
27 chr1 768253 A G NR_047526 PASS 100 PASS 200
28 chr1 768253 A G NR_047521 PASS 100 PASS 200
29 chr1 768253 A G NR_047523 PASS 100 PASS 200
30 chr1 768253 A G NR_047524 PASS 100 PASS 200
31 chr1 768253 A G NR_047525 PASS 100 PASS 200
32 chr7 55249063 G A NM_005228 PASS 0 PASS 0
33 chr7 55249063 G A NM_001346897 PASS 0 PASS 0
34 chr7 55249063 G A NM_001346898 PASS 0 PASS 0
35 chr7 55249063 G A NM_001346941 PASS 0 PASS 0
36 chr7 55249063 G A NM_001346899 PASS 0 PASS 0
37 chr7 55249063 G A NM_001346900 PASS 0 PASS 0
38 chr7 55249063 G A NR_047551 PASS 100 PASS 200
In order to explore the transcript information related to each variant, especially to calculate scores, a "transcript view" needs to be created. It can be another table or a view (e.g. "transcripts") in which each line corresponds to a transcript (i.e. multiple lines per variant). A transcript ID column is needed as a unique key.
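The idea of one line per transcript can be sketched in Python as follows. This is only an illustration of the concept, not how HOWARD builds the view: the helper names are hypothetical, the INFO string is taken from the chr1:69101 example in this issue, and the sketch assumes that comma-separated values are aligned with the transcript IDs and that a single value (like 'genename') is shared by all transcripts of the variant.

```python
def parse_info(info):
    """Turn a VCF INFO string into a dict of raw string values."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def explode_transcripts(info, transcripts_column, infos_columns):
    """Build one row per transcript from positionally aligned INFO values."""
    fields = parse_info(info)
    transcript_ids = fields[transcripts_column].split(",")
    rows = []
    for index, transcript_id in enumerate(transcript_ids):
        row = {"transcript": transcript_id}
        for column in infos_columns:
            raw = fields.get(column)
            values = raw.split(",") if raw is not None else []
            if len(values) == 1:
                row[column] = values[0]  # single value: shared by all transcripts (assumption)
            elif index < len(values):
                row[column] = values[index]
            else:
                row[column] = None  # annotation missing for this transcript
        rows.append(row)
    return rows


info = "genename=OR4F5;Ensembl_transcriptid=ENST00000641515,ENST00000335137;LIST_S2_score=0.79822,0.716128"
view = explode_transcripts(info, "Ensembl_transcriptid", ["genename", "LIST_S2_score"])
for row in view:
    print(row)
```

The 'transcript' column of each row plays the role of the unique key within a variant, which is what the prioritization step then scores line by line.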
TODO: