Add transcripts level #256

antonylebechec · 2024-07-31T18:04:02Z

To explore transcript-level information for each variant, especially to calculate scores, we need a "transcript view". It can be a table or a view (e.g. "transcripts") in which each line corresponds to one transcript (i.e. multiple lines per variant). A transcript ID column is needed as a unique key.

TODO:

Add multiple output PZ info (PZFlag, PZComment...)
Add additional output PZ info annotation transcript specific (such as Scores, predictions...)
Add prioritization through a list of preferred transcripts (used when multiple transcripts have the same PZScore, or forced independently of the score)
Add option to be flexible with transcript version (refSeq)
Add option to order transcripts by multiple columns (not only PZFlag and PZScore by default, but all available transcript annotations)
Map refSeq/Ensembl transcript acc
Merge transcripts annotation columns (e.g. genename from multiple source/struct)
Export transcripts view as a file (TSV)
Add multiple prioritize profiles

antonylebechec · 2024-07-31T18:14:48Z

To create a transcript view, some parameters are needed.
As an example, this param identifies the table to generate (transcripts) and a structure describing the columns dedicated to transcripts, such as:

a unique annotation field with a specific format (from_column_format), like the snpEff annotation,
a list of annotation fields corresponding to the transcripts listed in another specific field (from_columns_map), like the dbNSFP annotation

{
    "transcripts": {
        "table": "transcripts",
        "struct": {
            "from_column_format": [
                {
                    "transcripts_column": "ANN",
                    "transcripts_infos_column": "Feature_ID"
                }
            ],
            "from_columns_map": [
                {
                    "transcripts_column": "Ensembl_transcriptid",
                    "transcripts_infos_columns": [
                        "genename",
                        "Ensembl_geneid",
                        "LIST_S2_score",
                        "LIST_S2_pred"
                    ]
                },
                {
                    "transcripts_column": "Ensembl_transcriptid",
                    "transcripts_infos_columns": [
                        "genename",
                        "VARITY_R_score",
                        "Aloft_pred"
                    ]
                }
            ]
        }
    }
}
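To make the from_columns_map idea concrete, here is a minimal sketch of how comma-separated INFO values could be split into one row per transcript. It is not the HOWARD implementation: the function name explode_columns_map is hypothetical, and the rule that a single value (e.g. genename) applies to every transcript is an assumption.

```python
def explode_columns_map(info, mapping):
    """Return one dict per transcript from comma-separated INFO values.

    Hypothetical helper, not part of HOWARD.
    """
    ids = info[mapping["transcripts_column"]].split(",")
    rows = []
    for i, transcript in enumerate(ids):
        row = {"transcript": transcript}
        for column in mapping["transcripts_infos_columns"]:
            values = info.get(column, "").split(",")
            if len(values) == 1:
                # Assumption: a single value applies to every transcript.
                row[column] = values[0] or None
            else:
                row[column] = values[i] if i < len(values) else None
        rows.append(row)
    return rows


# INFO values taken from the chr1:69101 example in this thread.
info = {
    "genename": "OR4F5",
    "Ensembl_transcriptid": "ENST00000641515,ENST00000335137",
    "LIST_S2_score": "0.79822,0.716128",
}
mapping = {
    "transcripts_column": "Ensembl_transcriptid",
    "transcripts_infos_columns": ["genename", "LIST_S2_score", "LIST_S2_pred"],
}
rows = explode_columns_map(info, mapping)
for row in rows:
    print(row)
```

Each variant yields as many rows as it has transcript IDs, which is what makes a per-transcript key possible.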

This param is used with function Variants.create_transcript_view() to generate a transcripts table:

   #CHROM       POS REF ALT       transcript     transcript_1 AAposAAlength Distance Allele Aloft_pred          HGVSc  ... cDNAposcDNAlength    genename       FeatureID LIST_S2_pred ERRORSWARNINGSINFO VARITY_R_score      GeneID                          Annotation  GeneName_1        HGVSp AnnotationImpact
0    chr1     28736   A   C      NR_024540.1      NR_024540.1          None     None      C       None    n.50+585T>G  ...              None      WASH7P     NR_024540.1         None               None           None      WASH7P                      intron_variant      WASH7P         None         MODIFIER
1    chr1     28736   A   C      NR_036051.1      NR_036051.1          None   1630.0      C       None     n.-1630A>C  ...              None   MIR1302-2     NR_036051.1         None               None           None   MIR1302-2               upstream_gene_variant   MIR1302-2         None         MODIFIER
2    chr1     28736   A   C      NR_036266.1      NR_036266.1          None   1630.0      C       None     n.-1630A>C  ...              None   MIR1302-9     NR_036266.1         None               None           None   MIR1302-9               upstream_gene_variant   MIR1302-9         None         MODIFIER
3    chr1     28736   A   C      NR_036267.1      NR_036267.1          None   1630.0      C       None     n.-1630A>C  ...              None  MIR1302-10     NR_036267.1         None               None           None  MIR1302-10               upstream_gene_variant  MIR1302-10         None         MODIFIER
4    chr1     28736   A   C      NR_036268.1      NR_036268.1          None   1630.0      C       None     n.-1630A>C  ...              None  MIR1302-11     NR_036268.1         None               None           None  MIR1302-11               upstream_gene_variant  MIR1302-11         None         MODIFIER
5    chr1     35144   A   C      NR_026818.1      NR_026818.1          None     None      C       None       n.597T>G  ...              None     FAM138A     NR_026818.1         None               None           None     FAM138A  non_coding_transcript_exon_variant     FAM138A         None         MODIFIER
6    chr1     35144   A   C      NR_026820.1      NR_026820.1          None     None      C       None       n.597T>G  ...              None     FAM138F     NR_026820.1         None               None           None     FAM138F  non_coding_transcript_exon_variant     FAM138F         None         MODIFIER
7    chr1     35144   A   C      NR_026822.1      NR_026822.1          None     None      C       None       n.597T>G  ...              None     FAM138C     NR_026822.1         None               None           None     FAM138C  non_coding_transcript_exon_variant     FAM138C         None         MODIFIER
8    chr1     35144   A   C      NR_036051.1      NR_036051.1          None   4641.0      C       None     n.*4641A>C  ...              None   MIR1302-2     NR_036051.1         None               None           None   MIR1302-2             downstream_gene_variant   MIR1302-2         None         MODIFIER
9    chr1     35144   A   C      NR_036266.1      NR_036266.1          None   4641.0      C       None     n.*4641A>C  ...              None   MIR1302-9     NR_036266.1         None               None           None   MIR1302-9             downstream_gene_variant   MIR1302-9         None         MODIFIER
10   chr1     35144   A   C      NR_036267.1      NR_036267.1          None   4641.0      C       None     n.*4641A>C  ...              None  MIR1302-10     NR_036267.1         None               None           None  MIR1302-10             downstream_gene_variant  MIR1302-10         None         MODIFIER
11   chr1     35144   A   C      NR_036268.1      NR_036268.1          None   4641.0      C       None     n.*4641A>C  ...              None  MIR1302-11     NR_036268.1         None               None           None  MIR1302-11             downstream_gene_variant  MIR1302-11         None         MODIFIER
12   chr1     69101   A   G  ENST00000335137  ENST00000335137          None     None   None          .           None  ...              None       OR4F5            None            T               None     0.27627227        None                                None       OR4F5         None             None
13   chr1     69101   A   G  ENST00000641515  ENST00000641515          None     None   None          .           None  ...              None       OR4F5            None            T               None              .        None                                None       OR4F5         None             None
14   chr1     69101   A   G   NM_001005484.1   NM_001005484.1         4/305     None      G       None        c.11A>G  ...            11/918       OR4F5  NM_001005484.1         None               None           None       OR4F5                    missense_variant       OR4F5    p.Glu4Gly         MODERATE
15   chr1    768251   A   G      NR_047519.1      NR_047519.1          None     None      G       None  n.287+3767A>G  ...              None   LINC01128     NR_047519.1         None               None           None   LINC01128                      intron_variant   LINC01128         None         MODIFIER
16   chr1    768251   A   G      NR_047521.1      NR_047521.1          None     None      G       None  n.287+3767A>G  ...              None   LINC01128     NR_047521.1         None               None           None   LINC01128                      intron_variant   LINC01128         None         MODIFIER
17   chr1    768251   A   G      NR_047523.1      NR_047523.1          None     None      G       None  n.287+3767A>G  ...              None   LINC01128     NR_047523.1         None               None           None   LINC01128                      intron_variant   LINC01128         None         MODIFIER
18   chr1    768251   A   G      NR_047524.1      NR_047524.1          None     None      G       None  n.287+3767A>G  ...              None   LINC01128     NR_047524.1         None               None           None   LINC01128                      intron_variant   LINC01128         None         MODIFIER
19   chr1    768251   A   G      NR_047525.1      NR_047525.1          None     None      G       None  n.154+3767A>G  ...              None   LINC01128     NR_047525.1         None               None           None   LINC01128                      intron_variant   LINC01128         None         MODIFIER
20   chr1    768251   A   G      NR_047526.1      NR_047526.1          None     None      G       None  n.287+3767A>G  ...              None   LINC01128     NR_047526.1         None               None           None   LINC01128                      intron_variant   LINC01128         None         MODIFIER
21   chr1    768252   A   G      NR_047519.1      NR_047519.1          None     None      G       None  n.287+3768A>G  ...              None   LINC01128     NR_047519.1         None               None           None   LINC01128                      intron_variant   LINC01128         None         MODIFIER
22   chr1    768252   A   G      NR_047521.1      NR_047521.1          None     None      G       None  n.287+3768A>G  ...              None   LINC01128     NR_047521.1         None               None           None   LINC01128                      intron_variant   LINC01128         None         MODIFIER
23   chr1    768252   A   G      NR_047523.1      NR_047523.1          None     None      G       None  n.287+3768A>G  ...              None   LINC01128     NR_047523.1         None               None           None   LINC01128                      intron_variant   LINC01128         None         MODIFIER
24   chr1    768252   A   G      NR_047524.1      NR_047524.1          None     None      G       None  n.287+3768A>G  ...              None   LINC01128     NR_047524.1         None               None           None   LINC01128                      intron_variant   LINC01128         None         MODIFIER
25   chr1    768252   A   G      NR_047525.1      NR_047525.1          None     None      G       None  n.154+3768A>G  ...              None   LINC01128     NR_047525.1         None               None           None   LINC01128                      intron_variant   LINC01128         None         MODIFIER
26   chr1    768252   A   G      NR_047526.1      NR_047526.1          None     None      G       None  n.287+3768A>G  ...              None   LINC01128     NR_047526.1         None               None           None   LINC01128                      intron_variant   LINC01128         None         MODIFIER
27   chr1    768253   A   G      NR_047519.1      NR_047519.1          None     None      G       None  n.287+3769A>G  ...              None   LINC01128     NR_047519.1         None               None           None   LINC01128                      intron_variant   LINC01128         None         MODIFIER
28   chr1    768253   A   G      NR_047521.1      NR_047521.1          None     None      G       None  n.287+3769A>G  ...              None   LINC01128     NR_047521.1         None               None           None   LINC01128                      intron_variant   LINC01128         None         MODIFIER
29   chr1    768253   A   G      NR_047523.1      NR_047523.1          None     None      G       None  n.287+3769A>G  ...              None   LINC01128     NR_047523.1         None               None           None   LINC01128                      intron_variant   LINC01128         None         MODIFIER
30   chr1    768253   A   G      NR_047524.1      NR_047524.1          None     None      G       None  n.287+3769A>G  ...              None   LINC01128     NR_047524.1         None               None           None   LINC01128                      intron_variant   LINC01128         None         MODIFIER
31   chr1    768253   A   G      NR_047525.1      NR_047525.1          None     None      G       None  n.154+3769A>G  ...              None   LINC01128     NR_047525.1         None               None           None   LINC01128                      intron_variant   LINC01128         None         MODIFIER
32   chr1    768253   A   G      NR_047526.1      NR_047526.1          None     None      G       None  n.287+3769A>G  ...              None   LINC01128     NR_047526.1         None               None           None   LINC01128                      intron_variant   LINC01128         None         MODIFIER
33   chr7  55249063   G   A   NM_001346897.2   NM_001346897.2      742/1091     None      A       None      c.2226G>A  ...         2487/3848        EGFR  NM_001346897.2         None               None           None        EGFR                  synonymous_variant        EGFR  p.Gln742Gln              LOW
34   chr7  55249063   G   A   NM_001346898.2   NM_001346898.2      787/1136     None      A       None      c.2361G>A  ...         2622/3983        EGFR  NM_001346898.2         None               None           None        EGFR                  synonymous_variant        EGFR  p.Gln787Gln              LOW
35   chr7  55249063   G   A   NM_001346899.1   NM_001346899.1      742/1165     None      A       None      c.2226G>A  ...         2483/6218        EGFR  NM_001346899.1         None               None           None        EGFR                  synonymous_variant        EGFR  p.Gln742Gln              LOW
36   chr7  55249063   G   A   NM_001346900.2   NM_001346900.2      734/1157     None      A       None      c.2202G>A  ...         2393/9676        EGFR  NM_001346900.2         None               None           None        EGFR                  synonymous_variant        EGFR  p.Gln734Gln              LOW
37   chr7  55249063   G   A   NM_001346941.2   NM_001346941.2       520/943     None      A       None      c.1560G>A  ...         1821/9104        EGFR  NM_001346941.2         None               None           None        EGFR                  synonymous_variant        EGFR  p.Gln520Gln              LOW
38   chr7  55249063   G   A      NM_005228.5      NM_005228.5      787/1210     None      A       None      c.2361G>A  ...         2622/9905        EGFR     NM_005228.5         None               None           None        EGFR                  synonymous_variant        EGFR  p.Gln787Gln              LOW
39   chr7  55249063   G   A      NR_047551.1      NR_047551.1          None     None      A       None      n.1201C>T  ...              None    EGFR-AS1     NR_047551.1         None               None           None    EGFR-AS1  non_coding_transcript_exon_variant    EGFR-AS1         None         MODIFIER

…#4

antonylebechec · 2024-08-01T21:38:13Z

A calculation adds transcript annotations as an INFO field in JSON format.
Example (create config/param.transcripts.json with param from help):

howard calculation --input="tests/data/example.ann.transcripts.vcf.gz" --output="/tmp/output.transcript.vcf" --calculations="TRANSCRIPTS_JSON" --param="config/param.transcripts.json"

antonylebechec · 2024-08-28T09:35:13Z

Prioritization of transcripts in 'HOWARD' mode with 'transcripts' profiles available in a configuration JSON file, with 'PZT' as prefix:

"transcripts": {
  ...
  "prioritization": {
     "profiles": ["transcripts"],
     "prioritization_config": "config/prioritization_transcripts_profiles.json",
     "pzprefix": "PZT",
     "prioritization_score_mode": "HOWARD"
  }
}

With prioritization parameters based on 'LIST_S2_score' (file 'config/prioritization_transcripts_profiles.json'):

{
  "transcripts": {
    "LIST_S2_score": [
      {
        "type": "gt",
        "value": "0.75",
        "score": 10,
        "flag": "PASS",
        "comment": ["Very Good LIST Score"]
      },
      {
        "type": "gt",
        "value": "0.50",
        "score": 10,
        "flag": "PASS",
        "comment": ["Good LIST Score"]
      }
    ]
  }
}

Command:

howard calculation --input='tests/data/example.dbnsfp.transcripts.vcf.gz' --output='/tmp/example.calculation.transcripts.vcf' --param='config/param.transcripts.json' --calculations='TRANSCRIPTS_PRIORITIZATION'

Output VCF with PZTTranscript, PZTScore and PZTFlag (partial output):

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
chr1    28736   .       A       C       100     PASS    CLNSIG=pathogenic
chr1    35144   .       A       C       100     PASS    CLNSIG=non-pathogenic
chr1    69101   .       A       G       100     PASS    genename=OR4F5;Ensembl_transcriptid=ENST00000641515,ENST00000335137;LIST_S2_score=0.79822,0.716128;PZTTranscript=ENST00000641515;PZTScore=20;PZTFlag=PASS
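The PZTScore=20 above is consistent with summing the score of every matching criterion (0.79822 is greater than both 0.75 and 0.50). A minimal sketch of that reading, assuming this is how 'HOWARD' mode combines scores; score_transcript is a hypothetical helper and only the "gt" and "eq" types seen in this thread are handled:

```python
import operator

# Only the comparison types that appear in this thread are handled.
COMPARATORS = {"gt": operator.gt, "eq": operator.eq}


def score_transcript(annotations, profile):
    """Sum the scores of all matching criteria (assumed 'HOWARD' behaviour)."""
    score, comments = 0, []
    for field, criteria in profile.items():
        raw = annotations.get(field)
        if raw in (None, "", "."):
            continue  # missing annotation: no criterion can match
        for criterion in criteria:
            compare = COMPARATORS[criterion["type"]]
            if criterion["type"] == "eq":
                matched = compare(str(raw), criterion["value"])
            else:
                matched = compare(float(raw), float(criterion["value"]))
            if matched:
                score += criterion["score"]
                comments.extend(criterion["comment"])
    return score, comments


profile = {
    "LIST_S2_score": [
        {"type": "gt", "value": "0.75", "score": 10, "flag": "PASS",
         "comment": ["Very Good LIST Score"]},
        {"type": "gt", "value": "0.50", "score": 10, "flag": "PASS",
         "comment": ["Good LIST Score"]},
    ]
}
transcripts = {
    "ENST00000641515": {"LIST_S2_score": "0.79822"},
    "ENST00000335137": {"LIST_S2_score": "0.716128"},
}
scores = {tid: score_transcript(ann, profile)[0] for tid, ann in transcripts.items()}
best = max(scores, key=scores.get)
print(best, scores[best])
```

Flags are ignored in this sketch; only the score is used to pick the transcript.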

antonylebechec · 2024-08-28T16:39:17Z

Include transcript annotations, either in JSON format or in a structured format (like 'snpEff'), with the calculation tool.

Parameters in a JSON file (e.g. 'config/param.transcripts.json'):

{
  "transcripts": {
    "transcripts_info_field_json": "transcripts_json",
    "transcripts_info_field_format": "transcripts_ann",
    "table": "transcripts",
    "struct": {...},
    ...
  }
}

Command:

howard calculation --input='tests/data/example.ann.transcripts.vcf.gz' --output='/tmp/example.calculation.transcripts.vcf' --param='config/param.transcripts.json' --calculations='TRANSCRIPTS_ANNOTATIONS'

Output VCF with 'transcripts_json' and 'transcripts_ann' INFO fields (partial output):

##INFO=<ID=ANN,Number=.,Type=String,Description="Functional annotations: 'Allele | Annotation | Annotation_Impact | Gene_Name | Gene_ID | Feature_Type | Feature_ID | Transcript_BioType | Rank | HGVS.c | HGVS.p | cDNA.pos / cDNA.length | CDS.pos / CDS.length | AA.pos / AA.length | Distance | ERRORS / WARNINGS / INFO'">
##INFO=<ID=transcripts_json,Number=.,Type=String,Description="Transcripts in JSON format">
##INFO=<ID=transcripts_ann,Number=.,Type=String,Description="Transcripts annotations: 'transcript | VARITY_R_score | transcript_1 | Annotation | FeatureID | Allele | HGVSc | Aloft_pred | HGVSp | TranscriptBioType | Distance | genename | LIST_S2_score | AAposAAlength | GeneID | Ensembl_geneid | Rank | GeneName_1 | ERRORSWARNINGSINFO | FeatureType | LIST_S2_pred | CDSposCDSlength | cDNAposcDNAlength | AnnotationImpact'">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
chr1    69101   .       A       G       100     PASS    ANN=G|missense_variant|...;genename=OR4F5;Ensembl_transcriptid=ENST00000641515,ENST00000335137;LIST_S2_score=0.79822,0.716128;transcripts_json={"ENST00000335137":{"VARITY_R_score":"0.27627227","transcript_1":"ENST00000335137","Annotation":null,"FeatureID":null,"Allele":null,"HGVSc":null,"Aloft_pred":".","HGVSp":null,"TranscriptBioType":null,"Distance":null,"genename":"OR4F5","LIST_S2_score":"0.716128","AAposAAlength":null,"GeneID":null,"Ensembl_geneid":"ENSG00000186092","Rank":null,"GeneName_1":"OR4F5","ERRORSWARNINGSINFO":null,"FeatureType":null,"LIST_S2_pred":"T","CDSposCDSlength":null,"cDNAposcDNAlength":null,"AnnotationImpact":null},"ENST00000641515":{"VARITY_R_score":".","transcript_1":"ENST00000641515","Annotation":null,"FeatureID":null,"Allele":null,"HGVSc":null,"Aloft_pred":".","HGVSp":null,"TranscriptBioType":null,"Distance":null,"genename":"OR4F5","LIST_S2_score":"0.79822","AAposAAlength":null,"GeneID":null,"Ensembl_geneid":"ENSG00000186092","Rank":null,"GeneName_1":"OR4F5","ERRORSWARNINGSINFO":null,"FeatureType":null,"LIST_S2_pred":"T","CDSposCDSlength":null,"cDNAposcDNAlength":null,"AnnotationImpact":null},"NM_001005484.1":{"VARITY_R_score":null,"transcript_1":"NM_001005484.1","Annotation":"missense_variant","FeatureID":"NM_001005484.1","Allele":"G","HGVSc":"c.11A>G","Aloft_pred":null,"HGVSp":"p.Glu4Gly","TranscriptBioType":"protein_coding","Distance":null,"genename":"OR4F5","LIST_S2_score":null,"AAposAAlength":"4/305","GeneID":"OR4F5","Ensembl_geneid":null,"Rank":"1/1","GeneName_1":"OR4F5","ERRORSWARNINGSINFO":null,"FeatureType":"transcript","LIST_S2_pred":null,"CDSposCDSlength":"11/918","cDNAposcDNAlength":"11/918","AnnotationImpact":"MODERATE"}};transcripts_ann=ENST00000335137|0.27627227|ENST00000335137|||||.||||OR4F5|0.716128|||ENSG00000186092||OR4F5|||T|||,ENST00000641515|.|ENST00000641515|||||.||||OR4F5|0.79822|||ENSG00000186092||OR4F5|||T|||,NM_001005484.1||NM_001005484.1|missense_variant|NM_001005484.1|G|c.11A>G||p.Glu4Gly|protein_coding||OR4F5||4/305|OR4F5||1/1|OR4F5||transcript||11/918|11/918|MODERATE
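For downstream use, the structured 'transcripts_ann' field can be decoded with the field names listed in its header Description. A minimal sketch with hypothetical helper names, using a shortened header and value (only four of the fields) to keep it readable. It assumes annotation values contain no ',' or '|' characters.

```python
import re


def parse_header_fields(description):
    """Extract field names from a Description like "...: 'a | b | c'"."""
    match = re.search(r"'([^']*)'", description)
    return [name.strip() for name in match.group(1).split("|")]


def parse_transcripts_ann(value, fields):
    """Return {transcript: {field: value}} from a pipe/comma separated value."""
    records = {}
    for entry in value.split(","):
        values = entry.split("|")
        # Empty strings stand for missing annotations.
        record = {name: (val or None) for name, val in zip(fields, values)}
        records[record["transcript"]] = record
    return records


# Shortened for illustration: the real header lists many more fields.
description = "Transcripts annotations: 'transcript | VARITY_R_score | genename | LIST_S2_score'"
value = "ENST00000335137|0.27627227|OR4F5|0.716128,ENST00000641515|.|OR4F5|0.79822,NM_001005484.1||OR4F5|"

fields = parse_header_fields(description)
records = parse_transcripts_ann(value, fields)
print(records["ENST00000641515"]["LIST_S2_score"])
```

The JSON variant ('transcripts_json') avoids the separator caveat, at the cost of a longer INFO value.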

… and docs #4

antonylebechec · 2024-08-29T16:47:37Z

To also take variant annotations into account in transcript prioritization, the INFO column of the VCF is included in the transcripts view. Prioritization profiles for transcripts can therefore use annotations from variants.

Here is an example of a configuration with a transcript annotation ('LIST_S2_score') and a variant annotation ('CLNSIG'):

{
  "transcripts": {
    "LIST_S2_score": [
      {
        "type": "gt",
        "value": "0.75",
        "score": 10,
        "flag": "PASS",
        "comment": ["Very Good LIST Score"]
      },
      {
        "type": "gt",
        "value": "0.50",
        "score": 10,
        "flag": "PASS",
        "comment": ["Good LIST Score"]
      }
    ],
    "CLNSIG": [
      {
        "type": "eq",
        "value": "pathogenic",
        "score": 100,
        "flag": "PASS",
        "comment": ["Pathogenic"]
      }
    ]
  }
}

…e_view Add INFO annotations for transcripts prioritization #256

#273

antonylebechec · 2024-09-19T16:50:17Z

New options for transcripts prioritization:

{
    "profiles": ["transcripts"],
    "prioritization_config": "prioritization_transcripts_profiles.json",
    "prioritization_score_mode": "HOWARD",
    "pzprefix": "PZT",
    "pzfields": ["Score", "Flag", "spliceAI_score", "spliceAI_pred"],
    "prioritization_transcripts_order": {
        "PZTFlag": "ASC",
        "PZTScore": "DESC",
        "spliceAI_score": "DESC"
    },
    "prioritization_transcripts": "transcripts.tsv",
    "prioritization_transcripts_force": false,
    "prioritization_transcripts_version_force": false
}

Sections profiles, prioritization_config and prioritization_score_mode define the prioritization criteria.
Sections pzprefix and pzfields define the INFO tags to add to the VCF for the prioritized (chosen) transcript (e.g. PZTScore, PZTFlag, PZTspliceAI_score, PZTspliceAI_pred).
Section prioritization_transcripts_order defines how transcripts are ordered to determine which one is chosen (by default only PZTFlag and PZTScore). All available annotations can be used (e.g. scores, transcript length, predictions...).
Sections prioritization_transcripts, prioritization_transcripts_force and prioritization_transcripts_version_force define a list of preferred transcripts, used either to break ties when the order is equal (usually on PZTScore) or to force the order, and whether the transcript version is taken into account (useful for refSeq versions).
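As an illustration of multi-column ordering with a preference list as tie-breaker, here is a minimal sketch. It only covers the non-forced case, the function name order_transcripts is hypothetical, the score values are invented for the example, and flags are compared as plain strings, which may differ from how HOWARD ranks them.

```python
def order_transcripts(rows, order, preferred=()):
    """Sort rows by several columns; the preference list only breaks ties."""
    ranked = list(rows)
    rank = {transcript: i for i, transcript in enumerate(preferred)}
    # Least significant key first: position in the preference list.
    ranked.sort(key=lambda row: rank.get(row["transcript"], len(rank)))
    # Python's sort is stable, so apply the keys from least to most significant.
    for column, direction in reversed(list(order.items())):
        ranked.sort(key=lambda row: row[column], reverse=(direction == "DESC"))
    return ranked


# Invented values, for illustration only.
rows = [
    {"transcript": "NM_001346897", "PZTFlag": "PASS", "PZTScore": 10, "spliceAI_score": 0.1},
    {"transcript": "NM_005228", "PZTFlag": "PASS", "PZTScore": 10, "spliceAI_score": 0.1},
    {"transcript": "NM_001346941", "PZTFlag": "PASS", "PZTScore": 5, "spliceAI_score": 0.9},
]
order = {"PZTFlag": "ASC", "PZTScore": "DESC", "spliceAI_score": "DESC"}

ranked = order_transcripts(rows, order, preferred=["NM_005228"])
chosen = ranked[0]["transcript"]
print(chosen)
```

Here the two best transcripts are tied on every ordering column, so the preference list decides between them.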

antonylebechec · 2024-09-20T16:26:56Z

New options to control transcript struct mapping:

{
    "from_column_format": [
        {
            "transcripts_column": "ANN",
            "transcripts_infos_column": "Feature_ID",
            "column_rename": {
                "Gene_Name": "genename",
                "Feature_ID": "THETRANSCRIPTOFSNPEFF"
            },
            "column_clean": true,
            "column_case": null
        }
    ],
    "from_columns_map": [
        {
            "transcripts_column": "Ensembl_transcriptid",
            "transcripts_infos_columns": [
                "genename",
                "Ensembl_geneid",
                "LIST_S2_score",
                "LIST_S2_pred"
            ],
            "column_rename": {
                "LIST_S2_score": "LISTScore",
                "LIST_S2_pred": "LISTPred"
            },
            "column_clean": false,
            "column_case": null
        },
        {
            "transcripts_column": "Ensembl_transcriptid",
            "transcripts_infos_columns": [
                "genename",
                "VARITY_R_score",
                "Aloft_pred"
            ],
            "column_rename": null,
            "column_clean": false,
            "column_case": "lower"
        }
    ]
}

Section column_rename renames columns/fields
Section column_clean cleans column/field names by removing all special characters (especially . in snpEff annotations)
Section column_case changes the case of column/field names to lower or upper

These options are useful to control field names and to merge fields from multiple sources (e.g. genename). All these options are applied together (i.e. a combination of rename, clean and case). Beware that prioritization parameters must take these name changes into account.
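The combined effect of the three options can be sketched as follows. This is an illustration, not the HOWARD code: normalize_column is a hypothetical name, the order rename then clean then case is an assumption, and "special characters" is taken to mean anything that is not a letter or digit, which matches column names such as cDNAposcDNAlength and ERRORSWARNINGSINFO shown earlier in this thread.

```python
import re


def normalize_column(name, column_rename=None, column_clean=False, column_case=None):
    """Apply rename, then clean, then case to one column name (assumed order)."""
    if column_rename and name in column_rename:
        name = column_rename[name]
    if column_clean:
        # Assumption: everything except letters and digits is removed.
        name = re.sub(r"[^A-Za-z0-9]", "", name)
    if column_case == "lower":
        name = name.lower()
    elif column_case == "upper":
        name = name.upper()
    return name


rename = {"Gene_Name": "genename", "Feature_ID": "THETRANSCRIPTOFSNPEFF"}
print(normalize_column("Gene_Name", rename, column_clean=True))
print(normalize_column("cDNA.pos / cDNA.length", rename, column_clean=True))
print(normalize_column("VARITY_R_score", column_case="lower"))
```

Renaming 'Gene_Name' to 'genename' is what lets the snpEff gene name land in the same column as the dbNSFP one.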

…e_view Add options to control transcripts view struct #256

antonylebechec · 2024-09-23T14:14:08Z

New options to merge and map transcript IDs (e.g. from Ensembl to refSeq), in order to merge multiple-sourced annotations in transcript view.

{
    "table": "transcripts",
    "column_id": "transcript",
    "transcripts_info_json": "transcripts_json",
    "transcripts_info_field": "transcripts_json",
    "transcript_id_remove_version": true,
    "transcript_id_mapping_file": "My_transcripts_mapping_file.tsv.gz",
    "transcript_id_mapping_force": false,
    "struct": {...}
}

Section transcript_id_remove_version removes the version from transcript IDs, if any (NM_123456.2 becomes NM_123456)
Section transcript_id_mapping_file indicates a transcript mapping file that provides the mapping between transcript IDs
Section transcript_id_mapping_force keeps only the transcript IDs that are present in the transcript mapping file

Beware of transcript versions in the transcript mapping file, to prevent a mix of transcripts with and without version (or use the remove version option to be consistent)

Example of transcripts mapping file:

NM_001005484	ENST00000641515.1
NR_024540
NR_036266
NM_001346900
NM_001346897
NR_047551
NM_001346941.2
NM_005228
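A minimal sketch of how such a file could be applied, assuming the first column is the ID to keep and the other columns are aliases mapped to it. The direction of the mapping and the helper names are assumptions, not taken from the HOWARD code.

```python
def remove_version(transcript_id):
    """NM_123456.2 -> NM_123456; IDs without a numeric version are unchanged."""
    base, dot, version = transcript_id.rpartition(".")
    return base if dot and version.isdigit() else transcript_id


def load_mapping(lines, strip_versions=True):
    """Map every ID on a line to the first ID of that line (assumed direction)."""
    mapping = {}
    for line in lines:
        ids = [i for i in line.rstrip("\n").split("\t") if i]
        if strip_versions:
            ids = [remove_version(i) for i in ids]
        for alias in ids:
            mapping[alias] = ids[0]
    return mapping


def map_transcript(transcript_id, mapping, strip_versions=True, force=False):
    """Return the mapped ID; with force=True, unknown IDs are dropped (None)."""
    if strip_versions:
        transcript_id = remove_version(transcript_id)
    if transcript_id in mapping:
        return mapping[transcript_id]
    return None if force else transcript_id


mapping = load_mapping(["NM_001005484\tENST00000641515.1", "NR_024540", "NM_001346941.2"])
print(map_transcript("ENST00000641515", mapping))
print(map_transcript("ENST00000335137", mapping))
print(map_transcript("ENST00000335137", mapping, force=True))
```

Stripping versions on both sides keeps the lookup consistent, which is the point of the warning above.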

Example of transcripts view with these options:

…e_view Add trancript mapping and filter, and manage version #256

antonylebechec · 2024-09-24T13:38:23Z

New option to export the transcripts view as a file.

"export": {
   "output": "/tmp/output.tsv.gz"
}

Section output defines the export file path; multiple formats are supported (TSV, VCF, Parquet...)

Example of output file in TSV:

Example of output file in VCF:

Example of command line:

howard calculation --input="tests/data/example.ann.transcripts.vcf.gz" --output="/tmp/example.calculation.transcripts.tsv" --param="tests/data/param.transcripts.json" --calculations='TRANSCRIPTS_ANNOTATIONS,TRANSCRIPTS_PRIORITIZATION,TRANSCRIPTS_EXPORT'

antonylebechec · 2024-09-24T16:48:11Z

New option to extract NOMEN from a field (e.g. 'hgvs', 'snpeff_hgvs') with a dynamic table.column transcript list (e.g. from annotation, prioritization) rather than a transcripts list file.

"transcripts": {
    "table": "transcripts",
    "column_id": "transcript",
    "transcripts_info_json": "transcripts_json",
    "transcripts_info_field": "transcripts_json",
    "transcript_id_remove_version": true,
    "transcript_id_mapping_file": "transcripts.for_mapping.tsv",
    "transcript_id_mapping_force": false,
    "struct": {
        "from_column_format": [
            {
                "transcripts_column": "ANN",
                "transcripts_infos_column": "Feature_ID",
                "column_clean": true
            }
        ],
        "from_columns_map": [
            {
                "transcripts_column": "Ensembl_transcriptid",
                "transcripts_infos_columns": [
                    "genename",
                    "Ensembl_geneid",
                    "LIST_S2_score",
                    "LIST_S2_pred"
                ]
            },
            {
                "transcripts_column": "Ensembl_transcriptid",
                "transcripts_infos_columns": [
                    "genename",
                    "VARITY_R_score",
                    "Aloft_pred"
                ]
            }
        ]
    },
    "prioritization": {
        "profiles": ["transcripts"],
        "prioritization_config": "prioritization_transcripts_profiles_fields_renamed.json",
        "pzprefix": "PZT",
        "pzfields": ["Score", "Flag", "LIST_S2_score", "LIST_S2_pred"],
        "prioritization_score_mode": "HOWARD"
    }
},
"calculation":  {
   "calculations":  {
      "NOMEN": {
         "options": {
                "hgvs_field": "snpeff_hgvs",
                "transcripts": "transcripts.tsv",
                "transcripts_table": "variants",
                "transcripts_column": "PZTTranscript",
                "transcripts_order": ["column", "file"]
            }
         }
      }
   }

Within section calculation::calculations::NOMEN::options:

Section hgvs_field is the column with all HGVS annotations (multiple NOMEN)
Section transcripts is a file with a list of preferred transcripts (in order)
Sections transcripts_table and transcripts_column define where the transcript for each variant is found (usually after a transcript prioritization)
Section transcripts_order is the order in which the lists are considered (by default the dynamic column first, then the file)

This option is useful to provide a NOMEN corresponding to the "best" transcript of the variant, after transcript prioritization.
Beware of transcript mapping, especially between refSeq and Ensembl: the prioritized transcript may not be available in the HGVS column (usually, transcript annotations come from a database such as 'dbNSFP' with Ensembl transcripts, while HGVS annotations come from snpEff with refSeq transcripts).
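The lookup order can be sketched like this. It is only an illustration: pick_hgvs is a hypothetical helper, the layout of the HGVS strings (colon-separated, transcript ID as one of the tokens) is an assumption, and transcript versions are ignored when matching.

```python
def pick_hgvs(hgvs_values, column_transcript=None, file_transcripts=(),
              order=("column", "file")):
    """Return the first HGVS string matching a preferred transcript, else None."""
    candidates = {
        "column": [column_transcript] if column_transcript else [],
        "file": list(file_transcripts),
    }
    for source in order:
        for transcript in candidates[source]:
            base = transcript.split(".")[0]  # ignore version (assumption)
            for hgvs in hgvs_values:
                if any(token.split(".")[0] == base for token in hgvs.split(":")):
                    return hgvs
    return None


# String layout is hypothetical; HGVS.c/HGVS.p values come from the EGFR rows above.
hgvs_values = [
    "NM_001346897.2:c.2226G>A:p.Gln742Gln",
    "NM_005228.5:c.2361G>A:p.Gln787Gln",
]
print(pick_hgvs(hgvs_values, "NM_005228", ["NM_001346897"]))
print(pick_hgvs(hgvs_values, "NM_005228", ["NM_001346897"], order=("file", "column")))
print(pick_hgvs(hgvs_values, "ENST00000641515"))
```

The last call returns None: an Ensembl transcript chosen by prioritization has no match among refSeq-based HGVS strings, which is the situation the warning above describes.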

antonylebechec · 2024-09-24T18:05:44Z

To extract a specific prioritization column from another prioritization profile (e.g. transcripts2), add the field with its prefix to the pzfields section (e.g. PZTScore_transcripts2).

"prioritization": {
    "profiles": ["transcripts", "transcripts2"],
    "prioritization_config": "prioritization_transcripts_profiles_fields_renamed2.json",
    "pzprefix": "PZT",
    "pzfields": [
        "Score",
        "Flag",
        "LIST_S2_score",
        "LIST_S2_pred",
        "PZTFlag_transcripts",
        "PZTScore_transcripts",
        "PZTFlag_transcripts2",
        "PZTScore_transcripts2"
    ],
    "prioritization_score_mode": "HOWARD"
}

Example of output:

   #CHROM       POS REF ALT       transcript PZTFlag_transcripts  PZTScore_transcripts PZTFlag_transcripts2  PZTScore_transcripts2
0    chr1     69101   A   G     NM_001005484                PASS                     0                 PASS                      0
1    chr1     69101   A   G  ENST00000335137                PASS                     0                 PASS                      0
2    chr1     28736   A   C        NR_036051                PASS                   200                 PASS                    400
3    chr1     28736   A   C        NR_036266                PASS                   200                 PASS                    400
4    chr1     28736   A   C        NR_036267                PASS                   200                 PASS                    400
5    chr1     28736   A   C        NR_036268                PASS                   200                 PASS                    400
6    chr1     28736   A   C        NR_024540                PASS                   200                 PASS                    400
7    chr1     35144   A   C        NR_036051                PASS                   100                 PASS                    200
8    chr1     35144   A   C        NR_036266                PASS                   100                 PASS                    200
9    chr1     35144   A   C        NR_036267                PASS                   100                 PASS                    200
10   chr1     35144   A   C        NR_036268                PASS                   100                 PASS                    200
11   chr1     35144   A   C        NR_026818                PASS                   100                 PASS                    200
12   chr1     35144   A   C        NR_026820                PASS                   100                 PASS                    200
13   chr1     35144   A   C        NR_026822                PASS                   100                 PASS                    200
14   chr1    768251   A   G        NR_047519                PASS                   100                 PASS                    200
15   chr1    768251   A   G        NR_047526                PASS                   100                 PASS                    200
16   chr1    768251   A   G        NR_047521                PASS                   100                 PASS                    200
17   chr1    768251   A   G        NR_047523                PASS                   100                 PASS                    200
18   chr1    768251   A   G        NR_047524                PASS                   100                 PASS                    200
19   chr1    768251   A   G        NR_047525                PASS                   100                 PASS                    200
20   chr1    768252   A   G        NR_047519                PASS                   100                 PASS                    200
21   chr1    768252   A   G        NR_047526                PASS                   100                 PASS                    200
22   chr1    768252   A   G        NR_047521                PASS                   100                 PASS                    200
23   chr1    768252   A   G        NR_047523                PASS                   100                 PASS                    200
24   chr1    768252   A   G        NR_047524                PASS                   100                 PASS                    200
25   chr1    768252   A   G        NR_047525                PASS                   100                 PASS                    200
26   chr1    768253   A   G        NR_047519                PASS                   100                 PASS                    200
27   chr1    768253   A   G        NR_047526                PASS                   100                 PASS                    200
28   chr1    768253   A   G        NR_047521                PASS                   100                 PASS                    200
29   chr1    768253   A   G        NR_047523                PASS                   100                 PASS                    200
30   chr1    768253   A   G        NR_047524                PASS                   100                 PASS                    200
31   chr1    768253   A   G        NR_047525                PASS                   100                 PASS                    200
32   chr7  55249063   G   A        NM_005228                PASS                     0                 PASS                      0
33   chr7  55249063   G   A     NM_001346897                PASS                     0                 PASS                      0
34   chr7  55249063   G   A     NM_001346898                PASS                     0                 PASS                      0
35   chr7  55249063   G   A     NM_001346941                PASS                     0                 PASS                      0
36   chr7  55249063   G   A     NM_001346899                PASS                     0                 PASS                      0
37   chr7  55249063   G   A     NM_001346900                PASS                     0                 PASS                      0
38   chr7  55249063   G   A        NR_047551                PASS                   100                 PASS                    200
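
To illustrate how such a table could be read, here is a minimal sketch that keeps, for each variant, the transcript with the highest PZTScore_transcripts. The rows are a subset copied from the output above. The function `best_transcripts` is hypothetical, and the tie-breaking rule (keep the first transcript listed) is an assumption of this sketch, not necessarily what HOWARD does. PZTFlag is ignored here because every row is PASS.

```python
# Subset of the example output:
# (CHROM, POS, REF, ALT, transcript, PZTFlag_transcripts, PZTScore_transcripts)
rows = [
    ("chr1", 69101, "A", "G", "NM_001005484", "PASS", 0),
    ("chr1", 69101, "A", "G", "ENST00000335137", "PASS", 0),
    ("chr7", 55249063, "G", "A", "NM_005228", "PASS", 0),
    ("chr7", 55249063, "G", "A", "NM_001346897", "PASS", 0),
    ("chr7", 55249063, "G", "A", "NR_047551", "PASS", 100),
]


def best_transcripts(rows):
    """Return {variant: (transcript, score)} with the highest score per variant.

    Assumption of this sketch: on equal scores, the first transcript
    encountered is kept.
    """
    best = {}
    for chrom, pos, ref, alt, transcript, _flag, score in rows:
        variant = (chrom, pos, ref, alt)
        # Strict '>' keeps the first-listed transcript when scores are tied.
        if variant not in best or score > best[variant][1]:
            best[variant] = (transcript, score)
    return best


selected = best_transcripts(rows)
```

For the chr7 variant, this selects NR_047551 (score 100) over the NM_ transcripts (score 0). For chr1:69101, both transcripts score 0, so the first one listed is kept. That is exactly the tie case where a list of preferred transcripts would matter.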

Add transcripts options #256 into parameters help docs #4

antonylebechec added the enhancement New feature or request label Jul 31, 2024

antonylebechec self-assigned this Jul 31, 2024

bioinfo-chru-strasbourg pushed a commit that referenced this issue Jul 31, 2024

add transcript view #256, first step generating a transcript table

937b97d

antonylebechec assigned JbaptisteLam Jul 31, 2024

bioinfo-chru-strasbourg pushed a commit that referenced this issue Aug 1, 2024

transcripts view #256 calculation transcripts json on INFO column, doc …

f47e2cf

…#4

bioinfo-chru-strasbourg pushed a commit that referenced this issue Aug 28, 2024

Add transcript prioritization #256

49206a8

bioinfo-chru-strasbourg pushed a commit that referenced this issue Aug 28, 2024

Add transcript prioritization #256

33bed01

bioinfo-chru-strasbourg pushed a commit that referenced this issue Aug 28, 2024

Add transcript prioritization #256 and docs #4

10eba80

bioinfo-chru-strasbourg pushed a commit that referenced this issue Aug 28, 2024

Add include transcripts annotations INFO field in strutured format #256…

bee3859

… and docs #4

bioinfo-chru-strasbourg pushed a commit that referenced this issue Aug 29, 2024

Add INFO annotations for transcripts prioritization #256

b035a98

antonylebechec added a commit that referenced this issue Aug 29, 2024

Merge pull request #262 from bioinfo-chru-strasbourg/transcript_bubbl…

ef5a4a4

…e_view Add INFO annotations for transcripts prioritization #256

JbaptisteLam added this to the Transcript handling milestone Sep 11, 2024

bioinfo-chru-strasbourg pushed a commit that referenced this issue Sep 18, 2024

Add transcript prioritization options, such as add fields, order by #256

dea4173

#273

bioinfo-chru-strasbourg pushed a commit that referenced this issue Sep 19, 2024

New options for transcripts prioritization #256

8d26b9d

bioinfo-chru-strasbourg pushed a commit that referenced this issue Sep 20, 2024

Add options to control transcripts view struct #256

9efb7c5

antonylebechec added a commit that referenced this issue Sep 20, 2024

Merge pull request #276 from bioinfo-chru-strasbourg/transcript_bubbl…

7ee30f9

…e_view Add options to control transcripts view struct #256

bioinfo-chru-strasbourg pushed a commit that referenced this issue Sep 23, 2024

Add trancript mapping and filter, and manage version #256

9471f5f

antonylebechec added a commit that referenced this issue Sep 23, 2024

Merge pull request #277 from bioinfo-chru-strasbourg/transcript_bubbl…

d27578b

…e_view Add trancript mapping and filter, and manage version #256

antonylebechec mentioned this issue Sep 23, 2024

Match between Ensembl / Refseq transcript during prioritization step #274

Closed

bioinfo-chru-strasbourg pushed a commit that referenced this issue Sep 24, 2024

Add transcript export file #256

9d6046e

bioinfo-chru-strasbourg pushed a commit that referenced this issue Sep 24, 2024

Add dynamic transcript for NOMEN extraction/annotation #256

e0ee58c

antonylebechec mentioned this issue Sep 25, 2024

Export only prioritize transcript annotation #273

Closed

antonylebechec added a commit that referenced this issue Oct 7, 2024

Merge pull request #285 from bioinfo-chru-strasbourg/docs_and_help

15830ee

Add transcripts options #256 into parameters help docs #4
