Hi @Adrian-Howard,
Bacterial genomes in NCBI are annotated with only CDS features, which means that in the GFF file each CDS is attached directly to its gene, without any transcript or exon features. That is why the variants are being annotated as intergenic.
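To make the point above concrete, here is a minimal Python sketch of one way to turn the gene -> CDS layout into gene -> transcript -> exon -> CDS. It is only an illustration under stated assumptions, not an official VEP or NCBI tool: the function name `add_transcripts` and the `transcript-`/`exon-` ID scheme are made up for this example, the columns are assumed to be tab-separated, every CDS is assumed to be a single segment, and the `biotype=protein_coding` attribute is copied from the modified GFF in the question.

```python
def add_transcripts(lines):
    """Yield GFF3 rows, inserting a transcript and an exon row before each CDS
    whose Parent is a gene (the NCBI bacterial layout described above).

    Assumptions (hypothetical, for illustration only): tab-separated columns,
    one segment per CDS, and gene IDs that start with "gene-".
    """
    for raw in lines:
        line = raw.rstrip("\n")
        cols = line.split("\t")
        # Pass comments, malformed rows and non-CDS features through unchanged.
        if line.startswith("#") or len(cols) != 9 or cols[2] != "CDS":
            yield line
            continue
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        parent = attrs.get("Parent", "")
        if not parent.startswith("gene-"):
            yield line
            continue
        locus = parent[len("gene-"):]
        tid = "transcript-" + locus      # made-up ID scheme
        span = cols[3:7]                 # start, end, score, strand
        # Transcript row: child of the gene, carries the biotype.
        yield "\t".join(cols[:2] + ["transcript"] + span
                        + [".", f"ID={tid};Parent={parent};biotype=protein_coding"])
        # Exon row: child of the new transcript, same coordinates as the CDS.
        yield "\t".join(cols[:2] + ["exon"] + span
                        + [".", f"ID=exon-{locus};Parent={tid}"])
        # CDS row: kept, with its phase, but re-parented to the transcript.
        attrs["Parent"] = tid
        yield "\t".join(cols[:8] + [";".join(f"{k}={v}" for k, v in attrs.items())])


# Two rows shortened from the question, with real tabs between columns.
sample = [
    "CP054425.1\tRefSeq\tgene\t1\t1503\t.\t+\t.\tID=gene-EE567_RS00005;Name=dnaA",
    "CP054425.1\tProtein Homology\tCDS\t1\t1503\t.\t+\t0\t"
    "ID=cds-WP_007057882.1;Parent=gene-EE567_RS00005",
]
for row in add_transcripts(sample):
    print(row)
```

The CDS row is kept rather than replaced, since a GFF with only transcript and exon rows gives VEP no coding region to work with. The converted file would still need to be sorted, bgzip-compressed and tabix-indexed as described on the VEP page linked in the question.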
Could you try again with the following GFF example:
Hello,
I am trying to run VEP in offline mode with a bacterial genome, but variants that fall in a CDS region are reported as intergenic.
My reference files were downloaded directly from NCBI Genome, and I have prepared them as suggested at https://www.ensembl.org/info/docs/tools/vep/script/vep_cache.html#gff
This is an example of my GFF file:
CP054425.1 RefSeq gene 1 1503 . + . ID=gene-EE567_RS00005;Name=dnaA;gbkey=Gene;gene=dnaA;gene_biotype=protein_coding;locus_tag=EE567_RS00005;old_locus_tag=EE567_000005
CP054425.1 Protein Homology CDS 1 1503 . + 0 ID=cds-WP_007057882.1;Parent=gene-EE567_RS00005;Dbxref=Genbank:WP_007057882.1;Name=WP_007057882.1;gbkey=CDS;gene=dnaA;inference=COORDINATES: similar to AA sequence:RefSeq:WP_013139947.1;locus_tag=EE567_RS00005;product=chromosomal replication initiator protein DnaA;protein_id=WP_007057882.1;transl_table=11
CP054425.1 RefSeq gene 2238 3362 . + . ID=gene-EE567_RS00010;Name=dnaN;gbkey=Gene;gene=dnaN;gene_biotype=protein_coding;locus_tag=EE567_RS00010;old_locus_tag=EE567_000010
CP054425.1 Protein Homology CDS 2238 3362 . + 0 ID=cds-WP_012576463.1;Parent=gene-EE567_RS00010;Dbxref=Genbank:WP_012576463.1;Name=WP_012576463.1;gbkey=CDS;gene=dnaN;inference=COORDINATES: similar to AA sequence:RefSeq:WP_007051765.1;locus_tag=EE567_RS00010;product=DNA polymerase III subunit beta;protein_id=WP_012576463.1;transl_table=11
This is the output I get:
#Uploaded_variation Location Allele Consequence IMPACT SYMBOL Gene Feature_type Feature BIOTYPE EXON INTRON HGVSc HGVSp cDNA_position CDS_position Protein_position Amino_acids Codons STRAND
CP054425.1_459_C/G CP054425.1:459 G intergenic_variant MODIFIER - - - - - - - - - - - - - - -
CP054425.1_462_G/A CP054425.1:462 A intergenic_variant MODIFIER - - - - - - - - - - - - - - -
CP054425.1_723_C/T CP054425.1:723 T intergenic_variant MODIFIER - - - - - - - - - - - - - - -
I have tried to modify the GFF to include the transcript and make it simpler, as follows, but I still have problems with the output.
CP054425.1 RefSeq gene 1 1503 . + . ID=gene-EE567_RS00005;Name=dnaA
CP054425.1 Protein Homology transcript 1 1503 . + . ID=transcript-TWP_007057882.1;Parent=gene-EE567_RS00005;biotype=protein_coding
CP054425.1 Protein Homology exon 1 1503 . + . ID=exon-WP_007057882.1;Parent=transcript-TWP_007057882.1
CP054425.1 RefSeq gene 2238 3362 . + . ID=gene-EE567_RS00010;Name=dnaN
CP054425.1 Protein Homology transcript 2238 3362 . + . ID=transcript-TWP_012576463.1;Parent=gene-EE567_RS00010;biotype=protein_coding
CP054425.1 Protein Homology exon 2238 3362 . + . ID=exon-WP_012576463.1;Parent=transcript-TWP_012576463.1
This is the output I get:
#Uploaded_variation Location Allele Consequence IMPACT SYMBOL Gene Feature_type Feature BIOTYPE EXON INTRON HGVSc HGVSp cDNA_position CDS_position Protein_position Amino_acids Codons STRAND
CP054425.1_459_C/G CP054425.1:459 G intergenic_variant MODIFIER dnaA gene-EE567_RS00005 Transcript transcript-TWP_007057882.1 protein_coding 1/1 - transcript-TWP_007057882.1:n.459C>G - 459 - - - - 1
CP054425.1_459_C/G CP054425.1:459 G upstream_gene_variant MODIFIER dnaN gene-EE567_RS00010 Transcript transcript-TWP_012576463.1 protein_coding - - - - - - - - - 1
CP054425.1_459_C/G CP054425.1:459 G upstream_gene_variant MODIFIER EE567_RS00020 gene-EE567_RS00020 Transcript transcript-TWP_032743437.1 protein_coding - - - - - - - - - 1
CP054425.1_459_C/G CP054425.1:459 G upstream_gene_variant MODIFIER recF gene-EE567_RS00015 Transcript transcript-TWP_032743438.1 protein_coding - - - - - - - - - 1
CP054425.1_459_C/G CP054425.1:459 G upstream_gene_variant MODIFIER gyrB gene-EE567_RS00025 Transcript transcript-TWP_172664385.1 protein_coding - - - - - - - - - 1
This is my VEP command line:
vep -e -i Bio-Kult_Bi-26_all_Q30DP_norm.vcf -o Bio-Kult_Bi-26_all_Q30DP_norm_ann --tab --fields "Uploaded_variation,Location,Allele,Consequence,IMPACT,SYMBOL,Gene,Feature_type,Feature,BIOTYPE,EXON,INTRON,HGVSc,HGVSp,cDNA_position,CDS_position,Protein_position,Amino_acids,Codons,STRAND" --custom B_longum_subps_infantis_Bi-26_transcript.gff.gz,Bi-26,gff --fasta B_longum_subps_infantis_Bi-26.fasta.gz
I will play around with the GFF file to see if it works.
Best,
Adrian