All variants are intergenic with NCBI GFF #1620

dzc0104 · 2024-02-23T14:50:13Z

Hi,
I am attempting to annotate a customized VCF file using NCBI's GFF and FASTA (.fna) files for the Newcastle disease virus (https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_004786615.1/). However, all the variants are being classified as intergenic, which is not the case when they are viewed in IGV.

System

VEP version: 104.3
VEP Cache version: N/A
Perl version: N/A
OS: Linux
tabix installed

###Script
#To install the bgzip and tabix (I did it in my local terminal)
#Download htslib-1.19.1.tar.gz
tar -zxvf htslib-1.19.1.tar.gz
cd htslib-1.19.1

#removing header line of gff as vep does not work with files having header line (local terminal)
grep -v '^#' genomic.gff | sort -k1,1 -k4,4n -k5,5n -t$'\t' | bgzip > genomic.gff.gz
tabix -p gff genomic.gff.gz

#for compressing fasta file (local terminal and transfer all the files in super computer later)
bgzip -c GCF_004786615.1_ASM478661v1_genomic.fna > GCF_004786615.1_ASM478661v1_genomic.fna.gz
#for indexing fasta file
samtools faidx GCF_004786615.1_ASM478661v1_genomic.fna.gz

#creating a synonyms file that maps the chromosome names used in your VCF to those used in your GFF file
zcat iso1_filtered.snp.vcf.gz | grep -v '^#' | sort -k1,1 -o sorted_iso1.vcf
cut -f1 sorted_iso1.vcf > 1snpsynonyms.txt
zcat genomic.gff.gz | grep -v '^#' | sort -k1,1 -o sorted.gff

#variant annotation for SNPs using ASM478661v1
vep -i iso1_filtered.snp.vcf.gz --gff /home/shared/hauck_research/Deepa_NDV_updated/troubleshooting/ncbiASM478661/ncbi_dataset/data/GCF_004786615.1/genomic.gff.gz --fasta /home/shared/hauck_research/Deepa_NDV_updated/troubleshooting/ncbiASM478661/ncbi_dataset/data/GCF_004786615.1/GCF_004786615.1_ASM478661v1_genomic.fna.gz --synonyms 1snpsynonyms.txt --species avian_orthoavulavirus

Full error message

I did not get any warning message when the script ran, but every variant in the output file was annotated as intergenic.

Data files

A sample of the GFF after removing the header lines
NC_075404.1 RefSeq region 1 15186 . + . ID=NC_075404.1:1..15186;Dbxref=taxon:2560319;country=United Kingdom: N. Ireland;gbkey=Src;genome=genomic;isolate=chicken/N. Ireland/Ulster/67;mol_type=genomic RNA;old-name=Newcastle disease virus
NC_075404.1 RefSeq gene 56 1801 . + . ID=gene-QKC91_gp1;Dbxref=GeneID:80527638;Name=N;gbkey=Gene;gene=N;gene_biotype=protein_coding;locus_tag=QKC91_gp1
NC_075404.1 RefSeq CDS 122 1591 . + 0 ID=cds-YP_010790286.1;Parent=gene-QKC91_gp1;Dbxref=GenBank:YP_010790286.1,GeneID:80527638;Name=YP_010790286.1;gbkey=CDS;gene=N;locus_tag=QKC91_gp1;product=nucleoprotein;protein_id=YP_010790286.1
NC_075404.1 RefSeq gene 1804 3254 . + . ID=gene-QKC91_gp2;Dbxref=GeneID:80527633;Name=P;gbkey=Gene;gene=P;gene_biotype=protein_coding;locus_tag=QKC91_gp2
.....

A sample of the compressed VCF
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT iso1
NODE_1_length_6008_cov_909.877255 980 . T C 12078.64 PASS AC=1;AF=0.500;AN=2;BaseQRankSum=0.924;DP=624;ExcessHet=0.0000;FS=1.120;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.000;QD=19.87;ReadPosRankSum=0.149;SOR=0.728 GT:AD:DP:GQ:PL 0/1:236,372:608:99:12086,0,6929
NODE_1_length_6008_cov_909.877255 3666 . C T 15573.64 PASS AC=1;AF=0.500;AN=2;BaseQRankSum=-0.079;DP=770;ExcessHet=0.0000;FS=7.765;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.000;QD=20.88;ReadPosRankSum=0.795;SOR=0.362 GT:AD:DP:GQ:PL 0/1:235,511:746:99:15581,0,5829
NODE_1_length_6008_cov_909.877255 3812 . A G 534.64 ReadPosRankSum-8 AC=1;AF=0.500;AN=2;BaseQRankSum=1.096;DP=826;ExcessHet=0.0000;FS=15.515;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.000;QD=0.66;ReadPosRankSum=-12.298;SOR=2.487 GT:AD:DP:GQ:PL 0/1:722,85:807:99:542,0,23105
NODE_1_length_6008_cov_909.877255 4631 . T C 1817.64 ReadPosRankSum-8 AC=1;AF=0.500;AN=2;BaseQRankSum=-3.725;DP=846;ExcessHet=0.0000;FS=22.208;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.000;QD=2.24;ReadPosRankSum=-13.945;SOR=1.685 GT:AD:DP:GQ:PL 0/1:680,133:813:99:1825,0,21905
NODE_2_length_2668_cov_848.858356 289 . G A 924.64 PASS AC=1;AF=0.500;AN=2;BaseQRankSum=-1.811;DP=720;ExcessHet=0.0000;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=59.97;MQRankSum=0.000;QD=1.50;ReadPosRankSum=-5.861;SOR=0.631 GT:AD:DP:GQ:PL 0/1:531,87:618:99:932,0,16256
.....

Synonyms text file format
NODE_1_length_6008_cov_909.877255 NC_075404.1
NODE_1_length_6008_cov_909.877255 NC_075404.1
NODE_1_length_6008_cov_909.877255 NC_075404.1
NODE_1_length_6008_cov_909.877255 NC_075404.1
NODE_2_length_2668_cov_848.858356 NC_075404.1
NODE_2_length_2668_cov_848.858356 NC_075404.1
.....
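
The sample above repeats the same pair once per variant, whereas `cut -f1` on its own yields only the first column. As a minimal sketch, assuming the synonyms file only needs one tab-separated line per distinct VCF sequence name, the pairs could be built as follows. `make_synonyms` is a hypothetical helper name, and mapping every contig to NC_075404.1 simply mirrors the sample above.

```shell
# make_synonyms: hypothetical helper, not part of VEP.
# $1 = sequence name used in the GFF/FASTA; VCF text is read from stdin.
make_synonyms() {
  # Skip header lines, then print each distinct CHROM value once,
  # followed by a tab and the reference sequence name.
  awk -F'\t' -v ref="$1" '!/^#/ && !seen[$1]++ { print $1 "\t" ref }'
}

# Usage with the files named in this post:
# zcat iso1_filtered.snp.vcf.gz | make_synonyms NC_075404.1 > 1snpsynonyms.txt
```

Synonyms only rename sequences. They do not translate positions, so coordinates on an assembled contig are not converted to reference coordinates by this file.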

VEP output

## ENSEMBL VARIANT EFFECT PREDICTOR v104.3
## Output produced at 2024-02-09 19:23:53
## Using API version 104, DB version ?
## ensembl-funcgen version 104.f1c7762
## ensembl-io version 104.1d3bb6e
## ensembl version 104.1af1dce
## ensembl-variation version 104.20f5335
## Column descriptions:
## Uploaded_variation : Identifier of uploaded variant
## Location : Location of variant in standard coordinate format (chr:start or chr:start-end)
## Allele : The variant allele used to calculate the consequence
## Gene : Stable ID of affected gene
## Feature : Stable ID of feature
## Feature_type : Type of feature - Transcript, RegulatoryFeature or MotifFeature
## Consequence : Consequence type
## cDNA_position : Relative position of base pair in cDNA sequence
## CDS_position : Relative position of base pair in coding sequence
## Protein_position : Relative position of amino acid in protein
## Amino_acids : Reference and variant amino acids
## Codons : Reference and variant codon sequence
## Existing_variation : Identifier(s) of co-located known variants
## Extra column keys:
## IMPACT : Subjective impact classification of consequence type
## DISTANCE : Shortest distance from variant to transcript
## STRAND : Strand of the feature (1/-1)
## FLAGS : Transcript quality flags
## SOURCE : Source of transcript
## genomic.gff.gz : /home/shared/hauck_research/Deepa_NDV_updated/troubleshooting/ncbiASM478661/ncbi_dataset/data/GCF_004786615.1/genomic.gff.gz (overlap)
#Uploaded_variation Location Allele Gene Feature Feature_type Consequence cDNA_position CDS_position Protein_position Amino_acids Codons Existing_variation Extra
NODE_1_length_6008_cov_909.877255_980_T/C NODE_1_length_6008_cov_909.877255:980 C - - - intergenic_variant - - - - - - IMPACT=MODIFIER
NODE_1_length_6008_cov_909.877255_3666_C/T NODE_1_length_6008_cov_909.877255:3666 T - - - intergenic_variant - - - - - - IMPACT=MODIFIER
NODE_1_length_6008_cov_909.877255_3812_A/G NODE_1_length_6008_cov_909.877255:3812 G - - - intergenic_variant - - - - - - IMPACT=MODIFIER
NODE_1_length_6008_cov_909.877255_4631_T/C NODE_1_length_6008_cov_909.877255:4631 C - - - intergenic_variant - - - - - - IMPACT=MODIFIER
....

nuno-agostinho · 2024-02-23T16:36:06Z

Hey @dzc0104,

Thank you for your question. The problem is related to using the NCBI GTF/GFF annotation for microorganisms: we currently require the GTF/GFF annotation to explicitly describe the transcript and its exons.

For your use case, you could use the following modified annotation:

##gff-version 3
#!gff-spec-version 1.21
#!processor NCBI annotwriter
#!genome-build ASM478661v1
#!genome-build-accession NCBI_Assembly:GCF_004786615.1
##sequence-region NC_075404.1 1 15186
##species https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=2560319
NC_075404.1	RefSeq	region	1	15186	.	+	.	ID=NC_075404.1:1..15186;Dbxref=taxon:2560319;country=United Kingdom: N. Ireland;gbkey=Src;genome=genomic;isolate=chicken/N. Ireland/Ulster/67;mol_type=genomic RNA;old-name=Newcastle disease virus
NC_075404.1	RefSeq	gene	56	1801	.	+	.	ID=gene-QKC91_gp1;Dbxref=GeneID:80527638;Name=N;gbkey=Gene;gene=N;gene_biotype=protein_coding;locus_tag=QKC91_gp1
NC_075404.1	RefSeq	transcript	122	1591	.	+	0	ID=transcript-YP_010790286.1;Parent=gene-QKC91_gp1;Dbxref=GenBank:YP_010790286.1,GeneID:80527638;Name=YP_010790286.1;gbkey=CDS;gene=N;locus_tag=QKC91_gp1;product=nucleoprotein;protein_id=YP_010790286.1
NC_075404.1	RefSeq	exon	122	1591	.	+	0	ID=exon-YP_010790286.1;Parent=transcript-YP_010790286.1;Dbxref=GenBank:YP_010790286.1,GeneID:80527638;Name=YP_010790286.1;gbkey=CDS;gene=N;locus_tag=QKC91_gp1;product=nucleoprotein;protein_id=YP_010790286.1
NC_075404.1	RefSeq	gene	1804	3254	.	+	.	ID=gene-QKC91_gp2;Dbxref=GeneID:80527633;Name=P;gbkey=Gene;gene=P;gene_biotype=protein_coding;locus_tag=QKC91_gp2
NC_075404.1	RefSeq	transcript	1887	3074	.	+	0	ID=transcript-YP_010790287.1;Parent=gene-QKC91_gp2;Dbxref=GenBank:YP_010790287.1,GeneID:80527633;Name=YP_010790287.1;gbkey=CDS;gene=P;locus_tag=QKC91_gp2;product=phosphoprotein;protein_id=YP_010790287.1
NC_075404.1	RefSeq	exon	1887	3074	.	+	0	ID=exon-YP_010790287.1;Parent=transcript-YP_010790287.1;Dbxref=GenBank:YP_010790287.1,GeneID:80527633;Name=YP_010790287.1;gbkey=CDS;gene=P;locus_tag=QKC91_gp2;product=phosphoprotein;protein_id=YP_010790287.1
NC_075404.1	RefSeq	gene	3256	4496	.	+	.	ID=gene-QKC91_gp3;Dbxref=GeneID:80527634;Name=M;gbkey=Gene;gene=M;gene_biotype=protein_coding;locus_tag=QKC91_gp3
NC_075404.1	RefSeq	transcript	3290	4384	.	+	0	ID=transcript-YP_010790288.1;Parent=gene-QKC91_gp3;Dbxref=GenBank:YP_010790288.1,GeneID:80527634;Name=YP_010790288.1;gbkey=CDS;gene=M;locus_tag=QKC91_gp3;product=matrix protein;protein_id=YP_010790288.1
NC_075404.1	RefSeq	exon	3290	4384	.	+	0	ID=exon-YP_010790288.1;Parent=transcript-YP_010790288.1;Dbxref=GenBank:YP_010790288.1,GeneID:80527634;Name=YP_010790288.1;gbkey=CDS;gene=M;locus_tag=QKC91_gp3;product=matrix protein;protein_id=YP_010790288.1
NC_075404.1	RefSeq	gene	4498	6289	.	+	.	ID=gene-QKC91_gp4;Dbxref=GeneID:80527635;Name=F;gbkey=Gene;gene=F;gene_biotype=protein_coding;locus_tag=QKC91_gp4
NC_075404.1	RefSeq	transcript	4544	6205	.	+	0	ID=transcript-YP_010790289.1;Parent=gene-QKC91_gp4;Dbxref=GenBank:YP_010790289.1,GeneID:80527635;Name=YP_010790289.1;gbkey=CDS;gene=F;locus_tag=QKC91_gp4;product=fusion protein;protein_id=YP_010790289.1
NC_075404.1	RefSeq	exon	4544	6205	.	+	0	ID=exon-YP_010790289.1;Parent=transcript-YP_010790289.1;Dbxref=GenBank:YP_010790289.1,GeneID:80527635;Name=YP_010790289.1;gbkey=CDS;gene=F;locus_tag=QKC91_gp4;product=fusion protein;protein_id=YP_010790289.1
NC_075404.1	RefSeq	gene	6321	8322	.	+	.	ID=gene-QKC91_gp5;Dbxref=GeneID:80527636;Name=HN;gbkey=Gene;gene=HN;gene_biotype=protein_coding;locus_tag=QKC91_gp5
NC_075404.1	RefSeq	transcript	6412	8262	.	+	0	ID=transcript-YP_010790290.1;Parent=gene-QKC91_gp5;Dbxref=GenBank:YP_010790290.1,GeneID:80527636;Name=YP_010790290.1;gbkey=CDS;gene=HN;locus_tag=QKC91_gp5;product=hemagglutinin-neuraminidase;protein_id=YP_010790290.1
NC_075404.1	RefSeq	exon	6412	8262	.	+	0	ID=exon-YP_010790290.1;Parent=transcript-YP_010790290.1;Dbxref=GenBank:YP_010790290.1,GeneID:80527636;Name=YP_010790290.1;gbkey=CDS;gene=HN;locus_tag=QKC91_gp5;product=hemagglutinin-neuraminidase;protein_id=YP_010790290.1
NC_075404.1	RefSeq	gene	8370	15072	.	+	.	ID=gene-QKC91_gp6;Dbxref=GeneID:80527637;Name=L;gbkey=Gene;gene=L;gene_biotype=protein_coding;locus_tag=QKC91_gp6
NC_075404.1	RefSeq	transcript	8381	14995	.	+	0	ID=transcript-YP_010790291.1;Parent=gene-QKC91_gp6;Dbxref=GenBank:YP_010790291.1,GeneID:80527637;Name=YP_010790291.1;gbkey=CDS;gene=L;locus_tag=QKC91_gp6;product=RNA-dependent RNA polymerase;protein_id=YP_010790291.1
NC_075404.1	RefSeq	exon	8381	14995	.	+	0	ID=exon-YP_010790291.1;Parent=transcript-YP_010790291.1;Dbxref=GenBank:YP_010790291.1,GeneID:80527637;Name=YP_010790291.1;gbkey=CDS;gene=L;locus_tag=QKC91_gp6;product=RNA-dependent RNA polymerase;protein_id=YP_010790291.1

As this is not the first time we have received this question (see #1074), I am going to talk with the team about the possibility of supporting these NCBI GTF/GFF annotation files for microorganisms. Maybe we can consider each CDS as a single-exon transcript. I will keep you updated on this.

Best regards,
Nuno

dzc0104 · 2024-02-26T18:01:03Z

Thank you for the response @nuno-agostinho
It worked for that reference. I have a question: did you edit the GFF file manually? I have two other references:
1) https://www.ncbi.nlm.nih.gov/nuccore/NC_039223.1
2) https://www.ncbi.nlm.nih.gov/nuccore/AF077761 - this one has GFF3 files, and I tried to convert it to GFF and even GTF but could not. The GFF3 file could not even be bgzipped and tabixed.

nuno-agostinho · 2024-02-27T17:23:54Z

Hi @dzc0104,

I manually created the file by:

- Duplicating the CDS lines
- Changing the feature to transcript and exon
- Changing their IDs to something unique
- Changing their Parent IDs:
  - Put the gene ID as the parent ID of the transcript
  - Put the transcript ID as the parent ID of the exon
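
The steps above can be sketched as a small awk filter. This is only an illustration under stated assumptions: `cds_to_transcripts` and the `KEEP_CDS` switch are hypothetical names, each CDS is assumed to sit on a single line with both `ID=` and `Parent=` attributes (a CDS split over several lines would yield duplicate transcript IDs), and the `transcript-`/`exon-` ID prefixes simply mirror the modified annotation posted earlier in this thread.

```shell
# cds_to_transcripts: hypothetical helper, not part of VEP.
# Reads GFF3 on stdin and writes GFF3 with each CDS line turned into
# a transcript line plus an exon line.
cds_to_transcripts() {
  awk -F'\t' -v OFS='\t' -v keep_cds="${KEEP_CDS:-0}" '
    # Header lines, malformed lines and non-CDS features pass through unchanged.
    /^#/ || NF < 9 || $3 != "CDS" { print; next }
    {
      id = ""; parent = ""; rest = ""
      n = split($9, attrs, ";")
      for (i = 1; i <= n; i++) {
        if (attrs[i] ~ /^ID=/)          id = substr(attrs[i], 4)
        else if (attrs[i] ~ /^Parent=/) parent = substr(attrs[i], 8)
        else if (attrs[i] != "")        rest = rest ";" attrs[i]
      }
      # CDS lines lacking an ID or a Parent are left as they are.
      if (id == "" || parent == "") { print; next }

      base = id; sub(/^cds-/, "", base)
      tid = "transcript-" base

      # Transcript: unique ID, parent is the gene.
      $3 = "transcript"; $9 = "ID=" tid ";Parent=" parent rest; print
      # Exon: unique ID, parent is the new transcript.
      $3 = "exon"; $9 = "ID=exon-" base ";Parent=" tid rest; print
      # Optional (assumption): keep the CDS, re-parented to the transcript.
      if (keep_cds) { $3 = "CDS"; $9 = "ID=" id ";Parent=" tid rest; print }
    }'
}

# Usage, following the sort/bgzip/tabix steps from the original post:
# grep -v '^#' genomic.gff | cds_to_transcripts \
#   | sort -t"$(printf '\t')" -k1,1 -k4,4n -k5,5n | bgzip > genomic.gff.gz
# tabix -p gff genomic.gff.gz
```

The posted annotation contains only transcript and exon lines. Whether VEP also needs the original CDS line re-parented to the new transcript is not stated in this thread, so `KEEP_CDS=1` is offered only as an option to experiment with.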

Tell me if you need further instructions.

> this one has gff3 files and I tried to convert it into gff and even gtf but could not. Gff3 did not even bgzipped and tabixed.

If you downloaded the GFF3 annotation via the "Send to" form in the top right corner of the record, you need to remove the empty lines at the end of the file before running bgzip and tabix. Tell me if this worked.
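
A minimal sketch of that clean-up, assuming it is acceptable to drop every empty or whitespace-only line rather than only the trailing ones (`strip_blank_lines` is a hypothetical helper name):

```shell
# strip_blank_lines: hypothetical helper.
# Prints its input (files or stdin) without empty or whitespace-only lines.
strip_blank_lines() {
  awk 'NF > 0' "$@"
}

# Usage with the sort/bgzip/tabix steps from the original post
# (sequence.gff3 is the file downloaded from the AF077761 record):
# strip_blank_lines sequence.gff3 | grep -v '^#' \
#   | sort -t"$(printf '\t')" -k1,1 -k4,4n -k5,5n | bgzip > sequence.gff3.gz
# tabix -p gff sequence.gff3.gz
```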

Cheers,
Nuno

dzc0104 · 2024-03-10T14:12:17Z

@nuno-agostinho Yay! It worked. Thank you very much, Nuno.

Regards,
Deepa

dzc0104 · 2024-03-18T19:33:26Z

@nuno-agostinho I still have a question. How can position 77 be associated with multiple genes, namely F, M, NP, and P? During my analysis, I observed that genomic position 77 is annotated with these gene symbols across various transcripts, as shown in this file:
Iso7- Vep.xlsx

I got this information from a dataset https://www.ncbi.nlm.nih.gov/nuccore/AF077761 that includes details about gene symbols and transcript types. But I'm not sure what it means biologically to have different gene types at the same position.

nuno-agostinho · 2024-03-19T10:09:15Z

Hi @dzc0104,

The only results associated with genes F and M are upstream_gene_variant or downstream_gene_variant. Marking variants as upstream or downstream of a gene is useful for understanding variants that may affect those genes (for instance, in regulatory regions).

However, the default distance between a variant and a transcript used by VEP to annotate up/downstream variants is 5 000 bp (optimised for vertebrates) and the genome you mentioned is small (15 186 bp). Please try decreasing the --distance parameter to a value that makes more sense for your use case.
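
To illustrate the effect of the window size, here is a rough sketch that lists which genes fall within a given distance of a variant. It is an approximation under stated assumptions: `genes_within` is a hypothetical helper, the gene coordinates are those of the NC_075404.1 annotation shown earlier in this thread (the AF077761 coordinates will differ), and VEP itself measures the distance to transcripts rather than to gene records.

```shell
# genes_within: hypothetical helper.
# $1 = variant position, $2 = distance in bp.
# Reads "name start end" lines on stdin and prints name and gap (bp)
# for every gene whose span lies within the distance.
genes_within() {
  awk -v pos="$1" -v d="$2" '
    {
      if (pos < $2)      gap = $2 - pos   # variant lies before the gene
      else if (pos > $3) gap = pos - $3   # variant lies after the gene
      else               gap = 0          # variant overlaps the gene
      if (gap <= d) printf "%s\t%d\n", $1, gap
    }'
}

# Gene spans copied from the NC_075404.1 GFF above.
genes='N 56 1801
P 1804 3254
M 3256 4496
F 4498 6289
HN 6321 8322
L 8370 15072'

printf '%s\n' "$genes" | genes_within 77 5000   # default-sized window
printf '%s\n' "$genes" | genes_within 77 200    # 200 is an arbitrary smaller value
```

With the 5000 bp window, position 77 reaches N, P, M and F; with 200 bp only N remains. The 200 is purely illustrative, not a recommended setting for `--distance`.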

Hope this makes it clear.

Cheers,
Nuno

dzc0104 · 2024-05-22T17:19:24Z

Hi @nuno-agostinho,

Thank you for your assistance.

As part of my data analysis, I've identified synonymous variants and now I'm exploring their potential impacts at the amino acid level. While synonymous variants traditionally aren't thought to have functional impacts on protein structure, they can affect RNA stability, protein folding, evolutionary conservation, splicing regulation, and regulatory elements.

I've used the Variant Effect Predictor (VEP) with the SIFT option (--sift b), but unfortunately I didn't receive any relevant data in the output. Does this lack of prediction indicate that there are no available predictions for my variants?

Here's the command I used:
vep -i iso1p1_filtered.snp.vcf.gz \
  --gff /home/shared/hauck_research/Deepa_NDV_updated/troubleshooting/ref/AF077761/sequence.gff3.gz \
  --fasta /home/shared/hauck_research/Deepa_NDV_updated/troubleshooting/ref/AF077761/AF077761.fasta.gz \
  --species avian_orthoavulavirus \
  --sift b

Additionally, I'm seeking recommendations for other tools to analyze the functional impacts of synonymous variants, particularly those focusing on RNA-level effects, splicing regulation, and non-protein-coding impacts.

Thank you for your guidance! 😊

I have attached the link to the VCF file here.

iso1p1_filtered.snp.vcf.gz

Best regards,
Deepa

nuno-agostinho · 2024-05-23T12:58:15Z

Hi @dzc0104,

VEP only returns pre-computed SIFT results stored in Ensembl databases in --database or --cache modes. However, we don't have SIFT results for avian orthoavulavirus. You may want to consider installing and running SIFT on your data, as per https://sift.bii.a-star.edu.sg.

Regarding additional tools to help predict variant consequences, some articles list such tools:

Hope this information was useful.

Cheers,
Nuno

Joshua-Macleod · 2024-08-15T21:55:48Z

Hi @nuno-agostinho,

I have a similar issue to the one originally reported by @dzc0104 regarding intergenic variant calling.

I've built .gff3 files using both prokka and bakta for reference genomes against which I'm looking to find variants. Here's an excerpt of a bakta .gff3:

contig00001     Prodigal        CDS     265     723     .       +       0       ID=KAHBKG_00010;Name=Transcriptional regulator CtsR;locus_tag=KAHBKG_00010;product=Transcriptional regulator CtsR;Dbxref=COG:COG4463,COG:K,RefSeq:WP_003760062.1,SO:0001217,UniParc:UPI00000CC18E,UniRef:UniRef100_H1GA27,UniRef:UniRef50_A0A143YMT3,UniRef:UniRef90_G2ZA06;gene=ctsR
contig00001     Prodigal        CDS     736     1254    .       +       0       ID=KAHBKG_00015;Name=Protein-arginine kinase activator protein McsA;locus_tag=KAHBKG_00015;product=Protein-arginine kinase activator protein McsA;Dbxref=COG:COG3880,COG:O,RefSeq:WP_003760064.1,SO:0001217,UniParc:UPI0001EB894E,UniRef:UniRef100_A0A823H5C3,UniRef:UniRef50_H1GA28,UniRef:UniRef90_H1GA28;gene=mcsA
contig00001     Prodigal        CDS     1251    2273    .       +       0       ID=KAHBKG_00020;Name=protein arginine kinase;locus_tag=KAHBKG_00020;product=protein arginine kinase;Dbxref=COG:COG3869,COG:O,EC:2.7.14.1,GO:0004111,GO:0004672,GO:0005524,GO:0016310,GO:0046314,RefSeq:WP_010990301.1,SO:0001217,UniParc:UPI000013952D,UniRef:UniRef100_Q92F44,UniRef:UniRef50_Q48759,UniRef:UniRef90_Q48759;gene=mcsB
contig00001     Prodigal        CDS     2302    4764    .       +       0       ID=KAHBKG_00025;Name=endopeptidase Clp ATP-binding chain C;locus_tag=KAHBKG_00025;product=endopeptidase Clp ATP-binding chain C;Dbxref=COG:COG0542,COG:O,RefSeq:WP_003770116.1,SO:0001217,UniParc:UPI00000CC190,UniRef:UniRef100_A0A3H2VSB6,UniRef:UniRef50_A0A0F7N4K2,UniRef:UniRef90_A0A097B1Z0,VFDB:VFC0282,VFDB:VFG000079;gene=clpC

I've tried to make use of your method here:

- Duplicating the CDS lines
- Changing the feature to transcript and exon
- Changing their IDs to something unique
- Changing their Parent IDs:
  - Put the gene ID as the parent ID of the transcript
  - Put the transcript ID as the parent ID of the exon

and even changing CDS to gene in the .gff3 file and including a biotype to remedy the warning (just on the off chance...):

contig00001     Prodigal        gene    265     723     .       +       .       ID=gene-KAHBKG_00010;locus_tag=KAHBKG_00010;gene_biotype=protein_coding
contig00001     Prodigal        transcript      265     723     .       +       .       ID=KAHBKG_00010_t1000;Parent=gene-KAHBKG_00010;locus_tag=KAHBKG_00010
contig00001     Prodigal        exon    265     723     .       +       0       ID=KAHBKG_00010_e1000;Parent=KAHBKG_00010_t1000;locus_tag=KAHBKG_00010

However, I still receive warnings (WARNING: Unable to determine biotype of KAHBKG_01390) for approx. 30 IDs/locus_tags per .gff3 and variants are still called as intergenic even if the locations fall within a CDS.

Any recommendations here, or if you'd like me to provide test data, do let me know.

Cheers,
Joshua

nuno-agostinho · 2024-08-16T09:11:34Z

Hi @Joshua-Macleod,

Based on that warning, I would say that those lines have no field indicating their biotype, so VEP can't determine whether they are part of a protein_coding transcript or not.

Could you show me the lines in your GFF3 file relative to KAHBKG_01390?
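
As a quick way to find such lines, the following sketch prints the ID of every transcript record that carries no `biotype=` attribute of its own. It rests on an assumption taken from the VEP documentation on custom GFF files, namely that a transcript is expected to define its biotype (a `gene_biotype=` on the gene line is deliberately not counted here). `transcripts_without_biotype` is a hypothetical helper name.

```shell
# transcripts_without_biotype: hypothetical helper, not part of VEP.
# Reads GFF3 (files or stdin) and prints the ID of each transcript/mRNA
# record whose attribute column has no "biotype=" key.
transcripts_without_biotype() {
  awk -F'\t' '
    !/^#/ && ($3 == "transcript" || $3 == "mRNA") && $9 !~ /(^|;)biotype=/ {
      n = split($9, attrs, ";"); id = "(no ID)"
      for (i = 1; i <= n; i++) if (attrs[i] ~ /^ID=/) id = substr(attrs[i], 4)
      print id
    }' "$@"
}

# Usage (the file name is a placeholder):
# transcripts_without_biotype annotation.gff3
```

If the printed IDs match the locus tags named in the warnings, adding a `biotype=protein_coding` attribute to those transcript lines would be one thing to try; that has not been verified in this thread.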

Best,
Nuno

Joshua-Macleod · 2024-08-16T10:20:01Z

Hi @nuno-agostinho,

Thanks for getting back to me.

Here are the lines:

contig00001     Prodigal        gene    270089  271192  .       +       .       ID=gene-KAHBKG_01390;locus_tag=KAHBKG_01390;gene_biotype=protein_coding;Name=23S rRNA (adenine(2503)-C(2))-methyltransferase RlmN;product=23S rRNA (adenine(2503)-C(2))-methyltransferase RlmN;Dbxref=COG:COG0820,COG:J,EC:2.1.1.192,GO:0000049,GO:0002935,GO:0005737,GO:0008757,GO:0016433,GO:0019843,GO:0031167,GO:0046872,GO:0051539,GO:0070040,GO:0070475,RefSeq:WP_003725208.1,SO:0001217,UniParc:UPI00000CC251,UniRef:UniRef100_Q92EH6,UniRef:UniRef50_Q8Y9P2,UniRef:UniRef90_Q8Y9P2;gene=rlmN
contig00001     Prodigal        transcript      270089  271192  .       +       .       ID=KAHBKG_01390_t1272;Parent=gene-KAHBKG_01390;locus_tag=KAHBKG_01390;Name=23S rRNA (adenine(2503)-C(2))-methyltransferase RlmN;product=23S rRNA (adenine(2503)-C(2))-methyltransferase RlmN;Dbxref=COG:COG0820,COG:J,EC:2.1.1.192,GO:0000049,GO:0002935,GO:0005737,GO:0008757,GO:0016433,GO:0019843,GO:0031167,GO:0046872,GO:0051539,GO:0070040,GO:0070475,RefSeq:WP_003725208.1,SO:0001217,UniParc:UPI00000CC251,UniRef:UniRef100_Q92EH6,UniRef:UniRef50_Q8Y9P2,UniRef:UniRef90_Q8Y9P2;gene=rlmN
contig00001     Prodigal        exon    270089  271192  .       +       0       ID=KAHBKG_01390_e1272;Parent=KAHBKG_01390_t1272;locus_tag=KAHBKG_01390;Name=23S rRNA (adenine(2503)-C(2))-methyltransferase RlmN;product=23S rRNA (adenine(2503)-C(2))-methyltransferase RlmN;Dbxref=COG:COG0820,COG:J,EC:2.1.1.192,GO:0000049,GO:0002935,GO:0005737,GO:0008757,GO:0016433,GO:0019843,GO:0031167,GO:0046872,GO:0051539,GO:0070040,GO:0070475,RefSeq:WP_003725208.1,SO:0001217,UniParc:UPI00000CC251,UniRef:UniRef100_Q92EH6,UniRef:UniRef50_Q8Y9P2,UniRef:UniRef90_Q8Y9P2;gene=rlmN

Worth noting: these loci do not appear in the VEP output (edit: presumably for the same reason they are named in the warnings; I didn't put two and two together).

Cheers,
Joshua

nuno-agostinho self-assigned this Feb 23, 2024

nuno-agostinho added custom annotation GFF/GTF labels Feb 23, 2024

nuno-agostinho mentioned this issue Mar 4, 2024

Support NCBI microbe GTF/GFF with no transcripts (CDS only) #1627

nuno-agostinho assigned likhitha-surapaneni and unassigned nuno-agostinho Sep 6, 2024
