Multiple gene names mapping to a single gene ID #217

crj32 · 2019-03-29T08:19:47Z

Hi

I ran the genome-guided assembly on my data, following the exact steps in the Nature Protocols paper, and I often see multiple gene_names per single gene_id in the merged .gtf file. Is this common? Surely this is incorrect, because the tool has basically merged several genes together to make its own gene. An example is below.

Thanks,

Chris

chr20 StringTie exon 63734154 63734824 1000 + . gene_id "MSTRG.40027"; transcript_id "ENST00000484569.1"; exon_number "1"; gene_name "ZGPAT"; ref_gene_id "ENSG00000197114.11";
chr20 StringTie exon 63735159 63735236 1000 + . gene_id "MSTRG.40027"; transcript_id "ENST00000484569.1"; exon_number "2"; gene_name "ZGPAT"; ref_gene_id "ENSG00000197114.11";
chr20 StringTie transcript 63735463 63738441 1000 + . gene_id "MSTRG.40027"; transcript_id "ENST00000496820.2"; gene_name "RP4-583P15.15"; ref_gene_id "ENSG00000273154.3";
chr20 StringTie exon 63735463 63735564 1000 + . gene_id "MSTRG.40027"; transcript_id "ENST00000496820.2"; exon_number "1"; gene_name "RP4-583P15.15"; ref_gene_id "ENSG00000273154.3";
chr20 StringTie exon 63737845 63737902 1000 + . gene_id "MSTRG.40027"; transcript_id "ENST00000496820.2"; exon_number "2"; gene_name "RP4-583P15.15"; ref_gene_id "ENSG00000273154.3";
chr20 StringTie exon 63737973 63738060 1000 + . gene_id "MSTRG.40027"; transcript_id "ENST00000496820.2"; exon_number "3"; gene_name "RP4-583P15.15"; ref_gene_id "ENSG00000273154.3";
chr20 StringTie exon 63738183 63738441 1000 + . gene_id "MSTRG.40027"; transcript_id "ENST00000496820.2"; exon_number "4"; gene_name "RP4-583P15.15"; ref_gene_id "ENSG00000273154.3";
chr20 StringTie transcript 63736283 63738234 1000 + . gene_id "MSTRG.40027"; transcript_id "ENST00000444951.5"; gene_name "LIME1"; ref_gene_id "ENSG00000203896.9";
chr20 StringTie exon 63736283 63736396 1000 + . gene_id "MSTRG.40027"; transcript_id "ENST00000444951.5"; exon_number "1"; gene_name "LIME1"; ref_gene_id "ENSG00000203896.9";
chr20 StringTie exon 63737533 63737647 1000 + . gene_id "MSTRG.40027"; transcript_id "ENST00000444951.5"; exon_number "2"; gene_name "LIME1"; ref_gene_id "ENSG00000203896.9";
chr20 StringTie exon 63737821 63737902 1000 + . gene_id "MSTRG.40027"; transcript_id "ENST00000444951.5"; exon_number "3"; gene_name "LIME1"; ref_gene_id "ENSG00000203896.9";
chr20 StringTie exon 63737973 63738060 1000 + . gene_id "MSTRG.40027"; transcript_id "ENST00000444951.5"; exon_number "4"; gene_name "LIME1"; ref_gene_id "ENSG00000203896.9";
chr20 StringTie exon 63738183 63738234 1000 + . gene_id "MSTRG.40027"; transcript_id "ENST00000444951.5"; exon_number "5"; gene_name "LIME1"; ref_gene_id "ENSG00000203896.9";
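
To get a sense of how widespread this is in a merged GTF, something like the following could be used. It is only a minimal sketch: the function names `names_by_gene_id` and `multi_name_loci` are made up for illustration, and it assumes attributes are written as `key "value";` as in the lines above.

```python
import re
from collections import defaultdict

# Matches GTF attributes of the form: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')


def names_by_gene_id(gtf_lines):
    """Map each gene_id to the set of gene_name values seen with it."""
    names = defaultdict(set)
    for line in gtf_lines:
        line = line.strip()
        # Skip blank lines and GTF comment/header lines.
        if not line or line.startswith("#"):
            continue
        attrs = dict(ATTR_RE.findall(line))
        if "gene_id" in attrs and "gene_name" in attrs:
            names[attrs["gene_id"]].add(attrs["gene_name"])
    return names


def multi_name_loci(gtf_lines):
    """Return only the gene_ids associated with more than one gene_name."""
    return {
        gene_id: sorted(gene_names)
        for gene_id, gene_names in names_by_gene_id(gtf_lines).items()
        if len(gene_names) > 1
    }
```

Passing an open file handle for the merged GTF to `multi_name_loci` would list every locus like MSTRG.40027 together with the reference gene names it absorbed.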

mrijnkels · 2019-05-01T19:05:31Z

Hi, we have the same issue. We see it both for genes that are next to each other and for genes that are several hundred kb apart.
We would really like to find out how to prevent this, as it makes the merged StringTie file not very useful.

gpertea · 2019-05-02T15:17:29Z

This is a difficult issue to solve within StringTie, which makes assembly decisions based primarily on the read alignment data. Reference annotation is often imperfect and incomplete, and in order to allow for the discovery of novel isoforms, StringTie always uses the read alignments as the basis of transcript assembly. Unfortunately, read alignments can also be wrong or imperfect and may actually "bridge" neighboring genes, as seems to be the case in the situations you are reporting here.

Using a better or more stringent read alignment strategy may help with this problem. Alternatively, some post-alignment filtering can be applied to the alignment data in order to eliminate large, low-scoring alignments that seem to spuriously "connect" neighboring genes.
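
As a rough illustration of the post-alignment filtering idea (this is not a StringTie feature), the sketch below drops SAM records that both span a very long gap and have a low mapping quality. The helper names and the thresholds `max_intron_len=100_000` and `min_mapq=20` are assumptions chosen for the example, and MAPQ is used here only as a stand-in for "low scoring"; suitable values depend on the aligner and the organism.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def max_intron(cigar):
    """Length of the largest N (skipped region) operation in a CIGAR string."""
    return max(
        (int(length) for length, op in CIGAR_RE.findall(cigar) if op == "N"),
        default=0,
    )


def keep_alignment(sam_line, max_intron_len=100_000, min_mapq=20):
    """Keep header lines; drop records that are both long-gapped and low MAPQ."""
    if sam_line.startswith("@"):
        return True
    fields = sam_line.rstrip("\n").split("\t")
    mapq = int(fields[4])   # column 5 of a SAM record
    cigar = fields[5]       # column 6 of a SAM record
    if cigar == "*":
        return True  # no CIGAR available, so nothing to judge
    return not (max_intron(cigar) > max_intron_len and mapq < min_mapq)


def filter_sam(lines, **thresholds):
    """Return the SAM lines that pass keep_alignment, in their original order."""
    return [line for line in lines if keep_alignment(line, **thresholds)]
```

The input would be SAM text including headers (for example as produced by `samtools view -h`), and the filtered output would need to be converted back to sorted BAM before rerunning StringTie. Note that this treats reads one at a time, so one mate of a pair can be removed while the other is kept.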

mrijnkels · 2019-06-25T20:33:52Z

So, any suggestions on how to set up a better, more stringent alignment strategy?

m-waqas mentioned this issue Apr 30, 2020

Same gene id for multiple genes #270

Open
