Multiple gene names mapping to a single gene ID #217
Comments
Hi, we have the same issue. We see it both for genes that are next to each other and for genes that are several hundred kb apart.
This is a difficult issue to solve within StringTie, which makes assembly decisions based primarily on the read alignment data. Reference annotation is often imperfect and incomplete, so to allow for the discovery of novel isoforms, StringTie always uses the read alignments as the basis of transcript assembly. Unfortunately, read alignments can also be wrong or imperfect and may actually "bridge" neighboring genes, as seems to be the case in the situations you are reporting here. Using a better or more stringent read alignment strategy may help with this problem. Alternatively, some post-alignment filtering can be applied to the alignment data to eliminate large, low-scoring alignments that seem to spuriously "connect" neighboring genes.
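As a rough illustration of the post-alignment filtering idea above, here is a minimal Python sketch that works on plain-text SAM records. It is not part of StringTie. The function names, the use of MAPQ as a stand-in for "low scoring", the use of the longest `N` (skipped region) CIGAR operation as a stand-in for "large", and both thresholds are assumptions chosen for illustration only; suitable values depend on the aligner and the genome.

```python
import re

# Matches one CIGAR operation, e.g. "50M" or "200000N".
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def max_intron_length(cigar):
    """Length of the longest N (skipped region) operation, 0 if none."""
    return max((int(n) for n, op in CIGAR_RE.findall(cigar) if op == "N"),
               default=0)


def keep_alignment(sam_line, min_mapq=20, max_intron=100_000):
    """Return False only for records that are both 'large' and 'low scoring'.

    Assumed (hypothetical) criteria: a spliced gap longer than max_intron
    combined with MAPQ below min_mapq.
    """
    fields = sam_line.rstrip("\n").split("\t")
    mapq = int(fields[4])   # column 5 of a SAM record is MAPQ
    cigar = fields[5]       # column 6 of a SAM record is CIGAR
    too_long = max_intron_length(cigar) > max_intron
    return not (too_long and mapq < min_mapq)


def filter_sam(lines, **thresholds):
    """Yield header lines unchanged and only the alignments that pass."""
    for line in lines:
        if line.startswith("@") or keep_alignment(line, **thresholds):
            yield line
```

The generator can be fed the lines of an uncompressed SAM file and its output written to a new file, which would then be converted and sorted before assembly.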
So, any suggestions on how to set up a better, more stringent alignment strategy?
Hi,
I ran the genome-guided assembly on my data following the exact steps in the Nature Protocols paper, and I often get multiple gene_names per single gene_id in the merged .gtf file. Is this common? Surely this is incorrect, because the tool has essentially merged several genes together to create its own gene.
Thanks,
Chris
chr20 StringTie exon 63734154 63734824 1000 + . gene_id "MSTRG.40027"; transcript_id "ENST00000484569.1"; exon_number "1"; gene_name "ZGPAT"; ref_gene_id "ENSG00000197114.11";
chr20 StringTie exon 63735159 63735236 1000 + . gene_id "MSTRG.40027"; transcript_id "ENST00000484569.1"; exon_number "2"; gene_name "ZGPAT"; ref_gene_id "ENSG00000197114.11";
chr20 StringTie transcript 63735463 63738441 1000 + . gene_id "MSTRG.40027"; transcript_id "ENST00000496820.2"; gene_name "RP4-583P15.15"; ref_gene_id "ENSG00000273154.3";
chr20 StringTie exon 63735463 63735564 1000 + . gene_id "MSTRG.40027"; transcript_id "ENST00000496820.2"; exon_number "1"; gene_name "RP4-583P15.15"; ref_gene_id "ENSG00000273154.3";
chr20 StringTie exon 63737845 63737902 1000 + . gene_id "MSTRG.40027"; transcript_id "ENST00000496820.2"; exon_number "2"; gene_name "RP4-583P15.15"; ref_gene_id "ENSG00000273154.3";
chr20 StringTie exon 63737973 63738060 1000 + . gene_id "MSTRG.40027"; transcript_id "ENST00000496820.2"; exon_number "3"; gene_name "RP4-583P15.15"; ref_gene_id "ENSG00000273154.3";
chr20 StringTie exon 63738183 63738441 1000 + . gene_id "MSTRG.40027"; transcript_id "ENST00000496820.2"; exon_number "4"; gene_name "RP4-583P15.15"; ref_gene_id "ENSG00000273154.3";
chr20 StringTie transcript 63736283 63738234 1000 + . gene_id "MSTRG.40027"; transcript_id "ENST00000444951.5"; gene_name "LIME1"; ref_gene_id "ENSG00000203896.9";
chr20 StringTie exon 63736283 63736396 1000 + . gene_id "MSTRG.40027"; transcript_id "ENST00000444951.5"; exon_number "1"; gene_name "LIME1"; ref_gene_id "ENSG00000203896.9";
chr20 StringTie exon 63737533 63737647 1000 + . gene_id "MSTRG.40027"; transcript_id "ENST00000444951.5"; exon_number "2"; gene_name "LIME1"; ref_gene_id "ENSG00000203896.9";
chr20 StringTie exon 63737821 63737902 1000 + . gene_id "MSTRG.40027"; transcript_id "ENST00000444951.5"; exon_number "3"; gene_name "LIME1"; ref_gene_id "ENSG00000203896.9";
chr20 StringTie exon 63737973 63738060 1000 + . gene_id "MSTRG.40027"; transcript_id "ENST00000444951.5"; exon_number "4"; gene_name "LIME1"; ref_gene_id "ENSG00000203896.9";
chr20 StringTie exon 63738183 63738234 1000 + . gene_id "MSTRG.40027"; transcript_id "ENST00000444951.5"; exon_number "5"; gene_name "LIME1"; ref_gene_id "ENSG00000203896.9";
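To see how widespread this is in a merged annotation, one option is to group `gene_name` values by `gene_id`. The sketch below is a minimal, assumption-laden example: the function names are hypothetical, and it expects standard tab-separated GTF lines with attributes in the `key "value";` form shown above.

```python
import re
from collections import defaultdict

# Matches GTF attributes of the form: key "value"
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def gene_names_by_gene_id(gtf_lines):
    """Map each gene_id to the set of gene_name values seen with it."""
    names = defaultdict(set)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        attrs = dict(ATTR_RE.findall(fields[8]))
        if "gene_id" in attrs and "gene_name" in attrs:
            names[attrs["gene_id"]].add(attrs["gene_name"])
    return names


def find_merged_loci(gtf_lines):
    """Return only the gene_ids that carry more than one gene_name."""
    return {
        gene_id: sorted(gene_names)
        for gene_id, gene_names in gene_names_by_gene_id(gtf_lines).items()
        if len(gene_names) > 1
    }
```

Passing an open file handle for the merged .gtf to `find_merged_loci` returns a dictionary of the affected loci; for the excerpt above, `MSTRG.40027` would be listed with `LIME1`, `RP4-583P15.15`, and `ZGPAT`.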