stringtie merge merging multiple real genes #190

jokelley · 2018-07-28T16:40:59Z

I am using stringtie merge to find additional genes for a species. However, when I run the reference-guided merge, the program combines real genes that are close together (two examples below). Is there a way to force the merge to keep the existing reference genes and annotate only the new genes? Or is there another program that would do this? I need to keep the existing gene set while also identifying possible novel genes.

Kennyluo4 · 2019-08-02T21:25:52Z

I have the same issue. Have you figured out how to solve it? I tried using the -g parameter to limit the gap for merging transcripts, but it didn't work. It's impossible to continue with DEG and other downstream analyses when a "gene" includes several reference genes. I looked into one assembled "gene" that contains 5 reference genes. There is no overlap between these genes, and some are separated by gaps of 10 kb. Yet they are merged together as a new "gene".
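To see how widespread the problem is, one option is to list every merged gene that absorbed more than one reference gene. The following is a minimal sketch, not part of StringTie: it assumes the merged GTF carries a `ref_gene_id` attribute on reference-derived transcript lines (check your own file first), and the function names `parse_attributes` and `merged_loci` are hypothetical.

```python
import re
from collections import defaultdict

# Matches GTF attribute pairs of the form: key "value"
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(field):
    """Turn the 9th GTF column into a dict of attribute name -> value."""
    return dict(ATTR_RE.findall(field))


def merged_loci(gtf_lines, min_ref_genes=2):
    """Return {gene_id: [ref_gene_id, ...]} for genes that contain
    at least `min_ref_genes` distinct reference genes.

    Assumption: reference-derived transcripts carry a ref_gene_id attribute.
    """
    refs = defaultdict(set)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "transcript":
            continue  # only transcript records are needed
        attrs = parse_attributes(cols[8])
        gene_id = attrs.get("gene_id")
        ref_gene = attrs.get("ref_gene_id")
        if gene_id and ref_gene:
            refs[gene_id].add(ref_gene)
    return {g: sorted(r) for g, r in refs.items() if len(r) >= min_ref_genes}


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python merged_loci.py merged.gtf
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as handle:
            for gene, ref_genes in merged_loci(handle).items():
                print(gene, ",".join(ref_genes), sep="\t")
```

The output is one line per affected merged gene with the reference genes it contains, which gives a list of loci to inspect by hand.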

jokelley · 2019-08-02T22:04:17Z

I was not able to solve this within stringtie. Perhaps there have been updates? Stringtie developers, any insight?


Kennyluo4 · 2019-08-05T16:28:17Z


So, did you use another assembler for your study? I saw some suggestions to use the simplified protocol, which skips novel transcripts and the merging step, or to make changes in the alignment step. But I don't want to adjust the alignment settings, because that takes too much time. I also don't want to lose novel transcripts by using the simplified method, because I'm trying to identify lncRNAs and study alternative splicing.
I double-checked the genes that incorporate multiple reference genes. They are merged because some novel transcripts/isoforms span across the reference genes. That's why they are merged even though the reference genes are distant from each other. I tried using more stringent parameters for assembly, e.g. increasing the values of -f (minimum isoform fraction), -j (minimum junction coverage) and -c (minimum coverage for predicted transcripts). The result is better: some merged reference genes are now separated, but others are still merged together.
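The check described above, finding novel transcripts that bridge reference genes, can be sketched with a simple interval-overlap test. This is an illustration only, under the assumption that transcript and reference gene coordinates have already been extracted from the GTF files; `Interval`, `overlaps` and `spanning_transcripts` are hypothetical names, not StringTie functions.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    strand: str
    start: int  # 1-based, inclusive (GTF convention)
    end: int    # inclusive
    name: str


def overlaps(a, b):
    """True if a and b share at least one base on the same chromosome and strand.

    Note: unstranded transcripts (strand '.') will not match stranded genes here.
    """
    return (
        a.chrom == b.chrom
        and a.strand == b.strand
        and a.start <= b.end
        and b.start <= a.end
    )


def spanning_transcripts(transcripts, ref_genes, min_genes=2):
    """Return {transcript_name: [gene names]} for transcripts that overlap
    at least `min_genes` distinct reference genes."""
    hits = {}
    for tx in transcripts:
        genes = sorted({g.name for g in ref_genes if overlaps(tx, g)})
        if len(genes) >= min_genes:
            hits[tx.name] = genes
    return hits


# Toy example: two reference genes 10 kb apart and one read-through transcript.
ref_genes = [
    Interval("chr1", "+", 1000, 2000, "geneA"),
    Interval("chr1", "+", 12000, 13000, "geneB"),
]
transcripts = [
    Interval("chr1", "+", 1500, 12500, "MSTRG.1.3"),  # bridges geneA and geneB
    Interval("chr1", "+", 1000, 1900, "MSTRG.1.1"),   # stays within geneA
]
print(spanning_transcripts(transcripts, ref_genes))
```

The pairwise comparison is quadratic, which is fine for inspecting a handful of loci; a genome-wide scan would be better served by sorting intervals or using an interval tree. Transcripts flagged this way are the candidates to examine (or filter out) before re-running the merge.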

angarb · 2020-04-03T17:40:24Z

Hi, @Kennyluo4 @jokelley @gpertea!

Was a solution ever determined? We are having the same problem! We would love to incorporate novel splice sites into our analysis, but removing the -e option seems to result in these long merged transcripts spanning multiple genes.

Thanks for any input!

jokelley · 2020-04-03T23:35:54Z

This was never solved in Stringtie. We modified code and came up with a home-grown solution.


Kennyluo4 · 2020-04-04T15:55:45Z

@angarb, you can only alleviate it by adjusting the assembly parameters, or by increasing the alignment stringency to improve alignment accuracy. If there really are many read pairs linking the two "gene models", you should probably trust your data. Genome annotations are not perfect, and the evidence used to build them is incomplete. There is a chance that the transcripts assembled from your specific tissue or cell line are real even though they differ from the annotation. If you really trust the annotation and the alignment is very good, this issue may be related to alternative splicing or gene fusion events.
You can try other assemblers such as Trinity for the analysis.

m-waqas mentioned this issue Apr 30, 2020

Same gene id for multiple genes #270

Open
