Strawberry does not output any transcripts for our BAM file #21

Closed
lauraht opened this issue Jul 29, 2018 · 16 comments

lauraht commented Jul 29, 2018

Hello!

I tried to run Strawberry in the default mode (no -g option) with default parameters on an alignment BAM file generated by minimap2. The BAM file was sorted and contains single-end reads from a human sample. The Strawberry output GTF file does not contain any transcripts (it is practically empty, containing only a comment line). There was no error message.

Based on the warnings in execution.log (such as unreasonable intron sizes either too long or too short, and filtering intron with only 1 read support), I added the following options:

--min-support-4-intron 1.0 -J 1000000 -j 20

But the Strawberry output GTF file still does not contain any transcripts, and there was still no error message.
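
For readers following along, the kind of filtering those warnings describe can be sketched as follows. This is a minimal illustration only, not Strawberry's actual implementation: the function names and the tuple-based input are hypothetical, and the mapping of `-j`/`-J` to minimum/maximum intron length is inferred from the warnings quoted above rather than confirmed by the thread.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an operation code.
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def introns_from_cigar(pos, cigar):
    """Yield 1-based inclusive (start, end) intervals for each N operation."""
    ref = pos
    for length, op in _CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "N":
            yield (ref, ref + n - 1)
        if op in "MDN=X":  # operations that consume reference bases
            ref += n

def supported_introns(alignments, min_support=1, min_len=20, max_len=1_000_000):
    """Count reads per intron and keep those within the support and size bounds.

    `alignments` is an iterable of (chrom, pos, cigar) tuples (hypothetical input shape).
    """
    counts = Counter()
    for chrom, pos, cigar in alignments:
        for start, end in introns_from_cigar(pos, cigar):
            counts[(chrom, start, end)] += 1
    return {
        key: n
        for key, n in counts.items()
        if n >= min_support and min_len <= key[2] - key[1] + 1 <= max_len
    }
```

With long reads, many introns are covered by a single read, which is why a support threshold above 1 can discard most of them.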

Does this mean Strawberry does not work with alignment BAM files generated by minimap2? What might be the reason it is not working?

For strand information, minimap2 uses ts tags instead of XS tags. So we added XS tags in the minimap2 generated BAM file, so that our BAM file contains XS tags to indicate the strand properly. I'd assume Strawberry uses XS tags in the BAM file to identify the strand, right?
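
A minimal sketch of such a ts-to-XS conversion, operating on plain SAM text lines, might look like this. The function name is hypothetical and this is not the script used in the thread. It also assumes that minimap2 reports `ts` relative to the read's orientation, so the sign is flipped for reverse-strand alignments (FLAG bit 0x10); that assumption is worth verifying against the minimap2 documentation before relying on it.

```python
def add_xs_from_ts(sam_line: str) -> str:
    """Return the SAM record with an XS:A tag derived from its ts:A tag.

    Header lines, records that already carry XS, and records without ts
    are returned unchanged (minus any trailing newline).
    """
    line = sam_line.rstrip("\n")
    if line.startswith("@"):
        return line
    fields = line.split("\t")
    flag = int(fields[1])
    tags = fields[11:]  # optional fields start at column 12
    if any(tag.startswith("XS:A:") for tag in tags):
        return line
    for tag in tags:
        if tag.startswith("ts:A:"):
            strand = tag[5:]
            # Assumption: ts is relative to the read, so flip on reverse-strand hits.
            if flag & 0x10:
                strand = "-" if strand == "+" else "+"
            return line + "\tXS:A:" + strand
    return line
```

One way to use it would be to stream `samtools view -h` output through this function line by line and pipe the result back into `samtools view -b`.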

I’d appreciate your advice very much.

I am also uploading tracking.log and execution.log here.

Thank you very much!

execution.log
tracking.log

Owner

ruolin commented Jul 30, 2018

@lauraht Thanks for reporting this problem. Currently Strawberry does not support single-end reads for de novo assembly. I am working on extending it to single-end reads right now.

Author

lauraht commented Jul 31, 2018

Hi Ruolin,

Thank you so much for your response!

Great to know you are working on extending Strawberry to support single-end reads.

I was wondering whether you could give a rough idea of when this might become available? I’d greatly appreciate it.

Thank you very much!

Owner

ruolin commented Jul 31, 2018

@lauraht Hi Lauraht. Thank you very much for your interest in Strawberry. I will try my best to make it available in a week.

Author

lauraht commented Aug 1, 2018

Thanks a lot Ruolin!

Owner

ruolin commented Aug 3, 2018

@lauraht Actually, I just used HISAT2 to align a single-end RNA-seq dataset, and Strawberry works fine. Do you mind sharing your data? I probably just need the first few thousand lines of your BAM file.

Author

lauraht commented Aug 3, 2018

Hi Ruolin,

Here is a subset (the first few thousand lines) of our BAM file:
sub_test.bam.gz

Thanks!

Owner

ruolin commented Aug 3, 2018

@lauraht Thanks, I can reproduce your errors now. Are these PacBio CCS reads?

Author

lauraht commented Aug 6, 2018

Yes. Thanks!

Owner

ruolin commented Aug 8, 2018

@lauraht Hi, the problem is not actually caused by the reads being single-end. Fixing it requires extending Strawberry to third-generation sequencing data. I am working on it right now, but it might take a little longer.

Owner

ruolin commented Aug 26, 2018

Hi @lauraht, sorry about the delay. I have now added PacBio CCS read support. Please check out https://github.com/ruolin/strawberry/releases/tag/1.0.1

Author

lauraht commented Aug 26, 2018

Thank you very much Ruolin!

I will give it a try then.

Owner

ruolin commented Aug 26, 2018

Sure. Please let me know whether it works or not.

Author

lauraht commented Sep 2, 2018

It works! Thanks Ruolin!

For PacBio CCS reads, should we use "--min-support-4-intron 1.0" when running Strawberry?

Owner

ruolin commented Sep 2, 2018

I think I would actually use the default setting. PacBio CCS is still less accurate than Illumina, so the spliced alignments might not be good. However, if you want supersensitive detection of transcripts, you can try 1. Based on the small test set, the default seems good to me. Do you have some ground truth so you can benchmark?

Author

lauraht commented Sep 3, 2018

Thanks.
I don't really have ground truth. But by matching against transcripts in the human reference annotation, I found that with "--min-support-4-intron 1.0" Strawberry assembles more correct transcripts (ones that match reference transcripts) than with the default setting.
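
One common way to do this kind of matching is to compare intron chains between the assembled GTF and the reference GTF. The sketch below is an assumption about how such a comparison could be done, not the procedure used in this thread. The function names are hypothetical, single-exon transcripts are skipped, and a "match" means an identical chromosome, strand, and set of intron coordinates.

```python
import re
from collections import defaultdict

_TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')

def intron_chains(gtf_lines):
    """Map transcript_id -> (chrom, strand, intron tuple) from GTF exon records."""
    exons = defaultdict(list)
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = _TRANSCRIPT_ID.search(fields[8])
        if match:
            key = (fields[0], fields[6], match.group(1))
            exons[key].append((int(fields[3]), int(fields[4])))
    chains = {}
    for (chrom, strand, tid), spans in exons.items():
        spans.sort()
        # An intron spans the gap between consecutive exons.
        introns = tuple(
            (left_end + 1, right_start - 1)
            for (_, left_end), (right_start, _) in zip(spans, spans[1:])
        )
        if introns:  # skip single-exon transcripts
            chains[tid] = (chrom, strand, introns)
    return chains

def count_reference_matches(assembled_lines, reference_lines):
    """Count assembled transcripts whose intron chain equals a reference chain."""
    reference = set(intron_chains(reference_lines).values())
    assembled = intron_chains(assembled_lines)
    return sum(1 for chain in assembled.values() if chain in reference)
```

Running it once per parameter setting against the same reference annotation gives comparable match counts. Dedicated tools report more detailed sensitivity and precision figures.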

Owner

ruolin commented Sep 14, 2018

Thanks. That is good to know. I will close this issue for now.

ruolin closed this as completed Sep 14, 2018