Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missing genes in allGenes.results.txt #37

Open
sagarutturkar opened this issue Sep 12, 2018 · 4 comments
Open

Missing genes in allGenes.results.txt #37

sagarutturkar opened this issue Sep 12, 2018 · 4 comments

Comments

@sagarutturkar
Copy link

sagarutturkar commented Sep 12, 2018

I followed QoRTs "Example Walkthrough" to run JunctionSeq and got the results. However,
I noticed the number of aggregate genes in GFF vs allGenesresults file are different.

GFF file:
grep -c "aggregate_gene" withNovel.forJunctionSeq.gff = 54,720
Results file:
cut -f 2 allGenes.results.txt | sort | uniq | wc -l = 52447

The numbers does not match. Is there any default filtering applied in the writeCompleteResults method? This matters to me because a specific gene of interest which found to be DE (following a custom HtSeq-DESeq2 pipeline) was excluded from the final results derived from JunctionSeq.

@hartleys
Copy link
Owner

Can you send the exact commands run using JunctionSeq, and the log output it printed to the console?

Did you use the "use.multigene.aggregates" parameter? The default is FALSE.

@sagarutturkar
Copy link
Author

Thanks. I was able to get the information for all genes after using the parameter use.multigene.aggregates.

@MWSchmid
Copy link

Hi Steve

I recently processed some data with reads of 36 bp size and around 33% of all features were multigene aggregates (RNA-seq atlas for different mouse tissues). The problem is that a gene is classified entirely as multigene aggregate even if only one small exon is shared with another gene. Having also the information of the splice junction, I think that it's quite a loss to remove the entire gene because of a single exon.

I think that it would be better to label only the features that are really affected by multiple genes as multigene aggregates (instead of all features of a gene that overlaps at a small spot with another gene). Or to set a different default for use.multigene.aggregates (or at least make it more clear in the JunctionSeq user guide, at the moment it's just used in one code example but without further explanation).

Nice tools otherwise.

Best regards,

Marc

Besides - with the same annotation but different data (read length of 125 bp and different cell type) around 7.5 % are multigene aggregates. Is this difference due to the read length or due to the different cell/tissue type?

@scseekers
Copy link

@MWSchmid

Yes in DEXSeq the exons shared between two genes are removed rather than discarding the entire gene. However, the Junctionseq seems to throw away such genes which are kinda unfair.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants