plotEnrichment or a similar tool #329

Closed
dpryan79 opened this issue Apr 4, 2016 · 5 comments

dpryan79 commented Apr 4, 2016

It's been requested internally that we have a tool that can produce things like (A) FRiP (fraction of reads in peaks) scores or (B) histograms of reads per feature type (exons, UTRs, etc.). Such a tool might be called plotEnrichment. The output would be an image and, optionally, a table of values. The input would be a BED or GTF file.

One difficulty is that deeptoolsintervals currently stores only a single feature type. libGTF, on which it's based, can store everything, so this shouldn't be a real problem. What I expect I'll do is implement a slightly tweaked GTF/BED parser just for this tool that ends up using the same C code. Then we'll use mapReduce (possibly in a slightly modified form) to chop the genome into bins and iterate over the reads in each bin, after first checking that there's at least one GTF/BED feature in that region.

I guess there's a reason that I implemented counting overlap sets in libGTF...
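
To make the bin-and-count idea above concrete, here is a minimal pure-Python sketch. It is not the deepTools implementation and uses neither mapReduce nor libGTF. The function names, the tuple layout for reads and features, and the choice to assign each read to the bin containing its start are all assumptions made for illustration. The "is there any feature here?" check is done against the span of the reads in the bin, so that reads crossing a bin boundary are not lost.

```python
from collections import Counter


def overlaps(a_start, a_end, b_start, b_end):
    """Half-open intervals [start, end) overlap if each starts before the other ends."""
    return a_start < b_end and b_start < a_end


def enrichment(reads, features, chrom_len, bin_size):
    """Fraction of reads overlapping each feature type on one chromosome.

    reads:    list of (start, end) tuples (hypothetical layout)
    features: list of (start, end, feature_type) tuples (hypothetical layout)
    """
    total = len(reads)
    counts = Counter()
    for bin_start in range(0, chrom_len, bin_size):
        bin_end = bin_start + bin_size
        # Assign each read to exactly one bin (by its start) so it is counted once.
        in_bin = [r for r in reads if bin_start <= r[0] < bin_end]
        if not in_bin:
            continue
        # Skip the bin early if no feature can overlap any of its reads.
        span_start = min(r[0] for r in in_bin)
        span_end = max(r[1] for r in in_bin)
        nearby = [f for f in features if overlaps(span_start, span_end, f[0], f[1])]
        if not nearby:
            continue
        for r_start, r_end in in_bin:
            # A read counts once per feature type, even if it hits several features.
            hit_types = {
                ftype
                for f_start, f_end, ftype in nearby
                if overlaps(r_start, r_end, f_start, f_end)
            }
            counts.update(hit_types)
    return {ftype: n / total for ftype, n in counts.items()} if total else {}


reads = [(0, 50), (90, 140), (300, 350), (500, 550)]
features = [(100, 200, "exon"), (320, 400, "peak")]
fractions = enrichment(reads, features, chrom_len=1000, bin_size=100)
```

With a single feature type called "peak", the returned fraction is the FRiP score; with GTF-style feature types, the dictionary is the data behind a reads-per-feature histogram.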

dpryan79 self-assigned this Apr 4, 2016

dpryan79 commented Apr 6, 2016

There's now a feature/plotEnrichment branch where this is largely implemented (though it's a definite work in progress) via the command plotEnrichment. The image output is as below:

[Image: enrichment example]

Peaks are any regions in a BED file, whereas for GTF files the feature column is used. If people want introns, I would suggest that they annotate them (or just roughly estimate them as the difference between gene and exon, though the real value will be higher).
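
For anyone who wants to annotate introns themselves, one rough way is to take the gaps between a transcript's exons. The sketch below is only an illustration of that idea: the function names and the half-open (start, end) tuple layout are assumptions, and it is not part of plotEnrichment.

```python
def introns_from_exons(exons):
    """Return the gaps between exons of one transcript.

    exons: list of half-open (start, end) tuples, in any order.
    Overlapping or adjacent exons are treated as one block, so no
    zero-length or negative introns are produced.
    """
    introns = []
    prev_end = None
    for start, end in sorted(exons):
        if prev_end is not None and start > prev_end:
            introns.append((prev_end, start))
        prev_end = end if prev_end is None else max(prev_end, end)
    return introns


def to_bed_lines(chrom, intervals, name="intron"):
    """Format intervals as minimal 4-column BED lines (chrom, start, end, name)."""
    return [f"{chrom}\t{start}\t{end}\t{name}" for start, end in intervals]


exons = [(100, 200), (300, 400), (350, 450), (600, 700)]
introns = introns_from_exons(exons)
bed_lines = to_bed_lines("chr1", introns)
```

The resulting lines can be written to a BED file and passed in alongside the other region files.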

dpryan79 commented Apr 6, 2016

@thomasmanke, since you were asking about this.

dpryan79 commented Apr 7, 2016

As requested, BED files now get individual labels (either the file name or whatever you specify with --regionLabels). GTF files will always use the feature column and ignore --regionLabels. Here are two BED files (genes19.bed and genesX.bed) with new labels:

[Image: enrichment example]
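
The labelling rule described above can be sketched as follows. This is an illustration only, not the plotEnrichment source: both function names are hypothetical, and using the base name of the path as the default label is an assumption about what "the file name" means.

```python
import os


def bed_labels(bed_files, region_labels=None):
    """One label per BED file: user-supplied labels if given, else the file name."""
    if region_labels is None:
        # Assumption: the default label is the base name of the path.
        return [os.path.basename(path) for path in bed_files]
    if len(region_labels) != len(bed_files):
        raise ValueError("need exactly one label per BED file")
    return list(region_labels)


def gtf_label(gtf_line):
    """GTF records are labelled by their feature column (column 3)."""
    return gtf_line.rstrip("\n").split("\t")[2]


default_labels = bed_labels(["data/genes19.bed", "genesX.bed"])
custom_labels = bed_labels(["genes19.bed", "genesX.bed"], ["chr19 genes", "chrX genes"])
feature = gtf_label('chr1\tsource\texon\t1\t100\t.\t+\t.\tgene_id "g1";\n')
```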

dpryan79 commented Apr 17, 2016

I've merged the branch containing this into develop. I'll add the Galaxy wrapper on Monday and then close the issue.

dpryan79 added this to the 2.3 milestone Apr 17, 2016

dpryan79 commented

The wrapper is now included (and being tested by planemo).
