computeGCBias is really slow in deepTools2 compared to deepTools 1.5 #383
Is this with python2, python3, or both?
Eeek
I suspect that the 'chunksize' that is given to map_reduce is too small.
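To illustrate the chunksize suspicion: with `multiprocessing.Pool.map`, a tiny chunksize means one inter-process round-trip per task, and that overhead can dwarf the actual work. A minimal sketch, assuming a toy worker function — `pick_chunksize` and `gc_count` are hypothetical names, not deepTools code:

```python
from multiprocessing import Pool

def pick_chunksize(n_tasks, processes, factor=4):
    # Hypothetical heuristic: aim for a few chunks per worker, so scheduling
    # stays balanced without paying one IPC round-trip per task.
    return max(1, n_tasks // (processes * factor))

def gc_count(seq):
    # Toy stand-in for a per-region worker function.
    return seq.count("G") + seq.count("C")

if __name__ == "__main__":
    regions = ["ACGT" * 100] * 1000
    with Pool(4) as pool:
        counts = pool.map(gc_count, regions,
                          chunksize=pick_chunksize(len(regions), 4))
    print(sum(counts))  # 200 GC bases per region x 1000 regions -> 200000
```

With chunksize left at its default, each of the 1000 regions would be shipped to a worker individually; batching them into ~62-element chunks amortizes the pickling and pipe traffic.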
That could be, the other big change is the switch from bx-python to twobitreader for the 2bit stuff. They're both python, but perhaps the latter isn't as performant.
I ran a test with 11M reads and default parameters that took only 3 minutes. @asrichter, are you using a genome assembly with a lot of contigs? I wonder if this is related to #385
Yes, I did. Please have a look at the human genome assembly hs37d5.
I had the same issue with many contigs in plotEnrichment. The fix that @dpryan79 applied there earlier should also work for computeGCBias, I think. More generally, though, we should have a generic solution for handling multiprocessing with many contigs.
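One generic way to handle the many-contigs case is to bin contigs into a fixed number of tasks balanced by total length, so thousands of tiny scaffolds don't each become a separate multiprocessing job. A sketch under that assumption — `group_contigs` is a hypothetical helper, not the actual deepTools fix:

```python
def group_contigs(contigs, n_groups):
    """Bin (name, length) pairs into n_groups tasks balanced by total length.

    Greedy longest-first binning: sort contigs by decreasing length, then
    always place the next contig into the currently lightest bin.
    """
    bins = [[] for _ in range(n_groups)]
    totals = [0] * n_groups
    for name, length in sorted(contigs, key=lambda c: -c[1]):
        i = totals.index(min(totals))  # lightest bin so far
        bins[i].append(name)
        totals[i] += length
    return bins
```

Each worker then processes one bin of contigs, so the number of tasks is tied to the number of CPUs rather than to the number of sequences in the assembly (hs37d5 carries thousands of decoy/scaffold sequences).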
@vivekbhr Unlikely, though perhaps. I think the GC bias stuff is the only code not using the unified mapReduce module, so it won't benefit from that fix. I need to run some tests with this and also with the python3 versions of the tools to see where the slowdowns are happening (in the case of python3 there may be nothing we can do, since the language itself is simply slower than python2).
I've started profiling 1.5.11 and 2.3.2 under python2 with the Input.bam file in the test dataset. I'll post more when that's done (it takes a bit of time when run single-threaded...).
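For anyone wanting to reproduce this kind of profiling, the standard-library route is `cProfile` plus `pstats` sorted by cumulative time. A minimal sketch with a toy hotspot standing in for the per-contig work:

```python
import cProfile
import io
import pstats

def hotspot():
    # Toy stand-in for the per-contig work being profiled.
    return sum(i * i for i in range(10000))

def profile_top(n=5):
    """Profile 100 calls to hotspot() and return the top-n cumulative report."""
    pr = cProfile.Profile()
    pr.enable()
    for _ in range(100):
        hotspot()
    pr.disable()
    buf = io.StringIO()
    pstats.Stats(pr, stream=buf).sort_stats("cumulative").print_stats(n)
    return buf.getvalue()

report = profile_top()
print(report)
```

The "cumulative" sort is what surfaces a slow dependency (e.g. a 2bit reader) even when the time is spread over many small calls underneath one entry point.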
cProfile says that the slowdown is from Under python3 this is a quick operation, but then
I have a sneaking suspicion that this is the twobitreader. It's written in pure python I think, so it's the likely culprit. Performance was acceptable through 2.2.4 and then tanked in 2.3.0 when we made the switch. bx-python seems to be getting converted to python3 (it's needed for Galaxy), so perhaps we can switch back to it for 2bit stuff.
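A rough illustration of why a pure-python 2bit decoder is slow: the UCSC 2bit format packs four bases per byte (T=00, C=01, A=10, G=11), so a naive decoder performs several python-level operations per base. A 256-entry lookup table cuts that to one list access per byte, and a C extension removes the python loop entirely. This is a sketch of the decoding step only, not twobitreader's actual code:

```python
_BASES = "TCAG"  # UCSC 2bit encoding order: T=0, C=1, A=2, G=3

def decode_naive(data):
    # Two shifts, a mask, and an index per base -- all in python bytecode.
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(_BASES[(byte >> shift) & 3])
    return "".join(out)

# Precompute the 4-base string for every possible byte value.
_TABLE = ["".join(_BASES[(b >> s) & 3] for s in (6, 4, 2, 0))
          for b in range(256)]

def decode_lut(data):
    # One table lookup per byte instead of eight bit operations.
    return "".join(_TABLE[b] for b in data)
```

Even the lookup-table version still pays python interpreter overhead per byte, which is why a compiled reader (as py2bit later provided) wins on whole-genome workloads.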
Grr, the python3-compatible version of bx-python that's in development won't install on the central servers due to having some unusual dependencies. I'll just write a freaking 2bit file reader for python that's fast and can be installed everywhere :( |
Update: the C library is written and seems to work properly. I'll work on it a bit more today and try to get the python wrapper done today/tomorrow. |
Update: The python module is now written and seems to work. I'll do some performance comparisons to see if it resolves the problems (that will also debug things). |
Well, that solves that issue. We can switch to py2bit for the 2.4 release. If someone really needs this earlier, they can either switch to a version before 2.3 or use the develop branch once I implement and merge the feature branch containing this. I'll close this issue after having done so.
That feature branch is now merged, closing this issue. I have a couple performance tests to do with py2bit (namely, whether it really makes sense to |
computeGCBias is VERY SLOW in the current deepTools 2 develop version. Even on small input BAM files with only thousands of alignments, it runs for hours. When using 16 or more CPUs, for example, each thread sits at only about 5-10% CPU usage, even on directly attached storage without an NFS bottleneck. The problem occurs under both python 2 and 3.
In deepTools 1.5.9.1, it used to be much faster.
Any ideas which change could have caused the performance loss?