compress/bzip2: Slow performance #6754
Sorry for the long delay. Work got in the way of writing a comparison between Go and Python. I have two test scenarios. The first is a 1GB file of data from /dev/zero, bzip2 compressed. The second is a 1GB file of data from /dev/urandom, also bzip2 compressed. The first should be a best-case performance scenario, since all of the data is RLE encoded and the compressed file is a few hundred bytes. The second should be a worst-case scenario, where the data is generally not compressible and the compressed file is larger than the source.

Results:

Decompressing /home/tcameron/tmp/decompress/zeros.data.bz2
Go 1.1 decompress time: 3.212 sec
Py 2.7 decompress time: 3.070 sec

Decompressing /home/tcameron/tmp/decompress/random.data.bz2
Go 1.1 decompress time: 528.765 sec
Py 2.7 decompress time: 104.724 sec

Let's call the zeros.data.bz2 test even. Milliseconds for this file do not really interest me. It is worth noting that Python's version is faster, but by less than a quarter of a second. This could be down to lots of things, and I'm not necessarily interested in tracking them down.

The random.data.bz2 test is much more enlightening. Slower by a factor of >5 is surprising to me, and it equates to roughly 1.9MB/sec. I understand there hasn't been much effort to optimize the bzip2 library for speed, so I figured my real-world experience could be used to help the project in some way.

My actual use case is a syslog file parser, which I've been writing to replace a Python script I previously wrote and to drive the lessons of Go into my brain. I see very similar results with text file processing, but since I cannot offer the text files themselves for others to test with, I've tried something a bit more reproducible.

These tests were performed on a Lenovo T430 with an SSD, an Intel Core i5-3320M CPU @ 2.60GHz, and 8GB RAM, plugged into an AC power source. The operating system is Ubuntu 13.10 with kernel 3.11.0-13-generic, x86_64 architecture.
To review the source of each test application, please see my GitHub repos: https://github.com/tomc603/pycompresstest and https://github.com/tomc603/gocompresstest
After running the same tests with Go 1.2rc5 a couple of times just to confirm I'm not crazy (still a possibility, though), it seems RLE-heavy data actually decompresses twice as slowly as under Go 1.1. For these particular tests I'm not seeing a 30% increase in speed, though I'm exercising the two most extreme cases.

Results:

Decompressing /home/tcameron/tmp/decompress/zeros.data.bz2
Go 1.1 decompress time: 3.000 sec
Go 1.2rc5 decompress time: 6.612 sec

Decompressing /home/tcameron/tmp/decompress/random.data.bz2
Go 1.1 decompress time: 534.020 sec
Go 1.2rc5 decompress time: 499.078 sec
CL https://golang.org/cl/131840043 mentions this issue.
CL https://golang.org/cl/131470043 mentions this issue.
CL https://golang.org/cl/13852 mentions this issue.
CL https://golang.org/cl/13853 mentions this issue.