Question: Is this package faster than DataDog/zstd now? #463
Comments
I wouldn't expect it to be faster. The C library is extremely optimized by a lot of dedicated people, but the wrapper may have its downsides. To be honest I don't really worry about it, since CGO is so undesirable to most. For streams I'd say it is mostly about 0.7x the speed of the C library, which is reasonable for most. For smaller objects (EncodeAll), it varies a bit more, but is often close enough. My main goal is to have it "best in class" among Go implementations for most workloads, which I think is a reasonable claim. Once I've finished up this rather large feature for s2 I will probably return to zstd. I would like to add a "single-threaded" stream decoder, and maybe improve the multithreaded decoder, along with a fully multithreaded encoder that can utilize all cores. Time permitting, of course.
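(For context, a minimal sketch of the two usage patterns referred to above -- the streaming encoder and `EncodeAll` for small objects -- assuming the `github.com/klauspost/compress/zstd` package. This is purely illustrative and is not the maintainer's benchmark code.)

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"

	"github.com/klauspost/compress/zstd"
)

func main() {
	// Streaming: wrap an io.Writer with an Encoder and copy data through it.
	var buf bytes.Buffer
	enc, err := zstd.NewWriter(&buf)
	if err != nil {
		panic(err)
	}
	src := strings.NewReader(strings.Repeat("streaming example data ", 1<<12))
	if _, err := io.Copy(enc, src); err != nil {
		panic(err)
	}
	if err := enc.Close(); err != nil { // flushes and finishes the frame
		panic(err)
	}

	// Small objects: reuse one Encoder and call EncodeAll per payload.
	// A nil writer is allowed when the encoder is only used for EncodeAll.
	small, err := zstd.NewWriter(nil)
	if err != nil {
		panic(err)
	}
	defer small.Close()
	compressed := small.EncodeAll([]byte("small payload"), nil)

	fmt.Println("stream:", buf.Len(), "bytes; small object:", len(compressed), "bytes")
}
```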
Thank you for the reply! Is there any chance you'd be able to share the benchmark source code that you're using to test compression speed? Just now I tried compressing a 114 MB file using the streaming API, and your library appears 10X faster (300 MB/s for your library vs 30 MB/s for DataDog). I would like to compare the code I am using to see if I am just doing something wrong (although it seems unlikely, since I am just doing ...).
I have this rather clunky test application: I use GOPATH for it, so I can test in-branch changes of my old stuff.

Here are results for different inputs: https://docs.google.com/spreadsheets/d/1nuNE2nPfuINCZJRMt6wFWhKpToF95I47XjSsc-1rbPQ/edit?usp=sharing

This is one of the test scripts I use (Windows bat script):
Ah, I think I made a basic mistake -- I am using bazel to build and run my benchmarks, and I forgot to run with ... Thanks! I would not have caught this without seeing ...
I ran some benchmarks using a dataset consisting of a mixture of source code files and compiled binary files that have a compression ratio of ~0.33 on average.
I found that DataDog/zstd outperformed this library in only a few cases. In particular, I was seeing slightly lower decompression throughput from kp/compress for small blobs when using the streaming APIs from each package (a few % difference). But for larger files, kp/compress had about 30% better decompression throughput. For compression, kp/compress nearly always had higher throughput when using the streaming API -- around 4X higher for files less than 2 MB, and around 10X higher for larger files (tested with files 24 MB and up).
Is this expected? My experience using this lib does not match the benchmarks in the godoc, which show `zstd` as being faster across the board. Have improvements been made to this lib that aren't reflected in the godoc, or is it more likely that my methodology for benchmarking DataDog/zstd is not resulting in a fair comparison?

My methodology is the following (a rough sketch of both code paths is included after the list):
Compression

- kp/compress: using a `sync.Pool` to reuse encoders, and using `encoder.ReadFrom` on the read-end of a pipe. Data is then written to the write-end of the pipe.
- DataDog/zstd: creating a `NewWriter` every time. Using `io.Copy` to copy the read end of the pipe to the writer.

Decompression

- kp/compress: using `decoder.WriteTo` to get bytes out of it.
- DataDog/zstd: creating a `NewReader` every time and using `io.Copy` to get bytes out of the reader.

I can share all the data/code that I'm using if these results aren't expected; it will just take a bit of work to clean up and make it easily runnable -- figured I'd send out an initial probe to see whether these results are surprising or not.
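For reference, here is a rough Go sketch of the two code paths described above: a pooled encoder/decoder driven by `ReadFrom`/`WriteTo` for kp/compress, versus a fresh `NewWriter`/`NewReader` plus `io.Copy` for DataDog/zstd. The function names and the round trip in `main` are illustrative assumptions, not the actual benchmark code, and the io.Pipe plumbing mentioned above is simplified to plain readers and writers.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
	"sync"

	ddzstd "github.com/DataDog/zstd"
	kpzstd "github.com/klauspost/compress/zstd"
)

// Pool of reusable klauspost/compress encoders, reset before each use.
var encoders = sync.Pool{
	New: func() interface{} {
		enc, _ := kpzstd.NewWriter(nil)
		return enc
	},
}

// Pool of reusable klauspost/compress decoders, reset before each use.
var decoders = sync.Pool{
	New: func() interface{} {
		dec, _ := kpzstd.NewReader(nil)
		return dec
	},
}

// compressKP streams src through a pooled encoder into dst via ReadFrom.
// In the setup described above, src would be the read end of an io.Pipe.
func compressKP(dst io.Writer, src io.Reader) error {
	enc := encoders.Get().(*kpzstd.Encoder)
	defer encoders.Put(enc)
	enc.Reset(dst)
	if _, err := enc.ReadFrom(src); err != nil {
		return err
	}
	return enc.Close()
}

// compressDD creates a new DataDog/zstd writer each time and copies into it.
func compressDD(dst io.Writer, src io.Reader) error {
	w := ddzstd.NewWriter(dst)
	if _, err := io.Copy(w, src); err != nil {
		w.Close()
		return err
	}
	return w.Close()
}

// decompressKP resets a pooled decoder onto src and drains it with WriteTo.
func decompressKP(dst io.Writer, src io.Reader) error {
	dec := decoders.Get().(*kpzstd.Decoder)
	defer decoders.Put(dec)
	if err := dec.Reset(src); err != nil {
		return err
	}
	_, err := dec.WriteTo(dst)
	return err
}

// decompressDD creates a new DataDog/zstd reader each time and copies from it.
func decompressDD(dst io.Writer, src io.Reader) error {
	r := ddzstd.NewReader(src)
	defer r.Close()
	_, err := io.Copy(dst, r)
	return err
}

func main() {
	// Quick round trip through the kp/compress path as a usage example.
	payload := strings.Repeat("example data for a quick round trip ", 1<<10)

	var compressed, restored bytes.Buffer
	if err := compressKP(&compressed, strings.NewReader(payload)); err != nil {
		panic(err)
	}
	if err := decompressKP(&restored, bytes.NewReader(compressed.Bytes())); err != nil {
		panic(err)
	}
	fmt.Println(len(payload), compressed.Len(), restored.Len())
}
```

In a real benchmark, the timing loop, file I/O, and pipe setup would wrap helpers like these (compressDD and decompressDD are shown but not exercised in main).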