-
Notifications
You must be signed in to change notification settings - Fork 322
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
zstd: Don't allocate dataStorage when using byteBuf #721
Conversation
Applications that don't read from files don't need the multi-KiB buffers allocated by blockDec.reset, so their peak memory usage can be reduced a bit. This slows down Decoder.Reset, though only by a few ns per call. The allocations part of benchstat's output shows only noise, because these allocations don't actually show up due to caching (verified by putting a print statement near the allocations in the old cod). name old time/op new time/op delta Decoder_DecoderReset/kppkn.gtb.zst/stream-8 395µs ± 5% 393µs ± 2% ~ (p=0.912 n=10+10) Decoder_DecoderReset/kppkn.gtb.zst/stream-single-8 41.2ns ± 0% 43.2ns ± 1% +4.66% (p=0.000 n=10+10) Decoder_DecoderReset/kppkn.gtb.zst/buffer-8 391µs ± 0% 390µs ± 0% ~ (p=0.063 n=10+10) Decoder_DecoderReset/kppkn.gtb.zst/buffer-single-8 390µs ± 0% 390µs ± 0% ~ (p=0.280 n=10+10) Decoder_DecoderReset/geo.protodata.zst/stream-8 188µs ± 3% 190µs ± 2% ~ (p=0.105 n=10+10) Decoder_DecoderReset/geo.protodata.zst/stream-single-8 41.2ns ± 0% 43.2ns ± 1% +4.73% (p=0.000 n=10+10) Decoder_DecoderReset/geo.protodata.zst/buffer-8 96.1µs ± 0% 95.7µs ± 0% -0.40% (p=0.008 n=9+9) Decoder_DecoderReset/geo.protodata.zst/buffer-single-8 95.8µs ± 0% 95.6µs ± 1% ~ (p=0.068 n=9+10) Decoder_DecoderReset/plrabn12.txt.zst/stream-8 609µs ± 4% 609µs ± 4% ~ (p=0.905 n=10+9) Decoder_DecoderReset/plrabn12.txt.zst/stream-single-8 41.2ns ± 0% 43.2ns ± 1% +4.79% (p=0.000 n=10+9) Decoder_DecoderReset/plrabn12.txt.zst/buffer-8 1.30ms ± 0% 1.29ms ± 0% -0.21% (p=0.004 n=10+10) Decoder_DecoderReset/plrabn12.txt.zst/buffer-single-8 1.30ms ± 0% 1.30ms ± 0% ~ (p=0.549 n=10+9) Decoder_DecoderReset/lcet10.txt.zst/stream-8 523µs ± 2% 518µs ± 3% ~ (p=0.408 n=8+10) Decoder_DecoderReset/lcet10.txt.zst/stream-single-8 41.2ns ± 0% 43.2ns ± 1% +4.71% (p=0.000 n=10+10) Decoder_DecoderReset/lcet10.txt.zst/buffer-8 954µs ± 0% 954µs ± 0% ~ (p=0.796 n=10+10) Decoder_DecoderReset/lcet10.txt.zst/buffer-single-8 954µs ± 0% 953µs ± 0% ~ (p=0.604 n=9+10) Decoder_DecoderReset/asyoulik.txt.zst/stream-8 339µs ± 3% 327µs ± 4% -3.71% (p=0.002 n=10+10) Decoder_DecoderReset/asyoulik.txt.zst/stream-single-8 41.2ns ± 1% 43.2ns ± 1% +4.77% (p=0.000 n=10+9) Decoder_DecoderReset/asyoulik.txt.zst/buffer-8 338µs ± 0% 338µs ± 1% ~ (p=0.912 n=10+10) Decoder_DecoderReset/asyoulik.txt.zst/buffer-single-8 338µs ± 0% 338µs ± 0% ~ (p=1.000 n=9+9) Decoder_DecoderReset/alice29.txt.zst/stream-8 363µs ± 3% 359µs ± 2% ~ (p=0.278 n=10+9) Decoder_DecoderReset/alice29.txt.zst/stream-single-8 41.2ns ± 0% 43.1ns ± 1% +4.62% (p=0.000 n=10+10) Decoder_DecoderReset/alice29.txt.zst/buffer-8 411µs ± 0% 411µs ± 1% ~ (p=1.000 n=10+10) Decoder_DecoderReset/alice29.txt.zst/buffer-single-8 411µs ± 0% 411µs ± 0% ~ (p=0.321 n=8+9) Decoder_DecoderReset/html_x_4.zst/stream-8 239µs ± 3% 244µs ± 1% ~ (p=0.028 n=10+9) Decoder_DecoderReset/html_x_4.zst/stream-single-8 41.6ns ± 3% 43.2ns ± 0% +3.80% (p=0.000 n=9+10) Decoder_DecoderReset/html_x_4.zst/buffer-8 263µs ± 1% 263µs ± 1% ~ (p=0.971 n=10+10) Decoder_DecoderReset/html_x_4.zst/buffer-single-8 263µs ± 1% 263µs ± 0% ~ (p=0.529 n=10+10) Decoder_DecoderReset/paper-100k.pdf.zst/stream-8 80.1µs ± 3% 80.1µs ± 2% ~ (p=0.370 n=9+8) Decoder_DecoderReset/paper-100k.pdf.zst/stream-single-8 41.2ns ± 1% 43.1ns ± 0% +4.58% (p=0.000 n=10+10) Decoder_DecoderReset/paper-100k.pdf.zst/buffer-8 22.9µs ± 2% 22.7µs ± 0% ~ (p=0.015 n=10+8) Decoder_DecoderReset/paper-100k.pdf.zst/buffer-single-8 22.7µs ± 0% 22.6µs ± 0% ~ (p=0.021 n=8+10) Decoder_DecoderReset/fireworks.jpeg.zst/stream-8 59.4µs ± 1% 59.7µs ± 3% ~ (p=0.079 n=9+8) Decoder_DecoderReset/fireworks.jpeg.zst/stream-single-8 41.2ns ± 0% 43.2ns ± 0% +4.70% (p=0.000 n=6+10) Decoder_DecoderReset/fireworks.jpeg.zst/buffer-8 13.2µs ± 0% 13.1µs ± 0% -0.96% (p=0.000 n=10+9) Decoder_DecoderReset/fireworks.jpeg.zst/buffer-single-8 13.2µs ± 0% 13.0µs ± 0% -0.91% (p=0.000 n=10+10) Decoder_DecoderReset/urls.10K.zst/stream-8 497µs ± 2% 503µs ± 2% ~ (p=0.143 n=10+10) Decoder_DecoderReset/urls.10K.zst/stream-single-8 41.3ns ± 0% 43.2ns ± 0% +4.75% (p=0.000 n=9+9) Decoder_DecoderReset/urls.10K.zst/buffer-8 1.12ms ± 0% 1.12ms ± 0% ~ (p=0.123 n=10+10) Decoder_DecoderReset/urls.10K.zst/buffer-single-8 1.12ms ± 0% 1.12ms ± 0% +0.21% (p=0.002 n=9+10) Decoder_DecoderReset/html.zst/stream-8 176µs ± 4% 179µs ± 3% ~ (p=0.113 n=9+10) Decoder_DecoderReset/html.zst/stream-single-8 41.2ns ± 0% 43.1ns ± 0% +4.72% (p=0.000 n=10+10) Decoder_DecoderReset/html.zst/buffer-8 104µs ± 0% 104µs ± 0% ~ (p=0.024 n=9+9) Decoder_DecoderReset/html.zst/buffer-single-8 103µs ± 1% 104µs ± 0% +0.65% (p=0.000 n=10+9) Decoder_DecoderReset/comp-data.bin.zst/stream-8 41.1µs ± 3% 42.6µs ± 4% +3.62% (p=0.001 n=10+9) Decoder_DecoderReset/comp-data.bin.zst/stream-single-8 41.2ns ± 0% 43.2ns ± 1% +4.77% (p=0.000 n=10+9) Decoder_DecoderReset/comp-data.bin.zst/buffer-8 10.1µs ± 0% 10.1µs ± 1% ~ (p=0.288 n=10+10) Decoder_DecoderReset/comp-data.bin.zst/buffer-single-8 10.1µs ± 0% 10.1µs ± 0% ~ (p=0.033 n=10+9) Decoder_DecodeAll/kppkn.gtb.zst-8 390µs ± 0% 390µs ± 0% ~ (p=0.842 n=9+10) Decoder_DecodeAll/geo.protodata.zst-8 95.6µs ± 0% 95.5µs ± 1% ~ (p=0.739 n=10+10) Decoder_DecodeAll/plrabn12.txt.zst-8 1.30ms ± 0% 1.29ms ± 0% ~ (p=0.497 n=9+10) Decoder_DecodeAll/lcet10.txt.zst-8 954µs ± 0% 954µs ± 0% ~ (p=1.000 n=10+9) Decoder_DecodeAll/asyoulik.txt.zst-8 338µs ± 0% 338µs ± 0% ~ (p=0.912 n=10+10) Decoder_DecodeAll/alice29.txt.zst-8 409µs ± 0% 410µs ± 1% ~ (p=0.052 n=10+10) Decoder_DecodeAll/html_x_4.zst-8 263µs ± 0% 264µs ± 0% +0.36% (p=0.008 n=10+9) Decoder_DecodeAll/paper-100k.pdf.zst-8 22.5µs ± 1% 22.6µs ± 1% ~ (p=0.127 n=10+10) Decoder_DecodeAll/fireworks.jpeg.zst-8 13.0µs ± 0% 12.8µs ± 1% -1.31% (p=0.000 n=9+10) Decoder_DecodeAll/urls.10K.zst-8 1.12ms ± 0% 1.12ms ± 1% ~ (p=0.043 n=10+10) Decoder_DecodeAll/html.zst-8 103µs ± 0% 103µs ± 0% ~ (p=0.043 n=10+10) Decoder_DecodeAll/comp-data.bin.zst-8 10.0µs ± 0% 10.0µs ± 1% ~ (p=1.000 n=9+10)
b.dataStorage = make([]byte, 0, cSize+compressedBlockOverAlloc) | ||
} else { | ||
b.dataStorage = make([]byte, 0, maxCompressedBlockSizeAlloc) | ||
if bb, ok := br.(*byteBuf); ok { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this achieve the same, leaving the rest of the code?
if bb, ok := br.(*byteBuf); ok { | |
if bb, ok := br.(*byteBuf); !ok && cap(b.dataStorage) < cSize { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That works, but it appears to be quite a bit slower, even on the DecodeAll benchmarks. I'm not sure why yet.
name old time/op new time/op delta
Decoder_DecoderReset/kppkn.gtb.zst/stream-8 395µs ± 5% 425µs ± 3% +7.62% (p=0.001 n=10+5)
Decoder_DecoderReset/kppkn.gtb.zst/stream-single-8 41.2ns ± 0% 44.0ns ± 2% +6.73% (p=0.001 n=10+5)
Decoder_DecoderReset/kppkn.gtb.zst/buffer-8 391µs ± 0% 414µs ± 1% +6.07% (p=0.001 n=10+5)
Decoder_DecoderReset/kppkn.gtb.zst/buffer-single-8 390µs ± 0% 414µs ± 1% +5.99% (p=0.001 n=10+5)
Decoder_DecoderReset/geo.protodata.zst/stream-8 188µs ± 3% 197µs ± 2% +4.61% (p=0.001 n=10+5)
Decoder_DecoderReset/geo.protodata.zst/stream-single-8 41.2ns ± 0% 43.9ns ± 1% +6.34% (p=0.001 n=10+5)
Decoder_DecoderReset/geo.protodata.zst/buffer-8 96.1µs ± 0% 101.7µs ± 1% +5.93% (p=0.001 n=9+5)
Decoder_DecoderReset/geo.protodata.zst/buffer-single-8 95.8µs ± 0% 101.9µs ± 1% +6.37% (p=0.001 n=9+5)
Decoder_DecoderReset/plrabn12.txt.zst/stream-8 609µs ± 4% 645µs ± 4% +5.81% (p=0.005 n=10+5)
Decoder_DecoderReset/plrabn12.txt.zst/stream-single-8 41.2ns ± 0% 43.8ns ± 1% +6.36% (p=0.001 n=10+5)
Decoder_DecoderReset/plrabn12.txt.zst/buffer-8 1.30ms ± 0% 1.38ms ± 1% +6.43% (p=0.001 n=10+5)
Decoder_DecoderReset/plrabn12.txt.zst/buffer-single-8 1.30ms ± 0% 1.37ms ± 1% +5.81% (p=0.001 n=10+5)
Decoder_DecoderReset/lcet10.txt.zst/stream-8 523µs ± 2% 550µs ± 4% +5.14% (p=0.002 n=8+5)
Decoder_DecoderReset/lcet10.txt.zst/stream-single-8 41.2ns ± 0% 44.1ns ± 1% +7.05% (p=0.001 n=10+5)
Decoder_DecoderReset/lcet10.txt.zst/buffer-8 954µs ± 0% 1019µs ± 1% +6.76% (p=0.001 n=10+5)
Decoder_DecoderReset/lcet10.txt.zst/buffer-single-8 954µs ± 0% 1016µs ± 0% +6.46% (p=0.001 n=9+5)
Decoder_DecoderReset/asyoulik.txt.zst/stream-8 339µs ± 3% 353µs ± 2% +3.97% (p=0.013 n=10+5)
Decoder_DecoderReset/asyoulik.txt.zst/stream-single-8 41.2ns ± 1% 43.8ns ± 1% +6.37% (p=0.001 n=10+5)
Decoder_DecoderReset/asyoulik.txt.zst/buffer-8 338µs ± 0% 360µs ± 1% +6.45% (p=0.001 n=10+5)
Decoder_DecoderReset/asyoulik.txt.zst/buffer-single-8 338µs ± 0% 359µs ± 1% +5.99% (p=0.001 n=9+5)
Decoder_DecoderReset/alice29.txt.zst/stream-8 363µs ± 3% 383µs ± 3% +5.52% (p=0.001 n=10+5)
Decoder_DecoderReset/alice29.txt.zst/stream-single-8 41.2ns ± 0% 43.8ns ± 1% +6.27% (p=0.001 n=10+5)
Decoder_DecoderReset/alice29.txt.zst/buffer-8 411µs ± 0% 439µs ± 1% +6.89% (p=0.001 n=10+5)
Decoder_DecoderReset/alice29.txt.zst/buffer-single-8 411µs ± 0% 434µs ± 0% +5.78% (p=0.002 n=8+5)
Decoder_DecoderReset/html_x_4.zst/stream-8 239µs ± 3% 247µs ± 2% +3.75% (p=0.024 n=10+4)
Decoder_DecoderReset/html_x_4.zst/stream-single-8 41.6ns ± 3% 43.8ns ± 1% +5.27% (p=0.001 n=9+5)
Decoder_DecoderReset/html_x_4.zst/buffer-8 263µs ± 1% 280µs ± 1% +6.34% (p=0.001 n=10+5)
Decoder_DecoderReset/html_x_4.zst/buffer-single-8 263µs ± 1% 279µs ± 1% +6.08% (p=0.001 n=10+5)
Decoder_DecoderReset/paper-100k.pdf.zst/stream-8 80.1µs ± 3% 85.6µs ± 1% +6.82% (p=0.001 n=9+5)
Decoder_DecoderReset/paper-100k.pdf.zst/stream-single-8 41.2ns ± 1% 43.8ns ± 0% +6.15% (p=0.001 n=10+5)
Decoder_DecoderReset/paper-100k.pdf.zst/buffer-8 22.9µs ± 2% 24.1µs ± 1% +5.57% (p=0.001 n=10+5)
Decoder_DecoderReset/paper-100k.pdf.zst/buffer-single-8 22.7µs ± 0% 24.2µs ± 2% +6.49% (p=0.002 n=8+5)
Decoder_DecoderReset/fireworks.jpeg.zst/stream-8 59.4µs ± 1% 62.9µs ± 1% +6.05% (p=0.001 n=9+5)
Decoder_DecoderReset/fireworks.jpeg.zst/stream-single-8 41.2ns ± 0% 43.6ns ± 1% +5.73% (p=0.004 n=6+5)
Decoder_DecoderReset/fireworks.jpeg.zst/buffer-8 13.2µs ± 0% 14.0µs ± 1% +5.92% (p=0.001 n=10+5)
Decoder_DecoderReset/fireworks.jpeg.zst/buffer-single-8 13.2µs ± 0% 13.9µs ± 1% +5.65% (p=0.001 n=10+5)
Decoder_DecoderReset/urls.10K.zst/stream-8 497µs ± 2% 529µs ± 3% +6.45% (p=0.001 n=10+5)
Decoder_DecoderReset/urls.10K.zst/stream-single-8 41.3ns ± 0% 43.9ns ± 1% +6.38% (p=0.001 n=9+5)
Decoder_DecoderReset/urls.10K.zst/buffer-8 1.12ms ± 0% 1.19ms ± 0% +6.14% (p=0.001 n=10+5)
Decoder_DecoderReset/urls.10K.zst/buffer-single-8 1.12ms ± 0% 1.19ms ± 1% +6.01% (p=0.001 n=9+5)
Decoder_DecoderReset/html.zst/stream-8 176µs ± 4% 186µs ± 3% +5.67% (p=0.004 n=9+5)
Decoder_DecoderReset/html.zst/stream-single-8 41.2ns ± 0% 43.8ns ± 1% +6.44% (p=0.001 n=10+5)
Decoder_DecoderReset/html.zst/buffer-8 104µs ± 0% 110µs ± 1% +6.57% (p=0.001 n=9+5)
Decoder_DecoderReset/html.zst/buffer-single-8 103µs ± 1% 110µs ± 0% +6.48% (p=0.001 n=10+5)
Decoder_DecoderReset/comp-data.bin.zst/stream-8 41.1µs ± 3% 45.6µs ± 2% +10.88% (p=0.001 n=10+5)
Decoder_DecoderReset/comp-data.bin.zst/stream-single-8 41.2ns ± 0% 43.8ns ± 1% +6.22% (p=0.001 n=10+5)
Decoder_DecoderReset/comp-data.bin.zst/buffer-8 10.1µs ± 0% 10.8µs ± 1% +6.27% (p=0.001 n=10+5)
Decoder_DecoderReset/comp-data.bin.zst/buffer-single-8 10.1µs ± 0% 10.7µs ± 0% +5.89% (p=0.001 n=10+5)
Decoder_DecodeAll/kppkn.gtb.zst-8 390µs ± 0% 415µs ± 1% +6.43% (p=0.001 n=9+5)
Decoder_DecodeAll/geo.protodata.zst-8 95.6µs ± 0% 100.9µs ± 1% +5.61% (p=0.001 n=10+5)
Decoder_DecodeAll/plrabn12.txt.zst-8 1.30ms ± 0% 1.38ms ± 1% +6.53% (p=0.001 n=9+5)
Decoder_DecodeAll/lcet10.txt.zst-8 954µs ± 0% 1016µs ± 1% +6.50% (p=0.001 n=10+5)
Decoder_DecodeAll/asyoulik.txt.zst-8 338µs ± 0% 359µs ± 0% +6.14% (p=0.001 n=10+5)
Decoder_DecodeAll/alice29.txt.zst-8 409µs ± 0% 438µs ± 1% +7.11% (p=0.001 n=10+5)
Decoder_DecodeAll/html_x_4.zst-8 263µs ± 0% 279µs ± 1% +5.96% (p=0.001 n=10+5)
Decoder_DecodeAll/paper-100k.pdf.zst-8 22.5µs ± 1% 24.0µs ± 1% +6.53% (p=0.001 n=10+5)
Decoder_DecodeAll/fireworks.jpeg.zst-8 13.0µs ± 0% 13.7µs ± 0% +5.14% (p=0.001 n=9+5)
Decoder_DecodeAll/urls.10K.zst-8 1.12ms ± 0% 1.19ms ± 1% +6.36% (p=0.001 n=10+5)
Decoder_DecodeAll/html.zst-8 103µs ± 0% 110µs ± 1% +6.52% (p=0.001 n=10+5)
Decoder_DecodeAll/comp-data.bin.zst-8 10.0µs ± 0% 10.6µs ± 1% +5.95% (p=0.001 n=9+5)
Credit to @greatroar for finding this. Alternative to #721
@greatroar Could you check #741 - I can't see how that would have regressions. |
Credit to @greatroar for finding this. Alternative to #721
Applications that don't read from files don't need the multi-KiB buffers allocated by blockDec.reset, so their peak memory usage can be reduced a bit. This slows down Decoder.reset, though only by a few ns per call.
The allocations part of benchstat's output shows only noise, because these allocations don't actually show up due to caching (verified by putting a print statement near the allocations in the old code).