zstd: Improve decoder memcopy #637
Merged
Commits on Jul 4, 2022
Improve memcopy for small matches. Up to 30% increased throughput, depending on input.

```
benchmark  old MB/s  new MB/s  speedup
Benchmark_seqdec_execute/n-12286-lits-13914-prev-9869-1990358-3296656-win-4194304.blk-32  1284.77  1525.03  1.19x
Benchmark_seqdec_execute/n-12485-lits-6960-prev-976039-2250252-2463561-win-4194304.blk-32  1107.87  1614.28  1.46x
Benchmark_seqdec_execute/n-14746-lits-14461-prev-209-8-1379909-win-4194304.blk-32  3947.25  4100.49  1.04x
Benchmark_seqdec_execute/n-1525-lits-1498-prev-2009476-797934-2994405-win-4194304.blk-32  10281.12  10316.14  1.00x
Benchmark_seqdec_execute/n-3478-lits-3628-prev-895243-2104056-2119329-win-4194304.blk-32  8115.99  8829.85  1.09x
Benchmark_seqdec_execute/n-8422-lits-5840-prev-168095-2298675-433830-win-4194304.blk-32  1578.08  2290.47  1.45x
Benchmark_seqdec_execute/n-1000-lits-1057-prev-21887-92-217-win-8388608.blk-32  17079.65  16716.41  0.98x
Benchmark_seqdec_execute/n-15134-lits-20798-prev-4882976-4884216-4474622-win-8388608.blk-32  2020.09  2166.56  1.07x
Benchmark_seqdec_execute/n-2-lits-0-prev-620601-689171-848-win-8388608.blk-32  35781.31  35745.53  1.00x
Benchmark_seqdec_execute/n-90-lits-67-prev-19498-23-19710-win-8388608.blk-32  33125.43  32785.93  0.99x
Benchmark_seqdec_execute/n-931-lits-1179-prev-36502-1526-1518-win-8388608.blk-32  19394.38  19643.49  1.01x
Benchmark_seqdec_execute/n-2898-lits-4062-prev-335-386-751-win-8388608.blk-32  10494.30  10653.09  1.02x
Benchmark_seqdec_execute/n-4056-lits-12419-prev-10792-66-309849-win-8388608.blk-32  7425.77  7506.51  1.01x
Benchmark_seqdec_execute/n-8028-lits-4568-prev-917-65-920-win-8388608.blk-32  2855.17  3396.09  1.19x

benchmark  old MB/s  new MB/s  speedup
BenchmarkDecoder_DecoderSmall/kppkn.gtb.zst-32  537.74  651.27  1.21x
BenchmarkDecoder_DecoderSmall/geo.protodata.zst-32  1500.59  1610.11  1.07x
BenchmarkDecoder_DecoderSmall/plrabn12.txt.zst-32  410.13  505.82  1.23x
BenchmarkDecoder_DecoderSmall/lcet10.txt.zst-32  467.83  601.25  1.29x
BenchmarkDecoder_DecoderSmall/asyoulik.txt.zst-32  434.53  530.71  1.22x
BenchmarkDecoder_DecoderSmall/alice29.txt.zst-32  433.95  544.87  1.26x
BenchmarkDecoder_DecoderSmall/html_x_4.zst-32  2860.31  3189.40  1.12x
BenchmarkDecoder_DecoderSmall/paper-100k.pdf.zst-32  5336.43  5437.24  1.02x
BenchmarkDecoder_DecoderSmall/fireworks.jpeg.zst-32  12327.10  12350.86  1.00x
BenchmarkDecoder_DecoderSmall/urls.10K.zst-32  660.52  774.52  1.17x
BenchmarkDecoder_DecoderSmall/html.zst-32  1076.67  1284.53  1.19x
BenchmarkDecoder_DecoderSmall/comp-data.bin.zst-32  569.30  576.15  1.01x
BenchmarkDecoder_DecodeAll/kppkn.gtb.zst-32  812.16  813.72  1.00x
BenchmarkDecoder_DecodeAll/geo.protodata.zst-32  1943.14  1933.04  0.99x
BenchmarkDecoder_DecodeAll/plrabn12.txt.zst-32  712.27  715.46  1.00x
BenchmarkDecoder_DecodeAll/lcet10.txt.zst-32  688.23  775.97  1.13x
BenchmarkDecoder_DecodeAll/asyoulik.txt.zst-32  702.87  700.17  1.00x
BenchmarkDecoder_DecodeAll/alice29.txt.zst-32  717.44  720.89  1.00x
BenchmarkDecoder_DecodeAll/html_x_4.zst-32  1960.55  1968.90  1.00x
BenchmarkDecoder_DecodeAll/paper-100k.pdf.zst-32  5981.50  6169.12  1.03x
BenchmarkDecoder_DecodeAll/fireworks.jpeg.zst-32  13140.18  13145.86  1.00x
BenchmarkDecoder_DecodeAll/urls.10K.zst-32  983.71  988.16  1.00x
BenchmarkDecoder_DecodeAll/html.zst-32  1624.80  1624.92  1.00x
BenchmarkDecoder_DecodeAll/comp-data.bin.zst-32  569.84  570.96  1.00x
BenchmarkDecoder_DecodeAllFiles/.tracker-unpacked.bin/fastest-32  504.31  622.83  1.24x
BenchmarkDecoder_DecodeAllFiles/.tracker-unpacked.bin/default-32  564.68  717.57  1.27x
BenchmarkDecoder_DecodeAllFiles/.tracker-unpacked.bin/better-32  615.18  766.33  1.25x
BenchmarkDecoder_DecodeAllFiles/.tracker-unpacked.bin/best-32  786.17  857.17  1.09x
BenchmarkDecoder_DecodeAllFiles/.tracker.bin/fastest-32  12860.99  12870.57  1.00x
BenchmarkDecoder_DecodeAllFiles/.tracker.bin/default-32  619.06  617.54  1.00x
BenchmarkDecoder_DecodeAllFiles/.tracker.bin/better-32  630.33  625.20  0.99x
BenchmarkDecoder_DecodeAllFiles/.tracker.bin/best-32  609.12  612.50  1.01x
BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/fastest-32  658.22  659.45  1.00x
BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/default-32  723.60  729.95  1.01x
BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/better-32  735.73  737.52  1.00x
BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/best-32  745.43  749.55  1.01x
BenchmarkDecoder_DecodeAllFiles/e.txt/fastest-32  12801.86  12967.61  1.01x
BenchmarkDecoder_DecodeAllFiles/e.txt/default-32  680.29  677.69  1.00x
BenchmarkDecoder_DecodeAllFiles/e.txt/better-32  739.23  733.45  0.99x
BenchmarkDecoder_DecodeAllFiles/e.txt/best-32  820.16  825.62  1.01x
BenchmarkDecoder_DecodeAllFiles/fse-artifact3.bin/fastest-32  1186.63  1194.87  1.01x
BenchmarkDecoder_DecodeAllFiles/fse-artifact3.bin/default-32  1384.74  1412.45  1.02x
BenchmarkDecoder_DecodeAllFiles/fse-artifact3.bin/better-32  1104.17  1107.00  1.00x
BenchmarkDecoder_DecodeAllFiles/fse-artifact3.bin/best-32  409.59  409.27  1.00x
BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/fastest-32  392.32  391.89  1.00x
BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/default-32  296.47  296.65  1.00x
BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/better-32  296.52  296.68  1.00x
BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/best-32  299.85  295.83  0.99x
BenchmarkDecoder_DecodeAllFiles/html.txt/fastest-32  988.75  996.39  1.01x
BenchmarkDecoder_DecodeAllFiles/html.txt/default-32  987.11  989.51  1.00x
BenchmarkDecoder_DecodeAllFiles/html.txt/better-32  1027.64  1038.21  1.01x
BenchmarkDecoder_DecodeAllFiles/html.txt/best-32  973.41  989.86  1.02x
BenchmarkDecoder_DecodeAllFiles/pi.txt/fastest-32  12976.96  13045.11  1.01x
BenchmarkDecoder_DecodeAllFiles/pi.txt/default-32  678.88  674.53  0.99x
BenchmarkDecoder_DecodeAllFiles/pi.txt/better-32  746.38  747.36  1.00x
BenchmarkDecoder_DecodeAllFiles/pi.txt/best-32  823.52  827.84  1.01x
BenchmarkDecoder_DecodeAllFiles/pngdata.bin/fastest-32  2115.58  2121.84  1.00x
BenchmarkDecoder_DecodeAllFiles/pngdata.bin/default-32  1767.98  1779.35  1.01x
BenchmarkDecoder_DecodeAllFiles/pngdata.bin/better-32  2306.86  2328.47  1.01x
BenchmarkDecoder_DecodeAllFiles/pngdata.bin/best-32  1660.52  1684.65  1.01x
BenchmarkDecoder_DecodeAllFiles/sharnd.out/fastest-32  13027.08  12999.49  1.00x
BenchmarkDecoder_DecodeAllFiles/sharnd.out/default-32  13054.18  13084.25  1.00x
BenchmarkDecoder_DecodeAllFiles/sharnd.out/better-32  13067.23  13099.47  1.00x
BenchmarkDecoder_DecodeAllFiles/sharnd.out/best-32  13079.77  13104.13  1.00x
BenchmarkDecoder_DecodeAllFilesP/.tracker-unpacked.bin/fastest-32  10354.84  11838.70  1.14x
BenchmarkDecoder_DecodeAllFilesP/.tracker-unpacked.bin/default-32  11557.12  13404.78  1.16x
BenchmarkDecoder_DecodeAllFilesP/.tracker-unpacked.bin/better-32  12644.67  14519.37  1.15x
BenchmarkDecoder_DecodeAllFilesP/.tracker-unpacked.bin/best-32  15934.00  17312.77  1.09x
BenchmarkDecoder_DecodeAllFilesP/.tracker.bin/fastest-32  35354.57  34836.95  0.99x
BenchmarkDecoder_DecodeAllFilesP/.tracker.bin/default-32  11392.27  11275.11  0.99x
BenchmarkDecoder_DecodeAllFilesP/.tracker.bin/better-32  11793.77  11771.24  1.00x
BenchmarkDecoder_DecodeAllFilesP/.tracker.bin/best-32  11203.91  11142.52  0.99x
BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/fastest-32  12089.54  11983.77  0.99x
BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/default-32  12604.67  12514.75  0.99x
BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/better-32  13265.79  13152.64  0.99x
BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/best-32  13078.85  12983.91  0.99x
BenchmarkDecoder_DecodeAllFilesP/e.txt/fastest-32  52477.17  52657.54  1.00x
BenchmarkDecoder_DecodeAllFilesP/e.txt/default-32  11947.06  11809.75  0.99x
BenchmarkDecoder_DecodeAllFilesP/e.txt/better-32  13184.17  13140.65  1.00x
BenchmarkDecoder_DecodeAllFilesP/e.txt/best-32  14630.26  14718.01  1.01x
BenchmarkDecoder_DecodeAllFilesP/fse-artifact3.bin/fastest-32  3013.25  3088.05  1.02x
BenchmarkDecoder_DecodeAllFilesP/fse-artifact3.bin/default-32  3125.61  3091.48  0.99x
BenchmarkDecoder_DecodeAllFilesP/fse-artifact3.bin/better-32  3181.68  3034.74  0.95x
BenchmarkDecoder_DecodeAllFilesP/fse-artifact3.bin/best-32  3351.22  3526.91  1.05x
BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/fastest-32  1188.15  1136.88  0.96x
BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/default-32  1215.39  1193.99  0.98x
BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/better-32  1219.20  1206.23  0.99x
BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/best-32  1216.72  1200.26  0.99x
BenchmarkDecoder_DecodeAllFilesP/html.txt/fastest-32  16901.32  17076.26  1.01x
BenchmarkDecoder_DecodeAllFilesP/html.txt/default-32  16819.66  16892.32  1.00x
BenchmarkDecoder_DecodeAllFilesP/html.txt/better-32  17805.12  17873.77  1.00x
BenchmarkDecoder_DecodeAllFilesP/html.txt/best-32  16916.87  17184.02  1.02x
BenchmarkDecoder_DecodeAllFilesP/pi.txt/fastest-32  52314.15  51687.88  0.99x
BenchmarkDecoder_DecodeAllFilesP/pi.txt/default-32  11878.94  11778.57  0.99x
BenchmarkDecoder_DecodeAllFilesP/pi.txt/better-32  13303.16  13162.44  0.99x
BenchmarkDecoder_DecodeAllFilesP/pi.txt/best-32  14622.76  14717.80  1.01x
BenchmarkDecoder_DecodeAllFilesP/pngdata.bin/fastest-32  34134.48  37031.10  1.08x
BenchmarkDecoder_DecodeAllFilesP/pngdata.bin/default-32  33589.32  35277.28  1.05x
BenchmarkDecoder_DecodeAllFilesP/pngdata.bin/better-32  43754.89  44761.13  1.02x
BenchmarkDecoder_DecodeAllFilesP/pngdata.bin/best-32  32422.22  34107.42  1.05x
BenchmarkDecoder_DecodeAllFilesP/sharnd.out/fastest-32  52706.00  52396.81  0.99x
BenchmarkDecoder_DecodeAllFilesP/sharnd.out/default-32  52527.76  52048.36  0.99x
BenchmarkDecoder_DecodeAllFilesP/sharnd.out/better-32  52177.25  52688.64  1.01x
BenchmarkDecoder_DecodeAllFilesP/sharnd.out/best-32  52443.28  52799.86  1.01x
BenchmarkDecoder_DecodeAllParallel/kppkn.gtb.zst-32  13992.47  13994.15  1.00x
BenchmarkDecoder_DecodeAllParallel/geo.protodata.zst-32  34107.95  34221.23  1.00x
BenchmarkDecoder_DecodeAllParallel/plrabn12.txt.zst-32  12012.34  11976.30  1.00x
BenchmarkDecoder_DecodeAllParallel/lcet10.txt.zst-32  12630.22  13384.70  1.06x
BenchmarkDecoder_DecodeAllParallel/asyoulik.txt.zst-32  12327.02  12251.04  0.99x
BenchmarkDecoder_DecodeAllParallel/alice29.txt.zst-32  11932.73  11896.92  1.00x
BenchmarkDecoder_DecodeAllParallel/html_x_4.zst-32  31233.38  36258.56  1.16x
BenchmarkDecoder_DecodeAllParallel/paper-100k.pdf.zst-32  97435.31  100317.73  1.03x
BenchmarkDecoder_DecodeAllParallel/fireworks.jpeg.zst-32  62247.22  62306.36  1.00x
BenchmarkDecoder_DecodeAllParallel/urls.10K.zst-32  18659.58  18592.14  1.00x
BenchmarkDecoder_DecodeAllParallel/html.zst-32  28464.78  28519.30  1.00x
BenchmarkDecoder_DecodeAllParallel/comp-data.bin.zst-32  3114.03  3297.01  1.06x
BenchmarkDecoderSilesia/multithreaded-writer-32  1099.69  1104.92  1.00x
BenchmarkDecoderSilesia/multithreaded-writer-himem-32  1093.10  1102.98  1.01x
BenchmarkDecoderSilesia/singlethreaded-writer-32  803.85  818.55  1.02x
BenchmarkDecoderSilesia/singlethreaded-writerto-32  812.83  828.19  1.02x
BenchmarkDecoderSilesia/singlethreaded-himem-32  813.14  828.32  1.02x
BenchmarkDecoderEnwik9/multithreaded-writer-32  877.55  996.49  1.14x
BenchmarkDecoderEnwik9/multithreaded-writer-himem-32  961.20  1036.76  1.08x
BenchmarkDecoderEnwik9/singlethreaded-writer-32  632.07  631.96  1.00x
BenchmarkDecoderEnwik9/singlethreaded-writerto-32  634.62  634.52  1.00x
BenchmarkDecoderEnwik9/singlethreaded-himem-32  763.68  758.40  0.99x
BenchmarkDecoderWithCustomFiles/github-june-2days-2019.json.zst/multithreaded-writer-32  1626.86  1730.88  1.06x
BenchmarkDecoderWithCustomFiles/github-june-2days-2019.json.zst/multithreaded-writer-himem-32  2299.80  2375.04  1.03x
BenchmarkDecoderWithCustomFiles/github-june-2days-2019.json.zst/singlethreaded-writer-32  1221.34  1221.43  1.00x
BenchmarkDecoderWithCustomFiles/github-june-2days-2019.json.zst/singlethreaded-writerto-32  1236.18  1237.97  1.00x
BenchmarkDecoderWithCustomFiles/github-june-2days-2019.json.zst/singlethreaded-himem-32  1749.21  1754.96  1.00x
BenchmarkDecoderWithCustomFiles/github-ranks-backup.bin.zst/multithreaded-writer-32  839.51  933.63  1.11x
BenchmarkDecoderWithCustomFiles/github-ranks-backup.bin.zst/multithreaded-writer-himem-32  1055.54  1100.37  1.04x
BenchmarkDecoderWithCustomFiles/github-ranks-backup.bin.zst/singlethreaded-writer-32  574.91  613.88  1.07x
BenchmarkDecoderWithCustomFiles/github-ranks-backup.bin.zst/singlethreaded-writerto-32  579.19  618.72  1.07x
BenchmarkDecoderWithCustomFiles/github-ranks-backup.bin.zst/singlethreaded-himem-32  780.67  867.96  1.11x
```
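For readers unfamiliar with the operation being tuned, the following is a minimal Go sketch of a match copy in an LZ-style decoder: the decoder appends `length` bytes that start `offset` bytes back in the output already produced. It is an illustration only and not the code changed in this PR; `appendMatch` is a hypothetical helper, and the byte-by-byte overlap loop is the simplest correct approach rather than an optimized one.

```go
package main

import "fmt"

// appendMatch is a hypothetical helper (not part of the zstd package).
// It appends length bytes to dst, reading from offset bytes before the
// current end of dst, as an LZ-style decoder does for each match.
func appendMatch(dst []byte, offset, length int) []byte {
	if offset <= 0 || offset > len(dst) || length < 0 {
		panic("invalid match")
	}
	start := len(dst) - offset
	if offset >= length {
		// Source and destination do not overlap: one bulk copy is safe.
		return append(dst, dst[start:start+length]...)
	}
	// Overlapping match: bytes written in this call are also read back,
	// so copy one byte at a time.
	for i := 0; i < length; i++ {
		dst = append(dst, dst[start+i])
	}
	return dst
}

func main() {
	out := []byte("abc")
	out = appendMatch(out, 3, 3) // non-overlapping copy of "abc"
	out = appendMatch(out, 1, 4) // overlapping copy repeats the last byte
	fmt.Println(string(out))     // prints "abcabccccc"
}
```

Short matches make the fixed cost of each copy call significant, which is consistent with the larger gains appearing on the inputs with many sequences in the table above.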
Commits: 6251e7e, 5e0adf7