Improve benchmarks #12

Merged
merged 2 commits into from
Jul 23, 2020

Conversation

milesgranger
Owner

Make benchmarks more comprehensive as suggested in BurntSushi/rust-snappy#34

@BurntSushi BurntSushi left a comment

LGTM

@milesgranger milesgranger merged commit 92e2814 into master Jul 23, 2020
@milesgranger milesgranger deleted the improve-benchmarks branch July 23, 2020 13:56
@milesgranger milesgranger mentioned this pull request Jul 23, 2020
@martindurant

Update these benchmarks perhaps? Some time has passed, and c-snappy won't have changed, but I bet there's a chance that rust-snappy (snap) has, or that there's a faster, newer alternative.

@milesgranger
Owner Author

Hey @martindurant!

Thanks for the push. Looking back into it now, it seems like (one of?) the most efficient ways to pass the bytes back from Rust is to use PyBytes::new_with: calculate the (de)compressed size up front and write the output straight into the buffer it provides. That avoids a double allocation, one for the (de)compression output and another for converting it to PyBytes to hand over to Python.
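
The allocation pattern described here can be sketched in plain Python. This is only an analogy under stated assumptions: the run-length format (pairs of `count, byte`) and every function name below are hypothetical stand-ins, not Snappy and not cramjam's actual code. The point it illustrates is that when the output size is known before decoding starts, the final buffer can be allocated once and filled in place, instead of building a scratch buffer and copying it.

```python
# Toy run-length format (hypothetical): the input is a sequence of
# (count, byte) pairs, so bytes([3, 97, 2, 98]) decodes to b"aaabb".
# Input is assumed to be well formed (even length).
ENCODED = bytes([3, ord("a"), 2, ord("b")])


def rle_decoded_len(src: bytes) -> int:
    """Size of the decoded output, computed without decoding anything."""
    return sum(src[0::2])  # the counts sit at the even offsets


def decode_two_allocations(src: bytes) -> bytes:
    """Decode into a growable scratch buffer, then copy into the result."""
    scratch = bytearray()  # first output buffer, grows as we go
    for i in range(0, len(src), 2):
        count, value = src[i], src[i + 1]
        for _ in range(count):
            scratch.append(value)
    return bytes(scratch)  # second output buffer: a full copy of the first


def decode_into(src: bytes, out: memoryview) -> None:
    """Write the decoded bytes directly into a caller-provided buffer."""
    pos = 0
    for i in range(0, len(src), 2):
        count, value = src[i], src[i + 1]
        for _ in range(count):
            out[pos] = value
            pos += 1


def decode_one_allocation(src: bytes) -> bytearray:
    """Size the output up front, allocate it once, and fill it in place."""
    out = bytearray(rle_decoded_len(src))  # the only output buffer
    decode_into(src, memoryview(out))
    return out
```

Both `decode_two_allocations(ENCODED)` and `decode_one_allocation(ENCODED)` produce the same five bytes; the second variant never holds two full copies of the output at once, which is the saving the comment above attributes to the PyBytes::new_with approach.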

After prototyping with that a bit, it seems like cramjam could be reliably faster than python-snappy, except in a few cases:

Output from this evening's session.

--------------------------------------------------------------------------------------------------------- benchmark: 24 tests ----------------------------------------------------------------------------------------------------------
Name (time in us)                                             Min                   Max                  Mean             StdDev                Median                IQR            Outliers          OPS            Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_snappy_raw[Mark.Twain-Tom.Sawyer.txt-cramjam]        66.7232 (4.42)       223.8648 (1.51)        69.4540 (4.30)      6.1899 (2.01)        68.0690 (4.38)      0.7018 (3.47)      405;862  14,398.0150 (0.23)       6365           1
test_snappy_raw[Mark.Twain-Tom.Sawyer.txt-snappy]         52.7422 (3.50)       250.5509 (1.69)        54.7293 (3.39)      5.2059 (1.69)        53.5757 (3.45)      0.4624 (2.29)     827;2155  18,271.7505 (0.30)      14479           1
test_snappy_raw[alice29.txt-cramjam]                     291.6441 (19.33)      443.5740 (2.98)       299.0047 (18.52)    13.9264 (4.52)       293.5077 (18.88)     2.9434 (14.56)     396;745   3,344.4289 (0.05)       3267           1
test_snappy_raw[alice29.txt-snappy]                      599.9287 (39.77)      842.8260 (5.67)       612.4785 (37.93)    19.3221 (6.28)       603.6360 (38.83)    19.1506 (94.76)      157;57   1,632.7105 (0.03)       1532           1
test_snappy_raw[asyoulik.txt-cramjam]                    308.9788 (20.48)      579.4521 (3.90)       319.7238 (19.80)    24.1271 (7.84)       311.3754 (20.03)     8.2855 (41.00)     206;371   3,127.7000 (0.05)       2978           1
test_snappy_raw[asyoulik.txt-snappy]                     532.7296 (35.31)      931.2029 (6.27)       548.0748 (33.94)    35.2371 (11.45)      535.6853 (34.46)    16.2870 (80.59)     110;140   1,824.5686 (0.03)       1838           1
test_snappy_raw[fireworks.jpeg-cramjam]                   40.8231 (2.71)       405.6497 (2.73)        42.1765 (2.61)      4.8386 (1.57)        41.3642 (2.66)      0.5813 (2.88)     821;1427  23,709.8683 (0.38)      18561           1
test_snappy_raw[fireworks.jpeg-snappy]                    15.0851 (1.0)        229.4863 (1.54)        16.1488 (1.0)       3.0783 (1.0)         15.5442 (1.0)       0.4978 (2.46)    1578;2553  61,923.9929 (1.0)       36309           1
test_snappy_raw[geo.protodata-cramjam]                   106.9540 (7.09)       283.0932 (1.90)       110.6508 (6.85)      7.8300 (2.54)       108.2791 (6.97)      0.8768 (4.34)     631;1820   9,037.4441 (0.15)       7890           1
test_snappy_raw[geo.protodata-snappy]                    143.0362 (9.48)       510.1310 (3.43)       148.3856 (9.19)     10.6992 (3.48)       146.0779 (9.40)      2.3735 (11.74)     462;820   6,739.2005 (0.11)       5858           1
test_snappy_raw[html-cramjam]                            145.9508 (9.68)       511.5601 (3.44)       150.0290 (9.29)     10.3035 (3.35)       147.0926 (9.46)      0.7492 (3.71)     457;1320   6,665.3774 (0.11)       6255           1
test_snappy_raw[html-snappy]                             156.4212 (10.37)      331.1597 (2.23)       161.2253 (9.98)      9.9066 (3.22)       157.9481 (10.16)     0.9718 (4.81)     487;1283   6,202.4992 (0.10)       5832           1
test_snappy_raw[html_x_4-cramjam]                        156.4962 (10.37)      502.4560 (3.38)       161.3468 (9.99)     11.1545 (3.62)       158.2210 (10.18)     0.8270 (4.09)     447;1083   6,197.8281 (0.10)       5745           1
test_snappy_raw[html_x_4-snappy]                         634.0877 (42.03)      831.9281 (5.60)       649.9287 (40.25)    22.6907 (7.37)       639.3418 (41.13)    19.1698 (94.85)      187;84   1,538.6304 (0.02)       1541           1
test_snappy_raw[kppkn.gtb-cramjam]                       201.9522 (13.39)      320.4923 (2.16)       207.1594 (12.83)    10.3099 (3.35)       204.0809 (13.13)     1.4184 (7.02)      402;769   4,827.2011 (0.08)       4540           1
test_snappy_raw[kppkn.gtb-snappy]                        503.6397 (33.39)      713.1672 (4.80)       515.4781 (31.92)    20.4209 (6.63)       506.2551 (32.57)    15.6360 (77.37)     202;129   1,939.9465 (0.03)       1876           1
test_snappy_raw[lcet10.txt-cramjam]                      282.5959 (18.73)      507.0879 (3.41)       289.8384 (17.95)    16.2355 (5.27)       284.2164 (18.28)     2.4006 (11.88)     243;701   3,450.1979 (0.06)       3359           1
test_snappy_raw[lcet10.txt-snappy]                     1,590.2971 (105.42)   2,046.9069 (13.77)    1,627.8576 (100.80)   47.7467 (15.51)    1,615.7541 (103.95)   42.9840 (212.69)      56;32     614.3044 (0.01)        588           1
test_snappy_raw[paper-100k.pdf-cramjam]                   46.5959 (3.09)       172.2910 (1.16)        48.9973 (3.03)      6.1246 (1.99)        47.1906 (3.04)      0.2806 (1.39)    1057;2378  20,409.2833 (0.33)      11432           1
test_snappy_raw[paper-100k.pdf-snappy]                    20.4099 (1.35)       148.6209 (1.0)         21.6680 (1.34)      3.3306 (1.08)        21.1489 (1.36)      0.2021 (1.0)      782;4809  46,151.0589 (0.75)      18799           1
test_snappy_raw[plrabn12.txt-cramjam]                    340.2256 (22.55)      509.3641 (3.43)       348.9451 (21.61)    16.8228 (5.46)       342.2776 (22.02)     3.9656 (19.62)     143;356   2,865.7807 (0.05)       1652           1
test_snappy_raw[plrabn12.txt-snappy]                   2,184.4772 (144.81)   2,704.6911 (18.20)    2,232.1360 (138.22)   57.8854 (18.80)    2,213.1621 (142.38)   60.5900 (299.81)      38;13     448.0014 (0.01)        414           1
test_snappy_raw[urls.10K-cramjam]                        224.4790 (14.88)      483.0160 (3.25)       230.8938 (14.30)    15.5295 (5.04)       226.2061 (14.55)     1.7162 (8.49)      129;332   4,330.9956 (0.07)       1640           1
test_snappy_raw[urls.10K-snappy]                       1,811.8913 (120.11)   2,210.5193 (14.87)    1,848.8149 (114.49)   48.7328 (15.83)    1,836.4950 (118.15)   41.7850 (206.76)      45;28     540.8870 (0.01)        451           1
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
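
To make the table easier to read at a glance, here is a minimal sketch that turns the "Mean" column into per-file speedup ratios. The numbers are copied from the table above; the names `MEAN_US`, `speedups`, `RATIOS`, and `SLOWER` are hypothetical and exist only for this illustration.

```python
# Mean times in microseconds, copied from the "Mean" column of the table above.
MEAN_US = {
    "Mark.Twain-Tom.Sawyer.txt": {"cramjam": 69.4540, "snappy": 54.7293},
    "alice29.txt": {"cramjam": 299.0047, "snappy": 612.4785},
    "asyoulik.txt": {"cramjam": 319.7238, "snappy": 548.0748},
    "fireworks.jpeg": {"cramjam": 42.1765, "snappy": 16.1488},
    "geo.protodata": {"cramjam": 110.6508, "snappy": 148.3856},
    "html": {"cramjam": 150.0290, "snappy": 161.2253},
    "html_x_4": {"cramjam": 161.3468, "snappy": 649.9287},
    "kppkn.gtb": {"cramjam": 207.1594, "snappy": 515.4781},
    "lcet10.txt": {"cramjam": 289.8384, "snappy": 1627.8576},
    "paper-100k.pdf": {"cramjam": 48.9973, "snappy": 21.6680},
    "plrabn12.txt": {"cramjam": 348.9451, "snappy": 2232.1360},
    "urls.10K": {"cramjam": 230.8938, "snappy": 1848.8149},
}


def speedups(means):
    """Return snappy_mean / cramjam_mean per file; > 1.0 means cramjam was faster."""
    return {name: t["snappy"] / t["cramjam"] for name, t in means.items()}


RATIOS = speedups(MEAN_US)
# Files where cramjam's mean time was higher than python-snappy's.
SLOWER = [name for name, ratio in RATIOS.items() if ratio < 1.0]

if __name__ == "__main__":
    # Print the files from cramjam's worst case to its best case.
    for name, ratio in sorted(RATIOS.items(), key=lambda kv: kv[1]):
        print(f"{name:<28} {ratio:5.2f}x")
```

On these numbers cramjam is slower on three inputs (Mark.Twain-Tom.Sawyer.txt, fireworks.jpeg, and paper-100k.pdf) and faster on the other nine, by roughly 8x on urls.10K.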

@martindurant

That's what I wanted to hear! It is interesting that the comparison is so favourable in some cases, but still marginally worse in others. Still: I'm convinced, and the rest of the algorithms were already better. I'll try to get some help with the conda-forge recipe, and then fastparquet can finally ditch python-snappy. In fact, there will be an argument to archive python-snappy (which I also co-maintain) eventually.

@martindurant

If you are happy with this change, it should be worth a release.

@martindurant

ping @milesgranger, would love to know when you think you might include this and release it. Please also update the benchmarks in https://github.com/milesgranger/pyrus-cramjam/blob/master/benchmarks/README.md when ready.

@milesgranger
Owner Author

Hi, most likely not until after the weekend at the earliest. There are still some details to flesh out: that was just some prototyping, and there are some issues with the approach that may require a fair amount of refactoring. I'll make a PR and ping you when ready.
