ARM-specific optimisations for inflate. #256
Change-Id: Id4cda552b39bfb39ab35ec499dbe122b43b6d1a1
In inflate_fast() the output pointer always has plenty of room to write. This means that, so long as the target is capable, wide unaligned loads and stores can be used to transfer several bytes at once. When the reference distance is too short, simply unroll the data a little to increase the distance.
Change-Id: I59854eb25d2b1e43561c8a2afaf9175bf10cf674
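A minimal C sketch of the idea in that commit message, not the patch's actual code: `copy_match`, `copy_chunk` and the 8-byte `CHUNK` width are hypothetical names and choices. It assumes the caller guarantees at least `CHUNK - 1` writable bytes past the end of the match, which is the kind of headroom the message says `inflate_fast()` has. Copying through a `uint64_t` with `memcpy` lets the compiler emit one unaligned load and one unaligned store where the target allows it. When the distance is shorter than a chunk, a few bytes are first copied one at a time, so that reading from a multiple of the distance further back never overlaps the chunk being written.

```c
#include <stdint.h>
#include <string.h>

#define CHUNK 8u /* assumed chunk width: one 64-bit register */

/* Copy CHUNK bytes via a uint64_t; on targets with unaligned access
   (e.g. ARMv7+ and AArch64) compilers typically emit one load and one store. */
static void copy_chunk(unsigned char *dst, const unsigned char *src) {
    uint64_t v;
    memcpy(&v, src, sizeof v);
    memcpy(dst, &v, sizeof v);
}

/* Hypothetical helper: expand an LZ77 match of len bytes at distance dist
   (dist >= 1, and out - dist points at already-decoded output).
   ASSUMES at least CHUNK - 1 writable bytes after out + len; the last chunk
   may spill into them and their previous contents are clobbered.
   Returns out + len. */
static unsigned char *copy_match(unsigned char *out, unsigned dist, unsigned len) {
    unsigned char *const end = out + len;

    /* Output repeats with period dist, so any multiple of dist is an equally
       valid distance. Pick the smallest multiple that is >= CHUNK. */
    unsigned step = dist;
    while (step < CHUNK)
        step += dist;

    /* Unroll: write the first step - dist bytes one at a time, so that
       out - step lands inside data that is already correct. */
    for (unsigned pre = step - dist; pre > 0 && out < end; pre--, out++)
        *out = *(out - dist);

    /* Each chunk now reads CHUNK bytes ending at or before out, so a single
       load never sees bytes that the matching store is about to write. */
    while (out < end) {
        copy_chunk(out, out - step);
        out += CHUNK;
    }
    return end;
}
```

The byte-at-a-time prelude is bounded by `step - dist`, which is at most 7 bytes here, so short distances cost only a small fixed amount before the wide copies take over.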
@ProgramMax, more of the PNG optimisation here, FYI. Corresponding Chromium patch (now with green bots!) is here.
Thank you for pinging me. :)
   exploited.
 */
static inline unsigned char FAR *chunkcopy_safe(unsigned char FAR *out,
                                                const unsigned char FAR * Z_RESTRICT from,
I'm not sure you can use Z_RESTRICT here. That may hold if you came in via inflate.c, but maybe not if you came in via infback.c.
There's a longer discussion of that at https://chromium-review.googlesource.com/c/chromium/src/+/641575/4/third_party/zlib/contrib/arm/chunkcopy.h#230
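Background for the Z_RESTRICT concern, as a hedged sketch that assumes `Z_RESTRICT` expands to C99 `restrict`: a `restrict`-qualified pointer promises the compiler that the object it points to is not also accessed through another pointer during the call. An LZ77 back-reference whose distance is shorter than its length reads bytes that the same copy has just written, so `from` and `out` overlap. The hypothetical `lz77_copy` below is left unqualified for that reason. The hypothetical `overlap_changes_result` also shows why `memmove`, which reproduces the source bytes as they were before the call, is not a substitute.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: forward, byte-at-a-time LZ77 copy.
   out and from are deliberately NOT restrict-qualified: for a match with
   distance dist < len, from == out - dist and the two ranges overlap, so
   each byte read may be one this loop wrote a moment earlier. */
static void lz77_copy(unsigned char *out, const unsigned char *from, size_t len) {
    while (len--)
        *out++ = *from++;
}

/* Contrast: memmove copies as if through a temporary buffer, so it
   reproduces the source bytes as they were before the call rather than
   replicating the pattern. Returns 1 when the two results differ. */
static int overlap_changes_result(void) {
    unsigned char a[8] = "ab", b[8] = "ab";
    lz77_copy(a + 2, a, 6);   /* a: "abababab" */
    memmove(b + 2, b, 6);     /* b: "abab" followed by four zero bytes */
    return memcmp(a, b, sizeof a) != 0;
}
```

Whether a particular caller can ever pass overlapping ranges is exactly the inflate.c versus infback.c question raised in the review; the sketch only shows what goes wrong when they do.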
My inclination is to give infback.c and inflate.c different implementations, but it could still be argued that the assumption is too dangerous for something, somewhere out there, written in the last twenty-something years.
This is past-life work now, and I'm not sure how to reconcile that with needing to fix it. So I won't.
I can fix it and add it to the Adler-32 + CRC32 merge request in #251.
This adds two optimizations for ARM:
- NEON-optimized Adler-32 checksum algorithm (ARMv7 and newer NEON CPUs)
- ARM(v7+)-specific optimization for inflate

I've also connected the inflate optimization to the build, using the following source as a template: mirror/chromium@0397489#diff-a62ad2db6c83dbc205d34bb9a8884f16

Additional info:
https://codereview.chromium.org/2676493007/
https://codereview.chromium.org/2722063002/

Sources:
madler/zlib#251 (only the first commit)
madler/zlib#256

Signed-off-by: Daniel Engberg <daniel.engberg.lists@pyret.net>
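For reference, the checksum that the NEON Adler-32 commit accelerates is simple to state in scalar C. The sketch below is a plain reference version, not the NEON code from the PR. `adler32_ref` is a hypothetical name; the modulus 65521 and the 5552-byte deferral bound match the `BASE` and `NMAX` constants in stock zlib's adler32.c.

```c
#include <stddef.h>
#include <stdint.h>

#define ADLER_BASE 65521u /* largest prime below 2^16 */
#define ADLER_NMAX 5552u  /* most bytes summed before the 32-bit sums could overflow */

/* Hypothetical scalar reference. Like zlib's adler32(), start with adler = 1
   and pass the previous result back in to continue over more data. */
static uint32_t adler32_ref(uint32_t adler, const unsigned char *buf, size_t len) {
    uint32_t a = adler & 0xffffu; /* running sum of bytes, plus 1 */
    uint32_t b = adler >> 16;     /* running sum of the a values */
    while (len > 0) {
        size_t n = len < ADLER_NMAX ? len : ADLER_NMAX;
        len -= n;
        while (n--) {
            a += *buf++;
            b += a;
        }
        /* One reduction per block instead of per byte. */
        a %= ADLER_BASE;
        b %= ADLER_BASE;
    }
    return (b << 16) | a;
}
```

Deferring the `%` to once per block keeps the inner loop at two additions per byte. SIMD versions generally go further by summing many bytes per step and adding each block's position-weighted contribution to `b` at once.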
For PNG decode, the inflate_fast() wide-copy change comes out at about 33% faster overall across a wide set of files. Small PNGs tend to benefit the least because they never enter inflate_fast(), where the most straightforward assumptions can be made.
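To make the small-PNG remark concrete: in stock zlib, `inflate()` only hands decoding off to `inflate_fast()` when at least 6 input bytes (enough for the longest length/distance code pair, 48 bits) and 258 output bytes (the longest deflate match) are available. The predicate below mirrors that check under hypothetical names. `png_row_bytes` is also hypothetical and assumes 8-bit samples, where a PNG scanline is one filter-type byte plus `width * bytes_per_pixel`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Thresholds mirroring the LEN-state check in stock zlib's inflate.c. */
#define FAST_MIN_IN  6u   /* bytes for the longest length/distance code pair */
#define FAST_MIN_OUT 258u /* longest possible deflate match */

/* Hypothetical: would inflate() take the inflate_fast() path here? */
static bool can_enter_inflate_fast(unsigned avail_in, unsigned avail_out) {
    return avail_in >= FAST_MIN_IN && avail_out >= FAST_MIN_OUT;
}

/* Hypothetical: bytes per filtered PNG scanline for 8-bit samples. */
static size_t png_row_bytes(size_t width, size_t bytes_per_pixel) {
    return 1 + width * bytes_per_pixel; /* +1 for the filter-type byte */
}
```

If a decoder inflated one scanline at a time into a buffer of exactly that size, a 32-pixel RGBA row would offer only 129 bytes of output room, and the fast path would never be taken. Whether a given decoder works that way is an assumption here, not something stated in the PR.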