-
Notifications
You must be signed in to change notification settings - Fork 632
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix ARMv7 build by making recent ZIP NEON optimizations be ARMv8 (aarch64) only #1366
Fix ARMv7 build by making recent ZIP NEON optimizations be ARMv8 (aarch64) only #1366
Conversation
…ch64) only Should fix AcademySoftwareFoundation#1365. Recent PR (AcademySoftwareFoundation#1348) added NEON accelerated code paths for ZIP filtering. But that code uses several instructions that are ARMv8 (aarch64) only, and thus fail building on 32-bit ARM (armv7) platforms. Make these optimizations only kick in when building for 64-bit ARM platforms. Signed-off-by: Aras Pranckevicius <aras@nesnausk.org>
Aside, not really to Aras per se, but more a "note to the rest of the OpenEXR team, while this topic is on our mind": In OIIO/OSL, the approach I took was to make simd wrapper classes -- within each method of which are I'm not necessarily advocating copying/using any of my code, and in fact there's probably plenty to criticize about my specific implementation. (As well as a variety of alternate implementations open sourced, with varying levels of complexity.) Just saying that isolating it in one header behind some simple wrapper classes can have big maintainability benefits. |
@@ -46,6 +46,10 @@ | |||
# define IMF_HAVE_NEON | |||
#endif | |||
|
|||
#if defined(__aarch64__) | |||
# define IMF_HAVE_NEON_AARCH64 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm groping in the dark without an AArch64 machine, but don't you mean the #define IMF_HAVE_NEON_AARCH64
to be inside the #if defined(__ARM_NEON)
above? Don't you want this only if __ARM_NEON
and __aarch64__
? Or does __aarch64__
sufficient itself?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
64-bit ARM makes NEON the standard feature that's "always there", so no need for a separate check for it. Very similar to how x64 made SSE2 be guaranteed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, thanks.
#if defined(__ARM_NEON) | ||
# define IMF_HAVE_NEON 1 | ||
#if defined(__aarch64__) | ||
# define IMF_HAVE_NEON_AARCH64 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this duplicate what's in ImfSimd.h
above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but internal_zip.c already did this for all SIMD sets including SSE. I don't know the reasoning, just went with the flow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, this file doesn't include ImfSimd.h, so that's appropriate.
Yeah that makes a lot of sense. OpenEXR does not use SIMD extensively (yet?), and for a very limited use the "SIMD wrapper" can be really short and compact. E.g. for my recent investigations into floating point data compression, handing both SSE and NEON in a single wrapper was total under 100 lines of code (simd.h). ASTC encoder also has a much more exntesive and fairly nice wrapper, astcenc_vecmathlib.h and friends. |
…ch64) only (AcademySoftwareFoundation#1366) Should fix AcademySoftwareFoundation#1365. Recent PR (AcademySoftwareFoundation#1348) added NEON accelerated code paths for ZIP filtering. But that code uses several instructions that are ARMv8 (aarch64) only, and thus fail building on 32-bit ARM (armv7) platforms. Make these optimizations only kick in when building for 64-bit ARM platforms. Signed-off-by: Aras Pranckevicius <aras@nesnausk.org>
…ch64) only (AcademySoftwareFoundation#1366) Should fix AcademySoftwareFoundation#1365. Recent PR (AcademySoftwareFoundation#1348) added NEON accelerated code paths for ZIP filtering. But that code uses several instructions that are ARMv8 (aarch64) only, and thus fail building on 32-bit ARM (armv7) platforms. Make these optimizations only kick in when building for 64-bit ARM platforms. Signed-off-by: Aras Pranckevicius <aras@nesnausk.org>
…ch64) only (#1366) Should fix #1365. Recent PR (#1348) added NEON accelerated code paths for ZIP filtering. But that code uses several instructions that are ARMv8 (aarch64) only, and thus fail building on 32-bit ARM (armv7) platforms. Make these optimizations only kick in when building for 64-bit ARM platforms. Signed-off-by: Aras Pranckevicius <aras@nesnausk.org>
Should fix #1365. Recent PR (#1348) added NEON accelerated code paths for ZIP filtering. But that code uses several instructions that are ARMv8 (aarch64) only, and thus fail building on 32-bit ARM (armv7) platforms. Make these optimizations only kick in when building for 64-bit ARM platforms.