Fix ARMv7 build by making recent ZIP NEON optimizations be ARMv8 (aarch64) only #1366

aras-p · 2023-03-20T17:32:16Z

Should fix #1365. A recent PR (#1348) added NEON-accelerated code paths for ZIP filtering. But that code uses several instructions that are ARMv8 (aarch64) only, and thus fails to build on 32-bit ARM (armv7) platforms. Make these optimizations only kick in when building for 64-bit ARM platforms.

…ch64) only Should fix AcademySoftwareFoundation#1365. Recent PR (AcademySoftwareFoundation#1348) added NEON accelerated code paths for ZIP filtering. But that code uses several instructions that are ARMv8 (aarch64) only, and thus fail building on 32-bit ARM (armv7) platforms. Make these optimizations only kick in when building for 64-bit ARM platforms. Signed-off-by: Aras Pranckevicius <aras@nesnausk.org>

lgritz · 2023-03-20T17:58:39Z

Aside, not really to Aras per se, but more a "note to the rest of the OpenEXR team, while this topic is on our mind":

In OIIO/OSL, the approach I took was to make simd wrapper classes -- within each method of which are #if clauses for SSE, AVX, NEON, or no SIMD -- and that header is the only place where any simd intrinsics can be found in the entire code base. I had to do this for my own sanity, as even with the experience of having written this, I can never remember what the cryptic intrinsic names mean, so code littered with them is just gobbledygook to me unless I have the intrinsic man pages open and am looking them up line by line. And then if I find something like this issue (I've used the wrong intrinsic, etc.) it's usually isolated to a single place in one header to fix, even if it is ultimately called all over the place. It also means that if a new ISA comes out (e.g. a new set of AVX extensions, or porting to ARM), I can merely add some new #if clauses in that header and don't need to recode any actual "algorithms".

I'm not necessarily advocating copying/using any of my code, and in fact there's probably plenty to criticize about my specific implementation. (As well as a variety of alternate implementations open sourced, with varying levels of complexity.) Just saying that isolating it in one header behind some simple wrapper classes can have big maintainability benefits.
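
For readers unfamiliar with the pattern described above, here is a minimal sketch of what such a wrapper could look like. It is not code from OIIO, OSL, or OpenEXR: the `Float4` type, the `SKETCH_*` macros, and the `add_arrays` function are all hypothetical names chosen for illustration. The only real APIs used are the standard SSE and NEON intrinsics for unaligned load, store, and add.

```cpp
#include <cstddef>

// Pick one backend at compile time. The SKETCH_* macros are hypothetical.
#if defined(__SSE2__) || defined(_M_X64)
#    include <emmintrin.h>
#    define SKETCH_SSE2 1
#elif defined(__ARM_NEON)
#    include <arm_neon.h>
#    define SKETCH_NEON 1
#endif

// Hypothetical 4-wide float wrapper. All intrinsics live in this one type.
struct Float4
{
#if defined(SKETCH_SSE2)
    __m128 v;
#elif defined(SKETCH_NEON)
    float32x4_t v;
#else
    float v[4]; // scalar fallback when no SIMD is available
#endif

    // Load four floats from (possibly unaligned) memory.
    static Float4 load (const float* p)
    {
        Float4 r;
#if defined(SKETCH_SSE2)
        r.v = _mm_loadu_ps (p);
#elif defined(SKETCH_NEON)
        r.v = vld1q_f32 (p);
#else
        for (int i = 0; i < 4; ++i) r.v[i] = p[i];
#endif
        return r;
    }

    // Store four floats to (possibly unaligned) memory.
    void store (float* p) const
    {
#if defined(SKETCH_SSE2)
        _mm_storeu_ps (p, v);
#elif defined(SKETCH_NEON)
        vst1q_f32 (p, v);
#else
        for (int i = 0; i < 4; ++i) p[i] = v[i];
#endif
    }
};

// Lane-wise addition.
inline Float4 operator+ (Float4 a, Float4 b)
{
    Float4 r;
#if defined(SKETCH_SSE2)
    r.v = _mm_add_ps (a.v, b.v);
#elif defined(SKETCH_NEON)
    r.v = vaddq_f32 (a.v, b.v);
#else
    for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] + b.v[i];
#endif
    return r;
}

// An "algorithm" written once against the wrapper, with no intrinsics in it.
inline void add_arrays (const float* a, const float* b, float* out, std::size_t n)
{
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4)
        (Float4::load (a + i) + Float4::load (b + i)).store (out + i);
    for (; i < n; ++i) // scalar tail for the leftover elements
        out[i] = a[i] + b[i];
}
```

Under this layout, supporting another instruction set would mean adding one more branch to each `#if` chain, while `add_arrays` and any other caller stay unchanged.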

cary-ilm · 2023-03-20T18:17:58Z

src/lib/OpenEXR/ImfSimd.h

@@ -46,6 +46,10 @@
 #    define IMF_HAVE_NEON
 #endif

+#if defined(__aarch64__)
+#    define IMF_HAVE_NEON_AARCH64 1


I'm groping in the dark without an AArch64 machine, but don't you mean the #define IMF_HAVE_NEON_AARCH64 to be inside the #if defined(__ARM_NEON) above? Don't you want this only if __ARM_NEON and __aarch64__? Or is __aarch64__ sufficient by itself?

64-bit ARM makes NEON the standard feature that's "always there", so no need for a separate check for it. Very similar to how x64 made SSE2 be guaranteed.

Got it, thanks.
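
To illustrate the two-level guard discussed in this thread, here is a small sketch. It is not the code from this PR: the `SKETCH_*` macros and the `sum4` function are hypothetical, and `vaddvq_u32` is used only as a well-known example of an intrinsic that exists on AArch64 but not on 32-bit ARM. The thread does not say which intrinsics the ZIP code actually relies on.

```cpp
#include <cstdint>

// Baseline NEON: available on ARMv7 (when enabled) and on AArch64.
#if defined(__ARM_NEON)
#    include <arm_neon.h>
#    define SKETCH_HAVE_NEON 1
#endif

// AArch64 always has NEON, so checking __aarch64__ alone is enough here.
#if defined(__aarch64__)
#    include <arm_neon.h>
#    define SKETCH_HAVE_NEON_AARCH64 1
#endif

// Sum four 32-bit unsigned integers.
inline std::uint32_t sum4 (const std::uint32_t* p)
{
#if defined(SKETCH_HAVE_NEON_AARCH64)
    // Across-vector add: AArch64 only, would not compile for armv7.
    return vaddvq_u32 (vld1q_u32 (p));
#elif defined(SKETCH_HAVE_NEON)
    // ARMv7-compatible NEON: fold the vector with pairwise adds.
    uint32x4_t v = vld1q_u32 (p);
    uint32x2_t s = vadd_u32 (vget_low_u32 (v), vget_high_u32 (v));
    s            = vpadd_u32 (s, s);
    return vget_lane_u32 (s, 0);
#else
    // Portable fallback for every other target.
    return p[0] + p[1] + p[2] + p[3];
#endif
}
```

The ordering matters: the AArch64 branch is tested first, so 64-bit builds take the faster path while armv7 builds still compile and use the baseline NEON path.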

cary-ilm · 2023-03-20T18:18:04Z

src/lib/OpenEXRCore/internal_zip.c

-#if defined(__ARM_NEON)
-#    define IMF_HAVE_NEON 1
+#if defined(__aarch64__)
+#    define IMF_HAVE_NEON_AARCH64 1


Doesn't this duplicate what's in ImfSimd.h above?

Yes, but internal_zip.c already did this for all SIMD sets including SSE. I don't know the reasoning, just went with the flow.

Right, this file doesn't include ImfSimd.h, so that's appropriate.

aras-p · 2023-03-20T19:56:19Z

was to make simd wrapper classes -- within each method of which are #if clauses for SSE, AVX, NEON, or no SIMD -- and that header is the only place where any simd intrinsics can be found in the entire code base

Yeah, that makes a lot of sense. OpenEXR does not use SIMD extensively (yet?), and for very limited use the "SIMD wrapper" can be really short and compact. E.g. for my recent investigations into floating point data compression, handling both SSE and NEON in a single wrapper totaled under 100 lines of code (simd.h). The ASTC encoder also has a much more extensive and fairly nice wrapper, astcenc_vecmathlib.h and friends.

aras-p force-pushed the neon-fix-armv7 branch from e6f298d to a41a736 on March 20, 2023 17:33

lgritz approved these changes Mar 20, 2023

cary-ilm reviewed Mar 20, 2023

aras-p closed this Mar 20, 2023

aras-p reopened this Mar 20, 2023

cary-ilm merged commit f29c01b into AcademySoftwareFoundation:main Mar 20, 2023

This was referenced Mar 21, 2023

openexr 3.1.6 regression #2: ImfDwaCompressor does not compile on ARM v7 due to unguarded use of unavailable AARCH64 intrinsics #1367

Closed

Fix ARMv7 build for DwaCompressor, too. #1368

Merged

cary-ilm added the v3.1.7 label Jul 9, 2023
