Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix ARMv7 build by making recent ZIP NEON optimizations be ARMv8 (aarch64) only #1366

Merged
merged 1 commit into from
Mar 20, 2023

Conversation

aras-p
Copy link
Contributor

@aras-p aras-p commented Mar 20, 2023

Should fix #1365. Recent PR (#1348) added NEON accelerated code paths for ZIP filtering. But that code uses several instructions that are ARMv8 (aarch64) only, and thus fail building on 32-bit ARM (armv7) platforms. Make these optimizations only kick in when building for 64-bit ARM platforms.

…ch64) only

Should fix AcademySoftwareFoundation#1365. Recent PR (AcademySoftwareFoundation#1348) added NEON accelerated code paths
for ZIP filtering. But that code uses several instructions that are
ARMv8 (aarch64) only, and thus fail building on 32-bit ARM (armv7)
platforms. Make these optimizations only kick in when building
for 64-bit ARM platforms.

Signed-off-by: Aras Pranckevicius <aras@nesnausk.org>
@lgritz
Copy link
Contributor

lgritz commented Mar 20, 2023

Aside, not really to Aras per se, but more a "note to the rest of the OpenEXR team, while this topic is on our mind":

In OIIO/OSL, the approach I took was to make simd wrapper classes -- within each method of which are #if clauses for SSE, AVX, NEON, or no SIMD -- and that header is the only place where any simd intrinsics can be found in the entire code base. I had to do this for my own sanity, as even with the experience of having written this, I can never remember what the cryptic intrinsic names mean, so code littered with them is just gobbledygook to me unless I have the intrinsic man pages open and am looking them up line by line. And then if I find something like this issue (I've used the wrong intrinsic, etc.) it's usually isolated to a single place in one header to fix, even if it is ultimately called all over the place. It also means that if a new ISA comes out (e.g. a new set of AVX extensions, or porting to ARM), I can merely add some new #if clauses in that header and don't need to recode any actual "algorithms".

I'm not necessarily advocating copying/using any of my code, and in fact there's probably plenty to criticize about my specific implementation. (As well as a variety of alternate implementations open sourced, with varying levels of complexity.) Just saying that isolating it in one header behind some simple wrapper classes can have big maintainability benefits.

@@ -46,6 +46,10 @@
# define IMF_HAVE_NEON
#endif

#if defined(__aarch64__)
# define IMF_HAVE_NEON_AARCH64 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm groping in the dark without an AArch64 machine, but don't you mean the #define IMF_HAVE_NEON_AARCH64 to be inside the #if defined(__ARM_NEON) above? Don't you want this only if __ARM_NEON and __aarch64__? Or does __aarch64__ sufficient itself?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

64-bit ARM makes NEON the standard feature that's "always there", so no need for a separate check for it. Very similar to how x64 made SSE2 be guaranteed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, thanks.

#if defined(__ARM_NEON)
# define IMF_HAVE_NEON 1
#if defined(__aarch64__)
# define IMF_HAVE_NEON_AARCH64 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this duplicate what's in ImfSimd.h above?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but internal_zip.c already did this for all SIMD sets including SSE. I don't know the reasoning, just went with the flow.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, this file doesn't include ImfSimd.h, so that's appropriate.

@aras-p aras-p closed this Mar 20, 2023
@aras-p
Copy link
Contributor Author

aras-p commented Mar 20, 2023

was to make simd wrapper classes -- within each method of which are #if clauses for SSE, AVX, NEON, or no SIMD -- and that header is the only place where any simd intrinsics can be found in the entire code base

Yeah that makes a lot of sense. OpenEXR does not use SIMD extensively (yet?), and for a very limited use the "SIMD wrapper" can be really short and compact. E.g. for my recent investigations into floating point data compression, handing both SSE and NEON in a single wrapper was total under 100 lines of code (simd.h). ASTC encoder also has a much more exntesive and fairly nice wrapper, astcenc_vecmathlib.h and friends.

@aras-p aras-p reopened this Mar 20, 2023
@cary-ilm cary-ilm merged commit f29c01b into AcademySoftwareFoundation:main Mar 20, 2023
cary-ilm pushed a commit to cary-ilm/openexr that referenced this pull request Mar 26, 2023
…ch64) only (AcademySoftwareFoundation#1366)

Should fix AcademySoftwareFoundation#1365. Recent PR (AcademySoftwareFoundation#1348) added NEON accelerated code paths
for ZIP filtering. But that code uses several instructions that are
ARMv8 (aarch64) only, and thus fail building on 32-bit ARM (armv7)
platforms. Make these optimizations only kick in when building
for 64-bit ARM platforms.

Signed-off-by: Aras Pranckevicius <aras@nesnausk.org>
cary-ilm pushed a commit to cary-ilm/openexr that referenced this pull request Mar 26, 2023
…ch64) only (AcademySoftwareFoundation#1366)

Should fix AcademySoftwareFoundation#1365. Recent PR (AcademySoftwareFoundation#1348) added NEON accelerated code paths
for ZIP filtering. But that code uses several instructions that are
ARMv8 (aarch64) only, and thus fail building on 32-bit ARM (armv7)
platforms. Make these optimizations only kick in when building
for 64-bit ARM platforms.

Signed-off-by: Aras Pranckevicius <aras@nesnausk.org>
cary-ilm pushed a commit that referenced this pull request Mar 28, 2023
…ch64) only (#1366)

Should fix #1365. Recent PR (#1348) added NEON accelerated code paths
for ZIP filtering. But that code uses several instructions that are
ARMv8 (aarch64) only, and thus fail building on 32-bit ARM (armv7)
platforms. Make these optimizations only kick in when building
for 64-bit ARM platforms.

Signed-off-by: Aras Pranckevicius <aras@nesnausk.org>
@cary-ilm cary-ilm added the v3.1.7 label Jul 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

openexr 3.1.6 regression: does not compile on ARM v7 due to unguarded use of unavailable AARCH64 intrinsics
3 participants