[x86-64] Bad codegen for certain SIMD intrinsics

I am in the process of converting the x86-64 SIMD modules in libjpeg-turbo to compiler intrinsics (libjpeg-turbo/libjpeg-turbo#732).  However, I am encountering a problem whereby Clang/LLVM tries to outsmart me when it translates certain intrinsics into assembly code, and the resulting assembly code is often not smart at all.  Here is a good example:

https://godbolt.org/z/7YGrvGbnc

Clang translates the two intrinsics into seven assembly instructions, whereas GCC correctly translates them into the two assembly instructions that correspond to the intrinsics.  The result is that, when compiled with GCC, the intrinsics version of our color conversion algorithm performs as well as the NASM version, but when compiled with Clang, the intrinsics version regresses by 20-30%.

With AVX2, Clang translates the equivalent two intrinsics into two assembly instructions, but they are slower instructions than the instructions that correspond to the intrinsics:

https://godbolt.org/z/nzx7f16e7

If someone goes to the trouble of writing intrinsics that have a [documented 1:1 correspondence with assembly instructions](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html), it's because they are trying to talk to the hardware more directly.  The compiler really shouldn't second-guess them in that case.

Is there a way to disable this behavior?

I have tried all of the `-O` options, to no avail.  I did observe that changing `-msse2` to `-mssse3` (or targeting any later SIMD instruction set, such as AVX2) causes Clang to compile `_mm_slli_si128()` and `_mm_unpackhi_epi8()` into `pshufd` and `punpcklbw` (emitted in their VEX-encoded forms, `vpshufd` and `vpunpcklbw`, when AVX or later is targeted) rather than `pslldq` and `punpckhbw`.  That behavior is inscrutable, though, since `pshufd` and `punpcklbw` are both SSE2 instructions.
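For anyone who can't open the Compiler Explorer links, here is a minimal sketch of the kind of two-intrinsic sequence I am talking about.  It is not the exact code from the links: the function name `shift_and_unpack`, the one-byte shift count, and the operand order are all assumptions chosen for illustration.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical reduction (not the code from the Compiler Explorer links):
 * shift one vector left by one byte, then interleave the high eight bytes
 * of the shifted vector with the high eight bytes of a second vector.
 */
void shift_and_unpack(const uint8_t *in_a, const uint8_t *in_b, uint8_t *out)
{
  __m128i a = _mm_loadu_si128((const __m128i *)in_a);
  __m128i b = _mm_loadu_si128((const __m128i *)in_b);

  /* Documented as corresponding to pslldq xmm, 1 */
  __m128i shifted = _mm_slli_si128(a, 1);
  /* Documented as corresponding to punpckhbw xmm, xmm */
  __m128i result = _mm_unpackhi_epi8(shifted, b);

  _mm_storeu_si128((__m128i *)out, result);
}
```

Compiling something like this with `-O2 -msse2 -S` under both compilers and comparing the output against the two documented instructions is how I have been checking the codegen.  Whether this particular reduction triggers the same rewrite as the linked examples is something I have not verified.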


[x86-64] Bad codegen for certain SIMD intrinsics #159670
