[FEATURE] deinterleave for arbitrary n #1190

DenisYaroshevskiy · 2022-01-22T19:11:33Z

I implemented deinterleave for n = 2 and n = 4, but every other n requires more work.

Relevant stack overflows:
https://stackoverflow.com/a/55932030/5021064
https://stackoverflow.com/a/69083795/5021064

andipeer · 2024-03-20T08:08:31Z

In our software for physics computations, we often need to work with 3D vectors stored as AoS (array of structures). Currently, we use the SSE intrinsics proposed in this article from Intel (I couldn't find it on their website anymore, but fortunately the Wayback Machine has it). We would like to get rid of our intrinsics code and use the EVE library instead, but as you stated, the interleaving functionality is currently not optimized for this task. Here is a godbolt comparison of the code generated by the intrinsics and by deinterleave_groups(...): https://godbolt.org/z/v56c5vj51 For the compiler that we are using (GCC 11), deinterleave_groups currently produces much longer and slower code.

Do you have any suggestions on how we could improve this using functionality that is already available in the EVE library? I tried to browse through the code, but from what I can tell, there is currently no high-level mapping for the _mm_shuffle_ps intrinsic to combine two registers using an arbitrary pattern. Would the way to go be to provide an SSE-specific implementation of deinterleave_groups_(...)? If so, I could try to create a pull request for it.
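For readers following along, here is a minimal scalar sketch of what "deinterleave for arbitrary n" means for AoS data such as x0 y0 z0 x1 y1 z1 ... It is plain C++ and not EVE code; the name deinterleave_ref is hypothetical and only serves as a reference against which a SIMD version could be checked.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar reference (hypothetical helper, not part of EVE):
// splits an interleaved stream a0 b0 c0 a1 b1 c1 ... into N separate channels.
template <std::size_t N>
std::array<std::vector<float>, N> deinterleave_ref(const std::vector<float>& in)
{
    static_assert(N > 0, "need at least one channel");
    assert(in.size() % N == 0 && "input must hold whole groups of N");

    const std::size_t count = in.size() / N;  // number of N-element groups
    std::array<std::vector<float>, N> out;
    for (auto& channel : out) channel.resize(count);

    // Element c of group i goes to position i of channel c.
    for (std::size_t i = 0; i < count; ++i)
        for (std::size_t c = 0; c < N; ++c)
            out[c][i] = in[i * N + c];

    return out;
}
```

Calling deinterleave_ref<3> on an array of 3D vectors returns the x, y and z channels as three separate arrays, which is the layout the SIMD code needs in registers.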

DenisYaroshevskiy · 2024-03-20T09:04:35Z

Hi!

Thanks for sharing, it's a pretty cool trick.
As a general problem this is very hard, especially the way this code does it, where it uses overlapping loads to achieve this effect.
A deinterleave_iterator that could do it this well is on the roadmap, but I can't even begin to guess when we might get to it.

What you want is to handwrite this function.
eve interoperates with intrinsics, no problem.

Here is the same code https://godbolt.org/z/WoqGWsKK3

I presume, if you are using eve, that you want an avx2 version of this code too. I don't know how to write that one.
If you want ARM portability, my first guess is the vld3q_f32 intrinsic.
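For context, here is a rough sketch of the kind of hand-written SSE routine being discussed: three loads followed by five _mm_shuffle_ps calls that turn four xyz vectors into x, y and z registers. It is not necessarily the code behind the godbolt links above (this version uses three non-overlapping loads), and the function name deinterleave3_x4 is hypothetical. A scalar branch is included as an assumption so that the sketch also builds on targets without SSE.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define DEINTERLEAVE3_HAS_SSE 1
#else
#define DEINTERLEAVE3_HAS_SSE 0
#endif

// Hypothetical helper: reads 12 floats laid out as x0 y0 z0 x1 y1 z1 x2 y2 z2 x3 y3 z3
// and writes 4 floats each to x, y and z.
inline void deinterleave3_x4(const float* p, float* x, float* y, float* z)
{
    assert(p && x && y && z);
#if DEINTERLEAVE3_HAS_SSE
    const __m128 x0y0z0x1 = _mm_loadu_ps(p + 0);
    const __m128 y1z1x2y2 = _mm_loadu_ps(p + 4);
    const __m128 z2x3y3z3 = _mm_loadu_ps(p + 8);

    // _mm_shuffle_ps(a, b, _MM_SHUFFLE(d, c, b_, a_)) yields { a[a_], a[b_], b[c], b[d] }.
    const __m128 x2y2x3y3 = _mm_shuffle_ps(y1z1x2y2, z2x3y3z3, _MM_SHUFFLE(2, 1, 3, 2));
    const __m128 y0z0y1z1 = _mm_shuffle_ps(x0y0z0x1, y1z1x2y2, _MM_SHUFFLE(1, 0, 2, 1));

    const __m128 xs = _mm_shuffle_ps(x0y0z0x1, x2y2x3y3, _MM_SHUFFLE(2, 0, 3, 0)); // x0 x1 x2 x3
    const __m128 ys = _mm_shuffle_ps(y0z0y1z1, x2y2x3y3, _MM_SHUFFLE(3, 1, 2, 0)); // y0 y1 y2 y3
    const __m128 zs = _mm_shuffle_ps(y0z0y1z1, z2x3y3z3, _MM_SHUFFLE(3, 0, 3, 1)); // z0 z1 z2 z3

    _mm_storeu_ps(x, xs);
    _mm_storeu_ps(y, ys);
    _mm_storeu_ps(z, zs);
#else
    // Scalar fallback for targets without SSE.
    for (std::size_t i = 0; i < 4; ++i) {
        x[i] = p[3 * i + 0];
        y[i] = p[3 * i + 1];
        z[i] = p[3 * i + 2];
    }
#endif
}
```

The resulting __m128 values are what one would then hand over to the library's register types. How exactly that hand-over looks in EVE is not shown here, since the thread only states that the two interoperate.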

jfalcou · 2024-03-20T10:05:16Z

Depending on how your original code deals with its data, using a wide of structs stored in an SoA vector may be another way to do the migration.

DenisYaroshevskiy · 2024-03-20T10:38:12Z

+1 to Joel, can you maybe change storage format? You'd get very good perf
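To illustrate why changing the storage format helps, here is a small plain C++ sketch of an SoA layout. It deliberately does not use EVE's own containers; Vec3SoA and squared_norms are hypothetical names. The point is only that each component is already contiguous, so a vectorized loop can load x, y and z directly and needs no shuffles.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical SoA container: one contiguous array per component.
struct Vec3SoA {
    std::vector<float> x, y, z;

    explicit Vec3SoA(std::size_t n) : x(n), y(n), z(n) {}
    std::size_t size() const { return x.size(); }
};

// Squared length of every vector. All three inputs are read with unit stride,
// so no deinterleaving step is needed before the arithmetic.
inline std::vector<float> squared_norms(const Vec3SoA& v)
{
    assert(v.y.size() == v.size() && v.z.size() == v.size());

    std::vector<float> out(v.size());
    for (std::size_t i = 0; i < v.size(); ++i)
        out[i] = v.x[i] * v.x[i] + v.y[i] * v.y[i] + v.z[i] * v.z[i];
    return out;
}
```

The trade-off is the one raised in the thread: every place that currently indexes a single xyz struct has to change.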

andipeer · 2024-03-20T14:00:08Z

Thanks for your quick and very detailed answers! I agree that storing the data as SoA would be the best solution, but of course it is also the one that would require the most changes to our code base. So for now, we will probably stick with AoS, continue to use the intrinsics for SSE, and keep deinterleave_groups as a fallback for architectures where we currently don't have a hand-optimized version. The plan is to add AVX2 support in the near future, and maybe ARM in the long term.

Just want to mention that I really enjoy working with EVE! I've tried several other libraries, but none of those provided such a smooth experience when porting the code. Keep up the great work!

DenisYaroshevskiy added the feature New feature or request label Jan 22, 2022

jfalcou mentioned this issue Feb 9, 2023

[ROADMAP] Shuffle Issues #1554

Open

5 tasks
