[FEATURE] deinterleave for arbitrary n #1190
Comments
In our software for physics computations, we often need to work with 3D vectors stored as AoS. Currently, we use the SSE intrinsics proposed in this article from Intel (I couldn't find it on their website anymore, but fortunately the Wayback Machine has it). We would like to get rid of our intrinsics code and use the EVE library instead, but as you stated, the interleaving functionality is currently not optimized for this task. Here is a Godbolt comparison of the code generated by the intrinsics and by Do you have any suggestions on how we could improve this using functionality that is already available in the EVE library? I tried to browse through the code, but from what I can tell, there is currently no high-level mapping for the
Hi!
What you want is to handwrite this function. Here is the same code: https://godbolt.org/z/WoqGWsKK3
I presume, if you are using EVE, that you want an AVX2 version of this code too. I don't know how to write that one.
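For readers without the Godbolt link at hand, here is a minimal sketch of what a handwritten 3-way deinterleave can look like with plain SSE. It uses the common two-stage `_mm_shuffle_ps` approach and is not necessarily identical to the linked code or to the Intel article; the function name `deinterleave3_sse` and its pointer-based signature are assumptions made for illustration only.

```cpp
#include <cassert>
#include <xmmintrin.h>  // SSE: _mm_loadu_ps, _mm_shuffle_ps, _mm_storeu_ps

// Hypothetical helper (not part of EVE): deinterleave four packed xyz
// triples (12 floats, AoS) into x[4], y[4], z[4] (SoA).
inline void deinterleave3_sse(const float* aos, float* x, float* y, float* z) {
    assert(aos && x && y && z);

    // Three unaligned loads cover the 12 input floats.
    const __m128 x0y0z0x1 = _mm_loadu_ps(aos + 0);
    const __m128 y1z1x2y2 = _mm_loadu_ps(aos + 4);
    const __m128 z2x3y3z3 = _mm_loadu_ps(aos + 8);

    // Stage 1: build two intermediates that pair up neighbouring fields.
    const __m128 x2y2x3y3 = _mm_shuffle_ps(y1z1x2y2, z2x3y3z3, _MM_SHUFFLE(2, 1, 3, 2));
    const __m128 y0z0y1z1 = _mm_shuffle_ps(x0y0z0x1, y1z1x2y2, _MM_SHUFFLE(1, 0, 2, 1));

    // Stage 2: pick one field per output register.
    const __m128 xs = _mm_shuffle_ps(x0y0z0x1, x2y2x3y3, _MM_SHUFFLE(2, 0, 3, 0));
    const __m128 ys = _mm_shuffle_ps(y0z0y1z1, x2y2x3y3, _MM_SHUFFLE(3, 1, 2, 0));
    const __m128 zs = _mm_shuffle_ps(y0z0y1z1, z2x3y3z3, _MM_SHUFFLE(3, 0, 3, 1));

    _mm_storeu_ps(x, xs);
    _mm_storeu_ps(y, ys);
    _mm_storeu_ps(z, zs);
}
```

This costs three loads and five shuffles per four vectors. An AVX2 variant would additionally have to deal with the 128-bit lane boundary, which this sketch does not attempt.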
Depending on how your original code deals with its data, using a wide of structs in an SoA vector may be another way to do the migration.
+1 to Joel, can you maybe change the storage format? You'd get very good perf.
Thanks for your quick and very detailed answers! I agree that storing the data as SoA would be the best solution, but of course it is also the one that would require the most changes to our code base. So for now, we will probably stick with AoS and continue to use the intrinsics for SSE, and have Just want to mention that I really enjoy working with EVE! I've tried several other libraries, but none of those provided such a smooth experience when porting the code. Keep up the great work!
I implemented deinterleave for n = 2 and n = 4, but all other values of n require more work.
Relevant Stack Overflow answers:
https://stackoverflow.com/a/55932030/5021064
https://stackoverflow.com/a/69083795/5021064
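To pin down what "deinterleave for arbitrary n" is expected to compute, here is a scalar reference sketch. It only states the semantics an optimized version would be checked against and is not EVE code; the name `deinterleave_ref` and the `std::vector`-based interface are assumptions for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scalar reference (not part of EVE): split an interleaved
// sequence with N fields per element into N separate streams, such that
// out[k][i] == in[i * N + k].
template <std::size_t N, typename T>
std::array<std::vector<T>, N> deinterleave_ref(const std::vector<T>& in) {
    static_assert(N > 0, "need at least one field per element");
    assert(in.size() % N == 0 && "input must hold whole elements");

    const std::size_t count = in.size() / N;
    std::array<std::vector<T>, N> out;
    for (auto& stream : out) stream.resize(count);

    for (std::size_t i = 0; i < count; ++i)      // element index
        for (std::size_t k = 0; k < N; ++k)      // field index within element
            out[k][i] = in[i * N + k];
    return out;
}
```

For n = 3 and the input `{x0, y0, z0, x1, y1, z1, ...}` this yields one stream each for x, y and z, which is the AoS-to-SoA conversion discussed in the comments.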