Add support for faster shuffles #280

Open
velvia opened this issue Apr 2, 2020 · 2 comments
Labels
Enhancement New feature or request

Comments


velvia commented Apr 2, 2020

Currently, shuffle1_dyn on u32x8 is not optimized: the fallback path is used, which results in a whole mess of extract intrinsics. It is not very fast.

Can we please add support for _mm256_permutevar8x32_epi32 and similar variants at the u32x8 (and f32x8, etc.) levels? It is a fairly large speedup.
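For illustration, here is a minimal sketch of what such a fast path could look like when written directly against std::arch. It is not the crate's implementation: the function names (shuffle_dyn, shuffle_dyn_avx2, shuffle_dyn_scalar) and the use of plain [u32; 8] arrays in place of u32x8 are assumptions made to keep the example self-contained. It checks for AVX2 at runtime and otherwise falls back to a scalar loop with the same lane-selection rule (only the low 3 bits of each index are used).

```rust
/// Scalar reference (hypothetical helper): out[i] = src[idx[i] & 7].
/// Masking with 7 mirrors the AVX2 permute, which only reads the
/// low 3 bits of each 32-bit index.
fn shuffle_dyn_scalar(src: [u32; 8], idx: [u32; 8]) -> [u32; 8] {
    let mut out = [0u32; 8];
    for i in 0..8 {
        out[i] = src[(idx[i] & 7) as usize];
    }
    out
}

/// AVX2 path (hypothetical helper): one cross-lane permute instruction
/// instead of eight extract/insert pairs.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn shuffle_dyn_avx2(src: [u32; 8], idx: [u32; 8]) -> [u32; 8] {
    use std::arch::x86_64::*;
    // Unaligned loads, since [u32; 8] is only 4-byte aligned.
    let a = _mm256_loadu_si256(src.as_ptr() as *const __m256i);
    let i = _mm256_loadu_si256(idx.as_ptr() as *const __m256i);
    // First argument is the data vector, second is the index vector.
    let r = _mm256_permutevar8x32_epi32(a, i);
    let mut out = [0u32; 8];
    _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, r);
    out
}

/// Dispatches to the AVX2 path when the CPU supports it,
/// otherwise uses the scalar fallback.
pub fn shuffle_dyn(src: [u32; 8], idx: [u32; 8]) -> [u32; 8] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // Safe to call: AVX2 support was just verified at runtime.
            return unsafe { shuffle_dyn_avx2(src, idx) };
        }
    }
    shuffle_dyn_scalar(src, idx)
}
```

Calling shuffle_dyn with indices [7, 6, 5, 4, 3, 2, 1, 0] reverses the eight lanes. An f32x8 version would presumably follow the same shape using _mm256_permutevar8x32_ps.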

Thanks

@Lokathor Lokathor added the Enhancement New feature or request label Sep 22, 2020

aldanor commented Dec 25, 2020

Wondering about this as well (it's 30x slower than what it should be, without warning the user).

(should this be posted to stdsimd repo?)

Lokathor (Contributor) commented

Yes, all development has moved there.
