Fast round #421

Open
NamorNiradnug opened this issue May 21, 2024 · 3 comments
Labels
C-feature-request Category: a feature request, i.e. not implemented / a PR

Comments

NamorNiradnug commented May 21, 2024

The .round() function is very slow compared to the platform-native intrinsic on AVX (https://godbolt.org/z/3sdd9jrvW) because it provides platform-agnostic behavior. However, there are many use cases where the exact behavior on halfway values, infinities, or NaNs doesn't matter.

I think adding something like a round_fast function would be reasonable.
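For reference, here is a minimal sketch of the behavior that .round() pins down on the edge cases mentioned above. It uses scalar `f32::round` over a plain array rather than `Simd`, on the assumption that the SIMD version matches the scalar one lane by lane; `round_lanes` is a hypothetical helper name, not part of any library.

```rust
/// Hypothetical helper: applies scalar `f32::round` to each lane.
/// Assumption: a plain array stands in for a SIMD vector here.
fn round_lanes(v: [f32; 4]) -> [f32; 4] {
    // `f32::round` rounds halfway cases away from zero on every platform.
    v.map(f32::round)
}
```

With this, `round_lanes([0.5, 1.5, 2.5, -0.5])` gives `[1.0, 2.0, 3.0, -1.0]`, and infinities and NaN pass through unchanged. Guaranteeing the ties-away-from-zero part is what a single hardware rounding instruction typically does not do.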

NamorNiradnug added the C-feature-request Category: a feature request, i.e. not implemented / a PR label May 21, 2024
Member

programmerjake commented May 21, 2024

what you want is usually round_ties_even (not yet available on Simd), since that usually compiles to a single instruction

e.g.: https://godbolt.org/z/Tb8xvzqo7
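The linked snippet is not reproduced here. As a rough sketch of the same idea on stable Rust, one option is to apply the scalar `f32::round_ties_even` to each lane of an array; `round_ties_even_lanes` is a hypothetical name, and the array is only a stand-in for `Simd<f32, 8>`.

```rust
/// Hypothetical helper: rounds each lane to the nearest integer,
/// resolving halfway cases to the even neighbor.
/// Assumption: `[f32; 8]` stands in for `Simd<f32, 8>`.
fn round_ties_even_lanes(v: [f32; 8]) -> [f32; 8] {
    // Scalar `f32::round_ties_even`, applied lane by lane.
    v.map(f32::round_ties_even)
}
```

Halfway inputs such as 0.5, 1.5, and 2.5 become 0.0, 2.0, and 2.0. With a suitable target feature enabled, the compiler may lower a loop like this to a packed rounding instruction, though that is not guaranteed and is worth checking in the generated assembly.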

@NamorNiradnug
Author

> what you want is usually round_ties_even (not yet available on Simd), since that usually compiles to a single instruction

Still, there may be platforms where that's not the case, or where such an instruction is slower than another rounding instruction.

That said, at least NEON, AVX, and SSE (since SSE4.1) all have round_ties_even instructions.
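For a target without a dedicated instruction, one possible fallback is the well-known "magic constant" trick: adding and then subtracting 2^23 forces an f32 to drop its fractional bits using the current rounding mode. The sketch below is only an illustration under stated assumptions: the default round-to-nearest-even mode is active, intermediate results are not kept in extra precision, and `round_fast` is a hypothetical name borrowed from the request above, not an existing API.

```rust
/// Hypothetical `round_fast`: round to nearest, ties to even,
/// using only an add, a subtract, and sign manipulation.
/// Assumption: the default IEEE 754 rounding mode is in effect.
fn round_fast(x: f32) -> f32 {
    // 2^23: from here on, every finite f32 is already an integer.
    const MAGIC: f32 = 8_388_608.0;
    if x.abs() < MAGIC {
        // Give the constant the sign of x so negative inputs work too.
        let m = MAGIC.copysign(x);
        // The addition rounds away the fraction; the subtraction is exact.
        // The final copysign keeps the sign of results that round to zero.
        ((x + m) - m).copysign(x)
    } else {
        // Large values, infinities, and NaN are returned unchanged.
        x
    }
}
```

Written over scalars like this, the same add/subtract pattern maps directly onto ordinary vector arithmetic, which is why it is sometimes used where no rounding instruction exists. Whether it beats a native instruction on a given platform would have to be measured.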

@NamorNiradnug
Author

> e.g.: https://godbolt.org/z/Tb8xvzqo7

Thanks for the workaround!


2 participants