Consistently use FMA in neon butterfly3 #136

Merged 1 commit on Feb 29, 2024


ejmahler
Owner

Small PR that makes explicit use of FMA in butterfly3.

Mild performance gains: 5-10% for the smallest butterflies, with smaller gains the more non-butterfly3 work the FFT is doing. Still a clear win.

It's clear that the compiler doesn't automate mul -> add into a FMA, and I notice that the neon prime butterflies don't make any explicit use of FMA, so we stand to gain a lot by rewriting them to use FMA explicitly. That means updating the automated script, though, which is a bigger task, so it's not included here.
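For readers who want to see where the fused operations land, here is a minimal scalar sketch of a size-3 butterfly written with `f32::mul_add`. It is not the code from this PR: `Complex32` and `butterfly3_fma` are hypothetical names standing in for one SIMD lane, and the NEON version would presumably use an intrinsic such as `vfmaq_f32` in the same positions.

```rust
/// Hypothetical scalar stand-in for one SIMD lane (not RustFFT's actual type).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Complex32 {
    pub re: f32,
    pub im: f32,
}

/// Size-3 DFT with every multiply-then-add written as an explicit FMA.
/// `tw` is assumed to be the twiddle exp(-2*pi*i/3) = (-0.5, -sqrt(3)/2).
pub fn butterfly3_fma(x: [Complex32; 3], tw: Complex32) -> [Complex32; 3] {
    let sum = Complex32 { re: x[1].re + x[2].re, im: x[1].im + x[2].im };
    let diff = Complex32 { re: x[1].re - x[2].re, im: x[1].im - x[2].im };

    // out0 = x0 + x1 + x2 (no multiply, so nothing to fuse)
    let out0 = Complex32 { re: x[0].re + sum.re, im: x[0].im + sum.im };

    // temp = x0 + tw.re * sum: one FMA per component instead of mul + add
    let temp = Complex32 {
        re: tw.re.mul_add(sum.re, x[0].re),
        im: tw.re.mul_add(sum.im, x[0].im),
    };

    // out1 = temp + i * tw.im * diff, out2 = temp - i * tw.im * diff.
    // Multiplying by i maps (re, im) to (-im, re), folded into the FMA operands.
    let out1 = Complex32 {
        re: (-tw.im).mul_add(diff.im, temp.re),
        im: tw.im.mul_add(diff.re, temp.im),
    };
    let out2 = Complex32 {
        re: tw.im.mul_add(diff.im, temp.re),
        im: (-tw.im).mul_add(diff.re, temp.im),
    };

    [out0, out1, out2]
}
```

Calling `butterfly3_fma` with a unit impulse in the second slot should return `[1, tw, conj(tw)]`, which is a quick sanity check that the operands are wired correctly.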

@ejmahler ejmahler merged commit e9c4ec2 into master Feb 29, 2024
18 checks passed
@ejmahler ejmahler deleted the neon-fma-butterfly3 branch February 29, 2024 04:17
@HEnquist
Contributor

Nice!

> It's clear that the compiler doesn't automate mul -> add into a FMA,

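Yes, compilers don't usually turn a mul followed by an add into an FMA automatically. An FMA rounds once instead of twice, so the result is a tiny bit more accurate, but that also means it isn't bit-for-bit identical to the separate mul and add, so the two aren't completely equivalent.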
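A small sketch of that difference, using `f64::mul_add` from the standard library. The helper names (`pow2_neg`, `mul_then_add_vs_fma`, `rounding_demo`) and the input values are made up for illustration; the inputs are chosen so that the product needs more bits than an `f64` mantissa can hold.

```rust
/// Exact 2^-k for k < 64, built without relying on `powi` precision.
pub fn pow2_neg(k: u32) -> f64 {
    1.0 / ((1u64 << k) as f64)
}

/// Computes a*b + c two ways: (separate mul then add, fused mul_add).
pub fn mul_then_add_vs_fma(a: f64, b: f64, c: f64) -> (f64, f64) {
    let separate = a * b + c; // product is rounded, then the sum is rounded
    let fused = a.mul_add(b, c); // exact product, a single rounding at the end
    (separate, fused)
}

/// With a = 1 + 2^-30, the exact square is 1 + 2^-29 + 2^-60.
/// The 2^-60 term does not fit in an f64 next to 1.0, so the separate
/// multiply drops it before the add; the fused version keeps it.
pub fn rounding_demo() -> (f64, f64) {
    let a = 1.0 + pow2_neg(30);
    let c = -(1.0 + pow2_neg(29));
    mul_then_add_vs_fma(a, a, c)
}
```

Here the separate path cancels to exactly zero while the fused path keeps the low-order term, which is the "one less rounding" effect in its most visible form. In an FFT the differences are normally far smaller, on the order of the last bit.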
