Implement sum and difference of products using fma #8

Open
fu5ha opened this issue Nov 5, 2019 · 6 comments
Labels
enhancement New feature or request

Comments

@fu5ha
Owner

fu5ha commented Nov 5, 2019

https://pharr.org/matt/blog/2019/11/03/difference-of-floats.html

@fu5ha fu5ha added the enhancement New feature or request label Nov 5, 2019
@joeftiger
Contributor

joeftiger commented Mar 23, 2021

Is this still wanted? I'm considering putting together a PR for this, but as far as I understand, it costs one additional instruction per method call:

mul add mul

vs

mul fma fma add

It would give higher precision (smaller rounding errors). However, I'm not sure whether, if it is introduced, it should also be used inside other API calls.

EDIT: First call could be fma mul

@fu5ha
Owner Author

fu5ha commented Mar 23, 2021

It shouldn't replace that pattern in other code wholesale. The idea is just to provide methods like sum_of_products(a, b, c, d) / difference_of_products(a, b, c, d).
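A minimal sketch of what such helpers could look like, following the FMA-based compensation scheme (attributed to Kahan) described in the blog post linked at the top of this issue. The free-function form and the use of f32 are assumptions for illustration, not ultraviolet's actual API. It matches the "mul fma fma add" instruction count mentioned above.

```rust
/// Computes `a * b - c * d`, compensating for the rounding error of `c * d`.
/// Hypothetical free function; the crate's real signature may differ.
pub fn difference_of_products(a: f32, b: f32, c: f32, d: f32) -> f32 {
    let cd = c * d; // rounded product (mul)
    let err = (-c).mul_add(d, cd); // cd - c*d with a single rounding (fma)
    let dop = a.mul_add(b, -cd); // a*b - cd with a single rounding (fma)
    dop + err // fold the correction back in (add)
}

/// Computes `a * b + c * d` with the same compensation idea.
pub fn sum_of_products(a: f32, b: f32, c: f32, d: f32) -> f32 {
    let cd = c * d; // rounded product (mul)
    let err = c.mul_add(d, -cd); // c*d - cd with a single rounding (fma)
    let sop = a.mul_add(b, cd); // a*b + cd with a single rounding (fma)
    sop + err // fold the correction back in (add)
}
```

The benefit shows up when the two products nearly cancel; for well-separated values the result is the same as the naive expression.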

@joeftiger
Contributor

As a side note:
I used FMA wherever possible in my ray tracer and cut the runtime of a benchmark scene from 02:10 min to 01:12 min. It really astounded me that the Rust compiler is not smart enough to figure this out by itself. The performance improvements from using FMA are really impressive.

@fu5ha
Owner Author

fu5ha commented Mar 23, 2021

There are a few reasons that Rust doesn't do that automatically. Foremost, FMA produces a different answer (it rounds once instead of twice), and the Rust compiler will never apply an optimization that changes the result. Also, the default Rust target does not include the fma feature, so the operation will compile to a call into libc instead of to an fma instruction directly. As such, when you are not compiling for a specific target, fma can actually make your code significantly slower.

Even in the case where it does successfully compile to an FMA instruction, the perf benefit isn't set in stone. It depends on the architecture and on the surrounding algorithm. If you have too many FMAs, you can saturate the limited number of FMA execution units and get worse performance because you've stalled the pipeline. I also suspect there are secondary reasons that converting to FMA gave that much of a speedup in your case: even in the best case, just switching to FMA shouldn't provide a 2x perf improvement. Perhaps the auto-vectorizer was able to understand your code better because of its new structure with FMAs and vectorized some hot code.
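A small sketch of the target-feature point, assuming a hypothetical helper named mul_add_if_fast. It only uses mul_add when the fma target feature is enabled at compile time (for example via RUSTFLAGS="-C target-feature=+fma" or "-C target-cpu=native" on x86_64), and otherwise falls back to a separate multiply and add. Note that the two branches can differ in the last bit, since the fused form rounds once and the fallback rounds twice.

```rust
/// Hypothetical helper: fused multiply-add only where it maps to a hardware
/// instruction, so builds without the `fma` feature avoid the slow library call.
#[inline]
pub fn mul_add_if_fast(a: f32, b: f32, c: f32) -> f32 {
    if cfg!(target_feature = "fma") {
        a.mul_add(b, c) // single rounding, expected to lower to an FMA instruction
    } else {
        a * b + c // two roundings, but no call into the math library
    }
}
```

Because cfg! is evaluated at compile time, the unused branch is expected to be optimized away.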

@joeftiger
Contributor

Nice insights above! Thank you for your input :-)

I noticed that ultraviolet already uses fma in some places, e.g. for the determinant of matrices.
Would it make sense to use fma for the vectors' mag_sq() function as well?
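To make the question concrete, here is a rough sketch of what an FMA-based squared magnitude could look like. The Vec3 struct and the method name mag_sq_fma are stand-ins for illustration, not ultraviolet's actual types or API.

```rust
/// Stand-in 3D vector type (an assumption, not ultraviolet's `Vec3`).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Vec3 {
    pub x: f32,
    pub y: f32,
    pub z: f32,
}

impl Vec3 {
    /// Squared magnitude as one mul followed by two chained fused multiply-adds:
    /// x*x + (y*y + z*z).
    pub fn mag_sq_fma(self) -> f32 {
        self.x.mul_add(self.x, self.y.mul_add(self.y, self.z * self.z))
    }
}
```

Whether this is actually faster would depend on the target features discussed above.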

@fu5ha
Owner Author

fu5ha commented Mar 28, 2021 via email
