Improve FMA usage in laqr5 #681

Merged: 1 commit, Jun 25, 2022

Collaborator

@angsch angsch commented Jun 20, 2022

Description

Rearrange the application of the Householder reflector to save one instruction per dot product if fused multiply-add (FMA) is available. dlahqr already uses the proposed compute pattern.

Rearrange the application of the Householder reflector
to save one instruction per dot product if FMA is
available.

The update from the right, H * (I - tau * v * v**T),
for example, changes from
    H - (tau * (H * v)) * v**T
to
    H - (H * (v * tau)) * v**T.
The instruction savings are due to the special structure
of v, whose first component is implicitly one (and used
for storing tau).
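
The reordering can be sketched for a 3-element reflector v = (1, v2, v3) applied from the right to each row of H. This is a hypothetical Python illustration, not the Fortran in this PR: the function names and the exact grouping (folding tau into the scaled entries t1, t2, t3 once per reflector) are assumptions of the sketch. Python itself does not emit FMA instructions; the comments only count the multiply-add shaped operations that a compiler could fuse.

```python
def apply_right_scaled_sum(rows, v2, v3, tau):
    """Scale the dot product by tau first: h - (tau * (h . v)) * v**T."""
    out = []
    for h1, h2, h3 in rows:
        # two multiply-adds for the dot product, plus one extra multiply by tau
        s = tau * (h1 + v2 * h2 + v3 * h3)
        # one plain subtraction (v1 is implicitly 1) and two multiply-adds
        out.append((h1 - s, h2 - s * v2, h3 - s * v3))
    return out


def apply_right_prescaled(rows, v2, v3, tau):
    """Fold tau into the reflector once: h - (h . v) * (tau * v)**T."""
    # computed once per reflector, outside the loop over rows (assumed grouping)
    t1, t2, t3 = tau, tau * v2, tau * v3
    out = []
    for h1, h2, h3 in rows:
        # two multiply-adds, no separate multiply by tau
        s = h1 + v2 * h2 + v3 * h3
        # three multiply-adds
        out.append((h1 - s * t1, h2 - s * t2, h3 - s * t3))
    return out
```

Under these assumptions the per-row work drops from six operations to five when every multiply-add fuses. Both functions compute the same update in exact arithmetic; in floating point the results may differ by rounding.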

codecov bot commented Jun 20, 2022

Codecov Report

Merging #681 (6c53bb3) into master (f40d220) will not change coverage.
The diff coverage is 0.00%.

❗ Current head 6c53bb3 differs from pull request most recent head bd8f99b. Consider uploading reports for the commit bd8f99b to get more accurate results

@@           Coverage Diff           @@
##           master     #681   +/-   ##
=======================================
  Coverage    0.00%    0.00%           
=======================================
  Files        1894     1894           
  Lines      184062   184140   +78     
=======================================
- Misses     184062   184140   +78     
| Impacted Files | Coverage Δ |
|---|---|
| SRC/claqr5.f | 0.00% <0.00%> (ø) |
| SRC/dlaqr5.f | 0.00% <0.00%> (ø) |
| SRC/slaqr5.f | 0.00% <0.00%> (ø) |
| SRC/zlaqr5.f | 0.00% <0.00%> (ø) |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update f40d220...bd8f99b.

Collaborator

@thijssteel thijssteel left a comment

Nice PR. Wouldn't surprise me if these small changes result in a noticeable speedup.

H( K+1, K+1 ) = H( K+1, K+1 ) - REFSUM
H( K+2, K+1 ) = H( K+2, K+1 ) - REFSUM*V( 2, M )
H( K+3, K+1 ) = H( K+3, K+1 ) - REFSUM*V( 3, M )
T1 = CONJG( V( 1, M ) )
Collaborator

Since this is only a single column, this probably doesn't actually add anything performance-wise, but I like it for consistency.

Collaborator Author

Good spot, Thijs, I missed that one. I will upload a revision in the next few days. My experiments show an improvement for small problems; fewer flops are good for accuracy and performance. Let me use this opportunity to say thank you for your contribution with the optimal bulge packing. It's great work :-)

@langou langou merged commit 7d90a67 into Reference-LAPACK:master Jun 25, 2022
@angsch angsch deleted the laqr5 branch July 7, 2022 19:14
@julielangou julielangou added this to the LAPACK 3.11.0 milestone Nov 12, 2022