-
Notifications
You must be signed in to change notification settings - Fork 444
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement xGEMMTR and cblas_xGEMMTR #887
Implement xGEMMTR and cblas_xGEMMTR #887
Conversation
149c24c
to
68ae7c8
Compare
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #887 +/- ##
=========================================
Coverage 0.00% 0.00%
=========================================
Files 1937 1918 -19
Lines 190566 188614 -1952
=========================================
+ Misses 190566 188614 -1952 ☔ View full report in Codecov by Sentry. |
Thanks Martin. Two parts. (1) overall in favor. (2) rant against this specific GEMMT routine. To help me: who knows usage of GEMMT? (1) Oh my. Adding a routine in the reference Level 3 BLAS? I am overall in favor since GEMMT already exists in many BLAS implementations. And I am happy to see Level 3 BLAS evolves and adapts. (2-1) I must say that overall I am not a big fan of this GEMMT routine. I wonder why people need it. It seems popular enough. In many libraries. (Thanks for the survey.) But why do users need it? What are some examples of usage? (2-2) I would rather see a routine like (2-3) I wonder whether users need (2-4) If we use (2-5) Also I do not like the fact that the symmetry of (2-6) Speaking of (2-7) One argument that I can see is for GEMMT is that the users as a |
And let me add. The name. GEMMT. I understand that this is institutionalized in several BLAS libraries. Fine with me. But I find GEMMT a very cryptic name. I would have not have guessed the functionality from the GEMMT name. |
I share the doubts about adding industry standard extensions to what is (or used to be?) seen as the reference implementation of something that was largely agreed on by working groups decades ago. Though to some extent that would hold for Reference-LAPACK as well - is it the classical reference if old algorithms get upgraded and new ones added, or is it the new and improved UCDenver LAPACK ? |
At least some library using it, is MUMPS 5.6.1 in
IMHO, only allowing D to be a symmetric tridiagonal is a too large restriction.
For the most application cases I could think off, the combined variant would be better. For example, when solving a Lyapunov equation with the Bartels-Stewart algorithm and the Schur-Decomposition, one ends up with these type of updates.
The is also an implementation of this functionality in https://github.com/SLICOT/SLICOT-Reference/blob/main/src/MB01RU.f, which is used in the case I mentioned in (2-3). But the approach is horribly slow.
Again, I could mention https://github.com/SLICOT/SLICOT-Reference/blob/main/src/MB01RU.f at this point.
I implemented the routine for sake of completeness, since many of the optimized libraries provide it, but not regarding the meaningful usage, which is truly a medium to big problem as you pointed out. Ignoring the BLAS-like extensions, that are common to almost all the optimized libraries, is also not a good way on the other hand. First, if the reference implementation provides it, one had a bigger chance that the optimized version behave the same way. Second, while developing stuff on top of BLAS and LAPACK, many developers (at least the ones I know), switch back to the reference implementation for debugging purpose. Once a BLAS-like extensions is used, this either requires code changes or a build-system, which checks for the availability of the extensions. Third, if applications/libraries use BLAS-like extensions, they can crash library exchange systems like |
Even tough the BLAS standard is now a very old one, one can slowly adjust it a bit to match the changed requirements by the users (as long a now old functionality gets disturbed). Furthermore, there are some |
Would it be thinkable that the reference implements |
One major issue with And if I had forgotten this "detail". |
From the top of my head, I can say that GEMMT is used by ABINIT and possibly other Computational Chemistry, Molecular Modelling, and Materials Science packages. I can have a look into other packages that use GEMMT if necessary. |
Yet two additional packages, PETSc and VASP. |
After some search, I came up with the following packages that depend on MUMPS, PETSc, and ABINIT which could possibly benefit from GEMMT.
|
I was about to type this a few days ago but got distracted by work. However if this function is going to be implemented in the reference, it should actually bring in some significant benefits beyond an implicit 2xDGEMM calls if possible in my opinion. |
(1) I find the fact that the input/output matrix In any case, I think, at this point the name of the subroutine is set. The reason to include the routine is that it is widely used. Changing the name now would defeat the reason of including it. |
I think by "this function" you mean Advantage of Advantage of |
Sorry for my terse style, I was typing way quicker than I can make sense. Indeed that's what I meant.
Indeed. And fully agree with your following points.
I think we can live with a name change because there is no precedence to it, and like you mentioned, the naming is already not optimal. LAPACK names are already cryptic enough so in my seriously humble opinion, there is no reason to carry on the legacy of unofficial sources or suboptimal decisions made elsewhere. There is enough legacy within LAPACK to live with and. For anyone that needs to switch to the reference implementation of this "GEMMT" they will need to modify code anyways. Thus I think it gives the freedom to correct this naming. But then again that's just my opinion. Functionality-wise similar routines might be requested such as triangular-triangular multiplication that comes in Sylvester/Schur related algorithms. Hence I'm not sure halving the matmul flops is a killer feature. But again, these are all personal takes, if folks are using it apparently they need it. |
I agree that "similar routines might be requested such as triangular-triangular multiplication". We also had needs for similar "weird" kernels in <T>LAPACK. We were thinking to give names like matmult_AtrBtrCtr where each matrix has each shape in the interface. (Here tr for triangular for example.) "I'm not sure halving the matmul flops is a killer feature." FLOPS is one argument, but there are also arguments for (1) less storage, (2) in-place vs out-of-place and (3) access of data. |
Joining the thread because I'm interested in this functionality. I'm working on an iterative linear system solver with @PTNobel based on a block Krylov subspace method. This method (and many like it) need to compute small square matrices of the form I have a feature request for the hypothetical function that exploits the known symmetry of (Please excuse the lazy notation above w.r.t. |
In cuBLAS, similar functionality is called Likewise in rocBLAS / hipBLAS, they are
for a Hermitian matrix C, computing only the lower or upper triangle.
but again computes only the lower or upper triangle. In general, A B^H would be non-Hermitian, but
where D is Hermitian. A specific instance of that is when D is diagonal, e.g., generating a Hermitian matrix from its EVD or computing the backward error of an eigenvalue problem,
As a completely different application, it is also useful to compute a triangular matrix, instead of a Hermitian matrix, when the user knows that one of the triangles will be zero. This could be used in I agree the Also consider if we had this routine in other formats like packed or RFP, obviously the |
Another use case is in the Polar Decomposition. The QDWH iteration converges to a unitary matrix U, and then computes a Hermitian matrix H = U^H A. Again, we have special knowledge that U^H A is Hermitian. This is distinct from the use cases of A B A^H, with B Hermitian. @luszczek suggests |
@mgates3 very nice application! Yes, indeed, if for a given matrix A, you know U, the unitary polar factor of A, then you can recover the Hermitian polar factor by doing U^H * A. That's a really neat application. Thanks for bringing this up. And indeed the QDWH iterations algorithm does need GEMMT. It converges to U, the unitary polar factor of A, and so, "after convergence," it computes the Hermitian factor with U^H * A, which is exactly GEMMT. This is a very nice application. Thanks for mentioning. |
We need to speak about names. The PR named the routine GEMMT. The names HERKX and GEMMT are already used for this functionality in some BLAS libraries. Then HERKMM is kind of nice and was suggested by @luszczek. Then I suggested something like HEGEGEMM. But @mgates3 mentioned that this can be used for triangular matrix too, so maybe we could call this TRGEGEMM. If you have a strong opinion, write it below. |
The triangular application is rather weird and probably rare. How often is A*B a triangular matrix? So I wouldn't name it based on triangular. Interpreting the output as Hermitian (or symmetric) is much more common. I'm not sure why you're proposing repeating the
I prefer some variant of |
Hi @grisuthedragon. Would you be able to merge to/rebase with the current master? The script for the Github Actions needed to be updated. Thanks! |
5183843
to
b014222
Compare
Done. |
So finally, it seems that only the naming is the problem. I see, that from the typical BLAS naming scheme, the As far as I collect the stuff from the discussion, there are the following options: Already in use:
Potential names by functionality:
Since the functionality already exists in some implementations, I would go for one of the existing names. |
After the discussion on Reference-LAPACK#887 the name changed from xGEMMT to xGEMMTR.
da1646c
to
0e37c5c
Compare
Everyone, please review this commit. Comments welcome. Please comment by Thursday June 27th. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just the minor Doxygen group fix. Otherwise it seems fine.
updated: * BLAS/SRC/sgemmtr.f * BLAS/SRC/zgemmtr.f * BLAS/SRC/cgemmtr.f * BLAS/SRC/dgemmtr.f
I changed and added the doxygen group. |
Description
The xGEMMT function implements the GEMM operation but only the lower or the upper triangle of C is accessed. The routine is provide by many optimized BLAS libraries but not by the reference implementation. Due to the fact, that projects like MUMPS used it makes it necessary to provide this function in the reference implementation. This enables projects, that used this function to in runtime environments that allow the exchange of the BLAS libraries, like update-alternative on Debian/Ubuntu or FlexiBLAS.
The xGEMMT function are provided for example by:
The PR includes subroutines, the tests, and the CBLAS interface.
EDIT After the discussion, the routine is now called
xGEMMTR
.Checklist