-
Notifications
You must be signed in to change notification settings - Fork 99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
getrs implementation #338
getrs implementation #338
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks fine if they pass the unit tests.
@ndellingwood could we include this PR in the promotion ? This won't hurt trilinos integration testing. |
Thanks @kyungjoo-kim . Unit tests passed. |
@vqd8a Did you also check the cmake modification to include unit tests ? src directory should be fine as they are merely hpp files and they are grabbed by *.hpp. However, the testing files may need to be included manually in the cmake. |
//Second, compute X by solving the system U*X = Y for X | ||
SerialTrsm<Side::Left,Uplo::Upper,Trans::NoTranspose,Diag::NonUnit,Algo::Trsm::Unblocked>::invoke(one, A, B); | ||
|
||
return 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why return an error code if it's always zero? It would be better to follow the LAPACK error code convention here, for example by returning an error code instead of assert
ing on input dimension error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No error checking is intended as this is designed for small batch codes. It means that the code is used in parallel for. Unfortunately, error checking brings some amount of overhead and it is difficult (maybe impossible if the device is a GPU) to handle the error during the parallel run. So, the user should understand that the batched blas routines do not check errors and the numeric error should be checked in a higher level when the user think it is necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fine, but then why return an integer? Why not just make these functions void
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reserve the integer for a potential use case if an application has a strong use case that requires error checking.
(double*)B.data(), B.stride_1(), | ||
format, (MKL_INT)vector_type::vector_length); | ||
} else if (A.stride_1() == 1 && B.stride_1() == 1) { | ||
mkl_dtrsm_compact(MKL_ROW_MAJOR, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll continue to express my concerns about exposing TPL details in header files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Your connern is well acknowledged. mkl compact routines have the mkl prefix and using the mkl prefixed routines should not harm. However, I will encapsulate all other fortran blas and lapack interface which is highly possible to conflict with users declarations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kyungjoo-kim I'm OK with that, as long as including the standard (non-batched) kokkos-kernels BLAS wrappers doesn't automaticallly include these headers. Thanks!
@kyungjoo-kim if it passes spot-checks on white, bowman, and a kokkos-kernels check in Trilinos (to get a cmake check) then it will be good to merge. |
@vqd8a can you run spot-checks on white and bowman and post the results? Thanks! |
Thanks @ndellingwood
|
MKL test on bowman:
|
spot-check on bowman:
Any idea why it only fails when running with intel-18.2.199 for complex double? |
Add kokkos-kernels/src/batched/KokkosBatched_Trsm_Serial_Internal.hpp Lines 195 to 203 in 8dcd972
|
Re-run spotcheck on white:
|
Re-run spotcheck on bowman:
|
Re-run MKL test on bowman:
|
Do not merge yet until after the promotion. |
@vqd8a The following is the right way to generate complex random numbers.
@crtrott It generates the complex imag random numbers. Can we put a more intuitive interface instead of |
What do you mean with overriding interface? Also from a random number generator perspective what is a "max" value for float in the sense of a uniform distribution. I thought a bit about that and think that 1.0 just makes imminently the most sense. Effectively what we are doing is ignoring the exponent for the purpose of defining distributions. |
Okay. I understand it now. |
@kyungjoo-kim : Can we merge this in ? |
merge them. |
Address #332: An implementation of batched
getrs
(serial and team) and unit tests.Modified InverseLU (
getri
) such that it just calls SolveLU (getrs
) with B as identity matrix.Added new
trsm
interfaces to supporttranspose
(not conjugate) operation ingetrs
and unit tests for these interfaces.