Half factorization #1712
base: half_solver
Conversation
Generally LGTM. I have a question regarding atomics and HIP. The latest ROCm documentation shows support for fp16 atomic operations: https://rocm.docs.amd.com/en/latest/reference/precision-support.html#atomic-operations-support, but to be honest I can't figure out which operations exactly they mean by that. Did you try anything in that regard?
    PairTypenameNameGenerator);


TYPED_TEST(ParIlut, KernelThresholdSelectIsEquivalentToRef)
{
    using value_type = typename TestFixture::value_type;
Many of the tests here are missing SKIP_HALF if compiling for HIP.
We do not support compute_l_u_factors in HIP, but the others still work with half precision in HIP.
I see what you mean now.
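For context on the SKIP_HALF remark above, a hedged sketch of what such a guard could look like. The macro name, its mechanics, the GKO_COMPILING_HIP detection, and the stand-alone fixture are assumptions for illustration only; the repository's actual test utilities may differ.

#include <gtest/gtest.h>

// assumed guard: skip 16-bit value types when compiling for HIP
#if defined(GKO_COMPILING_HIP)
#define SKIP_HALF(value_type)                                            \
    if (sizeof(value_type) == 2) {                                       \
        GTEST_SKIP() << "half precision unsupported in this HIP kernel"; \
    }
#else
#define SKIP_HALF(value_type)
#endif

// minimal stand-in fixture, not the real ParIlut test suite
template <typename T>
class ParIlut : public ::testing::Test {
public:
    using value_type = T;
};
using ValueTypes = ::testing::Types<float, double>;
TYPED_TEST_SUITE(ParIlut, ValueTypes);

TYPED_TEST(ParIlut, KernelThresholdSelectIsEquivalentToRef)
{
    using value_type = typename TestFixture::value_type;
    SKIP_HALF(value_type);  // skip for 16-bit types on HIP
    // ... the actual comparison against the reference kernel goes here ...
}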
cuda/solver/common_trs_kernels.cuh
@@ -212,13 +212,15 @@ struct CudaSolveStruct : gko::solver::SolveStruct {

    size_type work_size{};

    // TODO: In nullptr is considered nullptr_t not casted to const
    // it does not work in cuda110/100 images
nit:
- // it does not work in cuda110/100 images
+ // Explicitly cast `nullptr` to `const ValueType*` to prevent compiler issues with cuda 10/11
I think it is more on the host compiler side, because it goes through our binding first with a specific type.
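A hedged illustration of how this kind of nullptr/nullptr_t mismatch can show up (simplified, hypothetical names; not the actual binding from this PR): if the binding's value type is deduced from the pointer argument, a bare `nullptr` is a `std::nullptr_t` and deduction fails, so the call site has to cast it to the expected pointer type explicitly.

#include <cstddef>

// hypothetical templated binding that deduces its value type from the
// right-hand-side pointer; a null rhs means "just report the buffer size"
template <typename ValueType>
void buffer_size_ext(const ValueType* rhs, std::size_t* work_size)
{
    *work_size = (rhs == nullptr) ? 128 : 256;  // placeholder logic
}

template <typename ValueType>
std::size_t query_work_size()
{
    std::size_t work_size{};
    // buffer_size_ext(nullptr, &work_size);   // error: ValueType cannot be
    //                                         // deduced from std::nullptr_t
    buffer_size_ext(static_cast<const ValueType*>(nullptr), &work_size);
    return work_size;
}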
@@ -212,12 +212,16 @@ struct CudaSolveStruct : gko::solver::SolveStruct {

    size_type work_size{};

    // nullptr is considered nullptr_t not casted to the function signature
    // automatically Explicitly cast `nullptr` to `const ValueType*` to
nit:
- // automatically Explicitly cast `nullptr` to `const ValueType*` to
+ // automatically explicitly cast `nullptr` to `const ValueType*` to
template <bool is_upper, typename SharedValueType, typename ValueType,
          typename IndexType>
Could SharedValueType be deduced inside, instead of making it an additional template parameter? You should be able to pull the code from the kernel launch into here and add a type alias. Otherwise it is easier to accidentally call the kernel with inconsistent types.
good idea
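A minimal sketch of that suggestion (the trait name, launcher name, and the half-to-float mapping are assumptions, not taken from this PR): derive the shared-memory type from ValueType inside the launcher via a type alias, so callers cannot pass an inconsistent type.

#include <cuda_fp16.h>  // HIP's <hip/hip_fp16.h> provides the same __half

// hypothetical trait: by default, shared memory uses the value type itself
template <typename ValueType>
struct shared_storage {
    using type = ValueType;
};

// hypothetical specialization: store 16-bit values as 32-bit in shared
// memory, e.g. to avoid relying on 16-bit atomics
template <>
struct shared_storage<__half> {
    using type = float;
};

template <bool is_upper, typename ValueType, typename IndexType>
void launch_sptrsv_kernel(/* ... args elided ... */)
{
    // deduced inside the launcher instead of being an extra template
    // parameter at every call site
    using shared_value_type = typename shared_storage<ValueType>::type;
    // the kernel launch would size its shared-memory buffer with
    // shared_value_type here
}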
// optimization wrongly on a custom class when IndexType is long. We set
// the index explicitly with volatile to solve it. NVHPC24.1 fixed this
// issue. https://godbolt.org/z/srYhGndKn
volatile auto index = (i + 1) * sampleselect_oversampling;
I'm not sure we should go this far to accommodate broken compilers. We have workarounds for compilation issues, but not really for this degree of brokenness.
For HIP 16 bit atomics, as long as you only use load and store, you could implement them as
Using 32 bit memory operations for 16 bit values will cause illegal memory accesses at the tail or head if we do not handle it at an upper level.
Theoretically that would be an easy fix: Make sure all allocations are
I do not like a loose guarantee unless we have a way to ensure it, or at least check it.
I can give you a somewhat technical justification for this:
I know the idea; sometimes it is necessary for optimized half precision to pack values (so we effectively get native 32 bit accesses by enforcing a packing structure requirement).
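Since the quoted comments above are truncated, here is a hedged sketch of the kind of emulation being discussed (a hypothetical helper, not code from this PR nor necessarily what the reviewers propose): a 16-bit atomic add built on a 32-bit atomicCAS over the containing word. It also makes the head/tail concern concrete: the containing 32-bit word must lie entirely inside the allocation, e.g. by padding allocations to a multiple of 4 bytes.

#include <cstdint>

#include <cuda_fp16.h>  // HIP's <hip/hip_fp16.h> provides the same intrinsics

// hypothetical helper: emulate atomic add on a 16-bit value via a 32-bit CAS
__device__ __half atomic_add_half(__half* addr, __half val)
{
    const auto addr_int = reinterpret_cast<std::uintptr_t>(addr);
    // pointer to the 4-byte aligned word containing *addr; note this word may
    // extend past the logical end (or before the start) of a 16-bit array
    auto word_addr =
        reinterpret_cast<unsigned int*>(addr_int & ~std::uintptr_t{3});
    // does *addr occupy the upper or the lower 16 bits of that word?
    const bool is_upper_half = (addr_int & 2) != 0;

    unsigned int old_word = *word_addr;
    unsigned int assumed;
    do {
        assumed = old_word;
        const unsigned short old_bits =
            is_upper_half ? (assumed >> 16) : (assumed & 0xffffu);
        // do the addition in float to stay portable across architectures
        const float sum =
            __half2float(__ushort_as_half(old_bits)) + __half2float(val);
        const unsigned short new_bits = __half_as_ushort(__float2half(sum));
        const unsigned int new_word =
            is_upper_half
                ? ((assumed & 0x0000ffffu) | (unsigned int{new_bits} << 16))
                : ((assumed & 0xffff0000u) | new_bits);
        // retry until no other thread modified the word in between
        old_word = atomicCAS(word_addr, assumed, new_word);
    } while (assumed != old_word);
    return __ushort_as_half(static_cast<unsigned short>(
        is_upper_half ? (old_word >> 16) : (old_word & 0xffffu)));
}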
This PR adds the factorization with half support.
HIP does not currently support atomics on 16-bit types.
TODO: