-
Notifications
You must be signed in to change notification settings - Fork 88
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Triangular solvers #1193
base: develop
Are you sure you want to change the base?
Triangular solvers #1193
Conversation
adds these triangular solvers: - level based solver - warp inverting solver 'winv' - load-balancing solver 'wvar' - thinned out syncfree solver - block solver
label! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A first glance at the infrastructure and some of the algorithms. Really nice work!
@@ -73,7 +73,7 @@ template <typename ValueType, typename IndexType> | |||
void generate(std::shared_ptr<const HipExecutor> exec, | |||
const matrix::Csr<ValueType, IndexType>* matrix, | |||
std::shared_ptr<solver::SolveStruct>& solve_struct, | |||
bool unit_diag, const solver::trisolve_algorithm algorithm, | |||
bool unit_diag, const solver::trisolve_strategy strategy, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bool unit_diag, const solver::trisolve_strategy strategy, | |
bool unit_diag, const solver::trisolve_strategy* strategy, |
@@ -73,7 +73,7 @@ template <typename ValueType, typename IndexType> | |||
void generate(std::shared_ptr<const HipExecutor> exec, | |||
const matrix::Csr<ValueType, IndexType>* matrix, | |||
std::shared_ptr<solver::SolveStruct>& solve_struct, | |||
bool unit_diag, const solver::trisolve_algorithm algorithm, | |||
bool unit_diag, const solver::trisolve_strategy strategy, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bool unit_diag, const solver::trisolve_strategy strategy, | |
bool unit_diag, const solver::trisolve_strategy* strategy, |
@@ -70,7 +70,7 @@ template <typename ValueType, typename IndexType> | |||
void generate(std::shared_ptr<const DpcppExecutor> exec, | |||
const matrix::Csr<ValueType, IndexType>* matrix, | |||
std::shared_ptr<solver::SolveStruct>& solve_struct, | |||
bool unit_diag, const solver::trisolve_algorithm algorithm, | |||
bool unit_diag, const solver::trisolve_strategy strategy, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bool unit_diag, const solver::trisolve_strategy strategy, | |
bool unit_diag, const solver::trisolve_strategy* strategy, |
@@ -70,7 +70,7 @@ template <typename ValueType, typename IndexType> | |||
void generate(std::shared_ptr<const DpcppExecutor> exec, | |||
const matrix::Csr<ValueType, IndexType>* matrix, | |||
std::shared_ptr<solver::SolveStruct>& solve_struct, | |||
bool unit_diag, const solver::trisolve_algorithm algorithm, | |||
bool unit_diag, const solver::trisolve_strategy strategy, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bool unit_diag, const solver::trisolve_strategy strategy, | |
bool unit_diag, const solver::trisolve_strategy* strategy, |
@@ -32,13 +32,22 @@ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. | |||
|
|||
template <typename ValueType, typename IndexType> | |||
__device__ __forceinline__ | |||
std::enable_if_t<!is_complex_s<ValueType>::value, ValueType> | |||
std::enable_if_t<std::is_floating_point<ValueType>::value, ValueType> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this have to do with int
not having a definition of is_complex
? We will probably run into issues with #1257
void should_perform_transpose_kernel(std::shared_ptr<const CudaExecutor> exec, | ||
bool& do_transpose) | ||
{ | ||
do_transpose = false; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this most likely belongs outside the unnamed namespace
if (levels[row] != sweep) { | ||
return; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does it make sense to sort by levels
and instead launch only threads for the rows in that level?
|
||
|
||
template <typename IndexType, bool is_upper> | ||
__global__ void level_generation_kernel(const IndexType* const rowptrs, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we can find a way to merge this kernel with the numerical one (same for legacy) with the right generalizations of + and *?
const auto local_row = rhs; | ||
const auto row = gid; | ||
|
||
const auto activemask = __activemask(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
activemask and Volta don't play nice with each other :-( This may only be a subset of all threads in this branch
|
||
// This is "heavily inspired" by cppreference. | ||
template <class T> | ||
__device__ const T* lower_bound(const T* first, const T* const last, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we have an implementation in searching.cuh
This PR adds several triangular solvers. Machinery to either select the solver from a configuration file or to automatically choose a good solver will be included in a future PR. For now, the default remains
sparselib
.These solvers are added:
The changes are a little cluttered due to some renaming, e. g.
trisolve_algorithm
totrisolve_strategy
.Some minimal refactoring is still required, but in principle the code is ready for review. There are several TODOs still in the code. They are not really pressing, but it should probably be decided what to do about them. Some might just be deleted.