Add matrix inversion operator in linalg #14963
Conversation
src/operator/linalg_impl.h
Outdated
struct set_matrix : public mxnet::op::mxnet_op::tunable {
  template<typename DType>
  MSHADOW_XINLINE static void Map(int i, DType **p, DType *m, int step) {
    p[i] = m + i * step;
indent?
Right, I'll fix it.
Nice stuff!
LINALG_CPU_GETRF(dgetrf, double)

#ifdef __CUDACC__
Could you add a comment explaining what this does and why the matrix is filled this way?
OK.
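(For context: cuBLAS batched routines such as `cublasDgetrfBatched` take an array of device pointers, one entry per matrix in the batch, so the kernel above fills that array so entry `i` points at matrix `i` inside one contiguous buffer. A minimal host-side sketch of the same idea; the names here are illustrative, not the PR's code:)

```c++
#include <vector>

// Given a contiguous buffer holding n matrices of `step` elements each,
// build the per-matrix pointer array that cuBLAS batched routines expect.
template <typename DType>
std::vector<DType*> BuildBatchPointers(DType* base, int n, int step) {
  std::vector<DType*> ptrs(n);
  for (int i = 0; i < n; ++i) {
    ptrs[i] = base + i * step;  // matrix i starts i * step elements into the buffer
  }
  return ptrs;
}
```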
DType **A_ptr = static_cast<DType **>(A_ptr_buf.dptr); \
const Tensor<gpu, 3, DType> temp(work.dptr_, A.shape_, s); \
int *pivot = reinterpret_cast<int *>(temp.dptr_ + temp.shape_.Size()); \
int *info = pivot + A.size(0) * A.size(1); \
what happens on int32 overflow?
The pivot's range is [0, matrix_dim); I don't think it's possible to create a square matrix whose dimension overflows int32.
I think you know much more about this than I do, but from what I understand A is the input matrix, which gets overwritten by the LU factorization, no? My question was about the product of A.size(0) and A.size(1) overflowing, which can happen when both are larger than 2^16, unless I'm mistaken. I have seen this bug in other places before we call BLAS, and it was nasty.
I have tried to write an overflow case, and it always fails on the size checks in Tensor or Blob. I think that's the right behavior: an ndarray with an overflowing size should fail early and never reach the code above.
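(For what it's worth, a defensive pattern for this kind of concern is to form the product in 64-bit arithmetic and verify it fits in `int` before using it as a buffer offset. A sketch of such a check, illustrative only and not the PR's code:)

```c++
#include <cstdint>
#include <limits>
#include <stdexcept>

// Multiply two tensor dimensions in 64-bit and fail loudly if the result
// does not fit in int32, rather than silently wrapping around.
inline int CheckedDimProduct(int64_t a, int64_t b) {
  const int64_t prod = a * b;
  if (prod > std::numeric_limits<int>::max()) {
    throw std::overflow_error("dimension product overflows int32");
  }
  return static_cast<int>(prod);
}
```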
DType **B_ptr = static_cast<DType **>(B_ptr_buf.dptr); \
Tensor<gpu, 3, DType> temp(work.dptr_, A.shape_, s); \
int *pivot = reinterpret_cast<int *>(temp.dptr_ + temp.shape_.Size()); \
int *info = pivot + A.size(0) * A.size(1); \
same comment as above
src/operator/tensor/la_op.cc
Outdated
.add_argument("A", "NDArray-or-Symbol", "Tensor of square matrix");

NNVM_REGISTER_OP(_backward_linalg_inverse)
.set_num_inputs(3)
why do we have 3 inputs?
Because I use ElemwiseGradUseInOut, the 3 inputs are out_grad, input, and output. Actually, input is not used in computing in_grad, so I'll change it to ElemwiseGradUseOut.
Makes sense, it looked strange to me.
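(For reference, the output alone suffices here because of the standard identity for the gradient of a matrix inverse: writing $B = A^{-1}$,

$$\frac{\partial L}{\partial A} = -B^{\top}\,\frac{\partial L}{\partial B}\,B^{\top},$$

so the backward pass needs only `out_grad` and the forward output, never the original input, which is exactly what `ElemwiseGradUseOut` provides.)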
I'll merge this PR now and create another PR for matrix determinant, which depends on this one.
* add inverse cpu
* add comment
* add inverse backward cpu
* add inverse gpu
* able to compile
* fix
* fix
* guard for lower version cuda
* update docs
* update docs
* fix misaligned memory
* add test
* fix lint
* fix android
* fix indent
* change transfer gradient
* fix
* refactor test
* delete unnecessary copy
* trigger CI
* fix test
As title.