According to the Nsight Systems profiling results, there is a huge gap between the SubtractJacobianTerms and AlphaMinusJacobian kernels. The root cause is that the line auto jacobian = state.jacobian_ makes a local copy of the Jacobian matrix through the default copy operation, which copies all of the data members on the host (https://github.com/NCAR/micm/blob/main/include/micm/util/sparse_matrix.hpp#L52-L56). This is unnecessary if we only want to copy the data from device to device.
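As a rough illustration of why that one line is costly, the sketch below counts deep copies for a copy-initialized local versus a reference. ToyMatrix, ToyState, and both helper functions are hypothetical stand-ins, not the micm classes, and the host-side vector only mimics the matrix storage.

```cpp
#include <cstddef>
#include <vector>

// Global counter of deep copies made by ToyMatrix (illustration only).
std::size_t g_copy_count = 0;

// Hypothetical stand-in for a sparse matrix; NOT the micm SparseMatrix.
struct ToyMatrix
{
  std::vector<double> data_;

  ToyMatrix(std::size_t n) : data_(n, 0.0) {}

  // Counting copy constructor: every call duplicates the whole buffer.
  ToyMatrix(const ToyMatrix& other) : data_(other.data_) { ++g_copy_count; }
};

// Hypothetical stand-in for the solver state that owns the Jacobian.
struct ToyState
{
  ToyMatrix jacobian_{ 1000 };
};

// Mirrors `auto jacobian = state.jacobian_`: a full copy of the storage.
std::size_t CopiesWithLocal(ToyState& state)
{
  const std::size_t before = g_copy_count;
  auto jacobian = state.jacobian_;  // deep copy, the state is left untouched
  jacobian.data_[0] = 1.0;
  return g_copy_count - before;
}

// Uses the state's matrix directly: no copy, changes land in the state.
std::size_t CopiesWithReference(ToyState& state)
{
  const std::size_t before = g_copy_count;
  auto& jacobian = state.jacobian_;  // reference, no copy
  jacobian.data_[0] = 1.0;
  return g_copy_count - before;
}
```

The trade-off is that the reference version modifies state.jacobian_ in place, so the original values are no longer available afterwards unless they are rebuilt.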
Acceptance Criteria
Use state.jacobian_ directly as the function argument.
Call the SubtractJacobianTerms function to re-construct the Jacobian matrix if the matrix is singular and substepping is needed. Based on the profiling results, this takes much less time than making a copy of the Jacobian matrix.
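The two criteria above could be sketched roughly as follows, on a toy 2x2 dense matrix. Every function here is a hypothetical stand-in (the real kernels have different signatures), the fixed matrix J = diag(1, 3), the alpha = 1 / h scaling, and the determinant-based singularity check are all assumptions made only to keep the example small.

```cpp
#include <array>
#include <cmath>

using Matrix2 = std::array<double, 4>;  // row-major 2x2, illustration only

// Hypothetical stand-in for SubtractJacobianTerms: overwrites `jac` with -J,
// where J = diag(1, 3) is an arbitrary toy Jacobian.
void SubtractJacobianTermsToy(Matrix2& jac)
{
  jac = { -1.0, 0.0, 0.0, -3.0 };
}

// Hypothetical stand-in for AlphaMinusJacobian: adds alpha to the diagonal
// in place, turning -J into alpha * I - J.
void AlphaMinusJacobianToy(Matrix2& jac, double alpha)
{
  jac[0] += alpha;
  jac[3] += alpha;
}

// Toy singularity check via the 2x2 determinant (an assumption; a real
// solver would detect this during the LU decomposition).
bool IsSingularToy(const Matrix2& m)
{
  return std::abs(m[0] * m[3] - m[1] * m[2]) < 1e-12;
}

// Works on `jac` in place (no local copy). If the shifted matrix is singular,
// halves the step `h`, rebuilds -J, and tries again.
// Returns the number of rebuilds, or -1 if no usable step was found.
int FactorInPlace(Matrix2& jac, double& h)
{
  int rebuilds = 0;
  SubtractJacobianTermsToy(jac);
  for (int attempt = 0; attempt < 10; ++attempt)
  {
    AlphaMinusJacobianToy(jac, 1.0 / h);
    if (!IsSingularToy(jac))
      return rebuilds;
    h *= 0.5;                       // substepping: shrink the step
    SubtractJacobianTermsToy(jac);  // re-construct instead of restoring a copy
    ++rebuilds;
  }
  return -1;
}
```

Starting from h = 1, the first shift gives diag(0, -2), which is singular, so the step is halved and the matrix is rebuilt once before the second attempt succeeds. The rebuild is only paid on the singular path, while the copy was paid on every call.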
sjsprecious changed the title from "Implement the Copy function for the sparse and CUDA sparse matrix" to "Avoid the local copy of Jacobian matrix when doing LuDecompose" on Sep 4, 2024