NOTES.md

Toast-- Development Notes

TODO

  • Investigate warning C4910 on MSVC - some declspec conflict
  • Look at fwdsolver_mw.h instantiation requirements, determine appropriate preprocessor guard (e.g. Clang?)
  • MATLAB QM are always complex, so a forward solve with a real solver will crash (NB: a motivation for the interleaved API)
  • MATLAB gmsh reader out of date
  • MATLAB fluorescence example performance
  • The interleaved API permits shallow copies of various inputs in the MATLAB interface, reducing round-trip cost; exploit this
  • Review element types in libfe; some contain unfinished definitions of operators and constants, remove these
  • Move semantics for mathlib vectors and matrices
  • Improved initialisation for element entries
  • Check propensity for structural nonzeros viz. direct solvers
  • Python interface build assumes Release paths on Windows
  • Default link list after make mesh appears arbitrary, resulting in enormous linklist/qmvec
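
As a rough illustration of the move-semantics item above, the following is a minimal sketch only. `DenseVec` and `make_ramp` are hypothetical stand-ins and do not reflect the actual mathlib vector or matrix classes; the sketch assumes a vector that owns a raw heap buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical stand-in for a mathlib dense vector; not the real class.
class DenseVec {
public:
    explicit DenseVec(std::size_t n = 0)
        : size_(n), data_(n ? new double[n]() : nullptr) {}

    // Deep copy: allocates a new buffer and copies every entry.
    DenseVec(const DenseVec& other)
        : size_(other.size_),
          data_(other.size_ ? new double[other.size_] : nullptr) {
        for (std::size_t i = 0; i < size_; ++i) data_[i] = other.data_[i];
    }

    // Move constructor: steals the buffer and leaves the source empty.
    DenseVec(DenseVec&& other) noexcept
        : size_(std::exchange(other.size_, 0)),
          data_(std::exchange(other.data_, nullptr)) {}

    // Copy-and-swap assignment; serves both copy and move assignment.
    DenseVec& operator=(DenseVec other) noexcept {
        swap(other);
        return *this;
    }

    ~DenseVec() { delete[] data_; }

    void swap(DenseVec& other) noexcept {
        std::swap(size_, other.size_);
        std::swap(data_, other.data_);
    }

    std::size_t size() const { return size_; }
    const double* data() const { return data_; }
    double& operator[](std::size_t i) { assert(i < size_); return data_[i]; }
    const double& operator[](std::size_t i) const { assert(i < size_); return data_[i]; }

private:
    std::size_t size_;
    double* data_;
};

// Returning by value moves (or elides) the buffer instead of copying it.
DenseVec make_ramp(std::size_t n) {
    DenseVec v(n);
    for (std::size_t i = 0; i < n; ++i) v[i] = static_cast<double>(i);
    return v;
}
```

With this shape, `DenseVec b = std::move(a);` transfers the allocation in constant time and leaves `a` empty but destructible. Marking the move operations `noexcept` also lets standard containers move elements rather than copy them on reallocation.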

Performance

  • Bottlenecks
    • Mesh sparsity calculation heapsort (single-threaded), called when computing the system matrix for fields
    • Solvers
      • Fast direct solvers require a supernodal + BLAS implementation. Using e.g. CHOLMOD for the direct solve when computing forward and adjoint fields for an HD problem is optimal (c. N=200k, nQM = 60).
      • MKL PARDISO less competitive than CHOLMOD.
      • Simplicial solvers, such as Eigen LLT and the legacy Cholesky implementation, are not competitive with iterative solvers.
      • Block Krylov methods don't appear to offer significant speedup; they rely on fast matrix solves and thus indirectly require a decent BLAS.
      • Iterative solvers (CG, BICGSTAB) are readily parallelised and within an order of magnitude of direct solvers. The hot path is Sparse-Dense Ax & Cholesky substitution. No memory issues. SpMV experiments with different CSR structures have shown limited improvement.
    • Jacobian computation is fast in the basis, slow in the mesh. The mesh path is dominated by IntFG cost, which in turn uses a virtual method call to element IntFG in the hot loop. Experiments show a 50% speedup is possible by extracting this call and working over all RHS, and/or precomputing element integrals to avoid scaling and indexing cost.
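
The Jacobian note above can be sketched roughly as follows. This is an illustration under stated assumptions, not the libfe code: `Element`, `Tri3`, `Nodes`, `LocalMass`, `IntFGPerCall` and `IntFGHoisted` are hypothetical names, and the element integral is assumed to reduce to a precomputed local matrix (here the standard linear-triangle mass matrix, area/12 times [2 1 1; 1 2 1; 1 1 2]).

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical element interface; not the libfe class hierarchy.
class Element {
public:
    virtual ~Element() = default;
    virtual const std::vector<int>& Nodes() const = 0;
    // Local matrix M_ij = integral of u_i * u_j over the element (row-major).
    virtual const std::vector<double>& LocalMass() const = 0;
    // Integral of f*g over the element; f and g are global nodal vectors.
    virtual double IntFG(const std::vector<double>& f,
                         const std::vector<double>& g) const = 0;
};

// Linear triangle with its mass matrix precomputed at construction.
class Tri3 : public Element {
public:
    Tri3(int n0, int n1, int n2, double area) : nodes_{n0, n1, n2}, mass_(9) {
        for (int i = 0; i < 3; ++i)
            for (int j = 0; j < 3; ++j)
                mass_[i * 3 + j] = area / 12.0 * (i == j ? 2.0 : 1.0);
    }
    const std::vector<int>& Nodes() const override { return nodes_; }
    const std::vector<double>& LocalMass() const override { return mass_; }
    double IntFG(const std::vector<double>& f,
                 const std::vector<double>& g) const override {
        double s = 0.0;
        for (int i = 0; i < 3; ++i)
            for (int j = 0; j < 3; ++j)
                s += f[nodes_[i]] * mass_[i * 3 + j] * g[nodes_[j]];
        return s;
    }
private:
    std::vector<int> nodes_;
    std::vector<double> mass_;
};

using Mesh = std::vector<std::unique_ptr<Element>>;
using Fields = std::vector<std::vector<double>>;  // one nodal vector per RHS

// Baseline: one virtual IntFG call per (RHS, element) pair.
std::vector<double> IntFGPerCall(const Mesh& mesh, const Fields& F, const Fields& G) {
    assert(F.size() == G.size());
    std::vector<double> out(F.size() * mesh.size());
    for (std::size_t q = 0; q < F.size(); ++q)
        for (std::size_t e = 0; e < mesh.size(); ++e)
            out[q * mesh.size() + e] = mesh[e]->IntFG(F[q], G[q]);
    return out;
}

// Hoisted: fetch element data once per element, then sweep all RHS.
std::vector<double> IntFGHoisted(const Mesh& mesh, const Fields& F, const Fields& G) {
    assert(F.size() == G.size());
    std::vector<double> out(F.size() * mesh.size());
    for (std::size_t e = 0; e < mesh.size(); ++e) {
        const std::vector<int>& nd = mesh[e]->Nodes();        // virtual, once per element
        const std::vector<double>& M = mesh[e]->LocalMass();  // virtual, once per element
        const std::size_t n = nd.size();
        for (std::size_t q = 0; q < F.size(); ++q) {
            double s = 0.0;
            for (std::size_t i = 0; i < n; ++i)
                for (std::size_t j = 0; j < n; ++j)
                    s += F[q][nd[i]] * M[i * n + j] * G[q][nd[j]];
            out[q * mesh.size() + e] = s;
        }
    }
    return out;
}
```

Both functions return the same values, stored as `out[q * nElements + e]`. The hoisted variant makes its virtual calls once per element rather than once per (RHS, element) pair, and the inner loop over RHS becomes a plain non-virtual loop that a compiler can more readily optimise. Any actual gain would need measuring on the real mesh path.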