TODO

High priority:
- run large scaling tests: 3D regular meshes, Poisson,
  convection-diffusion, Helmholtz, with HSS compression; push to as
  large a problem and as many nodes as possible
- spack package
- Trilinos interface
- HSS C/Fortran interfaces
- integrate SLATE (start with PLASMA)
- Provide an example of factor once, solve multiple times (see the
  first sketch after this list).
- Example for reuse of sparsity structure!
- Example for a single precision preconditioner in a double precision
GMRES solve?
- Keep track of peak memory usage
- For HSS compression, store the random matrix in a block-row
  distribution instead of 2D block cyclic. This avoids a data layout
  transformation. Same for the HSS-times-vector product and the HSS
  ULV solve!! We can let Chris Gorman or Gustavo work on this.
- to decide which fronts are HSS, look at front size, not just
separator size. This requires that symbolic factorization is already
done.
- Study the behavior of the replacement of tiny pivots. Look at
  different matrices (including large ones!). Implement for
  distributed fronts!! (See the second sketch after this list.)
- distributed MC64: Jason's/Ariful's code?
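
A minimal sketch of the factor-once/solve-many example above (the
first sketch referenced in the list). It assumes a
StrumpackSparseSolver with set_csr_matrix(), reorder(), factor() and
solve(); the header name and exact signatures should be checked
against the installed release:

    #include <vector>
    #include "StrumpackSparseSolver.hpp"  // assumed header name

    int main(int argc, char* argv[]) {
      // Small 1D Poisson (tridiagonal) matrix in CSR format, just to
      // have something concrete to factor.
      int N = 1000;
      std::vector<int> row_ptr(N+1), col_ind;
      std::vector<double> val;
      row_ptr[0] = 0;
      for (int i = 0; i < N; i++) {
        if (i > 0)   { col_ind.push_back(i-1); val.push_back(-1.); }
        col_ind.push_back(i); val.push_back(2.);
        if (i < N-1) { col_ind.push_back(i+1); val.push_back(-1.); }
        row_ptr[i+1] = int(col_ind.size());
      }
      strumpack::StrumpackSparseSolver<double,int> sp(argc, argv);
      sp.set_csr_matrix(N, row_ptr.data(), col_ind.data(), val.data());
      sp.reorder();  // ordering + symbolic factorization, done once
      sp.factor();   // numerical factorization, done once
      // Reuse the same factors for several right-hand sides: each
      // iteration only pays for the triangular solves.
      std::vector<double> b(N, 1.), x(N);
      for (int k = 0; k < 5; k++) {
        // ... b would normally change between solves ...
        sp.solve(b.data(), x.data());
      }
      return 0;
    }

For the tiny-pivot item, a sketch of one common replacement strategy
(the second sketch referenced in the list); the threshold
sqrt(eps)*||A|| is just one possible choice, not necessarily what the
code should use:

    #include <cmath>
    #include <limits>

    // If a diagonal pivot is smaller in magnitude than
    // tau = sqrt(eps) * ||A||, replace it by +/- tau so the
    // factorization can continue. The factorization is then only an
    // approximation, i.e., usable as a preconditioner.
    inline double replace_tiny_pivot(double d, double matrix_norm) {
      double tau =
        std::sqrt(std::numeric_limits<double>::epsilon()) * matrix_norm;
      if (std::abs(d) < tau) return (d < 0.) ? -tau : tau;
      return d;
    }
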
Medium:
- matrix equilibration: see SuperLU_dist pdgsequ and pdlaqgs
- Use more scalable dense linear algebra, based on tiled algorithms,
mainly for dense LU and triangular solve. (requires OpenMP 4.0)
- check ordering, merge small leaves created by PTScotch/ParMetis
- MC64 licensing issue, allow compilation without MC64
- optimize communication in symbolic factorization and redistribution
into ProportionallyMappedSparseMatrix
- when calling set_csr_matrix on an MPIDist solver, you don't know
  the local number of rows, begin row, etc. (see the sketch after
  this list)
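
A sketch of the bookkeeping a user currently has to do for the
MPIDist case above: compute a 1D block-row distribution of the rows
explicitly (plain MPI, no solver calls; variable names are only
illustrative):

    #include <algorithm>
    #include <vector>
    #include <mpi.h>

    int main(int argc, char* argv[]) {
      MPI_Init(&argc, &argv);
      int rank, P;
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &P);
      int N = 1000;  // global number of rows
      // dist[p] .. dist[p+1]-1 are the rows owned by rank p.
      std::vector<int> dist(P+1);
      for (int p = 0; p <= P; p++)
        dist[p] = p * (N / P) + std::min(p, N % P);
      int begin_row  = dist[rank];
      int local_rows = dist[rank+1] - dist[rank];
      // ... build the local CSR block for rows
      //     [begin_row, begin_row + local_rows) and pass it, together
      //     with dist, to the distributed solver ...
      (void)begin_row; (void)local_rows;
      MPI_Finalize();
      return 0;
    }
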
Low priority:
- The library is not completely thread safe at the moment. Explain
  in the manual what is not safe, and fix it!
- Use mt-metis for multithreaded nested dissection.
- Currently the elements in the sparse matrix are stored sorted
(internally). Check whether it might be faster not to sort them.
- Perhaps the original matrix should not be explicitly
  permuted. Instead, always access the original matrix through the
  permutation. This avoids making a copy of the original matrix (see
  the sketch after this list).
- Do not explicitly add zeros in the sparse matrix, work on the graph
only.
- Think about proportional mapping: perhaps we can instead use the
  mapping as computed by PTScotch/ParMetis. This would avoid an extra
  redistribution!! But it might cause some load imbalance. We need to
  set the PTScotch strategy to make sure the subdomains are well
  balanced (PTScotch will also try to minimize the size of the
  separator). However, the size of the subdomains does not exactly
  correspond to the number of flops required for the factorization of
  the corresponding subtree in the elimination tree.
- add Fortran interface
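
A sketch of the access-through-the-permutation idea above: iterate
over the rows of P*A*P^T without ever forming it, using only the
permutation and its inverse (an illustration of the access pattern,
not the actual data structures used in the code):

    #include <vector>

    struct CSRMatrix {           // original, unpermuted matrix A
      int n;
      std::vector<int> ptr, ind;
      std::vector<double> val;
    };

    // perm maps new (permuted) indices to old ones, iperm is its
    // inverse, so element (i,j) of P*A*P^T is A(perm[i], perm[j]).
    void traverse_permuted(const CSRMatrix& A,
                           const std::vector<int>& perm,
                           const std::vector<int>& iperm) {
      for (int i = 0; i < A.n; i++) {    // row in permuted ordering
        int oi = perm[i];                // corresponding row of A
        for (int k = A.ptr[oi]; k < A.ptr[oi+1]; k++) {
          int j = iperm[A.ind[k]];       // column in permuted ordering
          double a_ij = A.val[k];        // = (P*A*P^T)(i, j)
          // ... use (i, j, a_ij) in the factorization ...
          (void)j; (void)a_ij;
        }
      }
    }
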