Author: | Toby Isaac |
---|
This package uses (as of June 2016) PETSc 3.7, which can be obtained from bitbucket.
make PETSC_DIR=${PETSC_DIR} PETSC_ARCH=${PETSC_ARCH} all [install]
This package provides solver tools for discretizations where degrees of freedom of organized into tightly coupled "columns": all of the degrees of freedom in a column are assigned to the same MPI process, and are numbered contiguously. The columns can have different sizes.
To initialize the plugin, call:
#include <DofColumns.h>
PetscErrorCode ierr;
ierr = DofColumnsInitializePackage();CHKERRQ(ierr);
Let A
be a matrix which has column structure. The plugin expects that
structure to be described by a PetscSection. Suppose, for example, that
A
is distributed across 2 MPI processes: the first process owns two
columns, the second process one. Suppose the first and last columns contain
10 degrees of freedom, while the second contains 12. Here is what the setup
of the section looks like:
MPI_Comm comm;
PetscErrorCode ierr;
PetscSection columns;
PetscMPIInt rank;
PetscInt colStart, colEnd;
comm = PetscObjectComm((PetscObject)A);
ierr = MPI_Comm_rank(comm,&rank);CHKERRQ(ierr);
// create the section
ierr = PetscSectionCreate(PetscObjectComm((PetscObject)A),&columns);CHKERRQ(ierr);
if (!rank) {
// I have two columns: columns 0 and 1, so my chart is [0,2)
ierr = PetscSectionSetChart(columns,0,2);CHKERRQ(ierr);
// Column 0 has 10 dofs
ierr = PetscSectionSetDof(columns,0,10);CHKERRQ(ierr);
// Column 1 has 12 dofs
ierr = PetscSectionSetDof(columns,1,12);CHKERRQ(ierr);
ierr = PetscSectionSetUp(columns);CHKERRQ(ierr);
// Column 0 contains dofs [0, 10)
ierr = PetscSectionSetOffset(columns,0,0);CHKERRQ(ierr);
// Column 1 contains dofs [10, 22)
ierr = PetscSectionSetOffset(columns,1,10);CHKERRQ(ierr);
}
else {
// I have one column: column 2, so my chart is [2,3)
ierr = PetscSectionSetChart(columns,2,3);CHKERRQ(ierr);
// Column 2 has 10 dofs
ierr = PetscSectionSetDof(columns,2,12);CHKERRQ(ierr);
ierr = PetscSectionSetUp(columns);CHKERRQ(ierr);
// Column 2 contains dofs [22, 32)
ierr = PetscSectionSetOffset(columns,2,22);CHKERRQ(ierr);
}
After you have constructed the section, attach it to the matrix:
ierr = MatSetDofColumns(A,columns);CHKERRQ(ierr);
// A maintains a reference to columns, so we can safely destroy it
ierr = PetscSectionDestroy(&columns);CHKERRQ(ierr);
Note: If the matrix A
is a block matrix (its block size is greater
than 1), then columns should describe the blocks, i.e., when using
PetscSectionSetDof()
to describe how big a column is, one should give the
number of blocks, not the total number of degrees of freedom.
Make the test executable with:
make test
The test executable is adapted from a PETSc example: it discretizes and solves a loading problem for an elastic material in a cubic domain, with user-defined material parameters and a user defined anisotropy. The problem is discretized with n_e trilinear hexahedral finite elements in each direction. The height of the domain in the z-direction is scaled by a factor of \varepsilon. All boundary conditions are natural, except for the bottom boundary, (z=0) which has a Dirichlet boundary condition on the normal component of displacement and a Robin boundary with coefficient \beta on the tangential components.
If the domain is anisotropic, then smoothed aggregation algebraic multigrid (SA-AMG) with a pointwise smoother, such as symmetric Gauss-Seidel, may be inefficient. To demonstrate this, run
make test_sor
This solves the problem to a fixed tolerance for fixed material coefficients, n_e=11 and \varepsilon=\{1.0,0.1,0.01\}: in my tests, the number of Krylov iterations to convergence are 10, 15, and 67, respectively.
Now we repeat the same test, except that we use an incomplete Cholesky factorization smoother instead of symmetric Gauss-Seidel,
make test_icc
In my tests, the problems with \varepsilon=\{1.0,0.1,0.01\} are now solved in 9, 9, and 6 iterations, respectively.
The column organization is important here: for similar problems that are easier to analyze (scalar elliptic, 7-point stencil, using geometric multigrid instead of SA-AMG), one can prove that the effectiveness of an incomplete factorization smoother should be independent of \varepsilon. That is, if the degrees of freedom are ordered properly: with the tightly-coupled degrees of freedom within a column numbered contiguously.
Unlike geometric multigrid, the coarse grids generated by SA-AMG may not have the same column organization as the fine grid. PETSc's default aggregation strategy (based on a randomized maximal independent set (MIS) algorithm) will definitely not preserve the column structure. How does this affect performance? To test the same problem with \varepsilon=0.01 and n_e=\{11,23,47\}, run
make test_h
In my tests, these problems are solved in 6, 11, and 24 iterations, respectively. We see that the lack of column structure on the coarse grids affects the convergence as the mesh size h=n_e^{-1} decreases.
Now, to test the same problems, but using coarse grids generated by the DofColumns plugin, run
make test_h_dofcol
The only difference between this and the previous test is the command line
option -pc_gamg_type dofcol
. Now the problems are solved in 6, 9, and 15
iterations, respectively. While this is not true h-independence, it is
closer.
The advantage of the DofColumns plugin is more pronounced on more difficult
problems. In the previous examples, the Robin boundary condition coefficient
was chose to be \beta=1: the strength of this boundary condition masked
deficiencies in the preconditioners. For an anisotropic discretization, there
are displacements that are relatively high-frequency in the x- and
y-directions (thus poorly represented on the coarse grid), but which
are low-energy (in the A
-norm) when \beta is small, say
\beta=0.01. This case is a more difficult test of the
smoother/hierarchy compatibility. So, to test the convergence of the multigrid
preconditioners with and without the DofColumns plugin on problems with
\varepsilon=0.01, \beta=0.01 and n_e=\{8,17,35\}, run
make test_h_weak
and
make test_h_weak_dofcol
In my tests using the default SA-AMG aggregation, the problems are solved in 16, 38, and 77 iterations, respectively; using the DofColumns aggregation, the problems are solved in 12, 16, and 20 iterations.