refactor rocALUTION documentation #211

Merged
merged 27 commits into from
Apr 23, 2024
Changes from 5 commits
12 changes: 8 additions & 4 deletions docs/api.rst → docs/api/api.rst
@@ -1,10 +1,14 @@
.. meta::
:description: A sparse linear algebra library with focus on exploring fine-grained parallelism on top of the AMD ROCm runtime and toolchains
:keywords: rocALUTION, ROCm, library, API, tool

.. _api:

###
API
###
#############
API library
#############

This section provides a detailed list of the library API
This document provides the detailed API list.

Host Utility Functions
======================
90 changes: 90 additions & 0 deletions docs/api/backend.rst
@@ -0,0 +1,90 @@
.. meta::
:description: A sparse linear algebra library with focus on exploring fine-grained parallelism on top of the AMD ROCm runtime and toolchains
:keywords: rocALUTION, ROCm, library, API, tool

.. _backends:

********
Backends
********

rocALUTION has built-in support for accelerator devices. Using an accelerator is recommended to reduce computation time.

.. note:: Not all functions are ported to the accelerator backend. This limitation is expected, because some operations (for example, sequential algorithms or file system I/O) cannot be performed efficiently on accelerators.

rocALUTION supports HIP-capable GPUs starting with ROCm 1.9. Due to its design, the library can be easily extended to support future accelerator technologies. Such an extension of the library will not affect the algorithms based on it.

If a particular function is not implemented for the accelerator in use, the library moves the object to the host and computes the routine there. In such cases, a level 2 warning message is printed. For example, ILUT factorization is currently unavailable on the HIP backend. If you request it there, the library moves the object to the host, performs the routine there, and prints the following warning message:

::

   *** warning: LocalMatrix::ILUTFactorize() is performed on the host

Moving objects to and from the accelerator
==========================================

Every rocALUTION object can be moved to the accelerator and back to the host.

.. doxygenfunction:: rocalution::BaseRocalution::MoveToAccelerator
.. doxygenfunction:: rocalution::BaseRocalution::MoveToHost

.. code-block:: cpp

   LocalMatrix<ValueType> mat;
   LocalVector<ValueType> vec1, vec2;

   // Perform matrix vector multiplication on the host
   mat.Apply(vec1, &vec2);

   // Move data to the accelerator
   mat.MoveToAccelerator();
   vec1.MoveToAccelerator();
   vec2.MoveToAccelerator();

   // Perform matrix vector multiplication on the accelerator
   mat.Apply(vec1, &vec2);

   // Move data to the host
   mat.MoveToHost();
   vec1.MoveToHost();
   vec2.MoveToHost();

Asynchronous transfers
======================

rocALUTION also provides asynchronous data transfers between the host and the HIP backend.

.. doxygenfunction:: rocalution::BaseRocalution::MoveToAcceleratorAsync
.. doxygenfunction:: rocalution::BaseRocalution::MoveToHostAsync
.. doxygenfunction:: rocalution::BaseRocalution::Sync

Asynchronous transfers can be started with :cpp:func:`rocalution::LocalVector::CopyFromAsync` and :cpp:func:`rocalution::LocalMatrix::CopyFromAsync`, or with ``MoveToAcceleratorAsync()`` and ``MoveToHostAsync()``. These functions return immediately and perform the transfer in the background. Synchronization is done with ``Sync()``.

When you use ``MoveToAcceleratorAsync()`` or ``MoveToHostAsync()``, the object still points to its original location (the host after calling ``MoveToAcceleratorAsync()``, the accelerator after calling ``MoveToHostAsync()``). The object switches to the new location only after ``Sync()`` is called.

.. note:: The objects should not be modified during an active asynchronous transfer to avoid the possibility of generating incorrect values after the synchronization.
.. note:: To use asynchronous transfers, enable the pinned memory allocation. Uncomment ``#define ROCALUTION_HIP_PINNED_MEMORY`` in ``src/utils/allocate_free.hpp``.
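
The location rule described above can be pictured with a small stand-alone toy model. This is only an illustration and does not use rocALUTION: ``ToyObject``, ``Location``, and ``WhereIs()`` are hypothetical names, and no data is actually transferred. The member names ``MoveToAcceleratorAsync()``, ``MoveToHostAsync()``, and ``Sync()`` are borrowed from the text purely to mirror the described behavior.

```cpp
#include <cassert>

// Toy model only: this is NOT rocALUTION code and transfers no data.
enum class Location { Host, Accelerator };

class ToyObject {
public:
    // Request a transfer. The visible location does not change yet.
    void MoveToAcceleratorAsync() { pending_ = Location::Accelerator; has_pending_ = true; }
    void MoveToHostAsync() { pending_ = Location::Host; has_pending_ = true; }

    // Complete a pending transfer. Only now does the location switch.
    void Sync()
    {
        if (has_pending_) {
            location_ = pending_;
            has_pending_ = false;
        }
    }

    // Hypothetical query used only by this sketch.
    Location WhereIs() const { return location_; }

private:
    Location location_ = Location::Host;
    Location pending_ = Location::Host;
    bool has_pending_ = false;
};
```

In this model, an object that starts on the host still reports the host after ``MoveToAcceleratorAsync()`` and reports the accelerator only after ``Sync()``.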

Systems without accelerators
============================

rocALUTION provides full code compatibility on systems without accelerators. Code written on a GPU system can be recompiled unchanged on a machine without a GPU and still produces the same results. Any calls to :cpp:func:`rocalution::BaseRocalution::MoveToAccelerator` and :cpp:func:`rocalution::BaseRocalution::MoveToHost` are ignored.

Memory allocations
==================

All data passed to and from rocALUTION is handled by the memory handling functions defined in the source code. By default, the library uses the standard C++ ``new`` and ``delete`` operators for host data. To change the default behavior, modify ``src/utils/allocate_free.cpp``.

Allocation problems
-------------------

If the allocation fails, the library reports an error and exits. To change this default behavior, modify ``src/utils/allocate_free.cpp``.
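
As an illustration of the "report an error and exit" policy, the following is a minimal stand-alone sketch. It is not the code from ``src/utils/allocate_free.cpp``: the names ``allocate_host_sketch`` and ``free_host_sketch`` and the error message are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <new>

// Hypothetical helper (not the library's function): allocate n elements
// on the host with new, and report an error and exit if that fails.
template <typename T>
T* allocate_host_sketch(std::size_t n)
{
    if (n == 0) {
        return nullptr;
    }
    // The nothrow form returns nullptr on failure instead of throwing.
    T* ptr = new (std::nothrow) T[n];
    if (ptr == nullptr) {
        std::fprintf(stderr, "Error: cannot allocate %zu elements on the host\n", n);
        std::exit(EXIT_FAILURE);
    }
    return ptr;
}

// Hypothetical counterpart: release the buffer and reset the pointer.
template <typename T>
void free_host_sketch(T** ptr)
{
    assert(ptr != nullptr);
    delete[] *ptr;
    *ptr = nullptr;
}
```

Changing the failure policy in a helper of this shape (for example, returning ``nullptr`` to the caller instead of exiting) would only touch the branch that handles the failed allocation.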

Memory alignment
----------------

The library can also use special memory alignment functions. To enable this feature, uncomment it in ``src/utils/allocate_free.cpp`` before compiling.
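
For readers unfamiliar with aligned allocation, here is a minimal sketch of the general technique. It assumes C++17 and a platform that provides ``std::aligned_alloc``. The helper name ``allocate_aligned_sketch`` and the 64-byte default are assumptions for illustration and are not taken from the library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: return storage for n doubles whose address is a
// multiple of `alignment` bytes. Release the storage with std::free.
inline double* allocate_aligned_sketch(std::size_t n, std::size_t alignment = 64)
{
    // The alignment must be a nonzero power of two.
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

    // std::aligned_alloc expects the size to be a multiple of the alignment,
    // so round the byte count up to the next multiple.
    std::size_t bytes = n * sizeof(double);
    bytes = (bytes + alignment - 1) / alignment * alignment;
    if (bytes == 0) {
        bytes = alignment;
    }
    return static_cast<double*>(std::aligned_alloc(alignment, bytes));
}
```

The returned pointer can be checked with ``reinterpret_cast<std::uintptr_t>(p) % alignment == 0``.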

Pinned memory allocation (HIP)
------------------------------

By default, host memory is allocated with C++ ``new`` and ``delete``. For faster PCI-Express transfers on the HIP backend, use pinned host memory. To enable it, uncomment the corresponding macro in ``src/utils/allocate_free.hpp``.
73 changes: 49 additions & 24 deletions docs/usermanual/precond.rst → docs/api/precond.rst
@@ -1,106 +1,129 @@
.. meta::
:description: A sparse linear algebra library with focus on exploring fine-grained parallelism on top of the AMD ROCm runtime and toolchains
:keywords: rocALUTION, ROCm, library, API, tool

.. _preconditioners:

###############
Preconditioners
###############
In this chapter, all preconditioners are presented. All preconditioners support local operators. They can be used as a global preconditioner via block-jacobi scheme which works locally on each interior matrix. To provide fast application, all preconditioners require extra memory to keep the approximated operator.

This document lists the preconditioners by category. All preconditioners support local operators. They can be used as a global preconditioner via a block-Jacobi scheme, which works locally on each interior matrix. To provide fast application, all preconditioners require extra memory to keep the approximated operator.

.. doxygenclass:: rocalution::Preconditioner

Code Structure
Code structure
==============
The preconditioners provide a solution to the system :math:`Mz = r`, where either the solution :math:`z` is directly computed by the approximation scheme or it is iteratively obtained with :math:`z = 0` initial guess.

Jacobi Method
The preconditioners provide a solution to the system :math:`Mz = r`, where the solution :math:`z` is either computed directly by the approximation scheme or obtained iteratively with the initial guess :math:`z = 0`.

Jacobi method
=============

.. doxygenclass:: rocalution::Jacobi
.. note:: Damping parameter :math:`\omega` can be adjusted by :cpp:func:`rocalution::FixedPoint::SetRelaxation`.
.. note:: To adjust the damping parameter :math:`\omega`, use :cpp:func:`rocalution::FixedPoint::SetRelaxation`.
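
To make the role of the damping parameter concrete, here is a minimal dense-matrix sketch of the textbook damped Jacobi iteration :math:`z_{k+1} = z_k + \omega D^{-1}(r - M z_k)`, started from :math:`z = 0`. It is independent of rocALUTION: the function name ``damped_jacobi_sketch``, the dense storage, and the ``sweeps`` parameter are assumptions made for illustration, not the library's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;
using Vector = std::vector<double>;

// Hypothetical sketch: approximate the solution of M z = r with `sweeps`
// damped Jacobi iterations, starting from the initial guess z = 0.
Vector damped_jacobi_sketch(const Matrix& M, const Vector& r, double omega, int sweeps)
{
    const std::size_t n = r.size();
    assert(M.size() == n);

    Vector z(n, 0.0);
    for (int k = 0; k < sweeps; ++k) {
        Vector z_new(n);
        for (std::size_t i = 0; i < n; ++i) {
            assert(M[i].size() == n && M[i][i] != 0.0);
            // Row i of the residual r - M z, using only the previous iterate.
            double residual = r[i];
            for (std::size_t j = 0; j < n; ++j) {
                residual -= M[i][j] * z[j];
            }
            // Damped update scaled by the inverse diagonal entry.
            z_new[i] = z[i] + omega * residual / M[i][i];
        }
        z = z_new;
    }
    return z;
}
```

With :math:`\omega = 1` and a single sweep, this reduces to :math:`z = D^{-1} r`, the plain diagonal scaling.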

(Symmetric) Gauss-Seidel or (S)SOR method
==========================================

(Symmetric) Gauss-Seidel / (S)SOR Method
========================================
.. doxygenclass:: rocalution::GS
.. doxygenclass:: rocalution::SGS
.. note:: Relaxation parameter :math:`\omega` can be adjusted by :cpp:func:`rocalution::FixedPoint::SetRelaxation`.
.. note:: To adjust the relaxation parameter :math:`\omega`, use :cpp:func:`rocalution::FixedPoint::SetRelaxation`.
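
The effect of the relaxation parameter can be seen in a textbook forward SOR sweep, which uses already updated entries within the same sweep. Setting :math:`\omega = 1` gives a Gauss-Seidel sweep. The sketch below uses dense storage and the hypothetical name ``sor_forward_sweep_sketch``. It is an illustration under those assumptions and not the library's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;
using Vector = std::vector<double>;

// Hypothetical sketch: one forward SOR sweep for M z = r, updating z in place.
void sor_forward_sweep_sketch(const Matrix& M, const Vector& r, double omega, Vector& z)
{
    const std::size_t n = r.size();
    assert(M.size() == n && z.size() == n);

    for (std::size_t i = 0; i < n; ++i) {
        assert(M[i].size() == n && M[i][i] != 0.0);
        // Subtract all off-diagonal contributions. Entries z[j] with j < i
        // were already updated in this sweep, entries with j > i are old.
        double sigma = r[i];
        for (std::size_t j = 0; j < n; ++j) {
            if (j != i) {
                sigma -= M[i][j] * z[j];
            }
        }
        // Blend the old value with the Gauss-Seidel value using omega.
        z[i] = (1.0 - omega) * z[i] + omega * sigma / M[i][i];
    }
}
```

A symmetric variant would follow this forward sweep with the same loop run from the last row to the first.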

Incomplete Factorizations
Incomplete factorizations
=========================

ILU
---

.. doxygenclass:: rocalution::ILU
.. doxygenfunction:: rocalution::ILU::Set

ILUT
----

.. doxygenclass:: rocalution::ILUT
.. doxygenfunction:: rocalution::ILUT::Set(double)
.. doxygenfunction:: rocalution::ILUT::Set(double, int)

IC
--
---

.. doxygenclass:: rocalution::IC

AI Chebyshev
============

.. doxygenclass:: rocalution::AIChebyshev
.. doxygenfunction:: rocalution::AIChebyshev::Set

FSAI
====

.. doxygenclass:: rocalution::FSAI
.. doxygenfunction:: rocalution::FSAI::Set(int)
.. doxygenfunction:: rocalution::FSAI::Set(const OperatorType&)
.. doxygenfunction:: rocalution::FSAI::SetPrecondMatrixFormat

SPAI
====

.. doxygenclass:: rocalution::SPAI
.. doxygenfunction:: rocalution::SPAI::SetPrecondMatrixFormat

TNS
===

.. doxygenclass:: rocalution::TNS
.. doxygenfunction:: rocalution::TNS::Set
.. doxygenfunction:: rocalution::TNS::SetPrecondMatrixFormat

MultiColored Preconditioners
MultiColored preconditioners
============================

.. doxygenclass:: rocalution::MultiColored
.. doxygenfunction:: rocalution::MultiColored::SetPrecondMatrixFormat
.. doxygenfunction:: rocalution::MultiColored::SetDecomposition

MultiColored (Symmetric) Gauss-Seidel / (S)SOR
MultiColored (symmetric) Gauss-Seidel / (S)SOR
----------------------------------------------

.. doxygenclass:: rocalution::MultiColoredGS
.. doxygenclass:: rocalution::MultiColoredSGS
.. doxygenfunction:: rocalution::MultiColoredSGS::SetRelaxation
.. note:: The preconditioner matrix format can be changed using :cpp:func:`rocalution::MultiColored::SetPrecondMatrixFormat`.
.. note:: To change the preconditioner matrix format, use :cpp:func:`rocalution::MultiColored::SetPrecondMatrixFormat`.

MultiColored Power(q)-pattern method ILU(p,q)
MultiColored power(q)-pattern method ILU(p,q)
---------------------------------------------

.. doxygenclass:: rocalution::MultiColoredILU
.. doxygenfunction:: rocalution::MultiColoredILU::Set(int)
.. doxygenfunction:: rocalution::MultiColoredILU::Set(int, int, bool)
.. note:: The preconditioner matrix format can be changed using :cpp:func:`rocalution::MultiColored::SetPrecondMatrixFormat`.
.. note:: To change the preconditioner matrix format, use :cpp:func:`rocalution::MultiColored::SetPrecondMatrixFormat`.

Multi-Elimination Incomplete LU
Multi-elimination incomplete LU
===============================

.. doxygenclass:: rocalution::MultiElimination
.. doxygenfunction:: rocalution::MultiElimination::GetSizeDiagBlock
.. doxygenfunction:: rocalution::MultiElimination::GetLevel
.. doxygenfunction:: rocalution::MultiElimination::Set
.. doxygenfunction:: rocalution::MultiElimination::SetPrecondMatrixFormat

Diagonal Preconditioner for Saddle-Point Problems
Diagonal preconditioner for saddle-point problems
=================================================

.. doxygenclass:: rocalution::DiagJacobiSaddlePointPrecond
.. doxygenfunction:: rocalution::DiagJacobiSaddlePointPrecond::Set

(Restricted) Additive Schwarz Preconditioner
(Restricted) Additive Schwarz preconditioner
============================================

.. doxygenclass:: rocalution::AS
.. doxygenfunction:: rocalution::AS::Set
.. doxygenclass:: rocalution::RAS

The overlapped area is shown in :numref:`AS`.
See the overlapped area in the figure below:

.. _AS:
.. figure:: ../data/AS.png
@@ -109,12 +132,13 @@ The overlapped area is shown in :numref:`AS`.

Example of a 4 block-decomposed matrix - Additive Schwarz with overlapping preconditioner (left) and Restricted Additive Schwarz preconditioner (right).

Block-Jacobi (MPI) Preconditioner
Block-Jacobi (MPI) preconditioner
=================================

.. doxygenclass:: rocalution::BlockJacobi
.. doxygenfunction:: rocalution::BlockJacobi::Set

The Block-Jacobi (MPI) preconditioner is shown in :numref:`BJ`.
See the Block-Jacobi (MPI) preconditioner in the figure below:

.. _BJ:
.. figure:: ../data/BJ.png
@@ -123,8 +147,9 @@ The Block-Jacobi (MPI) preconditioner is shown in :numref:`BJ`.

Example of a 4 block-decomposed matrix - Block-Jacobi preconditioner.

Block Preconditioner
Block preconditioner
====================

.. doxygenclass:: rocalution::BlockPreconditioner
.. doxygenfunction:: rocalution::BlockPreconditioner::Set
.. doxygenfunction:: rocalution::BlockPreconditioner::SetDiagonalSolver
@@ -133,8 +158,8 @@ Block Preconditioner
.. doxygenfunction:: rocalution::BlockPreconditioner::SetPermutation


Variable Preconditioner
Variable preconditioner
=======================

.. doxygenclass:: rocalution::VariablePreconditioner
.. doxygenfunction:: rocalution::VariablePreconditioner::SetPreconditioner
