refactor rocALUTION documentation #211

Merged
merged 27 commits into from
Apr 23, 2024

Conversation

@SwRaw SwRaw commented Mar 29, 2024

Refactoring all libraries to adhere to the documentation standards, which include the home page format, index style, language improvements, link fixes, navigation, and overall content readability improvements.

-------------
Prerequisites
-------------

- A ROCm enabled platform. `ROCm Documentation <https://rocm.docs.amd.com/>`_ has more information on
supported GPUs, Linux distributions, and Windows SKUs. It also has information on how to install ROCm.
A ROCm enabled platform. For information on supported GPUs, Linux distributions, ROCm installation, and Windows SKUs, refer to `ROCm Documentation <https://rocm.docs.amd.com/>`_.

-----------------------------
Installing pre-built packages

"rocALUTION can be installed from AMD ROCm repository <https://rocm.docs.amd.com/projects/install-on-linux/en/latest/tutorial/quick-start.html>_. " The link doesn't look right.

directory. Only use these two installed files when needed in user code.

----------------------------------
Building and Installing rocALUTION
Building and installing rocALUTION
----------------------------------

Building from source is not necessary, as rocALUTION can be used after installing the pre-built packages as described above.

"The rocALUTION source code, which is the same as for the ROCm linux distributions, is available at the rocALUTION github page <https://github.com/ROCmSoftwarePlatform/rocSPARSE>_." Is this link correct?

Done

@@ -147,7 +152,7 @@ When the parallel manager, global matrix or global vector are writing to a file,
rhs.dat.rank.2
rhs.dat.rank.3

Parallel Manager
Parallel manager

.. doxygenfunction:: rocalution::ParallelManager::GetGlobalSize
and
.. doxygenfunction:: rocalution::ParallelManager::GetLocalSize
can't be traced in the doxy file. Need to check the correct syntax.

@@ -147,7 +152,7 @@ When the parallel manager, global matrix or global vector are writing to a file,
rhs.dat.rank.2
rhs.dat.rank.3

Parallel Manager
Parallel manager
----------------

.. doxygenfunction:: rocalution::ParallelManager::SetGlobalSize
and
.. doxygenfunction:: rocalution::ParallelManager::SetLocalSize
can't be traced in the doxy file. Need to check the correct syntax.

=============================

The :cpp:func:`rocalution::Solver::Clear` function clears all the data which is in the solver, including the associated preconditioner. Thus, the solver is not anymore associated with this preconditioner.

.. note:: The preconditioner is not deleted (via destructor), only a :cpp:func:`rocalution::Preconditioner::Clear` is called.

"rocalution::Preconditioner::Clear()" cannot be resolved.

================

.. doxygenclass:: rocalution::RugeStuebenAMG
.. doxygenfunction:: rocalution::RugeStuebenAMG::SetCouplingStrength

"rocalution::RugeStuebenAMG::SetCouplingStrength" cannot be resolved.

.. note:: If the input matrix is not a CSR matrix, an internal conversion will be performed to CSR format, followed by a back conversion to the previous format after the operation.
In this case, a warning message on verbosity level 2 will be printed.
.. note:: If the input matrix is not a CSR matrix, an internal conversion is performed to CSR format, followed by a back conversion to the previous format after the operation.
In this case, a warning message on verbosity level 2 is printed.

==================================================================================== =============================================================================== ======== =======
**LocalMatrix function** **Comment** **Host** **HIP**

SetRandomNormal and ExclusiveScan cannot be resolved.

@SwRaw SwRaw marked this pull request as ready for review April 9, 2024 14:51
@SwRaw SwRaw requested review from a team, YvanMokwinski and jsandham as code owners April 9, 2024 14:51
@@ -336,11 +345,11 @@ File I/O
Access
======

.. doxygenfunction:: rocalution::LocalVector::operator[](int)
.. doxygenfunction:: rocalution::LocalVector::&operator[](int)

rocalution::LocalVector::&operator cannot be resolved.

SwRaw commented Apr 10, 2024

There is a lot of repetition of API content, because we call the same doxygenfunction directive in many places. The solvers and preconditioners are documented in individual files while also being part of the API. I tried to remove the solvers and preconditioners from the API library and refer to the respective files instead, but then realized that there are many cross-references to the solvers/preconditioners in the API library. Removing these would lead to many broken links, so I am keeping them. I still feel the entire documentation needs a lot of rework to remove or reduce redundant content: instead of repeating the doxygenfunction directive, call it in one place and add links wherever else it is needed.
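
A minimal sketch of that single-definition idea, assuming Breathe's ``doxygenfunction`` directive and the Sphinx ``:cpp:func:`` role that these docs already use. ``rocalution::Solver::Clear`` is used purely for illustration, and the split into an API page and other pages is hypothetical:

```rst
.. API reference page only: the one place the function is rendered

.. doxygenfunction:: rocalution::Solver::Clear

.. Any other page: link to that definition instead of rendering it again

For details, see :cpp:func:`rocalution::Solver::Clear`.
```

With this layout the cross-references keep resolving as long as the single definition stays in the API page, so the duplicated directive calls could be replaced one at a time.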

@cgmb cgmb changed the title refactor rocALUTION refactor rocALUTION documentation Apr 10, 2024
SwRaw commented Apr 11, 2024

@lpaoletti Please review.

@SwRaw SwRaw requested a review from lpaoletti April 11, 2024 05:36
jsandham commented

This is a great start. Approved.

@yhuiYH yhuiYH added the noCI Don't run CI on this PR label Apr 19, 2024
@ntrost57 ntrost57 merged commit c71fbd2 into ROCm:develop Apr 23, 2024
16 checks passed
yhuiYH pushed a commit that referenced this pull request May 23, 2024
* refactor rocALUTION

* refactor

* refactor

* refactor

* refactor

* Update clients.rst

* Update singlenode.rst

* refactor

* Update singlenode.rst

* refactor

* refactor

* Update singlenode.rst

* Update singlenode.rst

* Update singlenode.rst

* Update singlenode.rst

* Update singlenode.rst

* refactor

* Update singlenode.rst

* Update linux-installation.rst

* refactor

* Update windows-installation.rst

* refactor doc

* refactor doc

* Update linux-installation.rst

* refactor doc review

* Update _toc.yml.in

* Update LICENSE.md
yhuiYH added a commit that referenced this pull request May 23, 2024
* Update links (#238)

* Update links

* Update Linux_Install_Guide.rst

* refactor rocALUTION documentation (#211)

* refactor rocALUTION

* refactor

* refactor

* refactor

* refactor

* Update clients.rst

* Update singlenode.rst

* refactor

* Update singlenode.rst

* refactor

* refactor

* Update singlenode.rst

* Update singlenode.rst

* Update singlenode.rst

* Update singlenode.rst

* Update singlenode.rst

* refactor

* Update singlenode.rst

* Update linux-installation.rst

* refactor

* Update windows-installation.rst

* refactor doc

* refactor doc

* Update linux-installation.rst

* refactor doc review

* Update _toc.yml.in

* Update LICENSE.md

---------

Co-authored-by: Lisa <lisajdelaney@gmail.com>
Co-authored-by: srawat <120587655+SwRaw@users.noreply.github.com>
@SwRaw SwRaw deleted the swat_rocks branch June 1, 2024 13:43
jsandham added a commit to jsandham/rocALUTION that referenced this pull request Jun 21, 2024
* PMIS MPI support (ROCm#100)

* mpi to hip backend

* MPI enablement for PMIS

* MPI RS example

* clang-format

* example

* fix

* global Ext+I interpolation (ROCm#112)

* unused function

* Added PM as argument for aggregation function

* added pm for each level to MG class

* Added PM as argument for aggregation function ROCm#2

* Added PM as argument for aggregation function ROCm#3

* RS Ext+I added to global matrix

* modified example to dump some data for validation - work in progress

* RS Ext+I function added to headers

* RS Ext+I HIP implementation

* RS Ext+I host implementation

* Improved PM

* global Ext+I kernel update

* some multinode improvements (ROCm#118)

* added some more useful guards to parallel manager

* added CopyFromHostData, CopyToHostData, ExclusiveSum and Sort functionality for vector class ; moved boundary information from vector to matrix ; added GetFormat() to GlobalMatrix class

* clang-format

* P and R should be OperatorType, not LocalMatrix

* clang-format

* check if PM is valid when required

* added extra function for triple matrix product for simplicity

* clang-format

* clang-format

* skip free when ptr is nullptr

* fix memory leak in BaseAMG class

* rocsparse_csrgeam added

* renamed csr ext+i kernel because it is generally usable

* allowing CSR zero matrices with row_offset != nullptr, as well as zero vectors with size == 0

* clang-format

* duplicated row column entries throw a warning

* copy_x2x() functions added for readability and simplicity

* search and replace memcpy with copy fct

* search and replace memcpy with copy fct ROCm#2 ; fix for random csr generator to not generate duplicated row col entries

* clang-format

* fixes

* those asserts are wrong

* major version bump

* OpenMP parallel loop threshold need to be int64_t in order to work with larger structures

* Allow basic structures with 64bit entries, e.g. for global indices

* nnz should be 64bit ; also restructured RSAMG for readability

* vector size need to be 64bit locally - also added inclusive and exclusive sum functionality

* 64bit sizes for stencils

* host vector implementation changes for 64bit sizes and in/exclusive sum ; host I/O changed to always write 64bit sizes

* host stencil 64bit changes

* max residual index changed to 64bit accordingly

* solvers adjusted for 64bit nnz

* int64_t to double conversion

* allocation size should always be 64bit ; also added copy_h2h() for simplicity

* long and long long communication support added

* cleaned up types ; IndexType2 was a stupid name anyway

* removed deprecations (major release); enabled global structure support in RSAMG

* major changes to PM; added guards for transfers; removed deprecations; fixed int overflows; functionality to generate a PM from global ghost column ids, and a parent PM

* matrix conversions 64bit nnz support with guards

* host matrix I/O changed to always write 64bit sizes ; backward compatible

* host matrix implementations changed to 64bit nnz

* RSAMG restructured - global communication should not happen in local implementations ; switched to 64bit sizes

* host CSR matrix implementation

* hip implementation ; added copy_d2h/h2d/d2d for simplicity, with async flag

* adjusted unit tests to removed deprecated functions

* RSAMG MPI example updated

* fixed sanity assert

* doc update

* example should work with only 1 process

* global routines should work with single process

* global transpose operator

* using copy_h2h()

* _rocalution_sync should force a global barrier, too

* improved asynchronous apply / comm / halo apply

* accelerator must be available for pinned alloc/free

* fixing few compiler warnings

* readability

* removed the flood of printf on multi gpu systems

* adjusted openmp nested (deprecation) to v5.0

* weak scaling examples

* distributed laplacian generator

* updated rsamg example

* updated rsamg mpi example

* should use OperatorType, nothing else

* fixed RSDirectInterpolation(); fixed const PM issue

* updated unit tests

* types.hpp generated by cmake ; CSR(64/32) added on host ; moved RSPMIS communication into global matrix class

* removed old types.hpp

* initial implementation for unordered set and map on hip backend

* outsourced RSAMG to improve compilation performance; added async communication for multinode; moved multinode rspmis into globalmatrix; outsourced atomics

* clang-format

* fix for streams when not building for mpi

* SA amg merge fix

* fixed missing shared memory size

* clang-format

* clang-format

* typo

* clang format

* add blockdim to UAAMG benchmark

* adjusting unit tests for removed deprecated functions

* clang format

* test fix

* clang-format

* std::sort required algorithm header

* fixing merge error

* merge fix ROCm#2

* header cleaned up

* header cleaned up ROCm#2

* fix issue with HIP not being found

* free_pinned() does nothing on nullptr

* global triple matrix product

* proper error message when coarsening fails

* fixed a bug in global triplematrixproduct

* fixed a typo

* fixed compilation issue when HIP=off

* fixes COO and CSR conversions on both host and device, and ELL on host only (ROCm#211)

* empty matrix conversion fix

* host fallback fix for rsamg and triplematprod

* Fix documentation failures (ROCm#214)

Co-authored-by: jsandham <james.sandham@amd.com>

* Add Smoothed Aggregation to amgmpi branch (ROCm#213)

* Adding global aggregation to SAAMG (ROCm#166)

Co-authored-by: jsandham <james.sandham@amd.com>

* Add MPI support for global prolongation to SAAMG (ROCm#171)

Co-authored-by: jsandham <james.sandham@amd.com>

* Add MPI support for SAAMG global transpose (ROCm#172)

* Add MPI support for SAAMG global transpose

* Fix failures in greedy aggregation caused by unfilled aggregate_root_nodes array

---------

Co-authored-by: jsandham <james.sandham@amd.com>

* Add MPI unsmoothed aggregation (ROCm#174)

Co-authored-by: jsandham <james.sandham@amd.com>

* Adding debug printing to test triple product

* Adding debug print statements for testing

* Adding more debug printing

* Testing

* Testing

* Testing

* Testing

* Testing

* Testing

* Testing

* Testing

* Testing

* Fix floating point fault caused by division by zero

* Testing

* Testing

* Testing

* Testing

* Testing

* Testing

* Testing

* Fix failures in local matrix when max_nnz_per_row is too high

* Testing

* Fix bug where we were not using a large enough hash table size

* Fix discrepency in host and hip assert in ExtractSubMatrix

* Fixing hangs in multinode hip backend

* Fix RSAMG documentation warnings

* Testing MPI uaamg

* Fix testing_local_matrix failure

* Remove comments and temporary testing code

* PR fixes

* PR fixes

* PR fixes

* Clang formatting

---------

Co-authored-by: jsandham <james.sandham@amd.com>

* removed unused variables

* Add back functions that cannot be removed until next major release (ROCm#216)

Co-authored-by: jsandham <james.sandham@amd.com>

* fix for very large sizes where local ext matrix exceeds int32

* Remove print statements from saamg testing file

---------

Co-authored-by: James Sandham <33790278+jsandham@users.noreply.github.com>
Co-authored-by: jsandham <james.sandham@amd.com>
Labels
noCI Don't run CI on this PR

5 participants