Make new Accessors independent of Ginkgo #708

Merged
merged 14 commits into from
Mar 16, 2021

thoasm
Member

@thoasm thoasm commented Feb 21, 2021

Move the accessor scaled_reduced_row_major and reduced_row_major to the global folder accessor/ in order for others to use it without compiling Ginkgo. row_major was not copied (it must not be moved since it is part of the public interface).
A separate range was also implemented because it is needed to use the accessors.
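
To illustrate what a reduced-precision row-major accessor does conceptually, here is a minimal sketch. It is not the interface of the accessors moved in this PR: the class name `reduced_row_major_2d_sketch`, its nested `reference` proxy, and the helper `sum_entries` are all hypothetical and only assume that values are stored in one type and used for arithmetic in another.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical illustration only; NOT the API of the accessors in this PR.
// Values live in memory as StorageType and are handed out as ArithmeticType.
template <typename ArithmeticType, typename StorageType>
class reduced_row_major_2d_sketch {
public:
    // Small proxy object so that `acc(i, j) = value` converts on write.
    class reference {
    public:
        explicit reference(StorageType* ptr) : ptr_{ptr} {}

        // Reading converts from storage to arithmetic precision.
        operator ArithmeticType() const
        {
            return static_cast<ArithmeticType>(*ptr_);
        }

        // Writing converts from arithmetic to storage precision.
        reference& operator=(ArithmeticType value)
        {
            *ptr_ = static_cast<StorageType>(value);
            return *this;
        }

    private:
        StorageType* ptr_;
    };

    // Size first, then the data pointer (rows are stored contiguously).
    reduced_row_major_2d_sketch(std::array<std::size_t, 2> size,
                                StorageType* data)
        : size_(size), data_{data}
    {}

    reference operator()(std::size_t row, std::size_t col) const
    {
        assert(row < size_[0] && col < size_[1]);
        return reference{data_ + row * size_[1] + col};
    }

    std::size_t length(std::size_t dimension) const
    {
        return size_[dimension];
    }

private:
    std::array<std::size_t, 2> size_;
    StorageType* data_;
};

// Accumulates all entries in arithmetic precision.
template <typename ArithmeticType, typename StorageType>
ArithmeticType sum_entries(
    const reduced_row_major_2d_sketch<ArithmeticType, StorageType>& acc)
{
    ArithmeticType sum{};
    for (std::size_t row = 0; row < acc.length(0); ++row) {
        for (std::size_t col = 0; col < acc.length(1); ++col) {
            sum += static_cast<ArithmeticType>(acc(row, col));
        }
    }
    return sum;
}
```

With `ArithmeticType = double` and `StorageType = float`, a kernel written against this sketch computes in double precision while reading and writing half as many bytes.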

Additionally, the following improvements / updates were performed:

  • Move each accessor to its own file
  • Add a subdirectory core/test/accessor, where the accessors are tested
  • Create separate test files for each accessor
  • Improve naming of helper functions / structures to be more descriptive

Changes done to block_col_major and row_major when migrating:

  • Remove gko::dim constructor parameter (creating is only possible with std::array now)
  • Remove copy_from functionality (since the gko::acc::range does not support it)
  • Change the value_type of length_type and stride_type from const to non-const

TODO:

@thoasm thoasm self-assigned this Feb 21, 2021
@thoasm
Member Author

thoasm commented Feb 21, 2021

label!

@ginkgo-bot ginkgo-bot added mod:core This is related to the core module. reg:testing This is related to testing. labels Feb 21, 2021
@thoasm thoasm added the 1:ST:WIP This PR is a work in progress. Not ready for review. label Feb 21, 2021
@Slaedr
Contributor

Slaedr commented Feb 23, 2021

One preliminary question - should the CB-GMRES-specific accessor functionality be in the new folder, or should it be part of Ginkgo proper? Would all multi-precision and layout helper code needed for Ginkgo end up in the new accessor folder? One option would be to keep fundamental accessor stuff in the new folder (those which would be needed for many types of things including matrices, preconditioners, solvers etc.) going forward, but keep solver-specific accessor code in the respective core (non-public) subfolders.

@thoasm
Member Author

thoasm commented Feb 23, 2021

Sorry, I missed this comment.
I think it should be local to each solver, unless it is truly a generic solution that can apply to all existing accessors. Otherwise, it is harder to maintain.

@thoasm
Member Author

thoasm commented Feb 24, 2021

format!

@codecov

codecov bot commented Feb 24, 2021

Codecov Report

Merging #708 (b11ca27) into develop (b293b9f) will decrease coverage by 0.01%.
The diff coverage is 97.37%.

@@             Coverage Diff             @@
##           develop     #708      +/-   ##
===========================================
- Coverage    92.42%   92.41%   -0.02%     
===========================================
  Files          369      380      +11     
  Lines        27520    27486      -34     
===========================================
- Hits         25436    25400      -36     
- Misses        2084     2086       +2     
| Impacted Files | Coverage Δ |
|---|---|
| core/solver/cb_gmres_accessor.hpp | 100.00% <ø> (ø) |
| core/test/base/dim.cpp | 100.00% <ø> (ø) |
| core/test/base/range_accessors.cpp | 100.00% <ø> (ø) |
| include/ginkgo/core/base/dim.hpp | 88.88% <ø> (-2.61%) ⬇️ |
| include/ginkgo/core/base/range_accessors.hpp | 73.68% <ø> (-21.56%) ⬇️ |
| accessor/range.hpp | 81.81% <81.81%> (ø) |
| accessor/reduced_row_major.hpp | 89.65% <89.65%> (ø) |
| accessor/accessor_references.hpp | 94.00% <92.59%> (ø) |
| accessor/accessor_helper.hpp | 94.49% <94.49%> (ø) |
| accessor/block_col_major.hpp | 100.00% <100.00%> (ø) |

... and 26 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@thoasm thoasm force-pushed the independent_accessor branch 2 times, most recently from 09a4ffa to f42bf15 Compare February 24, 2021 22:49
@thoasm thoasm added 1:ST:ready-for-review This PR is ready for review and removed 1:ST:WIP This PR is a work in progress. Not ready for review. labels Mar 9, 2021
Member

@upsj upsj left a comment

LGTM! Only a comment on relocatability

@fritzgoebel fritzgoebel mentioned this pull request Mar 10, 2021
3 tasks
@thoasm thoasm force-pushed the independent_accessor branch 2 times, most recently from 92af985 to e8c413f Compare March 10, 2021 19:00
Contributor

@Slaedr Slaedr left a comment

I think it mostly looks good. Thanks for taking care of the block col major as well. I just have a few doubts:

  • What's the purpose of the acc::range class? Just some backward compatibility? The original range class in the include directory is still there, so I'm not sure what's the need for this.
  • I see that the sub-range assignment tests were removed for multidimensional row-major and block-column-major. It's not obvious to me why that is.

And there are a couple of other points/suggestions.

friend GKO_ACC_ATTRIBUTES constexpr bool operator<=(
    const index_span &first, const index_span &second)
{
    return first.end <= second.begin;
Contributor

How about

Suggested change:
- return first.end <= second.begin;
+ return first.begin <= second.begin && first.end <= second.end;

I'm not sure, but maybe this is more 'natural'.

Member Author

This is a 1:1 copy from the gko::span. I think making it something completely different to operator < might be even more confusing. The current way defines a partial order, if we make this change, it is no longer partially ordered.

Contributor

@Slaedr Slaedr Mar 12, 2021

Actually even my suggestion would define a partial order. In fact, I think mine would define a total order, because then for any two spans A and B, at least one of A <= B or B <= A is necessarily true. Iff both are true, A = B.

The existing definition is only a partial order, true, because not all spans are comparable with <=. Also it can never happen that both A <= B and B <= A, unless both A and B refer to a single point.

But I understand if you want to keep it the way it is.
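
To make the two candidate definitions of `operator<=` concrete, the following minimal sketch evaluates both on a few spans at compile time. The type `span_sketch` and the function names are hypothetical stand-ins for `index_span` and its operator; only the two return expressions are taken from the thread. It may be worth checking the comparability claims against such values: in this sketch, two overlapping spans are comparable in neither direction under the existing definition, and a span nested strictly inside another is comparable in neither direction under the component-wise variant.

```cpp
#include <cstddef>

// Hypothetical stand-in for the span type under discussion: [begin, end).
struct span_sketch {
    std::size_t begin;
    std::size_t end;
};

// Existing definition quoted in the review:
// `first` ends before (or exactly where) `second` starts.
constexpr bool before_or_touching(span_sketch first, span_sketch second)
{
    return first.end <= second.begin;
}

// Suggested alternative: compare both end points component-wise.
constexpr bool componentwise_leq(span_sketch first, span_sketch second)
{
    return first.begin <= second.begin && first.end <= second.end;
}

constexpr span_sketch left_span{0, 2};
constexpr span_sketch right_span{2, 5};
constexpr span_sketch overlapping_span{1, 3};
constexpr span_sketch outer_span{0, 5};

// Disjoint spans are ordered by both definitions.
static_assert(before_or_touching(left_span, right_span), "disjoint spans");
static_assert(componentwise_leq(left_span, right_span), "disjoint spans");

// Overlapping spans: incomparable under the existing definition,
// but ordered under the component-wise variant.
static_assert(!before_or_touching(left_span, overlapping_span) &&
                  !before_or_touching(overlapping_span, left_span),
              "overlap is incomparable under the existing definition");
static_assert(componentwise_leq(left_span, overlapping_span),
              "overlap is ordered component-wise");

// Nested spans: incomparable under the component-wise variant.
static_assert(!componentwise_leq(outer_span, overlapping_span) &&
                  !componentwise_leq(overlapping_span, outer_span),
              "nested spans are incomparable component-wise");
```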



// CUDA TOOLKIT < 11 does not support constexpr in combination with
// thrust::complex, which is why constexpr is only present in later versions
#if defined(__CUDA_ARCH__) && defined(__CUDACC_VER_MAJOR__) && \
    (__CUDACC_VER_MAJOR__ < 11)

#define GKO_ENABLE_REFERENCE_CONSTEXPR
#define GKO_ACC_ENABLE_REFERENCE_CONSTEXPR
Contributor

This is minor and it's up to you, but you could reduce the verbosity:

Suggested change:
- #define GKO_ACC_ENABLE_REFERENCE_CONSTEXPR
+ #define GKO_ACC_REFERENCE_CONSTEXPR
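
For readers unfamiliar with this pattern, here is a minimal sketch of a version-dependent `constexpr` macro. The macro name `SKETCH_REFERENCE_CONSTEXPR` and the function `add_converted` are hypothetical, and the sketch assumes that the macro expands to nothing on older CUDA toolkits and to `constexpr` otherwise; the exact condition and expansion used in the PR are not fully visible in the excerpt above.

```cpp
// Hypothetical macro name, chosen for this sketch only.
#if defined(__CUDACC_VER_MAJOR__) && (__CUDACC_VER_MAJOR__ < 11)
// Assumed behaviour for older CUDA toolkits: expand to nothing, so the
// annotated functions are ordinary (non-constexpr) functions.
#define SKETCH_REFERENCE_CONSTEXPR
#else
// Everywhere else the annotated functions are constexpr.
#define SKETCH_REFERENCE_CONSTEXPR constexpr
#endif

// The same source line compiles in both configurations; only the
// constexpr-ness of the function changes.
template <typename T>
SKETCH_REFERENCE_CONSTEXPR T add_converted(T lhs, T rhs)
{
    return lhs + rhs;
}
```

Code that must compile in both configurations cannot rely on calling such functions in constant expressions, which is the price of keeping a single definition.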

@thoasm thoasm changed the title Make reduced storage Accessors independent of Ginkgo Make new Accessors independent of Ginkgo Mar 12, 2021
Contributor

@Slaedr Slaedr left a comment

LGTM!

@thoasm thoasm added 1:ST:ready-to-merge This PR is ready to merge. and removed 1:ST:ready-for-review This PR is ready for review labels Mar 15, 2021
Thomas Grützmacher and others added 13 commits March 15, 2021 11:37
Move the accessor {scaled,}reduced_row_major to `accessor/` in order
for others to use it without compiling Ginkgo. RowMajor was not copied
yet (it must not be moved since it is part of the public interface).
Includes:
- Add documentation to index_span
- Update documentation of range
- Give accessor helper functions more descriptive names
Co-authored-by: Thomas Grützmacher <thoasm@users.noreply.github.com>
Only the new multidimensional row_major is moved, the 2D row_major that
is part of the public interface remains.
Additionally, unnecessary code from public accessors was removed.
The code that was removed was never in the release and originally
intended for usage inside the accessors. Since they moved to an
independent directory, that code is no longer necessary.
This removal prevents it from ever becoming part of the public
interface.
Inside the accessor folder, headers are now included without the
`accessor/` prefix.

Co-authored-by: Tobias Ribizel <ribizel@kit.edu>
Changes `std::array<const size_type, dim>` to
`std::array<size_type, dim>` in order to have an easier user-interface
(and const-ness is preserved because all length_types and stride_types
are const themselves).

Additionally, add a dim_type in {scaled_,}reduced_row_major accessors.
This step was forgotten previously.
Additionally, `scaled_reduced_row_major` now statically rejects complex
types (since they don't work).

Co-authored-by: Aditya Kashi <aditya.kashi@kit.edu>
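
The two ideas in this commit message, a non-const element type in the `std::array` with const-ness kept on the member, and a static rejection of complex types, can be sketched as follows. All names (`is_complex_sketch`, `scaled_accessor_sketch`, the alias `length_type`) are hypothetical and this is only one possible reading of the description, not the code from the commit.

```cpp
#include <array>
#include <complex>
#include <cstddef>
#include <type_traits>

// Hypothetical trait: true only for std::complex<T>.
template <typename T>
struct is_complex_sketch : std::false_type {};

template <typename T>
struct is_complex_sketch<std::complex<T>> : std::true_type {};

// Hypothetical accessor skeleton, for illustration only.
template <typename ArithmeticType, typename StorageType, std::size_t Dim>
class scaled_accessor_sketch {
    // Instantiating with a complex type fails at compile time.
    static_assert(!is_complex_sketch<ArithmeticType>::value &&
                      !is_complex_sketch<StorageType>::value,
                  "complex types are not supported by this sketch");

public:
    // The element type is non-const, so callers can pass a plain
    // std::array<std::size_t, Dim> ...
    using length_type = std::array<std::size_t, Dim>;

    explicit scaled_accessor_sketch(length_type size) : size_(size) {}

    std::size_t length(std::size_t dimension) const
    {
        return size_[dimension];
    }

private:
    // ... while const-ness is preserved on the member itself.
    const length_type size_;
};
```

A declaration such as `scaled_accessor_sketch<std::complex<double>, int, 2>` would then be rejected by the `static_assert` instead of failing somewhere deep inside a kernel.
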
Simply always using double braces ({{}}) for std::array initialization
resolved the issue.
Now, the size is always first in the constructor, and the data pointer
is followed by the corresponding stride (if existing).
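
As a small illustration of both points, the sketch below initializes a `std::array` with double braces and uses a constructor that takes the size first and then the data pointer followed by its stride. The class `strided_view_sketch` is hypothetical and not part of the PR. The outer braces initialize the `std::array` aggregate and the inner braces initialize the built-in array it wraps; writing both avoids relying on brace elision.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical 2D view, used only to illustrate the argument order:
// size first, then the data pointer, directly followed by its stride.
class strided_view_sketch {
public:
    strided_view_sketch(std::array<std::size_t, 2> size, double* data,
                        std::size_t stride)
        : size_(size), data_{data}, stride_{stride}
    {
        // Each row must fit into the distance between two row starts.
        assert(size_[1] <= stride_);
    }

    double& at(std::size_t row, std::size_t col) const
    {
        assert(row < size_[0] && col < size_[1]);
        return data_[row * stride_ + col];
    }

private:
    std::array<std::size_t, 2> size_;
    double* data_;
    std::size_t stride_;
};

// Double braces: outer pair for std::array, inner pair for its C array.
inline std::array<std::size_t, 2> make_size_2d(std::size_t rows,
                                               std::size_t cols)
{
    return std::array<std::size_t, 2>{{rows, cols}};
}
```
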
Member

@pratikvn pratikvn left a comment

LGTM!

Can you also check whether it is indeed independent, for example by creating something like a test_install for this independent accessor that includes no Ginkgo code, only the accessor headers?

@thoasm
Member Author

thoasm commented Mar 15, 2021

@pratikvn what about writing the tests without including any other Ginkgo code? I only need to modify the reduced_row_major test (make 2 separate tests out of it) because I need gko::half internally.
After that, the tests are basically the test_install without checking if any public headers are included.

@pratikvn
Member

@thoasm, if that is possible, it would be great. I think you will need core/test/utils.hpp, though, so you may have to copy it over and make sure the copy does not include any Ginkgo headers.

@thoasm
Member Author

thoasm commented Mar 15, 2021

@pratikvn I am currently setting up a custom create_independent_test, which does not link to Ginkgo in any way and only adds Ginkgo_SOURCE_DIR as the include directory. That way, if I include any public header anywhere, it fails to compile.

Thank you very much for that input, having a check if it is truly independent is quite important here (so far, I only checked that with the accessor benchmark).

The accessor tests now make sure that no dependency (on public headers)
exists inside the accessor headers by only having the source directory
of Ginkgo as an include directory (and no longer linking against
Ginkgo).
Only one test (`reduced_row_major_ginkgo`) still depends on Ginkgo since
it uses `gko::half`.
@sonarcloud

sonarcloud bot commented Mar 16, 2021

Kudos, SonarCloud Quality Gate passed!

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 16 Code Smells

24.5% 24.5% Coverage
1.9% 1.9% Duplication

@thoasm thoasm merged commit 3e31cc0 into develop Mar 16, 2021
@thoasm thoasm deleted the independent_accessor branch March 16, 2021 17:58
tcojean added a commit that referenced this pull request Aug 20, 2021
Ginkgo release 1.4.0

The Ginkgo team is proud to announce the new Ginkgo minor release 1.4.0. This
release brings most of the Ginkgo functionality to the Intel DPC++ ecosystem
which enables Intel-GPU and CPU execution. The only Ginkgo features which have
not been ported yet are some preconditioners.

Ginkgo's mixed-precision support is greatly enhanced thanks to:
1. The new Accessor concept, which allows writing kernels featuring on-the-fly
memory compression, among other features. The accessor can be used as
header-only, see the [accessor BLAS benchmarks repository](https://github.com/ginkgo-project/accessor-BLAS/tree/develop) as a usage example.
2. All LinOps now transparently support mixed-precision execution. By default,
this is done through a temporary copy which may have a performance impact but
already allows mixed-precision research.

Native mixed-precision ELL kernels are implemented which do not see this cost.
The accessor is also leveraged in a new CB-GMRES solver which allows for
performance improvements by compressing the Krylov basis vectors. Many other
features have been added to Ginkgo, such as reordering support, a new IDR
solver, Incomplete Cholesky preconditioner, matrix assembly support (only CPU
for now), machine topology information, and more!

Supported systems and requirements:
+ For all platforms, cmake 3.13+
+ C++14 compliant compiler
+ Linux and MacOS
  + gcc: 5.3+, 6.3+, 7.3+, all versions after 8.1+
  + clang: 3.9+
  + Intel compiler: 2018+
  + Apple LLVM: 8.0+
  + CUDA module: CUDA 9.0+
  + HIP module: ROCm 3.5+
  + DPC++ module: Intel OneAPI 2021.3. Set the CXX compiler to `dpcpp`.
+ Windows
  + MinGW and Cygwin: gcc 5.3+, 6.3+, 7.3+, all versions after 8.1+
  + Microsoft Visual Studio: VS 2019
  + CUDA module: CUDA 9.0+, Microsoft Visual Studio
  + OpenMP module: MinGW or Cygwin.


Algorithm and important feature additions:
+ Add a new DPC++ Executor for SYCL execution and other base utilities
  [#648](#648), [#661](#661), [#757](#757), [#832](#832)
+ Port matrix formats, solvers and related kernels to DPC++. For some kernels,
  also make use of a shared kernel implementation for all executors (except
  Reference). [#710](#710), [#799](#799), [#779](#779), [#733](#733), [#844](#844), [#843](#843), [#789](#789), [#845](#845), [#849](#849), [#855](#855), [#856](#856)
+ Add accessors which allow multi-precision kernels, among other things.
  [#643](#643), [#708](#708)
+ Add support for mixed precision operations through apply in all LinOps. [#677](#677)
+ Add incomplete Cholesky factorizations and preconditioners as well as some
  improvements to ILU. [#672](#672), [#837](#837), [#846](#846)
+ Add an AMGX implementation and kernels on all devices but DPC++.
  [#528](#528), [#695](#695), [#860](#860)
+ Add a new mixed-precision capability solver, Compressed Basis GMRES
  (CB-GMRES). [#693](#693), [#763](#763)
+ Add the IDR(s) solver. [#620](#620)
+ Add a new fixed-size block CSR matrix format (for the Reference executor).
  [#671](#671), [#730](#730)
+ Add native mixed-precision support to the ELL format. [#717](#717), [#780](#780)
+ Add Reverse Cuthill-McKee reordering [#500](#500), [#649](#649)
+ Add matrix assembly support on CPUs. [#644](#644)
+ Extend ISAI from triangular to general and spd matrices. [#690](#690)

Other additions:
+ Add the possibility to apply real matrices to complex vectors.
  [#655](#655), [#658](#658)
+ Add functions to compute the absolute of a matrix format. [#636](#636)
+ Add symmetric permutation and improve existing permutations.
  [#684](#684), [#657](#657), [#663](#663)
+ Add a MachineTopology class with HWLOC support [#554](#554), [#697](#697)
+ Add an implicit residual norm criterion. [#702](#702), [#818](#818), [#850](#850)
+ Row-major accessor is generalized to more than 2 dimensions and a new
  "block column-major" accessor has been added. [#707](#707)
+ Add a heat equation example. [#698](#698), [#706](#706)
+ Add ccache support in CMake and CI. [#725](#725), [#739](#739)
+ Allow tuning and benchmarking variables non intrusively. [#692](#692)
+ Add triangular solver benchmark [#664](#664)
+ Add benchmarks for BLAS operations [#772](#772), [#829](#829)
+ Add support for different precisions and consistent index types in benchmarks.
  [#675](#675), [#828](#828)
+ Add a Github bot system to facilitate development and PR management.
  [#667](#667), [#674](#674), [#689](#689), [#853](#853)
+ Add Intel (DPC++) CI support and enable CI on HPC systems. [#736](#736), [#751](#751), [#781](#781)
+ Add ssh debugging for Github Actions CI. [#749](#749)
+ Add pipeline segmentation for better CI speed. [#737](#737)


Changes:
+ Add a Scalar Jacobi specialization and kernels. [#808](#808), [#834](#834), [#854](#854)
+ Add implicit residual log for solvers and benchmarks. [#714](#714)
+ Change handling of the conjugate in the dense dot product. [#755](#755)
+ Improved Dense stride handling. [#774](#774)
+ Multiple improvements to the OpenMP kernels performance, including COO,
  an exclusive prefix sum, and more. [#703](#703), [#765](#765), [#740](#740)
+ Allow specialization of submatrix and other dense creation functions in solvers. [#718](#718)
+ Improved Identity constructor and treatment of rectangular matrices. [#646](#646)
+ Allow CUDA/HIP executors to select allocation mode. [#758](#758)
+ Check if executors share the same memory. [#670](#670)
+ Improve test install and smoke testing support. [#721](#721)
+ Update the JOSS paper citation and add publications in the documentation.
  [#629](#629), [#724](#724)
+ Improve the version output. [#806](#806)
+ Add some utilities for dim and span. [#821](#821)
+ Improved solver and preconditioner benchmarks. [#660](#660)
+ Improve benchmark timing and output. [#669](#669), [#791](#791), [#801](#801), [#812](#812)


Fixes:
+ Sorting fix for the Jacobi preconditioner. [#659](#659)
+ Also log the first residual norm in CGS [#735](#735)
+ Fix BiCG and HIP CSR to work with complex matrices. [#651](#651)
+ Fix Coo SpMV on strided vectors. [#807](#807)
+ Fix segfault of extract_diagonal, add short-and-fat test. [#769](#769)
+ Fix device_reset issue by moving counter/mutex to device. [#810](#810)
+ Fix `EnableLogging` superclass. [#841](#841)
+ Support ROCm 4.1.x and breaking HIP_PLATFORM changes. [#726](#726)
+ Decreased test size for a few device tests. [#742](#742)
+ Fix multiple issues with our CMake HIP and RPATH setup.
  [#712](#712), [#745](#745), [#709](#709)
+ Cleanup our CMake installation step. [#713](#713)
+ Various simplification and fixes to the Windows CMake setup. [#720](#720), [#785](#785)
+ Simplify third-party integration. [#786](#786)
+ Improve Ginkgo device arch flags management. [#696](#696)
+ Other fixes and improvements to the CMake setup.
  [#685](#685), [#792](#792), [#705](#705), [#836](#836)
+ Clarification of dense norm documentation [#784](#784)
+ Various development tools fixes and improvements [#738](#738), [#830](#830), [#840](#840)
+ Make multiple operators/constructors explicit. [#650](#650), [#761](#761)
+ Fix some issues, memory leaks and warnings found by MSVC.
  [#666](#666), [#731](#731)
+ Improved solver memory estimates and consistent iteration counts [#691](#691)
+ Various logger improvements and fixes [#728](#728), [#743](#743), [#754](#754)
+ Fix for ForwardIterator requirements in iterator_factory. [#665](#665)
+ Various benchmark fixes. [#647](#647), [#673](#673), [#722](#722)
+ Various CI fixes and improvements. [#642](#642), [#641](#641), [#795](#795), [#783](#783), [#793](#793), [#852](#852)


Related PR: #857
tcojean added a commit that referenced this pull request Aug 23, 2021
Release 1.4.0 to master

Related PR: #866
Labels
1:ST:ready-to-merge This PR is ready to merge. mod:core This is related to the core module. reg:testing This is related to testing.
6 participants