-
Notifications
You must be signed in to change notification settings - Fork 88
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
make the type in deferred_factory more explicit #1439
Conversation
perhaps a general question, what is the point of a non-const factory? The |
When it is const type, users can not change its parameters unless they cast const away. |
Even for a non-const factory, the user can't change the parameters, since they are returned as a const reference. Anyway, I think modifying the parameters of a created factory doesn't fit into our workflow. They would need to create a new factory for that. But I can still see the sense in changing it, only I think some of the use-cases are not valid. |
If we were to turn the parameters from |
the parameters still use the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we explicitly stored the constness of the parameters before changing to the deferred style I think it's fine to now denote that explicitly.
f062dfc
to
0b43d9d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! I'm not opposed to the change, only wanted to clarify the motivations.
typename = std::enable_if_t<std::is_convertible< | ||
FactoryType, deferred_factory_parameter< | ||
const stop::CriterionFactory>>::value>> | ||
Parameters& with_criteria(const std::vector<FactoryType>& criteria_vec) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did I miss this? Can you add a test for it?
8764a7f
to
4591e66
Compare
@upsj @MarcelKoch I have added tests to check whether the constructor and function can be compiled or not. Other than that, I also make mutable out which should stillbe the unintended usage mentioned in #1336. It's the last commit, so I can also move it to another pull request if needed. Do we put deferred_factory_parameter type into detail? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding the tests. I only have smaller nits for those.
But I'm not sure that one of the changes to the parameter macro is possible.
\ | ||
template <typename... Args> \ | ||
auto with_##_name(Args&&... _value)->std::decay_t<decltype(*this)>& \ | ||
parameters_type& with_##_name(Args&&... _value) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this implicitly assumes that the name of the parameter struct is parameters_type
. But we can't guarantee that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I go this way because deferred_factory macro already assume that. With that change, I can reuse the macro into mixin part with using parameters_type = ...
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But shouldn't we rather fix the deferred_factory macro then? I think the old way should be usable as mixin without the type alias.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want to keep the new approach, then you must also modify the GKO_CREATE_FACTORY_PARAMETERS
and GKO_ENABLE_LIN_OP_FACTORY
macro to fix the name of the parameters type to parameters_type
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because this
is not actual parameter type in mixed, we have self
to cast this
to the actual parameter type.
self()
is not available in normal pr. we can either make the self
available in the factory parameter or contain using parameters_type
if the type name is not parameters_type.
I have tried additional parameter in macro which denotes how to return the type reference, but it will change many macros inputs unless I also add the overload macro.
Do you have another idea without these requirements/assumption?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I understand the issue, my mind was at a different place. I think then changing the macros would be the best approach. But I think the whole change might be interface breaking. I would be fine with that, since, as you mentioned, the current implementation is not correct for the mixins anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I check some base type of parameter_types
, self is introduced in enable_parameter_type
protected
scope already.
All parameter types are inherited from that, so we always have self()
actually.
I just need to delete the GKO_ENABLE_SELF()
in the two parameters collection in solver_bases.hpp and then I can use self()
I am trying this approach based on self()
*/ | ||
struct iterative_solver_factory_parameters { | ||
template <typename Parameters, typename Factory> | ||
struct enable_iterative_solver_factory_parameters |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this change, but I think @upsj wanted to keep our usual mixin structure of Type
, template<ConcreteType> EnableType
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, with this change there is no single type that represents an interative solver's parameters, because all enable_...
types are distinct due to their template parameters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What can the interactive solver's parameters be used for? Because the parameter_type is not passed as pointer, so we do not use the base type. and we can use template to handle the common operation if need.
I prefer this way because it reuse the macro such that I do not need to create the test for these class or modify the same code in two place in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
void parse_iterative_solver_config(iterative_solver_factory_parameters& params, const config& c);
void dump_iterative_solver_config(const iterative_solver_factory_parameters& params, config& c);
No need to use templates in this case, that means our headers are slimmer and compile times are smaller
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can not do it just in this way, because there's no concrete type out of them.
we still need to know the what's the actual type, so it will still require the template.
I do not think two/three parameter affect the compile time a lot and the part is not in public interface
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes you can :) https://godbolt.org/z/3Gb8TTbbM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know you can pass that like this way. like I said, it requires you know the actual type but it is done by the previous template from scratch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The following does not work.
iterative_parameter a;
parse(a, config);
a->generate() -> what type is it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not the purpose of the type. It is meant to be usable by us and by users to generically access or set parameters for iterative solvers without knowing the actual type and without having to use templates.
In general, this PR is already going beyond what it aims to do (prevent future interface breaks for deferred_factory_parameter
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, the previous approach lead to a rather complicated hierarchy with very little gain from that. We don't currently need the common type of the parameters, and I don't expect that we will in the near future. Mike's approach is a lot cleaner, since the hierarchy is a single chain, while in the other case it's a tree.
Regarding the compile times, as long as a user compiles GPU kernels, this will have barely any impact.
@@ -435,14 +443,15 @@ class deferred_factory_parameter { | |||
* @ingroup LinOp | |||
*/ | |||
#define GKO_FACTORY_PARAMETER(_name, ...) \ | |||
mutable _name{__VA_ARGS__}; \ | |||
_name{__VA_ARGS__}; \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess now this cannot be modified from a const Factory
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. with_*
and the parameters only allow changes in non-const
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, with_*
returns self()
, which can be const
or non-const. .on()
also has been changed to return FactoryType
which is always non-const ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, because with_*
is not const function, it can only be called by non-const and return non-const.
Yes for last question, .on()
always return FactoryType
, which is a different type than FactoryType::parameters_type
rebase! |
73b3624
to
0a771bc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should keep the separation into mixin and parameter type, the other change is less of a blocker to me, but I would prefet to make the type a macro parameter instead of inferring it via decltype
. This also makes the documentation harder to generate.
@@ -94,7 +94,8 @@ class Schwarz | |||
/** | |||
* Local solver factory. | |||
*/ | |||
GKO_DEFERRED_FACTORY_PARAMETER(local_solver, LinOpFactory); | |||
std::shared_ptr<const LinOpFactory> GKO_DEFERRED_FACTORY_PARAMETER( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to go back to this pattern? The macro becomes much more complex due to all the decltype
s, I deliberately did not go this way with the new macros.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both are fine with me. I think it is good because it gives the same pattern like the other macro.
You only mean the decltype of the distinct parameter not the decltype(self())
for the reference return, right?
decltype(self())
is necessary to avoid additional input argument of macro or assumption of the type name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The _type
parameter in the original GKO_DEFERRED_FACTORY_PARAMETER
macro occurs 4 times. We need to replace all of those by decltype(_name_)
, which complicates the macro. I dislike the current approach, because it makes the macro look like a member definition, but does much more (e.g. add mutable
and declare the function). By making the type a macro parameter, it is clearer that this is fully generated code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it only need once. we have using _name_type = ...
and then all _type
just use that.
mutable
is not longer there as we said it is not intended behavior in the #1336
changing the member and using with_*
are only available in non-const object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This solution is still more complex than leaving the type as a macro parameter, you need to use _name##_type
instead of _type
everywhere and add the type alias.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For me the main reason to use this style is that it's consistent with the non-deferred parameters. I think Tobias raises a good point that the other style makes it clearer that all the code gets generated. It might also help to document the functions, although we still might not be able to rely on generated documentation.
But for now I would prefer this style for consistency, and then with an interface break change to the other style for all parameters.
*/ | ||
struct iterative_solver_factory_parameters { | ||
template <typename Parameters, typename Factory> | ||
struct enable_iterative_solver_factory_parameters |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, with this change there is no single type that represents an interative solver's parameters, because all enable_...
types are distinct due to their template parameters.
rebase! |
uses deferred factory parameter for combined Co-authored-by: Marcel Koch <marcel.koch@kit.edu>
Co-authored-by: Marcel Koch <marcel.koch@kit.edu>
0a771bc
to
eb08f8d
Compare
Kudos, SonarCloud Quality Gate passed! 0 Bugs 25.7% Coverage The version of Java (11.0.3) you have used to run this analysis is deprecated and we will stop accepting it soon. Please update to at least Java 17. |
Release 1.7.0 to master The Ginkgo team is proud to announce the new Ginkgo minor release 1.7.0. This release brings new features such as: - Complete GPU-resident sparse direct solvers feature set and interfaces, - Improved Cholesky factorization performance, - A new MC64 reordering, - Batched iterative solver support with the BiCGSTAB solver with batched Dense and ELL matrix types, - MPI support for the SYCL backend, - Improved ParILU(T)/ParIC(T) preconditioner convergence, and more! If you face an issue, please first check our [known issues page](https://github.com/ginkgo-project/ginkgo/wiki/Known-Issues) and the [open issues list](https://github.com/ginkgo-project/ginkgo/issues) and if you do not find a solution, feel free to [open a new issue](https://github.com/ginkgo-project/ginkgo/issues/new/choose) or ask a question using the [github discussions](https://github.com/ginkgo-project/ginkgo/discussions). Supported systems and requirements: + For all platforms, CMake 3.16+ + C++14 compliant compiler + Linux and macOS + GCC: 5.5+ + clang: 3.9+ + Intel compiler: 2019+ + Apple Clang: 14.0 is tested. Earlier versions might also work. + NVHPC: 22.7+ + Cray Compiler: 14.0.1+ + CUDA module: CMake 3.18+, and CUDA 10.1+ or NVHPC 22.7+ + HIP module: ROCm 4.5+ + DPC++ module: Intel oneAPI 2022.1+ with oneMKL and oneDPL. Set the CXX compiler to `dpcpp` or `icpx`. + MPI: standard version 3.1+, ideally GPU Aware, for best performance + Windows + MinGW: GCC 5.5+ + Microsoft Visual Studio: VS 2019+ + CUDA module: CUDA 10.1+, Microsoft Visual Studio + OpenMP module: MinGW. ### Version support changes + CUDA 9.2 is no longer supported and 10.0 is untested [#1382](#1382) + Ginkgo now requires CMake version 3.16 (and 3.18 for CUDA) [#1368](#1368) ### Interface changes + `const` Factory parameters can no longer be modified through `with_*` functions, as this breaks const-correctness [#1336](#1336) [#1439](#1439) ### New Deprecations + The `device_reset` parameter of CUDA and HIP executors no longer has an effect, and its `allocation_mode` parameters have been deprecated in favor of the `Allocator` interface. [#1315](#1315) + The CMake parameter `GINKGO_BUILD_DPCPP` has been deprecated in favor of `GINKGO_BUILD_SYCL`. [#1350](#1350) + The `gko::reorder::Rcm` interface has been deprecated in favor of `gko::experimental::reorder::Rcm` based on `Permutation`. [#1418](#1418) + The Permutation class' `permute_mask` functionality. [#1415](#1415) + Multiple functions with typos (`set_complex_subpsace()`, range functions such as `conj_operaton` etc). [#1348](#1348) ### Summary of previous deprecations + `gko::lend()` is not necessary anymore. + The classes `RelativeResidualNorm` and `AbsoluteResidualNorm` are deprecated in favor of `ResidualNorm`. + The class `AmgxPgm` is deprecated in favor of `Pgm`. + Default constructors for the CSR `load_balance` and `automatical` strategies + The PolymorphicObject's move-semantic `copy_from` variant + The templated `SolverBase` class. + The class `MachineTopology` is deprecated in favor of `machine_topology`. + Logger constructors and create functions with the `executor` parameter. + The virtual, protected, Dense functions `compute_norm1_impl`, `add_scaled_impl`, etc. + Logger events for solvers and criterion without the additional `implicit_tau_sq` parameter. + The global `gko::solver::default_krylov_dim`, use instead `gko::solver::gmres_default_krylov_dim`. ### Added features + Adds a batch::BatchLinOp class that forms a base class for batched linear operators such as batched matrix formats, solver and preconditioners [#1379](#1379) + Adds a batch::MultiVector class that enables operations such as dot, norm, scale on batched vectors [#1371](#1371) + Adds a batch::Dense matrix format that stores batched dense matrices and provides gemv operations for these dense matrices. [#1413](#1413) + Adds a batch::Ell matrix format that stores batched Ell matrices and provides spmv operations for these batched Ell matrices. [#1416](#1416) [#1437](#1437) + Add a batch::Bicgstab solver (class, core, and reference kernels) that enables iterative solution of batched linear systems [#1438](#1438). + Add device kernels (CUDA, HIP, and DPCPP) for batch::Bicgstab solver. [#1443](#1443). + New MC64 reordering algorithm which optimizes the diagonal product or sum of a matrix by permuting the rows, and computes additional scaling factors for equilibriation [#1120](#1120) + New interface for (non-symmetric) permutation and scaled permutation of Dense and Csr matrices [#1415](#1415) + LU and Cholesky Factorizations can now be separated into their factors [#1432](#1432) + New symbolic LU factorization algorithm that is optimized for matrices with an almost-symmetric sparsity pattern [#1445](#1445) + Sorting kernels for SparsityCsr on all backends [#1343](#1343) + Allow passing pre-generated local solver as factory parameter for the distributed Schwarz preconditioner [#1426](#1426) + Add DPCPP kernels for Partition [#1034](#1034), and CSR's `check_diagonal_entries` and `add_scaled_identity` functionality [#1436](#1436) + Adds a helper function to create a partition based on either local sizes, or local ranges [#1227](#1227) + Add function to compute arithmetic mean of dense and distributed vectors [#1275](#1275) + Adds `icpx` compiler supports [#1350](#1350) + All backends can be built simultaneously [#1333](#1333) + Emits a CMake warning in downstream projects that use different compilers than the installed Ginkgo [#1372](#1372) + Reordering algorithms in sparse_blas benchmark [#1354](#1354) + Benchmarks gained an `-allocator` parameter to specify device allocators [#1385](#1385) + Benchmarks gained an `-input_matrix` parameter that initializes the input JSON based on the filename [#1387](#1387) + Benchmark inputs can now be reordered as a preprocessing step [#1408](#1408) ### Improvements + Significantly improve Cholesky factorization performance [#1366](#1366) + Improve parallel build performance [#1378](#1378) + Allow constrained parallel test execution using CTest resources [#1373](#1373) + Use arithmetic type more inside mixed precision ELL [#1414](#1414) + Most factory parameters of factory type no longer need to be constructed explicitly via `.on(exec)` [#1336](#1336) [#1439](#1439) + Improve ParILU(T)/ParIC(T) convergence by using more appropriate atomic operations [#1434](#1434) ### Fixes + Fix an over-allocation for OpenMP reductions [#1369](#1369) + Fix DPCPP's common-kernel reduction for empty input sizes [#1362](#1362) + Fix several typos in the API and documentation [#1348](#1348) + Fix inconsistent `Threads` between generations [#1388](#1388) + Fix benchmark median condition [#1398](#1398) + Fix HIP 5.6.0 compilation [#1411](#1411) + Fix missing destruction of rand_generator from cuda/hip [#1417](#1417) + Fix PAPI logger destruction order [#1419](#1419) + Fix TAU logger compilation [#1422](#1422) + Fix relative criterion to not iterate if the residual is already zero [#1079](#1079) + Fix memory_order invocations with C++20 changes [#1402](#1402) + Fix `check_diagonal_entries_exist` report correctly when only missing diagonal value in the last rows. [#1440](#1440) + Fix checking OpenMPI version in cross-compilation settings [#1446](#1446) + Fix false-positive deprecation warnings in Ginkgo, especially for the old Rcm (it doesn't emit deprecation warnings anymore as a result but is still considered deprecated) [#1444](#1444) ### Related PR: #1451
Release 1.7.0 to develop The Ginkgo team is proud to announce the new Ginkgo minor release 1.7.0. This release brings new features such as: - Complete GPU-resident sparse direct solvers feature set and interfaces, - Improved Cholesky factorization performance, - A new MC64 reordering, - Batched iterative solver support with the BiCGSTAB solver with batched Dense and ELL matrix types, - MPI support for the SYCL backend, - Improved ParILU(T)/ParIC(T) preconditioner convergence, and more! If you face an issue, please first check our [known issues page](https://github.com/ginkgo-project/ginkgo/wiki/Known-Issues) and the [open issues list](https://github.com/ginkgo-project/ginkgo/issues) and if you do not find a solution, feel free to [open a new issue](https://github.com/ginkgo-project/ginkgo/issues/new/choose) or ask a question using the [github discussions](https://github.com/ginkgo-project/ginkgo/discussions). Supported systems and requirements: + For all platforms, CMake 3.16+ + C++14 compliant compiler + Linux and macOS + GCC: 5.5+ + clang: 3.9+ + Intel compiler: 2019+ + Apple Clang: 14.0 is tested. Earlier versions might also work. + NVHPC: 22.7+ + Cray Compiler: 14.0.1+ + CUDA module: CMake 3.18+, and CUDA 10.1+ or NVHPC 22.7+ + HIP module: ROCm 4.5+ + DPC++ module: Intel oneAPI 2022.1+ with oneMKL and oneDPL. Set the CXX compiler to `dpcpp` or `icpx`. + MPI: standard version 3.1+, ideally GPU Aware, for best performance + Windows + MinGW: GCC 5.5+ + Microsoft Visual Studio: VS 2019+ + CUDA module: CUDA 10.1+, Microsoft Visual Studio + OpenMP module: MinGW. ### Version support changes + CUDA 9.2 is no longer supported and 10.0 is untested [#1382](#1382) + Ginkgo now requires CMake version 3.16 (and 3.18 for CUDA) [#1368](#1368) ### Interface changes + `const` Factory parameters can no longer be modified through `with_*` functions, as this breaks const-correctness [#1336](#1336) [#1439](#1439) ### New Deprecations + The `device_reset` parameter of CUDA and HIP executors no longer has an effect, and its `allocation_mode` parameters have been deprecated in favor of the `Allocator` interface. [#1315](#1315) + The CMake parameter `GINKGO_BUILD_DPCPP` has been deprecated in favor of `GINKGO_BUILD_SYCL`. [#1350](#1350) + The `gko::reorder::Rcm` interface has been deprecated in favor of `gko::experimental::reorder::Rcm` based on `Permutation`. [#1418](#1418) + The Permutation class' `permute_mask` functionality. [#1415](#1415) + Multiple functions with typos (`set_complex_subpsace()`, range functions such as `conj_operaton` etc). [#1348](#1348) ### Summary of previous deprecations + `gko::lend()` is not necessary anymore. + The classes `RelativeResidualNorm` and `AbsoluteResidualNorm` are deprecated in favor of `ResidualNorm`. + The class `AmgxPgm` is deprecated in favor of `Pgm`. + Default constructors for the CSR `load_balance` and `automatical` strategies + The PolymorphicObject's move-semantic `copy_from` variant + The templated `SolverBase` class. + The class `MachineTopology` is deprecated in favor of `machine_topology`. + Logger constructors and create functions with the `executor` parameter. + The virtual, protected, Dense functions `compute_norm1_impl`, `add_scaled_impl`, etc. + Logger events for solvers and criterion without the additional `implicit_tau_sq` parameter. + The global `gko::solver::default_krylov_dim`, use instead `gko::solver::gmres_default_krylov_dim`. ### Added features + Adds a batch::BatchLinOp class that forms a base class for batched linear operators such as batched matrix formats, solver and preconditioners [#1379](#1379) + Adds a batch::MultiVector class that enables operations such as dot, norm, scale on batched vectors [#1371](#1371) + Adds a batch::Dense matrix format that stores batched dense matrices and provides gemv operations for these dense matrices. [#1413](#1413) + Adds a batch::Ell matrix format that stores batched Ell matrices and provides spmv operations for these batched Ell matrices. [#1416](#1416) [#1437](#1437) + Add a batch::Bicgstab solver (class, core, and reference kernels) that enables iterative solution of batched linear systems [#1438](#1438). + Add device kernels (CUDA, HIP, and DPCPP) for batch::Bicgstab solver. [#1443](#1443). + New MC64 reordering algorithm which optimizes the diagonal product or sum of a matrix by permuting the rows, and computes additional scaling factors for equilibriation [#1120](#1120) + New interface for (non-symmetric) permutation and scaled permutation of Dense and Csr matrices [#1415](#1415) + LU and Cholesky Factorizations can now be separated into their factors [#1432](#1432) + New symbolic LU factorization algorithm that is optimized for matrices with an almost-symmetric sparsity pattern [#1445](#1445) + Sorting kernels for SparsityCsr on all backends [#1343](#1343) + Allow passing pre-generated local solver as factory parameter for the distributed Schwarz preconditioner [#1426](#1426) + Add DPCPP kernels for Partition [#1034](#1034), and CSR's `check_diagonal_entries` and `add_scaled_identity` functionality [#1436](#1436) + Adds a helper function to create a partition based on either local sizes, or local ranges [#1227](#1227) + Add function to compute arithmetic mean of dense and distributed vectors [#1275](#1275) + Adds `icpx` compiler supports [#1350](#1350) + All backends can be built simultaneously [#1333](#1333) + Emits a CMake warning in downstream projects that use different compilers than the installed Ginkgo [#1372](#1372) + Reordering algorithms in sparse_blas benchmark [#1354](#1354) + Benchmarks gained an `-allocator` parameter to specify device allocators [#1385](#1385) + Benchmarks gained an `-input_matrix` parameter that initializes the input JSON based on the filename [#1387](#1387) + Benchmark inputs can now be reordered as a preprocessing step [#1408](#1408) ### Improvements + Significantly improve Cholesky factorization performance [#1366](#1366) + Improve parallel build performance [#1378](#1378) + Allow constrained parallel test execution using CTest resources [#1373](#1373) + Use arithmetic type more inside mixed precision ELL [#1414](#1414) + Most factory parameters of factory type no longer need to be constructed explicitly via `.on(exec)` [#1336](#1336) [#1439](#1439) + Improve ParILU(T)/ParIC(T) convergence by using more appropriate atomic operations [#1434](#1434) ### Fixes + Fix an over-allocation for OpenMP reductions [#1369](#1369) + Fix DPCPP's common-kernel reduction for empty input sizes [#1362](#1362) + Fix several typos in the API and documentation [#1348](#1348) + Fix inconsistent `Threads` between generations [#1388](#1388) + Fix benchmark median condition [#1398](#1398) + Fix HIP 5.6.0 compilation [#1411](#1411) + Fix missing destruction of rand_generator from cuda/hip [#1417](#1417) + Fix PAPI logger destruction order [#1419](#1419) + Fix TAU logger compilation [#1422](#1422) + Fix relative criterion to not iterate if the residual is already zero [#1079](#1079) + Fix memory_order invocations with C++20 changes [#1402](#1402) + Fix `check_diagonal_entries_exist` report correctly when only missing diagonal value in the last rows. [#1440](#1440) + Fix checking OpenMPI version in cross-compilation settings [#1446](#1446) + Fix false-positive deprecation warnings in Ginkgo, especially for the old Rcm (it doesn't emit deprecation warnings anymore as a result but is still considered deprecated) [#1444](#1444) ### Related PR: #1454
This PR makes the template type of
deferred_factory_parameter
more explicit.That is,
deferred_factory_paramter<const FactoryType>(..).on(exec)
returnsshared_ptr<const FactoryType>
anddeferred_factory_paramter<FactoryType>(..).on(exec)
returnsshared_ptr<FactoryType>
not
deferred_factory_paramter<FactoryType>(..).on(exec)
returnsshared_ptr<const FactoryType>
which is used in File Config such that users can get the non-const LinOpFactory from
deferred_factory<LinOpFactory>