
Implement GPU solver for reactor simulations using SUNDIALS 3.1 or higher interface #33

Open
bryanwweber opened this issue Feb 22, 2020 · 16 comments
Labels: feature-request (New feature request)

Comments

@bryanwweber (Member) commented Feb 22, 2020

Idea

The systems of equations representing reactor networks in Cantera are generally solved using SUNDIALS. Newer versions of SUNDIALS have a better interface for integrating with heterogeneous computing environments. This project will add GPU support to Cantera for reactor network integration.

Difficulty

Medium

Required Knowledge

C++, CUDA/OpenCL

Mentors

@kyleniemeyer

References

@skrsna commented Mar 10, 2020

Hello Bryan,
I have been exploring the use of automatic differentiation to solve systems of ODEs, and @rwest suggested posting about it here. Specifically, I'm interested in JAX (https://github.com/google/jax), which supports forward- and reverse-mode automatic differentiation. JAX has an experimental module for ODE integration similar to odeint or solve_ivp from scipy.integrate. However, I'm having some difficulty setting up the system, or using a Cantera Solution object as-is inside JAX, because of the way JAX handles external classes. This is just a heads-up that I'm exploring this side of the field, as I have little experience with SUNDIALS or C++. Another useful application of JAX is sensitivity or uncertainty analysis of large systems using autodiff instead of brute-force methods, by implementing some of the methods from this paper: https://arxiv.org/pdf/1812.01892.pdf

@bryanwweber (Member, Author) commented Mar 10, 2020

cc @kyleniemeyer

@skrsna commented Apr 16, 2020

Following up on my previous comment: implementing autodiff for Cantera through JAX turns out to be complex because of the way JAX traces (builds computational graphs for) custom Python classes, so I stopped spending time on that. However, I was able to compile Cantera against SUNDIALS 5.1.0 with CUDA support, and I'm starting to use NVECTOR_CUDA for the Reactor setup in Cantera. Feedback, suggestions, and comments are highly appreciated, since this is my first time dealing with a large C++ codebase.
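
For reference, here is a minimal sketch of what wiring CVODES to the CUDA vector module can look like. It is a hedged example assuming the SUNDIALS 5.x API (6.x adds a SUNContext argument to most constructors); the RHS body, problem size, and tolerances are placeholders rather than Cantera's actual ReactorNet code:

```cpp
#include <cvodes/cvodes.h>
#include <nvector/nvector_cuda.h>
#include <sunlinsol/sunlinsol_spgmr.h>

// Placeholder RHS: a real integration would evaluate the reactor governing
// equations here (e.g. by calling back into Cantera on the host for now).
static int rhs(realtype t, N_Vector y, N_Vector ydot, void* user_data) {
    N_VCopyFromDevice_Cuda(y);                        // device -> host
    realtype* yh  = N_VGetHostArrayPointer_Cuda(y);
    realtype* ydh = N_VGetHostArrayPointer_Cuda(ydot);
    // ... fill ydh from yh on the host ...
    (void)yh; (void)ydh;
    N_VCopyToDevice_Cuda(ydot);                       // host -> device
    return 0;
}

int main() {
    const sunindextype N = 100;            // placeholder state size
    N_Vector y = N_VNew_Cuda(N);           // solution data lives on the GPU
    realtype* yh = N_VGetHostArrayPointer_Cuda(y);
    for (sunindextype i = 0; i < N; i++) yh[i] = 1.0;
    N_VCopyToDevice_Cuda(y);

    void* mem = CVodeCreate(CV_BDF);       // error checks omitted for brevity
    CVodeInit(mem, rhs, 0.0, y);
    CVodeSStolerances(mem, 1e-9, 1e-15);

    // A matrix-free Krylov solver keeps the linear-solver vector ops on the GPU.
    SUNLinearSolver LS = SUNLinSol_SPGMR(y, PREC_NONE, 0);
    CVodeSetLinearSolver(mem, LS, NULL);

    realtype t;
    CVode(mem, 1.0e-3, y, &t, CV_NORMAL);

    SUNLinSolFree(LS);
    N_VDestroy(y);
    CVodeFree(&mem);
    return 0;
}
```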

@bryanwweber (Member, Author)

@skrsna Please feel free to make a pull request or provide a link to a branch on your fork.

@skrsna commented Apr 23, 2020

Hi @jiweiqi, can you please elaborate on what you meant by "learn reaction kinetic models"? Are you trying to apply Neural Ordinary Differential Equations (https://arxiv.org/abs/1806.07366) to kinetic models, or just using the PyTorch autograd backend to obtain Jacobians via automatic differentiation?

@skrsna commented Jun 19, 2020

Thanks @jiweiqi for the awesome ReacTorch package. We just released our code, JAX-reactor. It's similar to ReacTorch, but the backend is JAX, which provides JIT compilation, vectorization, parallelization, and automatic differentiation out of the box. I believe JAX's autodiff API is more mature than PyTorch's for scientific computing, and we can also write custom differentiation rules, which will be useful for sensitivity analysis and uncertainty quantification. Profiling the JAX code revealed that the bottleneck is copying data between the host and the device, since I'm using SciPy to solve the ODE system. I'm also looking into writing a BDF ODE solver in JAX to get around that. Feedback and suggestions are welcome.

@ischoegl (Member) commented Mar 19, 2021

@bryanwweber ... while the discussion here is definitely useful, it doesn't necessarily shed light on the actual topic, i.e. 'Implement GPU solver for reactor simulations using SUNDIALS 3.1 or higher interface'. If I understand correctly, both of the approaches advertised above bypass the Cantera core and mainly leverage YAML for alternative implementations (another reason for #83, I guess). I'd be quite interested to hear current thoughts on how GPU calculations could be implemented.

@bryanwweber (Member, Author)

@ischoegl I have no idea! I was just copying GSoC ideas from our wiki page into issues 😄

@speth (Member) commented Mar 20, 2021

I worked on an implementation of this a while back with @athlonshi, where we modified an old version of SUNDIALS to use MAGMA to factorize the Jacobian and do the linear solves on the GPU. It worked quite well for large mechanisms. If I recall correctly, we were getting something like a factor of 10 speedup for mechanisms with ~1000 species on just a normal consumer (gaming-targeted) GPU. I wouldn't expect the stiffness to affect the speedup, as you're using the same BDF method either way.

We didn't get much further than a proof-of-concept, because at the time I think SUNDIALS didn't have some of the flexibility for specifying the linear solver steps that it gained in more recent versions. I think that is the origin of the reference to "SUNDIALS 3.1" or higher in the initial description of this idea.
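
To illustrate the flexibility being referred to: with the SUNLinearSolver interface, a user-defined solver object can be attached to CVODES through CVodeSetLinearSolver, and its setup/solve steps can dispatch to a GPU library such as MAGMA. The skeleton below is a hedged sketch against the generic SUNDIALS 5.x interface (SUNLinSolNewEmpty gains a SUNContext argument in 6.x); MagmaContent and the commented factorization steps are illustrative placeholders, not actual Cantera code or SUNDIALS MAGMA bindings:

```cpp
#include <sundials/sundials_linearsolver.h>

// Hypothetical content struct: device copy of the Jacobian, pivots, MAGMA queue, ...
struct MagmaContent { /* ... */ };

static SUNLinearSolver_Type GetType(SUNLinearSolver S) { return SUNLINEARSOLVER_DIRECT; }
static int Initialize(SUNLinearSolver S) { return 0; }

static int Setup(SUNLinearSolver S, SUNMatrix A) {
    // Copy A to the device and LU-factorize it there (e.g. with a MAGMA
    // getrf-style routine); keep the factors in the content struct.
    return 0;
}

static int Solve(SUNLinearSolver S, SUNMatrix A, N_Vector x, N_Vector b,
                 realtype tol) {
    // Back-substitute on the device using the stored factors and write the
    // result into x (which could be an NVECTOR_CUDA vector).
    return 0;
}

static int Free(SUNLinearSolver S) {
    delete static_cast<MagmaContent*>(S->content);
    S->content = NULL;
    SUNLinSolFreeEmpty(S);
    return 0;
}

SUNLinearSolver MakeGpuDenseSolver() {
    SUNLinearSolver S  = SUNLinSolNewEmpty();
    S->content         = new MagmaContent{};
    S->ops->gettype    = GetType;
    S->ops->initialize = Initialize;
    S->ops->setup      = Setup;
    S->ops->solve      = Solve;
    S->ops->free       = Free;
    return S;   // attach with CVodeSetLinearSolver(cvode_mem, S, A)
}
```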

@dcmvdbekerom

Hi all, I was wondering what the status is of this issue - I'd be willing to contribute to this, though not as a GSoC participant.
I have experience with CUDA C++ and chemical kinetics.

@bryanwweber added the feature-request (New feature request) label and removed the gsoc-idea label on Mar 23, 2023
@bryanwweber (Member, Author)

@dcmvdbekerom Please feel free; as far as I know, the status update from two years ago represents the most recent progress on this issue.

@speth (Member) commented Mar 23, 2023

Hi @dcmvdbekerom, any contributions in this area would be greatly appreciated.

One significant thing that has changed since my last update is that we are now able to use sparse iterative solvers (GMRES) for the reactor network problem, thanks to the implementation of approximate Jacobians for certain reactor types, which can be used to construct a preconditioner. This has provided large speed-ups for big mechanisms on the problems where it can be applied (@anthony-walker is working on extending the set of problems where these can be used).

As a result, using the GPU just for dense factorization and a direct linear solve is probably much less of a clear benefit. MAGMA does appear to have implementations of algorithms for sparse matrices, including incomplete LU factorization, so I think we could still apply it to the same steps of the integration process. However, with the sparse solver, the linear algebra operations make up a smaller portion of the overall runtime than the evaluation of the chemical kinetics / governing equations, so to get the biggest benefits we may want to explore how to move more of those calculations onto the GPU.
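
As a concrete illustration of where such a factorization plugs in, here is a hedged sketch of the CVODES preconditioner hooks used together with the GMRES linear solver (SUNDIALS 5.x callback signatures). The callback bodies are placeholders for forming and factorizing M = I - gamma*J_approx from the approximate Jacobian, whether on the CPU or with a GPU routine such as an incomplete LU from MAGMA or cuSPARSE; this is not Cantera's actual preconditioner implementation:

```cpp
#include <cvodes/cvodes.h>
#include <sunlinsol/sunlinsol_spgmr.h>

// Called occasionally by CVODES: form and factorize M = I - gamma * J_approx,
// where J_approx is the approximate (sparse) Jacobian of the reactor system.
// A GPU incomplete-LU factorization could be performed here.
static int PSetup(realtype t, N_Vector y, N_Vector fy, booleantype jok,
                  booleantype* jcurPtr, realtype gamma, void* user_data) {
    if (!jok) {
        // ... rebuild the approximate Jacobian ...
        *jcurPtr = SUNTRUE;
    } else {
        *jcurPtr = SUNFALSE;
    }
    // ... factorize I - gamma * J_approx (CPU or GPU) ...
    return 0;
}

// Called on each GMRES iteration: solve M * z = r using the stored factors.
static int PSolve(realtype t, N_Vector y, N_Vector fy, N_Vector r, N_Vector z,
                  realtype gamma, realtype delta, int lr, void* user_data) {
    // ... apply the factorization; as a trivial stand-in, copy r into z ...
    N_VScale(1.0, r, z);
    return 0;
}

// Attach to an existing CVODES instance `mem` with state vector `y`.
void attach_preconditioned_gmres(void* mem, N_Vector y) {
    SUNLinearSolver LS = SUNLinSol_SPGMR(y, PREC_LEFT, 0);
    CVodeSetLinearSolver(mem, LS, NULL);
    CVodeSetPreconditioner(mem, PSetup, PSolve);
}
```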

If you have any questions, please feel free to ask here or in a "discussion" at https://github.com/Cantera/enhancements/discussions.

@speth (Member) commented Mar 23, 2023

I remembered that I had done a bit of profiling for the sparse preconditioned solver a while back. The results can be seen here: precon-profiling.pdf. This was for several ignition delay calculations with mechanisms of different sizes. The overall results mostly reflect the behavior on the largest mechanism (~7000 species). The key results are:

  • 65% of the total time is spent evaluating the reactor governing equations.
    • 75% of this time is spent on reaction rate evaluations
    • 14% of this time is spent on evaluating thermodynamic properties
  • 16% of the total time is spent on preconditioner setup
    • 75% of this time is spent on preconditioner factorization
  • 10% of the total time is spent on solves using the factorized preconditioner

Only these last two pieces will benefit from just applying MAGMA to the linear solver operations done by CVODES. If we want to see large benefits of using the GPU, we need to be able to move the evaluation of the reaction rates / net species production rates there.
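
To make that last point concrete, the kind of kernel involved might look like the sketch below: a reaction-parallel evaluation of Arrhenius rate constants and rates of progress, scattered into net species production rates with atomic adds. This is an illustrative CUDA sketch with a made-up dense data layout and irreversible mass-action kinetics only; it is not Cantera's kinetics implementation:

```cpp
#include <cuda_runtime.h>
#include <math.h>

// One thread per reaction: evaluate kf = A * T^b * exp(-Ea / (R*T)), form the
// forward rate of progress from reactant concentrations, and accumulate net
// species production rates. Dense nu arrays are an illustrative layout only.
__global__ void production_rates(int n_reactions, int n_species, double T,
                                 const double* __restrict__ conc,     // [n_species]
                                 const double* __restrict__ A,        // [n_reactions]
                                 const double* __restrict__ b,        // [n_reactions]
                                 const double* __restrict__ Ea,       // [n_reactions], J/kmol
                                 const double* __restrict__ nu_reac,  // [n_reactions * n_species]
                                 const double* __restrict__ nu_net,   // [n_reactions * n_species]
                                 double* __restrict__ wdot)           // [n_species], zeroed beforehand
{
    const double R = 8314.462618;  // J/(kmol*K)
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n_reactions) return;

    double kf = A[i] * pow(T, b[i]) * exp(-Ea[i] / (R * T));

    // Forward rate of progress (irreversible, mass-action only in this sketch)
    double rop = kf;
    for (int k = 0; k < n_species; k++) {
        double nu = nu_reac[i * n_species + k];
        if (nu != 0.0) rop *= pow(conc[k], nu);
    }

    // Scatter into the net production rates; double-precision atomicAdd
    // requires compute capability 6.0 or newer.
    for (int k = 0; k < n_species; k++) {
        double nu = nu_net[i * n_species + k];
        if (nu != 0.0) atomicAdd(&wdot[k], nu * rop);
    }
}

// Launch example (wdot zeroed first, e.g. with cudaMemset):
// production_rates<<<(n_reactions + 255) / 256, 256>>>(/* ... */);
```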

@dcmvdbekerom

Thanks for the warm welcome,

After looking a bit more into the current code and reading your comment, I fear I'm in way over my head.
Moreover, I keep forgetting that consumer GPUs, like the one I have, have only a few FP64 cores. In my other applications (spectroscopy) it was not a problem to use FP32, but for chemical kinetics that would probably be prohibitively inaccurate.

I'm still interested to see where this could go, but I realize now that the problem is much harder than I initially thought. I naively assumed that a 1D problem could be simulated by solving N reactors in parallel, each interacting only with its neighbors at every timestep. It seems it is not so simple, however.

I also have to get more familiar with Cantera itself - what would be a good place for some newbie questions?

@speth (Member) commented Mar 24, 2023

If you want to look at a 1D solver that does calculate the solution as you describe, you might look at Ember (https://github.com/speth/ember), which uses Strang splitting. That code is already parallelized across the different reactors using the Intel TBB library (on the CPU, not on the GPU).
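
For a rough idea of that structure, a Strang-split step parallelized across reactors (here with TBB on the CPU, in the spirit of what Ember does; this is a schematic, not Ember's actual code, and the types and helper functions are hypothetical) might look like:

```cpp
#include <tbb/parallel_for.h>
#include <vector>

// Hypothetical per-cell reactor state and sub-step routines.
struct CellState { /* T, Y_k, ... */ };
void integrate_chemistry(CellState& cell, double dt);               // stiff ODE per cell
void advance_transport(std::vector<CellState>& cells, double dt);   // diffusion/convection

// One Strang-split step: half transport, full chemistry (independent per cell,
// so it parallelizes trivially across reactors), then half transport again.
void strang_step(std::vector<CellState>& cells, double dt) {
    advance_transport(cells, 0.5 * dt);
    tbb::parallel_for(std::size_t(0), cells.size(), [&](std::size_t i) {
        integrate_chemistry(cells[i], dt);
    });
    advance_transport(cells, 0.5 * dt);
}
```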

For general Cantera questions, I'd suggest asking on the Users' Group, or for a discussion of potential new features, the "Discussions" section here in the Enhancements repo is also an option.

@speth (Member) commented Jul 31, 2024

Ginkgo looks like an interesting option for sparse systems, providing parallel implementations of factorization/solve algorithms for both CPU and GPU architectures. Recent versions of SUNDIALS (6.4 and newer) also provide some features for working with it.
