Implement GPU solver for reactor simulations using SUNDIALS 3.1 or higher interface #33
Hello Bryan,
Following up on my previous comment: implementing autodiff for Cantera through JAX seems complex because of the way JAX traces or builds computational graphs for custom Python classes, so I stopped spending time on that. However, I was able to compile Cantera with SUNDIALS 5.1.0 with CUDA support, and I'm starting to use NVECTOR_CUDA for the Reactor setup in Cantera. Feedback, suggestions and comments are highly appreciated, since this is my first time dealing with a large C++ codebase.
@skrsna Please feel free to make a pull request or provide a link to a branch on your fork.
Hi @jiweiqi, can you please elaborate on what you meant by "learn reaction kinetic models"? Are you trying to apply Neural Ordinary Differential Equations (https://arxiv.org/abs/1806.07366) to kinetic models, or just using the PyTorch autograd backend to obtain Jacobians through autodiff?
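The second option, obtaining Jacobians through autodiff, can be illustrated without any framework. Below is a minimal forward-mode sketch in pure Python using dual numbers: each pass seeds one input variable and yields one Jacobian column. The Robertson problem stands in for a real kinetic mechanism, and `Dual`, `robertson_rhs` and `jacobian_forward` are hypothetical names, not part of Cantera, PyTorch or JAX (in JAX, `jax.jacfwd` plays the role of `jacobian_forward`).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """Dual number val + der*eps with eps**2 == 0; der carries a directional derivative."""
    val: float
    der: float = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __neg__(self):
        return Dual(-self.val, -self.der)

    def __sub__(self, other):
        return self + (-other)

    def __rsub__(self, other):
        return (-self) + other

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # product rule: (a + a' eps)(b + b' eps) = ab + (a b' + a' b) eps
        return Dual(self.val * other.val, self.val * other.der + self.der * other.val)

    __rmul__ = __mul__


def robertson_rhs(y):
    """Robertson stiff kinetics (3 species); works on floats or Dual numbers."""
    y1, y2, y3 = y
    r1 = 0.04 * y1
    r2 = 1.0e4 * y2 * y3
    r3 = 3.0e7 * y2 * y2
    return [-r1 + r2, r1 - r2 - r3, r3]


def jacobian_forward(f, y):
    """J[i][j] = d f_i / d y_j, using one forward pass per input (der=1 on y_j)."""
    n = len(y)
    cols = []
    for j in range(n):
        seeded = [Dual(v, 1.0 if i == j else 0.0) for i, v in enumerate(y)]
        cols.append([out.der for out in f(seeded)])
    return [[cols[j][i] for j in range(n)] for i in range(n)]


J = jacobian_forward(robertson_rhs, [1.0, 2.0e-5, 0.5])
for row in J:
    print(row)
```

Each column costs one evaluation of the right-hand side, so the cost of forward mode grows with the number of species; no hand-derived rate derivatives are needed.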
Thanks @jiweiqi for the awesome ReacTorch package. We just released our code, JAX-reactor. It's similar to ReacTorch, but the backend is JAX, and it leverages JIT compilation, vectorization, parallelization and automatic differentiation out of the box. I believe that JAX's autodiff API is more mature than PyTorch's for scientific computing, and we can also write custom differentiation rules, which will be useful for sensitivity analysis and uncertainty quantification. Right now, profiling the JAX code reveals that the bottleneck is copying data between the host and the device, because I'm using SciPy to solve the ODE system. I'm also looking into writing a BDF ODE solver in JAX to get around this. Feedback and suggestions are welcome.
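For readers weighing the BDF idea: the core of a BDF method is an implicit update that is solved by Newton iteration at every step. Below is a minimal fixed-step BDF2 sketch in pure Python (started with one BDF1 step), applied to the stiff Prothero-Robinson scalar test equation rather than a real mechanism. It is not JAX-reactor's code and all names are hypothetical; a production solver such as CVODE additionally varies the order (1 to 5) and the step size under local error control.

```python
import math

LAM = -1.0e4  # stiffness parameter (illustrative): h*LAM = -100 at h = 0.01


def prothero_robinson(t, y):
    """Stiff test ODE y' = LAM*(y - cos t) - sin t; exact solution y = cos t for y(0) = 1."""
    return LAM * (y - math.cos(t)) - math.sin(t)


def prothero_robinson_dy(t, y):
    """Partial derivative of the right-hand side with respect to y."""
    return LAM


def newton(residual, dresidual, guess, tol=1e-12, max_iter=20):
    """Scalar Newton iteration for residual(z) = 0."""
    z = guess
    for _ in range(max_iter):
        dz = -residual(z) / dresidual(z)
        z += dz
        if abs(dz) <= tol * max(1.0, abs(z)):
            return z
    raise RuntimeError("Newton iteration did not converge")


def bdf2_solve(f, dfdy, y0, t_end, n_steps):
    """Fixed-step BDF2 from t = 0, started with one backward-Euler (BDF1) step."""
    h = t_end / n_steps
    y_prev, y = None, y0
    for n in range(n_steps):
        t_new = (n + 1) * h
        if y_prev is None:  # BDF1: z - y = h f(t_new, z)
            history, beta = y, 1.0
        else:               # BDF2: z - 4/3 y + 1/3 y_prev = 2/3 h f(t_new, z)
            history, beta = 4.0 / 3.0 * y - y_prev / 3.0, 2.0 / 3.0

        def residual(z):
            return z - history - beta * h * f(t_new, z)

        def dresidual(z):
            return 1.0 - beta * h * dfdy(t_new, z)

        # previous value serves as the Newton predictor
        y_prev, y = y, newton(residual, dresidual, guess=y)
    return n_steps * h, y


t_end, y_end = bdf2_solve(prothero_robinson, prothero_robinson_dy,
                          y0=1.0, t_end=1.0, n_steps=100)
print(t_end, y_end, abs(y_end - math.cos(t_end)))
```

With h = 0.01 and λ = -10⁴, explicit Euler would have amplification factor 1 + hλ = -99 and blow up at the same step size, while the implicit update stays stable. Keeping this whole loop, Newton solves included, on the device would remove the per-step host/device copies.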
@bryanwweber ... while the discussion here is definitely useful, it doesn't necessarily help to shed light on the actual topic, i.e. 'Implement GPU solver for reactor simulations using SUNDIALS 3.1 or higher Interface'. If I understand correctly, both of the approaches advertised above bypass the Cantera core and mainly leverage YAML for alternative implementations (another reason for #83 I guess). I'd be quite interested to hear about current thoughts on how GPU calculations can be implemented.
@ischoegl I have no idea! I was just copying GSoC ideas from our wiki page into issues 😄
I worked on an implementation of this a while back with @athlonshi, where we modified an old version of SUNDIALS to use MAGMA to factorize the Jacobian and do the linear solves on the GPU. It worked quite well for large mechanisms. If I recall correctly, we were getting something like a factor of 10 speedup for mechanisms with ~1000 species on just a normal consumer (gaming-targeted) GPU. I wouldn't expect the stiffness to affect the speedup, as you're using the same BDF method either way. We didn't get much further than a proof-of-concept, because at the time I think SUNDIALS didn't have some of the flexibility for specifying the linear solver steps that it gained in more recent versions. I think that is the origin of the reference to "SUNDIALS 3.1" or higher in the initial description of this idea.
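For context on which step was offloaded: each implicit BDF step in CVODE solves Newton systems with the iteration matrix M = I - γJ, where J is the Jacobian and γ is the step size times a method coefficient. For a dense Jacobian, the LU factorization costs O(n³) and is typically reused across several Newton iterations, while each solve costs only O(n²), so the factorization is where a GPU library pays off for large mechanisms. Below is a pure-Python sketch of that factor-once/solve-many split with partial pivoting, in the style of LAPACK's `getrf`/`getrs` (which MAGMA provides GPU versions of). It is not the code from that proof-of-concept; the names and the γ value are hypothetical.

```python
def newton_matrix(jac, gamma):
    """M = I - gamma * J for a dense, row-major Jacobian."""
    n = len(jac)
    return [[(1.0 if i == j else 0.0) - gamma * jac[i][j] for j in range(n)]
            for i in range(n)]


def lu_factor(a):
    """LU with partial pivoting, P A = L U. Returns (lu, piv): L (unit diagonal) is
    stored below the diagonal, U on and above it; row i of P A is row piv[i] of A."""
    n = len(a)
    lu = [row[:] for row in a]
    piv = list(range(n))
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))  # largest pivot in column k
        if lu[p][k] == 0.0:
            raise ZeroDivisionError("matrix is singular")
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]
            piv[k], piv[p] = piv[p], piv[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]  # multiplier, stored in L
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, piv


def lu_solve(lu, piv, b):
    """Solve A x = b from lu_factor output: O(n^2), cheap to repeat."""
    n = len(lu)
    x = [b[piv[i]] for i in range(n)]  # apply P
    for i in range(n):                 # forward substitution: L y = P b
        for j in range(i):
            x[i] -= lu[i][j] * x[j]
    for i in reversed(range(n)):       # back substitution: U x = y
        for j in range(i + 1, n):
            x[i] -= lu[i][j] * x[j]
        x[i] /= lu[i][i]
    return x


# Robertson-problem Jacobian at y = (1, 2e-5, 0.5); gamma is illustrative
J = [[-0.04, 5000.0, 0.2], [0.04, -6200.0, -0.2], [0.0, 1200.0, 0.0]]
M = newton_matrix(J, gamma=1.0e-3)
lu, piv = lu_factor(M)                 # expensive part, done once
for rhs in ([1.0, 0.0, 0.0], [0.0, 1.0, 2.0]):
    print(lu_solve(lu, piv, rhs))      # cheap part, repeated per Newton iteration
```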
Hi all, I was wondering what the status is of this issue - I'd be willing to contribute to this, though not as GSoC participant.
@dcmvdbekerom Please feel free. As far as I know, the status update from 2 years ago represents the most recent progress on this issue.
Hi @dcmvdbekerom, any contributions in this area would be greatly appreciated. One significant thing that has changed since the last update I wrote is that we are now able to use sparse iterative solvers (GMRES) for the reactor network problem. This is thanks to the implementation of approximate Jacobians for certain reactor types, which can be used to construct a preconditioner, and it has provided large speed-ups for big mechanisms in the problems where it applies (@anthony-walker is working on extending the set of problems where these can be applied). As a result, using the GPU just for dense factorization and the direct linear solve is probably much less clearly beneficial. MAGMA does appear to have implementations of algorithms for sparse matrices, including incomplete LU factorization, so I think we could still apply it to the same steps of the integration process. However, with the sparse solver, the linear algebra operations make up a smaller portion of the overall runtime compared to the evaluation of the chemical kinetics / governing equations, and to get the biggest benefits we may want to explore how to move more of those calculations onto the GPU. If you have any questions, please feel free to ask here or in a "discussion" at https://github.com/Cantera/enhancements/discussions.
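To show why the linear algebra shrinks in this setting: a preconditioned Krylov method never factorizes the full Jacobian; each iteration needs one matrix-vector product, one preconditioner application, and a handful of dot products and vector updates. Below is a minimal restart-free, right-preconditioned GMRES sketch in pure Python. The Jacobi (diagonal) preconditioner and the tridiagonal test matrix are illustrative stand-ins, not Cantera's preconditioner built from an approximate Jacobian, and all names are hypothetical.

```python
import math


def gmres(matvec, b, precond=lambda v: v, tol=1e-10, max_iter=50):
    """Right-preconditioned GMRES (no restarts) for A x = b with x0 = 0.

    matvec(v) returns A v; precond(v) applies an approximate inverse M^-1.
    Only these two callbacks touch the system matrix."""
    n = len(b)
    beta = math.sqrt(sum(bi * bi for bi in b))
    if beta == 0.0:
        return [0.0] * n
    V = [[bi / beta for bi in b]]  # orthonormal Krylov basis
    R = []                         # columns of the rotated Hessenberg matrix
    cs, sn = [], []                # Givens rotation coefficients
    g = [beta]                     # rotated right-hand side, starts as beta*e1
    for j in range(max_iter):
        w = matvec(precond(V[j]))
        h = []
        for vi in V:               # modified Gram-Schmidt
            hij = sum(wk * vk for wk, vk in zip(w, vi))
            h.append(hij)
            w = [wk - hij * vk for wk, vk in zip(w, vi)]
        h_next = math.sqrt(sum(wk * wk for wk in w))
        for i in range(j):         # apply earlier rotations to the new column
            h[i], h[i + 1] = (cs[i] * h[i] + sn[i] * h[i + 1],
                              -sn[i] * h[i] + cs[i] * h[i + 1])
        denom = math.hypot(h[j], h_next)  # new rotation eliminates h_next
        cs.append(h[j] / denom)
        sn.append(h_next / denom)
        h[j] = denom
        g.append(-sn[j] * g[j])
        g[j] *= cs[j]
        R.append(h)
        if abs(g[j + 1]) <= tol * beta or h_next == 0.0:
            break                  # |g[j+1]| is the current residual norm
        V.append([wk / h_next for wk in w])
    k = len(R)
    y = [0.0] * k
    for i in reversed(range(k)):   # back substitution: R y = g
        y[i] = (g[i] - sum(R[m][i] * y[m] for m in range(i + 1, k))) / R[i][i]
    z = [sum(y[m] * V[m][r] for m in range(k)) for r in range(n)]
    return precond(z)              # x = M^-1 (V y)


# Illustrative tridiagonal system with a widely spread (stiff-like) diagonal
diag = [1.0, 10.0, 100.0, 1000.0, 10000.0]


def matvec(v):
    n = len(v)
    return [diag[i] * v[i]
            - 0.5 * ((v[i - 1] if i > 0 else 0.0) + (v[i + 1] if i < n - 1 else 0.0))
            for i in range(n)]


def jacobi(v):
    """Diagonal preconditioner, a stand-in for e.g. an ILU of an approximate Jacobian."""
    return [vi / di for vi, di in zip(v, diag)]


b = [1.0, 2.0, 3.0, 4.0, 5.0]
x = gmres(matvec, b, precond=jacobi)
print(x)
```

In a GPU port, `matvec` and `precond` are the pieces that would have to run on the device. CVODES by default approximates the product J·v with a finite difference of the right-hand side, so in that case `matvec` is itself a kinetics evaluation, which is consistent with the point that the governing equations dominate once the linear algebra is cheap.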
I remembered that I had done a bit of profiling for the sparse preconditioned solver a while back. The results can be seen here: precon-profiling.pdf. This was for several ignition delay calculations with mechanisms of different sizes. The overall results mostly reflect the behavior on the largest mechanism (~7000 species). The key results are:
Only these last two pieces will benefit from just applying MAGMA to the linear solver operations done by CVODES. If we want to see large benefits of using the GPU, we need to be able to move the evaluation of the reaction rates / net species production rates there.
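The reasoning in this comment is Amdahl's law: if only a fraction p of the runtime (here, the linear-solver work) is accelerated by a factor s, the overall speedup S is limited by the remaining 1 - p. The value p = 0.3 below is purely illustrative, not a figure from precon-profiling.pdf.

```math
\begin{aligned}
S &= \frac{1}{(1 - p) + p/s}, \qquad \lim_{s \to \infty} S = \frac{1}{1 - p} \\
p = 0.3,\ s = 10: \quad S &= \frac{1}{0.7 + 0.03} = \frac{1}{0.73} \approx 1.37, \qquad \frac{1}{1 - 0.3} \approx 1.43
\end{aligned}
```

Even an infinitely fast linear solver would cap the gain at about 1.43× in that illustrative case, which is why the rate evaluation has to move to the GPU as well.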
Thanks for the warm welcome. After looking a bit more into the current code and reading your comment, I fear I'm in way over my head. I'm still interested to see where this could go, but I realize now that the problem is much harder than I initially thought. I naively assumed that a 1D problem could be simulated by solving N reactors in parallel, each interacting only with its neighbors at every timestep. It seems it is not so simple, however. I also have to get more familiar with Cantera itself; what would be a good place for some newbie questions?
If you want to look at a 1D solver that does calculate the solution as you describe, you might look at Ember (https://github.com/speth/ember), which uses Strang splitting. That code is already parallelized across the different reactors using the Intel TBB library (on the CPU, not on the GPU). For general Cantera questions, I'd suggest asking on the Users' Group, or for a discussion of potential new features, the "Discussions" section here in the Enhancements repo is also an option.
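The splitting idea can be sketched as follows: a Strang step advances transport by half a step, then advances chemistry by a full step independently in every cell, then transport by another half step, which keeps the splitting error second order in the time step. The pure-Python sketch below assumes 1D diffusion with zero-flux ends (explicit, stable while D·(Δt/2)/Δx² ≤ 1/2) and uses first-order decay y' = -k·y as a stand-in for integrating a full reactor in each cell. It is not Ember's implementation, and the names are hypothetical.

```python
import math


def diffuse(u, r):
    """One explicit (FTCS) step of u_t = D u_xx with zero-flux ends; r = D*dt/dx**2 <= 0.5."""
    n = len(u)
    out = u[:]
    for i in range(n):
        left = u[i - 1] if i > 0 else u[i]       # mirrored ghost cell: zero flux
        right = u[i + 1] if i < n - 1 else u[i]
        out[i] = u[i] + r * (left - 2.0 * u[i] + right)
    return out


def react(y, k, dt):
    """Stand-in for integrating one reactor over dt: exact solution of y' = -k*y."""
    return y * math.exp(-k * dt)


def strang_step(u, D, k, dx, dt):
    """Transport half step, chemistry full step (cell by cell), transport half step."""
    r_half = D * (dt / 2.0) / dx**2
    u = diffuse(u, r_half)
    u = [react(y, k, dt) for y in u]  # cells are independent: the parallel part
    return diffuse(u, r_half)


u = [0.0, 0.0, 1.0, 0.0, 0.0]  # initial spike in the middle cell
for _ in range(10):
    u = strang_step(u, D=1.0, k=0.2, dx=1.0, dt=0.5)
print(u, sum(u))  # diffusion conserves the total, so sum(u) equals exp(-k*t) = exp(-1)
```

The per-cell chemistry loop is the embarrassingly parallel part (Ember distributes that work with TBB); the transport substeps are what couple neighboring cells.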
Ginkgo looks like an interesting option for sparse systems, providing parallel implementations of factorization/solve algorithms for both CPU and GPU architectures. Recent versions of SUNDIALS (6.4 and newer) also provide some features for working with it. |
Idea
The systems of equations in Cantera are generally solved using SUNDIALS. Newer versions of SUNDIALS have a better interface for integrating with heterogeneous computing environments. This project will add GPU support to Cantera for reactor network integration.
Difficulty
Medium
Required Knowledge
C++, CUDA/OpenCL
Mentors
@kyleniemeyer
References