Optimize CPU-side solvers #1992

Merged
merged 2 commits into from
Oct 4, 2022

thorstenhater
Contributor

Description

  • remove storage of RHS and face-conductance from cable solver
  • remove RHS from diffusion solver
  • elide the copies RHS->U (cable solver) and RHS->Xd (diffusion solver)
  • solve now overwrites its inputs U / Xd in place
  • assembly is now part of solve
  • fix tests accordingly
  • add __restrict__ in the assembly part to encourage auto-vectorization
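
A minimal sketch of what an in-place solve with fused assembly might look like. This is not Arbor's actual code: the struct `inplace_solver`, its member names, and the simplified implicit-Euler assembly are assumptions made for illustration. It shows a symmetric tree-structured (Hines) matrix where the state vector is first overwritten with the right-hand side and then with the solution, so no separate RHS buffer or result copy is needed. `__restrict__` is a GCC/Clang extension (MSVC spells it `__restrict`).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch, not Arbor's actual class: assembles and solves in one
// call and stores neither a right-hand side nor a copy of the result.
struct inplace_solver {
    std::vector<std::size_t> parent;  // parent[i] < i for i > 0; parent[0] unused
    std::vector<double> u;            // off-diagonal entry coupling i and parent[i]
    std::vector<double> invariant_d;  // time-independent part of the diagonal
    std::vector<double> capacitance;  // per-CV capacitance
    std::vector<double> d;            // working diagonal, rebuilt on every call

    // On entry x holds the current state; on exit it holds the solution.
    void solve(std::vector<double>& x, const std::vector<double>& current, double dt) {
        const std::size_t n = x.size();
        assert(current.size() == n && parent.size() == n && u.size() == n);
        assert(invariant_d.size() == n && capacitance.size() == n);
        if (n == 0) return;
        d.resize(n);

        // Assembly: the arrays are distinct, so promising no aliasing is valid
        // and leaves the compiler free to vectorise the loop.
        {
            double* __restrict__ d_ = d.data();
            double* __restrict__ x_ = x.data();
            const double* __restrict__ c_ = capacitance.data();
            const double* __restrict__ g_ = invariant_d.data();
            const double* __restrict__ i_ = current.data();
            for (std::size_t i = 0; i < n; ++i) {
                const double f = c_[i] / dt;
                d_[i] = f + g_[i];
                x_[i] = f * x_[i] - i_[i];  // x now holds the right-hand side
            }
        }

        // Backward sweep: eliminate each node into its parent.
        for (std::size_t i = n - 1; i > 0; --i) {
            const double f = u[i] / d[i];
            d[parent[i]] -= f * u[i];
            x[parent[i]] -= f * x[i];
        }
        x[0] /= d[0];

        // Forward sweep: substitute from the root outwards.
        for (std::size_t i = 1; i < n; ++i) {
            x[i] = (x[i] - u[i] * x[parent[i]]) / d[i];
        }
    }
};
```

The caller must not pass the same vector as both `x` and `current`, since the assembly loop promises the compiler that the two do not alias.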

Analysis

  • these tricks do not work for GPU, as we pack memory there
  • manual vectorisation on top of this gave no further speed-up
    • this likely means the explicit vectorisation in shared state is likewise redundant; it might be removed soon
    • experiments done on AVX2
  • we save quite a bit of memory
    • cable_solver: rhs + face_conductance = 2*#CV doubles
    • diffusion_solver: rhs = #CV doubles per diffusive species
  • we also save some copies: one for the cable solver and one per diffusive species
    • these are redundant, since we only ever call into solver::solve<T>(T& to)
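
The memory figures above can be put into a small back-of-the-envelope helper. The function name `bytes_saved` and its parameters are hypothetical; it only restates the counts given in the list (2*#CV doubles for the cable solver, #CV doubles per diffusive species).

```cpp
#include <cstddef>

// Hypothetical helper: bytes no longer stored once the buffers are removed.
constexpr std::size_t bytes_saved(std::size_t n_cv, std::size_t n_diffusive_species) {
    // cable solver: rhs + face_conductance; diffusion solver: one rhs per species
    return (2 + n_diffusive_species) * n_cv * sizeof(double);
}
```

For example, with 1,000,000 CVs and 2 diffusive species this is 4,000,000 doubles, or 32,000,000 bytes where a double is 8 bytes.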

* remove storage of RHS and face-conductance
* elide copy RHS->U and RHS->Xd respectively
  -> solve will now directly mangle U and Xd
* fix tests accordingly
@thorstenhater thorstenhater merged commit c8b2e78 into arbor-sim:master Oct 4, 2022