
Improving performance of DGMulti flux differencing #757

Merged: 51 commits merged into trixi-framework:main on Aug 9, 2021

Conversation

@jlchan (Contributor) commented on Aug 1, 2021

This PR will improve performance of flux differencing for DGMulti solvers.

@jlchan changed the title from "improving performance of DGMulti flux differencing" to "WIP: improving performance of DGMulti flux differencing" on Aug 1, 2021
@jlchan commented on Aug 1, 2021

First step: optimize flux differencing using dense SBP matrices.

Old timing:

calc_volume_integral!       496    5.54s  83.4%  11.2ms

After optimization of hadamard_product_A_transposed!, new timing:

calc_volume_integral!       496    3.12s  87.6%  6.29ms
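
For reference, here is a minimal sketch of the kind of kernel being optimized: a flux differencing sum against a dense SBP operator, read through its transpose. The function and variable names below are placeholders, not the PR's actual implementation of hadamard_product_A_transposed!.

```julia
# Hedged sketch (placeholder names): accumulate the flux differencing volume
# term du[i] += 2 * A[i, j] * f_vol(u[i], u[j]), reading the operator through
# its transpose so the inner loop walks down a contiguous column of A_transposed.
function hadamard_sum_transposed!(du, A_transposed, f_vol, u)
    n = length(u)
    for i in 1:n
        u_i = u[i]
        du_i = du[i]
        for j in 1:n
            du_i = du_i + 2 * A_transposed[j, i] * f_vol(u_i, u[j])
        end
        du[i] = du_i
    end
    return du
end

# Usage with the scalar entropy-conservative two-point flux for Burgers' equation.
A = let B = randn(4, 4); B - B' end   # skew-symmetric, SBP-like dense operator
u = rand(4)
du = zeros(4)
hadamard_sum_transposed!(du, Matrix(A'), (uL, uR) -> (uL^2 + uL * uR + uR^2) / 6, u)
```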

@codecov (bot) commented on Aug 1, 2021

Codecov Report

Merging #757 (7b26fa6) into main (2d8e5dd) will increase coverage by 0.00%.
The diff coverage is 95.39%.


@@           Coverage Diff           @@
##             main     #757   +/-   ##
=======================================
  Coverage   93.58%   93.59%           
=======================================
  Files         182      182           
  Lines       17644    17710   +66     
=======================================
+ Hits        16512    16574   +62     
- Misses       1132     1136    +4     
Flag Coverage Δ
unittests 93.59% <95.39%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
src/Trixi.jl 83.33% <ø> (ø)
src/solvers/dgmulti/types.jl 84.21% <72.73%> (-15.79%) ⬇️
src/solvers/dgmulti/dg.jl 95.86% <95.00%> (-0.43%) ⬇️
src/solvers/dgmulti/flux_differencing.jl 98.57% <97.94%> (+0.68%) ⬆️
src/callbacks_step/analysis_dgmulti.jl 97.37% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@jlchan commented on Aug 3, 2021

[Profiling screenshot]

Profiling results: removing a bad broadcast shaved off about 15% of the runtime. The other main causes of slowdown appear to be

  1. Type instability in StartUpDG (fixed in jlchan/StartUpDG.jl@b1aed81, to be released with v0.11).
  2. Lazy broadcasting in LazyArrays.jl. Todo: either replace this with non-broadcasted LazyArrays once the bug in JuliaArrays/LazyArrays.jl#189 is figured out, or just roll a custom type (see the sketch below).
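
As a rough illustration of point 2 (an assumption about the pattern involved, not the PR's actual code): a lazy broadcast wrapper such as LazyArrays' BroadcastArray defers evaluation to each getindex, which can become costly when it is indexed repeatedly inside an O(n^2) flux differencing loop.

```julia
using LazyArrays, StaticArrays

u = rand(SVector{2, Float64}, 8)
v = rand(SVector{2, Float64}, 8)

# Lazy: no intermediate array is allocated, but each lazy_sum[i] re-evaluates
# the broadcast expression on access.
lazy_sum = BroadcastArray(+, u, v)

# Eager alternative (or a hand-rolled custom type): materialize once and index
# plain storage inside the hot loop.
eager_sum = u .+ v

lazy_sum[3] == eager_sum[3]   # same result, different access cost
```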

@ranocha (Member) commented on Aug 5, 2021

Depending on your implementation of flux differencing, JuliaArrays/StaticArrays.jl#949 can also speed up your code.

@jlchan commented on Aug 5, 2021

After more optimization, the time for KHI has dropped to

   calc_volume_integral!       966    1.26s  79.9%  1.31ms

However, the corresponding KHI run with TreeMesh is still over 6x faster:

   volume integral         986    206ms  54.5%   209μs

I think I can get a bit more speedup by not using LazyArrays (see JuliaArrays/LazyArrays.jl#189), but it's definitely not the only thing needed to close the performance gap.

@jlchan commented on Aug 5, 2021

Aha! I tried using unsafe_wrap instead of PtrArray in dg.jl, and got more comparable performance from the TreeMesh run:

   volume integral         986    1.47s  75.3%  1.49ms

I also see an odd performance Heisenbug related to whether or not I'm using PtrArray in dg.jl (it seems similar to the existing !!! danger "Heisenbug" comment).
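
For context, the two wrappers being swapped only differ in the array type used to view the same memory. A minimal sketch, assuming the PtrArray in question is StrideArraysCore's PtrArray(ptr, dims) constructor:

```julia
using StrideArraysCore: PtrArray   # assumption: this is the PtrArray in question

u_flat = rand(3 * 16)   # flat storage, e.g. nvars * n_nodes entries

GC.@preserve u_flat begin
    # Both wrap the same memory without copying; only the array type differs,
    # which can change how downstream kernels specialize and optimize.
    u_base = unsafe_wrap(Array, pointer(u_flat), (3, 16))
    u_ptr  = PtrArray(pointer(u_flat), (3, 16))
    @assert u_base[2, 5] == u_ptr[2, 5]
end
```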

@ranocha commented on Aug 6, 2021

Which KHI setup are you using? Is it on a uniform grid or do you use some nonconforming interfaces?

@jlchan commented on Aug 6, 2021

Just a uniform TreeMesh grid with a flux differencing DGSEM solver, roughly along the lines of the sketch below.
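
A hedged sketch of such a setup; the polynomial degree, refinement level, and initial condition below are placeholders, not the exact KHI elixir used for the timings above.

```julia
using Trixi

equations = CompressibleEulerEquations2D(1.4)

# Flux differencing DGSEM solver with an entropy-conservative volume flux.
solver = DGSEM(polydeg = 3, surface_flux = flux_ranocha,
               volume_integral = VolumeIntegralFluxDifferencing(flux_ranocha))

# Uniform TreeMesh on [-1, 1]^2 (refinement level is a placeholder).
mesh = TreeMesh((-1.0, -1.0), (1.0, 1.0),
                initial_refinement_level = 4, n_cells_max = 100_000)

# Placeholder initial condition; the actual benchmark uses a KHI setup.
semi = SemidiscretizationHyperbolic(mesh, equations,
                                    initial_condition_constant, solver)
```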

@ranocha commented on Aug 6, 2021

Are you running Julia with --check-bounds=no?

@jlchan commented on Aug 6, 2021

I don't believe so. Are you asking about the performance Heisenbug?

@ranocha commented on Aug 6, 2021

No, just about the general performance difference. When we benchmark Trixi.jl, we usually run Julia with --check-bounds=no. I proposed to add explicit bounds checks at the beginning of some methods and @inline afterwards, but that proposal was rejected. Right now, the approach used in Trixi.jl is to be safe by default and fast by passing --check-bounds=no to Julia, see also #210.
Thus, I would propose to use julia --check-bounds=no for the benchmarks you're doing here. Maybe StructArrays etc. come with some additional bounds checking that influences the performance.
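
To illustrate one possible way to realize such a pattern (not necessarily the exact proposal referenced above): with an explicit @boundscheck block, a method is checked by default, while starting Julia with --check-bounds=no (or inlining into an @inbounds caller) removes the check.

```julia
# Safe by default, fast with --check-bounds=no: the @boundscheck block is
# removed when Julia is started with --check-bounds=no, or when this function
# is inlined into a caller that uses @inbounds.
Base.@propagate_inbounds function get_node(u, i)
    @boundscheck checkbounds(u, i)
    return @inbounds u[i]
end

u = rand(4)
get_node(u, 2)     # bounds-checked in a default session
# get_node(u, 5)   # BoundsError by default; unchecked with --check-bounds=no
```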

@jlchan requested a review from ranocha on August 8, 2021
@ranocha (Member) left a review:


Looks mostly good. I just have a few minor comments and questions 👍

jlchan and others added 7 commits on August 8, 2021 (co-authored by Hendrik Ranocha), including:
- using Matrix{SVector{nvars, uEltype}} for the solution storage, which seems to be a little bit faster on average
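
A small sketch of that storage layout (the dimensions below are placeholders):

```julia
using StaticArrays

nvars, n_nodes, n_elements = 4, 16, 8
uEltype = Float64

# One SVector of conservative variables per node, stored as nodes × elements.
u = zeros(SVector{nvars, uEltype}, n_nodes, n_elements)

# The whole state at a node is a single isbits load/store, which tends to
# inline and vectorize well inside two-point flux kernels.
u_node = u[3, 1]
u[3, 1] = u_node + SVector(1.0, 0.0, 0.0, 0.0)
```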
@jlchan requested a review from ranocha on August 8, 2021
@jlchan commented on Aug 8, 2021

Oops, the tests are failing because I still need to specialize the analysis routines for the new solution storage.

- also introduce new types to allow specializing mul_by!(A::UniformScaling) and mul_by_accum!(A::UniformScaling)
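
A hypothetical sketch of what such UniformScaling specializations could look like; these definitions are illustrative assumptions, not the PR's actual code.

```julia
using LinearAlgebra

# Generic versions: return a function that applies the operator A.
mul_by!(A) = (out, x) -> mul!(out, A, x)              # out .= A * x
mul_by_accum!(A) = (out, x) -> mul!(out, A, x, 1, 1)  # out .+= A * x

# Specializations for UniformScaling skip the matrix multiplication entirely.
mul_by!(A::UniformScaling) = (out, x) -> (out .= A.λ .* x)
mul_by_accum!(A::UniformScaling) = (out, x) -> (out .+= A.λ .* x)

x = rand(4)
out = zeros(4)
mul_by_accum!(2I)(out, x)   # out == 2 .* x, without building a matrix
```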
@jlchan commented on Aug 9, 2021

The last commit should have fixed the tests, and also addresses task #5 in #675.

@ranocha left a review:

Thanks a lot - nice work 👍

@ranocha enabled auto-merge (squash) on August 9, 2021
@ranocha merged commit a6d1758 into trixi-framework:main on Aug 9, 2021
@jlchan commented on Aug 9, 2021

Thanks for reviewing!

@jlchan deleted the jc/optimize_fluxdiff branch on August 11, 2021