
Improving performance of DGMulti flux differencing #757

Merged: 51 commits merged into trixi-framework:main on Aug 9, 2021

Conversation

@jlchan (Contributor) commented on Aug 1, 2021

This PR will improve performance of flux differencing for DGMulti solvers.

@jlchan changed the title from "improving performance of DGMulti flux differencing" to "WIP: improving performance of DGMulti flux differencing" on Aug 1, 2021
@jlchan commented on Aug 1, 2021

First step: optimize flux differencing using dense SBP matrices.

Old timing:

calc_volume_integral!       496    5.54s  83.4%  11.2ms

After optimization of hadamard_product_A_transposed!, new timing:

calc_volume_integral!       496    3.12s  87.6%  6.29ms
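
For reference, here is a minimal sketch of the kind of kernel being optimized: a flux differencing sum against a dense SBP operator, read through its transpose. The function and variable names below are placeholders, not the PR's actual implementation of hadamard_product_A_transposed!.

```julia
# Hedged sketch (placeholder names): accumulate the flux differencing volume
# term du[i] += 2 * A[i, j] * f_vol(u[i], u[j]), reading the operator through
# its transpose so the inner loop walks down a contiguous column of A_transposed.
function hadamard_sum_transposed!(du, A_transposed, f_vol, u)
    n = length(u)
    for i in 1:n
        u_i = u[i]
        du_i = du[i]
        for j in 1:n
            du_i = du_i + 2 * A_transposed[j, i] * f_vol(u_i, u[j])
        end
        du[i] = du_i
    end
    return du
end

# Usage with the scalar entropy-conservative two-point flux for Burgers' equation.
A = let B = randn(4, 4); B - B' end   # skew-symmetric, SBP-like dense operator
u = rand(4)
du = zeros(4)
hadamard_sum_transposed!(du, Matrix(A'), (uL, uR) -> (uL^2 + uL * uR + uR^2) / 6, u)
```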

@codecov (bot) commented on Aug 1, 2021

Codecov Report

Merging #757 (7b26fa6) into main (2d8e5dd) will increase coverage by 0.00%.
The diff coverage is 95.39%.


@@           Coverage Diff           @@
##             main     #757   +/-   ##
=======================================
  Coverage   93.58%   93.59%           
=======================================
  Files         182      182           
  Lines       17644    17710   +66     
=======================================
+ Hits        16512    16574   +62     
- Misses       1132     1136    +4     
Flag Coverage Δ
unittests 93.59% <95.39%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
src/Trixi.jl 83.33% <ø> (ø)
src/solvers/dgmulti/types.jl 84.21% <72.73%> (-15.79%) ⬇️
src/solvers/dgmulti/dg.jl 95.86% <95.00%> (-0.43%) ⬇️
src/solvers/dgmulti/flux_differencing.jl 98.57% <97.94%> (+0.68%) ⬆️
src/callbacks_step/analysis_dgmulti.jl 97.37% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@jlchan commented on Aug 3, 2021

[Profiling screenshot]

Profiling results: removing a bad broadcast shaved off about 15% of the runtime. The other main causes of slowdown appear to be

  1. Type instability in StartUpDG (fixed in jlchan/StartUpDG.jl@b1aed81, to be released with v0.11).
  2. Lazy broadcasting in LazyArrays.jl. Todo: either replace this with non-broadcasted LazyArrays once the bug in JuliaArrays/LazyArrays.jl#189 is figured out, or just roll a custom type (see the sketch below).
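
As a rough illustration of point 2 (an assumption about the pattern involved, not the PR's actual code): a lazy broadcast wrapper such as LazyArrays' BroadcastArray defers evaluation to each getindex, which can become costly when it is indexed repeatedly inside an O(n^2) flux differencing loop.

```julia
using LazyArrays, StaticArrays

u = rand(SVector{2, Float64}, 8)
v = rand(SVector{2, Float64}, 8)

# Lazy: no intermediate array is allocated, but each lazy_sum[i] re-evaluates
# the broadcast expression on access.
lazy_sum = BroadcastArray(+, u, v)

# Eager alternative (or a hand-rolled custom type): materialize once and index
# plain storage inside the hot loop.
eager_sum = u .+ v

lazy_sum[3] == eager_sum[3]   # same result, different access cost
```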

@ranocha (Member) commented on Aug 5, 2021

Depending on your implementation of flux differencing, JuliaArrays/StaticArrays.jl#949 can also speed up your code.

@jlchan commented on Aug 5, 2021

After more optimization, the time for KHI has dropped to

   calc_volume_integral!       966    1.26s  79.9%  1.31ms

However, the corresponding KHI run with TreeMesh is still over 6x faster:

   volume integral         986    206ms  54.5%   209μs

I think I can get a bit more speedup by not using LazyArrays (see JuliaArrays/LazyArrays.jl#189), but it's definitely not the only thing needed to close the performance gap.

@jlchan commented on Aug 5, 2021

Aha! I tried using unsafe_wrap instead of PtrArray in dg.jl, and got more comparable performance from the TreeMesh run:

   volume integral         986    1.47s  75.3%  1.49ms

I also see an odd performance Heisenbug related to whether or not I'm using PtrArray in dg.jl (it seems similar to the existing !!! danger "Heisenbug" comment).
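
For context, the two wrappers being swapped only differ in the array type used to view the same memory. A minimal sketch, assuming the PtrArray in question is StrideArraysCore's PtrArray(ptr, dims) constructor:

```julia
using StrideArraysCore: PtrArray   # assumption: this is the PtrArray in question

u_flat = rand(3 * 16)   # flat storage, e.g. nvars * n_nodes entries

GC.@preserve u_flat begin
    # Both wrap the same memory without copying; only the array type differs,
    # which can change how downstream kernels specialize and optimize.
    u_base = unsafe_wrap(Array, pointer(u_flat), (3, 16))
    u_ptr  = PtrArray(pointer(u_flat), (3, 16))
    @assert u_base[2, 5] == u_ptr[2, 5]
end
```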

@ranocha commented on Aug 6, 2021

Which KHI setup are you using? Is it on a uniform grid or do you use some nonconforming interfaces?

@jlchan commented on Aug 6, 2021

Just a uniform TreeMesh grid with a flux differencing DGSEM solver, roughly along the lines of the sketch below.
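
A hedged sketch of such a setup; the polynomial degree, refinement level, and initial condition below are placeholders, not the exact KHI elixir used for the timings above.

```julia
using Trixi

equations = CompressibleEulerEquations2D(1.4)

# Flux differencing DGSEM solver with an entropy-conservative volume flux.
solver = DGSEM(polydeg = 3, surface_flux = flux_ranocha,
               volume_integral = VolumeIntegralFluxDifferencing(flux_ranocha))

# Uniform TreeMesh on [-1, 1]^2 (refinement level is a placeholder).
mesh = TreeMesh((-1.0, -1.0), (1.0, 1.0),
                initial_refinement_level = 4, n_cells_max = 100_000)

# Placeholder initial condition; the actual benchmark uses a KHI setup.
semi = SemidiscretizationHyperbolic(mesh, equations,
                                    initial_condition_constant, solver)
```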

@ranocha commented on Aug 6, 2021

Are you running Julia with --check-bounds=no?

@jlchan commented on Aug 6, 2021

I don't believe so. Are you asking about the performance Heisenbug?

@ranocha commented on Aug 6, 2021

No, just about the general performance difference. When we benchmark Trixi.jl, we usually run Julia with --check-bounds=no. I proposed to add explicit bounds checks at the beginning of some methods and @inline afterwards, but that proposal was rejected. Right now, the approach used in Trixi.jl is to be safe by default and fast by passing --check-bounds=no to Julia, see also #210.
Thus, I would propose to use julia --check-bounds=no for the benchmarks you're doing here. Maybe StructArrays etc. come with some additional bounds checking that influences the performance.
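
To illustrate one possible way to realize such a pattern (not necessarily the exact proposal referenced above): with an explicit @boundscheck block, a method is checked by default, while starting Julia with --check-bounds=no (or inlining into an @inbounds caller) removes the check.

```julia
# Safe by default, fast with --check-bounds=no: the @boundscheck block is
# removed when Julia is started with --check-bounds=no, or when this function
# is inlined into a caller that uses @inbounds.
Base.@propagate_inbounds function get_node(u, i)
    @boundscheck checkbounds(u, i)
    return @inbounds u[i]
end

u = rand(4)
get_node(u, 2)     # bounds-checked in a default session
# get_node(u, 5)   # BoundsError by default; unchecked with --check-bounds=no
```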

@jlchan requested a review from ranocha on August 8, 2021
@ranocha (Member) left a review:


Looks mostly good. I just have a few minor comments and questions 👍

jlchan and others added 7 commits on August 8, 2021 (co-authored by Hendrik Ranocha), including:
- using Matrix{SVector{nvars, uEltype}} for the solution storage, which seems to be a little bit faster on average
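
A small sketch of that storage layout (the dimensions below are placeholders):

```julia
using StaticArrays

nvars, n_nodes, n_elements = 4, 16, 8
uEltype = Float64

# One SVector of conservative variables per node, stored as nodes × elements.
u = zeros(SVector{nvars, uEltype}, n_nodes, n_elements)

# The whole state at a node is a single isbits load/store, which tends to
# inline and vectorize well inside two-point flux kernels.
u_node = u[3, 1]
u[3, 1] = u_node + SVector(1.0, 0.0, 0.0, 0.0)
```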
@jlchan requested a review from ranocha on August 8, 2021
@jlchan commented on Aug 8, 2021

Oops, the tests are failing because I still need to specialize the analysis routines for the new solution storage.

- also introduce new types to allow specializing mul_by!(A::UniformScaling) and mul_by_accum!(A::UniformScaling)
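
A hypothetical sketch of what such UniformScaling specializations could look like; these definitions are illustrative assumptions, not the PR's actual code.

```julia
using LinearAlgebra

# Generic versions: return a function that applies the operator A.
mul_by!(A) = (out, x) -> mul!(out, A, x)              # out .= A * x
mul_by_accum!(A) = (out, x) -> mul!(out, A, x, 1, 1)  # out .+= A * x

# Specializations for UniformScaling skip the matrix multiplication entirely.
mul_by!(A::UniformScaling) = (out, x) -> (out .= A.λ .* x)
mul_by_accum!(A::UniformScaling) = (out, x) -> (out .+= A.λ .* x)

x = rand(4)
out = zeros(4)
mul_by_accum!(2I)(out, x)   # out == 2 .* x, without building a matrix
```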
@jlchan commented on Aug 9, 2021

The last commit should have fixed the tests, and also addresses task #5 in #675.

@ranocha left a review:

Thanks a lot - nice work 👍

@ranocha enabled auto-merge (squash) on August 9, 2021
@ranocha merged commit a6d1758 into trixi-framework:main on Aug 9, 2021
@jlchan commented on Aug 9, 2021

Thanks for reviewing!

@jlchan deleted the jc/optimize_fluxdiff branch on August 11, 2021