Evaluate and implement EllipsisNotation.jl for dimension-agnostic code #179
Comments
A typical REPL session for monitoring the performance of some code in Trixi while modifying it could look like

```julia
julia> using Revise, BenchmarkTools; using Trixi

julia> Trixi.init_parameters("examples/2d/parameters_sedov_blast_wave_shockcapturing_amr.toml");

julia> mesh, dg, time_parameters, time_integration_function = Trixi.init_simulation();

julia> @benchmark Trixi.prolong2mortars!(
           $(dg),
           $(dg.mortar_type))
```

One could also call
It seems like code that can be simplified using EllipsisNotation does not appear in performance-critical parts. In Trixi, code that can be simplified using EllipsisNotation appears in several places. I tested the function update_gravity!(solver_gravity, solver_euler.elements.u, gravity_parameters) with BenchmarkTools. In it, the code

```julia
if ndims(solver) == 2
  if maximum(abs, @views solver.elements.u_t[1, :, :, :]) <= solver.equations.resid_tol
    # println(" Gravity solution tolerance ", solver.equations.resid_tol,
    #         " reached in iterations ",iteration)
    finalstep = true
  end
else
  if maximum(abs, @views solver.elements.u_t[1, :, :, :, :]) <= solver.equations.resid_tol
    # println(" Gravity solution tolerance ", solver.equations.resid_tol,
    #         " reached in iterations ",iteration)
    finalstep = true
  end
end
```

can be simplified to

```julia
if maximum(abs, @views solver.elements.u_t[1, ..]) <= solver.equations.resid_tol
  # println(" Gravity solution tolerance ", solver.equations.resid_tol,
  #         " reached in iterations ",iteration)
  finalstep = true
end
```

I wrapped the call in

```julia
b = @benchmarkable update_gravity!($solver_gravity, $solver_euler.elements.u, $gravity_parameters)
```

and ran the benchmark in the REPL. The result with classical notation looks like this:
The result with EllipsisNotation looks like this:
While the times are similar, the version with EllipsisNotation causes a few more allocations.
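For a rough idea of how such a comparison could be set up outside of Trixi, here is a minimal, self-contained sketch; the array `u_t` and the two helper functions are stand-ins invented for illustration, not the actual Trixi benchmark:

```julia
using BenchmarkTools, EllipsisNotation

u_t = rand(4, 4, 4, 4, 256)   # stand-in for solver.elements.u_t in 3D

# classical notation: spell out one colon per remaining dimension
colon_version(u)    = maximum(abs, @views u[1, :, :, :, :])
# EllipsisNotation: `..` replaces the trailing colons
ellipsis_version(u) = maximum(abs, @views u[1, ..])

# run each @benchmark line separately in the REPL (or wrap it in display)
# to see both result tables
@benchmark colon_version($u_t)
@benchmark ellipsis_version($u_t)
```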
This looks already very promising, thanks! Have you also tried an "artificial" test where the compiler does not know the type of the array that is being indexed? I am thinking about something along the following lines:

```julia
using EllipsisNotation

struct Wrapper
  wrapped::Any
end

function foo()
  u = rand(5, 5, 5, 5, 100)
  w = Wrapper(u)
  bar(w)
end

function bar(w)
  @views maximum(abs, w.wrapped[1, ..])
end
```

I am not sure if I constructed the example correctly, but I would find it interesting to see if the unknown type of `wrapped` makes a difference.
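As a hedged aside (not part of the original discussion): one way to confirm that the array type really is opaque to the compiler here is to inspect the inferred types of `bar`:

```julia
# reuses the Wrapper and bar definitions from above (EllipsisNotation already loaded)
w = Wrapper(rand(5, 5, 5, 5, 100))
@code_warntype bar(w)   # `w.wrapped` and the return value are inferred as `Any`,
                        # so the indexing is resolved via dynamic dispatch
```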
With your example I get

with classical notation and

with EllipsisNotation.

Classical notation is about 5% faster and uses fewer allocations, with about the same memory estimate.
Hm. Thinking about it, my example code was not very good... now you're benchmarking not just ellipsis notation but also general allocations, which can hide a lot. Maybe something more along these lines:

```julia
struct Wrapper
  wrapped::Any
end

function foo()
  u = rand(5, 5, 5, 5, 100)
  w = Wrapper(u)
  @benchmark bar1($w)
  @benchmark bar2($w)
end

function bar1(w)
  @views maximum(abs, w.wrapped[1, ..])
end

function bar2(w)
  @views maximum(abs, w.wrapped[1, :, :, :, :])
end
```

Or am I still missing something here, @ranocha?
Needs to be

```julia
function foo()
  u = rand(5, 5, 5, 5, 100)
  w = Wrapper(u)
  display(@benchmark bar1($w))
  display(@benchmark bar2($w))
end
```

(You forgot the `display` calls.)
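(As a usage note not taken from the thread: the wrapper function can also be skipped entirely by running the benchmarks directly in the REPL, where each result is displayed automatically, assuming `Wrapper`, `bar1`, and `bar2` from above are defined:)

```julia
julia> using BenchmarkTools, EllipsisNotation

julia> w = Wrapper(rand(5, 5, 5, 5, 100));

julia> @benchmark bar1($w)

julia> @benchmark bar2($w)
```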
Okay, now there is quite a difference:
But why do you want to know what's going on when the compiler doesn't know the type of the array? That case should hopefully not occur in any performance-critical part...
Anyway, there is certainly a difference between the two versions.

And that difference seems mainly due to the creation of the view.
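A hedged way to test that hypothesis (illustrative code, not from the thread; the helper names `make_view_ellipsis` and `make_view_colon` are made up) is to benchmark only the view construction, without the `maximum` reduction:

```julia
using BenchmarkTools, EllipsisNotation

struct Wrapper          # identical to the definition above
    wrapped::Any
end

w = Wrapper(rand(5, 5, 5, 5, 100))

# only construct the views, skip the reduction
make_view_ellipsis(w) = view(w.wrapped, 1, ..)
make_view_colon(w)    = view(w.wrapped, 1, :, :, :, :)

@benchmark make_view_ellipsis($w)
@benchmark make_view_colon($w)
```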
Mainly because my intuition about performance in Julia has been wrong so many times already that I cannot count ;-)

OK, so I also did some additional checks and it seems that the memory usage is ~3x higher for the ellipsis version compared to the colon version. This seems to be independent of the dimension lengths (as expected) and of the number of dimensions of the arrays (which I was not sure about). Similarly, the compute time is always slightly higher for the ellipsis version, but not by much. Could you maybe do one last check, @erik-f, and add a comparison of full Trixi runs using the built-in timers?

@erik-f Just make sure that the problem size (number of elements and/or polydeg) is sufficiently large to get accurate measurements from the timers. According to https://github.com/KristofferC/TimerOutputs.jl#overhead, the overhead is ~0.25 microseconds, so if a single measurement on average takes more than 50 microseconds, you should be ok.
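For readers unfamiliar with the timers mentioned here, a minimal sketch of generic TimerOutputs usage follows (not Trixi's actual timer setup; the array and the section label are invented for illustration):

```julia
using TimerOutputs, EllipsisNotation

const to = TimerOutput()
u_t = rand(8, 8, 8, 8, 1024)   # large enough that one check takes well over 50 µs

# each timed call should take much longer than the ~0.25 µs timer overhead;
# repeating it just accumulates statistics (ncalls, total time, allocations)
for _ in 1:100
    @timeit to "steady-state check" maximum(abs, @views u_t[1, ..])
end

print_timer(to)
```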
With classical notation I get:

```
julia> Trixi.run("examples/3d/parameters_hyp_diff_llf.toml", initial_refinement_level=4, t_end=0.2)
[...]
--------------------------------------------------------------------------------------------
Trixi.jl Time Allocations
---------------------- -----------------------
Tot / % measured: 5.24s / 90.4% 428MiB / 51.8%
Section ncalls time %tot avg alloc %tot avg
--------------------------------------------------------------------------------------------
main loop 1 4.70s 99.4% 4.70s 196MiB 88.3% 196MiB
rhs 505 3.41s 72.1% 6.76ms 66.8MiB 30.1% 136KiB
interface flux 505 717ms 15.2% 1.42ms 6.67MiB 3.01% 13.5KiB
volume integral 505 643ms 13.6% 1.27ms 6.74MiB 3.04% 13.7KiB
source terms 505 584ms 12.3% 1.16ms 6.69MiB 3.01% 13.6KiB
prolong2interfaces 505 448ms 9.47% 888μs 6.67MiB 3.00% 13.5KiB
surface integral 505 415ms 8.76% 821μs 6.71MiB 3.02% 13.6KiB
reset ∂u/∂t 505 402ms 8.49% 796μs 0.00B 0.00% 0.00B
Jacobian 505 100ms 2.10% 197μs 6.66MiB 3.00% 13.5KiB
prolong2boundaries 505 36.8ms 0.78% 72.9μs 6.64MiB 2.99% 13.5KiB
boundary flux 505 26.9ms 0.57% 53.2μs 6.67MiB 3.00% 13.5KiB
prolong2mortars 505 18.0ms 0.38% 35.6μs 6.65MiB 2.99% 13.5KiB
mortar flux 505 16.8ms 0.36% 33.3μs 6.72MiB 3.02% 13.6KiB
positivity preserving limiter 505 247μs 0.01% 489ns 0.00B 0.00% 0.00B
Runge-Kutta step 505 661ms 14.0% 1.31ms 6.68MiB 3.01% 13.5KiB
analyze solution 1 362ms 7.66% 362ms 62.6MiB 28.2% 62.6MiB
I/O 2 53.1ms 1.12% 26.5ms 60.0MiB 27.0% 30.0MiB
- steady-state check 101 51.0ms 1.08% 505μs 0.00B 0.00% 0.00B
calc_dt 101 981μs 0.02% 9.71μs 0.00B 0.00% 0.00B
mesh creation 1 29.2ms 0.62% 29.2ms 26.0MiB 11.7% 26.0MiB
creation 1 19.3ms 0.41% 19.3ms 15.3MiB 6.87% 15.3MiB
initial refinement 1 9.87ms 0.21% 9.87ms 10.7MiB 4.82% 10.7MiB
refinement patches 1 3.60μs 0.00% 3.60μs 80.0B 0.00% 80.0B
coarsening patches 1 800ns 0.00% 800ns 80.0B 0.00% 80.0B
read parameter file 1 212μs 0.00% 212μs 17.8KiB 0.01% 17.8KiB
--------------------------------------------------------------------------------------------
```

With EllipsisNotation I get:

```
julia> Trixi.run("examples/3d/parameters_hyp_diff_llf.toml", initial_refinement_level=4, t_end=0.2)
[...]
--------------------------------------------------------------------------------------------
Trixi.jl Time Allocations
---------------------- -----------------------
Tot / % measured: 5.15s / 93.2% 407MiB / 54.6%
Section ncalls time %tot avg alloc %tot avg
--------------------------------------------------------------------------------------------
main loop 1 4.73s 98.6% 4.73s 196MiB 88.3% 196MiB
rhs 505 3.53s 73.6% 6.99ms 66.8MiB 30.1% 136KiB
interface flux 505 720ms 15.0% 1.43ms 6.68MiB 3.01% 13.5KiB
volume integral 505 658ms 13.7% 1.30ms 6.74MiB 3.03% 13.7KiB
source terms 505 596ms 12.4% 1.18ms 6.69MiB 3.01% 13.6KiB
prolong2interfaces 505 470ms 9.81% 931μs 6.67MiB 3.00% 13.5KiB
surface integral 505 448ms 9.35% 888μs 6.71MiB 3.02% 13.6KiB
reset ∂u/∂t 505 410ms 8.56% 812μs 0.00B 0.00% 0.00B
Jacobian 505 91.9ms 1.92% 182μs 6.66MiB 3.00% 13.5KiB
prolong2boundaries 505 48.2ms 1.01% 95.4μs 6.64MiB 2.99% 13.5KiB
boundary flux 505 36.9ms 0.77% 73.2μs 6.67MiB 3.00% 13.5KiB
prolong2mortars 505 20.4ms 0.43% 40.5μs 6.65MiB 2.99% 13.5KiB
mortar flux 505 20.3ms 0.42% 40.2μs 6.72MiB 3.02% 13.6KiB
positivity preserving limiter 505 253μs 0.01% 502ns 0.00B 0.00% 0.00B
Runge-Kutta step 505 676ms 14.1% 1.34ms 6.68MiB 3.00% 13.5KiB
analyze solution 1 247ms 5.15% 247ms 62.6MiB 28.2% 62.6MiB
I/O 2 68.4ms 1.43% 34.2ms 60.0MiB 27.0% 30.0MiB
- steady-state check 101 52.6ms 1.10% 521μs 117KiB 0.05% 1.16KiB
calc_dt 101 976μs 0.02% 9.66μs 0.00B 0.00% 0.00B
mesh creation 1 66.1ms 1.38% 66.1ms 26.0MiB 11.7% 26.0MiB
creation 1 56.5ms 1.18% 56.5ms 15.3MiB 6.87% 15.3MiB
initial refinement 1 9.58ms 0.20% 9.58ms 10.7MiB 4.82% 10.7MiB
refinement patches 1 4.00μs 0.00% 4.00μs 80.0B 0.00% 80.0B
coarsening patches 1 799ns 0.00% 799ns 80.0B 0.00% 80.0B
read parameter file 1 785μs 0.02% 785μs 17.8KiB 0.01% 17.8KiB
--------------------------------------------------------------------------------------------
```

So no time difference, but a few allocations with EllipsisNotation (compared to none at all with classical notation).
Sounds good! However, I'm really wondering: where do the >1 KiB of allocations per iteration come from (117 KiB over 101 steady-state checks is roughly 1.16 KiB per call)? This seems like a ridiculously high number for a single `maximum(abs, ...)` call over a view.
These basically come from EllipsisNotation.jl. Should I go ahead with the changes anyway?
Sure, go ahead, please.

@erik-f can you please go ahead and create a PR to introduce the new notation in the non-dimension-specific locations as discussed?
Alright. |
The package EllipsisNotation.jl allows one to use `..` in place of an arbitrary number of `:` slice operators in a multi-dimensional array. It means "all of the columns before (or after)". This could be very helpful in simplifying dimension-specific indexing code, as in the sketch below.
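A hedged illustration of the kind of simplification meant here, using a generic array as a stand-in and modeled on the steady-state check discussed above rather than the exact snippets from the original issue:

```julia
using EllipsisNotation

u_t = rand(4, 4, 4, 4, 16)   # e.g. (variable, node, node, node, element) in 3D

# dimension-specific: spell out one colon per remaining dimension
res_colons = maximum(abs, @views u_t[1, :, :, :, :])

# dimension-agnostic: `..` expands to however many colons are needed
res_ellipsis = maximum(abs, @views u_t[1, ..])

res_colons == res_ellipsis   # true
```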
However, before using this, we should first benchmark it for performance (CPU time and memory usage), as usually there is no free lunch.