Attempts to reduce latency #2225
Some important points are:

On Julia 1.11:

```
# main
7.758546 seconds (105.21 M allocations: 4.691 GiB, 27.90% gc time, 99.69% compilation time)
1.794 ms (0 allocations: 0 bytes)
# This branch
2.511549 seconds (42.33 M allocations: 1.861 GiB, 24.26% gc time, 99.73% compilation time)
1.646 ms (0 allocations: 0 bytes)
```
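As a rough illustration of how the two kinds of numbers above differ (first-call time dominated by compilation versus steady-state runtime), here is a minimal sketch. The `kernel` function is a hypothetical stand-in, not a ClimaCore expression, and `@elapsed` is used instead of whatever tooling produced the timings above.

```julia
# Hypothetical workload standing in for a broadcast expression.
kernel(x) = sum(abs2, x) / length(x)

x = rand(1_000)

# First call: the measured time includes JIT compilation of `kernel`
# for the argument type Vector{Float64}.
t_first = @elapsed kernel(x)

# Second call: the compiled method instance is reused, so this is
# closer to the steady-state runtime.
t_second = @elapsed kernel(x)

println("first call:  ", t_first, " s")
println("second call: ", t_second, " s")
```

The gap between the two measurements is the latency this PR targets; the second number should be largely unaffected by changes that only reduce compilation work.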
Unfortunately, we cannot type-hint on

```julia
const ᶜadvdivᵥ = Operators.DivergenceF2C(
    bottom = Operators.SetValue(CT3(0)),
    top = Operators.SetValue(CT3(0)),
)
```

This pattern costs us ~10-20% latency on our broadcast (I think all?) expressions. Since fixing this would be a breaking change, I'm going to remove this part.
Running this in ClimaAtmos:

```julia
import ClimaComms
ClimaComms.@import_required_backends
import ClimaAtmos as CA
import SciMLBase
import Random
Random.seed!(1234)
empty!(ARGS);
push!(ARGS, "--config_file", "config/model_configs/diagnostic_edmfx_trmm_stretched_box.yml");
push!(ARGS, "--job_id", "diagnostic_edmfx_trmm_stretched_box");
@time begin
    (; config_file, job_id) = CA.commandline_kwargs();
    config = CA.AtmosConfig(config_file; job_id);
    simulation = CA.AtmosSimulation(config);
    (; integrator) = simulation;
    SciMLBase.step!(integrator)
end
```

Yields:

Main branch:

This branch:

So, this appears to be showing benefits in a larger context, in atmos.
It's unfortunate that we can't abstract out any of these changes into UnrolledUtilities without hurting latency, but I expect that by further simplifying the broadcasting and stencil code we can avoid triggering compilation heuristics through more layers of inlining. Until then, looks like we're stuck with manually doing inlining for the compiler. As long as these shenanigans remain isolated to struct.jl and finitedifference.jl, I'm happy with the changes here.
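For context on the manual inlining mentioned above, here is a minimal sketch of the recursive tuple-unrolling idiom in Julia. `unrolled_map` is a hypothetical helper written only for illustration; it is not the code in struct.jl or finitedifference.jl, and it is not the UnrolledUtilities API.

```julia
# Hypothetical helper for illustration only.
# Recursing on the tuple lets the compiler specialize on each tuple length,
# and `@inline` asks it to flatten the recursion into straight-line code.
@inline unrolled_map(f, ::Tuple{}) = ()
@inline unrolled_map(f, t::Tuple) =
    (f(first(t)), unrolled_map(f, Base.tail(t))...)

# Works on heterogeneous tuples, where each element may have its own type.
result = unrolled_map(x -> x + 1, (1, 2.0, 3))
println(result)
```

Whether the compiler actually inlines every level depends on its heuristics, which is the difficulty the comment describes: wrapping such helpers in additional layers of abstraction can change what gets inlined and how long inference takes.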


This PR is an attempt to reduce latency. This was motivated by the example in #2215.
Several experiments were attempted, and some were found to be fruitful. These experiments were performed with Julia 1.10:
Using SnoopCompile with flamegraphs: