Fixing type-instability in callback handling #971
Hey, did this ever find its way to a PR? I know you put a few things in, but I can't find a PR associated with this, though I swear I remember an email about it? Or maybe I'm misremembering.
I got caught up a bit with the normal PhD grind. I can work on getting something PR-worthy within the next week or so. I was looking for some suggestions on where to PR. My leading idea, which I'll PR unless you have more feedback, is:

Alternatively, I could:
As mentioned in SciML/DifferentialEquations.jl#971, the current recursive method for identifying the first continuous callback can cause the compiler to give up on type inference, especially when there are many callbacks, and the fallback then allocates. This switches the function to a generated function (along with an inline function that takes splatted tuples). Because the generated function explicitly unrolls the tuple, there are no type-inference problems. I added a test that allocates under the old implementation (about 19 KiB of allocations!) but does not with the new one.
Currently, as referenced in SciML/DifferentialEquations.jl#971, the old implementation of `handle_callbacks!` directly calls `apply_callback!` on `continuous_callbacks[idx]`, which is inherently type-unstable because `apply_callback!` is specialized on the callback type. This commit adds a generated function `apply_ith_callback!` that generates type-stable code to do the same thing: for each callback tuple type, the generated function unrolls the tuple, checking the callback index against static indices. As a nice bonus, this generated function often seems to be converted into a switch statement at the LLVM level:

```
switch i64 %4, label %L46 [
  i64 9, label %L3
  i64 8, label %L8
  i64 7, label %L13
  i64 6, label %L18
  i64 5, label %L23
  i64 4, label %L28
  i64 3, label %L33
  i64 2, label %L38
  i64 1, label %L43
]
```

For testing, I added an allocation test which sets up a simple ODE problem, steps the integrator manually to just before the first callback, then manipulates the integrator state past the first callback point. This way, we can call `handle_callbacks!` directly and write a test on the allocation count. I confirm that (at least testing against commit SciML/DiffEqBase.jl@1799fc3, the current master branch tip of DiffEqBase.jl) the new method does not allocate, whereas the old one does. This may not hold until a new release of DiffEqBase.jl is cut, because the old version of `find_first_continuous_callback` might allocate.
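The unrolling trick described above can be sketched in a few lines. This is a minimal standalone analogue of the `apply_ith_callback!` approach, not the actual DiffEqBase code: the name `call_ith` and the single-argument call shape are illustrative. The generated function builds a chain of branches, each indexing the tuple with a *literal* index, so every call site dispatches on a concrete callback type:

```julia
# Dispatch to the idx-th element of a heterogeneous tuple in a type-stable
# way by unrolling the index checks at compile time. (Illustrative sketch,
# not the PR's actual apply_ith_callback! API.)
@generated function call_ith(fs::Tuple, idx::Int, x)
    # Inside a generated function, `fs` is the tuple *type*, so we can read
    # off the number of elements and emit one branch per element. Build the
    # nested branches from last to first so idx == 1 is checked first.
    branches = :(throw(BoundsError(fs, idx)))
    for i in length(fs.parameters):-1:1
        # fs[$i] uses a literal index, so each call is concretely typed.
        branches = :(idx == $i ? fs[$i](x) : $branches)
    end
    return branches
end
```

For example, `call_ith((sin, cos, x -> x^2), 3, 3.0)` returns `9.0`, and each branch inlines the specialized call rather than boxing the tuple element.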
This is closable now that both PRs have been merged. Thanks Chris!
Thanks for your contributions! These are a few things that have bugged me for a while, so I'm glad someone put in the work to fix them.
In the `handle_callbacks!` function in OrdinaryDiffEq, both the `find_first_continuous_callback` and `apply_callback!` lines are either type-unstable or can sometimes fail to be inferred properly. Both can be solved with generated functions, but I'd like feedback on how to PR this (PR into both OrdinaryDiffEq and DiffEqBase? Only add functions to OrdinaryDiffEq?) and on whether the generated functions have any disadvantages.

## Fixing `apply_callback!` type instability

The `apply_callback!` line allocates due to the indexing into `continuous_callbacks[idx]`. `@code_warntype` shows the instability, which at the LLVM level calls out to `gc_alloc_obj` and churns a lot of allocations if you have many callbacks that fire frequently. This is fixed by adding a generated overload that emits code that explicitly iterates over the tuple type in a type-stable way. It also often seems to inline the specialized `apply_callback!` calls, which is nice as well.
## Generating `find_first_continuous_callback` to side-step inference failure

Depending on the ODE problem and the number of callbacks, the recursive `find_first_continuous_callback` functions sometimes fail to infer. From an example Cthulhu run, even though everything should be inferable, the function comes out type-unstable, which unfortunately costs N allocations per internal step for N continuous callbacks :( The recursive definition can be replaced relatively straightforwardly with a generated function. This is also type-stable, and it often inlines the specific `find_callback_time` calls.
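The same unrolling pattern applies to the first-callback search. The sketch below is a simplified stand-in, not the actual `find_first_continuous_callback` (real continuous callbacks compute their event times from integrator state rather than storing a precomputed `event_time` field); it only shows how a generated function replaces the recursive tuple scan with a compile-time-unrolled loop:

```julia
# Hypothetical callback carrier for illustration only.
struct EventCB{F}
    event_time::Float64
    affect!::F
end

# Scan a tuple of callbacks for the earliest event, with the loop unrolled
# at compile time so every field access is concretely typed.
@generated function find_first_event(callbacks::Tuple)
    n = length(callbacks.parameters)
    body = quote
        best_t = Inf
        best_idx = 0
    end
    for i in 1:n
        push!(body.args, quote
            t_i = callbacks[$i].event_time   # literal index: type-stable
            if t_i < best_t
                best_t = t_i
                best_idx = $i
            end
        end)
    end
    push!(body.args, :(return (best_t, best_idx)))
    return body
end
```

The return type is always `Tuple{Float64, Int}`, so the caller stays inferable regardless of how heterogeneous the callback tuple is.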
## Implementation questions

I currently define these functions in DiffEqBase and then change the implementation of `handle_callbacks!` in OrdinaryDiffEq to versions that don't splat the `continuous_callbacks` tuple. I haven't added tests for these in `DiffEqBase`; I can add some, or is this something that should be implicitly covered by the tests in OrdinaryDiffEq?

## Possible alternate implementation

Instead of messing with `DiffEqBase`, this could alternatively be implemented by making `handle_callbacks!` itself into a generated function, or by making a helper generated function to replace just these lines, which could be a little nicer.