Attempts at specialization transparency, ref #57 #58

danielwe · 2024-07-22T05:49:59Z

Exploring ways of dealing with #57. This prototype injects a type variable (where parameter) for every argument that doesn't already have one. I think this is a satisfactory fix for codegen_level = "min", but it leads to more specialization than the user's code implies for codegen_level = "debug" (or less, if typevar injection is turned off for this case).
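To make the transformation concrete, here is a minimal sketch of typevar injection at the expression level. `inject_typevar` and `inject_typevars` are hypothetical helpers written for illustration, not the PR's code. They handle only bare-symbol arguments and pass everything else through untouched, whereas the prototype also has to handle defaults, destructuring, and existing annotations.

```julia
# Hypothetical helper (not the PR's implementation): annotate a bare argument
# symbol with a fresh type variable. Returns the new argument expression and
# the generated type variable, or `nothing` if the argument was left alone.
function inject_typevar(arg)
    arg isa Symbol || return arg, nothing  # only bare symbols are rewritten
    T = gensym("T")
    return Expr(:(::), arg, T), T
end

# Hypothetical helper: rebuild a short-form method definition
# `name(args...) = body` with one injected type variable per bare argument.
function inject_typevars(name::Symbol, args::Vector, body)
    pairs = map(inject_typevar, args)
    new_args = map(first, pairs)
    typevars = filter(!isnothing, map(last, pairs))
    call = Expr(:call, name, new_args...)
    sig = isempty(typevars) ? call : Expr(:where, call, typevars...)
    return Expr(:(=), sig, body)
end

# Builds the expression for
#   mymap!(f::T1, y::T2, x::T3) where {T1,T2,T3} = (y .= f.(x))
# with gensym'd names in place of T1, T2, T3.
ex = inject_typevars(:mymap!, Any[:f, :y, :x], :(y .= f.(x)))
```

Evaluating `ex` at top level would define the method; the sketch stops at building the expression so that the structure is easy to inspect.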

Here's a script that's helpful for experimenting, using allocations as a convenient proxy for the performance impact of runtime instability checking:

using BenchmarkTools: @ballocated
using DispatchDoctor: @stable

macro mymap!_body()
    return esc(
        quote
            y .= f.(x)
            # for i in eachindex(y, x)
            #     y[i] = f(x[i])
            # end
            return y
        end
    )
end

mymap!(f, y, x) = @mymap!_body

@stable default_codegen_level="debug" mymap_debug!(f, y, x) = @mymap!_body

@stable default_codegen_level="min" mymap_min!(f, y, x) = @mymap!_body

x, y = zeros(1000), zeros(1000)
print("$mymap!\t\t")
println(@ballocated mymap!(identity, $y, $x))
print("$mymap_debug!\t")
println(@ballocated mymap_debug!(identity, $y, $x))
print("$mymap_min!\t")
println(@ballocated mymap_min!(identity, $y, $x))

With the state of the code in this PR, the output is

julia> include("mymap.jl")
mymap!          144
mymap_debug!    0
mymap_min!      144

Setting genwhereparam=(codegen_level == "min") instead of simply true on line 230 in stabilization.jl, we get

julia> include("mymap.jl")
mymap!          144
mymap_debug!    1440
mymap_min!      144

We see that mymap_debug! is either better or much worse than the undecorated mymap!, but never the same. Meanwhile, mymap_min! is faithful.

If you switch from broadcasting to the loopy implementation (by commenting/uncommenting lines in @mymap!_body), both of the above settings work well: all three versions get zero allocations. If you set genwhereparam=false, you'll recover the behavior observed in #57, where codegen_level = "min" results in lots of allocations.

For a solution along these lines, my feeling right now is that genwhereparam=(codegen_level == "min") is the most appropriate setting. In this case, the documentation should clearly flag that codegen_level = "min" is the choice for faithful behavior, while codegen_level = "debug" mainly exists for @code_warntype convenience and may be detrimental to performance. Perhaps the default should be changed to codegen_level = "min".

Hopefully there's a better solution to be found.

github-actions · 2024-07-22T05:51:39Z

Benchmark Results

	main	`197a0ed`...	main/197a0edbe5542d...
_stable/mode=disable	0.781 ± 0.02 μs	0.722 ± 0.011 μs	1.08
_stable/mode=error	0.69 ± 0.02 ms	0.707 ± 0.022 ms	0.976
_stable/mode=warn	0.687 ± 0.036 ms	0.7 ± 0.031 ms	0.98
time_to_load	0.0667 ± 0.00068 s	0.0672 ± 0.00022 s	0.993

Benchmark Plots

A plot of the benchmark results has been uploaded as an artifact to the workflow run for this PR.
Go to "Actions"->"Benchmark a pull request"->[the most recent run]->"Artifacts" (at the bottom).

Ref MilesCranmer#57

danielwe · 2024-07-22T07:44:17Z

For what it's worth, this PR does nothing to fix the original problem that led me down this investigation. Turns out that's a thornier problem and needs a different MWE than #57. In fact, for said problem codegen_level = "debug" is more faithful than codegen_level = "min" to the behavior sans @stable, and this PR changes nothing regardless of the value of genwhereparam.

MilesCranmer · 2024-07-22T08:01:22Z

Maybe a bandaid fix is to just warn (in the docstring) about potential runtime dispatch for codegen_level="min", so that people profile their code and know where to look if something slows down, and add f::F as appropriate? That option exists mostly for faster pre-compilation and more accurate code coverage anyways; I guess "debug" is the full thing. Maybe we could even rename it to something else to make this clear.

danielwe · 2024-07-22T08:41:34Z

Well, it's not so clear-cut. As seen with the broadcasting implementation, codegen_level = "debug" can also add extra overhead.

Anyway, for the issue discussed here and in #57, adding f::F fixes the problem regardless of codegen level. That's probably worth mentioning somewhere in the docs/readme.

For the issue I'm alluding to in my previous comment, however, f::F does not help. I'll let you know if I ever manage to distill an MWE from that.
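As a sketch of the user-side `f::F` fix mentioned above, the two definitions below differ only in the type parameter on `f`. The function names are hypothetical and the bodies mirror the broadcasting version of the example script. Julia's documented heuristic is that `Function`, `Type`, and `Vararg` arguments may be left unspecialized when they are only passed through to another call, and a type parameter opts back in.

```julia
# `f` is only passed along to broadcasting, never called directly in this
# body, so Julia's heuristics may decline to specialize on typeof(f).
mymap_passthrough!(f, y, x) = (y .= f.(x); y)

# The type parameter F forces a specialization for each concrete typeof(f).
mymap_forced!(f::F, y, x) where {F} = (y .= f.(x); y)

x = collect(1.0:4.0)
y1 = mymap_passthrough!(sqrt, zeros(4), x)
y2 = mymap_forced!(sqrt, zeros(4), x)
```

Both versions compute the same result; any difference shows up in allocations and run time, which is what the `@ballocated` script in the top post measures.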

MilesCranmer · 2024-07-22T16:50:40Z

> Well, it's not so clear-cut. As seen with the broadcasting implementation, codegen_level = "debug" can also add extra overhead.

I didn’t know this, how could this be true if it’s the same body and same signature?

danielwe · 2024-07-22T17:24:43Z

Explained in #57 (comment) and demonstrated in the benchmarks in the top post here. Briefly: if the body of the original function does not itself call f, but only passes it along to callees (for example, when f is only used in broadcasting, which is the example I've been using), then the function generated by codegen_level = "debug" will not specialize to the type of f. The result is that the instability check happens at runtime, adding overhead.

To be clear, the function wouldn't specialize without @stable either, so it would likely incur dynamic dispatch and allocations even without DispatchDoctor. The difference is that DispatchDoctor adds extra runtime work because the instability check is not compiled away.
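To illustrate what "the instability check happens at runtime" means, here is a rough sketch of a check of this general shape. It is an assumption-laden stand-in, not DispatchDoctor's implementation: `checked_call` is a hypothetical name, and it uses `Base.promote_op` to ask inference for the return type. When the enclosing method is specialized on `typeof(f)`, a query like this can typically be resolved at compile time; when it is not, the query runs on every call.

```julia
# Hypothetical stand-in for an instability check (not DispatchDoctor's code).
# Ask inference for the return type of `f` on these argument types and
# refuse to proceed if the answer is not a concrete type.
function checked_call(f, args...)
    T = Base.promote_op(f, map(typeof, args)...)
    isconcretetype(T) || error("return type $T of $f is not concrete")
    return f(args...)
end

unstable(x) = x > 0 ? x : 0.0   # returns Int or Float64 for an Int input

ok = checked_call(+, 1, 2)
threw = try
    checked_call(unstable, 1)
    false
catch
    true
end
```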

danielwe · 2024-07-22T17:24:48Z

Referring back to the top post here: The second benchmark reflects what DispatchDoctor currently does for mymap_debug! (ignore mymap_min!, which is altered by this PR). Notice that mymap_debug! has 10x the allocations of mymap!. Then look at the top benchmark: in this case, DispatchDoctor is forcing f::F, eliminating all allocations, including the ones that mymap! would ordinarily have.

The challenge is to strike the middle ground where the instability check is always compiled away, while the main body is only specialized to the extent it would be without @stable. The benchmarks above suggest that this can be achieved for codegen_level = "min" by enforcing specialization in the outer function, but not in the simulator. Whether it's possible for codegen_level = "debug" remains unclear.

danielwe · 2024-07-22T17:29:00Z

Note that I'm just using allocations as a simple, deterministic proxy here. The more important issue is of course actual performance, and the difference can be many orders of magnitude for simple functions like this.

MilesCranmer · 2024-07-23T00:16:32Z

I am thinking about if there are any issues with forcing Julia to always specialize. I suppose, no? It just means larger compilation cost, and that's it? Although I feel like that is implied anyway by using DispatchDoctor.jl, and is already pointed out in the caveats section, so maybe it's not a big deal. So yeah, I would be on board with that change.

By the way, with your current attempt, would it also work on something like

@stable mymap!((f,), y, x) = @mymap!_body

i.e., where f is inside a tuple? I'm not actually sure how Julia specialization works for things like this. Perhaps it is already guaranteed to specialize?

MilesCranmer · 2024-07-23T00:22:59Z

src/utils.jl

+)::Tuple{Union{Expr,Symbol},Union{Expr,Nothing},Union{Symbol,Nothing}}
+    genwhereparam || return ex, nothing, nothing
+    whereparam = gensym("T")
+    return Expr(:(::), ex, whereparam), nothing, whereparam
 end
 function sanitize_arg_for_stability_check(


Now that there are 3 returns, it might be nice to return a NamedTuple instead, for robustness. Returning tuples is always a bit risky because of things like

(x, y) = (1, 2, 3)

being valid Julia syntax.

e.g., could be:

return (; arg=Expr(:(::), ex, whereparam), destruct=nothing, whereparam=whereparam)

Yeah, that's a good idea!
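A small sketch of the risk described in this thread, and of how a NamedTuple return avoids it. The `result` function is hypothetical; its field names follow the suggestion above. Property destructuring requires Julia 1.7 or later.

```julia
# Positional destructuring silently ignores extra elements.
(a, b) = (1, 2, 3)

# Hypothetical function returning the three values as a NamedTuple.
result() = (; arg=:(x::T), destruct=nothing, whereparam=:T)

# Property destructuring picks fields by name, so adding a field later
# does not shift the meaning of existing call sites.
(; arg, whereparam) = result()
```

With the NamedTuple version, asking for a field that does not exist raises an error instead of silently binding the wrong value.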

MilesCranmer · 2024-07-23T00:28:20Z

src/utils.jl

+        # (Composite case)
+        # matches things like `::Type{T}=MyType`
+        arg_ex, destructure_ex, whereparam = sanitize_arg_for_stability_check(
+            first(args); genwhereparam
+        )
+        return Expr(head, arg_ex, last(args)), destructure_ex, whereparam


Does this one need the genwhereparam too? It already has the ::T part, no?

It's needed because this is also the branch that matches a regular x=default argument without a type parameter

If it happens to be a ::Type{T}=MyType argument, the inner recursive call will land in the branch that sets genwhereparam=false as required

I see, thanks!

MilesCranmer · 2024-07-23T00:30:35Z

src/stabilization.jl

        (
-            map(first, args_destructurings),
-            filter(!isnothing, map(last, args_destructurings)),
+            map(first, args_destructurings_typevars),


The error messages might need to be treated since they now show ::var"..." after every arg.

danielwe · 2024-07-23T18:48:15Z

> I am thinking about if there are any issues with forcing Julia to always specialize. I suppose, no? It just means larger compilation cost, and that's it?

I think this is where I should acknowledge that I don't really know what I'm talking about when discussing compiler-related behavior and tradeoffs. But for what it's worth, your hunch aligns with mine.

> with your current attempt, would it also work on something like
> @stable mymap!((f,), y, x) = @mymap!_body
> i.e., where f is inside a tuple?

Yes, the current implementation injects type parameters for every type of argument, turning this into something like

function mymap!(var1::T1, y::T2, x::T3) where {T1,T2,T3}
    <instability check>
    (f,) = var1
    @mymap!_body
end

The change as it stands is fully functional. The only tests that don't pass are those that query the exact form of the generated code.

> I'm not actually sure how Julia specialization works for things like this. Perhaps it is already guaranteed to specialize?

This may be incorrect, but I think argument destructuring lowers to code that's more or less equivalent to the code we're generating manually. That is, f((x, y), z) = ... becomes something like f(var1, z) = ((x, y) = var1; ...) after lowering, and I assume the normal rules about specialization apply after that, which I think means that it always specializes, since var1 is explicitly used in the function body. (And since neither functions nor types are iterable, this would error for such arguments anyway, however that's not true for property destructuring.)

You can also combine destructuring and vararg into f((x, y)...) = ..., which looks confusing but is actually just f(args...) = ((x, y) = args; ...), i.e., it extracts the first two of an arbitrarily long list of arguments. Once again, since args is used explicitly within the lowered function body, I think this should always specialize.
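The equivalence described above can be checked by hand for the simple cases. This sketch compares a destructuring argument with a manually unpacked one and also shows property destructuring (Julia 1.7+). All function names are made up for the example, and it only demonstrates that the results agree; it does not inspect the lowered code or the specialization behavior.

```julia
# Tuple destructuring in the argument list...
f_destructured((x, y), z) = x + y + z

# ...behaves like binding one positional argument and unpacking it in the body.
f_manual(var1, z) = ((x, y) = var1; x + y + z)

# Property destructuring pulls fields out by name instead of by position.
h_destructured((; a, b)) = a + b
h_manual(nt) = ((; a, b) = nt; a + b)

r1 = f_destructured((1, 2), 3)
r2 = f_manual((1, 2), 3)
r3 = h_destructured((a=1, b=2))
r4 = h_manual((a=1, b=2))
```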

danielwe · 2024-07-23T18:53:13Z

I don't think I can set aside time to take this from draft to ready for the next several days/weeks, so feel free to take it and run with it if you've developed a conviction about the way forward.

danielwe · 2024-07-23T19:00:07Z

> It just means larger compilation cost, and that's it?

One more comment here: the more I think about it, the more I land on the side that DispatchDoctor ought to err on the side of more specialization rather than potential runtime overhead. People who use @stable can be presumed to care about performance. But it would be good to point this out clearly in the README such that users know how they can restore performance if they observe increased allocations/slowdown after removing @stable.

MilesCranmer · 2024-07-29T16:44:05Z

Sorry I just realised I didn't respond. Regarding specialisation: yes, I totally am on board with your proposal to force more specialisation. Especially since you have shown that sometimes specialisation is negatively affected, so I think it's much better to flip it the other way.

MilesCranmer · 2024-08-04T23:23:26Z

We should also add this as a test to ensure the new symbol extractor can handle it:

@testitem "issue #59" begin
    using DispatchDoctor
    @stable function f(@nospecialize(x))
        return x > 0 ? x : 0.0
    end
    @test_throws TypeInstabilityError f(1)
end

It's a bit tricky though because you would want to have the @nospecialize appear in the arg list of the simulator function, but not appear in the arg symbols passed to _promote_op.

Edit: I think the easiest way to do this is just keep the sanitize_arg_for_stability_check as is – because we want func[:args] to still have the @nospecialize – but then have a separate sanitize_arg_for_function_call which operates on args and strips the @nospecialize.
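A minimal sketch of the stripping step proposed above. `strip_nospecialize` is a hypothetical name, not an existing DispatchDoctor function, and the sketch only recognizes the unqualified `@nospecialize(arg)` form; a qualified `Base.@nospecialize` would need an extra case.

```julia
# Hypothetical helper: unwrap `@nospecialize(arg)` to `arg`, leaving any
# other argument expression unchanged.
function strip_nospecialize(ex)
    if ex isa Expr && ex.head === :macrocall && ex.args[1] === Symbol("@nospecialize")
        return ex.args[end]  # the wrapped argument is the last macrocall arg
    end
    return ex
end

# Argument list as it would appear in the method definition...
def_args = Any[:(@nospecialize(x)), :y]

# ...and the stripped list that would be forwarded in a function call.
call_args = map(strip_nospecialize, def_args)
```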

MilesCranmer · 2024-10-06T15:50:38Z

@danielwe I actually ran into something caused by this type of issue and was wondering if you could take a stab at finishing this PR? I am eager to get something like this into DD

danielwe · 2024-10-11T18:24:57Z

Would love to help out, but right now I’m rather swamped. I’ll try to get to it eventually if you or someone else don’t get there before me, but feel free to take this and run with it.

As I remember, the solution is just to inject type parameters for every argument in the method definition, right?

MilesCranmer · 2024-10-12T18:18:33Z

I think so. Would that cause any issues with the type hierarchy of Julia though? Like would it change position of methods in the type hierarchy by making them more specific?

danielwe · 2024-10-12T23:18:13Z

I don't think so if the type parameters are unconstrained, but I'm not particularly knowledgeable about this
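One way to probe this question is to compare the signature types directly. The sketch below only looks at type equality of positional signatures and does not cover keyword arguments or `Vararg`. Giving each argument its own unconstrained type variable yields a signature equal to the all-`Any` one, whereas reusing a single variable for two arguments (the diagonal case) would make the signature more specific.

```julia
# Signature of `f(a, b)` without annotations (ignoring the function type).
sig_plain = Tuple{Any,Any}

# Signature after injecting one unconstrained type variable per argument.
sig_injected = Tuple{T1,T2} where {T1,T2}

# Counterexample: sharing one variable across arguments is more specific.
sig_diagonal = Tuple{T,T} where {T}

same_as_plain = sig_plain == sig_injected
diagonal_differs = sig_diagonal != sig_plain
```

This supports the expectation that injection with distinct, unconstrained variables leaves dispatch order alone, provided the generated variables are never shared between arguments.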

First attempt at specialization transparency

197a0ed

Ref MilesCranmer#57

danielwe force-pushed the specialization branch from 70763a2 to 197a0ed Compare July 22, 2024 06:15

MilesCranmer reviewed Jul 23, 2024

danielwe mentioned this pull request Sep 22, 2024

Non-specialized methods with custom rules result in unnecessary use of runtime handlers and "Non-constant keyword argument" error EnzymeAD/Enzyme.jl#1873

Open

MilesCranmer mentioned this pull request Oct 6, 2024

BREAKING: Change expression types to DynamicExpressions.Expression (from DynamicExpressions.Node) MilesCranmer/SymbolicRegression.jl#326

Merged

7 tasks
