Attempts at specialization transparency, ref #57 #58

Draft
wants to merge 1 commit into
base: main

danielwe
Contributor

@danielwe danielwe commented Jul 22, 2024

Exploring ways of dealing with #57. This prototype injects a type variable (where parameter) for every argument that doesn't already have one. I think this is a satisfactory fix for codegen_level = "min", but it leads to more specialization than the user's code implies for codegen_level = "debug" (or less, if typevar injection is turned off for this case).
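
To make "injecting a type variable" concrete, here is a minimal sketch of the expression manipulation involved. The helper name `inject_typevar` is hypothetical and is not DispatchDoctor's code; it only mirrors the `gensym` and `Expr(:(::), ...)` pattern that appears in this PR's diff, and it handles bare argument symbols only.

```julia
# Hypothetical helper (not part of DispatchDoctor): annotate a bare argument
# symbol with a fresh, unconstrained type variable.
function inject_typevar(arg::Symbol)
    T = gensym("T")
    return Expr(:(::), arg, T), T
end

annotated = map(inject_typevar, [:f, :y, :x])
args = first.(annotated)        # f::T1, y::T2, x::T3 (with gensym'd names)
typevars = last.(annotated)

# Assemble `mymap!(f::T1, y::T2, x::T3) where {T1, T2, T3}` as an expression.
signature = Expr(:where, Expr(:call, :mymap!, args...), typevars...)
```

Arguments that already carry a type parameter, default values, or destructuring would need separate branches, which this sketch leaves out.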

Here's a script that's helpful for experimenting, using allocations as a convenient proxy for the performance impact of runtime instability checking:

using BenchmarkTools: @ballocated
using DispatchDoctor: @stable

macro mymap!_body()
    return esc(
        quote
            y .= f.(x)
            # for i in eachindex(y, x)
            #     y[i] = f(x[i])
            # end
            return y
        end
    )
end

mymap!(f, y, x) = @mymap!_body

@stable default_codegen_level="debug" mymap_debug!(f, y, x) = @mymap!_body

@stable default_codegen_level="min" mymap_min!(f, y, x) = @mymap!_body

x, y = zeros(1000), zeros(1000)
print("$mymap!\t\t")
println(@ballocated mymap!(identity, $y, $x))
print("$mymap_debug!\t")
println(@ballocated mymap_debug!(identity, $y, $x))
print("$mymap_min!\t")
println(@ballocated mymap_min!(identity, $y, $x))

With the state of the code in this PR, the output is

julia> include("mymap.jl")
mymap!          144
mymap_debug!    0
mymap_min!      144

Setting genwhereparam=(codegen_level == "min") instead of simply true on line 230 in stabilization.jl, we get

julia> include("mymap.jl")
mymap!          144
mymap_debug!    1440
mymap_min!      144

We see that mymap_debug! is either better or much worse than the undecorated mymap!, but never the same. Meanwhile mymap_min! is faithful.

If you switch from broadcasting to a loop-based implementation (by commenting/uncommenting lines in mymap!_body), both of the above settings work well: all three versions get zero allocations. If you set genwhereparam=false, you'll recover the behavior observed in #57, where codegen_level = "min" results in lots of allocations.

For a solution along these lines, my feeling right now is that genwhereparam=(codegen_level == "min") is the most appropriate setting. In this case, the documentation should clearly flag that codegen_level = "min" is the choice for faithful behavior, while codegen_level = "debug" mainly exists for @code_warntype convenience and may be detrimental to performance. Perhaps the default should be changed to codegen_level = "min".

Hopefully there's a better solution to be found.


github-actions bot commented Jul 22, 2024

Benchmark Results

| | main | 197a0ed... | main/197a0edbe5542d... |
|---|---|---|---|
| _stable/mode=disable | 0.781 ± 0.02 μs | 0.722 ± 0.011 μs | 1.08 |
| _stable/mode=error | 0.69 ± 0.02 ms | 0.707 ± 0.022 ms | 0.976 |
| _stable/mode=warn | 0.687 ± 0.036 ms | 0.7 ± 0.031 ms | 0.98 |
| time_to_load | 0.0667 ± 0.00068 s | 0.0672 ± 0.00022 s | 0.993 |

Benchmark Plots

A plot of the benchmark results has been uploaded as an artifact to the workflow run for this PR.
Go to "Actions"->"Benchmark a pull request"->[the most recent run]->"Artifacts" (at the bottom).

@danielwe
Contributor Author

For what it's worth, this PR does nothing to fix the original problem that led me down this investigation. Turns out that's a thornier problem and needs a different MWE than #57. In fact, for said problem codegen_level = "debug" is more faithful than codegen_level = "min" to the behavior sans @stable, and this PR changes nothing regardless of the value of genwhereparam.

@MilesCranmer
Owner

MilesCranmer commented Jul 22, 2024

Maybe a bandaid fix is to just warn (in the docstring) about potential runtime dispatch for codegen_level="min", so that people profile their code and know where to look if something slows down, and add f::F as appropriate? That option exists mostly for faster pre-compilation and more accurate code coverage anyways; I guess "debug" is the full thing. Maybe we could even rename it to something else to make this clear.

@danielwe
Contributor Author

danielwe commented Jul 22, 2024

Well, it's not so clear-cut. As seen with the broadcasting implementation, codegen_level = "debug" can also add extra overhead.

Anyway, for the issue discussed here and in #57, adding f::F fixes the problem regardless of codegen level. That's probably worth mentioning somewhere in the docs/readme.

For the issue I'm alluding to in my previous comment, however, f::F does not help. I'll let you know if I ever manage to distill an MWE from that.
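
As an illustration of the f::F annotation mentioned above, here is a minimal sketch with two hypothetical functions (the names are not from the PR). It relies on Julia's documented heuristic of not specializing on function-typed arguments that a method only passes through to another call, unless a type parameter is attached.

```julia
# Only forwards `f` to broadcasting, so Julia's heuristics may skip
# specializing this method on the type of `f`.
function mymap_plain!(f, y, x)
    y .= f.(x)
    return y
end

# The type parameter `F` asks Julia to specialize on the concrete type of `f`.
function mymap_annotated!(f::F, y, x) where {F}
    y .= f.(x)
    return y
end

x, y = ones(4), zeros(4)
mymap_plain!(identity, y, x)
mymap_annotated!(identity, y, x)
```

Both versions compute the same result; any difference would show up in allocations and timing, which can be compared with `@ballocated` as in the script in the top post.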

@MilesCranmer
Owner

> Well, it's not so clear-cut. As seen with the broadcasting implementation, codegen_level = "debug" can also add extra overhead.

I didn’t know this, how could this be true if it’s the same body and same signature?

@danielwe
Contributor Author

Explained in #57 (comment) and demonstrated in the benchmarks in the top post here. Briefly: if the body of the original function does not itself call f, but only passes it along to callees (such as if f is only used in broadcasting, which is the example I've been using), then the function generated by codegen_level = "debug" will not specialize to the type of f. The result is that the instability check happens at runtime, adding overhead.

To be clear, the function wouldn't specialize without @stable either, so it would likely incur dynamic dispatch and allocations even without DispatchDoctor. The difference is that DispatchDoctor adds extra runtime work because the instability check is not compiled away.

@danielwe
Contributor Author

danielwe commented Jul 22, 2024

Referring back to the top post here: The second benchmark reflects what DispatchDoctor currently does for mymap_debug! (ignore mymap_min!, which is altered by this PR). Notice that mymap_debug! has 10x the allocations of mymap!. Then look at the top benchmark: in this case, DispatchDoctor is forcing f::F, eliminating all allocations, including the ones that mymap! would ordinarily have.

The challenge is to strike the middle ground where the instability check is always compiled away, while the main body is only specialized to the extent it would be without @stable. The benchmarks above suggest that this can be achieved for codegen_level = "min" by enforcing specialization in the outer function, but not in the simulator. Whether it's possible for codegen_level = "debug" remains unclear.

@danielwe
Contributor Author

Note that I'm just using allocations as a simple, deterministic proxy here. The more important issue is of course actual performance, and the difference can be many orders of magnitude for simple functions like this.

@MilesCranmer
Owner

I am thinking about if there are any issues with forcing Julia to always specialize. I suppose, no? It just means larger compilation cost, and that's it? Although I feel like that is implied anyway by using DispatchDoctor.jl, and is already pointed out in the caveats section, so maybe it's not a big deal. So yeah, I would be on board with that change.

By the way, with your current attempt, would it also work on something like

@stable mymap!((f,), y, x) = @mymap!_body

i.e., where f is inside a tuple? I'm not actually sure how Julia specialization works for things like this. Perhaps it is already guaranteed to specialize?

)::Tuple{Union{Expr,Symbol},Union{Expr,Nothing},Union{Symbol,Nothing}}
genwhereparam || return ex, nothing, nothing
whereparam = gensym("T")
return Expr(:(::), ex, whereparam), nothing, whereparam
end
function sanitize_arg_for_stability_check(
Owner

@MilesCranmer MilesCranmer Jul 23, 2024

Now that there are 3 returns, it might be nice to return a NamedTuple instead, for robustness. Returning tuples is always a bit risky because of things like

(x, y) = (1, 2, 3)

being valid Julia syntax.

e.g., could be:

return (; arg=Expr(:(::), ex, whereparam), destruct=nothing, whereparam=whereparam)
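
The risk described above can be sketched as follows. The two functions are hypothetical stand-ins for a helper with three return values; the field names follow the suggestion in this comment, and the property destructuring syntax assumes Julia 1.7 or later.

```julia
# Hypothetical stand-ins for a helper that returns three values.
as_tuple() = (:ex, nothing, :T)
as_named() = (; arg=:ex, destruct=nothing, whereparam=:T)

# Tuple destructuring with too few names silently drops the third value.
a, b = as_tuple()

# NamedTuple property destructuring picks fields by name, so the order and
# the number of returned fields no longer matter at the call site.
(; arg, whereparam) = as_named()
```

With the NamedTuple form, adding a fourth field later would not change the meaning of existing call sites.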

Contributor Author

Yeah, that's a good idea!

Comment on lines +93 to +98
# (Composite case)
# matches things like `::Type{T}=MyType`
arg_ex, destructure_ex, whereparam = sanitize_arg_for_stability_check(
first(args); genwhereparam
)
return Expr(head, arg_ex, last(args)), destructure_ex, whereparam
Owner

Does this one need the genwhereparam too? It already has the ::T part, no?

Contributor Author

It's needed because this is also the branch that matches a regular x=default argument without a type parameter

Contributor Author

If it happens to be a ::Type{T}=MyType argument, the inner recursive call will land in the branch that sets genwhereparam=false as required

Owner

I see, thanks!

(
map(first, args_destructurings),
filter(!isnothing, map(last, args_destructurings)),
map(first, args_destructurings_typevars),
Owner

The error messages might need to be treated since they now show ::var"..." after every arg.

@danielwe
Contributor Author

> I am thinking about if there are any issues with forcing Julia to always specialize. I suppose, no? It just means larger compilation cost, and that's it?

I think this is where I should acknowledge that I don't really know what I'm talking about when discussing compiler-related behavior and tradeoffs. But for what it's worth, your hunch aligns with mine.

> with your current attempt, would it also work on something like
>
> @stable mymap!((f,), y, x) = @mymap!_body
>
> i.e., where f is inside a tuple?

Yes, the current implementation injects type parameters for every type of argument, turning this into something like

function mymap!(var1::T1, y::T2, x::T3) where {T1,T2,T3}
    <instability check>
    (f,) = var1
    @mymap!_body
end

The change as it stands is fully functional. The only tests that don't pass are those that query the exact form of the generated code.

> I'm not actually sure how Julia specialization works for things like this. Perhaps it is already guaranteed to specialize?

This may be incorrect, but I think argument destructuring lowers to code that's more or less equivalent to the code we're generating manually. That is, f((x, y), z) = ... becomes something like f(var1, z) = ((x, y) = var1; ...) after lowering, and I assume the normal rules about specialization apply after that, which I think means that it always specializes, since var1 is explicitly used in the function body. (And since neither functions nor types are iterable, this would error for such arguments anyway, although that's not true for property destructuring.)

You can also combine destructuring and vararg into f((x, y)...) = ..., which looks confusing but is actually just f(args...) = ((x, y) = args; ...), i.e., it extracts the first two of an arbitrarily long list of arguments. Once again, since args is used explicitly within the lowered function body, I think this should always specialize.
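
The claimed equivalence can be checked at the level of behavior with a minimal sketch. The function names are hypothetical, the example says nothing about what the compiler specializes on, and the property destructuring form assumes Julia 1.7 or later.

```julia
# Tuple destructuring directly in the argument list.
f_destructured((x, y), z) = x + y + z

# The manual equivalent described above: bind the whole argument to a name,
# then destructure it at the top of the body.
function f_manual(var1, z)
    (x, y) = var1
    return x + y + z
end

# Property destructuring in the argument list: works on anything with
# properties `a` and `b`, whether or not it is iterable.
g((; a, b)) = a + b

f_destructured((1, 2), 3)
f_manual((1, 2), 3)
g((a=1, b=2))
```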

@danielwe
Contributor Author

I don't think I can set aside time to take this from draft to ready for the next several days/weeks, so feel free to take it and run with it if you've developed a conviction about the way forward.

@danielwe
Contributor Author

danielwe commented Jul 23, 2024

> It just means larger compilation cost, and that's it?

One more comment here: the more I think about it, the more I land on the side that DispatchDoctor ought to err on the side of more specialization rather than potential runtime overhead. People who use @stable can be presumed to care about performance. But it would be good to point this out clearly in the README such that users know how they can restore performance if they observe increased allocations/slowdown after removing @stable.

@MilesCranmer
Owner

Sorry I just realised I didn't respond. Regarding specialisation: yes, I totally am on board with your proposal to force more specialisation. Especially since you have shown that sometimes specialisation is negatively affected, so I think it's much better to flip it the other way.

@MilesCranmer
Owner

MilesCranmer commented Aug 4, 2024

We should also add this as a test to ensure the new symbol extractor can handle it:

@testitem "issue #59" begin
    using DispatchDoctor
    @stable function f(@nospecialize(x))
        return x > 0 ? x : 0.0
    end
    @test_throws TypeInstabilityError f(1)
end

It's a bit tricky though because you would want to have the @nospecialize appear in the arg list of the simulator function, but not appear in the arg symbols passed to _promote_op.

Edit: I think the easiest way to do this is just keep the sanitize_arg_for_stability_check as is – because we want func[:args] to still have the @nospecialize – but then have a separate sanitize_arg_for_function_call which operates on args and strips the @nospecialize.
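
One possible shape for the stripping step is sketched below. The helper name `strip_nospecialize` is hypothetical and this is not the implementation in DispatchDoctor; it assumes the macro is written unqualified as `@nospecialize(arg)` with exactly one argument, and does not handle `Base.@nospecialize`.

```julia
# Hypothetical helper: if `ex` is `@nospecialize(arg)`, return `arg`;
# otherwise return `ex` unchanged.
function strip_nospecialize(ex)
    if ex isa Expr && ex.head == :macrocall && ex.args[1] == Symbol("@nospecialize")
        # A macrocall stores the macro name, a LineNumberNode, then the arguments.
        return ex.args[end]
    end
    return ex
end

strip_nospecialize(:(@nospecialize(x)))
strip_nospecialize(:(@nospecialize(x::Int)))
strip_nospecialize(:y)
```

The argument list kept in func[:args] would stay untouched; only the copy used to build the call would go through a function like this.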

@MilesCranmer
Owner

@danielwe I actually ran into something caused by this type of issue and was wondering if you could take a stab at finishing this PR? I am eager to get something like this into DD

@danielwe
Contributor Author

Would love to help out, but right now I’m rather swamped. I’ll try to get to it eventually if you or someone else don’t get there before me, but feel free to take this and run with it.

As I remember, the solution is just to inject type parameters for every argument in the method definition, right?

@MilesCranmer
Owner

I think so. Would that cause any issues with the type hierarchy of Julia though? Like would it change position of methods in the type hierarchy by making them more specific?

@danielwe
Contributor Author

I don't think so if the type parameters are unconstrained, but I'm not particularly knowledgeable about this
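
A quick way to probe this question is to compare the signature types directly. The sketch below uses plain tuple types rather than real method signatures, so it is only suggestive of how dispatch would treat the generated methods.

```julia
# One distinct, unconstrained type variable per argument: the signature is
# the same type as the one with plain `Any` arguments.
same_one = (Tuple{T} where {T}) == Tuple{Any}
same_two = (Tuple{T1,T2} where {T1,T2}) == Tuple{Any,Any}

# Reusing a single type variable for two arguments is different: it requires
# both arguments to share one concrete type, so the signature gets narrower.
narrower = (Tuple{T,T} where {T}) == Tuple{Any,Any}
```

The first two comparisons evaluate to true and the third to false, which suggests that injection is neutral for method specificity as long as every argument gets its own fresh type variable.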
