Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Taking world ages seriously #394

Merged
merged 20 commits into from
Mar 13, 2023
Merged

Taking world ages seriously #394

merged 20 commits into from
Mar 13, 2023

Conversation

maleadt
Copy link
Member

@maleadt maleadt commented Feb 20, 2023

This PR reworks our world handling, now properly forwarding world ages to Base functions and more accurately keeping track of them (we previously were falling back to -1/MAX_WORLD in many cases). This means that a CompilerJob now needs to contain a valid world age (set automatically by the FunctionSpec default constructor), and that if you redefine a kernel the previously-constructed CompilerJob will refer to the previous version of that function.

Since the above is technically breaking, I made use of the opportunity to perform a couple of other breaking clean-ups. Summarizing those here:

  • FunctionSpec doesn't have F and TT typevars anymore
  • FunctionSpec only accepts function types now (i.e. this finishes Use function type in fspec instead of value #320)
  • similarly, deferred_codegen also requires a function type
  • CompilerJob now takes the source argument first, and all non-essential arguments are kwargs
  • The similar copy constructor is now a regular constructor

Note that the invalidating magic is now part of the get_world function, which gets called by the FunctionSpec constructor:

julia> f(f, tt) = FunctionSpec(f,tt)
f (generic function with 1 method)

julia> @code_warntype f(typeof(sin), Tuple{Int})
MethodInstance for f(::Type{typeof(sin)}, ::Type{Tuple{Int64}})
  from f(f, tt) @ Main REPL[3]:1
Arguments
  #self#::Core.Const(f)
  f::Core.Const(typeof(sin))
  tt::Core.Const(Tuple{Int64})
Body::FunctionSpec
1 ─ %1 = Main.FunctionSpec(f, tt)::Core.Const(kernel sin(Int64) in world 33405)
└──      return %1

Also note that the breakage of CompilerJob seems annoying but also helps to catch misuse when upgrading to this PR (because the kernel boolean arg may get interpreted as an integer world age 🤦 ).

Fixes #329, fixes #168
Also adapts to JuliaLang/julia#48611

@codecov
Copy link

codecov bot commented Feb 20, 2023

Codecov Report

Base: 86.77% // Head: 77.14% // Decreases project coverage by -9.64% ⚠️

Coverage data is based on head (22bd9c4) compared to base (b9c269f).
Patch coverage: 60.16% of modified lines in pull request are covered.

❗ Current head 22bd9c4 differs from pull request most recent head 05b6e30. Consider uploading reports for the commit 05b6e30 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #394      +/-   ##
==========================================
- Coverage   86.77%   77.14%   -9.64%     
==========================================
  Files          24       24              
  Lines        2836     2818      -18     
==========================================
- Hits         2461     2174     -287     
- Misses        375      644     +269     
Impacted Files Coverage Δ
src/cache.jl 56.96% <29.78%> (-43.04%) ⬇️
src/driver.jl 88.44% <66.66%> (-6.30%) ⬇️
src/jlgen.jl 71.49% <70.27%> (-9.79%) ⬇️
src/interface.jl 81.63% <100.00%> (-6.83%) ⬇️
src/rtlib.jl 93.54% <100.00%> (+0.07%) ⬆️
src/validation.jl 96.55% <100.00%> (-1.38%) ⬇️
src/precompile.jl 0.00% <0.00%> (-81.09%) ⬇️
src/reflection.jl 34.74% <0.00%> (-41.48%) ⬇️
src/execution.jl 60.00% <0.00%> (-40.00%) ⬇️
src/ptx.jl 70.79% <0.00%> (-23.04%) ⬇️
... and 19 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@maleadt maleadt changed the title Add a test for cached compilation and redefinitions. Adapt to upstream changes wrt. world age handling in generated functions Feb 20, 2023
@maleadt maleadt changed the title Adapt to upstream changes wrt. world age handling in generated functions Rework world age handling Feb 20, 2023
@maleadt maleadt force-pushed the tb/test_cached_compilation branch 6 times, most recently from 22bd9c4 to 06342c2 Compare February 22, 2023 13:17
@maleadt maleadt marked this pull request as ready for review February 22, 2023 13:18
@maleadt maleadt force-pushed the tb/test_cached_compilation branch 2 times, most recently from 05b6e30 to 9017c35 Compare February 22, 2023 13:38
@maleadt
Copy link
Member Author

maleadt commented Feb 22, 2023

OK, this should be good to review. cc @vchuravy @jpsamaroo
Example adaption in https://github.com/JuliaGPU/CUDA.jl/compare/tb/world_ages

@maleadt maleadt requested review from vchuravy and jpsamaroo and removed request for vchuravy February 22, 2023 13:53
@maleadt
Copy link
Member Author

maleadt commented Feb 22, 2023

Note that this makes the following pattern break behave correctly:

using GPUCompiler
include("test/definitions/native.jl")

function main()
    @eval kernel() = nothing
    native_code_llvm(kernel, Tuple{})
end

main()
ERROR: LoadError: MethodError: no method matching kernel()
The applicable method may be too new: running in world age 33788, while current world is 33789.

Closest candidates are:
  kernel() (method too new to be called from this world context.)
   @ Main ~/Julia/pkg/GPUCompiler/wip.jl:5

Only guaranteed to work on 1.10 though. On 1.9 and earlier, we're doing fine on normal stuff, but world-age corner cases may be mishandled.

@maleadt maleadt force-pushed the tb/test_cached_compilation branch from 4d91a45 to 1ac5160 Compare March 13, 2023 13:16
@maleadt maleadt changed the title Rework world age handling Taking world ages seriously Mar 13, 2023
@maleadt
Copy link
Member Author

maleadt commented Mar 13, 2023

Took out the bits that rely on JuliaLang/julia#48611, but this should still be an improvement. Instead of a globally-incrementing id, we now take the (estimate of the) world age and pipe that through everywhere.

@maleadt maleadt merged commit 860ec6a into master Mar 13, 2023
@maleadt maleadt deleted the tb/test_cached_compilation branch March 13, 2023 15:15
@@ -175,6 +181,10 @@ end
end
end

# ensure that the returned method instance is valid in the compilation world.
# otherwise, `jl_create_native` won't actually emit any code.
@assert method_instance.def.primary_world <= job.source.world <= method_instance.def.deleted_world
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These seems overly strict, as it is supposed to be perfectly reasonable to run a Method in any world, irrespective of its primary_world and delete_world values (which only refer to the lookup of said Method).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is supposed to be perfectly reasonable to run a Method in any world

What would that look like (in the context of GPUCompiler)? I put that there because jl_create_native uses the same check to decide whether to emit any code: https://github.com/JuliaLang/julia/blob/7341fb9517d290be02fdc54ae453999843a0dc7e/src/aotcompile.cpp#L332-L335

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, that is just a heuristic decision, since our caller isn't designed to be particularly reliable about filtering those earlier (since we do want it to compile for both the current and typeinf worlds when applicable)

src/cache.jl Show resolved Hide resolved
src/cache.jl Show resolved Hide resolved
new_ci.slotflags = UInt8[0x00 for i = 1:3]

# return the world
push!(new_ci.code, ReturnNode(world))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be trying to always return the value of the world for the get_world function definition, not the lookup world?

But this also seems very risky, as you are claiming to the compiler that this world value is a constant, but we know for a fact (based on the edges, min_world, and max_world) that the meaning of world numbers is certain to be dynamic.

The appropriate token to return here might be the MethodInstance object mi? That would encapsulate fully the MethodMatch lookup result, and exactly represent the result of the lookup for a long as that is the correct compilation target?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be fine to return the generator's world (post-JuliaLang/julia#48766)? That's what I'm doing in #403.

Returning the mi could work I guess, but would need some refactoring. We're currently returning the world so that we can use it to construct a FunctionSpec, which is the object that we use to look-up the code to compile:

# what we'll be compiling
struct FunctionSpec
ft::Type
tt::Type
world::UInt
FunctionSpec(ft::Type, tt::Type, world::Integer=get_world(ft, tt)) =
new(ft, tt, world)
end

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the reason we use a separate struct is that it used to carry more fields. Now that they're gone, maybe we should just use MethodInstance instead of FunctionSpec, as @vchuravy notes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it is never allowable to expose the user to worlds from the compiler and vice versa. It violates the contract that only edges are observable effects.

Copy link
Member Author

@maleadt maleadt Mar 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. We do however need the world age for some cases though, e.g., to look-up in our CodeInstance cache, and to pass to jl_create_native (not sure what to use here). I would assume that both need to be the world age that we need to generate code for (i.e. what we're returning now from this generator), and not the Method's primary world. Or what do you suggest to use here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIU using mi.def.primary_world won't work because that returns too early of a world:

julia> foo(x) = x
julia> mi = methodinstance(typeof(foo), Tuple{Int})
julia> mi.def.primary_world |> Int
32455

julia> mi = methodinstance(typeof(map), Tuple{typeof(foo), Int})
julia> mi.def.primary_world |> Int
6074

i.e. using map's primary_world and passing that to, say jl_create_native won't work because we don't have foo there. That's why I was 'leaking' the world from the generated function before.

Any thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't that just be the tls_world_age ccall?

@maleadt
Copy link
Member Author

maleadt commented Mar 29, 2023

Hmm, maybe. We also need a way to re-invoke the compiler when the generated function got called though, e.g., because a function was redefined. We were doing that by looking at the returned world and using that to index a Dict, but that doesn't work with the MethodInstance because it doesn't change if a child function is redefined. i.e.:

compilation_cache = Dict{}
cached_compilation(f, tt) = get!(compilation_cache, methodinstance(f, tt))

foo(x) = 1
cached_compilation(typeof(map), Tuple{Int})()   -> 1
foo(x) = 2
cached_compilation(typeof(map), Tuple{Int})()   -> still 1, because map's mi didn't change

@vtjnash
Copy link
Contributor

vtjnash commented Mar 29, 2023

Of course that wouldn't make sense to change. Worlds are constant, so their content does not (and should not) get invalidated ever. In base, we use both a forward edges walk or a reverse edges trigger to decide whether to reuse the code for a method instance from a different world age. When the reverse edges trigger, we walk back any inference result that we recorded as depending on it, and remove that code from future worlds. Later, when the method instance is called, we check whether there is an existing code instance for that world. If not, we re-run inference to either find an old one or make a new one that is applicable to the current world.

@maleadt
Copy link
Member Author

maleadt commented Mar 30, 2023

In base, we use both a forward edges walk or a reverse edges trigger to decide whether to reuse the code for a method instance from a different world age. When the reverse edges trigger, we walk back any inference result that we recorded as depending on it, and remove that code from future worlds. Later, when the method instance is called, we check whether there is an existing code instance for that world. If not, we re-run inference to either find an old one or make a new one that is applicable to the current world.

Right, but how would that look like for GPUCompiler? Ideally we don't have to reimplement all that functionality, is why I was using the codegen world age passed to the generator to key our compilation cache. As per your advice I changed it so that we don't leak that to the user anymore though, i.e., we're actually doing compilation in the current world.

@vtjnash
Copy link
Contributor

vtjnash commented Mar 31, 2023

I have written small bits of that code, but don’t think there is any convenient interface for that right now in Julia. You probably need to instrument the caching step during abstract interpretation to build a graph of all of the edges it returns in whatever alternative caching structure you are using, and then do the incremental memoized verify-graph algorithm that is currently only found in staticdata_utils.c.

@maleadt
Copy link
Member Author

maleadt commented Mar 31, 2023

Is it really necessary though? Currently we're using the fact that a generated function (with edges to the MethodInstance we're "invoking") gets called to trigger invalidation of the code we cached for that MethodInstance. Isn't that sufficient? It sees to work (mostly) fine, and without having to implement fine-grained invalidation ourselves.

@vtjnash
Copy link
Contributor

vtjnash commented Mar 31, 2023

That code gives the start point, but you still need to propagate it. I think there is a hook currently from method add that lets you traverse the backwards graph instead from the call leafs, if you prefer walking your cache in that direction instead

@vchuravy
Copy link
Member

I think there is a hook currently from method add that lets you traverse the backwards graph instead from the call leafs, if you prefer walking your cache in that direction instead

Are you referring to mi.callbacks? We use that to inform the external cache about invalidations

function invalidate_code_cache(cache::CodeCache, replaced::MethodInstance, max_world, seen=Set{MethodInstance}())

@vtjnash
Copy link
Contributor

vtjnash commented Mar 31, 2023

Yeah. That looks like mostly all you should need

@vtjnash
Copy link
Contributor

vtjnash commented Mar 31, 2023

Conceptually, there is 2 distinct cache operations here: whether the code is still valid and whether that lookup is valid. The MethodInstance represents the latter, the CodeInstance represents the former (in base Julia's current design)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CompilerJob function-type changes regressed MethodError reporting Better world age handling
3 participants