Refactor CodeInfo/CodeInstance separation and interfaces #53219

Keno · 2024-02-06T21:20:40Z

The CodeInfo type is one of the oldest types in the system and has grown a bit of cruft. In particular, the rettype, inferred, parent, edges, min_world, max_world fields are not used for the original purpose of representing code, but for one or more of (in decreasing order of badness):

Smuggling extra results from inference into the compiler
Sumggling extra arguments into OpaqueClosure constructors
Passing extra information from generated functions to inference

The first of these points in particular causes a fair bit of mixup between caching concerns and compiler concerns and results in external abstract interpreters maintainging their own dummy CodeInfos, just to comply with the interface. Originally, I just wanted to clean up that bit, but it didn't really make sense as a standalone piece, so this PR is more comprehensive.

In particular, this PR:

Removes the parent, inferred and rettype fields of CodeInfo. They are largely vestigal and code accessing these is probably doing the wrong thing. They should instead be looking at either the CodeInstance or remembering the query that was asked of the cache in the first place.
Makes edges, min_world and max_world used for generated functions only. All other uses were replaced by appropriate queries on the CodeInstance. In particular, inference no longer sets these. In the future we may want to consider removing these also and having generated functions return some other object, but that is a topic to revisit once the broader compiler plugins landscape is more clear.
Makes the external type inference interface return CodeInstance rather than CodeInfo. This results in a lot of cleanup, because many functions had multiple code paths, some for CodeInstance and others for fallback to inference/CodeInfo. This is all cleaned up now. If you don't have a CodeInstance, you can ask inference for one. This CodeInstance may or may not be in the cache, but you can look at its types, compile it, etc.
Moves the main inference entrypoint out of the codegen library. There is still a little bit of entangelement, but this makes codegen much more of an independent system that you give a CodeInstance and it just fills in the invoke pointer for.

With these changes, only the third use of the above mentioned fields remains.

The overall theme here is decoupling. Over time, various parties have wanted to use the julia compiler with custom IR datastructure, backend code generators, caches, etc. This doesn't quite get us all the way there, but makes inference and codegen much more independent with a clear IR-format-independent interface (CodeInstance).

src/julia_internal.h

base/compiler/typeinfer.jl

vchuravy · 2024-02-06T22:07:38Z

This will conflict with #52233 which I am hoping to merge in the next few days. I went through this PR and I didn't see anything to bad, only my unease about uncached CodeInstances :)

aviatesk · 2024-02-07T17:00:18Z

Planning to conduct a thorough review around tomorrow. On a broader note, maybe relevant to Valentin's PR as well, it might be worth discussing whether this change should be incorporated into v1.11. While I agree that external uses of the CodeInfo fields is usualy wrong, considering its anticipated widespread use within the community, giving some leeway for ecosystem updates could be preferred?

Keno · 2024-02-07T20:37:30Z

We're still before the feature freeze, so I think it makes sense to merge these kinds of internals-breaking changes. Otherwise, we'll merge it right after the feature freeze and we'll need to maintain both versions through the full RC period.

vchuravy · 2024-02-10T22:53:36Z

Rebased onto master after #52233 merge

src/codegen.cpp

Keno · 2024-02-10T22:54:28Z

jinx, I also just rebased this locally ;). Let's see if we ended up in the same place ;).

Keno · 2024-02-10T22:56:28Z

jinx, I also just rebased this locally ;). Let's see if we ended up in the same place ;).

No diff modulo whitespace, so I'll just switch to this.

Keno · 2024-02-10T23:13:09Z

There's some more cleanup required here post-rebase. Are you working on that, or should I go ahead?

vchuravy · 2024-02-10T23:16:07Z

Are you working on that, or should I go ahead?

Go ahead :)

vtjnash · 2024-02-12T19:54:06Z

base/reflection.jl

@@ -1314,7 +1314,7 @@ uncompressed_ir(m::Method) = isdefined(m, :source) ? _uncompressed_ir(m, m.sourc
                             isdefined(m, :generator) ? error("Method is @generated; try `code_lowered` instead.") :
                             error("Code for this Method is not available.")
 _uncompressed_ir(m::Method, s::CodeInfo) = copy(s)
-_uncompressed_ir(m::Method, s::String) = ccall(:jl_uncompress_ir, Any, (Any, Ptr{Cvoid}, Any), m, C_NULL, s)::CodeInfo
+_uncompressed_ir(m::Method, s::String) = ccall(:jl_uncompress_ir, Any, (Any, Any), m, s)::CodeInfo
 _uncompressed_ir(ci::Core.CodeInstance, s::String) = ccall(:jl_uncompress_ir, Any, (Any, Any, Any), ci.def.def::Method, ci, s)::CodeInfo


Might be much safer not to change this API, and just ignore the extra argument. I want to be able to use it in the future to improve compression ratios anyways

I'll keep the change to the C API, but I'll leave the julia side API.

That seems like unnecessary code churn. Can we just not include either change?

What's the point of having C API with unused arguments. External packages shouldn't be calling this directly anyway. If they are that's a bug.

vtjnash · 2024-02-12T20:05:44Z

base/compiler/typeinfer.jl

+        # Even if we already have a CI for this, it's possible that the new CI has more
+        # information (E.g. because the source was limited before, but is no longer - this
+        # happens during bootstrap). In that case, allow the result to be recached.
+        if result.src === nothing || (ci.inferred !== nothing || ci.invoke != C_NULL)


Why these particular conditions? Is this what already_inferred_quick_test is already supposed to be checking?

already_inferred_quick_test handles the case where you're trying to cache the same MI twice in the same call to typeinf_ext (it just checks mi.inInference). However, this case is different. Here we check whether we need to upgrade the CodeInstance that's already in the cache to add source to it. This can happen because the last time we inferred it, we couldn't save the source because one of our (unused) body statements had a limited rettype. The condition is essentially ci_has_abi. I should probably replace it.

Given that both already_inferred_quick_test and the cache override here are both unique to the bootstrapping process, consolidating this logic within already_inferred_quick_test(interp, mi, result) might be beneficial?

I don't think the cache override is unique to the bootstrapping process. You can create the same situation at any time by first inferring something that will poison the the code and then it gets re-inferred when the execution actually reaches.

The pair of conditions though (ci.inferred !== nothing || ci.invoke != C_NULL) seems to be not something that has a well-defined meaning in a multithread program (e.g. it introduces a data-race here).

base/compiler/typeinfer.jl

src/opaque_closure.c

src/jitlayers.cpp

src/aotcompile.cpp

vtjnash · 2024-02-12T21:48:23Z

src/codegen.cpp

@@ -6804,7 +6806,7 @@ static Function* gen_cfun_wrapper(

    jl_codectx_t ctx(M->getContext(), params);
    ctx.f = cw;
-    ctx.world = world;
+    ctx.min_world = ctx.max_world = world;


It seems a little bit tricky to have world in both params and ctx, since it leads to odd cases like this. It might be easier to only have them in ctx?

~~You mean only have them in the params, since the ctx has a reference to that?~~

I see what you mean - will delete from the params struct.

src/codegen.cpp

vtjnash · 2024-02-19T18:52:48Z

Removing parent was a breaking change. It is why I asked for that to be put back before merging this, as it doesn't seem relevant to the rest of the change, so it should be very simple to revert that part of the change without impacting anything else.

Keno · 2024-02-19T19:01:52Z

It's an internal field like the rest of the fields removed in this PR. Cthulhu's reliance on it is a bug that we'll fix.

vtjnash · 2024-02-19T19:24:28Z

It is documented as being part of the API here:

julia/doc/src/devdocs/ast.md

Lines 723 to 725 in 02699bb

    
           * `parent` 
        
             The `MethodInstance` that "owns" this object (if applicable).

If it was a problem for other reasons, we could argue to make a breaking change to remove that, but as it is, it is simple to restore that field without affecting the rest of this PR.

Keno · 2024-02-19T19:34:05Z

That's developer documentation, which does not constrain the interface. We do need to update that documentation, both for this change and several others in the past few months.

The problem with that field is that there's no good semantic point at which to set it and that any user of the field is likely doing the wrong thing. Before this PR it was just being set at some random point in inference.

vtjnash · 2024-02-19T22:55:58Z

src/jitlayers.cpp

+                JL_GC_PUSH1(&forced_ci);
+                if (forced_ci) {
+                    // Force compile of this codeinst even though it already has an ->invoke
+                    _jl_compile_codeinst(forced_ci, NULL, *jl_ExecutionEngine->getContext());


If this codeinst is not in a cache, that means it is not legal to call this method, as this method pushes the object into the global debuginfo / JIT structs and later returns those, which can result in GC use-after free bugs

vtjnash · 2024-02-19T23:04:58Z

src/gf.c

+        int should_skip_inference = !jl_is_method(mi->def.method) || jl_symbol_name(mi->def.method->name)[0] == '@';
+
+        if (!should_skip_inference) {
+            codeinst = jl_type_infer(mi, world, 0, SOURCE_MODE_ABI);


This appears to have introduced a datarace on the mi->inInference field, which seems likely to lead to incorrect compilation and bad performance of threaded code. It was previously protected by the codegen lock.

vtjnash · 2024-02-20T18:48:43Z

base/compiler/typeinfer.jl

+"""
+function ci_has_abi(code::CodeInstance)
+    ci_has_source(code) && return true
+    return code.invoke !== C_NULL


These are monotonic loads, so you've introduced a data-race here also. I think you wanted an :acquire on one of these (may this one) to fix that, so that it must observe this being set. And also switch the order of the checks, so that it correctly pairs with the release store.

vtjnash · 2024-02-20T18:50:07Z

base/compiler/typeinfer.jl

+In such a case, it will first set the `invoke` field to a method that
+will block the thread until compilation is completed.


Indeed, the comment here mentions "first", which seems to intend to imply a happens-before, but the code doesn't have that

This drops the unnecessary breaking change from #53219 by re-adding the optional `parent` field to CodeInfo. A few months ago, I had actually already put together a branch that also fixed Keno's complaints about how it was not set reliably, so I copied that code here also, so that it should get set more reliably whenever a CodeInfo is associated with a MethodInstance (either because it called `retrieve_code_info` to get IR from the Method or `uncompress_ir` to get it from the inference cache). This does not entirely fix Cthulhu's test errors, since I don't see any particular reason to re-introduce the other fields (min_world, max_world, inferred, edges, method_for_inference_limit_heuristics) that got removed or are now set incorrectly, and most errors appear to be instead related to the `Expr(:boundscheck, true/false)` change. However, they could be trivially re-added back as placeholders, if someone encounters a broken package that still needs them.

vtjnash · 2024-02-22T14:48:46Z

base/compiler/typeinfer.jl

+            return CodeInstance(mi, cache_owner(interp), Any, Any, nothing, src, Int32(0),
+                get_inference_world(interp), get_inference_world(interp),
+                UInt32(0), UInt32(0), nothing, UInt8(0))


These values appear to be mostly set incorrectly now, which can lead to compile corruption (e.g. #51872), as well as now failing to handle the case when retrieve_code_info may fail and return nothing instead of CodeInfo. The old logic had a helper function jl_get_codeinst_for_src which used to handle this case, so that function could either be un-deleted, or that operation could be added as an alternative way to call the CodeInstance constructor, since it needs identical handling to how result would be treated later.

Suggested change

return CodeInstance(mi, cache_owner(interp), Any, Any, nothing, src, Int32(0),

get_inference_world(interp), get_inference_world(interp),

UInt32(0), UInt32(0), nothing, UInt8(0))

CodeInstance(interp, src, can_discard_trees=)

vtjnash · 2024-02-24T22:32:16Z

It looks like this causes some fairly extreme performance issues (up to 50x longer inference times), though curiously also sometimes provides up to a 5x speed up

When we use options like code coverage, we can't use the native code present in the cache file since it is not instrumented. PR #52123 introduced the capability of skipping the native code during loading, but created the issue that subsequent packages could have an explicit or implicit dependency on the native code. PR #53439 tainted the current process by setting `use_sysimage_native_code`, but this flag is propagated to subprocesses and lead to a regression in test time. Move this to a process local flag to avoid the regression. In the future we might be able to change the calling convention for cross-image calls to `invoke(ci::CodeInstance, args...)` instead of `ci.fptr(args...)` to handle native code not being present. --------- Co-authored-by: Jameson Nash <vtjnash@gmail.com>

Co-authored-by: Cody Tapscott <topolarity@tapscott.me> Co-authored-by: Shuhei Kadowaki <aviatesk@gmail.com>

…3219) The `CodeInfo` type is one of the oldest types in the system and has grown a bit of cruft. In particular, the `rettype`, `inferred`, `parent`, `edges`, `min_world`, `max_world` fields are not used for the original purpose of representing code, but for one or more of (in decreasing order of badness): 1. Smuggling extra results from inference into the compiler 2. Sumggling extra arguments into OpaqueClosure constructors 3. Passing extra information from generated functions to inference The first of these points in particular causes a fair bit of mixup between caching concerns and compiler concerns and results in external abstract interpreters maintainging their own dummy CodeInfos, just to comply with the interface. Originally, I just wanted to clean up that bit, but it didn't really make sense as a standalone piece, so this PR is more comprehensive. In particular, this PR: 1. Removes the `parent`, `inferred` and `rettype` fields of `CodeInfo`. They are largely vestigal and code accessing these is probably doing the wrong thing. They should instead be looking at either the CodeInstance or remembering the query that was asked of the cache in the first place. 2. Makes `edges`, `min_world` and `max_world` used for generated functions only. All other uses were replaced by appropriate queries on the CodeInstance. In particular, inference no longer sets these. In the future we may want to consider removing these also and having generated functions return some other object, but that is a topic to revisit once the broader compiler plugins landscape is more clear. 3. Makes the external type inference interface return `CodeInstance` rather than `CodeInfo`. This results in a lot of cleanup, because many functions had multiple code paths, some for CodeInstance and others for fallback to inference/CodeInfo. This is all cleaned up now. If you don't have a CodeInstance, you can ask inference for one. This CodeInstance may or may not be in the cache, but you can look at its types, compile it, etc. 4. Moves the main inference entrypoint out of the codegen library. There is still a little bit of entangelement, but this makes codegen much more of an independent system that you give a CodeInstance and it just fills in the invoke pointer for. With these changes, only the third use of the above mentioned fields remains. The overall theme here is decoupling. Over time, various parties have wanted to use the julia compiler with custom IR datastructure, backend code generators, caches, etc. This doesn't quite get us all the way there, but makes inference and codegen much more independent with a clear IR-format-independent interface (CodeInstance). --------- Co-authored-by: Valentin Churavy <v.churavy@gmail.com>

…ang#53393) This drops the unnecessary breaking change from JuliaLang#53219 by re-adding the optional `parent` field to CodeInfo. A few months ago, I had actually already put together a branch that also fixed Keno's complaints about how it was not set reliably, so I copied that code here also, so that it should get set more reliably whenever a CodeInfo is associated with a MethodInstance (either because it called `retrieve_code_info` to get IR from the Method or `uncompress_ir` to get it from the inference cache). This does not entirely fix Cthulhu's test errors, since I don't see any particular reason to re-introduce the other fields (min_world, max_world, inferred, edges, method_for_inference_limit_heuristics) that got removed or are now set incorrectly, and most errors appear to be instead related to the `Expr(:boundscheck, true/false)` change. However, they could be trivially re-added back as placeholders, if someone encounters a broken package that still needs them.

) Continuing the development in #53219, according to the plan in https://hackmd.io/@vtjnash/codeinstances, this further separates the meaning behind CodeInfo and CodeInstance, such that CodeInstance can only be used as a call target, and cannot be used for code generation, while CodeInfo can only be used to generate code (or for reflection on what code would be generated), and cannot be used as a call target. Basically, the eventual idea is that CodeInfo will only show up now as an input (e.g. for doing inlining or codegen) and is ephemeral, while a CodeInstance is what shows up in a cache (e.g. as a callable object).

This is the best way I could think of to determine whether an OC contains inferred code after JuliaLang/julia#53219. Also deletes a fastpath we had for OCs containing inferred code in `maybe_evaluate_builtin` as it doesn't really seem necessary and we would otherwise scan the IR twice for OCs we do want to interpret.

Keno force-pushed the kf/codeinforefactor branch from 9371cb3 to d0e03c7 Compare February 6, 2024 21:22

vchuravy reviewed Feb 6, 2024

View reviewed changes

src/julia_internal.h Show resolved Hide resolved

vchuravy reviewed Feb 6, 2024

View reviewed changes

base/compiler/typeinfer.jl Show resolved Hide resolved

vchuravy force-pushed the kf/codeinforefactor branch from d0e03c7 to 1a1de9c Compare February 10, 2024 22:52

vchuravy reviewed Feb 10, 2024

View reviewed changes

src/codegen.cpp Outdated Show resolved Hide resolved

Keno mentioned this pull request Feb 11, 2024

Allow CodeInstance in Expr(:invoke) #52797

Closed

vchuravy mentioned this pull request Feb 12, 2024

allow external absint to hold custom data in codeinst.inferred #53300

Merged

Keno mentioned this pull request Feb 12, 2024

Use tagged CodeInstances for AbstractInterpreter caching #52233

Merged

4 tasks