Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor CodeInfo/CodeInstance separation and interfaces #53219

Merged
merged 15 commits into from
Feb 19, 2024
Merged

Conversation

Keno
Copy link
Member

@Keno Keno commented Feb 6, 2024

The CodeInfo type is one of the oldest types in the system and has grown a bit of cruft. In particular, the rettype, inferred, parent, edges, min_world, max_world fields are not used for the original purpose of representing code, but for one or more of (in decreasing order of badness):

  1. Smuggling extra results from inference into the compiler
  2. Sumggling extra arguments into OpaqueClosure constructors
  3. Passing extra information from generated functions to inference

The first of these points in particular causes a fair bit of mixup between caching concerns and compiler concerns and results in external abstract interpreters maintainging their own dummy CodeInfos, just to comply with the interface. Originally, I just wanted to clean up that bit, but it didn't really make sense as a standalone piece, so this PR is more comprehensive.

In particular, this PR:

  1. Removes the parent, inferred and rettype fields of CodeInfo. They are largely vestigal and code accessing these is probably doing the wrong thing. They should instead be looking at either the CodeInstance or remembering the query that was asked of the cache in the first place.

  2. Makes edges, min_world and max_world used for generated functions only. All other uses were replaced by appropriate queries on the CodeInstance. In particular, inference no longer sets these. In the future we may want to consider removing these also and having generated functions return some other object, but that is a topic to revisit once the broader compiler plugins landscape is more clear.

  3. Makes the external type inference interface return CodeInstance rather than CodeInfo. This results in a lot of cleanup, because many functions had multiple code paths, some for CodeInstance and others for fallback to inference/CodeInfo. This is all cleaned up now. If you don't have a CodeInstance, you can ask inference for one. This CodeInstance may or may not be in the cache, but you can look at its types, compile it, etc.

  4. Moves the main inference entrypoint out of the codegen library. There is still a little bit of entangelement, but this makes codegen much more of an independent system that you give a CodeInstance and it just fills in the invoke pointer for.

With these changes, only the third use of the above mentioned fields remains.

The overall theme here is decoupling. Over time, various parties have wanted to use the julia compiler with custom IR datastructure, backend code generators, caches, etc. This doesn't quite get us all the way there, but makes inference and codegen much more independent with a clear IR-format-independent interface (CodeInstance).

@Keno Keno force-pushed the kf/codeinforefactor branch from 9371cb3 to d0e03c7 Compare February 6, 2024 21:22
@vchuravy
Copy link
Member

vchuravy commented Feb 6, 2024

This will conflict with #52233 which I am hoping to merge in the next few days. I went through this PR and I didn't see anything to bad, only my unease about uncached CodeInstances :)

@aviatesk
Copy link
Member

aviatesk commented Feb 7, 2024

Planning to conduct a thorough review around tomorrow. On a broader note, maybe relevant to Valentin's PR as well, it might be worth discussing whether this change should be incorporated into v1.11. While I agree that external uses of the CodeInfo fields is usualy wrong, considering its anticipated widespread use within the community, giving some leeway for ecosystem updates could be preferred?

@Keno
Copy link
Member Author

Keno commented Feb 7, 2024

We're still before the feature freeze, so I think it makes sense to merge these kinds of internals-breaking changes. Otherwise, we'll merge it right after the feature freeze and we'll need to maintain both versions through the full RC period.

@vchuravy vchuravy force-pushed the kf/codeinforefactor branch from d0e03c7 to 1a1de9c Compare February 10, 2024 22:52
@vchuravy
Copy link
Member

Rebased onto master after #52233 merge

src/codegen.cpp Outdated Show resolved Hide resolved
@Keno
Copy link
Member Author

Keno commented Feb 10, 2024

jinx, I also just rebased this locally ;). Let's see if we ended up in the same place ;).

@Keno
Copy link
Member Author

Keno commented Feb 10, 2024

jinx, I also just rebased this locally ;). Let's see if we ended up in the same place ;).

No diff modulo whitespace, so I'll just switch to this.

@Keno
Copy link
Member Author

Keno commented Feb 10, 2024

There's some more cleanup required here post-rebase. Are you working on that, or should I go ahead?

@vchuravy
Copy link
Member

Are you working on that, or should I go ahead?

Go ahead :)

@@ -1314,7 +1314,7 @@ uncompressed_ir(m::Method) = isdefined(m, :source) ? _uncompressed_ir(m, m.sourc
isdefined(m, :generator) ? error("Method is @generated; try `code_lowered` instead.") :
error("Code for this Method is not available.")
_uncompressed_ir(m::Method, s::CodeInfo) = copy(s)
_uncompressed_ir(m::Method, s::String) = ccall(:jl_uncompress_ir, Any, (Any, Ptr{Cvoid}, Any), m, C_NULL, s)::CodeInfo
_uncompressed_ir(m::Method, s::String) = ccall(:jl_uncompress_ir, Any, (Any, Any), m, s)::CodeInfo
_uncompressed_ir(ci::Core.CodeInstance, s::String) = ccall(:jl_uncompress_ir, Any, (Any, Any, Any), ci.def.def::Method, ci, s)::CodeInfo
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be much safer not to change this API, and just ignore the extra argument. I want to be able to use it in the future to improve compression ratios anyways

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll keep the change to the C API, but I'll leave the julia side API.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems like unnecessary code churn. Can we just not include either change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the point of having C API with unused arguments. External packages shouldn't be calling this directly anyway. If they are that's a bug.

@vtjnash vtjnash self-requested a review February 12, 2024 20:02
# Even if we already have a CI for this, it's possible that the new CI has more
# information (E.g. because the source was limited before, but is no longer - this
# happens during bootstrap). In that case, allow the result to be recached.
if result.src === nothing || (ci.inferred !== nothing || ci.invoke != C_NULL)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why these particular conditions? Is this what already_inferred_quick_test is already supposed to be checking?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

already_inferred_quick_test handles the case where you're trying to cache the same MI twice in the same call to typeinf_ext (it just checks mi.inInference). However, this case is different. Here we check whether we need to upgrade the CodeInstance that's already in the cache to add source to it. This can happen because the last time we inferred it, we couldn't save the source because one of our (unused) body statements had a limited rettype. The condition is essentially ci_has_abi. I should probably replace it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that both already_inferred_quick_test and the cache override here are both unique to the bootstrapping process, consolidating this logic within already_inferred_quick_test(interp, mi, result) might be beneficial?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the cache override is unique to the bootstrapping process. You can create the same situation at any time by first inferring something that will poison the the code and then it gets re-inferred when the execution actually reaches.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pair of conditions though (ci.inferred !== nothing || ci.invoke != C_NULL) seems to be not something that has a well-defined meaning in a multithread program (e.g. it introduces a data-race here).

src/jitlayers.cpp Outdated Show resolved Hide resolved
src/aotcompile.cpp Outdated Show resolved Hide resolved
src/aotcompile.cpp Outdated Show resolved Hide resolved
src/codegen.cpp Outdated
@@ -6804,7 +6806,7 @@ static Function* gen_cfun_wrapper(

jl_codectx_t ctx(M->getContext(), params);
ctx.f = cw;
ctx.world = world;
ctx.min_world = ctx.max_world = world;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems a little bit tricky to have world in both params and ctx, since it leads to odd cases like this. It might be easier to only have them in ctx?

Copy link
Member Author

@Keno Keno Feb 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean only have them in the params, since the ctx has a reference to that?

I see what you mean - will delete from the params struct.

src/codegen.cpp Outdated Show resolved Hide resolved
@vtjnash
Copy link
Member

vtjnash commented Feb 19, 2024

Removing parent was a breaking change. It is why I asked for that to be put back before merging this, as it doesn't seem relevant to the rest of the change, so it should be very simple to revert that part of the change without impacting anything else.

@Keno
Copy link
Member Author

Keno commented Feb 19, 2024

It's an internal field like the rest of the fields removed in this PR. Cthulhu's reliance on it is a bug that we'll fix.

@vtjnash
Copy link
Member

vtjnash commented Feb 19, 2024

It is documented as being part of the API here:

* `parent`
The `MethodInstance` that "owns" this object (if applicable).

If it was a problem for other reasons, we could argue to make a breaking change to remove that, but as it is, it is simple to restore that field without affecting the rest of this PR.

@Keno
Copy link
Member Author

Keno commented Feb 19, 2024

That's developer documentation, which does not constrain the interface. We do need to update that documentation, both for this change and several others in the past few months.

The problem with that field is that there's no good semantic point at which to set it and that any user of the field is likely doing the wrong thing. Before this PR it was just being set at some random point in inference.

JL_GC_PUSH1(&forced_ci);
if (forced_ci) {
// Force compile of this codeinst even though it already has an ->invoke
_jl_compile_codeinst(forced_ci, NULL, *jl_ExecutionEngine->getContext());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this codeinst is not in a cache, that means it is not legal to call this method, as this method pushes the object into the global debuginfo / JIT structs and later returns those, which can result in GC use-after free bugs

int should_skip_inference = !jl_is_method(mi->def.method) || jl_symbol_name(mi->def.method->name)[0] == '@';

if (!should_skip_inference) {
codeinst = jl_type_infer(mi, world, 0, SOURCE_MODE_ABI);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This appears to have introduced a datarace on the mi->inInference field, which seems likely to lead to incorrect compilation and bad performance of threaded code. It was previously protected by the codegen lock.

"""
function ci_has_abi(code::CodeInstance)
ci_has_source(code) && return true
return code.invoke !== C_NULL
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are monotonic loads, so you've introduced a data-race here also. I think you wanted an :acquire on one of these (may this one) to fix that, so that it must observe this being set. And also switch the order of the checks, so that it correctly pairs with the release store.

Comment on lines +1043 to +1044
In such a case, it will first set the `invoke` field to a method that
will block the thread until compilation is completed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, the comment here mentions "first", which seems to intend to imply a happens-before, but the code doesn't have that

vtjnash added a commit that referenced this pull request Feb 21, 2024
This drops the unnecessary breaking change from
#53219 by re-adding the optional
`parent` field to CodeInfo. A few months ago, I had actually already put
together a branch that also fixed Keno's complaints about how it was not
set reliably, so I copied that code here also, so that it should get set
more reliably whenever a CodeInfo is associated with a MethodInstance
(either because it called `retrieve_code_info` to get IR from the Method
or `uncompress_ir` to get it from the inference cache). This does not
entirely fix Cthulhu's test errors, since I don't see any particular
reason to re-introduce the other fields (min_world, max_world, inferred,
edges, method_for_inference_limit_heuristics) that got removed or are
now set incorrectly, and most errors appear to be instead related to the
`Expr(:boundscheck, true/false)` change. However, they could be
trivially re-added back as placeholders, if someone encounters a broken
package that still needs them.
Comment on lines +1114 to +1116
return CodeInstance(mi, cache_owner(interp), Any, Any, nothing, src, Int32(0),
get_inference_world(interp), get_inference_world(interp),
UInt32(0), UInt32(0), nothing, UInt8(0))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These values appear to be mostly set incorrectly now, which can lead to compile corruption (e.g. #51872), as well as now failing to handle the case when retrieve_code_info may fail and return nothing instead of CodeInfo. The old logic had a helper function jl_get_codeinst_for_src which used to handle this case, so that function could either be un-deleted, or that operation could be added as an alternative way to call the CodeInstance constructor, since it needs identical handling to how result would be treated later.

Suggested change
return CodeInstance(mi, cache_owner(interp), Any, Any, nothing, src, Int32(0),
get_inference_world(interp), get_inference_world(interp),
UInt32(0), UInt32(0), nothing, UInt8(0))
CodeInstance(interp, src, can_discard_trees=)

@vtjnash
Copy link
Member

vtjnash commented Feb 24, 2024

It looks like this causes some fairly extreme performance issues (up to 50x longer inference times), though curiously also sometimes provides up to a 5x speed up

vtjnash referenced this pull request Feb 26, 2024
When we use options like code coverage, we can't use the native code
present in the cache file since it is not instrumented.

PR #52123 introduced the capability of skipping the native
code during loading, but created the issue that subsequent packages
could have an explicit or implicit dependency on the native code.

PR #53439 tainted the current process by setting
`use_sysimage_native_code`, but this flag is propagated to subprocesses
and lead to a regression in test time.

Move this to a process local flag to avoid the regression.
In the future we might be able to change the calling convention for
cross-image calls to `invoke(ci::CodeInstance, args...)` instead of
`ci.fptr(args...)` to handle native code not being present.

---------

Co-authored-by: Jameson Nash <vtjnash@gmail.com>
aviatesk added a commit to JuliaDebug/Cthulhu.jl that referenced this pull request Mar 3, 2024
Co-authored-by: Cody Tapscott <topolarity@tapscott.me>
Co-authored-by: Shuhei Kadowaki <aviatesk@gmail.com>
tecosaur pushed a commit to tecosaur/julia that referenced this pull request Mar 4, 2024
…3219)

The `CodeInfo` type is one of the oldest types in the system and has
grown a bit of cruft. In particular, the `rettype`, `inferred`,
`parent`, `edges`, `min_world`, `max_world` fields are not used for the
original purpose of representing code, but for one or more of (in
decreasing order of badness):

1. Smuggling extra results from inference into the compiler
2. Sumggling extra arguments into OpaqueClosure constructors
3. Passing extra information from generated functions to inference

The first of these points in particular causes a fair bit of mixup
between caching concerns and compiler concerns and results in external
abstract interpreters maintainging their own dummy CodeInfos, just to
comply with the interface. Originally, I just wanted to clean up that
bit, but it didn't really make sense as a standalone piece, so this PR
is more comprehensive.

In particular, this PR:

1. Removes the `parent`, `inferred` and `rettype` fields of `CodeInfo`.
They are largely vestigal and code accessing these is probably doing the
wrong thing. They should instead be looking at either the CodeInstance
or remembering the query that was asked of the cache in the first place.

2. Makes `edges`, `min_world` and `max_world` used for generated
functions only. All other uses were replaced by appropriate queries on
the CodeInstance. In particular, inference no longer sets these. In the
future we may want to consider removing these also and having generated
functions return some other object, but that is a topic to revisit once
the broader compiler plugins landscape is more clear.

3. Makes the external type inference interface return `CodeInstance`
rather than `CodeInfo`. This results in a lot of cleanup, because many
functions had multiple code paths, some for CodeInstance and others for
fallback to inference/CodeInfo. This is all cleaned up now. If you don't
have a CodeInstance, you can ask inference for one. This CodeInstance
may or may not be in the cache, but you can look at its types, compile
it, etc.

4. Moves the main inference entrypoint out of the codegen library. There
is still a little bit of entangelement, but this makes codegen much more
of an independent system that you give a CodeInstance and it just fills
in the invoke pointer for.

With these changes, only the third use of the above mentioned fields
remains.

The overall theme here is decoupling. Over time, various parties have
wanted to use the julia compiler with custom IR datastructure, backend
code generators, caches, etc. This doesn't quite get us all the way
there, but makes inference and codegen much more independent with a
clear IR-format-independent interface (CodeInstance).

---------

Co-authored-by: Valentin Churavy <v.churavy@gmail.com>
tecosaur pushed a commit to tecosaur/julia that referenced this pull request Mar 4, 2024
…ang#53393)

This drops the unnecessary breaking change from
JuliaLang#53219 by re-adding the optional
`parent` field to CodeInfo. A few months ago, I had actually already put
together a branch that also fixed Keno's complaints about how it was not
set reliably, so I copied that code here also, so that it should get set
more reliably whenever a CodeInfo is associated with a MethodInstance
(either because it called `retrieve_code_info` to get IR from the Method
or `uncompress_ir` to get it from the inference cache). This does not
entirely fix Cthulhu's test errors, since I don't see any particular
reason to re-introduce the other fields (min_world, max_world, inferred,
edges, method_for_inference_limit_heuristics) that got removed or are
now set incorrectly, and most errors appear to be instead related to the
`Expr(:boundscheck, true/false)` change. However, they could be
trivially re-added back as placeholders, if someone encounters a broken
package that still needs them.
maleadt added a commit to JuliaGPU/GPUCompiler.jl that referenced this pull request Apr 9, 2024
vtjnash added a commit that referenced this pull request Jun 6, 2024
)

Continuing the development in #53219, according to the plan in
https://hackmd.io/@vtjnash/codeinstances, this further separates the
meaning behind CodeInfo and CodeInstance, such that CodeInstance can
only be used as a call target, and cannot be used for code generation,
while CodeInfo can only be used to generate code (or for reflection on
what code would be generated), and cannot be used as a call target.
Basically, the eventual idea is that CodeInfo will only show up now as
an input (e.g. for doing inlining or codegen) and is ephemeral, while a
CodeInstance is what shows up in a cache (e.g. as a callable object).
simeonschaub added a commit to JuliaDebug/JuliaInterpreter.jl that referenced this pull request Sep 29, 2024
This is the best way I could think of to determine whether an OC
contains inferred code after JuliaLang/julia#53219. Also deletes a
fastpath we had for OCs containing inferred code in
`maybe_evaluate_builtin` as it doesn't really seem necessary and we
would otherwise scan the IR twice for OCs we do want to interpret.
simeonschaub added a commit to JuliaDebug/JuliaInterpreter.jl that referenced this pull request Sep 29, 2024
This is the best way I could think of to determine whether an OC
contains inferred code after JuliaLang/julia#53219. Also deletes a
fastpath we had for OCs containing inferred code in
`maybe_evaluate_builtin` as it doesn't really seem necessary and we
would otherwise scan the IR twice for OCs we do want to interpret.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants