Allow for querying of build_id from objects #53943

Merged
merged 2 commits into from
Apr 13, 2024

Conversation

vchuravy
Member

@vchuravy vchuravy commented Apr 3, 2024

For GPUCompiler we would like to support a native on-disk cache of LLVM IR.
One of the longstanding issues has been cache invalidation for such an on-disk cache.

With #52233 we now have an integrated cache for the inference results and we can rely
on CodeInstance to be stable across sessions. Due to #52119 we can also rely on the
objectid to be stable.

My initial thought was to key the native disk cache in GPUCompiler on the objectid of
the corresponding CodeInstance (+ some compilation parameters).

While discussing this with @rayegun yesterday we noted that having a CodeInstance with
the same objectid might not be enough provenance. For example, we are not guaranteed that the
CodeInstance comes from the same build artifact and the same precise source code.

For package images we track this during loading and validate all contents
at once, and we explicitly keep track of the provenance chain.

This PR adds a lookup table that maps from "external_blobs", i.e. loaded images,
to the corresponding top module of each image, and uses this to determine the
build_id of the package image.
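
To make the idea of the table concrete, here is a toy model in Julia. It is only a sketch: the real table lives in the runtime's C code, and `ImageTable`, `top_module`, the addresses, and the module names below are all made up for illustration.

```julia
# Toy model only: hypothetical types and names, not the runtime's data structures.
struct ImageTable
    starts::Vector{UInt}     # start address of each loaded image blob, sorted
    stops::Vector{UInt}      # one-past-the-end address of each blob
    topmods::Vector{Symbol}  # name of the top module belonging to each blob
end

# Return the top module of the image that contains `addr`, or `nothing` when
# no loaded image contains that address.
function top_module(tbl::ImageTable, addr::UInt)
    i = searchsortedlast(tbl.starts, addr)  # last blob starting at or before addr
    i == 0 && return nothing
    return addr < tbl.stops[i] ? tbl.topmods[i] : nothing
end

tbl = ImageTable(UInt[0x1000, 0x8000], UInt[0x2000, 0x9000], [:Base, :MyPkg])
top_module(tbl, UInt(0x1800))  # inside the first blob
top_module(tbl, UInt(0x5000))  # between blobs, so not in any image
```

Once an object resolves to a top module, the build_id follows from that module, which is the second half of what the PR describes.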

Objects that are not perma-allocated are mapped to Main.
Main itself is a bit weird:

julia> @ccall jl_object_in_image(Main::Any)::UInt8
0x01

julia> Base.object_build_id(Main)
0xfdfcfbfa451e371b0000195de4fe8b4b

julia> Base.module_build_id(Main)
0xfdfcfbfa451e371b000018c746ebb6a2

julia> Base.module_build_id(Base)
0xfdfcfbfa451e371b0000195de4fe8b4b

julia> Base.object_build_id(Base)
0xfdfcfbfa451e371b0000195de4fe8b4b

julia> Base.Main == Main
true

So Main is itself perma-allocated through Base and thus the sysimage.

julia> @ccall jl_istopmod(Main::Module)::UInt8
0x00

julia> @ccall jl_istopmod(Base::Module)::UInt8
0x01

Main itself is not a topmod, so maybe it is not the correct module to return for
runtime allocated objects.

Edit: I changed this to return nothing for runtime allocated objects.
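
As a rough sketch of how a consumer might use this for the disk cache key described above: `disk_cache_key` and its `params` argument are hypothetical names, not GPUCompiler or Base API, and the `isdefined` guard is an assumption to keep the snippet usable on Julia versions without `Base.object_build_id`.

```julia
# Hypothetical helper for illustration; not part of Base or GPUCompiler.
# `params` stands in for whatever compilation parameters the cache cares about.
function disk_cache_key(obj, params::UInt)
    # Base.object_build_id is the query added by this PR; on versions without
    # it we conservatively refuse to produce a key.
    isdefined(Base, :object_build_id) || return nothing
    build_id = Base.object_build_id(obj)
    # `nothing` means the object was allocated at runtime, so there is no
    # package image whose build could vouch for its provenance.
    build_id === nothing && return nothing
    return hash((objectid(obj), build_id, params))
end
```

A caller would skip the disk cache whenever the key is `nothing` and fall back to compiling from scratch.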

vchuravy added 2 commits April 3, 2024 11:06
Julia objects can be perma-allocated in package images,
each package image has a corresponding top module.

We can currently map Julia objects to a loaded image,
but we don't keep track of the corresponding top module.

This can be useful to ask for the build_id of the package
image we are using.

Non perma-allocated objects are mapped to `Main`.
@vchuravy vchuravy added the backport 1.11 Change should be backported to release-1.11 label Apr 3, 2024
@vchuravy vchuravy requested a review from vtjnash April 3, 2024 15:22
@KristofferC KristofferC mentioned this pull request Apr 9, 2024
41 tasks
@vchuravy vchuravy merged commit d47cbf6 into master Apr 13, 2024
9 checks passed
@vchuravy vchuravy deleted the vc/topmods branch April 13, 2024 01:22
@KristofferC KristofferC mentioned this pull request Apr 17, 2024
59 tasks
vchuravy added a commit that referenced this pull request Apr 19, 2024
(cherry picked from commit d47cbf6)
@vchuravy vchuravy removed the backport 1.11 Change should be backported to release-1.11 label Apr 19, 2024
KristofferC pushed a commit that referenced this pull request May 27, 2024
@KristofferC
Member

KristofferC commented May 27, 2024

The tests for this fail on the 1.11 branch (https://buildkite.com/julialang/julia-release-1-dot-11/builds/80#018fa5f5-1b9e-41e8-8330-0cfca9f5128c):

Test Failed at /cache/build/tester-amdci5-15/julialang/julia-release-1-dot-11/julia-5b15ed75d0/share/julia/test/precompile.jl:1804
  Expression: Base.module_build_id(CustomAbstractInterpreterCaching) == Base.object_build_id(ci)
   Evaluated: 0xfafbfcfd7d4060d4000ad28fa1855dfc == nothing

so I have reverted this PR on it.
