Save actual machine code in precompile files #30488
This has been discussed in various issues. One challenge is that for a lot of generated code, not a single package is involved but several, so it has to work differently. PackagePrecompiler is a testbed for this.
PackageCompiler is different, since the whole system is compiled in one go; you don't get to cache machine code for external packages on top of that. This feature is extremely difficult to implement.
Could you please provide an explanation?
One factor to consider here is that a lot of time is actually spent re-compiling code, not just compiling it once. When you load packages that add methods to various low-level functions, it can invalidate existing native code (since that code was compiled assuming those new methods don't exist).

A lot of code also inherently involves multiple packages. For example, maybe we can compile and save some code for FixedPointNumbers and GenericLinearAlgebra, but where do we put the code for linear algebra of fixed-point matrices? Such code would not exist, and would not need to exist, until somebody loads both packages and uses them together.

There are various mechanical difficulties to work out. For one, it's not clear which code to assign to a particular package. For example, maybe loading package …

So while this is possible, we might decide it's not necessarily the best way to improve latency in terms of cost and benefit. For example, a combination of (1) using multiple cores to compile and (2) using standard tiered JIT techniques, where we run more things in an interpreter first, might work better. Try running julia with …
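The invalidation mechanism described above can be demonstrated in a few lines at the REPL. This is a minimal illustrative sketch, not code from the discussion:

```julia
# f's native code is compiled assuming only the generic g(::Any) method exists.
g(x) = 1
f(x) = g(x)
f(1)            # compiles f(Int); the call to g is resolved to the method returning 1

# Adding a more specific method invalidates the cached native code for f(Int):
g(x::Int) = 2
f(1)            # f(Int) is recompiled against the new method table; now returns 2
```

A package load that defines `g(x::Int)` has the same effect as typing it at the REPL, which is why cached machine code for `f` cannot simply be reloaded verbatim.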
Have you considered my suggestion for "Context Dispatch": dispatching based on the caller module, and storing the code in the "lowest" module down the call tree that can resolve the call? In the second example it would belong to Base, because both types and the generic function + are defined there.
Maybe another option would be to move to a model where precompilation happens per environment, and machine code for everything in that environment gets stored? Whenever one makes a change to the environment, all of that gets updated (or potentially updated, if needed). So essentially, say the …

That would slow down package operations, but it might help with these complicated package-interaction questions?
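One way to make the per-environment idea concrete is to key the native-code cache on a hash of the environment's Manifest.toml, so that any change to the environment invalidates (or refreshes) the whole cache. A hypothetical sketch; the `compiled_native` directory name is made up, while `Pkg.project()` and the `SHA` standard library are real:

```julia
using Pkg, SHA  # SHA is a Julia standard library

# Hash the active environment's Manifest.toml; this changes whenever any
# package in the environment is added, removed, upgraded, or downgraded.
manifest = joinpath(dirname(Pkg.project().path), "Manifest.toml")
env_key = bytes2hex(sha256(read(manifest)))

# Hypothetical location for machine code cached for this exact environment:
cachedir = joinpath(DEPOT_PATH[1], "compiled_native", env_key)
```

Because the key covers every package version at once, cross-package specializations (like the fixed-point linear algebra example above) have an unambiguous home.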
A similar question appeared independently on Slack #helpdesk yesterday.
"just" 😂 |
I have some problems understanding the comment "just". Obviously (to you) it's not straightforward to reuse already-compiled code, and you give some examples ("it can invalidate existing native code") above.
Suppose module A has a single function …
Yeah, so it was me asking that question. The point that there are indeed possibilities of things being redefined and whatnot makes complete sense. Hence there is the …
I think the minimal change in …
Why not open this issue to be discussed with the community? Core devs, share your direction of thought and listen to the feedback from the supporters of the language. I addressed these problems in the "Context Dispatch" idea, where the method table of a function is determined by the calling function's scope, all the way down the call tree. Once I have the "Context Dispatch" POC ready for Julia 1.0, I will post an issue asking for "problems" with saving JIT'd code, and for each MWE of a problem supply an MWE of a solution.
Does the Julia GitHub not count as being "open to the community"? I'm pretty sure both of us are not "core devs", yet we're still able to comment on this issue 😄
Are you referring to the idea you had previously described here? If so, it seemed like @vtjnash was not convinced that your "Context Dispatch" approach was necessary for saving and loading generated native code. Both PackageCompiler.jl and the sysimg (sys.so) are pretty good indicators of Julia's ability to save and load native code.

I think what would help here is if someone would write a package/patch that causes all (realistically, most/some) JIT'd code to be written to disk and automatically re-loaded when the correct conditions are met in a fresh Julia session. That way we'll be able to get a feel for whether saving all of this extra generated code is beneficial at all, and additionally how difficult it might be to pull off in general.
PackageCompiler is a different thing: it is aimed at AOT compilation and is not easily useful as part of an ongoing development process, and I say that from past experience. What I am aiming at is the issue of reliably caching JIT'd code at the module level, and dynamically loading it when the module loads. As Jeff pointed out, the problem is not the caching itself; the problem is that the cache is too easily invalidated under the current set of dispatch rules.
I see this effort stopped. |
This is beyond the purview of Revise. However, some things have changed: in more recent Julia versions (and particularly the in-development Julia 1.6) there will be a lot less invalidation. So little (at least for many packages) that I don't think it's a serious obstacle anymore. The other obstacles still remain, AFAIK.
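For readers who want to measure invalidation themselves, the SnoopCompile ecosystem is the standard tool; the sketch below follows its documented API, though details may vary across versions, and `SomePackage` is a placeholder:

```julia
# Record method invalidations triggered by loading a package.
using SnoopCompileCore
invalidations = @snoopr using SomePackage   # SomePackage is a placeholder

# Analyze the recording (SnoopCompile must be loaded after recording):
using SnoopCompile
trees = invalidation_trees(invalidations)
length(trees)   # number of method insertions that invalidated existing code
```

Running this on Julia 1.5 versus 1.6 for the same package is a direct way to see the reduction in invalidations mentioned above.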
Thank you for the answer Tim! What do you think, is it possible to list all the obstacles? |
This issue is about caching native code in … Jeff listed the other obstacles to caching native code very nicely above.
Yeah, sorry, I didn't want to change the subject. I misunderstood the whole thing, because from an outsider's view Revise looked like a "code cache that interactively updates with patching", which is so close to caching and updating native code between sessions.
On the caching issue: since invalidations will soon be in much better shape, should we talk about the remaining obstacles?
Question: can the answer depend on circumstance? Specifically, what would happen if two different packages end up stashing the native code for the same method; is there anything particularly bad about that? I can imagine two strategies:
I now think that the fundamental concerns raised in #30488 (comment) are largely moot:
To show that this is a reality, a useful example is JuliaImages/ImageFiltering.jl#201, in which …

That said, I should acknowledge that there are currently some weak links that prevent full realization of this scheme (I'll file issues). But these are likely to be specific technical points rather than difficult conceptual issues. For many packages, it seems that the conceptual barriers are basically gone, and it's "just" a matter of someone investing the time needed to implement caching of native code.
Does it help this problem that in 1.8, code for other packages can now be precompiled by the consumer? It seems to my naive eye that this addresses the same sort of issue of "where to stash code".
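The 1.8 mechanism referred to here lets a downstream package own the specializations that combine its dependencies' types, which answers the "where to stash code" question for cases like the fixed-point linear algebra example earlier in the thread. A hedged sketch; the glue-package name is invented, while `jl_generating_output` and the ImageFiltering calls are real:

```julia
module FixedPointFilters  # hypothetical "glue" package
using FixedPointNumbers, ImageFiltering

# This block runs only while the package's precompile cache is being
# generated; the specializations it triggers (imfilter on N0f8 matrices)
# are stored in *this* package's cache file, not in its dependencies'.
if ccall(:jl_generating_output, Cint, ()) == 1
    imfilter(rand(N0f8, 8, 8), Kernel.gaussian(1))
end

end
```

Neither FixedPointNumbers nor ImageFiltering needs to know about the combination; the consumer that actually uses it pays for and stores the compiled code.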
I think #47184 would fix this? |
Essentially, store whatever is stored in a sysimage with user packages compiled into it in the standard precompile files.
I would assume that this, in combination with #30487, would go a very long way to make the interactive REPL experience of julia competitive.
I know that the core team has been thinking about this, and I did look for an existing issue that tracks this, but couldn't find any. So I'm mainly creating the issue so that it can be assigned to a milestone and so that I can follow progress :) If this is a duplicate (which I really had expected) and I just didn't find the original, please close.
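Until something like this lands, the closest existing approximation is baking an environment's packages into a custom sysimage with PackageCompiler.jl. The call below follows its documented API, with Plots as an arbitrary example package:

```julia
using PackageCompiler

# Build a system image with Plots compiled in, machine code included:
create_sysimage([:Plots]; sysimage_path="sys_plots.so")
# Then start Julia with it:  julia --sysimage sys_plots.so
```

The difference from this proposal is that the sysimage is monolithic and must be rebuilt by hand, whereas storing the same machine code in per-package precompile files would keep caching automatic and incremental.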