
Conversation

@sjkelly (Contributor) commented Mar 1, 2024

This helps to minimize the number of failed precompiles by eliding the inlining cost check. It will likely lead to slightly larger sys/pkgimg sizes; in practice, however, it leads to more extensive and successful precompilation, which can minimize the number of JIT events that occur when using Julia.

@sjkelly (Contributor, Author) commented Mar 1, 2024

This does grow the sysimg a considerable amount:

```
Master:
-rwxrwxr-x 1 sjkelly sjkelly 139M Mar 1 19:51 sys.so
PR:
-rwxrwxr-x 1 sjkelly sjkelly 158M Mar 1 20:01 sys.so
```

However, I would argue, reliable precompilation is somewhat important. Is it possible this threshold could be adjusted instead?

@mbauman (Member) commented Mar 1, 2024

What are the cases where you need precompilation for an inlineable function but aren't calling it from a context where it's been inlined?

@sjkelly (Contributor, Author) commented Mar 1, 2024

> What are the cases where you need precompilation for an inlineable function but aren't calling it from a context where it's been inlined?

I don't know. The behavior I am seeing suggests that inlining is not strictly a function of the callee's complexity; it also depends on the caller.

For example, under the following contrived workload (run twice to allow downloads/resolve), I see:

```
$ ./julia --trace-compile=precomp_mast.txt -e 'using Pkg; Pkg.activate(); Pkg.add("Plots"); Pkg.rm("Plots")'
$ wc -l precomp_mast.txt
136 precomp_mast.txt

$ ./julia --trace-compile=precomp_pr.txt -e 'using Pkg; Pkg.activate(); Pkg.add("Plots"); Pkg.rm("Plots")'
$ wc -l precomp_pr.txt
96 precomp_pr.txt
```

Attachments: precomp_mast.txt, precomp_pr.txt

```
$ diff precomp_mast.txt precomp_pr.txt
10,11d9
< precompile(Tuple{typeof(Base.append!), Array{String, 1}, Array{String, 1}})
< precompile(Tuple{typeof(Base.join), Array{String, 1}, Char})
31d28
< precompile(Tuple{typeof(Pkg.API.add), String})
35d31
< precompile(Tuple{typeof(Base.copy), Base.EnvDict})
37,38d32
< precompile(Tuple{typeof(Base.get), Base.Dict{String, String}, String, String})
< precompile(Tuple{typeof(Base.getindex), Base.Dict{String, String}, String})
40,41d33
< precompile(Tuple{Type{Base.Cmd}, Array{String, 1}})
< precompile(Tuple{Type{NamedTuple{(:env,), T} where T<:Tuple}, Tuple{Base.Dict{String, String}}})
51,53d42
< precompile(Tuple{Type{NamedTuple{(:stale_age,), T} where T<:Tuple}, Tuple{Int64}})
< precompile(Tuple{typeof(Base.pairs), NamedTuple{(:stale_age,), Tuple{Int64}}})
< precompile(Tuple{typeof(Base.merge), NamedTuple{(), Tuple{}}, Base.Pairs{Symbol, Int64, Tuple{Symbol}, NamedTuple{(:stale_age,), Tuple{Int64}}}})
56d44
< precompile(Tuple{typeof(Base.join), Base.GenericIOBuffer{GenericMemory{:not_atomic, UInt8, Core.AddrSpace{Core}(0x00)}}, Array{Char, 1}})
58d45
< precompile(Tuple{typeof(Base.convert), Type{Any}, Any})
61,62d47
< precompile(Tuple{typeof(Base._str_sizehint), Base.SubString{String}})
< precompile(Tuple{typeof(Base.print), Base.GenericIOBuffer{GenericMemory{:not_atomic, UInt8, Core.AddrSpace{Core}(0x00)}}, Base.SHA1})
64,66d48
< precompile(Tuple{typeof(Base.similar), Array{String, 1}})
< precompile(Tuple{typeof(Base.Iterators.enumerate), Array{String, 1}})
< precompile(Tuple{typeof(Base.setindex!), Array{String, 1}, String, Int64})
68d49
< precompile(Tuple{typeof(Base.hash), Tuple{UInt64}, UInt64})
70d50
< precompile(Tuple{typeof(Base.close), FileWatching.FDWatcher})
73d52
< precompile(Tuple{typeof(Base.hash), Tuple{String}, UInt64})
78,79d56
< precompile(Tuple{typeof(Base.length), Array{Int64, 1}})
< precompile(Tuple{typeof(Base.length), Base.BitArray{1}})
83,84d59
< precompile(Tuple{typeof(Base.iterate), Base.Generator{Base.Iterators.Filter{Pkg.Resolve.var"#86#88"{Array{Int64, 1}, Base.BitArray{1}}, Base.UnitRange{Int64}}, Pkg.Resolve.var"#85#87"{Array{Base.BitArray{2}, 1}, Array{Int64, 1}, Array{Base.BitArray{1}, 1}, Int64}}})
< precompile(Tuple{typeof(Base.iterate), Base.Generator{Base.Iterators.Filter{Pkg.Resolve.var"#86#88"{Array{Int64, 1}, Base.BitArray{1}}, Base.UnitRange{Int64}}, Pkg.Resolve.var"#85#87"{Array{Base.BitArray{2}, 1}, Array{Int64, 1}, Array{Base.BitArray{1}, 1}, Int64}}, Int64})
92d66
< precompile(Tuple{typeof(Base.isempty), Base.Dict{String, Base.UUID}})
95d68
< precompile(Tuple{typeof(Base.push!), Array{Base.Dict{String, Any}, 1}, Base.Dict{String, Any}})
110d82
< precompile(Tuple{typeof(Base.isempty), Base.Dict{String, Union{Array{String, 1}, String}}})
116d87
< precompile(Tuple{typeof(Base.isempty), Base.Dict{String, String}})
120,122d90
< precompile(Tuple{typeof(Base.similar), Array{Any, 1}})
< precompile(Tuple{typeof(Base.Iterators.enumerate), Array{Any, 1}})
< precompile(Tuple{Type{GenericMemory{:not_atomic, String, Core.AddrSpace{Core}(0x00)}}, UndefInitializer, Int64})
124d91
< precompile(Tuple{typeof(Base.print), Base.IOContext{Base.GenericIOBuffer{GenericMemory{:not_atomic, UInt8, Core.AddrSpace{Core}(0x00)}}}, Char})
126,132d92
< precompile(Tuple{typeof(Base.issorted), Array{String, 1}, Base.Order.ReverseOrdering{Base.Order.ForwardOrdering}})
< precompile(Tuple{typeof(Pkg.MiniProgressBars.start_progress), Pkg.UnstableIO, Pkg.MiniProgressBars.MiniProgressBar})
< precompile(Tuple{typeof(Pkg.MiniProgressBars.end_progress), Pkg.UnstableIO, Pkg.MiniProgressBars.MiniProgressBar})
< precompile(Tuple{typeof(Base._str_sizehint), UInt64})
< precompile(Tuple{typeof(Base.print), Base.GenericIOBuffer{GenericMemory{:not_atomic, UInt8, Core.AddrSpace{Core}(0x00)}}, UInt64})
< precompile(Tuple{typeof(Base.haskey), Base.Dict{String, Any}, String})
< precompile(Tuple{typeof(Base.sizeof), Symbol})
```

@sjkelly (Contributor, Author) commented Mar 1, 2024

Looks like the tests would need to be changed with this as well: https://buildkite.com/julialang/julia-master/builds/34254#018dfb82-9e51-4c92-a655-9d43f9b2dd9b/910-1671

@sjkelly force-pushed the sjk/precompheuristic2 branch from 361e211 to 650868e on Mar 3, 2024
@gbaraldi (Member) commented Mar 4, 2024

The case where this matters is when you have a dynamic dispatch that lands on one of these functions that always gets inlined.
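A minimal sketch of the situation described above (the names are illustrative, not from the PR): `small` is cheap enough that the inliner folds it into every static call site, so no standalone compiled specialization needs to be cached for static callers. But when it is reached through a dynamic dispatch, that standalone specialization is exactly what the runtime needs, and it gets JIT-compiled on first use if it was not saved in the sys/pkg image.

```julia
small(x) = x + 1               # trivially inlineable; static callers inline it

function caller(xs::Vector{Any})
    s = 0
    for x in xs
        s += small(x)          # eltype is Any, so this call is a dynamic
    end                        # dispatch: it needs a standalone small(::Int)
    return s
end

caller(Any[1, 2, 3])           # first call reaches small(::Int) dynamically
```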

@mbauman (Member) commented Mar 4, 2024

Right — that was my assumption and at the root of my question. If there's dynamic dispatch, how does Julia know the argument types in order to do the precompile? I suppose there must be other — type stable — functions that happen to call (and then inline) these methods in other usages, so then the standalone versions happen to get precompiled on this branch?

@sjkelly (Contributor, Author) commented Mar 4, 2024

> The case where this matters is when you have a dynamic dispatch that lands on one of these functions that always gets inlined.

So if I am understanding correctly, this is indeed a bug in the current precompilation heuristic: there are cases where something is in theory always inlineable, but in practice is not inlined because it is reached via dynamic dispatch. This patch therefore fixes the issue, but inefficiently, since we exhaustively precompile every method instance; what we want is something that knows whether the calling context is a dynamic dispatch. This truth table may articulate it better:

| Inlineable? | In a dynamic dispatch context? | Gets precompiled? |
|---|---|---|
| Y | Y | N (should be Y) |
| Y | N | Implicit (inlined into the caller) |
| N | Y | Y |
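In pseudo-Julia (purely illustrative; `inlining_cost`, `COST_THRESHOLD`, and `reached_via_dynamic_dispatch` are made-up names, not the compiler's actual internals), the three caching policies in play are roughly:

```julia
# Current heuristic: only cache compiled code for methods too costly to inline.
should_cache(ci)       = inlining_cost(ci) > COST_THRESHOLD

# This PR: cache everything (fixes the Y/Y row, at the cost of image size).
should_cache_pr(ci)    = true

# Ideal, per the table: also cache cheap methods reached via dynamic dispatch.
should_cache_ideal(ci) = inlining_cost(ci) > COST_THRESHOLD ||
                         reached_via_dynamic_dispatch(ci)
```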

@JeffBezanson (Member) commented

> I suppose there must be other — type stable — functions that happen to call (and then inline) these methods in other usages, so then the standalone versions happen to get precompiled on this branch?

I think that's right, yes. It's basically using the fact that we had to infer and inline a function somewhere as a heuristic to find functions that are probably useful. Given that we had to analyze these functions anyway, it's nice to compile and save them. But I'm not sure it's really worth the extra space. Maybe we should enable this in `--compile=all` mode though?

sjkelly added a commit to sjkelly/julia that referenced this pull request Mar 20, 2024
This helps to minimize the number of missed precompiles by eliding the inlining cost check when Julia is run under `--compile=all`. This will lead to slightly larger sysimage sizes. However, in practice it leads to more extensive and successful precompilation, which can minimize the number of JIT events at call sites with dynamic dispatch.

This is an alternative to JuliaLang#53547, where removing the inlining cost check was made universal. However, that led to ~15% larger sysimage sizes by default. This implements Jeff's suggestion that this mode be enabled under `--compile=all`.
@sjkelly (Contributor, Author) commented Mar 20, 2024

Replacing with #53798

@sjkelly closed this Mar 20, 2024
vtjnash pushed a commit that referenced this pull request Apr 16, 2024
kpamnany pushed a commit to RelationalAI/julia that referenced this pull request May 15, 2025