Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tests for precompilation support #1549

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from
Draft

Add tests for precompilation support #1549

wants to merge 4 commits into from

Conversation

vchuravy
Copy link
Member

@vchuravy vchuravy commented Jun 20, 2024

This currently crashes (Julia 1.11-beta2) with:

vchuravy@odin ~/s/Enzyme (main) [1]> julia +beta --project=. test/precompile.jl

[118024] signal 11 (1): Segmentation fault
in expression starting at none:1
unknown function (ip: 0x766bca6cca30)
macro expansion at /home/vchuravy/src/Enzyme/src/compiler.jl:6587 [inlined]
enzyme_call at /home/vchuravy/src/Enzyme/src/compiler.jl:6188 [inlined]
CombinedAdjointThunk at /home/vchuravy/src/Enzyme/src/compiler.jl:6065 [inlined]
autodiff at /home/vchuravy/src/Enzyme/src/Enzyme.jl:309 [inlined]
autodiff at /home/vchuravy/src/Enzyme/src/Enzyme.jl:321
Allocations: 1322902 (Pool: 1322821; Big: 81); GC: 2
fish: Job 1, 'julia +beta --project=. test/pr…' terminated by signal SIGSEGV (Address boundary error)

This is due to the @generated function thunk that illegally captures World and returns a constant inferred
Thunk(fptr) where fptr is only valid within the current session.

Copy link
Contributor

github-actions bot commented Jun 20, 2024

Benchmark Results

main 6f73850... main/6f73850e5401f8...
basics/overhead 4.03 ± 0.01 ns 4.34 ± 0.01 ns 0.928
time_to_load 0.452 ± 0.0012 s 0.457 ± 0.0039 s 0.99

Benchmark Plots

A plot of the benchmark results have been uploaded as an artifact to the workflow run for this PR.
Go to "Actions"->"Benchmark a pull request"->[the most recent run]->"Artifacts" (at the bottom).

@wsmoses
Copy link
Member

wsmoses commented Jun 22, 2024

@vchuravy just curious -- if we use the inlineabi does it crash? [since then it becomes an llvmcall -- tho ofc all the constants need to be remapped]

@vchuravy
Copy link
Member Author

diff --git a/test/precompile.jl b/test/precompile.jl
index e7a9a852..3c6e0830 100644
--- a/test/precompile.jl
+++ b/test/precompile.jl
@@ -42,8 +42,9 @@ precompile_test_harness("Inference caching") do load_path
 
         @setup_workload begin
             @compile_workload begin
-                autodiff(Reverse, mul, Active, Active(1.0), Active(2.0))
-                autodiff(Forward, mul, Duplicated, Duplicated(1.0, 1.0), Const(2.0))
+                autodiff(ReverseMode{false,InlineABI,false}(), mul, Active, Active(1.0), Active(2.0))
+                # autodiff(Reverse, mul, Active, Active(1.0), Active(2.0))
+                # autodiff(Forward, mul, Duplicated, Duplicated(1.0, 1.0), Const(2.0))
             end
         end
     end) |> string)
@@ -53,7 +54,8 @@ precompile_test_harness("Inference caching") do load_path
         using InferenceCaching
         using Enzyme
 
-        autodiff(Reverse, InferenceCaching.mul, Active, Active(1.0), Active(2.0))
-        autodiff(Forward, InferenceCaching.mul, Duplicated, Duplicated(1.0, 1.0), Const(2.0))
+        @test autodiff(ReverseMode{false,InlineABI,false}(), InferenceCaching.mul, Active, Active(1.0), Active(2.0)) == ((2.0, 1.0),)
+        # autodiff(Reverse, InferenceCaching.mul, Active, Active(1.0), Active(2.0))
+        # autodiff(Forward, InferenceCaching.mul, Duplicated, Duplicated(1.0, 1.0), Const(2.0))
     end
 end
\ No newline at end of file

Does past this simple test

@wsmoses
Copy link
Member

wsmoses commented Jun 23, 2024

Okay awesome, we should fix the generated stuff to be relocatable, but this at least unblocks the next part.

Which I presume is needing to rewrite all the constant int to ptr's to be some relocatable address?

How does Julia do that normally atm

@wsmoses
Copy link
Member

wsmoses commented Jun 23, 2024

Also relatedly, can we use the inlineabi stuff on a temporary function to force Enzyme precompilation to compile all the interal utilities used by Enzyme [as otherwise presently the first use of an autodiff is quite slow compiling the inside of Enzyme]

@vchuravy
Copy link
Member Author

Also relatedly, can we use the inlineabi stuff on a temporary function to force Enzyme precompilation to compile all the interal utilities used by Enzyme [as otherwise presently the first use of an autodiff is quite slow compiling the inside of Enzyme]

Yes, but sadly it crashes:

[41147] signal 11 (128): Segmentation fault
in expression starting at /home/vchuravy/src/Enzyme/src/Enzyme.jl:1316
_ZN4llvm11PassBuilderC1EPNS_13TargetMachineENS_21PipelineTuningOptionsESt8optionalINS_10PGOOptionsEEPNS_28PassInstrumentationCallbacksE at /home/vchuravy/.julia/juliaup/julia-1.11.0-beta2+0.x64.linux.gnu/bin/../lib/julia/libLLVM-16jl.so (unknown line)
LLVMCreatePassBuilder at /workspace/srcdir/LLVM.jl/deps/LLVMExtra/lib/NewPM.cpp:149
LLVMCreatePassBuilder at /home/vchuravy/.julia/packages/LLVM/6cDbl/lib/16/libLLVM_extra.jl:531 [inlined]
create_passbuilder_internal at /home/vchuravy/.julia/packages/LLVM/6cDbl/src/newpm/passbuilder.jl:14 [inlined]
PassBuilder at /home/vchuravy/.julia/packages/LLVM/6cDbl/src/newpm/passbuilder.jl:30 [inlined]
PassBuilder at /home/vchuravy/.julia/packages/LLVM/6cDbl/src/newpm/passbuilder.jl:25 [inlined]
macro expansion at /home/vchuravy/.julia/packages/LLVM/6cDbl/src/base.jl:96 [inlined]
prop_julia_addr at /home/vchuravy/src/Enzyme/src/compiler/optimize.jl:75
unknown function (ip: 0x75d81e98a372)
function_pass_callback at /home/vchuravy/.julia/packages/LLVM/6cDbl/src/pass.jl:49
unknown function (ip: 0x75d81e9e11d4)
runOnFunction at /workspace/srcdir/LLVM.jl/deps/LLVMExtra/lib/Core.cpp:146
_ZN4llvm13FPPassManager13runOnFunctionERNS_8FunctionE at /home/vchuravy/.julia/juliaup/julia-1.11.0-beta2+0.x64.linux.gnu/bin/../lib/julia/libLLVM-16jl.so (unknown line)
_ZN4llvm13FPPassManager11runOnModuleERNS_6ModuleE at /home/vchuravy/.julia/juliaup/julia-1.11.0-beta2+0.x64.linux.gnu/bin/../lib/julia/libLLVM-16jl.so (unknown line)
_ZN4llvm6legacy15PassManagerImpl3runERNS_6ModuleE at /home/vchuravy/.julia/juliaup/julia-1.11.0-beta2+0.x64.linux.gnu/bin/../lib/julia/libLLVM-16jl.so (unknown line)
LLVMRunPassManager at /home/vchuravy/.julia/juliaup/julia-1.11.0-beta2+0.x64.linux.gnu/bin/../lib/julia/libLLVM-16jl.so (unknown line)
LLVMRunPassManager at /home/vchuravy/.julia/packages/LLVM/6cDbl/lib/16/libLLVM.jl:3351 [inlined]
run! at /home/vchuravy/.julia/packages/LLVM/6cDbl/src/passmanager.jl:39 [inlined]
#28502 at /home/vchuravy/src/Enzyme/src/compiler/optimize.jl:2004
#ModulePassManager#51 at /home/vchuravy/.julia/packages/LLVM/6cDbl/src/passmanager.jl:33
ModulePassManager at /home/vchuravy/.julia/packages/LLVM/6cDbl/src/passmanager.jl:30 [inlined]
optimize! at /home/vchuravy/src/Enzyme/src/compiler/optimize.jl:1930 [inlined]
#codegen#28574 at /home/vchuravy/src/Enzyme/src/compiler.jl:5754
codegen at /home/vchuravy/src/Enzyme/src/compiler.jl:5110 [inlined]
_thunk at /home/vchuravy/src/Enzyme/src/compiler.jl:6639
_thunk at /home/vchuravy/src/Enzyme/src/compiler.jl:6639 [inlined]
cached_compilation at /home/vchuravy/src/Enzyme/src/compiler.jl:6677 [inlined]
#28621 at /home/vchuravy/src/Enzyme/src/compiler.jl:6746
#JuliaContext#152 at /home/vchuravy/.julia/packages/GPUCompiler/nWT2N/src/driver.jl:52
unknown function (ip: 0x75d81e8d7d13)
JuliaContext at /home/vchuravy/.julia/packages/GPUCompiler/nWT2N/src/driver.jl:42
#s2003#28620 at /home/vchuravy/src/Enzyme/src/compiler.jl:6697 [inlined]
#s2003#28620 at ./none:0
GeneratedFunctionStub at ./boot.jl:708
jl_call_staged at /cache/build/builder-amdci4-5/julialang/julia-release-1-dot-11/src/method.c:581
ijl_code_for_staged at /cache/build/builder-amdci4-5/julialang/julia-release-1-dot-11/src/method.c:636
get_staged at ./compiler/utilities.jl:123
retrieve_code_info at ./compiler/utilities.jl:135 [inlined]
InferenceState at ./compiler/inferencestate.jl:494
typeinf_edge at ./compiler/typeinfer.jl:861
abstract_call_method at ./compiler/abstractinterpretation.jl:650
abstract_call_gf_by_type at ./compiler/abstractinterpretation.jl:102
abstract_call_known at ./compiler/abstractinterpretation.jl:2182
abstract_call at ./compiler/abstractinterpretation.jl:2264
abstract_call at ./compiler/abstractinterpretation.jl:2257
abstract_call at ./compiler/abstractinterpretation.jl:2402
abstract_eval_call at ./compiler/abstractinterpretation.jl:2417
abstract_eval_statement_expr at ./compiler/abstractinterpretation.jl:2433
abstract_eval_statement at ./compiler/abstractinterpretation.jl:2731
abstract_eval_basic_statement at ./compiler/abstractinterpretation.jl:3022
typeinf_local at ./compiler/abstractinterpretation.jl:3300
typeinf_nocycle at ./compiler/abstractinterpretation.jl:3382
_typeinf at ./compiler/typeinfer.jl:244
typeinf at ./compiler/typeinfer.jl:211
typeinf_edge at ./compiler/typeinfer.jl:871
abstract_call_method at ./compiler/abstractinterpretation.jl:650
abstract_call_gf_by_type at ./compiler/abstractinterpretation.jl:102
abstract_call_known at ./compiler/abstractinterpretation.jl:2182
abstract_call at ./compiler/abstractinterpretation.jl:2264
abstract_apply at ./compiler/abstractinterpretation.jl:1672
abstract_call_known at ./compiler/abstractinterpretation.jl:2084
abstract_call at ./compiler/abstractinterpretation.jl:2264
abstract_call at ./compiler/abstractinterpretation.jl:2257
abstract_call at ./compiler/abstractinterpretation.jl:2402
abstract_eval_call at ./compiler/abstractinterpretation.jl:2417
abstract_eval_statement_expr at ./compiler/abstractinterpretation.jl:2433
abstract_eval_statement at ./compiler/abstractinterpretation.jl:2731
abstract_eval_basic_statement at ./compiler/abstractinterpretation.jl:3046
typeinf_local at ./compiler/abstractinterpretation.jl:3300
typeinf_nocycle at ./compiler/abstractinterpretation.jl:3382
_typeinf at ./compiler/typeinfer.jl:244
typeinf at ./compiler/typeinfer.jl:211
typeinf_ext at ./compiler/typeinfer.jl:1043
typeinf_ext_toplevel at ./compiler/typeinfer.jl:1081
typeinf_ext_toplevel at ./compiler/typeinfer.jl:1077
jfptr_typeinf_ext_toplevel_38740.1 at /home/vchuravy/.julia/juliaup/julia-1.11.0-beta2+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
jl_apply at /cache/build/builder-amdci4-5/julialang/julia-release-1-dot-11/src/julia.h:2154 [inlined]
jl_type_infer at /cache/build/builder-amdci4-5/julialang/julia-release-1-dot-11/src/gf.c:390
jl_generate_fptr_impl at /cache/build/builder-amdci4-5/julialang/julia-release-1-dot-11/src/jitlayers.cpp:511
jl_compile_method_internal at /cache/build/builder-amdci4-5/julialang/julia-release-1-dot-11/src/gf.c:2534 [inlined]
jl_compile_method_internal at /cache/build/builder-amdci4-5/julialang/julia-release-1-dot-11/src/gf.c:2421
_jl_invoke at /cache/build/builder-amdci4-5/julialang/julia-release-1-dot-11/src/gf.c:2938 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-5/julialang/julia-release-1-dot-11/src/gf.c:3123
macro expansion at /home/vchuravy/src/Enzyme/src/Enzyme.jl:1318 [inlined]
macro expansion at /home/vchuravy/.julia/packages/PrecompileTools/L8A3n/src/workloads.jl:78 [inlined]
macro expansion at /home/vchuravy/src/Enzyme/src/Enzyme.jl:1317 [inlined]
macro expansion at /home/vchuravy/.julia/packages/PrecompileTools/L8A3n/src/workloads.jl:140 [inlined]
top-level scope at /home/vchuravy/src/Enzyme/src/Enzyme.jl:1316
jl_toplevel_eval_flex at /cache/build/builder-amdci4-5/julialang/julia-release-1-dot-11/src/toplevel.c:934
jl_eval_module_expr at /cache/build/builder-amdci4-5/julialang/julia-release-1-dot-11/src/toplevel.c:215 [inlined]
jl_toplevel_eval_flex at /cache/build/builder-amdci4-5/julialang/julia-release-1-dot-11/src/toplevel.c:743
jl_toplevel_eval_flex at /cache/build/builder-amdci4-5/julialang/julia-release-1-dot-11/src/toplevel.c:886
ijl_toplevel_eval_in at /cache/build/builder-amdci4-5/julialang/julia-release-1-dot-11/src/toplevel.c:994
eval at ./boot.jl:428 [inlined]
include_string at ./loading.jl:2543
_include at ./loading.jl:2603
include at ./Base.jl:558 [inlined]
include_package_for_output at ./loading.jl:2721
jfptr_include_package_for_output_69349.1 at /home/vchuravy/.julia/juliaup/julia-1.11.0-beta2+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
jl_apply at /cache/build/builder-amdci4-5/julialang/julia-release-1-dot-11/src/julia.h:2154 [inlined]
do_call at /cache/build/builder-amdci4-5/julialang/julia-release-1-dot-11/src/interpreter.c:126
eval_value at /cache/build/builder-amdci4-5/julialang/julia-release-1-dot-11/src/interpreter.c:223
eval_stmt_value at /cache/build/builder-amdci4-5/julialang/julia-release-1-dot-11/src/interpreter.c:174 [inlined]
eval_body at /cache/build/builder-amdci4-5/julialang/julia-release-1-dot-11/src/interpreter.c:659
jl_interpret_toplevel_thunk at /cache/build/builder-amdci4-5/julialang/julia-release-1-dot-11/src/interpreter.c:817
jl_toplevel_eval_flex at /cache/build/builder-amdci4-5/julialang/julia-release-1-dot-11/src/toplevel.c:943
jl_toplevel_eval_flex at /cache/build/builder-amdci4-5/julialang/julia-release-1-dot-11/src/toplevel.c:886
ijl_toplevel_eval_in at /cache/build/builder-amdci4-5/julialang/julia-release-1-dot-11/src/toplevel.c:994
eval at ./boot.jl:428 [inlined]
include_string at ./loading.jl:2543
include_string at ./loading.jl:2553 [inlined]
exec_options at ./client.jl:316
_start at ./client.jl:526
jfptr__start_70826.1 at /home/vchuravy/.julia/juliaup/julia-1.11.0-beta2+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
jl_apply at /cache/build/builder-amdci4-5/julialang/julia-release-1-dot-11/src/julia.h:2154 [inlined]
true_main at /cache/build/builder-amdci4-5/julialang/julia-release-1-dot-11/src/jlapi.c:900
jl_repl_entrypoint at /cache/build/builder-amdci4-5/julialang/julia-release-1-dot-11/src/jlapi.c:1059
main at /cache/build/builder-amdci4-5/julialang/julia-release-1-dot-11/cli/loader_exe.c:58
unknown function (ip: 0x75d83613fc87)
__libc_start_main at /usr/lib/libc.so.6 (unknown line)
unknown function (ip: 0x4010b8)
Allocations: 197283865 (Pool: 197277709; Big: 6156); GC: 17

The biggest issue is codegen_world_age, the observed world_age within the precompilation process has no relation to world ages in the loaded file. Internally Julia handles this during the deserialization/serialization process of world-age sensitive data structures like MethodInstance/CodeInstance.

Which I presume is needing to rewrite all the constant int to ptr's to be some relocatable address?

There are two stages here.

  1. Can we make the inference / AbstractInterpreter caches relocatable
  2. Can we cache the emitted LLVM IR/object code

The former is "just" normal Julia datastructures and as long as we don't cache pointers or constanst out of thin air, we should be fine. The biggest offender here is autodiff_deferred (motivation behind JuliaGPU/GPUCompiler.jl#582) and codegen_world_age generally any use of a @generated to create a constant must be audited.

Once we can handle that, the question arises for output of Enzyme and that is a bigger and more dificult problem.
This will likely need tigher integration with Julia through a compiler plugin of some sort or Julia base being able to query us during image emission for the IR.

@wsmoses
Copy link
Member

wsmoses commented Jun 24, 2024

So the big problem with worldage [which iw ould love to swap off of], is that you can't use a methodinstance in a val. This means that our generated code will be much more costly otherwise.

Could we somehow make a methodinstance as legal within a val?

autodiff(Reverse, mul, Active, Active(1.0), Active(2.0))
autodiff(Forward, mul, Duplicated, Duplicated(1.0, 1.0), Const(2.0))
autodiff(ReverseMode{false,InlineABI,false}(), mul, Active, Active(1.0), Active(2.0))
# Non-Inline mode uses `@generated` functions and poisons the caller
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inline mode also uses generated functions, but it uses an llvmcall not a call to custom jit

@vchuravy
Copy link
Member Author

vchuravy commented Jun 24, 2024

No we can't pass an MI to a generated function. What I think the system should look like is something like this:

mutable struct EnzymeThunk
   adjoint_TT # and other egal key data
   next::EnzymeThunk
   fptr
   # maybe CodeInstance to have a world-age range
end

primal_mi = method_instance(primal_TT)
thunks = enzyme_cache[primal_mi] # IDDict
thunk = linear_search(thunks, adjoint_TT)
thunk(adjoint_args...)

Julia can find the MI of a primal signature in about [12ns-80ns https://github.com/JuliaGPU/GPUCompiler.jl/pull/553#issuecomment-1989281301], and IDDict lookup adds another 12ns and a linear search is often under 4ns.

This is very similar to the cost of Julia dynamic dispatch, with the only optimization missing that Julia
use a call_cache based on the return address of the dynamic dispatch:

https://github.com/JuliaLang/julia/blob/696d9c3e691a613af9067c9a5d47558881354a55/src/gf.c#L3202

https://github.com/JuliaLang/julia/blob/696d9c3e691a613af9067c9a5d47558881354a55/src/gf.c#L3295

@wsmoses
Copy link
Member

wsmoses commented Jun 24, 2024

Is it possible to hook into Julia’s AI itself to be able to resolve the definition here.

Or perhaps maybe something something opaque closure.

Given that we get issues opened for “100x slowdown” when the gradient function has a dynamic dispatch I’m worried of the impact on downstream folks

@vchuravy
Copy link
Member Author

Given that we get issues opened for “100x slowdown” when the gradient function has a dynamic dispatch I’m worried of the impact on downstream folks

I don't particularly care about the performance argument here, for me this is first and foremost a correctness issue. Once we have a correct implementation we can evaluate performance (instead of speculating) and see if there is anything we can do.

Enzyme world-age handling is incorrect, it treats a range as a singular point and it's a runtime value, not a codegen constant one.

Long-term proper compiler plugin support in Julia may fix that, like the idea I sketched out in #1443. Inside the abstract interpreter we can correctly compute these properties. But that work stalled with me focusing on my thesis, so it is unlikely to land for 1.12

@wsmoses
Copy link
Member

wsmoses commented Jun 24, 2024

Maybe we can start by adding another "ABI" here which doesn't use the generated function so we can more easily test this out and A/B test things?

@vchuravy
Copy link
Member Author

This not part ABI call. This is before to lookup the thunk that we need to call, IIRC.

I would appreciate if you could take a stab at this. The code has grown quite complex over the years and I don't have the bandwidth right now to untangle it.

@wsmoses
Copy link
Member

wsmoses commented Jun 24, 2024

Yeah I'll take a stab at it late next week.

And ABI is just a placeholder type in the mode, so my thought is to overload autodiff with it to change going into the thunk [and thus not call the generated one].

That way we have both [and at any point we can change DeaultABI to point to whatever we want]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants