[LLVM] Large code model on AArch64 broken with RTDyld #42295

Closed
anandijain opened this issue Sep 17, 2021 · 15 comments · Fixed by #49745
Labels
bug Indicates an unexpected problem or unintended behavior compiler:llvm For issues that relate to LLVM system:arm ARMv7 and AArch64 upstream The issue is with an upstream dependency, e.g. LLVM

Comments

@anandijain
Contributor

See SciML/OrdinaryDiffEq.jl#1493.
I wasn't able to find a smaller reproducer, and I couldn't find any related issues here.

environment

julia> versioninfo()
Julia Version 1.7.0-rc1
Commit 9eade6195e (2021-09-12 06:45 UTC)
Platform Info:
  OS: macOS (arm64-apple-darwin20.5.0)
  CPU: Apple M1
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, cyclone)
Environment:
  JULIA_NUM_THREADS = 4
  JULIA_EDITOR = code

(MTK) pkg> st
     Project MTK v0.1.0
      Status `~/.julia/dev/MTK/Project.toml`
  [479239e8] Catalyst v9.0.0
  [82cc6244] DataInterpolations v3.6.0
  [9fdde737] DiffEqOperators v4.32.0
  [0c46a032] DifferentialEquations v6.19.0
  [5b8099bc] DomainSets v0.5.6
  [961ee093] ModelingToolkit v6.4.9
  [1dea7af3] OrdinaryDiffEq v5.64.0
  [91a5bcdd] Plots v1.22.0
  [0c5d862f] Symbolics v3.2.3
  [1986cc42] Unitful v1.9.0

mwe

using OrdinaryDiffEq, ModelingToolkit, DiffEqOperators, DomainSets
@parameters t x
@parameters Dn, Dp
@variables u(..) v(..)
Dt = Differential(t)
Dx = Differential(x)
Dxx = Differential(x)^2

eqs  = [Dt(u(t,x)) ~ Dn * Dxx(u(t,x)) + u(t,x)*v(t,x), 
        Dt(v(t,x)) ~ Dp * Dxx(v(t,x)) - u(t,x)*v(t,x)]
bcs = [u(0,x) ~ sin(pi*x/2),
       v(0,x) ~ sin(pi*x/2),
       u(t,0) ~ 0.0, Dx(u(t,1)) ~ 0.0,
       v(t,0) ~ 0.0, Dx(v(t,1)) ~ 0.0]

domains = [t ∈ Interval(0.0,1.0),
           x ∈ Interval(0.0,1.0)]

@named pdesys = PDESystem(eqs,bcs,domains,[t,x],[u(t,x),v(t,x)],[Dn=>0.5, Dp=>2])
discretization = MOLFiniteDifference([x=>0.1],t)
prob = discretize(pdesys,discretization)
sol = solve(prob,Tsit5())

trace

julia> sol = solve(prob,Tsit5())

signal (11): Segmentation fault: 11
in expression starting at REPL[16]:1
ndigits0zpb at ./intfuncs.jl:0
ndigits0z at ./intfuncs.jl:605
< at ./rational.jl:408
ode_determine_initdt at /Users/anand/.julia/packages/OrdinaryDiffEq/8K0Aj/src/initdt.jl:120
auto_dt_reset! at /Users/anand/.julia/packages/OrdinaryDiffEq/8K0Aj/src/integrators/integrator_interface.jl:329 [inlined]
handle_dt! at /Users/anand/.julia/packages/OrdinaryDiffEq/8K0Aj/src/solve.jl:504
#__init#476 at /Users/anand/.julia/packages/OrdinaryDiffEq/8K0Aj/src/solve.jl:466
__init at /Users/anand/.julia/packages/OrdinaryDiffEq/8K0Aj/src/solve.jl:67
__init at /Users/anand/.julia/packages/OrdinaryDiffEq/8K0Aj/src/solve.jl:67 [inlined]
__init at /Users/anand/.julia/packages/OrdinaryDiffEq/8K0Aj/src/solve.jl:67 [inlined]
__init at /Users/anand/.julia/packages/OrdinaryDiffEq/8K0Aj/src/solve.jl:67 [inlined]
__init at /Users/anand/.julia/packages/OrdinaryDiffEq/8K0Aj/src/solve.jl:67 [inlined]
#__solve#475 at /Users/anand/.julia/packages/OrdinaryDiffEq/8K0Aj/src/solve.jl:4 [inlined]
__solve at /Users/anand/.julia/packages/OrdinaryDiffEq/8K0Aj/src/solve.jl:4 [inlined]
#solve_call#42 at /Users/anand/.julia/packages/DiffEqBase/OPDgm/src/solve.jl:61 [inlined]
solve_call at /Users/anand/.julia/packages/DiffEqBase/OPDgm/src/solve.jl:48 [inlined]
#solve_up#44 at /Users/anand/.julia/packages/DiffEqBase/OPDgm/src/solve.jl:87 [inlined]
solve_up at /Users/anand/.julia/packages/DiffEqBase/OPDgm/src/solve.jl:78 [inlined]
#solve#43 at /Users/anand/.julia/packages/DiffEqBase/OPDgm/src/solve.jl:73 [inlined]
solve at /Users/anand/.julia/packages/DiffEqBase/OPDgm/src/solve.jl:68
unknown function (ip: 0x10f4167ab)
jl_apply_generic at /Applications/Julia-1.7.app/Contents/Resources/julia/lib/julia/libjulia-internal.1.7.dylib (unknown line)
do_call at /Applications/Julia-1.7.app/Contents/Resources/julia/lib/julia/libjulia-internal.1.7.dylib (unknown line)
eval_body at /Applications/Julia-1.7.app/Contents/Resources/julia/lib/julia/libjulia-internal.1.7.dylib (unknown line)
jl_interpret_toplevel_thunk at /Applications/Julia-1.7.app/Contents/Resources/julia/lib/julia/libjulia-internal.1.7.dylib (unknown line)
jl_toplevel_eval_flex at /Applications/Julia-1.7.app/Contents/Resources/julia/lib/julia/libjulia-internal.1.7.dylib (unknown line)
jl_toplevel_eval_flex at /Applications/Julia-1.7.app/Contents/Resources/julia/lib/julia/libjulia-internal.1.7.dylib (unknown line)
jl_toplevel_eval_flex at /Applications/Julia-1.7.app/Contents/Resources/julia/lib/julia/libjulia-internal.1.7.dylib (unknown line)
jl_toplevel_eval_in at /Applications/Julia-1.7.app/Contents/Resources/julia/lib/julia/libjulia-internal.1.7.dylib (unknown line)
eval at ./boot.jl:373 [inlined]
eval_user_input at /Users/sabae/src/julia/usr/share/julia/stdlib/v1.7/REPL/src/REPL.jl:150
repl_backend_loop at /Users/sabae/src/julia/usr/share/julia/stdlib/v1.7/REPL/src/REPL.jl:244
start_repl_backend at /Users/sabae/src/julia/usr/share/julia/stdlib/v1.7/REPL/src/REPL.jl:229
#run_repl#47 at /Users/sabae/src/julia/usr/share/julia/stdlib/v1.7/REPL/src/REPL.jl:362
run_repl at /Users/sabae/src/julia/usr/share/julia/stdlib/v1.7/REPL/src/REPL.jl:349
jl_apply_generic at /Applications/Julia-1.7.app/Contents/Resources/julia/lib/julia/libjulia-internal.1.7.dylib (unknown line)
#929 at ./client.jl:394
jfptr_YY.929_33876 at /Applications/Julia-1.7.app/Contents/Resources/julia/lib/julia/sys.dylib (unknown line)
jl_apply_generic at /Applications/Julia-1.7.app/Contents/Resources/julia/lib/julia/libjulia-internal.1.7.dylib (unknown line)
jl_f__call_latest at /Applications/Julia-1.7.app/Contents/Resources/julia/lib/julia/libjulia-internal.1.7.dylib (unknown line)
#invokelatest#2 at ./essentials.jl:716 [inlined]
invokelatest at ./essentials.jl:714 [inlined]
run_main_repl at ./client.jl:379
exec_options at ./client.jl:309
_start at ./client.jl:495
jfptr__start_33611 at /Applications/Julia-1.7.app/Contents/Resources/julia/lib/julia/sys.dylib (unknown line)
jl_apply_generic at /Applications/Julia-1.7.app/Contents/Resources/julia/lib/julia/libjulia-internal.1.7.dylib (unknown line)
true_main at /Applications/Julia-1.7.app/Contents/Resources/julia/lib/julia/libjulia-internal.1.7.dylib (unknown line)
jl_repl_entrypoint at /Applications/Julia-1.7.app/Contents/Resources/julia/lib/julia/libjulia-internal.1.7.dylib (unknown line)
Allocations: 207032348 (Pool: 206988583; Big: 43765); GC: 123

I've tested that it works on Linux with 1.7.0-rc1.

@ararslan ararslan added system:arm ARMv7 and AArch64 bug Indicates an unexpected problem or unintended behavior system:mac Affects only macOS labels Sep 17, 2021
@Keno
Member

Keno commented Sep 17, 2021

This is the usual strange M1 crash. It likes to hide whenever someone pokes at it with a debugger.

@Keno
Member

Keno commented Sep 17, 2021

After much fighting, I managed to trap this in lldb:

    0x12778571c: 0xa9037bfd   stp    x29, x30, [sp, #0x30]
    0x127785720: 0xaa010008   orr    x8, x0, x1
    0x127785724: 0xb40003a8   cbz    x8, 0x127785798
    0x127785728: 0xaa0203f3   mov    x19, x2
    0x12778572c: 0xeb0003e8   negs   x8, x0
    0x127785730: 0xfa0103e9   ngcs   x9, x1
    0x127785734: 0xf100003f   cmp    x1, #0x0                  ; =0x0 
    0x127785738: 0x9a81b121   csel   x1, x9, x1, lt
    0x12778573c: 0x9a80b100   csel   x0, x8, x0, lt
    0x127785740: 0xd1000848   sub    x8, x2, #0x2              ; =0x2 
    0x127785744: 0x93c80508   ror    x8, x8, #0x1
    0x127785748: 0xf1001d1f   cmp    x8, #0x7                  ; =0x7 
    0x12778574c: 0x540007c8   b.hi   0x127785844
    0x127785750: 0xd0c056e9   adrp   x9, -521506
    0x127785754: 0x9116a129   add    x9, x9, #0x5a8            ; =0x5a8 
    0x127785758: 0x1000008a   adr    x10, #0x10
->  0x12778575c: 0x3868692b   ldrb   w11, [x9, x8]
    0x127785760: 0x8b0b094a   add    x10, x10, x11, lsl #2
    0x127785764: 0xd61f0140   br     x10
    0x127785768: 0xdac01008   clz    x8, x0

x9 at this point is 0x00000000a82635a8, which is not mapped. I'm wondering if what's happening here is that we're overflowing the 4GB window for adrp references.
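
The value of x9 can be reproduced by hand from the listing. Here is a minimal Python sketch, assuming the standard A64 ADRP encoding (immlo in bits 30:29, immhi in bits 23:5, a signed 21-bit page count). The helper names are illustrative and do not come from the thread.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


def decode_adrp(insn):
    """Return (destination register, signed page offset) for an ADRP word."""
    # ADRP has bit 31 set and bits 28:24 equal to 0b10000.
    assert (insn & 0x9F000000) == 0x90000000, "not an ADRP instruction"
    immlo = (insn >> 29) & 0x3
    immhi = (insn >> 5) & 0x7FFFF
    return insn & 0x1F, sign_extend((immhi << 2) | immlo, 21)


# Values copied from the lldb listing above.
pc = 0x127785750                # address of the adrp instruction
rd, pages = decode_adrp(0xD0C056E9)

# adrp: page of pc plus the signed page offset; add: low 12 bits (#0x5a8).
page_base = (pc & ~0xFFF) + (pages << 12)
x9 = page_base + 0x5A8

print(f"x{rd} = {x9:#x} (page offset {pages})")
```

The result matches the 0x00000000a82635a8 reported above, so the unmapped address is what the instruction itself encodes rather than a clobbered register. That is consistent with an out-of-range offset having been written at relocation time, though it does not prove it.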

@Keno
Copy link
Member

Keno commented Sep 18, 2021

This looks to be a jump table. There is a very suspicious "if macOS" check right in the code that emits this sequence:
https://github.com/llvm/llvm-project/blob/6f7483b1ece4747f2aafe4baa05fc07e7dc9ed9d/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp#L7687-L7689
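
The dispatch sequence in the lldb listing (sub/ror/cmp/b.hi, then ldrb/add/br) can be modelled as below. This is a sketch under two assumptions: x2 carries the numeric base passed to `ndigits0zpb`, and the byte table holds word offsets from the `adr` anchor. The table contents are hypothetical, since the real table was at the unmapped address and was never read.

```python
MASK64 = (1 << 64) - 1


def ror64(value, shift):
    """Rotate a 64-bit value right by `shift` bits."""
    value &= MASK64
    return ((value >> shift) | (value << (64 - shift))) & MASK64


def jump_table_index(base):
    """Model `sub x8, x2, #2; ror x8, x8, #1; cmp x8, #7; b.hi`.

    Returns the table index, or None when the default branch is taken.
    Odd bases rotate their low bit into bit 63 and fall out of range.
    """
    idx = ror64(base - 2, 1)
    return idx if idx <= 7 else None


def dispatch(base, table, anchor):
    """Model `ldrb w11, [x9, x8]; add x10, x10, x11, lsl #2; br x10`."""
    idx = jump_table_index(base)
    if idx is None:
        return None
    return anchor + (table[idx] << 2)


# `adr x10, #0x10` at 0x127785758 yields the anchor for the branch targets.
ANCHOR = 0x127785758 + 0x10
# HYPOTHETICAL table contents, for illustration only.
HYPOTHETICAL_TABLE = bytes([0, 4, 8, 12, 16, 20, 24, 28])

for base in (2, 3, 4, 8, 10, 16, 18):
    print(base, jump_table_index(base))
```

Only the table load depends on the `adrp`/`add` pair. The branch targets are computed from the nearby `adr` anchor, which fits the crash landing on the `ldrb` rather than on the `br`.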

@Keno
Member

Keno commented Sep 18, 2021

Simply removing that doesn't work though, because MachO appears to not have an appropriate relocation to support the large code model:

 @named pdesys = PDESystem(eqs,bcs,domains,[t,x],[u(t,x),v(t,x)],[Dn=>0.5, Dp=>2])
ERROR: MethodError: <unknown>:0: error: unknown AArch64 fixup kind!

I think this is the point where we'll need to get advice from the Apple LLVM folks.

@Keno Keno added this to the 1.7 milestone Sep 18, 2021
@ViralBShah ViralBShah added the system:apple silicon Affects Apple Silicon only (Darwin/ARM64) - e.g. M1 and other M-series chips label Sep 19, 2021
@Keno Keno added the upstream The issue is with an upstream dependency, e.g. LLVM label Sep 20, 2021
@KristofferC
Member

Could you elaborate on why this should be on the milestone? M1 is not tier 1 so by definition, issues on M1 are not release blocking.

@KristofferC KristofferC removed this from the 1.7 milestone Sep 21, 2021
@vchuravy
Member

Keno mentioned https://build.julialang.org/#builders/7/builds/3874 as probably related (Linux AArch64).

      From worker 3:	julia: /workspace/srcdir/llvm-project/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp:441: void llvm::RuntimeDyldELF::resolveAArch64Relocation(const llvm::SectionEntry&, uint64_t, uint64_t, uint32_t, int64_t): Assertion `isInt<33>(Result) && "overflow check failed for relocation"' failed.
      From worker 3:	
      From worker 3:	signal (6): Aborted
      From worker 3:	in expression starting at /buildworker/worker/tester_linuxaarch64/build/share/julia/test/ccall.jl:1180
      From worker 3:	gsignal at /lib/aarch64-linux-gnu/libc.so.6 (unknown line)
      From worker 3:	abort at /lib/aarch64-linux-gnu/libc.so.6 (unknown line)
      From worker 3:	Allocations: 913498925 (Pool: 913038572; Big: 460353); GC: 700
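
The `isInt<33>` assertion can be read as a range check on a page-relative offset. The sketch below assumes the failing relocation is an ADRP-style one, where the difference between the target page and the instruction's page must fit in a signed 33-bit value (roughly ±4 GiB). The function names and the target address are hypothetical; only the instruction address and the wrapped offset come from the macOS listing earlier in the thread.

```python
def is_int(value, bits):
    """Python analogue of llvm::isInt<bits>: does value fit in `bits` signed bits?"""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def page(addr):
    """Clear the low 12 bits, giving the 4 KiB page address."""
    return addr & ~0xFFF


def checked_page_delta(target, place):
    """Page delta for an ADRP-style relocation, rejecting out-of-range targets."""
    delta = page(target) - page(place)
    if not is_int(delta, 33):
        raise OverflowError(f"page delta {delta:#x} does not fit in 33 bits")
    return delta


def truncated_imm21(target, place):
    """What gets encoded if the delta is silently truncated to 21 page bits."""
    pages = ((page(target) - page(place)) >> 12) & 0x1FFFFF
    return pages - (1 << 21) if pages & (1 << 20) else pages


PLACE = 0x127785750               # adrp address from the lldb listing
# HYPOTHETICAL target: one of many addresses that would alias to the
# observed offset. The real target is not known from the thread.
HYPOTHETICAL_TARGET = 0x2A82635A8

print(truncated_imm21(HYPOTHETICAL_TARGET, PLACE))   # wrapped page offset
try:
    checked_page_delta(HYPOTHETICAL_TARGET, PLACE)
except OverflowError as err:
    print("rejected:", err)
```

With assertions enabled, a linker performing this check aborts as in the log above. Without it, the truncated offset is encoded and the code later loads from a wrapped address, which matches the symptom in the macOS trace.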

@vtjnash vtjnash removed system:mac Affects only macOS system:apple silicon Affects Apple Silicon only (Darwin/ARM64) - e.g. M1 and other M-series chips labels Nov 10, 2021
@vtjnash
Member

vtjnash commented Nov 10, 2021

Removing labels since it seems to be a general AArch64 problem: https://build.julialang.org/#/builders/7/builds/5808/steps/5/logs/stdio

@vchuravy
Member

vchuravy commented Jan 9, 2022

Since the log above disappeared, here is a recent one:

      From worker 2:	julia: /workspace/srcdir/llvm-project/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp:479: void llvm::RuntimeDyldELF::resolveAArch64Relocation(const llvm::SectionEntry&, uint64_t, uint64_t, uint32_t, int64_t): Assertion `isInt<33>(Result) && "overflow check failed for relocation"' failed.
      From worker 2:	
      From worker 2:	signal (6): Aborted
      From worker 2:	in expression starting at /buildworker/worker/tester_linuxaarch64/build/share/julia/test/read.jl:625
      From worker 2:	gsignal at /lib/aarch64-linux-gnu/libc.so.6 (unknown line)
      From worker 2:	abort at /lib/aarch64-linux-gnu/libc.so.6 (unknown line)
      From worker 2:	Allocations: 1090329829 (Pool: 1089702543; Big: 627286); GC: 841

#43664 seems to address it for aarch64-darwin, but JITLink support for aarch64-elf is nascent on LLVM 14 (https://github.com/llvm/llvm-project/commits/main/llvm/lib/ExecutionEngine/JITLink/ELF_aarch64.cpp)

@vchuravy vchuravy changed the title [m1 apple aarch darwin] 1.7.0rc1 < rational segfault [LLVM] Large code model on AArch64 broken with RTDyld Jan 9, 2022
@Keno
Member

Keno commented Jan 20, 2022

Fixed by #43664

@Keno Keno closed this as completed Jan 20, 2022
@anandijain
Contributor Author

Does this go into 1.7, or do I need to wait for 1.8?

@Keno
Member

Keno commented Jan 20, 2022

1.8. It depends on an LLVM upgrade, which we do not backport.

@vchuravy vchuravy reopened this Jan 20, 2022
@vchuravy
Member

Reopened since this is still an issue for non-darwin.

@vtjnash
Member

vtjnash commented Jan 20, 2022

Why did we not enable the fix for non-darwin too, then?

@vchuravy
Member

Because JITLink on LLVM 13 is not there yet for aarch64-gnu, IIRC. I think we need to wait for LLVM 14.

@giordano giordano added the compiler:llvm For issues that relate to LLVM label Feb 20, 2023
MikaelSmith pushed a commit to MikaelSmith/impala that referenced this issue Aug 15, 2023
Impala on Graviton v2 is failing during data load on
  INSERT INTO TABLE tpch_kudu.lineitem SELECT * FROM tpch.lineitem
with
  void llvm::RuntimeDyldELF::resolveAArch64Relocation(const
  llvm::SectionEntry&, uint64_t, uint64_t, uint32_t, int64_t): Assertion
  `isInt<33>(Result) && "overflow check failed for relocation"' failed.

The closest case I could find to this is
JuliaLang/julia#42295. Trying a similar fix of
setting CodeModel to Small for aarch64, although this may have other
issues
(https://github.com/llvm/llvm-project/blob/llvmorg-5.0.1/llvm/lib/Target/AArch64/MCTargetDesc/AArch64MCTargetDesc.cpp#L85-L89).

Change-Id: Idb87144ba38e3bedacac83c0f86093916f026c4f

8 participants