Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strip non-integral address space from datalayout after GC lowering #36705

Merged
merged 1 commit into from
Jul 24, 2020

Conversation

yuyichao
Copy link
Contributor

@yuyichao yuyichao commented Jul 17, 2020

This allows LLVM codegen passes to insert inttoptr and ptrtoint as it wish,
and avoids hitting any illegal ones in those passes.

Fix #36062

@yuyichao
Copy link
Contributor Author

yuyichao commented Jul 17, 2020

Note that even after this there seems to be issues still if one add a verifier after the native codegen.

Instruction does not dominate all uses!
  %6 = bitcast i448* %5 to i128*
  %4 = bitcast i128* %6 to i8*
in function julia_now_16891

which will trigger assertion in the distructor. Without assertion or additional verifier the compilation finishes OK...


define internal void @julia_now_16891([1 x [1 x [1 x i64]]]* noalias nocapture sret %0) !dbg !137639 {
top:
  %1 = alloca {} addrspace(10)*, i32 3
  %gcframe = alloca {} addrspace(10)*, i32 3, align 16
  %2 = bitcast {} addrspace(10)** %gcframe to i8*
  call void @llvm.memset.p0i8.i32(i8* align 16 %2, i8 0, i32 12, i1 false), !tbaa !8249
  %3 = alloca i128, align 16
  %4 = bitcast i128* %6 to i8*
  %5 = alloca i448, align 16
  %6 = bitcast i448* %5 to i128*
  %7 = alloca i32, align 8
  %8 = alloca i64, align 8
  %thread_ptr = call i8* asm "mrc p15, 0, $0, c13, c0, 3", "=r"()
  %ptls_i8 = getelementptr i8, i8* %thread_ptr, i32 8
  %ptls = bitcast i8* %ptls_i8 to {}***
  %9 = bitcast {} addrspace(10)** %gcframe to {} addrspace(10)**
  %10 = bitcast {} addrspace(10)** %9 to i32*
  store i32 4, i32* %10, align 4, !tbaa !8249
  %11 = bitcast {}*** %ptls to {}***
  %12 = load {}**, {}*** %11, align 4
  %13 = getelementptr {} addrspace(10)*, {} addrspace(10)** %gcframe, i32 1
  %14 = bitcast {} addrspace(10)** %13 to {}***
  store {}** %12, {}*** %14, align 4, !tbaa !8249
  %15 = bitcast {}*** %11 to {} addrspace(10)***
  store {} addrspace(10)** %gcframe, {} addrspace(10)*** %15, align 4
  call void @llvm.lifetime.start.p0i8(i64 16, i8* nonnull %4)
  %16 = bitcast i128* %6 to i8*, !dbg !137640
  %17 = load atomic i32 (i8*)*, i32 (i8*)** bitcast (void ()** @jlplt_jl_gettimeofday_16894_got to i32 (i8*)**) unordered, align 4, !dbg !137640
  %18 = call i32 %17(i8* nonnull %16), !dbg !137640
  %19 = icmp eq i32 %18, 0, !dbg !137643
  br i1 %19, label %L10, label %L8, !dbg !137647

L8:                                               ; preds = %top
  %20 = bitcast i128* %6 to i8*
  %21 = load {}*, {}** @jl_globalYY.1777, align 4, !dbg !137647, !tbaa !8259, !nonnull !4
  %22 = addrspacecast {}* %21 to {} addrspace(10)*, !dbg !137647
  %23 = load {}*, {}** @jl_globalYY.5951, align 4, !dbg !137647, !tbaa !8259, !nonnull !4
  %24 = addrspacecast {}* %23 to {} addrspace(10)*, !dbg !137647
  call void @llvm.lifetime.end.p0i8(i64 16, i8* nonnull %20)
  %25 = call {} addrspace(10)* @jl_box_int32(i32 signext %18), !dbg !137647
  %26 = getelementptr {} addrspace(10)*, {} addrspace(10)** %gcframe, i32 2
  store {} addrspace(10)* %25, {} addrspace(10)** %26
  %27 = bitcast {} addrspace(10)** %1 to {} addrspace(10)**, !dbg !137647
  store {} addrspace(10)* %24, {} addrspace(10)** %27, align 4, !dbg !137647
  %28 = getelementptr {} addrspace(10)*, {} addrspace(10)** %1, i32 1, !dbg !137647
  store {} addrspace(10)* %25, {} addrspace(10)** %28, align 4, !dbg !137647
  %29 = call nonnull {} addrspace(10)* @jl_apply_generic({} addrspace(10)* %22, {} addrspace(10)** %1, i32 2), !dbg !137647
  call void @llvm.trap(), !dbg !137647
  unreachable, !dbg !137647

L10:                                              ; preds = %top
  %30 = bitcast i448* %5 to i8*
  %31 = bitcast i128* %6 to i8*
  %32 = bitcast i128* %6 to [2 x i64]*, !dbg !137648
  %.elt = bitcast i128* %6 to i64*, !dbg !137648
  %.unpack = load i64, i64* %.elt, align 16, !dbg !137648, !tbaa !8286
  %.elt10 = getelementptr inbounds [2 x i64], [2 x i64]* %32, i32 0, i32 1, !dbg !137648
  %.unpack11 = load i64, i64* %.elt10, align 8, !dbg !137648, !tbaa !8286
  call void @llvm.lifetime.end.p0i8(i64 16, i8* nonnull %31)
  call void @llvm.lifetime.start.p0i8(i64 56, i8* nonnull %30)
  %33 = bitcast i448* %5 to i32*, !dbg !137653
  store i32 0, i32* %33, align 16, !dbg !137653, !tbaa !8286
  %34 = bitcast i448* %5 to i8*, !dbg !137653
  %35 = getelementptr inbounds i8, i8* %34, i32 4, !dbg !137653
  %36 = bitcast i8* %35 to i32*, !dbg !137653
  store i32 0, i32* %36, align 4, !dbg !137653, !tbaa !8286
  %37 = bitcast i448* %5 to i8*, !dbg !137653
  %38 = getelementptr inbounds i8, i8* %37, i32 8, !dbg !137653
  %39 = bitcast i8* %38 to i32*, !dbg !137653
  store i32 0, i32* %39, align 8, !dbg !137653, !tbaa !8286
  %40 = bitcast i448* %5 to i8*, !dbg !137653
  %41 = getelementptr inbounds i8, i8* %40, i32 12, !dbg !137653
  %42 = bitcast i8* %41 to i32*, !dbg !137653
  store i32 0, i32* %42, align 4, !dbg !137653, !tbaa !8286
  %43 = bitcast i448* %5 to i8*, !dbg !137653
  %44 = getelementptr inbounds i8, i8* %43, i32 16, !dbg !137653
  %45 = bitcast i8* %44 to i32*, !dbg !137653
  store i32 0, i32* %45, align 16, !dbg !137653, !tbaa !8286
  %46 = bitcast i448* %5 to i8*, !dbg !137653
  %47 = getelementptr inbounds i8, i8* %46, i32 20, !dbg !137653
  %48 = bitcast i8* %47 to i32*, !dbg !137653
  store i32 0, i32* %48, align 4, !dbg !137653, !tbaa !8286
  %49 = bitcast i448* %5 to i8*, !dbg !137653
  %50 = getelementptr inbounds i8, i8* %49, i32 24, !dbg !137653
  %51 = bitcast i8* %50 to i32*, !dbg !137653
  store i32 0, i32* %51, align 8, !dbg !137653, !tbaa !8286
  %52 = bitcast i448* %5 to i8*, !dbg !137653
  %53 = getelementptr inbounds i8, i8* %52, i32 28, !dbg !137653
  %54 = bitcast i8* %53 to i32*, !dbg !137653
  store i32 0, i32* %54, align 4, !dbg !137653, !tbaa !8286
  %55 = bitcast i448* %5 to i8*, !dbg !137653
  %56 = getelementptr inbounds i8, i8* %55, i32 32, !dbg !137653
  %57 = bitcast i8* %56 to i32*, !dbg !137653
  store i32 0, i32* %57, align 16, !dbg !137653, !tbaa !8286
  %58 = bitcast i448* %5 to i8*, !dbg !137653
  %59 = getelementptr inbounds i8, i8* %58, i32 36, !dbg !137653
  %60 = bitcast i8* %59 to i32*, !dbg !137653
  store i32 0, i32* %60, align 4, !dbg !137653, !tbaa !8286
  %61 = bitcast i448* %5 to i8*, !dbg !137653
  %62 = getelementptr inbounds i8, i8* %61, i32 40, !dbg !137653
  %63 = bitcast i8* %62 to i32*, !dbg !137653
  store i32 0, i32* %63, align 8, !dbg !137653, !tbaa !8286
  %64 = bitcast i448* %5 to i8*, !dbg !137653
  %65 = getelementptr inbounds i8, i8* %64, i32 44, !dbg !137653
  %66 = bitcast i8* %65 to i32*, !dbg !137653
  store i32 0, i32* %66, align 4, !dbg !137653, !tbaa !8286
  %67 = bitcast i448* %5 to i8*, !dbg !137653
  %68 = getelementptr inbounds i8, i8* %67, i32 48, !dbg !137653
  %69 = bitcast i8* %68 to i32*, !dbg !137653
  store i32 0, i32* %69, align 16, !dbg !137653, !tbaa !8286
  %70 = bitcast i448* %5 to i8*, !dbg !137653
  %71 = getelementptr inbounds i8, i8* %70, i32 52, !dbg !137653
  %72 = bitcast i8* %71 to i32*, !dbg !137653
  store i32 0, i32* %72, align 4, !dbg !137653, !tbaa !8286
  %73 = trunc i64 %.unpack to i32, !dbg !137657
  %74 = sext i32 %73 to i64, !dbg !137672
  %75 = icmp eq i64 %.unpack, %74, !dbg !137673
  br i1 %75, label %L25, label %L20, !dbg !137673

L20:                                              ; preds = %L10
  %76 = bitcast i448* %5 to i8*
  %77 = load {}*, {}** @jl_globalYY.16, align 4, !dbg !137673, !tbaa !8259, !nonnull !4
  %78 = addrspacecast {}* %77 to {} addrspace(10)*, !dbg !137673
  %79 = load {}*, {}** @jl_symYY.trunc61, align 4, !dbg !137673, !tbaa !8259, !nonnull !4
  %80 = addrspacecast {}* %79 to {} addrspace(10)*, !dbg !137673
  %81 = load {}*, {}** @SUM.CoreDOT.Int3260, align 4, !dbg !137673, !tbaa !8259, !nonnull !4, !dereferenceable !8266, !align !8264
  %82 = addrspacecast {}* %81 to {} addrspace(10)*, !dbg !137673
  call void @llvm.lifetime.end.p0i8(i64 56, i8* nonnull %76)
  %83 = call {} addrspace(10)* @jl_box_int64(i64 signext %.unpack), !dbg !137673
  %84 = getelementptr {} addrspace(10)*, {} addrspace(10)** %gcframe, i32 2
  store {} addrspace(10)* %83, {} addrspace(10)** %84
  %85 = bitcast {} addrspace(10)** %1 to {} addrspace(10)**, !dbg !137673
  store {} addrspace(10)* %80, {} addrspace(10)** %85, align 4, !dbg !137673
  %86 = getelementptr {} addrspace(10)*, {} addrspace(10)** %1, i32 1, !dbg !137673
  store {} addrspace(10)* %82, {} addrspace(10)** %86, align 4, !dbg !137673
  %87 = getelementptr {} addrspace(10)*, {} addrspace(10)** %1, i32 2, !dbg !137673
  store {} addrspace(10)* %83, {} addrspace(10)** %87, align 4, !dbg !137673
  %88 = call nonnull {} addrspace(10)* @jl_apply_generic({} addrspace(10)* %78, {} addrspace(10)** %1, i32 3), !dbg !137673
  call void @llvm.trap(), !dbg !137673
  unreachable, !dbg !137673

L25:                                              ; preds = %L10
  %89 = bitcast i32* %7 to i8*
  %90 = bitcast i448* %5 to i8*
  call void @llvm.lifetime.start.p0i8(i64 4, i8* nonnull %89)
  store i32 %73, i32* %7, align 8, !dbg !137665, !tbaa !8286
  %91 = bitcast i32* %7 to i8*, !dbg !137671
  %92 = bitcast i448* %5 to i8*, !dbg !137671
  %93 = load atomic i32 (i8*, i8*)*, i32 (i8*, i8*)** bitcast (void ()** @jlplt_localtime_r_16896_got to i32 (i8*, i8*)**) unordered, align 4, !dbg !137671
  %94 = call i32 %93(i8* nonnull %91, i8* nonnull %92), !dbg !137671
  %95 = bitcast i448* %5 to i32*, !dbg !137674
  %96 = load i32, i32* %95, align 16, !dbg !137674, !tbaa !8286
  %97 = bitcast i448* %5 to i8*, !dbg !137674
  %98 = getelementptr inbounds i8, i8* %97, i32 4, !dbg !137674
  %99 = bitcast i8* %98 to i32*, !dbg !137674
  %100 = load i32, i32* %99, align 4, !dbg !137674, !tbaa !8286
  %101 = bitcast i448* %5 to i8*, !dbg !137674
  %102 = getelementptr inbounds i8, i8* %101, i32 8, !dbg !137674
  %103 = bitcast i8* %102 to i32*, !dbg !137674
  %104 = load i32, i32* %103, align 8, !dbg !137674, !tbaa !8286
  %105 = bitcast i448* %5 to i8*, !dbg !137674
  %106 = getelementptr inbounds i8, i8* %105, i32 12, !dbg !137674
  %107 = bitcast i8* %106 to i32*, !dbg !137674
  %108 = load i32, i32* %107, align 4, !dbg !137674, !tbaa !8286
  %109 = bitcast i448* %5 to i8*, !dbg !137674
  %110 = getelementptr inbounds i8, i8* %109, i32 16, !dbg !137674
  %111 = bitcast i8* %110 to i32*, !dbg !137674
  %112 = load i32, i32* %111, align 16, !dbg !137674, !tbaa !8286
  %113 = add i32 %112, 1, !dbg !137676
  %114 = bitcast i448* %5 to i8*, !dbg !137674
  %115 = getelementptr inbounds i8, i8* %114, i32 20, !dbg !137674
  %116 = bitcast i8* %115 to i32*, !dbg !137674
  %117 = load i32, i32* %116, align 4, !dbg !137674, !tbaa !8286
  %118 = add i32 %117, 1900, !dbg !137676
  %119 = sdiv i64 %.unpack11, 1000, !dbg !137678
  %120 = sext i32 %118 to i64, !dbg !137683
  %121 = sext i32 %113 to i64, !dbg !137683
  %122 = sext i32 %108 to i64, !dbg !137683
  %123 = sext i32 %104 to i64, !dbg !137683
  %124 = sext i32 %100 to i64, !dbg !137683
  %125 = sext i32 %96 to i64, !dbg !137683
  call void @llvm.lifetime.end.p0i8(i64 56, i8* nonnull %90)
  call void @llvm.lifetime.end.p0i8(i64 4, i8* nonnull %89)
  %126 = call nonnull {} addrspace(10)* @julia_validargs_16798(i64 %120, i64 %121, i64 %122, i64 %123, i64 %124, i64 %125, i64 %119, i32 2), !dbg !137690
  %127 = addrspacecast {} addrspace(10)* %126 to {} addrspace(12)*, !dbg !137691
  %128 = load {}*, {}** @jl_globalYY.10, align 4, !dbg !137691, !tbaa !8259, !nonnull !4
  %129 = addrspacecast {}* %128 to {} addrspace(12)*, !dbg !137691
  %130 = icmp eq {} addrspace(12)* %127, %129, !dbg !137691
  br i1 %130, label %L55, label %L74, !dbg !137691

L55:                                              ; preds = %L25
  %131 = sext i32 %96 to i64, !dbg !137683
  %132 = sext i32 %100 to i64, !dbg !137683
  %133 = sext i32 %104 to i64, !dbg !137683
  %134 = sext i32 %108 to i64, !dbg !137683
  %135 = sext i32 %113 to i64, !dbg !137683
  %136 = sext i32 %118 to i64, !dbg !137683
  %137 = bitcast i8* %ptls_i8 to {}***
  %138 = mul nsw i64 %132, 60, !dbg !137692
  %139 = mul nsw i64 %133, 3600, !dbg !137692
  call void @julia_totaldays_16811(i64* noalias nocapture nonnull sret %8, i64 %136, i64 %135, i64 %134), !dbg !137695
  %140 = load i64, i64* %8, align 8, !dbg !137692, !tbaa !8261
  %141 = mul i64 %140, 86400, !dbg !137692
  %142 = add nsw i64 %138, %131, !dbg !137696
  %143 = add nsw i64 %142, %139, !dbg !137696
  %144 = add i64 %143, %141, !dbg !137699
  %145 = mul i64 %144, 1000, !dbg !137692
  %146 = add i64 %119, %145, !dbg !137702
  %.sroa.0.0..sroa_idx313 = bitcast [1 x [1 x [1 x i64]]]* %0 to i64*, !dbg !137675
  store i64 %146, i64* %.sroa.0.0..sroa_idx313, align 8, !dbg !137675
  %147 = getelementptr {} addrspace(10)*, {} addrspace(10)** %gcframe, i32 1
  %148 = load {} addrspace(10)*, {} addrspace(10)** %147, align 4, !tbaa !8249
  %149 = bitcast {}*** %137 to {}***
  %150 = bitcast {}*** %149 to {} addrspace(10)**
  store {} addrspace(10)* %148, {} addrspace(10)** %150, align 4, !tbaa !8249
  ret void, !dbg !137675

L74:                                              ; preds = %L25
  %151 = addrspacecast {} addrspace(10)* %126 to {} addrspace(12)*, !dbg !137691
  %152 = getelementptr {} addrspace(10)*, {} addrspace(10)** %gcframe, i32 2
  store {} addrspace(10)* %126, {} addrspace(10)** %152
  call void @jl_throw({} addrspace(12)* %151), !dbg !137691
  unreachable, !dbg !137691
}

Somehow

  %3 = alloca i128, align 16
  %4 = bitcast i128* %3 to i8*
  %5 = alloca i448, align 16

got turned into

  %3 = alloca i128, align 16
  %4 = bitcast i128* %6 to i8*
  %5 = alloca i448, align 16
  %6 = bitcast i448* %5 to i128*

after the last pass that is supposed to touch the IR.

@vchuravy
Copy link
Member

Would prefer a fix upstream instead of disabling a pass conditionally.

@yuyichao
Copy link
Contributor Author

I agree. Just that this is easy enough to do (and is how I get it past this error with JULIA_LLVM_ARGS ....). I'm not planing to work on a patch on this in the before the end of next week but it should be easy enough to debug at this point...

It turns out that there are two issues causing the additional crash I'm seeing...
The issue with the IR shown above is reported at https://bugs.llvm.org/show_bug.cgi?id=46758 and was probably fixed on master (need to test) by https://reviews.llvm.org/D80101 but the fix isn't really backportable. OTOH, it's actually caused by our AllocOpt pass emitting interleaving alloca and bitcast so it can probably be fixed there instead.

A thrid issue, which is the actual crash I got that I though was caused by the stack coloring bug, is a missing clear in CodeGenPrepare. I believe https://reviews.llvm.org/D84031 fixes it and we'll need to carry the patch.

@yuyichao
Copy link
Contributor Author

yuyichao commented Jul 18, 2020

The following LLVM patch should fix the stack coloring issue on LLVM < 12

diff --git a/llvm/lib/CodeGen/StackColoring.cpp b/llvm/lib/CodeGen/StackColoring.cpp
index 9d4fdc6b624..14b52e0ca33 100644
--- a/llvm/lib/CodeGen/StackColoring.cpp
+++ b/llvm/lib/CodeGen/StackColoring.cpp
@@ -913,6 +913,8 @@ void StackColoring::remapInstructions(DenseMap<int, int> &SlotRemap) {
     assert(To && From && "Invalid allocation object");
     Allocas[From] = To;

+    const_cast<AllocaInst*>(To)->moveBefore(const_cast<Instruction*>(&*To->getParent()->getFirstInsertionPt()));
+
     // AA might be used later for instruction scheduling, and we need it to be
     // able to deduce the correct aliasing releationships between pointers
     // derived from the alloca being remapped and the target of that remapping.

@vchuravy
Copy link
Member

The following LLVM patch should fix the stack coloring issue on LLVM < 12

So that is a backportable version of https://reviews.llvm.org/D80101?

@yuyichao
Copy link
Contributor Author

Yes.

@yuyichao yuyichao changed the title Disable Loop Strength Reduction on ARM Strip non-integral address space from datalayout after GC lowering Jul 18, 2020
@yuyichao
Copy link
Contributor Author

Change this to removing ni from datalayout instead.

Not sure if this is legal (didn't see any other passes doing this but also didn't see the datalayout being obviously cached as value by anyone, and FWIW if someone thinks these are still non-integeral that sounds OK to me...).

Running the test on ARM with this doesn't seem to cause any issues. The only major failure in the test is the vecelement caused by layout disagreement...

@yuyichao
Copy link
Contributor Author

@Keno Any comment on the legality of this? Assert/debug build of LLVM doesn't seem to be complaining at least...

@Keno
Copy link
Member

Keno commented Jul 20, 2020

I think it's probably fine, though I don't think it's explicitly allowed.

This allows LLVM codegen passes to insert `inttoptr` and `ptrtoint` as it wish,
and avoids hitting any illegal ones in those passes.

Fix #36062
@yuyichao yuyichao merged commit 235784a into master Jul 24, 2020
@yuyichao yuyichao deleted the yyc/arm/lsr branch July 24, 2020 19:50
@staticfloat
Copy link
Member

Hmmm, it appears that with this change, the armv7l buildbots simply freeze up during initial bootstrap: https://build.julialang.org/#/builders/57/builds/1849/steps/8/logs/stdio

@yuyichao
Copy link
Contributor Author

Not duriing bootstrap?

IIUC the log was

command timed out: 3600 seconds without output running ['sh', '-c', 'make -j3 VERBOSE=1 TAGGED_RELEASE_BANNER="Official https://julialang.org/ release" SRCCACHE=/tmp/srccache USECCACHE=1 JULIA_CPU_TARGET="armv7-a;armv7-a,neon;armv7-a,neon,vfp4" MARCH=armv7-a  LLVM_ASSERTIONS=1 FORCE_ASSERTIONS=1 USE_BINARYBUILDER=1  release'], attempting to kill
make[1]: *** Deleting file 'aotcompile.o'
make[1]: *** Deleting file 'jitlayers.o'
make[1]: *** Deleting file 'codegen.o'
make[1]: *** [codegen.o] Terminated
make: *** [julia-src-release] Terminated

So it was still compiling the c++ code.... julia-src-release.... I don't think I'm writing invalid C++ code here so there seems to be something specific about the compiler on the builtbot that is hanging on compilation....

@yuyichao
Copy link
Contributor Author

The run right before mine: https://build.julialang.org/#/builders/57/builds/1848 got an ICE on codegen.o, which is the same file that got stuck here.

@jmkuhn jmkuhn mentioned this pull request Aug 5, 2020
25 tasks
KristofferC pushed a commit that referenced this pull request Aug 10, 2020
This allows LLVM codegen passes to insert `inttoptr` and `ptrtoint` as it wish,
and avoids hitting any illegal ones in those passes.

Fix #36062

(cherry picked from commit 235784a)
simeonschaub pushed a commit to simeonschaub/julia that referenced this pull request Aug 11, 2020
…g#36705)

This allows LLVM codegen passes to insert `inttoptr` and `ptrtoint` as it wish,
and avoids hitting any illegal ones in those passes.

Fix JuliaLang#36062
KristofferC pushed a commit that referenced this pull request Aug 18, 2020
This allows LLVM codegen passes to insert `inttoptr` and `ptrtoint` as it wish,
and avoids hitting any illegal ones in those passes.

Fix #36062

(cherry picked from commit 235784a)
KristofferC pushed a commit that referenced this pull request Aug 19, 2020
This allows LLVM codegen passes to insert `inttoptr` and `ptrtoint` as it wish,
and avoids hitting any illegal ones in those passes.

Fix #36062

(cherry picked from commit 235784a)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

armv7l: ptrtoint not supported for non-integral pointers
5 participants