Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unnecessary GC roots with jlcall #22415

Closed
yuyichao opened this issue Jun 18, 2017 · 5 comments
Closed

Unnecessary GC roots with jlcall #22415

yuyichao opened this issue Jun 18, 2017 · 5 comments
Labels
compiler:codegen Generation of LLVM IR and native code

Comments

@yuyichao
Copy link
Contributor

yuyichao commented Jun 18, 2017

It turns out that the original case in #15369 is actually not fixed =(. The original case hits #19418 so is not noinline anymore, a slightly modified version is,

julia> @noinline g2(a...) = a[1][]
g2 (generic function with 1 method)

julia> function f2()
           g2(Ref(1), Ref(2))
           g2(Ref(1), Ref(2))
       end
f2 (generic function with 1 method)

code_llvm for f2 is.

define i64 @julia_f2_63188() #0 !dbg !5 {
top:
  %0 = alloca [3 x %jl_value_t addrspace(10)*], align 8
  %gcframe6 = alloca [5 x %jl_value_t addrspace(10)*], align 8
  %gcframe6.sub = getelementptr inbounds [5 x %jl_value_t addrspace(10)*], [5 x %jl_value_t addrspace(10)*]* %gcframe6, i64 0, i64 0
  %.sub = getelementptr inbounds [3 x %jl_value_t addrspace(10)*], [3 x %jl_value_t addrspace(10)*]* %0, i64 0, i64 0
  %gcroot5 = getelementptr inbounds [5 x %jl_value_t addrspace(10)*], [5 x %jl_value_t addrspace(10)*]* %gcframe6, i64 0, i64 2
  %1 = getelementptr inbounds [5 x %jl_value_t addrspace(10)*], [5 x %jl_value_t addrspace(10)*]* %gcframe6, i64 0, i64 1
  %2 = bitcast %jl_value_t addrspace(10)** %1 to i8*
  call void @llvm.memset.p0i8.i32(i8* %2, i8 0, i32 32, i32 8, i1 false)
  %ptls_i8 = call i8* asm "movq %fs:0, $0;\0Aaddq $$-10928, $0", "=r,~{dirflag},~{fpsr},~{flags}"() #2
  %3 = bitcast [5 x %jl_value_t addrspace(10)*]* %gcframe6 to i64*
  store i64 6, i64* %3, align 8
  %4 = getelementptr inbounds [5 x %jl_value_t addrspace(10)*], [5 x %jl_value_t addrspace(10)*]* %gcframe6, i64 0, i64 1
  %5 = bitcast i8* %ptls_i8 to i64*
  %6 = load i64, i64* %5, align 8
  %7 = bitcast %jl_value_t addrspace(10)** %4 to i64*
  store i64 %6, i64* %7, align 8
  %8 = bitcast i8* %ptls_i8 to %jl_value_t addrspace(10)***
  store %jl_value_t addrspace(10)** %gcframe6.sub, %jl_value_t addrspace(10)*** %8, align 8
  %9 = call %jl_value_t addrspace(10)* @jl_gc_pool_alloc(i8* %ptls_i8, i32 1432, i32 16)
  %10 = addrspacecast %jl_value_t addrspace(10)* %9 to %jl_value_t addrspace(11)*
  %11 = bitcast %jl_value_t addrspace(11)* %10 to %jl_value_t addrspace(10)* addrspace(11)*
  %12 = getelementptr %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)* addrspace(11)* %11, i64 -1
  store %jl_value_t addrspace(10)* addrspacecast (%jl_value_t* inttoptr (i64 140652692239792 to %jl_value_t*) to %jl_value_t addrspace(10)*), %jl_value_t addrspace(10)* addrspace(11)* %12, align 8
  %13 = bitcast %jl_value_t addrspace(10)* %9 to i64 addrspace(10)*
  %14 = addrspacecast i64 addrspace(10)* %13 to i64 addrspace(11)*
  store i64 1, i64 addrspace(11)* %14, align 16
  %15 = getelementptr inbounds [5 x %jl_value_t addrspace(10)*], [5 x %jl_value_t addrspace(10)*]* %gcframe6, i64 0, i64 4
  store %jl_value_t addrspace(10)* %9, %jl_value_t addrspace(10)** %15, align 8
  %16 = call %jl_value_t addrspace(10)* @jl_gc_pool_alloc(i8* %ptls_i8, i32 1432, i32 16)
  %17 = addrspacecast %jl_value_t addrspace(10)* %16 to %jl_value_t addrspace(11)*
  %18 = bitcast %jl_value_t addrspace(11)* %17 to %jl_value_t addrspace(10)* addrspace(11)*
  %19 = getelementptr %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)* addrspace(11)* %18, i64 -1
  store %jl_value_t addrspace(10)* addrspacecast (%jl_value_t* inttoptr (i64 140652692239792 to %jl_value_t*) to %jl_value_t addrspace(10)*), %jl_value_t addrspace(10)* addrspace(11)* %19, align 8
  %20 = bitcast %jl_value_t addrspace(10)* %16 to i64 addrspace(10)*
  %21 = addrspacecast i64 addrspace(10)* %20 to i64 addrspace(11)*
  store i64 2, i64 addrspace(11)* %21, align 16
  %22 = getelementptr inbounds [5 x %jl_value_t addrspace(10)*], [5 x %jl_value_t addrspace(10)*]* %gcframe6, i64 0, i64 3
  store %jl_value_t addrspace(10)* %16, %jl_value_t addrspace(10)** %22, align 8
  store %jl_value_t addrspace(10)* addrspacecast (%jl_value_t* inttoptr (i64 140652691390488 to %jl_value_t*) to %jl_value_t addrspace(10)*), %jl_value_t addrspace(10)** %.sub, align 8
  %23 = getelementptr inbounds [3 x %jl_value_t addrspace(10)*], [3 x %jl_value_t addrspace(10)*]* %0, i64 0, i64 1
  store %jl_value_t addrspace(10)* %9, %jl_value_t addrspace(10)** %23, align 8
  %24 = getelementptr inbounds [3 x %jl_value_t addrspace(10)*], [3 x %jl_value_t addrspace(10)*]* %0, i64 0, i64 2
  store %jl_value_t addrspace(10)* %16, %jl_value_t addrspace(10)** %24, align 8
  %25 = call %jl_value_t addrspace(10)* @jl_invoke(%jl_value_t addrspace(10)* addrspacecast (%jl_value_t* inttoptr (i64 140652165224592 to %jl_value_t*) to %jl_value_t addrspace(10)*), %jl_value_t addrspace(10)** %.sub, i32 3)
  %26 = call %jl_value_t addrspace(10)* @jl_gc_pool_alloc(i8* %ptls_i8, i32 1432, i32 16)
  %27 = addrspacecast %jl_value_t addrspace(10)* %26 to %jl_value_t addrspace(11)*
  %28 = bitcast %jl_value_t addrspace(11)* %27 to %jl_value_t addrspace(10)* addrspace(11)*
  %29 = getelementptr %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)* addrspace(11)* %28, i64 -1
  store %jl_value_t addrspace(10)* addrspacecast (%jl_value_t* inttoptr (i64 140652692239792 to %jl_value_t*) to %jl_value_t addrspace(10)*), %jl_value_t addrspace(10)* addrspace(11)* %29, align 8
  %30 = bitcast %jl_value_t addrspace(10)* %26 to i64 addrspace(10)*
  %31 = addrspacecast i64 addrspace(10)* %30 to i64 addrspace(11)*
  store i64 1, i64 addrspace(11)* %31, align 16
  store %jl_value_t addrspace(10)* %26, %jl_value_t addrspace(10)** %15, align 8
  %32 = call %jl_value_t addrspace(10)* @jl_gc_pool_alloc(i8* %ptls_i8, i32 1432, i32 16)
  %33 = addrspacecast %jl_value_t addrspace(10)* %32 to %jl_value_t addrspace(11)*
  %34 = bitcast %jl_value_t addrspace(11)* %33 to %jl_value_t addrspace(10)* addrspace(11)*
  %35 = getelementptr %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)* addrspace(11)* %34, i64 -1
  store %jl_value_t addrspace(10)* addrspacecast (%jl_value_t* inttoptr (i64 140652692239792 to %jl_value_t*) to %jl_value_t addrspace(10)*), %jl_value_t addrspace(10)* addrspace(11)* %35, align 8
  %36 = bitcast %jl_value_t addrspace(10)* %32 to i64 addrspace(10)*
  %37 = addrspacecast i64 addrspace(10)* %36 to i64 addrspace(11)*
  store i64 2, i64 addrspace(11)* %37, align 16
  store %jl_value_t addrspace(10)* %32, %jl_value_t addrspace(10)** %22, align 8
  store %jl_value_t addrspace(10)* addrspacecast (%jl_value_t* inttoptr (i64 140652691390488 to %jl_value_t*) to %jl_value_t addrspace(10)*), %jl_value_t addrspace(10)** %.sub, align 8
  store %jl_value_t addrspace(10)* %26, %jl_value_t addrspace(10)** %23, align 8
  store %jl_value_t addrspace(10)* %32, %jl_value_t addrspace(10)** %24, align 8
  %38 = call %jl_value_t addrspace(10)* @jl_invoke(%jl_value_t addrspace(10)* addrspacecast (%jl_value_t* inttoptr (i64 140652165224592 to %jl_value_t*) to %jl_value_t addrspace(10)*), %jl_value_t addrspace(10)** %.sub, i32 3)
  store %jl_value_t addrspace(10)* %38, %jl_value_t addrspace(10)** %gcroot5, align 8
  store %jl_value_t addrspace(10)* %38, %jl_value_t addrspace(10)** %22, align 8
  %39 = bitcast %jl_value_t addrspace(10)* %38 to i64 addrspace(10)*
  %40 = load i64, i64 addrspace(10)* %39, align 16
  %41 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)** %4, align 8
  %42 = bitcast i8* %ptls_i8 to %jl_value_t addrspace(10)**
  store %jl_value_t addrspace(10)* %41, %jl_value_t addrspace(10)** %42, align 8
  ret i64 %40
}

There's an unnecessary root for the second result at the end of the function (a bit hard to see due to #22414), i.e. in

%38 = call %jl_value_t addrspace(10)* @jl_invoke(%jl_value_t addrspace(10)* addrspacecast (%jl_value_t* inttoptr (i64 140652165224592 to %jl_value_t*) to %jl_value_t addrspace(10)*), %jl_value_t addrspace(10)** %.sub, i32 3)
store %jl_value_t addrspace(10)* %38, %jl_value_t addrspace(10)** %gcroot5, align 8
store %jl_value_t addrspace(10)* %38, %jl_value_t addrspace(10)** %22, align 8

The result of the function call (%38) is stored to two different slots even though it is returned later without hitting another safepoint so it shouldn't need any roots.

@Keno
Copy link
Member

Keno commented Jun 18, 2017

Can you open a second issue for the second point here. That should be a fairly straightforward fix once I look at it. The first one is a bit more involved.

@yuyichao
Copy link
Contributor Author

K. I actually opened this one for the second one but decided to also include the first one... I'll open another one for the jlcall frame reuse if you prefer two issues.

@Keno
Copy link
Member

Keno commented Jun 18, 2017

I do. That way I can reference the issue in the pull request that fixes it and it's clear what's being fixed.

@yuyichao
Copy link
Contributor Author

Sorry, the last one was tested with the wrong julia version...........................

@yuyichao
Copy link
Contributor Author

It is still a simpler way to reproduce the extra gcroot issue though.... just with very different LLVM IR....

julia> @noinline g2(a...) = a[1][]
g2 (generic function with 1 method)

julia> f2() = g2(Ref(1))
f2 (generic function with 1 method)

julia> @code_llvm f2()

define i64 @julia_f2_63184() #0 !dbg !5 {
top:
  %0 = alloca [2 x %jl_value_t addrspace(10)*], align 8
  %gcframe2 = alloca [4 x %jl_value_t addrspace(10)*], align 8
  %gcframe2.sub = getelementptr inbounds [4 x %jl_value_t addrspace(10)*], [4 x %jl_value_t addrspace(10)*]* %gcframe2, i64 0, i64 0
  %.sub = getelementptr inbounds [2 x %jl_value_t addrspace(10)*], [2 x %jl_value_t addrspace(10)*]* %0, i64 0, i64 0
  %gcroot1 = getelementptr inbounds [4 x %jl_value_t addrspace(10)*], [4 x %jl_value_t addrspace(10)*]* %gcframe2, i64 0, i64 2
  %1 = getelementptr inbounds [4 x %jl_value_t addrspace(10)*], [4 x %jl_value_t addrspace(10)*]* %gcframe2, i64 0, i64 1
  %2 = bitcast %jl_value_t addrspace(10)** %1 to i8*
  call void @llvm.memset.p0i8.i32(i8* %2, i8 0, i32 24, i32 8, i1 false)
  %ptls_i8 = call i8* asm "movq %fs:0, $0;\0Aaddq $$-10928, $0", "=r,~{dirflag},~{fpsr},~{flags}"() #2
  %3 = bitcast [4 x %jl_value_t addrspace(10)*]* %gcframe2 to i64*
  store i64 4, i64* %3, align 8
  %4 = getelementptr inbounds [4 x %jl_value_t addrspace(10)*], [4 x %jl_value_t addrspace(10)*]* %gcframe2, i64 0, i64 1
  %5 = bitcast i8* %ptls_i8 to i64*
  %6 = load i64, i64* %5, align 8
  %7 = bitcast %jl_value_t addrspace(10)** %4 to i64*
  store i64 %6, i64* %7, align 8
  %8 = bitcast i8* %ptls_i8 to %jl_value_t addrspace(10)***
  store %jl_value_t addrspace(10)** %gcframe2.sub, %jl_value_t addrspace(10)*** %8, align 8
  %9 = call %jl_value_t addrspace(10)* @jl_gc_pool_alloc(i8* %ptls_i8, i32 1432, i32 16)
  %10 = addrspacecast %jl_value_t addrspace(10)* %9 to %jl_value_t addrspace(11)*
  %11 = bitcast %jl_value_t addrspace(11)* %10 to %jl_value_t addrspace(10)* addrspace(11)*
  %12 = getelementptr %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)* addrspace(11)* %11, i64 -1
  store %jl_value_t addrspace(10)* addrspacecast (%jl_value_t* inttoptr (i64 140124422927792 to %jl_value_t*) to %jl_value_t addrspace(10)*), %jl_value_t addrspace(10)* addrspace(11)* %12, align 8
  %13 = bitcast %jl_value_t addrspace(10)* %9 to i64 addrspace(10)*
  %14 = addrspacecast i64 addrspace(10)* %13 to i64 addrspace(11)*
  store i64 1, i64 addrspace(11)* %14, align 16
  %15 = getelementptr inbounds [4 x %jl_value_t addrspace(10)*], [4 x %jl_value_t addrspace(10)*]* %gcframe2, i64 0, i64 3
  store %jl_value_t addrspace(10)* %9, %jl_value_t addrspace(10)** %15, align 8
  store %jl_value_t addrspace(10)* addrspacecast (%jl_value_t* inttoptr (i64 140124422078488 to %jl_value_t*) to %jl_value_t addrspace(10)*), %jl_value_t addrspace(10)** %.sub, align 8
  %16 = getelementptr inbounds [2 x %jl_value_t addrspace(10)*], [2 x %jl_value_t addrspace(10)*]* %0, i64 0, i64 1
  store %jl_value_t addrspace(10)* %9, %jl_value_t addrspace(10)** %16, align 8
  %17 = call %jl_value_t addrspace(10)* @jl_invoke(%jl_value_t addrspace(10)* addrspacecast (%jl_value_t* inttoptr (i64 140123895890192 to %jl_value_t*) to %jl_value_t addrspace(10)*), %jl_value_t addrspace(10)** %.sub, i32 2)
  store %jl_value_t addrspace(10)* %17, %jl_value_t addrspace(10)** %gcroot1, align 8
  store %jl_value_t addrspace(10)* %17, %jl_value_t addrspace(10)** %15, align 8
  %18 = bitcast %jl_value_t addrspace(10)* %17 to i64 addrspace(10)*
  %19 = load i64, i64 addrspace(10)* %18, align 16
  %20 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)** %4, align 8
  %21 = bitcast i8* %ptls_i8 to %jl_value_t addrspace(10)**
  store %jl_value_t addrspace(10)* %20, %jl_value_t addrspace(10)** %21, align 8
  ret i64 %19
}

Keno added a commit that referenced this issue Jun 18, 2017
They are not needed any more in this brave new world.

Fixes #22415
@ararslan ararslan added the GC Garbage collector label Jun 18, 2017
@yuyichao yuyichao added compiler:codegen Generation of LLVM IR and native code and removed GC Garbage collector labels Jun 18, 2017
yuyichao pushed a commit to yuyichao/julia that referenced this issue Jun 18, 2017
They are not needed any more in this brave new world.

Fixes JuliaLang#22415
Keno added a commit that referenced this issue Jun 19, 2017
They are not needed any more in this brave new world.

Fixes #22415
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler:codegen Generation of LLVM IR and native code
Projects
None yet
Development

No branches or pull requests

3 participants