Memory leak in Julia-0.7.0-rc2 on Windows and Linux #28474

Closed
mohamed82008 opened this issue Aug 6, 2018 · 26 comments

Comments

@mohamed82008
Contributor

mohamed82008 commented Aug 6, 2018

I managed to narrow down the trigger of a memory leak that occurs with this branch of IterativeSolvers.jl https://github.com/mohamed82008/IterativeSolvers.jl/tree/memory_leak and the following code:

using LinearAlgebra, Random, IterativeSolvers

T = ComplexF32
n = 10
Random.seed!(1234322)
A = rand(T, n, n) + 2n * I
b = rand(T, n)
xi = jacobi(A, b, maxiter=2n)

More specifically, the problem happens in this function call https://github.com/mohamed82008/IterativeSolvers.jl/blob/315d75e89703336038f8291f2223da1c24637c86/src/stationary.jl#L45; it's not going through the method it should.

Here is my versioninfo:

julia> versioninfo()
Julia Version 0.7.0-rc2.0
Commit 78540cba4c (2018-08-02 19:14 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, skylake)

The problem is not there in v0.7.0-beta2.

@KristofferC
Member

it's not going through the method it should.

Could you elaborate on this?

@mohamed82008
Contributor Author

The line I referred to hangs forever, and the memory used by the Julia process keeps growing until it takes up all the RAM and my OS crashes. When I print something before the function call and something else inside the function, only the message before the call is printed. So my guess is that it gets lost somewhere before reaching the method and keeps allocating memory until the RAM is exhausted.

@mohamed82008
Contributor Author

Perhaps memory leak is not the correct term, but I don't know what else to call it.

@KristofferC
Member

KristofferC commented Aug 6, 2018

I cannot repro on Mac, so perhaps this is Windows only?

julia> xi = jacobi(A, b, maxiter=2n)
10-element Array{Complex{Float32},1}:
   0.029282639f0 + 0.004886905f0im
   0.044216298f0 - 0.002815192f0im
   0.018236611f0 + 0.02393339f0im
   0.027247218f0 + 0.030936312f0im
    0.03949458f0 - 0.00412868f0im
  0.0031394882f0 - 0.0018772872f0im
...

julia> versioninfo()
Julia Version 0.7.0-rc2.0
Commit 78540cba4c (2018-08-02 19:14 UTC)
...

@mohamed82008
Contributor Author

I can repro on Ubuntu:

Julia Version 0.7.0-rc2.0
Commit 78540cba4c (2018-08-02 19:14 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, skylake)

@mohamed82008 changed the title from "Memory leak in Julia-0.7.0-rc2 on Windows" to "Memory leak in Julia-0.7.0-rc2 on Windows and Linux" on Aug 6, 2018
@Keno
Member

Keno commented Aug 6, 2018

I can't reproduce this either (on Linux).

@mohamed82008
Contributor Author

I could reproduce it on a fresh build on Ubuntu 18.04 LTS.

@Keno
Member

Keno commented Aug 6, 2018

Can you upload an exact MANIFEST file that reproduces this for you on a clean build?

@mohamed82008

This comment has been minimized.

@KristofferC
Member

path = "/home/mohd/.julia/dev/IterativeSolvers" means it is impossible to reproduce

@mohamed82008
Contributor Author

I am sorry if I misunderstood Keno's request but this is what's in the file .julia/environments/v0.7/Manifest.toml. If there is more info that you need please tell me how to get it.

@KristofferC
Member

The point is that if you have any local changes, that might be why we cannot reproduce this. So run pkg> free IterativeSolvers and try to repro again. Alternatively, add IterativeSolvers#master and repro.

@mohamed82008
Contributor Author

But the problem is not on IterativeSolvers master; it's on the memory_leak branch of my repo, which I linked above: https://github.com/mohamed82008/IterativeSolvers.jl/tree/memory_leak.

@KristofferC
Member

KristofferC commented Aug 6, 2018

Sorry, I missed that. If you wanted a reproducible Manifest you could have done add IterativeSolvers#memory_leak. Anyway, I'll try on that branch.

Edit: Still cannot repro.
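
For readers following along, here is a minimal sketch of the same idea using the Pkg API rather than the pkg> prompt. The fork URL and branch name are the ones linked earlier in this thread; addressing the fork by URL (instead of by registered package name) and the helper name add_fork_in_temp_env are assumptions made for illustration, not something posted by the participants.

```julia
using Pkg

# Fork URL and branch name as given earlier in this thread.
const FORK_SPEC = Pkg.PackageSpec(
    url = "https://github.com/mohamed82008/IterativeSolvers.jl",
    rev = "memory_leak",
)

# Hypothetical helper: install the fork into a throwaway environment so that the
# generated Manifest.toml records repo-url/repo-rev rather than a local `path`.
function add_fork_in_temp_env(spec = FORK_SPEC)
    env = mktempdir()
    Pkg.activate(env)      # leaves the default environment untouched
    Pkg.add(spec)          # needs network access
    return joinpath(env, "Manifest.toml")
end
```

Calling add_fork_in_temp_env() should return the path of a Manifest.toml that others can reuse, provided the branch still exists on the fork.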

@mohamed82008
Contributor Author

mohamed82008 commented Aug 6, 2018

[[Base64]]
uuid = "2a0f44e3-6c83-55bd-87e4-b1978d98bd5f"

[[Distributed]]
deps = ["LinearAlgebra", "Random", "Serialization", "Sockets"]
uuid = "8ba89e20-285c-5b6f-9357-94700520ee1b"

[[InteractiveUtils]]
deps = ["LinearAlgebra", "Markdown"]
uuid = "b77e0a4c-d291-57a0-90e8-8db25a27a240"

[[IterativeSolvers]]
deps = ["LinearAlgebra", "Printf", "Random", "RecipesBase", "SparseArrays", "Test"]
git-tree-sha1 = "4964fc02193d5f697dab099b4872e061bd946b55"
repo-rev = "memory_leak"
repo-url = "https://github.com/mohamed82008/IterativeSolvers.jl"
uuid = "42fd0dbc-a981-5370-80f2-aaf504508153"
version = "0.7.0+"

[[Libdl]]
uuid = "8f399da3-3557-5675-b5ff-fb832c97cbdb"

[[LinearAlgebra]]
deps = ["Libdl"]
uuid = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"

[[Logging]]
uuid = "56ddb016-857b-54e1-b83d-db4d58db5568"

[[Markdown]]
deps = ["Base64"]
uuid = "d6f4376e-aef5-505a-96c1-9c027394607a"

[[Printf]]
deps = ["Unicode"]
uuid = "de0858da-6303-5e67-8744-51eddeeeb8d7"

[[Random]]
deps = ["Serialization"]
uuid = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"

[[RecipesBase]]
deps = ["Random", "Test"]
git-tree-sha1 = "298e774bc0fd26b7011358fc9ff0223b4e81aaa6"
uuid = "3cdcf5f2-1ef4-517c-9805-6587b60abb01"
version = "0.4.0"

[[Serialization]]
uuid = "9e88b42a-f829-5b0c-bbe9-9e923198166b"

[[Sockets]]
uuid = "6462fe0b-24de-5631-8697-dd941f90decc"

[[SparseArrays]]
deps = ["LinearAlgebra", "Random"]
uuid = "2f01184e-e22b-5df5-ae63-d93ebab69eaf"

[[Test]]
deps = ["Distributed", "InteractiveUtils", "Logging", "Random"]
uuid = "8dfed614-e22c-5e08-85e1-65c5234f0b40"

[[Unicode]]
uuid = "4ec0a83e-493e-50e2-b9ac-8f72acf5a8f5"

@Keno
Member

Keno commented Aug 6, 2018

I tried both my local source build and the generic binaries with this manifest and both work fine.

@mohamed82008
Contributor Author

I tried it on Travis and I cannot reproduce it there.

@tomaklutfu
Contributor

I can reproduce this: Julia uses up all the memory and crashes, and I cannot interrupt it. Here is the LLVM IR in case it helps. It does run with the generic target (i.e. julia -Cgeneric).

julia> versioninfo()
Julia Version 0.7.0-rc2.0
Commit 78540cba4c (2018-08-02 19:14 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i5-8250U CPU @ 1.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, skylake)

julia> @code_llvm jacobi(A, b, maxiter=2n)

; Function #jacobi
; Location: none
define nonnull %jl_value_t addrspace(10)* @"julia_#jacobi_35214"({ i64 } addrspace(11)* nocapture nonnull readonly dereferenceable(8), %jl_value_t addrspace(10)* nonnull align 16 dereferenceable(40), %jl_value_t addrspace(10)* nonnull align 16 dereferenceable(40)) {
top:
  %3 = alloca %jl_value_t addrspace(10)*, i32 5
  %gcframe = alloca %jl_value_t addrspace(10)*, i32 3
  %4 = bitcast %jl_value_t addrspace(10)** %gcframe to i8*
  call void @llvm.memset.p0i8.i32(i8* %4, i8 0, i32 24, i32 0, i1 false)
  %thread_ptr = call i8* asm "movq %fs:0, $0", "=r"()
  %ptls_i8 = getelementptr i8, i8* %thread_ptr, i64 -10920
  %ptls = bitcast i8* %ptls_i8 to %jl_value_t***
; Function pairs; {
; Location: iterators.jl:226
; Function Type; {
; Location: iterators.jl:169
  %5 = getelementptr %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)** %gcframe, i32 0
  %6 = bitcast %jl_value_t addrspace(10)** %5 to i64*
  store i64 2, i64* %6
  %7 = getelementptr %jl_value_t**, %jl_value_t*** %ptls, i32 0
  %8 = getelementptr %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)** %gcframe, i32 1
  %9 = bitcast %jl_value_t addrspace(10)** %8 to %jl_value_t***
  %10 = load %jl_value_t**, %jl_value_t*** %7
  store %jl_value_t** %10, %jl_value_t*** %9
  %11 = bitcast %jl_value_t*** %7 to %jl_value_t addrspace(10)***
  store %jl_value_t addrspace(10)** %gcframe, %jl_value_t addrspace(10)*** %11
  %12 = bitcast %jl_value_t*** %ptls to i8*
  %13 = call noalias nonnull %jl_value_t addrspace(10)* @jl_gc_pool_alloc(i8* %12, i32 1448, i32 32) #1
  %14 = bitcast %jl_value_t addrspace(10)* %13 to %jl_value_t addrspace(10)* addrspace(10)*
  %15 = getelementptr %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)* addrspace(10)* %14, i64 -1
  store %jl_value_t addrspace(10)* addrspacecast (%jl_value_t* inttoptr (i64 140574987654192 to %jl_value_t*) to %jl_value_t addrspace(10)*), %jl_value_t addrspace(10)* addrspace(10)* %15
  %16 = addrspacecast %jl_value_t addrspace(10)* %13 to %jl_value_t addrspace(11)*
  %17 = getelementptr inbounds { i64 }, { i64 } addrspace(11)* %0, i64 0, i32 0
  %18 = bitcast %jl_value_t addrspace(10)* %13 to i64 addrspace(10)*
  %19 = load i64, i64 addrspace(11)* %17, align 8
  store i64 %19, i64 addrspace(10)* %18, align 8
  %20 = bitcast %jl_value_t addrspace(11)* %16 to i8 addrspace(11)*
  %21 = getelementptr inbounds i8, i8 addrspace(11)* %20, i64 8
  %22 = bitcast i8 addrspace(11)* %21 to %jl_value_t addrspace(10)* addrspace(11)*
  store %jl_value_t addrspace(10)* addrspacecast (%jl_value_t* inttoptr (i64 140575001743104 to %jl_value_t*) to %jl_value_t addrspace(10)*), %jl_value_t addrspace(10)* addrspace(11)* %22, align 8
  %23 = getelementptr %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)** %gcframe, i32 2
  store %jl_value_t addrspace(10)* %13, %jl_value_t addrspace(10)** %23
;}}
  %24 = getelementptr %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)** %3, i32 0
  store %jl_value_t addrspace(10)* addrspacecast (%jl_value_t* inttoptr (i64 140574972332056 to %jl_value_t*) to %jl_value_t addrspace(10)*), %jl_value_t addrspace(10)** %24
  %25 = getelementptr %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)** %3, i32 1
  store %jl_value_t addrspace(10)* %13, %jl_value_t addrspace(10)** %25
  %26 = getelementptr %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)** %3, i32 2
  store %jl_value_t addrspace(10)* addrspacecast (%jl_value_t* inttoptr (i64 140574972330856 to %jl_value_t*) to %jl_value_t addrspace(10)*), %jl_value_t addrspace(10)** %26
  %27 = getelementptr %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)** %3, i32 3
  store %jl_value_t addrspace(10)* %1, %jl_value_t addrspace(10)** %27
  %28 = getelementptr %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)** %3, i32 4
  store %jl_value_t addrspace(10)* %2, %jl_value_t addrspace(10)** %28
  %29 = call nonnull %jl_value_t addrspace(10)* @jl_invoke(%jl_value_t addrspace(10)* addrspacecast (%jl_value_t* inttoptr (i64 140574977403408 to %jl_value_t*) to %jl_value_t addrspace(10)*), %jl_value_t addrspace(10)** %3, i32 5)
  %30 = getelementptr %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)** %gcframe, i32 1
  %31 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)** %30
  %32 = getelementptr %jl_value_t**, %jl_value_t*** %ptls, i32 0
  %33 = bitcast %jl_value_t*** %32 to %jl_value_t addrspace(10)**
  store %jl_value_t addrspace(10)* %31, %jl_value_t addrspace(10)** %33
  ret %jl_value_t addrspace(10)* %29
}

@mohamed82008
Contributor Author

Actually, I can reproduce the problem on IterativeSolvers master branch so no need to use my repo or the memory_leak branch.

@Keno
Member

Keno commented Aug 6, 2018

Aha, I can reproduce the behavior with -Cskylake.
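
Since the failure mode reported above is a hang that eats all RAM and cannot be interrupted, one cautious way to compare CPU targets is to run the reproduction in a child process and kill it after a deadline. The sketch below is only an illustration: run_with_timeout is a hypothetical helper, not something used in this thread, and a short timeout is assumed to be enough to stop the child before memory runs out.

```julia
# Hypothetical helper: run `code` in a fresh Julia process with extra command-line
# `flags` (e.g. a -C target) and kill the child if it has not exited in `timeout` seconds.
function run_with_timeout(code::AbstractString; timeout::Real = 30.0, flags = String[])
    exe = joinpath(Sys.BINDIR, Base.julia_exename())
    proc = run(`$exe --startup-file=no $flags -e $code`; wait = false)
    # Poll until the child exits or the deadline passes.
    status = timedwait(() -> process_exited(proc), Float64(timeout))
    if status != :ok
        kill(proc, Base.SIGKILL)   # a stuck process may ignore gentler signals
    end
    return status                  # :ok if the child exited by itself, :timed_out otherwise
end
```

With the reproduction script from the first comment stored in a string repro, calling run_with_timeout(repro; flags = ["-Cskylake"]) and run_with_timeout(repro; flags = ["-Cgeneric"]) would let the two targets be compared without risking the whole machine.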

@Keno
Member

Keno commented Aug 6, 2018

With LLVM assertions, I get:

julia: /home/keno/julia/usr/include/llvm/Support/Casting.h:255: typename llvm::cast_retty<X, Y*>::ret_type llvm::cast(Y*) [with X = llvm::VectorType; Y = llvm::Type; typename llvm::cast_retty<X, Y*>::ret_type = llvm::VectorType*]: Assertion `isa<X>(Val) && "cast<Ty>() argument of incompatible type!"' failed.

signal (6): Aborted
in expression starting at no file:0
__libc_signal_restore_set at /build/glibc-itYbWN/glibc-2.26/signal/../sysdeps/unix/sysv/linux/nptl-signals.h:80 [inlined]
raise at /build/glibc-itYbWN/glibc-2.26/signal/../sysdeps/unix/sysv/linux/raise.c:48
abort at /build/glibc-itYbWN/glibc-2.26/stdlib/abort.c:90
__assert_fail_base at /build/glibc-itYbWN/glibc-2.26/assert/assert.c:92
__assert_fail at /build/glibc-itYbWN/glibc-2.26/assert/assert.c:101
cast<llvm::VectorType, llvm::Type> at /home/keno/julia/usr/include/llvm/Support/Casting.h:255
NumberVectorBase at /home/keno/julia/src/llvm-late-gc-lowering.cpp:631

@Keno
Member

Keno commented Aug 6, 2018

That's a simple fix. From looking at the IR, this may have been another instance of #28445. I'm gonna push a fix for that LLVM assertion momentarily. After that, please try with master and see if the issue is resolved.

Keno added a commit that referenced this issue Aug 6, 2018
GEPs can make vectors out of regular pointers if the offset is a
vector. Fixes an assertion noticed in #28474.
@mohamed82008
Contributor Author

Not resolved yet.

@mohamed82008
Contributor Author

Julia Version 1.0.0-DEV.56
Commit 3c2c8112ef (2018-08-06 19:08 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, skylake)

@mohamed82008
Contributor Author

Oh, I didn't check out your branch. I will try again.

@mohamed82008
Contributor Author

Ok, seems to be solved with your fix on my Linux machine.

@Keno closed this as completed on Aug 7, 2018
ararslan pushed a commit that referenced this issue Aug 7, 2018
GEPs can make vectors out of regular pointers if the offset is a
vector. Fixes an assertion noticed in #28474.

(cherry picked from commit a37e090)
Keno added a commit that referenced this issue Aug 7, 2018
GEPs can make vectors out of regular pointers if the offset is a
vector. Fixes an assertion noticed in #28474.

(cherry picked from commit a37e090)
KristofferC pushed a commit that referenced this issue Feb 11, 2019
GEPs can make vectors out of regular pointers if the offset is a
vector. Fixes an assertion noticed in #28474.

4 participants