In REPL, ((1:10000)...); eats all memory and hangs. #14126

Closed
xitology opened this issue Nov 25, 2015 · 9 comments
Labels
compiler:codegen (Generation of LLVM IR and native code), performance (Must go faster)

Comments

@xitology (Contributor)

In the REPL,

julia> ((1:10000)...);

eats 4 GB of memory and hangs while running at 100% CPU.

However, it works, albeit very slowly, when run non-interactively:

$ time julia -E '((1:10000)...)'
(1,...)
real    0m30.810s
user    0m30.603s
sys 0m0.341s
julia> versioninfo()
Julia Version 0.5.0-dev+1403
Commit 30dd83b (2015-11-21 19:22 UTC)
Platform Info:
  System: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Core(TM) i7-4600U CPU @ 2.10GHz
  WORD_SIZE: 64
  BLAS: libopenblas (NO_LAPACK NO_LAPACKE DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: liblapack.so.3
  LIBM: libopenlibm
  LLVM: libLLVM-3.3
@nalimilan (Member)

Interestingly, [1:10000...] works fine, but ntuple(i->i, 10000) exhibits the same bug. So it looks like the problem is with tuple construction.

@rfourquet (Member)

It seems to be related to displaying the tuple (the hang occurs even when output is nominally suppressed with ;), as the following works just fine:

julia> ((1:10000)...); 1

With ntuple, the problem is also mitigated by suppressing output, but it is still slow and allocates a lot:

julia> @time ntuple(i->i, 10000); 1;
  0.239551 seconds (9.03 M allocations: 366.526 MB)

julia> @time ntuple(i->i, 40000); 1;
  8.432398 seconds (156.04 M allocations: 7.021 GB, 10.56% gc time)

(Avoiding the lambda by using a named generic function roughly halves the time and allocates about 1.1 GB less; see the sketch below.)
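For reference, a minimal sketch of that named-function variant (the helper name ident is made up here for illustration):

julia> ident(i) = i                    # named generic function instead of an anonymous one
julia> @time ntuple(ident, 10000); 1;  # trailing 1 suppresses display of the tuple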

@stevengj (Member)

This is not the sort of thing you should use tuples for; see also #13722 and #11320.
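For large homogeneous collections, the idiomatic alternative is an Array; a minimal sketch:

julia> v = collect(1:10000);   # materialize the range as a Vector{Int}
julia> sum(v)                  # ordinary O(n) operations, no length-dependent code generation
50005000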

@JeffBezanson added the performance (Must go faster) and compiler:codegen (Generation of LLVM IR and native code) labels on Dec 5, 2015
@JeffBezanson (Member)

All the time seems to be in LLVM. We're generating code like this:

  call void @"julia_put!_22580"({ [1004 x i64], [0 x i1] }* sret %3, %jl_value_t* %4, { [1004 x i64], [0 x i1] }* %7)
  %.fca.0.0.gep1 = bitcast { [1004 x i64], [0 x i1] }* %3 to i64*
  %.fca.0.0.load = load i64* %.fca.0.0.gep1, align 8
  %.fca.0.0.insert = insertvalue { [1004 x i64], [0 x i1] } undef, i64 %.fca.0.0.load, 0, 0
  %.fca.0.1.gep = getelementptr inbounds { [1004 x i64], [0 x i1] }* %3, i64 0, i32 0, i64 1
  %.fca.0.1.load = load i64* %.fca.0.1.gep, align 8
  %.fca.0.1.insert = insertvalue { [1004 x i64], [0 x i1] } %.fca.0.0.insert, i64 %.fca.0.1.load, 0, 1
  %.fca.0.2.gep = getelementptr inbounds { [1004 x i64], [0 x i1] }* %3, i64 0, i32 0, i64 2
  %.fca.0.2.load = load i64* %.fca.0.2.gep, align 8
  %.fca.0.2.insert = insertvalue { [1004 x i64], [0 x i1] } %.fca.0.1.insert, i64 %.fca.0.2.load, 0, 2
...

and it goes on like that for every element of the tuple. This is in the code for boxing a struct containing a big tuple after an sret, in a jlcall wrapper. Looks like we should use a memcpy after a certain threshold? Or is it possible to allocate the box first, and sret directly into it? cc @vtjnash @Keno
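For anyone reproducing this, a hedged sketch of one way to inspect the emitted IR (the function f is hypothetical, and note that the per-element boxing code lives in the jlcall wrapper, which this dump may not include):

julia> f() = ntuple(i -> i, 1004)
julia> code_llvm(f, ())   # print the LLVM IR for the specialized method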

@JeffBezanson (Member)

Update: confirmed that this code is generated by LLVM's SROA pass. It also interacts badly with the jb/functions branch during the tests, making some of them take much longer.

@vtjnash (Member) commented Dec 12, 2015

I think the problem is that we (the frontend) aren't supposed to be creating LLVM ArrayTypes of this size, and should instead switch to malloc'd opaque pointers (or preallocated boxes) for this usage pattern.

@JeffBezanson (Member)

That would be fine. I think the biggest problem there is that we could lose optimizations that rely on structs being SSA values, for example reusing the space for structs with disjoint lifetimes. Can LLVM be told that malloc'd storage has "value" semantics and can be optimized this way? Of course this is a detail that may not matter much (after all, we're not running into the particular issue here very often as it is).
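To make the SSA-value point concrete, a hypothetical Julia-level example of two structs (tuples) with disjoint lifetimes, where an SSA-based optimizer is free to reuse the storage:

function g(x)
    a = (x, x + 1)    # last use of a is on the next line...
    s = a[1] + a[2]
    b = (s, 2s)       # ...so b's storage may overlap a's
    return b[1] + b[2]
end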

@vtjnash (Member) commented Dec 12, 2015

Yeah, I think the priority here can be (1) make it work, (2) make it optimized.

@vtjnash (Member) commented Jul 4, 2016

ntuple is still broken (and seems to have gotten worse), but I think there's already an issue for that, and the original issue here is fixed.
