In REPL, ((1:10000)...); eats all memory and hangs. #14126

Closed
xitology opened this issue Nov 25, 2015 · 9 comments
Labels
compiler:codegen (Generation of LLVM IR and native code), performance (Must go faster)

Comments

@xitology (Contributor)

In the REPL,

julia> ((1:10000)...);

eats 4 GB of memory and hangs while running at 100% CPU.

However, it works, albeit very slowly, when run non-interactively:

$ time julia -E '((1:10000)...)'
(1,...)
real    0m30.810s
user    0m30.603s
sys 0m0.341s
julia> versioninfo()
Julia Version 0.5.0-dev+1403
Commit 30dd83b (2015-11-21 19:22 UTC)
Platform Info:
  System: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Core(TM) i7-4600U CPU @ 2.10GHz
  WORD_SIZE: 64
  BLAS: libopenblas (NO_LAPACK NO_LAPACKE DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: liblapack.so.3
  LIBM: libopenlibm
  LLVM: libLLVM-3.3
@nalimilan (Member)

Interestingly, [1:10000...] works fine, but ntuple(i->i, 10000) exhibits the same bug. So it looks like the problem is with tuple construction.

@rfourquet (Member)

It seems to be related to displaying the tuple (the hang occurs even when output is nominally suppressed with ;), as the following works just fine:

julia> ((1:10000)...); 1

With ntuple, the problem is also mitigated by suppressing output, but it is still slow and allocates a lot:

julia> @time ntuple(i->i, 10000); 1;
  0.239551 seconds (9.03 M allocations: 366.526 MB)

julia> @time ntuple(i->i, 40000); 1;
  8.432398 seconds (156.04 M allocations: 7.021 GB, 10.56% gc time)

(Avoiding the lambda by using a named generic function roughly halves the time and allocates about 1.1 GB less; see the sketch below.)
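For reference, a minimal sketch of that named-function variant (the helper name ident is made up here for illustration):

julia> ident(i) = i                    # named generic function instead of an anonymous one
julia> @time ntuple(ident, 10000); 1;  # trailing 1 suppresses display of the tuple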

@stevengj (Member)

This is not the sort of thing you should use tuples for; see also #13722 and #11320.
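For large homogeneous collections, the idiomatic alternative is an Array; a minimal sketch:

julia> v = collect(1:10000);   # materialize the range as a Vector{Int}
julia> sum(v)                  # ordinary O(n) operations, no length-dependent code generation
50005000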

@JeffBezanson added the performance (Must go faster) and compiler:codegen (Generation of LLVM IR and native code) labels on Dec 5, 2015
@JeffBezanson (Member)

All the time seems to be in LLVM. We're generating code like this:

  call void @"julia_put!_22580"({ [1004 x i64], [0 x i1] }* sret %3, %jl_value_t* %4, { [1004 x i64], [0 x i1] }* %7)
  %.fca.0.0.gep1 = bitcast { [1004 x i64], [0 x i1] }* %3 to i64*
  %.fca.0.0.load = load i64* %.fca.0.0.gep1, align 8
  %.fca.0.0.insert = insertvalue { [1004 x i64], [0 x i1] } undef, i64 %.fca.0.0.load, 0, 0
  %.fca.0.1.gep = getelementptr inbounds { [1004 x i64], [0 x i1] }* %3, i64 0, i32 0, i64 1
  %.fca.0.1.load = load i64* %.fca.0.1.gep, align 8
  %.fca.0.1.insert = insertvalue { [1004 x i64], [0 x i1] } %.fca.0.0.insert, i64 %.fca.0.1.load, 0, 1
  %.fca.0.2.gep = getelementptr inbounds { [1004 x i64], [0 x i1] }* %3, i64 0, i32 0, i64 2
  %.fca.0.2.load = load i64* %.fca.0.2.gep, align 8
  %.fca.0.2.insert = insertvalue { [1004 x i64], [0 x i1] } %.fca.0.1.insert, i64 %.fca.0.2.load, 0, 2
...

and it goes on like that for every element of the tuple. This is in the code for boxing a struct containing a big tuple after an sret, in a jlcall wrapper. Looks like we should use a memcpy after a certain threshold? Or is it possible to allocate the box first, and sret directly into it? cc @vtjnash @Keno
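For anyone reproducing this, a hedged sketch of one way to inspect the emitted IR (the function f is hypothetical, and note that the per-element boxing code lives in the jlcall wrapper, which this dump may not include):

julia> f() = ntuple(i -> i, 1004)
julia> code_llvm(f, ())   # print the LLVM IR for the specialized method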

@JeffBezanson (Member)

Update: confirmed that this code is generated by LLVM's SROA pass. It also interacts badly with the jb/functions branch during the tests, making some of them take much longer.

@vtjnash (Member) commented Dec 12, 2015

I think the problem is that we (the frontend) aren't supposed to be creating LLVM ArrayTypes of this size, and should instead switch to malloc'd opaque pointers (or preallocated boxes) for this usage pattern.

@JeffBezanson (Member)

That would be fine. I think the biggest problem there is that we could lose optimizations that rely on structs being SSA values, for example reusing the space for structs with disjoint lifetimes. Can LLVM be told that malloc'd storage has "value" semantics and can be optimized this way? Of course this is a detail that may not matter much (after all, we're not running into the particular issue here very often as it is).
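To make the SSA-value point concrete, a hypothetical Julia-level example of two structs (tuples) with disjoint lifetimes, where an SSA-based optimizer is free to reuse the storage:

function g(x)
    a = (x, x + 1)    # last use of a is on the next line...
    s = a[1] + a[2]
    b = (s, 2s)       # ...so b's storage may overlap a's
    return b[1] + b[2]
end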

@vtjnash (Member) commented Dec 12, 2015

Yeah, I think the priority here can be (1) make it work, (2) make it optimized.

@vtjnash (Member) commented Jul 4, 2016

ntuple is still broken (and seems to have gotten worse), but I think there's already an issue for that, and the original issue here is fixed.
