Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

It's too easy to pass large tuples by value #11187

Closed
Keno opened this issue May 7, 2015 · 7 comments
Closed

It's too easy to pass large tuples by value #11187

Keno opened this issue May 7, 2015 · 7 comments
Labels
compiler:codegen Generation of LLVM IR and native code
Milestone

Comments

@Keno
Copy link
Member

Keno commented May 7, 2015

Passing large tuples around in julia 0.4 will try to pass them by value which makes LLVM unhappy. We should pass them by pointer before LLVM gets to them. @vtjnash @JeffBezanson

@ihnorton ihnorton added the compiler:codegen Generation of LLVM IR and native code label May 8, 2015
@JeffBezanson JeffBezanson added this to the 0.4.0 milestone May 27, 2015
vtjnash added a commit that referenced this issue Jun 9, 2015
fix #11187, fix #11450, fix #11026, ref #10525, fix #11003
TODO: confirm all of those numbers were fixed
TODO: ensure the lazy-loaded objects have gc-roots
TODO: re-enable VectorType objects, so small objects still end up in
registers in the calling convention
TODO: allow moving pointers sometimes rather than copying
TODO: teach the GC how it can re-use an existing pointer as a box

this also changes the julia specSig calling convention to pass
non-primitive types by pointer instead of by-value

this additionally fixes a bug in gen_cfunction that could be exposed by
turning off specSig

this additionally moves the alloca calls in ccall (and other places) to
the entry BasicBlock in the function, ensuring that llvm detects them as
static allocations and moves them into the function prologue
vtjnash added a commit that referenced this issue Jun 9, 2015
fix #11187, fix #11450, fix #11026, ref #10525, fix #11003
TODO: confirm all of those numbers were fixed
TODO: ensure the lazy-loaded objects have gc-roots
TODO: re-enable VectorType objects, so small objects still end up in
registers in the calling convention
TODO: allow moving pointers sometimes rather than copying
TODO: teach the GC how it can re-use an existing pointer as a box

this also changes the julia specSig calling convention to pass
non-primitive types by pointer instead of by-value

this additionally fixes a bug in gen_cfunction that could be exposed by
turning off specSig

this additionally moves the alloca calls in ccall (and other places) to
the entry BasicBlock in the function, ensuring that llvm detects them as
static allocations and moves them into the function prologue

this additionally fixes some undefined behavior from changing
a variable's size through a alloca-cast instead of zext/sext/trunc
vtjnash added a commit that referenced this issue Jun 9, 2015
fix #11187, fix #11450, fix #11026, ref #10525, fix #11003
TODO: confirm all of those numbers were fixed
TODO: ensure the lazy-loaded objects have gc-roots
TODO: re-enable VectorType objects, so small objects still end up in
registers in the calling convention
TODO: allow moving pointers sometimes rather than copying
TODO: teach the GC how it can re-use an existing pointer as a box

this also changes the julia specSig calling convention to pass
non-primitive types by pointer instead of by-value

this additionally fixes a bug in gen_cfunction that could be exposed by
turning off specSig

this additionally moves the alloca calls in ccall (and other places) to
the entry BasicBlock in the function, ensuring that llvm detects them as
static allocations and moves them into the function prologue

this additionally fixes some undefined behavior from changing
a variable's size through a alloca-cast instead of zext/sext/trunc
vtjnash added a commit that referenced this issue Jun 11, 2015
fix #11187 (pass struct and tuple objects by stack pointer)
fix #11450 (ccall emission was frobbing the stack)
likely may fix #11026 and may fix #11003 (ref #10525) invalid stack-read on 32-bit

this additionally changes the julia specSig calling convention to pass
non-primitive types by pointer instead of by-value

this additionally fixes a bug in gen_cfunction that could be exposed by
turning off specSig

this additionally moves the alloca calls in ccall (and other places) to
the entry BasicBlock in the function, ensuring that llvm detects them as
static allocations and moves them into the function prologue

this additionally fixes some undefined behavior from changing
a variable's size through a alloca-cast instead of zext/sext/trunc

this additionally prepares for turning back on allocating tuples as vectors,
since the gc now guarantees 16-byte alignment

future work this makes possible:
 - create a function to replace the jlallocobj_func+init_bits_value call pair (to reduce codegen pressure)
 - allow moving pointers sometimes rather than always copying immutable data
 - teach the GC how it can re-use an existing pointer as a box
vtjnash added a commit that referenced this issue Jun 12, 2015
fix #11187 (pass struct and tuple objects by stack pointer)
fix #11450 (ccall emission was frobbing the stack)
likely may fix #11026 and may fix #11003 (ref #10525) invalid stack-read on 32-bit

this additionally changes the julia specSig calling convention to pass
non-primitive types by pointer instead of by-value

this additionally fixes a bug in gen_cfunction that could be exposed by
turning off specSig

this additionally moves the alloca calls in ccall (and other places) to
the entry BasicBlock in the function, ensuring that llvm detects them as
static allocations and moves them into the function prologue

this additionally fixes some undefined behavior from changing
a variable's size through a alloca-cast instead of zext/sext/trunc

this additionally prepares for turning back on allocating tuples as vectors,
since the gc now guarantees 16-byte alignment

future work this makes possible:
 - create a function to replace the jlallocobj_func+init_bits_value call pair (to reduce codegen pressure)
 - allow moving pointers sometimes rather than always copying immutable data
 - teach the GC how it can re-use an existing pointer as a box
vtjnash added a commit that referenced this issue Jun 16, 2015
fix #11187 (pass struct and tuple objects by stack pointer)
fix #11450 (ccall emission was frobbing the stack)
likely may fix #11026 and may fix #11003 (ref #10525) invalid stack-read on 32-bit

this additionally changes the julia specSig calling convention to pass
non-primitive types by pointer instead of by-value

this additionally fixes a bug in gen_cfunction that could be exposed by
turning off specSig

this additionally moves the alloca calls in ccall (and other places) to
the entry BasicBlock in the function, ensuring that llvm detects them as
static allocations and moves them into the function prologue

this additionally fixes some undefined behavior from changing
a variable's size through a alloca-cast instead of zext/sext/trunc

this additionally prepares for turning back on allocating tuples as vectors,
since the gc now guarantees 16-byte alignment

future work this makes possible:
 - create a function to replace the jlallocobj_func+init_bits_value call pair (to reduce codegen pressure)
 - allow moving pointers sometimes rather than always copying immutable data
 - teach the GC how it can re-use an existing pointer as a box
@oxinabox
Copy link
Contributor

With this now closed, does that mean all tuples are passed by reference now? Or just large ones?

@ArchRobison
Copy link
Contributor

Appears to be all tuples. Even tiny ones.

julia> foo(x) = x
foo (generic function with 1 method)

julia> code_llvm(foo,(Tuple{Int8},))

define [1 x i8] @julia_foo_21494([1 x i8]*) {
top:
  %1 = load [1 x i8]* %0, align 1
  ret [1 x i8] %1
}

See also #11899. My impression is that we should pass small homogeneous tuples by value and as LLVM vectors instead of LLVM arrays.

@ScottPJones
Copy link
Contributor

If all tuples are passed by reference, won't that have a serious performance effect? (badly, for small tuples)

@vtjnash
Copy link
Member

vtjnash commented Jun 30, 2015

If all tuples are passed by reference, won't that have a serious performance effect? (badly, for small tuples)

empirically, no, since that is what it is doing now. the inliner should take care of any place where the overhead would significantly matter, and it can improve performance by reducing register pressure near call sites (as stated in the motivation for #11899)

@ArchRobison
Copy link
Contributor

On the register pressure issue, I suspect it's dependent on the tuple's type and target architecture. If the tuple fits in a vector register, then passing it by value uses one register, and passing it by reference uses one register. Of course empirical evidence gets last say on performance.

@ScottPJones
Copy link
Contributor

Yes, and passing a small immutable tuple by reference means you just have to fetch it from memory later on, which can affect what you've got in caches. A single AVX-512 register == 1 64 byte cache-line.
Right now, the memory allocation in julia lets small things straddle cache-line boundaries, so being able to keep small <= 64 byte things in registers could save two memory accesses (and using up 2 L1/L2 cache "slots")

@carnaval
Copy link
Contributor

LLVM is well aware of this so I'd rather leave the decision to the backend where possible. Anyway, this kind of calls would rather be made on a per-tuple basis instead of for the whole type so we need Jameson's rewrite anyway.

fcard pushed a commit to fcard/julia that referenced this issue Jul 8, 2015
fix JuliaLang#11187 (pass struct and tuple objects by stack pointer)
fix JuliaLang#11450 (ccall emission was frobbing the stack)
likely may fix JuliaLang#11026 and may fix JuliaLang#11003 (ref JuliaLang#10525) invalid stack-read on 32-bit

this additionally changes the julia specSig calling convention to pass
non-primitive types by pointer instead of by-value

this additionally fixes a bug in gen_cfunction that could be exposed by
turning off specSig

this additionally moves the alloca calls in ccall (and other places) to
the entry BasicBlock in the function, ensuring that llvm detects them as
static allocations and moves them into the function prologue

this additionally fixes some undefined behavior from changing
a variable's size through a alloca-cast instead of zext/sext/trunc

this additionally prepares for turning back on allocating tuples as vectors,
since the gc now guarantees 16-byte alignment

future work this makes possible:
 - create a function to replace the jlallocobj_func+init_bits_value call pair (to reduce codegen pressure)
 - allow moving pointers sometimes rather than always copying immutable data
 - teach the GC how it can re-use an existing pointer as a box
fcard pushed a commit to fcard/julia that referenced this issue Jul 8, 2015
fix JuliaLang#11187 (pass struct and tuple objects by stack pointer)
fix JuliaLang#11450 (ccall emission was frobbing the stack)
likely may fix JuliaLang#11026 and may fix JuliaLang#11003 (ref JuliaLang#10525) invalid stack-read on 32-bit

this additionally changes the julia specSig calling convention to pass
non-primitive types by pointer instead of by-value

this additionally fixes a bug in gen_cfunction that could be exposed by
turning off specSig

this additionally moves the alloca calls in ccall (and other places) to
the entry BasicBlock in the function, ensuring that llvm detects them as
static allocations and moves them into the function prologue

this additionally fixes some undefined behavior from changing
a variable's size through a alloca-cast instead of zext/sext/trunc

this additionally prepares for turning back on allocating tuples as vectors,
since the gc now guarantees 16-byte alignment

future work this makes possible:
 - create a function to replace the jlallocobj_func+init_bits_value call pair (to reduce codegen pressure)
 - allow moving pointers sometimes rather than always copying immutable data
 - teach the GC how it can re-use an existing pointer as a box
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler:codegen Generation of LLVM IR and native code
Projects
None yet
Development

No branches or pull requests

8 participants