It's too easy to pass large tuples by value #11187
Comments
fix #11187, fix #11450, fix #11026, ref #10525, fix #11003

TODO: confirm all of those numbers were fixed
TODO: ensure the lazy-loaded objects have gc-roots
TODO: re-enable VectorType objects, so small objects still end up in registers in the calling convention
TODO: allow moving pointers sometimes rather than copying
TODO: teach the GC how it can re-use an existing pointer as a box

this also changes the julia specSig calling convention to pass non-primitive types by pointer instead of by-value

this additionally fixes a bug in gen_cfunction that could be exposed by turning off specSig

this additionally moves the alloca calls in ccall (and other places) to the entry BasicBlock in the function, ensuring that llvm detects them as static allocations and moves them into the function prologue

this additionally fixes some undefined behavior from changing a variable's size through an alloca-cast instead of zext/sext/trunc
fix #11187 (pass struct and tuple objects by stack pointer)
fix #11450 (ccall emission was frobbing the stack)
likely may fix #11026 and may fix #11003 (ref #10525) (invalid stack-read on 32-bit)

this additionally changes the julia specSig calling convention to pass non-primitive types by pointer instead of by-value

this additionally fixes a bug in gen_cfunction that could be exposed by turning off specSig

this additionally moves the alloca calls in ccall (and other places) to the entry BasicBlock in the function, ensuring that llvm detects them as static allocations and moves them into the function prologue

this additionally fixes some undefined behavior from changing a variable's size through an alloca-cast instead of zext/sext/trunc

this additionally prepares for turning back on allocating tuples as vectors, since the gc now guarantees 16-byte alignment

future work this makes possible:
- create a function to replace the jlallocobj_func+init_bits_value call pair (to reduce codegen pressure)
- allow moving pointers sometimes rather than always copying immutable data
- teach the GC how it can re-use an existing pointer as a box
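The zext/sext/trunc point in the commit message can be illustrated with a rough C analogy (a sketch only, not the Julia codegen itself). LLVM's `sext`, `zext`, and `trunc` convert a value to a different width; the C casts below are the corresponding value conversions. The function names are hypothetical and chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Analogue of LLVM `sext`: widen and replicate the sign bit. */
int32_t widen_signed(int8_t x) { return (int32_t)x; }

/* Analogue of LLVM `zext`: widen and fill the upper bits with zeros. */
uint32_t widen_unsigned(uint8_t x) { return (uint32_t)x; }

/* Analogue of LLVM `trunc`: keep only the low 8 bits. */
uint8_t narrow(uint32_t x) { return (uint8_t)x; }

/* The pattern to avoid (shown only as a comment): reading a 1-byte slot
 * through a wider pointer, e.g. *(int32_t *)&byte_variable, touches
 * storage that does not belong to the variable. */

/* Small usage example for the three conversions. */
void conversions_self_check(void) {
    assert(widen_signed((int8_t)-1) == -1);        /* sign preserved */
    assert(widen_unsigned((uint8_t)0xFF) == 255u); /* upper bits zero */
    assert(narrow(0x1234u) == 0x34u);              /* low byte kept */
}
```

The common thread is that the size change happens on the value, never by reinterpreting the memory slot that holds it.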
With this now closed, are all tuples passed by reference, or just large ones?
Appears to be all tuples. Even tiny ones.
See also #11899. My impression is that we should pass small homogeneous tuples by value, as LLVM vectors rather than LLVM arrays.
If all tuples are passed by reference, won't that seriously hurt performance for small tuples?
Empirically, no, since that is what it is doing now. The inliner should take care of any place where the overhead would matter significantly, and passing by reference can improve performance by reducing register pressure near call sites (as stated in the motivation for #11899).
On the register pressure issue, I suspect it depends on the tuple's type and the target architecture. If the tuple fits in a vector register, then passing it by value uses one register, and passing it by reference also uses one register. Of course, empirical evidence gets the last say on performance.
Yes, and passing a small immutable tuple by reference means you have to fetch it from memory later on, which can affect what you've got in the caches. A single AVX-512 register holds 64 bytes, the same size as one 64-byte cache line.
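The size arithmetic behind the last two comments can be sketched in C. The 64-byte cache line is the figure quoted above and is an assumption about the target; the register widths are the usual SSE, AVX, and AVX-512 sizes; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed cache-line size, matching the 64-byte figure in the thread. */
#define CACHE_LINE_BYTES 64u

/* Vector register widths in bytes: SSE, AVX, AVX-512. */
enum { SSE_BYTES = 16, AVX_BYTES = 32, AVX512_BYTES = 64 };

/* True if a tuple of `tuple_bytes` bytes fits in one register of `reg_bytes`. */
bool fits_in_register(size_t tuple_bytes, size_t reg_bytes) {
    assert(reg_bytes > 0);
    return tuple_bytes <= reg_bytes;
}

/* Cache lines touched when the tuple is read from memory, assuming the
 * tuple starts on a cache-line boundary. */
size_t cache_lines_spanned(size_t tuple_bytes) {
    return (tuple_bytes + CACHE_LINE_BYTES - 1) / CACHE_LINE_BYTES;
}
```

For example, a 64-byte tuple of sixteen 4-byte floats fits in one AVX-512 register but not in an AVX register, and reading it through a pointer touches one cache line when it is aligned.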
LLVM is well aware of this, so I'd rather leave the decision to the backend where possible. Anyway, this kind of call should be made on a per-tuple basis rather than for the whole type, so we need Jameson's rewrite anyway.
In Julia 0.4, large tuples passed between functions are passed by value, which makes LLVM unhappy. We should pass them by pointer before LLVM gets to them. @vtjnash @JeffBezanson
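The two calling shapes can be sketched in C as an analogy. This is not Julia's generated code, and C ABIs have their own rules for large structs; `big_tuple`, `N`, and the function names are hypothetical stand-ins for something like a 1024-element tuple of `Float64`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a large homogeneous tuple. */
#define N 1024
typedef struct { double elems[N]; } big_tuple;

/* By value: every call receives its own copy of all N elements. */
double sum_by_value(big_tuple t) {
    double s = 0.0;
    for (size_t i = 0; i < N; i++) s += t.elems[i];
    return s;
}

/* By pointer: the call passes one address, whatever the tuple size. */
double sum_by_pointer(const big_tuple *t) {
    assert(t != NULL);
    double s = 0.0;
    for (size_t i = 0; i < N; i++) s += t->elems[i];
    return s;
}

/* Helper that fills the tuple with 0, 1, ..., N-1. */
big_tuple make_tuple(void) {
    big_tuple t;
    for (size_t i = 0; i < N; i++) t.elems[i] = (double)i;
    return t;
}
```

Both functions compute the same result; the difference is only in what crosses the call boundary, which is the part this issue asks to change for large tuples.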