Hoist object allocation before inner field initialization #45153

gbaraldi · 2022-05-02T15:52:56Z

Just to run tests on the fixes :)

Consider the following pattern for building up nested objects ``` %obj = Expr(:new, obj, ...) %obj_wrapper = Expr(:new, obj_wrapper, ..., %obj) %obj_wrapper2 = Expr(:new, obj_wrapper2, ..., %obj_wrapper) %outer = Expr(:new, outer, %obj_wrapper2) ``` Asssuming everything except `struct outer` is struct-inlineable, the LLVM IR we emit looks something like the following: ``` %obj = alloca %obj_wrapper = alloca %obj_wrapper_wrapper = alloca %obj = alloca init(%obj, <obj>) init(%obj_wrapper, <obj_wrapper>); memcpy(%obj_wrapper, %obj) init(%obj_wrapper2, <obj_wrapper2>); memcpy(%obj_wrapper2, %obj_wrapper) init(%outer, <outer>); memcpy(%outer, %obj_wrapper2) %outer_boxed = julia.gc_alloc memcpy(%outer_boxed, %outer) ``` While LLVM is capable of removing all the allocas and memcpys, it's taking an unreasonable amount of time to do so. This PR introduces a small optimization into the frontend lowering for `:new`: If all the :new calls are in the same LLVM basic block, then we delete the alloca, and hoist the allocation of the object to the earliest point before the initialization of the fields. This gives essentially the same result as LLVM would have given us post-optimization, but is much cheaper to do because we don't have to perform any analysis to tell us that it is a legal optimization. In the above example, we would end up with something like: ``` %outer_boxed = julia.gc_alloc init(%outer_boxed, <obj>) init(%outer_boxed, <obj_wrapper>); init(%outer_boxed, <obj_wrapper2>); init(%outer_boxed, <outer>); ``` Of course this does extend the liftime of the outer object, but I don't think that's a particular problem as long as we're careful not to hoist any boxings out of error paths. In the current implementation, I only allow this optimization to occur in the same LLVM basic block, but I think it should be fine to extend it to allow the same julia basic block or more generally, any allocation that post-dominates the relevant promotion points.

vtjnash · 2022-05-13T07:10:11Z

@Keno what was the bug for 'fix codegen bug" in the commit message? Is it something we should backport? Is there a test? Was this reviewed? None of this seems clear from the context here. There isn't even a statement of what this PR does.

Keno · 2022-05-13T08:07:04Z

This PR is #45093 plus the typo fix that was causing a test failure there.

vtjnash · 2022-05-13T09:28:59Z

What is the bugfix? Is there a test?

…5153)" This reverts commit 93ac4a3.

vtjnash · 2022-05-16T08:45:52Z

src/cgutils.cpp

+                            promotion_point = inst;
+                            promotion_ssa = fval_info.promotion_ssa;
+                        }
+                    } else if (!promotion_point) {


Suggested change

} else if (!promotion_point) {

}

else if (!promotion_point) {

vtjnash · 2022-05-16T08:48:08Z

src/cgutils.cpp

+                    if (field_promotable) {
+                        fval_info.V->replaceAllUsesWith(dest);
+                        cast<Instruction>(fval_info.V)->eraseFromParent();
+                    } else {


Suggested change

} else {

}

else {

vtjnash · 2022-05-16T08:56:10Z

src/codegen.cpp

-        jl_cgval_t res = emit_call(ctx, ex, expr_t);
+        else {
+            expr_t = jl_is_long(ctx.source->ssavaluetypes) ? (jl_value_t*)jl_any_type : jl_array_ptr_ref(ctx.source->ssavaluetypes, ssaidx_0based);
+            is_promotable = ctx.ssavalue_usecount[ssaidx_0based] == 1;


Generally preferable to use .at(i) for all of these, so we don't corrupt memory on bad inputs

vtjnash · 2022-05-16T09:01:15Z

src/cgutils.cpp

+                recursively_adjust_ptr_type(originalAlloca, 0, AddressSpace::Derived);
+                originalAlloca->replaceAllUsesWith(decayed);
+                // end illegal IR
+                cast<Instruction>(vinfo.V)->eraseFromParent();


Cast-away-const should not be done. Don't mark the argument const if you are going to destroy it later

Suggested change

cast<Instruction>(vinfo.V)->eraseFromParent();

originalAlloca->eraseFromParent();

Is there any risk that the numerous invalid pointers this leaves behind will be re-examined later? In particular, we wouldn't previously have assumed that boxed will now destroy the argument passed to it. We could potentially keep a map of all pointers promoted to boxes, and then delete them en mass at the end of the codegen.

In particular, boxed might be called via emit_typecheck, then again on the the next line as part of the store to the field?

I thought about this for a bit, we could use removeFromParentinsted of eraseFromParent so the value still exists but it might be incorrectly used by someone, or make boxed have a default argument that makes the hoisting path only acessible from inside emit_new_struct

vtjnash · 2022-05-16T09:08:37Z

src/cgutils.cpp

+            SmallVector<Type*, 3> ArgTys;
+            Intrinsic::getIntrinsicSignature(II->getCalledFunction(), ArgTys);
+            assert(ArgTys.size() <= II->arg_size());
+            for (size_t i = 0; i < ArgTys.size(); ++i)
+                ArgTys[i] = II->getArgOperand(i)->getType();


See the function_sig_t::emit_a_ccall function for how to minimally remangle a function name correctly (this is not correct). Or see https://llvm.org/doxygen/namespacellvm_1_1Intrinsic.html#a008c247b3c1a9df8b3103a897a967078 for a complicated version (I don't know why LLVM doesn't expose a more convenient wrapper of this, as it is a useful operation we have needed many times)

vtjnash · 2022-05-16T11:09:52Z

src/cgutils.cpp

@@ -3571,6 +3646,7 @@ static jl_cgval_t emit_new_struct(jl_codectx_t &ctx, jl_value_t *ty, size_t narg
                                ->setOrdering(AtomicOrdering::Unordered);
                }
                else if (jl_is_uniontype(jtype)) {
+                    assert(!field_promotable);


This doesn't seem to be ensured anywhere, but just arbitrarily asserted here?

I haven't seen that assertion being hit but I will have to defer to @Keno on this one. If we aren't checking anywhere we could just it to

julia/src/cgutils.cpp

Lines 3609 to 3610 in 6d9ae9d

bool field_promotable = !init_as_value && fval_info.promotion_ssa != -1 &&

fval_info.promotion_point && fval_info.promotion_point->getParent() == ctx.builder.GetInsertBlock();

Functions that return union cgvals don't give it a promotion point (or at least they shouldn't at the moment, which is what the assert is for).

This branch seemed to be about fieldtypes, not return types?

Here is an example of a failure this caused, for reference: https://s3.amazonaws.com/julialang-reports/nanosoldier/pkgeval/by_hash/1c5858b_vs_1c5858b/SumTypes.primary.log

I can add a explicit check here but it might just be a bandaid

vtjnash · 2022-05-16T11:23:10Z

src/cgutils.cpp

+            recursively_adjust_ptr_type(Inst, FromAS, ToAS);
+        }
+#endif
+    }


This may also be required to handle ICmpEQ and AddrspaceCast, but I have the impression we just care about things that could appear in the store code-path? It would be good to document in a comment here that recursively_adjust_ptr_type is very specific to our codegen patterns for this exact case.

vtjnash · 2022-05-16T11:26:13Z

src/cgutils.cpp

+                // Warning: Very illegal IR here temporarily
+                originalAlloca->mutateType(decayed->getType());
+                recursively_adjust_ptr_type(originalAlloca, 0, AddressSpace::Derived);
+                originalAlloca->replaceAllUsesWith(decayed);
+                // end illegal IR


Would it be better if we emitted the original alloca with an addrspacecast to the derived space, so that we don't require any illegal operations to convert it later?

You mean when the nested struct is first emitted already cast the alloca to the derived space?

vtjnash · 2022-05-16T11:50:14Z

src/cgutils.cpp

                emit_typecheck(ctx, fval_info, jtype, "new");
                fval_info = update_julia_type(ctx, fval_info, jtype);
+                // TODO: Use (post-)domination instead.
+                bool field_promotable = !init_as_value && fval_info.promotion_ssa != -1 &&
+                    fval_info.promotion_point && fval_info.promotion_point->getParent() == ctx.builder.GetInsertBlock();


There may also need to be a check here that the type being stored here is compatible, since we might not post-dominate in execution (e.g. via a typeassert call), even though we observe post-domination in IR order (yes, this is also a problem with our current errors violating dominance)? The update_julia_type will partly help with that already (by checking for trivial cases where the types are clearly incompatible, but might miss a case where we store a concrete struct into a field–e.g. Int–that is a union of multiple other types which excludes our type–e.g. Union{Nothing,Int8}).

Yeah, I'm struggling with this one too :/ so I defer to @Keno. But what I got is that a typeassert might make us do an incompatible store but we don't see that because it doesn't dominate the IR we are locally looking at?

Hypothetically, the store might be post-dominated in the normal flow of the IR, but be removed by an exception in the runtime flow

vtjnash · 2022-05-25T19:54:28Z

Bisect identifies that this PR is responsible for the performance and allocation regression noted in 7f84d46#commitcomment-74469445

Keno · 2022-05-25T20:03:48Z

@gbaraldi Can you take a look and come up with the reduced example at least? Happy to jump in if you need help.

gbaraldi · 2022-05-25T20:46:28Z

Will take a look

using BenchmarkTools
X2 = Vector{Union{ComplexF64, Missing}}(rand(ComplexF64, 10_000))
X2[rand(10_000) .> .9] .= missing
Sx = skipmissing(X2)
function perf_sumskipmissing(sX)
    T = eltype(sX)
    s = zero(T)
    @inbounds for i in eachindex(sX)
        s += sX[i]
    end
    s
end

@btime perf_sumskipmissing($Sx)

The allocation comes from getindex that apparently gets boxed in current master. It doesn't get boxed in 1.8 at all, even before optimization. Due to the boxing the function keeps some error checks and that's probably what makes it slower.

gbaraldi · 2022-05-25T22:47:14Z

@Keno The issue seems to be that in 1.8 we could prove that eachindex would always index inbounds and non missing. While in master we can't so there is an allocation for that error. Changing from eachindex to for i in sX makes it go away.

smaller reproducer

X2 = Vector{Union{Int64, Missing}}(rand(Int64, 10))
X2[rand(10) .> .9] .= missing
Sx = skipmissing(X2)
function perf_sumskipmissing(sX)
    @inbounds for i in 1:2
        sX[i]
    end
end

The issue seems to be that we hoist the allocation for the error in master, that probably messes up the optimizations later on.

https://gist.github.com/gbaraldi/2b0ea18799251da2aab208db893b77f4

gbaraldi changed the base branch from kf/heaphoist to master May 2, 2022 15:53

gbaraldi marked this pull request as ready for review May 2, 2022 17:44

Keno and others added 2 commits May 2, 2022 20:06

fix codegen bug

603be2e

gbaraldi force-pushed the heaphoist-fix branch from f6eada8 to 603be2e Compare May 2, 2022 23:06

Keno merged commit 93ac4a3 into JuliaLang:master May 12, 2022

vtjnash added a commit that referenced this pull request May 16, 2022

Revert "Hoist object allocation before inner field initialization (#4…

d518061

…5153)" This reverts commit 93ac4a3.

vtjnash reviewed May 16, 2022

View reviewed changes

gbaraldi mentioned this pull request May 27, 2022

correct codegen bugs introduced by allocation-hoisting-PR #45476

Merged

inkydragon mentioned this pull request Jul 2, 2022

Invalid conversion between floating-point tuples #45485

Closed

gbaraldi mentioned this pull request Mar 7, 2023

Instruction does not dominate all uses!, getelementptr inbounds when testing DimensionalData #48917

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Hoist object allocation before inner field initialization #45153

Hoist object allocation before inner field initialization #45153

gbaraldi commented May 2, 2022

vtjnash commented May 13, 2022

Keno commented May 13, 2022

vtjnash commented May 13, 2022

vtjnash May 16, 2022

vtjnash May 16, 2022

vtjnash May 16, 2022

vtjnash May 16, 2022

vtjnash May 16, 2022

vtjnash May 16, 2022

gbaraldi May 16, 2022 •

edited

Loading

vtjnash May 16, 2022

vtjnash May 16, 2022

gbaraldi May 16, 2022 •

edited

Loading

Keno May 16, 2022

vtjnash May 16, 2022

vtjnash Jun 14, 2022

gbaraldi Jun 14, 2022

vtjnash May 16, 2022 •

edited

Loading

vtjnash May 16, 2022

gbaraldi May 16, 2022

vtjnash May 16, 2022 •

edited

Loading

gbaraldi May 16, 2022

vtjnash May 16, 2022

vtjnash commented May 25, 2022

Keno commented May 25, 2022

gbaraldi commented May 25, 2022 •

edited

Loading

gbaraldi commented May 25, 2022 •

edited

Loading

	cast<Instruction>(vinfo.V)->eraseFromParent();
	originalAlloca->eraseFromParent();

	bool field_promotable = !init_as_value && fval_info.promotion_ssa != -1 &&
	fval_info.promotion_point && fval_info.promotion_point->getParent() == ctx.builder.GetInsertBlock();

Hoist object allocation before inner field initialization #45153

Hoist object allocation before inner field initialization #45153

Conversation

gbaraldi commented May 2, 2022

vtjnash commented May 13, 2022

Keno commented May 13, 2022

vtjnash commented May 13, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gbaraldi May 16, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gbaraldi May 16, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

vtjnash May 16, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

vtjnash May 16, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

vtjnash commented May 25, 2022

Keno commented May 25, 2022

gbaraldi commented May 25, 2022 • edited Loading

gbaraldi commented May 25, 2022 • edited Loading

gbaraldi May 16, 2022 •

edited

Loading

gbaraldi May 16, 2022 •

edited

Loading

vtjnash May 16, 2022 •

edited

Loading

vtjnash May 16, 2022 •

edited

Loading

gbaraldi commented May 25, 2022 •

edited

Loading

gbaraldi commented May 25, 2022 •

edited

Loading