Hoist object allocation before inner field initialization #45093
Consider the following pattern for building up nested objects:
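A minimal sketch of the kind of code meant (the struct names here are illustrative, not from the original benchmark):

```julia
struct Inner
    x::Int
    y::Int
end

struct Middle
    i::Inner
    a::Int
end

# Outer is mutable, so it is heap-allocated rather than struct-inlined
mutable struct Outer
    m::Middle
    b::Int
end

build(x, y, a, b) = Outer(Middle(Inner(x, y), a), b)
```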
Assuming everything except `struct Outer` is struct-inlineable, the LLVM IR we emit looks something like the following:
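An illustrative sketch of the shape of that IR (not actual compiler output; names, sizes, and the `@julia.gc_alloc_obj` arguments are made up):

```llvm
; Each nested value is first built up in its own stack alloca ...
%inner  = alloca %Inner, align 8
%middle = alloca %Middle, align 8
store i64 %x, ptr %inner, align 8
%inner.y = getelementptr inbounds %Inner, ptr %inner, i64 0, i32 1
store i64 %y, ptr %inner.y, align 8
; ... then memcpy'd into its parent's alloca ...
call void @llvm.memcpy.p0.p0.i64(ptr %middle, ptr %inner, i64 16, i1 false)
%middle.a = getelementptr inbounds %Middle, ptr %middle, i64 0, i32 1
store i64 %a, ptr %middle.a, align 8
; ... and finally memcpy'd into the heap allocation of the outer object
%outer = call ptr @julia.gc_alloc_obj(ptr %ptls, i64 32, ptr @tag)
call void @llvm.memcpy.p0.p0.i64(ptr %outer, ptr %middle, i64 24, i1 false)
%outer.b = getelementptr inbounds %Outer, ptr %outer, i64 0, i32 1
store i64 %b, ptr %outer.b, align 8
```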
While LLVM is capable of removing all the allocas and memcpys, it's
taking an unreasonable amount of time to do so.
This PR introduces a small optimization into the frontend lowering for `:new`: if all the `:new` calls are in the same LLVM basic block, then we delete the alloca and hoist the allocation of the object to the earliest point before the initialization of the fields.
This gives essentially the same result as LLVM would have given us
post-optimization, but is much cheaper to do because we don't have
to perform any analysis to tell us that it is a legal optimization.
In the above example, we would end up with something like:
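An illustrative sketch of the resulting shape (again, not actual compiler output):

```llvm
; The outer allocation is hoisted to the top and every field is
; initialized directly in place: no allocas, no memcpys.
%outer = call ptr @julia.gc_alloc_obj(ptr %ptls, i64 32, ptr @tag)
store i64 %x, ptr %outer, align 8
%f1 = getelementptr inbounds %Outer, ptr %outer, i64 0, i32 0, i32 0, i32 1
store i64 %y, ptr %f1, align 8
%f2 = getelementptr inbounds %Outer, ptr %outer, i64 0, i32 0, i32 1
store i64 %a, ptr %f2, align 8
%f3 = getelementptr inbounds %Outer, ptr %outer, i64 0, i32 1
store i64 %b, ptr %f3, align 8
```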
Of course this does extend the lifetime of the outer object, but I
don't think that's a particular problem as long as we're careful
not to hoist any boxings out of error paths. In the current
implementation, I only allow this optimization within the
same LLVM basic block, but I think it should be fine to extend
it to the same Julia basic block or, more generally, to any
allocation that post-dominates the relevant promotion points.
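For intuition, the legality condition described above can be modeled with a small sketch (this is a hypothetical toy model, not the actual codegen implementation):

```julia
# Toy model: a `:new` may have its allocation hoisted only when all the
# relevant `:new` calls share one basic block, so no control-flow
# analysis is needed to prove the hoist legal.
struct NewCall
    typ::Symbol   # type being constructed
    bb::Int       # id of the basic block containing the call
end

can_hoist(calls) = length(unique(c.bb for c in calls)) <= 1

# Straight-line construction, all in one block: hoistable.
can_hoist([NewCall(:Inner, 1), NewCall(:Middle, 1), NewCall(:Outer, 1)])  # true

# Construction split across blocks (e.g. an error branch): not hoistable.
can_hoist([NewCall(:Inner, 1), NewCall(:Outer, 2)])  # false
```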
Timings
On the benchmark from #44998, this does quite well, essentially fixing the issue
(modulo the previously noted problem where SLPVectorizer spends a significant
chunk of time on this function without actually doing anything). Our timings are
as follows (ignore the memory allocations; those depend on whether inference had
already run when I benchmarked, which I wasn't careful about because it
only takes ~2 seconds):
master:
master - SLPVectorizer:
This PR:
This PR - SLPVectorizer: