[vm] Smaller list literals #44391
I wonder if this transform was less effective than anticipated (only a 0.22% total size improvement) because of […]. From `precompile2 --trace-inlining` I saw one example of something like […].

I did the following experiment: compile pkg/compiler/lib/src/dart2js.dart. The total file size was reduced by 0.535%, from 25139776 to 25005208 bytes.
@rakudrama In general we should be able to eliminate write barriers even if there is an interfering boxing; we have a special optimization which does that (the WB is eliminated, but if the boxing causes a GC then we apply special GC invariant restoration code, which compensates for the potentially removed barriers). It would be good to check why this optimization does not kick in in this particular case.
In trying to make a repro, I needed to have an optional argument that is not optimized away:

```dart
main() {
  foo(1, 1, 1);
  foo(2, 2, 2, 2);
}

@pragma('vm:never-inline')
foo(int i1, int i2, int i3, [int i4 = 0]) {
  print([i1, i2, i3]);
  print([i1, i2, i3]);
  print(i4);
}
```

The generated code for the first print call still contains the write barriers. I think there might be two issues here: […]
I'm interested to hear more about the GC invariant restoration code. I can see that, when parsing the stack for GC, there might be an indication that some slot holds a new-ish object with pending writes whose barriers were removed, so the object is preemptively marked if it moves to a generation that needs marking. Perhaps this exists, but not for arrays, since their marking is different (to prevent scanning the whole array, which was a problem in the past). The general case for arrays is hard (knowing the set of indexes written since the last GC by inspecting the stack), but it should be tractable for constant indexes covering the first N elements.
Okay, I have reread the code. Here is a prototype: write-barrier elimination keeps its special array handling only for large arrays, and the invariant-restoration code rescans small arrays instead.

```diff
diff --git a/runtime/vm/compiler/write_barrier_elimination.cc b/runtime/vm/compiler/write_barrier_elimination.cc
index 855cffb37a0..ad32ae1bdd9 100644
--- a/runtime/vm/compiler/write_barrier_elimination.cc
+++ b/runtime/vm/compiler/write_barrier_elimination.cc
@@ -114,7 +114,7 @@ class WriteBarrierElimination : public ValueObject {
// Bitvector with all non-Array-allocation instructions set. Used to
// un-mark Array allocations as usable.
- BitVector* array_allocations_mask_;
+ BitVector* large_array_allocations_mask_;
// Bitvectors for each block of which allocations are new or remembered
// at the start (after Phis).
@@ -189,8 +189,15 @@ void WriteBarrierElimination::SaveResults() {
}
}
+static bool IsCreateLargeArray(Definition* defn) {
+ if (auto create_array = defn->AsCreateArray()) {
+ return create_array->GetConstantNumElements() >= Thread::kArrayLengthLimitForWriteBarrierElimination;
+ }
+ return false;
+}
+
void WriteBarrierElimination::IndexDefinitions(Zone* zone) {
- BitmapBuilder array_allocations;
+ BitmapBuilder large_array_allocations;
GrowableArray<Definition*> create_array_worklist;
@@ -198,7 +205,7 @@ void WriteBarrierElimination::IndexDefinitions(Zone* zone) {
BlockEntryInstr* const block = block_order_->At(i);
if (auto join_block = block->AsJoinEntry()) {
for (PhiIterator it(join_block); !it.Done(); it.Advance()) {
- array_allocations.Set(definition_count_, false);
+ large_array_allocations.Set(definition_count_, false);
definition_indices_.Insert({it.Current(), definition_count_++});
#if defined(DEBUG)
if (tracing_) {
@@ -211,10 +218,10 @@ void WriteBarrierElimination::IndexDefinitions(Zone* zone) {
for (ForwardInstructionIterator it(block); !it.Done(); it.Advance()) {
if (Definition* current = it.Current()->AsDefinition()) {
if (IsUsable(current)) {
- const bool is_create_array = current->IsCreateArray();
- array_allocations.Set(definition_count_, is_create_array);
+ const bool is_create_large_array = IsCreateLargeArray(current);
+ large_array_allocations.Set(definition_count_, is_create_large_array);
definition_indices_.Insert({current, definition_count_++});
- if (is_create_array) {
+ if (is_create_large_array) {
create_array_worklist.Add(current);
}
#if defined(DEBUG)
@@ -234,8 +241,8 @@ void WriteBarrierElimination::IndexDefinitions(Zone* zone) {
it.Advance()) {
if (auto phi_use = it.Current()->instruction()->AsPhi()) {
const intptr_t index = Index(phi_use);
- if (!array_allocations.Get(index)) {
- array_allocations.Set(index, /*can_be_create_array=*/true);
+ if (!large_array_allocations.Get(index)) {
+ large_array_allocations.Set(index, /*can_be_create_large_array=*/true);
create_array_worklist.Add(phi_use);
}
}
@@ -244,9 +251,9 @@ void WriteBarrierElimination::IndexDefinitions(Zone* zone) {
vector_ = new (zone) BitVector(zone, definition_count_);
vector_->SetAll();
- array_allocations_mask_ = new (zone) BitVector(zone, definition_count_);
+ large_array_allocations_mask_ = new (zone) BitVector(zone, definition_count_);
for (intptr_t i = 0; i < definition_count_; ++i) {
- if (!array_allocations.Get(i)) array_allocations_mask_->Add(i);
+ if (!large_array_allocations.Get(i)) large_array_allocations_mask_->Add(i);
}
}
@@ -388,9 +395,9 @@ void WriteBarrierElimination::UpdateVectorForBlock(BlockEntryInstr* entry,
if (current->CanCallDart()) {
vector_->Clear();
} else if (current->CanTriggerGC()) {
- // Clear array allocations. These are not added to the remembered set
- // by Thread::RememberLiveTemporaries() after a scavenge.
- vector_->Intersect(array_allocations_mask_);
+ // Clear large array allocations. These are not added to the remembered
+ // set by Thread::RememberLiveTemporaries() after a scavenge.
+ vector_->Intersect(large_array_allocations_mask_);
}
if (AllocationInstr* const alloc = current->AsAllocation()) {
diff --git a/runtime/vm/thread.cc b/runtime/vm/thread.cc
index 6efc403e9c0..e7d6ddf14d2 100644
--- a/runtime/vm/thread.cc
+++ b/runtime/vm/thread.cc
@@ -659,10 +659,15 @@ class RestoreWriteBarrierInvariantVisitor : public ObjectPointerVisitor {
// Stores into new-space objects don't need a write barrier.
if (obj->IsSmiOrNewObject()) continue;
- // To avoid adding too much work into the remembered set, skip
+ // To avoid adding too much work into the remembered set, skip large
// arrays. Write barrier elimination will not remove the barrier
// if we can trigger GC between array allocation and store.
- if (obj->GetClassId() == kArrayCid) continue;
+ if (obj->GetClassId() == kArrayCid) {
+ const auto length = Smi::Value(Array::RawCast(obj)->untag()->length());
+ if (length >= Thread::kArrayLengthLimitForWriteBarrierElimination) {
+ continue;
+ }
+ }
// Dart code won't store into VM-internal objects except Contexts and
// UnhandledExceptions. This assumption is checked by an assertion in
diff --git a/runtime/vm/thread.h b/runtime/vm/thread.h
index be48283f062..7b76f8f66b6 100644
--- a/runtime/vm/thread.h
+++ b/runtime/vm/thread.h
@@ -1073,6 +1073,8 @@ class Thread : public ThreadState {
: SafepointLevel::kGCAndDeopt;
}
+ static constexpr intptr_t kArrayLengthLimitForWriteBarrierElimination = 16;
+
private:
template <class T>
T* AllocateReusableHandle();
```
This code is much simpler. We don't need to track which fields will potentially be written; we just treat the whole object in a special way and rescan it as a whole. The code takes all objects in the bottom frame which can potentially violate the invariant and puts them into the store buffer / deferred marking queue preemptively.
Uploaded my fix for review, to eliminate write barriers when interfering boxing is involved: https://dart-review.googlesource.com/c/sdk/+/229903
@mraleph dart2js static calls to ArrayWriteBarrierStub: 3357 ⟶ 3177, i.e. 180/3357 = 5.3% reduction.

Eyeballing the output: I don't know if we can generalize from dart2js, but most interpolations are small, and most of the remaining StoreIndexed write barriers coming from interpolation (about 500) could be removed by reordering via an inlined interpolation helper along the lines of the sketch below.
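A minimal sketch of what such a helper could look like; the name `_interpolate3` and its exact shape are hypothetical, not the VM's actual interpolation machinery. The point is to call `toString()` on every part before allocating the backing array, so no user code (and hence no GC) can run between the allocation and the stores:

```dart
// Hypothetical helper: all toString() calls happen *before* the array is
// allocated, so the stores into the fresh array have no intervening Dart
// calls and their write barriers become eliminable.
String _interpolate3(Object? a, Object? b, Object? c) {
  final sa = a.toString(); // may run arbitrary user code / trigger GC
  final sb = b.toString();
  final sc = c.toString();
  final parts = List<String>.filled(3, '');
  parts[0] = sa; // no calls between allocation and these stores:
  parts[1] = sb; // barriers here are removable
  parts[2] = sc;
  return parts.join();
}
```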
The current VM code pattern for a list literal like `<T>[a, b()]` does the following:

1. Allocate the backing array (CreateArray).
2. Store `a` in the array (~4 instructions).
3. Evaluate `b()`.
4. Store the result of `b()` in the array, with a write barrier, since `b()` could have moved the array via some GCs.
5. Call `_GrowableList<T>._fromLiteral(array)` (~9 instructions).

For small list literals (up to, say, 8 elements) it might be better to call a helper, e.g. `_GrowableList<T>._literal2(a, b())`, as sketched below. `_GrowableList<T>._fromLiteral` can be aggressively inlined into the helper to 'pay down' the additional call cost.
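As a rough illustration of the shape suggested above (the real `_GrowableList` internals differ; this `_literal2` is only a sketch):

```dart
// Hypothetical _literal2 helper. Both elements arrive as arguments, so the
// backing storage is filled immediately after allocation with no
// interleaved user code, and the stores need no write barriers.
List<T> _literal2<T>(T e0, T e1) {
  final list = List<T>.filled(2, e0, growable: true);
  list[1] = e1; // store into a freshly allocated list: barrier avoidable
  return list;
}
```

A call site like `<T>[a, b()]` would then lower to `_literal2<T>(a, b())`, with `b()` evaluated before the array exists at all.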
The benefit of eliminating write barriers can also be achieved by evaluating the element expressions before CreateArray. The values would tend to spill, so it would need to be clear that a write barrier was indeed avoided by the early evaluation. It should be possible to rematerialize constants at the indexed store rather than spill them, and other expressions (e.g. variable values) might need a write barrier but no spill slot, so a policy for this would be sensitive to the number of spill slots versus the number of write barriers avoided.
Example: `[a(), const b, const c, d(), e, f]`:

- `a()` and `d()` return values that don't require a write barrier.
- Evaluating `a()` ahead of the array allocation does not help remove a write barrier for `const b`, since there is none.
- Evaluating `d()` ahead of the allocation forces `a()` ahead of the allocation too, but might allow `e` and `f` to avoid a write barrier.
- Moving both `a()` and `d()` early removes two write barriers, for `e` and `f` (see the sketch below).
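To make the reordering concrete, here is a Dart-level rendering of the last case. The actual transform would happen on the VM's IL, not on source, and `buildLiteral` is purely illustrative:

```dart
// Both calls are hoisted ahead of the allocation, so nothing between the
// allocation and the stores can trigger a GC that moves `array`; every
// store below could then skip its write barrier.
List<Object?> buildLiteral(Object? Function() a, Object? Function() d,
    Object? e, Object? f) {
  final t0 = a(); // hoisted; a() still evaluated before d()
  final t3 = d(); // hoisting d() forces a() to be hoisted as well
  final array = List<Object?>.filled(6, null);
  array[0] = t0;
  array[1] = 'b'; // stand-in for const b: constant stores never
  array[2] = 'c'; // need a barrier anyway
  array[3] = t3;
  array[4] = e;
  array[5] = f;
  return array;
}
```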
For larger literals this kind of analysis might be better than using a `_literalN` constructor, which in effect spills all the elements indiscriminately by pushing them as arguments.