
Fix spillslot allocation to actually reuse spillslots. #56

Merged
2 commits merged into bytecodealliance:main on Jun 3, 2022

Conversation

@cfallin (Member) commented Jun 3, 2022

The old logic, which did some linked-list rearranging to try to probe
more-likely-to-be-free slots first and which was inherited straight from
the original IonMonkey allocator, was slightly broken (error in
translation and not in IonMonkey, to be clear): it did not get the
list-splicing right, so quite often dropped a slot on the floor and
failed to consider it for further reuse.

After some experimentation, it seems to work just as well to keep a
SmallVec of spillslot indices per size class instead, and to save the last
probe point in order to spread load across the allocated slots while
limiting the number of probes (to bound quadratic behavior).
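
To make that concrete, here is a rough sketch of the per-size-class bookkeeping this scheme implies. The names follow identifiers that appear in the review discussion below (`SpillSlotList`, `probe_start`, `MAX_ATTEMPTS`), but the exact definitions and the inline capacity are assumptions for illustration, not the merged code.

```rust
// Sketch only: names follow identifiers mentioned in this PR; the exact
// definitions and inline capacity are assumptions, not the merged code.
use smallvec::SmallVec;

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct SpillSlotIndex(u32);

/// Per-size-class bucket of spillslots.
#[derive(Default)]
pub struct SpillSlotList {
    /// Every spillslot allocated so far for this size class.
    pub slots: SmallVec<[SpillSlotIndex; 32]>,
    /// Index into `slots` where the next probe begins, so successive
    /// allocations spread their probes across the whole bucket.
    pub probe_start: usize,
}

/// Upper bound on probes per allocation, bounding what would otherwise be
/// quadratic behavior when most slots are occupied.
pub const MAX_ATTEMPTS: usize = 10;
```

An allocation probes at most `MAX_ATTEMPTS` entries starting at `probe_start`, wrapping around the vector; if none of the probed slots is free, a fresh slot is allocated, pushed onto `slots`, and `probe_start` is reset to it.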

This change reduces the maximum slot count from 285 to 92 in `python.wasm`
from bytecodealliance/wasmtime#4214, and the maximum frame size from
2384 bytes to 752 bytes.

Sightglass results with Cranelift: no differences except on SpiderMonkey
(presumably because of the very large stack frames in its main interpreter loop):

compilation :: nanoseconds :: benchmarks-next/spidermonkey/benchmark.wasm

  Δ = 322649484.40 ± 276150647.93 (confidence = 99%)

  new.so is 1.01x to 1.11x faster than old.so!
  old.so is 0.90x to 0.99x faster than new.so!

  [5265054121 5522371452.20 5790849036] new.so
  [5773919099 5845020936.60 5921755485] old.so

compilation :: cycles :: benchmarks-next/spidermonkey/benchmark.wasm

  Δ = 1226157217.60 ± 1049578146.87 (confidence = 99%)

  new.so is 1.01x to 1.11x faster than old.so!
  old.so is 0.90x to 0.99x faster than new.so!

  [20009188534 20987183251.80 22007591586] new.so
  [21943062532 22213340469.40 22505025114] old.so

execution :: nanoseconds :: benchmarks-next/spidermonkey/benchmark.wasm

  Δ = 83516918.70 ± 67008953.30 (confidence = 99%)

  new.so is 1.00x to 1.04x faster than old.so!
  old.so is 0.96x to 1.00x faster than new.so!

  [3473341325 3579943166.40 3666372097] new.so
  [3570475540 3663460085.10 3737485602] old.so

execution :: cycles :: benchmarks-next/spidermonkey/benchmark.wasm

  Δ = 317375574.40 ± 254647615.25 (confidence = 99%)

  new.so is 1.00x to 1.04x faster than old.so!
  old.so is 0.96x to 1.00x faster than new.so!

  [13200115708 13605189187.20 13933711484] new.so
  [13569226542 13922564761.60 14203849912] old.so

Code under review (excerpt):

        first_slot = spillslot_iter;
    }
    for _attempt in 0..std::cmp::min(self.slots_by_size[size].slots.len(), MAX_ATTEMPTS) {
        let spillslot = self.slots_by_size[size].slots[i];
Contributor commented:
I found the correctness of this `[i]` oddly subtle even though it's ensured mostly by the line immediately above. It wasn't immediately obvious to me that the loop doesn't run when the slots array is empty. If you wanted to make my head hurt less, you could define a method on `SpillSlotList` that returns `Option<SpillSlotIndex>` and encapsulates the wrapping behavior that's currently split across several parts of this loop body. I don't think that's necessary for merging this fix, though.

Other than that: I'm convinced this patch does what you said it does, and it's nice that it deletes a bunch of code!
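
A minimal sketch of the kind of helper being suggested here, reusing the hypothetical `SpillSlotList` from the sketch above (the method name and the `is_free` callback are illustrative, not the code that was ultimately merged):

```rust
impl SpillSlotList {
    /// Probe up to MAX_ATTEMPTS existing slots, starting at `probe_start`
    /// and wrapping around, returning the first free one if any.
    fn find_free_slot(
        &mut self,
        is_free: impl Fn(SpillSlotIndex) -> bool,
    ) -> Option<SpillSlotIndex> {
        let n = self.slots.len();
        // When the bucket is empty the loop body never runs, so the
        // indexing below is safe.
        let mut i = if n == 0 { 0 } else { self.probe_start % n };
        for _attempt in 0..std::cmp::min(n, MAX_ATTEMPTS) {
            let slot = self.slots[i];
            // Advance (and wrap) the cursor so the next allocation starts
            // probing where this one left off.
            i = if i + 1 == n { 0 } else { i + 1 };
            if is_free(slot) {
                self.probe_start = i;
                return Some(slot);
            }
        }
        // Nothing free within the probe budget: the caller allocates a new
        // slot, pushes it onto `slots`, and resets `probe_start` to it.
        None
    }
}
```

Returning `Option<SpillSlotIndex>` keeps the wrapping arithmetic in one place and makes the empty-bucket case explicit.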

cfallin (Member, Author) replied:

Thanks, yes, it's not totally clear as-is! I went a bit further and defined an iterator; let me know what you think.

Contributor replied:

I woke up in the middle of the night wondering whether `probe_start` is updated if the iteration limit is hit. I see that in either unsuccessful case you always reset it to the newly added slot. So that seems okay, I guess?

Thank you for the extended comments, that helps a lot!

@cfallin (Member, Author) commented Jun 3, 2022

Updated slightly non-trivially, PTAL if you like!

@cfallin (Member, Author) commented Jun 3, 2022

Actually, I went back to the version you reviewed and then did a more minimal revision based more directly on your suggestion; the `std::mem::take` in my iterator refactor was potentially moving a lot of memory and causing a slowdown.
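
For context on that concern: `std::mem::take` swaps a default value into place and moves the old value out by value, and moving a `SmallVec` whose elements are stored inline copies the entire inline buffer. A hypothetical illustration of the pattern (the inline capacity and the function are made up, not taken from the actual refactor):

```rust
use smallvec::SmallVec;

// Hypothetical example only; the capacity and function are illustrative.
fn probe_without_borrowing(bucket: &mut SmallVec<[u32; 1024]>) {
    // `mem::take` moves the SmallVec out by value; if its elements live in
    // the inline buffer, this copies up to 4 KiB just to sidestep a borrow.
    let taken = std::mem::take(bucket);
    for _slot in &taken {
        // ... probe logic that also needs mutable access elsewhere ...
    }
    // Moving it back pays the same copy again.
    *bucket = taken;
}
```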

@cfallin cfallin merged commit 427e041 into bytecodealliance:main Jun 3, 2022
@cfallin cfallin deleted the fix-spillslot-alloc branch June 3, 2022 23:01
@cfallin cfallin mentioned this pull request Jun 3, 2022
cfallin added a commit to cfallin/wasmtime that referenced this pull request Jun 3, 2022
Pulls in an improvement to spillslot allocation
(bytecodealliance/regalloc2#56).
cfallin added a commit to bytecodealliance/wasmtime that referenced this pull request Jun 4, 2022
Pulls in an improvement to spillslot allocation
(bytecodealliance/regalloc2#56).