Avoid useless sift_down when std::collections::binary_heap::PeekMut is never mutably dereferenced #75974

SkiFire13 · 2020-08-27T08:13:38Z

If deref_mut is never called then it's not possible for the element to be mutated without internal mutability, meaning there's no need to call sift_down.

This could be a little improvement in cases where you want to mutate the biggest element of the heap only if it satisfies a certain predicate that needs only read access to the element.

rust-highfive · 2020-08-27T08:13:41Z

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @LukasKalbertodt (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

LukasKalbertodt · 2020-08-30T10:59:38Z

This could be a little improvement in cases where you want to mutate the biggest element of the heap only if it satisfies a certain predicate that needs only read access to the element.

But in those cases, couldn't you simply not call peek_mut until you checked that condition? I guess it is rather unlikely that people call peek_mut and then at most immutably deref it later, right?

The assignment self.sift = true in the DerefMut worries me, as it makes it slightly slower which might or might not have an effect in practice. And note that one sift_down is also not terribly slow: if there is no need to sift, that's only one 2 * pos + 1 and one comparison.

So yeah, unfortunately, it's not clear to me if this would be an actual improvement :/
Do you have a specific use case for this?

SkiFire13 · 2020-08-30T19:32:10Z

But in those cases, couldn't you simply not call peek_mut until you checked that condition? I guess it is rather unlikely that people call peek_mut and then at most immutably deref it later, right?

I could, but I need to know how the internals of PeekMut work in order to realize I should. Otherwise it would be counterintuitive.

Consider this code (taken from the example linked below):

let mut max = heap.peek_mut().unwrap();
if x < *max {
    *max = x;
}

vs

if x < heap.peek().unwrap() {
    heap.peek_mut().unwrap() = x;
}

Why would someone choose the second one? It requires more and uglier code! Intuitively the first one should be the way to go, it has less boilerplate, less method calls and you would expect it to also be faster, but that's wrong.

The assignment self.sift = true in the DerefMut worries me, as it makes it slightly slower which might or might not have an effect in practice. And note that one sift_down is also not terribly slow: if there is no need to sift, that's only one 2 * pos + 1 and one comparison.

LLVM should be able to optimize it, at least in most trivial usecases. But even if it can't, chances are that the rest of your code is at least an order of magnitude slower than that assignment.

sift_down is not terribly slow, but it may become a bottleneck since it may be as expensive as the rest of your code (like in the following example)

Do you have a specific use case for this?

This reddit thread has a simple example that runs considerably slower (1.5-2x slower) because of this missed optimization.

LukasKalbertodt · 2020-09-05T09:13:15Z

Ok, that makes sense, thanks for that explanation. Thinking a bit more about it, you are probably right that the self.sift = true will be optimized out or not make a significant difference.

However, it would be great to have some benchmarks to confirm our assumptions. I took a quick look, but couldn't find any binary heap benchmarks yet. They should be in library/alloc/benches/. Could you add a couple of benchmarks? For this PR, only benchmarks using peek_mut are relevant: please add one that shows the speedup of this PR, but also one that might get slower due to this change. And if you have the time, adding some other benchmarks not using peek_mut (something like Dijkstra maybe?) would be appreciated as well!
If you don't have any time to add benchmarks, just let me know!

tesuji · 2020-09-05T09:25:20Z

Might conflict with #76334 .

Mark-Simulacrum · 2020-09-05T12:20:39Z

I am concerned about the "without internal mutability" caveat here - I don't think we should assume that's not the case, especially when the user has explicitly called peek_mut() hinting that they're changing the value.

We could set the flag in both deref and derefmut, but presumably that's not helpful?

Maybe an entry-like API, which provides upgradable shared access would help here.

SkiFire13 · 2020-09-05T13:36:33Z

@LukasKalbertodt I'm working on them, you can expect an update later today or tomorrow.

@Mark-Simulacrum This PR aims to improve PeekMut's performance by not setting the flag when only deref is called, so just changing it would clearly not be an acceptable solution. Maybe add another method that acts as deref but doesn't set the flag? It would be explicit, so it wouldn't conflict with internal mutability, but would also not be intuitive.
The entry-like API could be another solution, but it also doesn't seem intuitive.

If we leave this PR as it is then if someone wants to use internal mutability it can make an explicit call to deref_mut to mark that the item has been mutated. IMO this is ok because the current docs don't give any guarantee on the use of peek_mut, let alone with internal mutability. In fact BinaryHeap's docs say that

It is a logic error for an item to be modified in such a way that the item's ordering relative to any other item, as determined by the Ord trait, changes while it is in the heap.

PeekMut's item is in the heap, so this applies, and neither peek_mut's nor PeekMut's docs say anything to correct this! So for the docs using peek_mut to modify the greatest element and make it smaller than another element is a logic error. Of course that's not the current case, but anyone that is currently using peek_mut is just relying on implementation details.

We could take this opportunity to fix peek_mut docs and specify its behaviour when using internal mutability according to this PR (that is, it doesn't guarantee to leave the heap in a consistent state, unless deref_mut is called).
There's also the possibility to add a clippy lint when PeekMut is never mutably dereferenced (and maybe consumed?).

Mark-Simulacrum · 2020-09-05T14:13:42Z

Yes, that's true that the documentation technically permits us to do this. I don't think that there's a need to add an explicit "resift" method to PeekMut. I would not want to document deref_mut as special. I am okay with this as-is -- I missed that we made the internal mutability claim that strong in the docs.

As an aside, I am surprised that we document PeekMut as "Cost is O(1) in the worst case." -- it seems like the sifting in the destructor is O(log n) in the worst case.

calebsander · 2020-09-05T18:36:57Z

As an aside, I am surprised that we document PeekMut as "Cost is O(1) in the worst case." -- it seems like the sifting in the destructor is O(log n) in the worst case.

I think the documentation is referring to the peek_mut() method itself, which is constant-time. But I agree it's a bit misleading to discount the cost of the destructor, as it will always run unless PeekMut::pop() is called.

calebsander · 2020-09-05T19:09:47Z

You are probably right that the self.sift = true will be optimized out or not make a significant difference.

The generated assembly after this change confirms that sift is optimized out. If the x < *max comparison fails, the function returns immediately. Otherwise, it swaps the value and runs BinaryHeap::sift_down().

Since sift is optimized out anyways, I think this PR makes a lot more sense than the one I proposed (#76334, which removes the sift field entirely and always performs a sift_down() if PeekMut::pop() isn't called).

rust-log-analyzer · 2020-09-05T21:26:09Z

The job mingw-check of your PR failed (pretty log, raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.

Initialized empty Git repository in /home/runner/work/rust/rust/.git/
##[command]git remote add origin https://github.com/rust-lang/rust
##[command]git config gc.auto 0
##[command]git config --get-all http.https://github.com/rust-lang/rust.extraheader
##[command]git -c http.extraheader="AUTHORIZATION: basic ***" fetch --tags --prune --progress --no-recurse-submodules --depth=2 origin +refs/heads/*:refs/remotes/origin/* +refs/pull/75974/merge:refs/remotes/pull/75974/merge

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @rust-lang/infra. (Feature Requests)

SkiFire13 · 2020-09-05T21:33:25Z

Oops, looks like I messed up something with git.

SkiFire13 · 2020-09-05T21:45:26Z

I don't know how I ended up with those changes...
Anyway, I added some benchmarks, in particular:

bench_find_smallest_1000: shows the improvements this PR brings compared to the current implementation in a possible usecase
bench_peek_mut_deref_mut: shows the overhead this PR brings in what should be the worst case.
bench_from_vec, bench_into_sorted_vec, bench_push and bench_pop: just benchmarks for the main BinaryHeap's functions

I couldn't figure out how to easily benchmark something like Dijkstra. What should be the input to test? Could it be randomized? In the end I opted for simple benchmarks for the main functions.

LukasKalbertodt

Thanks a bunch for those benchmarks! Running the benchmarks locally:

With your sift changes:

test binaryheap::bench_find_smallest_1000   ... bench:     347,545 ns/iter (+/- 6,682)
test binaryheap::bench_from_vec             ... bench:     786,415 ns/iter (+/- 20,890)
test binaryheap::bench_into_sorted_vec      ... bench:     493,413 ns/iter (+/- 3,795)
test binaryheap::bench_peek_mut_deref_mut   ... bench:   1,203,026 ns/iter (+/- 8,469)
test binaryheap::bench_pop                  ... bench:     598,884 ns/iter (+/- 22,753)
test binaryheap::bench_push                 ... bench:     688,020 ns/iter (+/- 8,597)

Without the sift changes:

test binaryheap::bench_find_smallest_1000   ... bench:     752,333 ns/iter (+/- 14,042)
test binaryheap::bench_from_vec             ... bench:     796,535 ns/iter (+/- 73,442)
test binaryheap::bench_into_sorted_vec      ... bench:     497,072 ns/iter (+/- 8,736)
test binaryheap::bench_peek_mut_deref_mut   ... bench:     717,605 ns/iter (+/- 2,112)
test binaryheap::bench_pop                  ... bench:     606,103 ns/iter (+/- 4,773)
test binaryheap::bench_push                 ... bench:     680,548 ns/iter (+/- 3,965)

The three general benchmarks don't show any meaningful difference. The "smallest 1000" one showed a more than 2x performance improvement, while the bench_peek_mut_deref_mut one ran notably slower. However, bench_peek_mut_deref_mut is highly artificial and as stated above, I think hardly any real world code will be affected by this slow down. On the other hand, "smallest 1000" is real code, so 👍 on that change.

I left three mini inline comments. Also, could you fix the confusion in the docs (see this comment)? Just add a short sentence "If the item is modified, the the worst case time complexity is O(log n)" ... or something like that.

Finally, before merging this, could you fix the git history? The merge commits and "revert merge" commit should be removed. It would be best if your PR contains 3 commits in this order: "add benchmarks", "avoid useless sift_down", "fix peek_mut docs". In particularly, adding those benchmarks in a commit before the sift change makes it easier to compare both versions.

How experienced are you with git? If you want, I can guide you through it on Zulip or tell you all the commands necessary.

library/alloc/benches/binaryheap.rs

LukasKalbertodt

Great, thanks a bunch! :)

LukasKalbertodt · 2020-09-20T14:41:13Z

@bors r+

bors · 2020-09-20T14:41:14Z

📌 Commit ca15e9d has been approved by LukasKalbertodt

…Kalbertodt Avoid useless sift_down when std::collections::binary_heap::PeekMut is never mutably dereferenced If `deref_mut` is never called then it's not possible for the element to be mutated without internal mutability, meaning there's no need to call `sift_down`. This could be a little improvement in cases where you want to mutate the biggest element of the heap only if it satisfies a certain predicate that needs only read access to the element.

bors · 2020-09-21T05:31:06Z

⌛ Testing commit ca15e9d with merge a409a23...

bors · 2020-09-21T07:48:50Z

☀️ Test successful - checks-actions, checks-azure
Approved by: LukasKalbertodt
Pushing a409a23 to master...

rust-highfive assigned LukasKalbertodt Aug 27, 2020

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Aug 27, 2020

jyn514 added T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Sep 3, 2020

LukasKalbertodt mentioned this pull request Sep 5, 2020

Simplify binary_heap::PeekMut struct and destructor #76334

Closed

calebsander mentioned this pull request Sep 13, 2020

Simplify push_pop_{max, min} and replace_{max, min} implementations tov/min-max-heap-rs#8

Merged

SkiFire13 mentioned this pull request Sep 15, 2020

Add helper method for taking the k smallest elements in an iterator rust-itertools/itertools#473

Merged

LukasKalbertodt suggested changes Sep 19, 2020

View reviewed changes

library/alloc/benches/binaryheap.rs Outdated Show resolved Hide resolved

library/alloc/benches/binaryheap.rs Outdated Show resolved Hide resolved

library/alloc/benches/binaryheap.rs Outdated Show resolved Hide resolved

SkiFire13 force-pushed the peekmut-opt-sift branch from 0716aab to abba50e Compare September 19, 2020 23:09

SkiFire13 added 3 commits September 20, 2020 01:12

Added benchmarks for BinaryHeap

924cd13

Set sift=true only when PeekMut yields a mutable reference

af1e363

Fix time complexity in BinaryHeap::peek_mut docs

ca15e9d

SkiFire13 force-pushed the peekmut-opt-sift branch from abba50e to ca15e9d Compare September 19, 2020 23:12

SkiFire13 requested a review from LukasKalbertodt September 20, 2020 08:59

LukasKalbertodt approved these changes Sep 20, 2020

View reviewed changes

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 20, 2020

Dylan-DPC-zz mentioned this pull request Sep 21, 2020

Rollup of 14 pull requests #76987

Closed

Dylan-DPC-zz mentioned this pull request Sep 21, 2020

Rollup of 17 pull requests #76988

Closed

bors added the merged-by-bors This PR was explicitly merged by bors. label Sep 21, 2020

bors merged commit a409a23 into rust-lang:master Sep 21, 2020

rustbot added this to the 1.48.0 milestone Sep 21, 2020

SkiFire13 deleted the peekmut-opt-sift branch September 21, 2020 08:10

cuviper mentioned this pull request Sep 28, 2020

Avoid sift_down for unmutated PeekMut (rust#75974) sekineh/binary-heap-plus-rs#27

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Avoid useless sift_down when std::collections::binary_heap::PeekMut is never mutably dereferenced #75974

Avoid useless sift_down when std::collections::binary_heap::PeekMut is never mutably dereferenced #75974

SkiFire13 commented Aug 27, 2020

rust-highfive commented Aug 27, 2020

LukasKalbertodt commented Aug 30, 2020

SkiFire13 commented Aug 30, 2020

LukasKalbertodt commented Sep 5, 2020

tesuji commented Sep 5, 2020

Mark-Simulacrum commented Sep 5, 2020

SkiFire13 commented Sep 5, 2020

Mark-Simulacrum commented Sep 5, 2020

calebsander commented Sep 5, 2020

calebsander commented Sep 5, 2020 •

edited

Loading

rust-log-analyzer commented Sep 5, 2020

SkiFire13 commented Sep 5, 2020

SkiFire13 commented Sep 5, 2020

LukasKalbertodt left a comment

LukasKalbertodt left a comment

LukasKalbertodt commented Sep 20, 2020

bors commented Sep 20, 2020

bors commented Sep 21, 2020

bors commented Sep 21, 2020

Avoid useless sift_down when std::collections::binary_heap::PeekMut is never mutably dereferenced #75974

Avoid useless sift_down when std::collections::binary_heap::PeekMut is never mutably dereferenced #75974

Conversation

SkiFire13 commented Aug 27, 2020

rust-highfive commented Aug 27, 2020

LukasKalbertodt commented Aug 30, 2020

SkiFire13 commented Aug 30, 2020

LukasKalbertodt commented Sep 5, 2020

tesuji commented Sep 5, 2020

Mark-Simulacrum commented Sep 5, 2020

SkiFire13 commented Sep 5, 2020

Mark-Simulacrum commented Sep 5, 2020

calebsander commented Sep 5, 2020

calebsander commented Sep 5, 2020 • edited Loading

rust-log-analyzer commented Sep 5, 2020

SkiFire13 commented Sep 5, 2020

SkiFire13 commented Sep 5, 2020

LukasKalbertodt left a comment

Choose a reason for hiding this comment

LukasKalbertodt left a comment

Choose a reason for hiding this comment

LukasKalbertodt commented Sep 20, 2020

bors commented Sep 20, 2020

bors commented Sep 21, 2020

bors commented Sep 21, 2020

calebsander commented Sep 5, 2020 •

edited

Loading