Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Chain with Fuses #70332

Closed
wants to merge 1 commit into from
Closed

Conversation

cuviper
Copy link
Member

@cuviper cuviper commented Mar 23, 2020

This change was inspired by the proposal on the internals forum.

By implementing Chain with Fuses, it no longer has to juggle the exhaustion state of each of its iterators, greatly simplifying the implementation. However, there was some concern that nested chains could wastefully recurse into exhausted sides when FusedIterator specialization is involved, so that possibility is hidden by a simple Defuse wrapper.

r? @scottmcm

(edit: the original "Fuse with Option" part of this PR was merged in #70366.)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Mar 23, 2020
@Centril
Copy link
Contributor

Centril commented Mar 23, 2020

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion

@bors
Copy link
Contributor

bors commented Mar 23, 2020

⌛ Trying commit da75e67f3f78b50e5a7f9845f43e389c4bdc93fc with merge 6f956ef5779fcbc05aaf93da3972715167a03291...

Copy link
Contributor

@RustyYato RustyYato left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nth and nth_back can be improved by avoiding the for loop

src/libcore/iter/adapters/chain.rs Show resolved Hide resolved
src/libcore/iter/adapters/chain.rs Show resolved Hide resolved
@bors
Copy link
Contributor

bors commented Mar 23, 2020

☀️ Try build successful - checks-azure
Build commit: 6f956ef5779fcbc05aaf93da3972715167a03291 (6f956ef5779fcbc05aaf93da3972715167a03291)

@rust-timer
Copy link
Collaborator

Queued 6f956ef5779fcbc05aaf93da3972715167a03291 with parent 55299b2, future comparison URL.

@cuviper
Copy link
Member Author

cuviper commented Mar 23, 2020

@KrishnaSannasi

nth and nth_back can be improved by avoiding the for loop

I agree that's a good idea, but maybe it should be a followup PR? For the moment, I've kept mostly the same style of computation, apart from the direct simplification in state tracking.

FWIW, the internal LoopState might be a better type for your implementation. I also try to avoid closures in iterator implementations (#62429), but that one would be an easy generic function, no captured values needed.

@RustyYato
Copy link
Contributor

RustyYato commented Mar 24, 2020

@cuviper That sounds like a good idea, can I make a PR after if this one is merged?

[ed: I resolved the comments above given this direction ~ scottmcm]

@cuviper
Copy link
Member Author

cuviper commented Mar 24, 2020

Sure, it's all yours.

@cuviper
Copy link
Member Author

cuviper commented Mar 24, 2020

1188.7% change in deeply-nested-opt!?!

@scottmcm
Copy link
Member

scottmcm commented Mar 24, 2020

Oh, that benchmark is this:

    Box::new(empty()
        .chain(empty())
        .chain(empty())
        .chain(empty())
        .chain(empty())
        .chain(empty())
        .chain(empty())
        .chain(empty())
        .chain(empty())
        .chain(empty())
        .chain(empty()) // 10th .chain(empty())
        .chain(empty())
        .chain(empty())
        .chain(empty())
        .chain(empty())
        .chain(empty())
        .chain(empty()) // 16th .chain(empty())
    )

I guess it makes sense that if this change is going to impact anything, it'd be that 😕

Looks like it's way more for LLVM to chug through,
image
turned into
image

Hmm, come to think of it, Empty<T> is a ZST, so layout optimization must be having a field day on that, and empty never returns anything, so I wonder if it started realizing that it can just remove all the code in next() for that bogus iterator...

(Looking at the bug that added that benchmark, #38528, this is hitting cost in a different part.)

@scottmcm
Copy link
Member

@cuviper I suppose this is making two changes at once. Your idea to do Fuse with Option seems clearly-good and separable from this; want to make a PR (either this one or a new one) with just that part? That might help make it easier to isolate what's going on in perf.

@cuviper
Copy link
Member Author

cuviper commented Mar 24, 2020

I was extra confused for a bit that it looks like that benchmark only tests construction, but it returns a Box<Iterator>, so it will have to codegen the entire Iterator vtable...

@joelpalmer joelpalmer added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 2, 2020
@cuviper
Copy link
Member Author

cuviper commented Apr 3, 2020

That deeply-nested perf test is evil, but I'm a little inclined to accept the loss here. The performance here is not as bad as the original issue #38528, and I found alternate cases that are much worse on the current compiler, but OK with this PR.

The original performance problem was up to Rust 1.25, where I get 11.37s for build --release. With the fix #48296 in Rust 1.26, I get 0.24s, and on 1.44.0-nightly (537ccdf3a 2020-04-02) I get 0.20s. With this Chain PR I get 1.37s -- definitely worse, but not a full regression.

However, Box<Iterator<Item = ()>> is pretty useless, and your comment about ZSTs made me think to try other Item types (still with Empty). With bool, nightly gets 0.26s and this PR gets 1.43s. With NonZeroU32, nightly gets 0.24s and this PR gets 1.40s. With u32, nightly did not finish (burned 200% CPU for 5 minutes before I killed it), yet this PR gets 1.43s.

I'll file a new issue for that -- something in the compiler needs to be fixed regardless of whether we change Chain.


The rest of the perf report still showed some slowdowns, though not as drastic, so I'll also explore if there are any other tweaks I can make here.

@cuviper cuviper changed the title Implement Chain with Fuses, and Fuse with Option Implement Chain with Fuses Apr 3, 2020
@cuviper cuviper closed this Apr 3, 2020
@cuviper cuviper deleted the fused-chain branch April 3, 2020 18:32
@cuviper cuviper restored the fused-chain branch April 3, 2020 18:32
@cuviper

This comment has been minimized.

@dtolnay dtolnay reopened this Apr 3, 2020
@cuviper

This comment has been minimized.

@cuviper
Copy link
Member Author

cuviper commented Apr 3, 2020

The original performance problem was up to Rust 1.25, where I get 11.37s for build --release. With the fix #48296 in Rust 1.26, I get 0.24s, and on 1.44.0-nightly (537ccdf3a 2020-04-02) I get 0.20s. With this Chain PR I get 1.37s -- definitely worse, but not a full regression.

When rebased onto #70750, this PR is down to 0.45s compiling deeply-nested. So the added iterator layers still slow us down, but maybe it's close enough to write off for this pathological test.

With u32, nightly did not finish (burned 200% CPU for 5 minutes before I killed it), yet this PR gets 1.43s.

See issue #70749 for the hang. With #70750, this PR gets down to 0.54s.

Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this pull request Apr 4, 2020
Match options directly in the Fuse implementation

Rather than using `as_ref()`, `as_mut()`, and `?`, we can use `match` directly to save a lot of generated code. This was mentioned as a possibility in rust-lang#70366 (comment), and I found that it had a very large impact on rust-lang#70332 using `Fuse` within `Chain`. Let's evaluate this change on its own first.
Centril added a commit to Centril/rust that referenced this pull request Apr 5, 2020
Match options directly in the Fuse implementation

Rather than using `as_ref()`, `as_mut()`, and `?`, we can use `match` directly to save a lot of generated code. This was mentioned as a possibility in rust-lang#70366 (comment), and I found that it had a very large impact on rust-lang#70332 using `Fuse` within `Chain`. Let's evaluate this change on its own first.
By letting `Fuse` track iterator exhaustion, the implementation of
`Chain` becomes a lot simpler. However, we don't want `FusedIterator`
specialization to keep visiting an exhausted side of the chain,
especially with nested chains, so a `Defuse` wrapper hides that
possibility.
@cuviper
Copy link
Member Author

cuviper commented Apr 6, 2020

Let's measure again now that #70750 has landed.

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion

@bors
Copy link
Contributor

bors commented Apr 6, 2020

⌛ Trying commit 1e2a504 with merge 601c05538b7638fb0937bd659aab41fdfb41444d...

@bors
Copy link
Contributor

bors commented Apr 6, 2020

☀️ Try build successful - checks-azure
Build commit: 601c05538b7638fb0937bd659aab41fdfb41444d (601c05538b7638fb0937bd659aab41fdfb41444d)

@rust-timer
Copy link
Collaborator

Queued 601c05538b7638fb0937bd659aab41fdfb41444d with parent 4015890, future comparison URL.

@rust-timer
Copy link
Collaborator

Finished benchmarking try commit 601c05538b7638fb0937bd659aab41fdfb41444d, comparison URL.

@CAD97
Copy link
Contributor

CAD97 commented Apr 7, 2020

  • deeply-nested-opt: +124.7%
  • deeply-nested-debug: +28.3%
  • otherwise, close to noise (but trending slower)

@cuviper
Copy link
Member Author

cuviper commented Apr 7, 2020

I've posted an alternate in #70896 -- the code is not as pretty as the free Fuse would get us here, but it seems to perform a lot better.

Centril added a commit to Centril/rust that referenced this pull request Apr 9, 2020
Implement Chain with Option fuses

The iterators are now "fused" with `Option` so we don't need separate state to track which part is already exhausted, and we may also get niche layout for `None`. We don't use the real `Fuse` adapter because its specialization for `FusedIterator` unconditionally descends into the iterator, and that could be expensive to keep revisiting stuff like nested chains. It also hurts compiler performance to add more iterator layers to `Chain`.

This change was inspired by the [proposal](https://internals.rust-lang.org/t/proposal-implement-iter-chain-using-fuse/12006) on the internals forum. This is an alternate to rust-lang#70332, directly employing some of the same `Fuse` optimizations as rust-lang#70366 and rust-lang#70750.

r? @scottmcm
@cuviper
Copy link
Member Author

cuviper commented Apr 9, 2020

#70896 has merged, so I'm closing this.

@cuviper cuviper closed this Apr 9, 2020
Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this pull request Apr 17, 2020
Hides default fns inside Fuse impl to avoid exposing it to any crate

Fixes rust-lang#70796

@cuviper I've added some default, private traits to do the job for us. If required, I can expose them to a specific visibility if you want to call these functions for rust-lang#70332

r? @cuviper
@cuviper cuviper deleted the fused-chain branch August 9, 2020 23:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants