Make `fold` standalone. #72139

nnethercote · 2020-05-12T12:37:10Z

fold is currently implemented via try_fold, but implementing it
directly results in slightly less LLVM IR being generated, speeding up
compilation of some benchmarks.

r? @cuviper

nnethercote · 2020-05-12T12:46:01Z

Lots of iterator functions (e.g. any, all, find, find_map, position) are implemented in terms of try_fold, and fold is currently done that way as well. As discussed in #64572, this is good for run-time performance because lots of iterators implement try_fold. But lots of iterators also implement fold, so it seems fold can be implemented more directly, as this PR does.

If necessary, I could also add fold implementations to all the iterators that don't implement it but do implement try_fold. (Cycle, StepBy, TakeWhile, etc.)

More generally, try_fold shows up as the single biggest source of LLVM IR generation, especially for clap-rs. There are many instantiations, from all, any, find, etc. And each of those actually has four instantiations per occurrence, in the following size order:

core::iter::traits::iterator::Iterator::try_fold
core::iter::traits::iterator::Iterator::any::check::{{closure}}
core::iter::traits::iterator::Iterator::any
core::iter::traits::iterator::Iterator::any::check

It would be nice to make those smaller, but I'm struggling to see how to do it while still using try_fold.

cc @bluss

nnethercote · 2020-05-12T12:46:10Z

@bors try @rust-timer queue

rust-timer · 2020-05-12T12:46:12Z

Awaiting bors try build completion

bors · 2020-05-12T12:46:21Z

⌛ Trying commit 4f8acfc491360a39675b1beadcfa977f5b780b9d with merge 84327c568fbc9f25d682e9c2f0a3b8a0971d1c11...

src/libcore/iter/traits/iterator.rs

andjo403 · 2020-05-12T15:02:21Z

do not know if it is possible to do this change as it is documented to use try_fold

RalfJung · 2020-05-12T15:29:17Z

@andjo403 it says "Most of the other (forward) methods have default implementations in terms of this one". That's not actually saying that any particular method indeed uses try_fold.

But if we go with this PR, a similar remark should likely be added to fold (and/or to the trait root so it is not lost in one of more than a dozen method docs).

bors · 2020-05-12T15:50:55Z

☀️ Try build successful - checks-actions, checks-azure
Build commit: 84327c568fbc9f25d682e9c2f0a3b8a0971d1c11 (84327c568fbc9f25d682e9c2f0a3b8a0971d1c11)

rust-timer · 2020-05-12T15:50:57Z

Queued 84327c568fbc9f25d682e9c2f0a3b8a0971d1c11 with parent 09c817e, future comparison URL.

bluss · 2020-05-12T17:03:48Z

The vision is that try_fold is the "universal" Iterator method that all traversals can be implemented in terms of. In practice we have these extra functions and enums that cause the code size to grow.

"Fortunately" try_fold is unstable to implement, because of the Try trait. The vision thus hasn't even fully arrived to stable Rust.

To me seems like it would be acceptable to break the link between fold and try_fold; then we have two roots of traversals - fold for whole traversal and try_fold for short-circuiting traversal. I'd love to have that documented in that case, but maybe we don't want to commit to it.

As a practical note, it is much easier for users to implement fold correctly rather than try_fold, exactly because try_fold should stop short and leave the iterator resumable, so there are more pitfalls in the implementation.

The methods any, all, find, they are short-circuiting so try_fold is their natural root implementation, and that still needs a different solution. As written on the linked issue, what would happen if we special case any/all/find on the slice iterator, that could save a lot?

Could there be some wild specialization marker that we could use on collection iterators in general to opt them out of using try_fold? That sounds like a pretty crazy idea that probably doesn't work.

fn any(&mut self, f: impl FnMut(Self::Item) -> bool) -> bool {
    IteratorSpecializations::any(self, f)
}

/* Default IteratorSpecializations::any uses try_fold */
/* libcore/alloc iterators marked as non-segmented use simple loop? */

(I don't know how to do this specialization without the unstable specialization marker traits, but we can certainly demonstrate here the difference in IR generated for the slice iterator in the two implementations - playground link)

Reading the debug IR isn't so fun. No function is really innocent. Even raw pointer methods like is_null, offset or NonNull::new_unchecked are verbose.

cuviper · 2020-05-12T17:18:35Z

I agree, I think it's OK to make this change specifically because Try is still unstable. If the performance run shows benefit, which I expect it will, I'm all for it.

Please give the same treatment to rfold.

If necessary, I could also add fold implementations to all the iterators that don't implement it but do implement try_fold. (Cycle, StepBy, TakeWhile, etc.)

We should audit those that currently have only try_fold, but fold doesn't always make sense.

Cycle is infinite, so it doesn't really make sense to fold it. I guess the implementation can loop on folding the base iterator, which may help its performance, but you probably don't want to be here in the first place.
StepBy seems ok to fold -- it just optimizes the first_take check.
For things like TakeWhile that need to exit early, we'd probably just want fold to make sure it does still go to try_fold of the inner iterator.

rust-timer · 2020-05-12T18:57:08Z

Finished benchmarking try commit 84327c568fbc9f25d682e9c2f0a3b8a0971d1c11, comparison URL.

cuviper · 2020-05-12T19:15:41Z

Perf looks like a solid win, except for ripgrep-opt incr-patched: println at +6.7%.

nnethercote · 2020-05-12T20:53:09Z

In case it's useful, here is the full list of libstd iterators that implement try_fold but don't implement fold: Cycle, StepBy, TakeWhile, MapWhile, Take, Scan, ResultShunt, RangeInclusive.

An alternative option to this PR is to go in the other direction: remove all the fold implementations and let them fall back to the default, which calls on to try_fold. That would make fold like any, all, etc. This would reduce the amount of code in libstd at the cost of potentially increased compile times.

As written on the linked issue, what would happen if we special case any/all/find on the slice iterator, that could save a lot?

Could there be some wild specialization marker that we could use on collection iterators in general to opt them out of using try_fold?

These are both interesting ideas. I will investigate them. Thanks!

cuviper · 2020-05-12T21:08:46Z

In case it's useful, here is the full list of libstd iterators that implement try_fold but don't implement fold: Cycle, StepBy, TakeWhile, MapWhile, Take, Scan, ResultShunt, RangeInclusive.

RangeInclusive may be a good one for fold. The others which terminate early should probably keep forwarding fold to their own try_fold -- might want a helper fn for that, basically preserving the forwarding code that you're changing here.

An alternative option to this PR is to go in the other direction: remove all the fold implementations and let them fall back to the default, which calls on to try_fold. That would make fold like any, all, etc. This would reduce the amount of code in libstd at the cost of potentially increased compile times.

I think that would be actively harmful for adapters, since users can't implement try_fold yet. We should still try to call a user's fold where it makes sense.

nnethercote · 2020-05-12T21:39:51Z

I just analyzed Iterator. Here is a summary.

Summary of Iterator methods
- '*' indicates important methods (as judged by me via profiling, see below)
- 'XXX' indicates possible changes

// No default implementation
* next

// Default implementation that doesn't rely on anything else
* size_hint
- nth
- by_ref

// Default implementation uses next
* try_fold
- cmp_by
- partial_cmp_by
- eq_by
- is_sorted_by

// Default implementation uses fold
* for_each
- count
- last
- fold_first

// Default implementation uses try_fold
* fold (XXX: should be implemented directly?)
* position -> try
* any (could use try_for_each)
* find (could use try_for_each)
- all (could use try_for_each)
- find_map (could use try_for_each)
- try_for_each

// Default implementation uses try_rfold
- rposition -> try_rfold

// Default implementation uses for_each (XXX: should use fold?)
- partition -> for_each
- unzip -> for_each

// Default implementation uses try_for_each (XXX: should use try_fold?)
- try_find

// Other
- partition_in_place -> find + rfind
- is_partitioned -> all + any
- max -> max_by
- min -> min_by
- max_by_key -> map + max_by
- min_by_key -> map + min_by
- max_by -> fold_first + fold
- min_by -> fold_first + fold
- cmp -> cmp_by
- partial_cmp -> partial_cmp_by
- eq -> eq_by
- ne -> eq
- lt -> partial_cmp
- le -> partial_cmp
- gt -> partial_cmp
- ge -> partial_cmp
- is_sorted -> is_sorted_by
- is_sorted_by_key -> map

// Uses an auxiliary adapter (some of those use fold, etc.)
- step_by -> StepBy
- chain -> Chain
- zip -> Zip
- map -> Map
- filter -> Filter
- filter_map -> FilterMap
- enumerate -> Enumerate
- peekable -> Peekable
- skip_while -> SkipWhile
- take_while -> TakeWhile
- map_while -> MapWhile
- skip -> Skip
- take -> Take
- scan -> Scan
- flat_map -> FlatMap
- flatten -> Flatten
- fuse -> Fuse
- inspect -> Inspect
- collect -> FromIterator
- rev -> Rev
- copied -> Copied
- cloned -> Cloned
- cycle -> Cycle
- sum -> Sum
- product -> Product

I also did some rough profiling of the rustc-perf benchmarks. The following shows the number of lines of LLVM IR associated with some of the methods, which gives an of their relative importances in relation to compile times.

749102 TOTAL (for all Iterator methods)
254224 try_fold
121461 next
83047 fold
45494 find
42836 size_hint
34061 any
25876 for_each
8578 position
4887 all
4554 find_map
3832 try_rfold
96 rfold
0 try_for_each

bluss · 2020-05-12T21:50:49Z

It can be fraught to change which method is called in default implementation. Not that there is any problem in the current PR with that, but if we begin talking about switching around default implementations, it's a factor to think about.

For example: If we would change any's default to use all, then that creates an infinite loop in those user iterators that for some reason implement all by calling any(!).

nnethercote · 2020-05-13T08:00:24Z

what would happen if we special case any/all/find on the slice iterator, that could save a lot?

I've done this in #72166, it's simple and gets a lot of the potential benefit. I will return to fold and this PR tomorrow.

nnethercote · 2020-05-14T04:44:49Z

I have updated the code. I've given rfold the same treatment, added a comment to fold, and added missing fold implementations to iterators that have a try_fold.

nnethercote · 2020-05-14T05:16:47Z

I did some local measurements and couldn't replicate the ripgrep regression. Maybe it was just a one-off. If this gets r+ I will do another perf run.

andjo403 · 2020-05-14T06:28:03Z

have you tested the perf together with #72166 is there any changes with this if that PR is merged?

nnethercote · 2020-05-14T07:50:29Z

#72166 is mostly orthogonal to this.

cuviper

Looks good, but there are a few that also need rfold now that we've changed that. I noted those where you added fold, and Skip is one more.

src/libcore/iter/range.rs

src/libcore/iter/adapters/mod.rs

`fold` is currently implemented via `try_fold`, but implementing it directly results in slightly less LLVM IR being generated, speeding up compilation of some benchmarks. (And likewise for `rfold`.) The commit adds `fold` implementations to all the iterators that lack one but do have a `try_fold` implementation. Most of these just call the `try_fold` implementation directly.

Many default iterator methods use `try_fold` or `fold`, and these ones can too.

nnethercote · 2020-05-18T04:46:44Z

@cuviper: I have added the requested rfolds.

rust-highfive · 2020-05-18T04:52:53Z

The job mingw-check of your PR failed (pretty log, raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.

##[section]Starting: Linux mingw-check
##[section]Starting: Initialize job
Agent name: 'Azure Pipelines 3'
Agent machine name: 'fv-az578'
Current agent version: '2.168.2'
##[group]Operating System
16.04.6
LTS
LTS
##[endgroup]
##[group]Virtual Environment
Environment: ubuntu-16.04
Version: 20200430.2
Included Software: https://github.com/actions/virtual-environments/blob/ubuntu16/20200430.2/images/linux/Ubuntu1604-README.md
##[endgroup]
Agent running as: 'vsts'
Prepare build directory.
Set build variables.
Download all required tasks.
Download all required tasks.
Downloading task: Bash (3.163.2)
Checking job knob settings.
   Knob: AgentToolsDirectory = /opt/hostedtoolcache Source: ${AGENT_TOOLSDIRECTORY} 
   Knob: AgentPerflog = /home/vsts/perflog Source: ${VSTS_AGENT_PERFLOG} 
Start tracking orphan processes.
##[section]Finishing: Initialize job
##[section]Starting: Configure Job Name
==============================================================================
---
========================== Starting Command Output ===========================
[command]/bin/bash --noprofile --norc /home/vsts/work/_temp/aa52f2bd-985c-436f-8b6e-ff835d042696.sh

##[section]Finishing: Disable git automatic line ending conversion
##[section]Starting: Checkout rust-lang/rust@refs/pull/72139/merge to s
Task         : Get sources
Description  : Get sources from a repository. Supports Git, TfsVC, and SVN repositories.
Version      : 1.0.0
Author       : Microsoft
---
##[command]git remote add origin https://github.com/rust-lang/rust
##[command]git config gc.auto 0
##[command]git config --get-all http.https://github.com/rust-lang/rust.extraheader
##[command]git config --get-all http.proxy
##[command]git -c http.extraheader="AUTHORIZATION: basic ***" fetch --force --tags --prune --progress --no-recurse-submodules --depth=2 origin +refs/heads/*:refs/remotes/origin/* +refs/pull/72139/merge:refs/remotes/pull/72139/merge
---
 ---> 3adb0605cc65
Step 6/7 : ENV RUN_CHECK_WITH_PARALLEL_QUERIES 1
 ---> Using cache
 ---> 28dbc326cb7f
Step 7/7 : ENV SCRIPT python3 ../x.py test src/tools/expand-yaml-anchors &&            python3 ../x.py check --target=i686-pc-windows-gnu --host=i686-pc-windows-gnu &&            python3 ../x.py build --stage 0 src/tools/build-manifest &&            python3 ../x.py test --stage 0 src/tools/compiletest &&            python3 ../x.py test src/tools/tidy &&            python3 ../x.py doc --stage 0 src/libstd &&            /scripts/validate-toolstate.sh
 ---> 537a01811900
Successfully built 537a01811900
Successfully tagged rust-ci:latest
Built container sha256:537a018119009dc218456238dec90b5530050db1e2a1e166550c218003f6159d
---
   Compiling autocfg v0.1.7
error: variable does not need to be mutable
   --> src/libcore/iter/range.rs:765:39
    |
765 |     fn rfold<B, F>(mut self, init: B, mut f: F) -> B
    |                                       |
    |                                       help: remove this `mut`
    |
    = note: `-D unused-mut` implied by `-D warnings`
---
  local time: Mon May 18 04:52:47 UTC 2020
  network time: Mon, 18 May 2020 04:52:47 GMT
== end clock drift check ==

##[error]Bash exited with code '1'.
##[section]Finishing: Run build
##[section]Starting: Checkout rust-lang/rust@refs/pull/72139/merge to s
Task         : Get sources
Description  : Get sources from a repository. Supports Git, TfsVC, and SVN repositories.
Version      : 1.0.0
Author       : Microsoft
Author       : Microsoft
Help         : [More Information](https://go.microsoft.com/fwlink/?LinkId=798199)
==============================================================================
Cleaning any cached credential from repository: rust-lang/rust (GitHub)
##[section]Finishing: Checkout rust-lang/rust@refs/pull/72139/merge to s
Cleaning up task key
Start cleaning up orphan processes.
Terminate orphan process: pid (4790) (python)
##[section]Finishing: Finalize Job

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @rust-lang/infra. (Feature Requests)

src/libcore/iter/traits/iterator.rs

cuviper · 2020-05-20T00:23:35Z

@bors r+

bors · 2020-05-20T00:23:37Z

📌 Commit 959bd48 has been approved by cuviper

bors · 2020-05-20T13:09:13Z

⌛ Testing commit 959bd48 with merge 1a814c344dd1588457199109c0722ff66765a295...

bors · 2020-05-20T13:14:47Z

💔 Test failed - checks-azure

RalfJung · 2020-05-20T13:32:13Z

msys2 failure
@bors retry

@ghost

Rollup of 6 pull requests Successful merges: - rust-lang#71863 (Suggest fixes and add error recovery for `use foo::self`) - rust-lang#72139 (Make `fold` standalone.) - rust-lang#72275 (Continue lowering for unsupported async generator instead of returning an error.) - rust-lang#72361 (split_inclusive: add tracking issue number (72360)) - rust-lang#72364 (Remove unused dependencies) - rust-lang#72366 (Adjust the zero check in `RawVec::grow`.) Failed merges: r? @ghost

nnethercote · 2020-05-20T22:17:10Z

@bors rollup=never

Because this affects performance.

nnethercote · 2020-05-21T05:53:18Z

How did this land when some of the tests failed?

RalfJung · 2020-05-21T06:31:21Z

It landed as part of a rollup that was created before you did rollup=never: #72378. All tests passed there.

rust-highfive assigned cuviper May 12, 2020

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label May 12, 2020

RalfJung reviewed May 12, 2020

View reviewed changes

src/libcore/iter/traits/iterator.rs Show resolved Hide resolved

nnethercote force-pushed the standalone-fold branch from 4f8acfc to 0e4005d Compare May 14, 2020 04:44

nnethercote mentioned this pull request May 15, 2020

Provide a tool to see what user code is causing rustc to use lots of time rust-lang/measureme#51

Open

cuviper reviewed May 16, 2020

View reviewed changes

src/libcore/iter/range.rs Show resolved Hide resolved

src/libcore/iter/adapters/mod.rs Show resolved Hide resolved

src/libcore/iter/adapters/mod.rs Show resolved Hide resolved

nnethercote added 2 commits May 18, 2020 05:41

Tweak partition, unzip, try_find.

c2abf8f

Many default iterator methods use `try_fold` or `fold`, and these ones can too.

nnethercote force-pushed the standalone-fold branch from 0e4005d to b18651d Compare May 18, 2020 04:46

Add some more rfold implementations.

959bd48

nnethercote force-pushed the standalone-fold branch from b18651d to 959bd48 Compare May 18, 2020 06:51

bluss reviewed May 19, 2020

View reviewed changes

src/libcore/iter/traits/iterator.rs Show resolved Hide resolved

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 20, 2020

Dylan-DPC-zz mentioned this pull request May 20, 2020

Rollup of 6 pull requests #72378

Merged

bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels May 20, 2020

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 20, 2020

bors merged commit 5c52f9f into rust-lang:master May 20, 2020

nnethercote deleted the standalone-fold branch May 21, 2020 01:47

nnethercote mentioned this pull request Jun 3, 2020

Investigate possibly-pathological amounts of LLVM IR with cargo-llvm-lines servo/servo#26713

Open

the8472 mentioned this pull request Sep 5, 2020

Add Iterator::intersperse #75784

Closed

Make fold standalone. #72139

Make fold standalone. #72139

Uh oh!

Conversation

nnethercote commented May 12, 2020

Uh oh!

nnethercote commented May 12, 2020

Uh oh!

nnethercote commented May 12, 2020

Uh oh!

rust-timer commented May 12, 2020

Uh oh!

bors commented May 12, 2020

Uh oh!

Uh oh!

andjo403 commented May 12, 2020

Uh oh!

RalfJung commented May 12, 2020

Uh oh!

bors commented May 12, 2020

Uh oh!

rust-timer commented May 12, 2020

Uh oh!

bluss commented May 12, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cuviper commented May 12, 2020

Uh oh!

rust-timer commented May 12, 2020

Uh oh!

cuviper commented May 12, 2020

Uh oh!

nnethercote commented May 12, 2020

Uh oh!

cuviper commented May 12, 2020

Uh oh!

nnethercote commented May 12, 2020

Uh oh!

bluss commented May 12, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nnethercote commented May 13, 2020

Uh oh!

nnethercote commented May 14, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nnethercote commented May 14, 2020

Uh oh!

andjo403 commented May 14, 2020

Uh oh!

nnethercote commented May 14, 2020

Uh oh!

cuviper left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

nnethercote commented May 18, 2020

Uh oh!

rust-highfive commented May 18, 2020

Uh oh!

Uh oh!

cuviper commented May 20, 2020

Uh oh!

bors commented May 20, 2020

Uh oh!

bors commented May 20, 2020

Uh oh!

bors commented May 20, 2020

Uh oh!

RalfJung commented May 20, 2020

Uh oh!

nnethercote commented May 20, 2020

Uh oh!

nnethercote commented May 21, 2020

Uh oh!

RalfJung commented May 21, 2020

Uh oh!

Uh oh!

Make `fold` standalone. #72139

Make `fold` standalone. #72139

bluss commented May 12, 2020 •

edited

Loading

bluss commented May 12, 2020 •

edited

Loading

nnethercote commented May 14, 2020 •

edited

Loading