Use try_fold and try_rfold in default implementations of fold and rfold #106463

clarfonthey · 2023-01-04T20:49:26Z

This means that specialised versions will only need to implement the try variant and will get a specialised non-try variant for free. That is, once the Try trait and co. are stable.

This can't add any extra branches due to the use of NeverShortCircuit, but the actual perf implications are still unclear.

rustbot · 2023-01-04T20:49:33Z

r? @m-ou-se

(rustbot has picked a reviewer for you, use r? to override)

rustbot · 2023-01-04T20:49:36Z

Hey! It looks like you've submitted a new PR for the library teams!

If this PR contains changes to any rust-lang/rust public library APIs then please comment with @rustbot label +T-libs-api -T-libs to tag it appropriately. If this PR contains changes to any unstable APIs please edit the PR description to add a link to the relevant API Change Proposal or create one if you haven't already. If you're unsure where your change falls no worries, just leave it as is and the reviewer will take a look and make a decision to forward on if necessary.

Examples of T-libs-api changes:

Stabilizing library features
Introducing insta-stable changes such as new implementations of existing stable traits on existing stable types
Introducing new or changing existing unstable library APIs (excluding permanently unstable features / features without a tracking issue)
Changing public documentation in ways that create new stability guarantees
Changing observable runtime behavior of library APIs

the8472 · 2023-01-05T01:59:54Z

This means that specialised versions will only need to implement the try variant

Specialized code can sometimes do better in fold than try_fold because it takes ownership of self and can thus forego updating fields in self and such things.

Additionally having the default implementation go through a layer of indirection is likely going to produce more IR and make debug builds slower.

Before starting a perf run to check I suggest using NeverShortCircuit which encodes the property that it never branches in the type.

clarfonthey · 2023-01-05T16:16:29Z

Oh huh, I didn't realise that type existed. I'll use that instead.

scottmcm · 2023-01-14T23:00:07Z

I've wanted this for ages -- last I tried was #90886 -- but it was removed in the past for compiler perf reasons. (try_fold still can't be implemented on stable, so it's not a huge problem yet, though personally I consider this change as something necessary before we stabilize implementing try_fold.) With something like rustc_must_implement_one_of we'll eventually be able to write just try_fold and have a full iterator.

Let's see what perf has to say this year:
@bors try @rust-timer queue

bors · 2023-01-14T23:00:26Z

⌛ Trying commit 23aeb351d1201823fa1943b428a7774610f58496 with merge b7a648a967f63ef1ec80554b526a613a0c30663a...

scottmcm · 2023-01-14T23:03:37Z

library/core/src/iter/traits/double_ended.rs

-            accum = f(accum, x);
-        }
-        accum
+        self.try_rfold(init, |accum, item| NeverShortCircuit(f(accum, item))).0


Dunno if it'd impact perf, but there used to be a separate method for wrapping a function in NeverShortCircuit:

rust/library/core/src/ops/try_trait.rs

Line 463 in 0b2f717

pub fn wrap_mut_2<A, B>(mut f: impl FnMut(A, B) -> T) -> impl FnMut(A, B) -> Self {

This is neat, although honestly I think that it's not super worth having this since the alternative is so short. I feel like the current code looks a bit clearer than:

self.try_rfold(init, NeverShortCircuit.wrap_mut_2(f)).0

bors · 2023-01-15T01:44:25Z

☀️ Try build successful - checks-actions
Build commit: b7a648a967f63ef1ec80554b526a613a0c30663a (b7a648a967f63ef1ec80554b526a613a0c30663a)

rust-timer · 2023-01-15T03:04:07Z

Finished benchmarking commit (b7a648a967f63ef1ec80554b526a613a0c30663a): comparison URL.

Overall result: ❌ regressions - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	0.6%	[0.2%, 1.2%]	20
Regressions ❌ (secondary)	1.3%	[0.4%, 2.2%]	7
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	0.6%	[0.2%, 1.2%]	20

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	2.8%	[0.6%, 5.7%]	7
Regressions ❌ (secondary)	2.2%	[2.2%, 2.2%]	1
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	2.8%	[0.6%, 5.7%]	7

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	1.2%	[1.2%, 1.2%]	1
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	1.2%	[1.2%, 1.2%]	1

clarfonthey · 2023-01-17T23:40:45Z

I merged in the latest master and chose to move the function inside the closure like the function you linked, @scottmcm. I dunno if folks are willing to just lend the resources of the benchmark builder to mess around with this a bit more, but right now I feel like the main choices are:

Accept the perf penalty (I have a feeling many would object)
Lend the resources to mess with the definition a bit. (I'm fine with messing with the code, but only if folks are comfortable with using the build resources. I also don't know of a good way to check the actual outputs the builds are comparing, but would be willing to investigate that too.)
Just close this and wait until later.

Mostly @ ing you even though you're not on the libs team since you were enthusiastic about this, so feel free to defer to someone else if you think they should make the call instead.

scottmcm · 2023-01-18T07:46:49Z

Dunno if it matters, but it's perfectly reasonable to find out:
@bors try @rust-timer queue

bors · 2023-01-18T07:46:58Z

⌛ Trying commit a444e7073fee53a59e5aa7fafbb43ab656b7aa34 with merge ca8c3e728f0ecf32ede2208caa30a6cd8013abe0...

bors · 2023-01-18T09:55:00Z

☀️ Try build successful - checks-actions
Build commit: ca8c3e728f0ecf32ede2208caa30a6cd8013abe0 (ca8c3e728f0ecf32ede2208caa30a6cd8013abe0)

rust-timer · 2023-01-18T11:14:37Z

Finished benchmarking commit (ca8c3e728f0ecf32ede2208caa30a6cd8013abe0): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	0.6%	[0.2%, 1.2%]	20
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-1.3%	[-1.3%, -1.3%]	1
All ❌✅ (primary)	0.6%	[0.2%, 1.2%]	20

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	3.0%	[0.6%, 9.4%]	6
Regressions ❌ (secondary)	4.8%	[4.8%, 4.8%]	1
Improvements ✅ (primary)	-2.3%	[-2.3%, -2.3%]	1
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	2.3%	[-2.3%, 9.4%]	7

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	1.2%	[1.0%, 1.5%]	6
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	1.2%	[1.0%, 1.5%]	6

clarfonthey · 2023-01-19T15:55:14Z

So, it does seem to actually make a difference switching to move, but it looks like the main regression is still there, which is in the syn crate. Might poke around a bit to see what exactly is causing that.

clarfonthey · 2023-01-19T16:06:12Z

Okay, looking deeper into the actual benchmarks that regressed, the places I can find folds being used in those crates are mostly very simple ones-- stuff like summing up an iterator, concatenating strings, quote!(#x + #y), etc.

My gut feeling is that we're not there yet with the current Try work, at least enough to ensure there's no performance penalty here, but if someone more clever wants to step in and object to my findings, feel free.

scottmcm · 2023-03-22T03:36:59Z

Let's see if anything's changed:
@bors try @rust-timer queue

bors · 2023-03-22T03:37:08Z

⌛ Trying commit 2514d02 with merge 17f07ebd5a676f394244d27d433376451f14ab97...

bors · 2023-03-22T05:34:41Z

☀️ Try build successful - checks-actions
Build commit: 17f07ebd5a676f394244d27d433376451f14ab97 (17f07ebd5a676f394244d27d433376451f14ab97)

rust-timer · 2023-03-22T11:56:49Z

Finished benchmarking commit (17f07ebd5a676f394244d27d433376451f14ab97): comparison URL.

Overall result: ❌ regressions - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	0.9%	[0.4%, 1.4%]	13
Regressions ❌ (secondary)	0.8%	[0.8%, 0.8%]	3
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	0.9%	[0.4%, 1.4%]	13

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	2.4%	[0.8%, 4.0%]	2
Regressions ❌ (secondary)	0.6%	[0.4%, 0.8%]	2
Improvements ✅ (primary)	-3.0%	[-3.7%, -2.3%]	2
Improvements ✅ (secondary)	-1.6%	[-1.6%, -1.6%]	1
All ❌✅ (primary)	-0.3%	[-3.7%, 4.0%]	4

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	1.0%	[0.9%, 1.1%]	3
Regressions ❌ (secondary)	0.5%	[0.5%, 0.6%]	2
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-0.5%	[-0.5%, -0.5%]	1
All ❌✅ (primary)	1.0%	[0.9%, 1.1%]	3

clarfonthey · 2023-03-23T04:07:23Z

Yeah, this is definitely getting worse. I'm going to just close this for now and I guess this can be a future improvement further down the line.

rustbot assigned m-ou-se Jan 4, 2023

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Jan 4, 2023