Merge into larger interval set #120024
Conversation
This reduces the work done while merging rows. In at least one case (issue #50450), we have thousands of union([1 range], [20,000 ranges]) calls, which previously inserted each of the 20,000 ranges one by one. Now we copy the larger right-hand set over and insert only the single range into that copy.

This cuts the runtime of the test case in #50450 from ~26 seconds to ~6 seconds locally, though it doesn't change the peak memory usage (~9.5GB).
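A minimal sketch of the idea, using a simplified interval-set type rather than the actual rustc_index implementation; the type, its field, and the insert_range helper below are illustrative assumptions:

```rust
/// Simplified stand-in for an interval set: a sorted, disjoint list of
/// inclusive (start, end) ranges. Endpoints are assumed to stay below u32::MAX.
#[derive(Clone, Debug)]
struct IntervalSet {
    ranges: Vec<(u32, u32)>,
}

impl IntervalSet {
    /// Insert one inclusive range, merging it with any overlapping or
    /// adjacent ranges already in the set (binary search + Vec::splice).
    fn insert_range(&mut self, start: u32, end: u32) {
        // First range whose end reaches `start`, and first range starting
        // strictly after `end + 1`: everything in between merges with the new range.
        let lo = self.ranges.partition_point(|&(_, e)| e + 1 < start);
        let hi = self.ranges.partition_point(|&(s, _)| s <= end + 1);
        let mut new_start = start;
        let mut new_end = end;
        if lo < hi {
            new_start = new_start.min(self.ranges[lo].0);
            new_end = new_end.max(self.ranges[hi - 1].1);
        }
        // Replace the touched ranges (possibly none) with the single merged range.
        self.ranges
            .splice(lo..hi, std::iter::once((new_start, new_end)));
    }

    /// Union `other` into `self`. If `other` has more ranges, copy it and
    /// insert our (few) ranges into that copy, instead of inserting each of
    /// `other`'s ranges into `self` one by one.
    fn union(&mut self, other: &IntervalSet) {
        if other.ranges.len() > self.ranges.len() {
            let mut merged = other.clone();
            for &(s, e) in &self.ranges {
                merged.insert_range(s, e);
            }
            *self = merged;
        } else {
            for &(s, e) in &other.ranges {
                self.insert_range(s, e);
            }
        }
    }
}
```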
r? @cjgillot (rustbot has picked a reviewer for you, use r? to override)

@bors try @rust-timer queue
<try> Merge into larger interval set
☀️ Try build successful - checks-actions
Finished benchmarking commit (204a1d9): comparison URL.

Overall result: ❌ regressions - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never.

Instruction count: This is a highly reliable metric that was used to determine the overall result at the top of this comment.

Max RSS (memory usage): This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

Cycles: This benchmark run did not return any relevant results for this metric.

Binary size: This benchmark run did not return any relevant results for this metric.

Bootstrap: 664.25s -> 665.588s (0.20%)
No strong opinion on whether these results merit doing this or not. In some cases (see PR description) this is a large win, but it's likely that the extra check is somewhat expensive for less clear-cut cases. We could try to tune (e.g., only do this if the difference is >300 elements or something) but I'm not convinced that's warranted. The new branch costs instructions but is likely well-predicted by CPUs, so we're probably not actually regressing in the common case (or at least not significantly).
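If tuning ever seemed worthwhile, the check could be gated on the size difference; the threshold value and helper below are purely hypothetical, not something the PR adds:

```rust
// Hypothetical tuning knob: only take the "copy the larger set" path when the
// right-hand set is bigger by some margin, so the clone plausibly pays for itself.
const COPY_THRESHOLD: usize = 300;

fn should_copy_other(self_len: usize, other_len: usize) -> bool {
    other_len > self_len && other_len - self_len > COPY_THRESHOLD
}
```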
Two questions:

1. Could `union` take `other` by value, so the larger set's storage could be reused rather than copied?
2. Could the two sets be merged in a single pass instead of inserting one into a copy of the other?

For the second, I mean an algorithm close to the merge of sorted lists: take the lowest interval from both sets, expand it as much as necessary by consuming intervals, and repeat.
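A sketch of that alternative on the same simplified (start, end) representation used above; this is one reading of the suggestion, not code from the PR:

```rust
/// Merge two sorted, disjoint interval lists in a single pass, like merging
/// two sorted lists: repeatedly take the interval with the lowest start and
/// extend the interval currently being built while incoming intervals touch it.
/// Assumes inclusive ranges whose endpoints stay below u32::MAX.
fn merge_sorted(a: &[(u32, u32)], b: &[(u32, u32)]) -> Vec<(u32, u32)> {
    let (mut i, mut j) = (0, 0);
    let mut out: Vec<(u32, u32)> = Vec::with_capacity(a.len() + b.len());
    while i < a.len() || j < b.len() {
        // Pick the next interval by start, from whichever list is lower.
        let next = if j == b.len() || (i < a.len() && a[i].0 <= b[j].0) {
            i += 1;
            a[i - 1]
        } else {
            j += 1;
            b[j - 1]
        };
        match out.last_mut() {
            // Overlapping or adjacent: grow the interval we're building.
            Some(last) if next.0 <= last.1 + 1 => last.1 = last.1.max(next.1),
            // Otherwise start a new output interval.
            _ => out.push(next),
        }
    }
    out
}
```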
At least a few call sites can't provide `other` by value without cloning, which would reduce to basically this same implementation (just more spread out). I think the algorithm you suggest is possible, but I'm not sure it would be much of a win. The common case for interval sets is that we have ~1-5 intervals, since they're primarily used for representing liveness (I think? Or presence?) ranges, which are usually not that disjoint. Some code can be pathological, though, where a variable is live in thousands of discontiguous intervals, which we then merge with a single "self" interval. For that case any complex algorithm seems unlikely to be better: we'd need a bunch of extra logic but in the end still be either worse off or equal (the ideal is basically a binary search + Vec::splice, which is almost what we have here). My sense is that this case is sufficiently rare that the extra logic isn't warranted, while this simple delta perhaps is.
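For concreteness, the pathological shape described above looks roughly like this with the sketch types from earlier (the interval counts are the ones quoted in the PR description; the exact endpoints are made up):

```rust
fn main() {
    // A variable live in thousands of small, discontiguous intervals...
    let big = IntervalSet {
        ranges: (0..20_000u32).map(|i| (i * 10, i * 10 + 3)).collect(),
    };
    // ...unioned with a single "self" interval.
    let mut small = IntervalSet { ranges: vec![(5, 8)] };

    // Previously: insert each of big's 20,000 ranges into `small` one by one.
    // With this change: clone `big` once and splice `small`'s single range in.
    small.union(&big);
    assert_eq!(small.ranges.len(), 20_001); // (5, 8) touches nothing, so it is simply spliced in.
}
```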
Fair enough.
☀️ Test successful - checks-actions
Finished benchmarking commit (6351247): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression.

Instruction count: This is a highly reliable metric that was used to determine the overall result at the top of this comment.

Max RSS (memory usage): This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

Cycles: This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

Binary size: This benchmark run did not return any relevant results for this metric.

Bootstrap: 659.977s -> 662.476s (0.38%)
Given that the results here mirror the pre-merge perf run results fairly closely, I think it's fair to take the review as justification that this is worth the cost to protect against the extreme case. @rustbot label: +perf-regression-triaged