Merge into larger interval set #120024

Merged on Jan 28, 2024 (1 commit)

Conversation

Mark-Simulacrum
Member

This reduces the work done while merging rows. In at least one case (#50450), we have thousands of union([range], [20,000 ranges]) calls, which previously inserted each of the 20,000 ranges one by one. Now we copy the larger right-hand set over and insert only the single range into it.

This cuts the runtime of the test case in #50450 from ~26 seconds to ~6 seconds locally, though it doesn't change the memory usage peak (~9.5GB).
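
A minimal sketch of the idea described above, assuming a simplified interval set stored as a sorted `Vec` of disjoint, non-adjacent inclusive `(u32, u32)` ranges. The type `IntervalSet` and the methods `from_ranges`, `insert_range`, and `union` here are illustrative stand-ins and not the compiler's actual code: when `self` holds fewer ranges than `other`, the larger set is cloned and the few ranges of the smaller one are inserted into the copy.

```rust
use std::iter;

/// Hypothetical, simplified interval set: sorted, disjoint, non-adjacent
/// inclusive ranges. Not the compiler's implementation.
#[derive(Clone, Debug, Default, PartialEq, Eq)]
pub struct IntervalSet {
    ranges: Vec<(u32, u32)>,
}

impl IntervalSet {
    /// Builds a set by inserting the given inclusive ranges one by one.
    pub fn from_ranges(ranges: &[(u32, u32)]) -> Self {
        let mut set = Self::default();
        for &(start, end) in ranges {
            set.insert_range(start, end);
        }
        set
    }

    /// Inserts `start..=end`, merging with any overlapping or adjacent
    /// ranges. Returns true if the set changed.
    pub fn insert_range(&mut self, start: u32, end: u32) -> bool {
        if start > end {
            return false;
        }
        // First stored range that overlaps or touches the new one.
        let lo = self.ranges.partition_point(|&(_, e)| e.saturating_add(1) < start);
        // One past the last stored range that overlaps or touches it.
        let hi = self.ranges.partition_point(|&(s, _)| s <= end.saturating_add(1));
        if lo == hi {
            // Nothing to merge with: plain insertion at the sorted position.
            self.ranges.insert(lo, (start, end));
            return true;
        }
        let merged = (start.min(self.ranges[lo].0), end.max(self.ranges[hi - 1].1));
        let changed = hi - lo > 1 || self.ranges[lo] != merged;
        // Replace every touched range with the single merged range.
        self.ranges.splice(lo..hi, iter::once(merged));
        changed
    }

    /// Unions `other` into `self`. Returns true if `self` changed.
    pub fn union(&mut self, other: &IntervalSet) -> bool {
        if self.ranges.len() < other.ranges.len() {
            // Copy the larger set, then insert our few ranges into the copy.
            let smaller = std::mem::replace(self, other.clone());
            for &(start, end) in &smaller.ranges {
                self.insert_range(start, end);
            }
            // Sketch-only shortcut: detect change by comparing with the old value.
            return *self != smaller;
        }
        let mut changed = false;
        for &(start, end) in &other.ranges {
            changed |= self.insert_range(start, end);
        }
        changed
    }
}
```

With one range on the left and 20,000 on the right, this version does a single clone plus one `insert_range` call instead of 20,000 calls. The price is the extra length comparison on every `union`.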

@rustbot
Collaborator

rustbot commented Jan 16, 2024

r? @cjgillot

(rustbot has picked a reviewer for you, use r? to override)

rustbot added the labels S-waiting-on-review (Status: Awaiting review from the assignee but also interested parties.) and T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue.) on Jan 16, 2024
@Mark-Simulacrum
Member Author

@bors try @rust-timer queue

rustbot added the label S-waiting-on-perf (Status: Waiting on a perf run to be completed.) on Jan 16, 2024
@bors
Contributor

bors commented Jan 16, 2024

⌛ Trying commit 1696148 with merge 204a1d9...

bors added a commit to rust-lang-ci/rust that referenced this pull request Jan 16, 2024
@bors
Contributor

bors commented Jan 16, 2024

☀️ Try build successful - checks-actions
Build commit: 204a1d9 (204a1d92966a0e9ee8d99c8830806075c0a3abfe)

@rust-timer
Collaborator

Finished benchmarking commit (204a1d9): comparison URL.

Overall result: ❌ regressions - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.3% | [0.2%, 0.5%] | 17 |
| Regressions ❌ (secondary) | 0.2% | [0.2%, 0.2%] | 1 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 0.3% | [0.2%, 0.5%] | 17 |

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 1.5% | [1.4%, 1.5%] | 4 |
| Improvements ✅ (primary) | -2.5% | [-2.7%, -2.4%] | 2 |
| Improvements ✅ (secondary) | -1.7% | [-1.7%, -1.7%] | 1 |
| All ❌✅ (primary) | -2.5% | [-2.7%, -2.4%] | 2 |

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 664.25s -> 665.588s (0.20%)
Artifact size: 308.27 MiB -> 308.31 MiB (0.01%)

rustbot added the label perf-regression (Performance regression.) and removed the label S-waiting-on-perf (Status: Waiting on a perf run to be completed.) on Jan 16, 2024
@Mark-Simulacrum
Member Author

No strong opinion on whether these results merit doing this or not. In some cases (see PR description) this is a large win, but it's likely that the extra check is somewhat expensive for less clear-cut cases. We could try to tune (e.g., only do this if the difference is >300 elements or something) but I'm not convinced that's warranted.

The new branch costs instructions but is likely well predicted by CPUs, so we're probably not actually regressing in the common case (or at least not significantly).

@cjgillot
Contributor

Two questions:

  • Could `other` be taken by value, to avoid cloning?
  • Would we gain from another algorithm that iterates over both sets at once?

For the second, I mean an algorithm close to the merge of sorted lists: take the lowest interval from both sets, expand it as much as necessary by consuming intervals, and repeat.
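
The merge-style algorithm described above could look roughly like the following. This is a sketch under assumptions, not code from the PR: the function name `merge_union` is hypothetical, both inputs are assumed to be sorted lists of disjoint inclusive `(u32, u32)` ranges, and adjacent ranges (such as `0..=1` and `2..=3`) are assumed to be coalesced.

```rust
/// Unions two sorted lists of disjoint inclusive ranges in one linear pass.
/// Hypothetical helper for illustration only.
pub fn merge_union(a: &[(u32, u32)], b: &[(u32, u32)]) -> Vec<(u32, u32)> {
    let mut out: Vec<(u32, u32)> = Vec::with_capacity(a.len() + b.len());
    let (mut i, mut j) = (0, 0);
    while i < a.len() || j < b.len() {
        // Take whichever remaining range starts lowest.
        let next = if j >= b.len() || (i < a.len() && a[i].0 <= b[j].0) {
            i += 1;
            a[i - 1]
        } else {
            j += 1;
            b[j - 1]
        };
        // Expand the last output range while the next one overlaps or touches it.
        if let Some(last) = out.last_mut() {
            if next.0 <= last.1.saturating_add(1) {
                last.1 = last.1.max(next.1);
                continue;
            }
        }
        out.push(next);
    }
    out
}
```

This runs in time linear in the total number of ranges, but it always walks and rebuilds both lists, even when one side holds a single range.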

@Mark-Simulacrum
Member Author

At least a few call sites can't provide `other` by value without cloning, which would reduce to basically this same implementation (just more spread out).

I think the algorithm you suggest is possible, but I'm not sure it would be much of a win. The common case for interval sets is that we have ~1-5 intervals, since they're primarily used for representing liveness (I think? Or presence?) ranges, which are usually not that disjoint. Some code can be pathological, though, where a variable is live in thousands of discontiguous intervals, which we seem to merge with a single "self" interval. For that case any more complex algorithm seems unlikely to be better: we'd need a bunch of extra logic but would still end up either worse off or equal (the ideal is basically a binary search + Vec::splice, which is almost what we have here).

My sense is that this case is sufficiently rare that the extra logic isn't warranted, while this simple delta perhaps is.

@cjgillot
Contributor

Fair enough.
@bors r+

@bors
Contributor

bors commented Jan 27, 2024

📌 Commit 1696148 has been approved by cjgillot

It is now in the queue for this repository.

bors added the label S-waiting-on-bors (Status: Waiting on bors to run and complete tests. Bors will change the label on completion.) and removed the label S-waiting-on-review (Status: Awaiting review from the assignee but also interested parties.) on Jan 27, 2024
@bors
Contributor

bors commented Jan 27, 2024

⌛ Testing commit 1696148 with merge 6351247...

@bors
Contributor

bors commented Jan 28, 2024

☀️ Test successful - checks-actions
Approved by: cjgillot
Pushing 6351247 to master...

bors added the label merged-by-bors (This PR was explicitly merged by bors.) on Jan 28, 2024
bors merged commit 6351247 into rust-lang:master on Jan 28, 2024
12 checks passed
rustbot added this to the 1.77.0 milestone on Jan 28, 2024
@rust-timer
Collaborator

Finished benchmarking commit (6351247): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please open an issue or create a new PR that fixes the regressions, add a comment linking to the newly created issue or PR, and then add the perf-regression-triaged label to this PR.

@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.3% | [0.2%, 0.4%] | 14 |
| Regressions ❌ (secondary) | 0.2% | [0.2%, 0.2%] | 1 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -0.3% | [-0.3%, -0.2%] | 2 |
| All ❌✅ (primary) | 0.3% | [0.2%, 0.4%] | 14 |

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 3.8% | [3.1%, 4.7%] | 3 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -1.9% | [-2.3%, -1.4%] | 2 |
| All ❌✅ (primary) | - | - | 0 |

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 3.2% | [2.5%, 3.8%] | 9 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | - | - | 0 |

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 659.977s -> 662.476s (0.38%)
Artifact size: 308.14 MiB -> 308.14 MiB (-0.00%)

@rylev
Member

rylev commented Jan 30, 2024

Given that the results here mirror the pre-merge perf run results fairly closely, I think it's fair to take the review as justification that this is worth the cost to protect against the extreme case.

@rustbot label: +perf-regression-triaged

rustbot added the label perf-regression-triaged (The performance regression has been triaged.) on Jan 30, 2024
Labels
merged-by-bors: This PR was explicitly merged by bors.
perf-regression: Performance regression.
perf-regression-triaged: The performance regression has been triaged.
S-waiting-on-bors: Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
T-compiler: Relevant to the compiler team, which will review and decide on the PR/issue.