Don't allocate trailing uninit bits in the InitMap of CTFE Allocations #94936

Closed
oli-obk wants to merge 1 commit

Conversation

oli-obk (Contributor) commented Mar 14, 2022

This is part of a set of similar refactorings for Allocation, but I want to benchmark all parts on their own.

This PR changes the InitMask to only allocate up to the first initialized byte. Everything beyond the allocated space is considered uninitialized. Reading from it or marking it as uninitialized will not allocate.
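
To make the idea concrete, here is a minimal, self-contained sketch of such a mask (illustrative only; `InitMaskSketch`, `Block`, and the helper methods are hypothetical names, not the rustc types):

```rust
// Illustrative sketch only -- not the actual rustc InitMask. A `true` bit means
// "initialized"; bits past the end of `blocks` are implicitly uninitialized and
// take up no memory at all.
type Block = u64;
const BLOCK_SIZE: usize = Block::BITS as usize;

struct InitMaskSketch {
    blocks: Vec<Block>,
    len: usize, // logical length in bits
}

impl InitMaskSketch {
    fn get(&self, i: usize) -> bool {
        assert!(i < self.len);
        let (block, bit) = (i / BLOCK_SIZE, i % BLOCK_SIZE);
        // A block that was never allocated holds only uninitialized bits.
        match self.blocks.get(block) {
            Some(b) => (*b >> bit) & 1 != 0,
            None => false,
        }
    }

    fn set(&mut self, i: usize, init: bool) {
        assert!(i < self.len);
        let (block, bit) = (i / BLOCK_SIZE, i % BLOCK_SIZE);
        if init {
            // Only marking a bit as initialized materializes the tail.
            if block >= self.blocks.len() {
                self.blocks.resize(block + 1, 0);
            }
            self.blocks[block] |= 1 << bit;
        } else if let Some(b) = self.blocks.get_mut(block) {
            // Marking a bit uninit inside the unallocated tail is a no-op.
            *b &= !(1 << bit);
        }
    }
}

fn main() {
    let mut m = InitMaskSketch { blocks: vec![], len: 128 };
    assert!(!m.get(100)); // reading the unallocated tail: uninit, no allocation
    m.set(100, false);    // still a no-op, still no allocation
    assert!(m.blocks.is_empty());
    m.set(3, true);       // now the first block is materialized
    assert!(m.get(3) && !m.get(127));
}
```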

rustbot added the T-compiler label (Relevant to the compiler team, which will review and decide on the PR/issue.) on Mar 14, 2022
oli-obk (Contributor, Author) commented Mar 14, 2022

@bors try @rust-timer queue

rust-timer (Collaborator)

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

rustbot added the S-waiting-on-perf label (Status: Waiting on a perf run to be completed.) on Mar 14, 2022
bors (Contributor) commented Mar 14, 2022

⌛ Trying commit 21b759a with merge d4a5b1dc3a8717b8d4eaece9f4b6ca0fca86aa5b...

bors (Contributor) commented Mar 14, 2022

☀️ Try build successful - checks-actions
Build commit: d4a5b1dc3a8717b8d4eaece9f4b6ca0fca86aa5b

rust-timer (Collaborator)

Queued d4a5b1dc3a8717b8d4eaece9f4b6ca0fca86aa5b with parent bce19cf, future comparison URL.

rust-timer (Collaborator)

Finished benchmarking commit (d4a5b1dc3a8717b8d4eaece9f4b6ca0fca86aa5b): comparison url.

Summary: This benchmark run did not return any relevant results. 5 results were found to be statistically significant but too small to be relevant.

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR led to changes in compiler perf.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf -perf-regression

rustbot added the S-waiting-on-review label (Status: Awaiting review from the assignee but also interested parties.) and removed the S-waiting-on-perf label (Status: Waiting on a perf run to be completed.) on Mar 14, 2022
oli-obk (Contributor, Author) commented Mar 14, 2022

r? @RalfJung

erikdesjardins (Contributor) left a comment

Hmm, since this doesn't seem to have resulted in any significant speed or max-rss improvements, is the increased complexity worth it? (Not sure if the description implies that you have additional changes dependent on this, or equivalent changes to other parts of the code)

}
}
} else {
if is_init {
Contributor

This deserves a comment explaining the case it's handling: trailing uninit may not be allocated, so if the start block doesn't exist, then it's all uninit.

@@ -886,6 +913,9 @@ impl InitMask {
}
}
}
if !is_init && end_block_inclusive >= init_mask.blocks.len() {
Contributor

This also deserves a comment, and perhaps ascii art about the case it's handling

oli-obk (Contributor, Author) commented Mar 16, 2022

Yeah, I don't expect our benchmarks to exercise this much, but there are some open issues for it that I want to test against this change in combination with the equivalent one for the bytes field.

RalfJung (Member)

This PR changes the InitMask to only allocate up to the first initialized byte.

I assume you mean last initialized byte?

      let mut m = InitMask { blocks: vec![], len: Size::ZERO };
-     m.grow(size, state);
+     m.grow(size, true);
      m
Member

We could keep the old API by using if state { ... } else { ... } -- that seems cleaner (not bothering the users with a somewhat cumbersome API)?
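
For illustration, the constructor of the sketch above could keep that two-argument shape (hypothetical code, not the real InitMask::new):

```rust
impl InitMaskSketch {
    // Keep the old two-argument constructor; callers don't need to know that
    // the all-uninit case allocates no blocks at all.
    fn new(len: usize, state: bool) -> Self {
        let mut m = InitMaskSketch { blocks: vec![], len };
        if state {
            // all-init: materialize every block up front
            let blocks_needed = (len + BLOCK_SIZE - 1) / BLOCK_SIZE;
            m.blocks = vec![Block::MAX; blocks_needed];
        }
        // all-uninit: leave `blocks` empty; the tail is implicitly uninit
        m
    }
}
```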

RalfJung (Member)

This might be a good time to add some unit tests to InitMask?
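
For example, against the illustrative sketch above (not the real InitMask), such tests could pin down the behaviour of the unallocated tail:

```rust
#[cfg(test)]
mod init_mask_sketch_tests {
    use super::*;

    #[test]
    fn tail_is_uninit_and_unallocated() {
        let m = InitMaskSketch::new(1000, false);
        assert!(m.blocks.is_empty());
        assert!(!m.get(999));
    }

    #[test]
    fn init_write_materializes_only_needed_blocks() {
        let mut m = InitMaskSketch::new(1000, false);
        m.set(0, true);
        assert!(m.get(0));
        assert!(!m.get(999));
        assert_eq!(m.blocks.len(), 1);
    }

    #[test]
    fn all_init_constructor_reads_back_init() {
        let m = InitMaskSketch::new(130, true);
        assert!(m.get(0) && m.get(129));
    }
}
```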

  pub fn set_range(&mut self, start: Size, end: Size, new_state: bool) {
      let len = self.len;
-     if end > len {
+     if end > len && new_state {
Member

I am a bit surprised that we allow OOB indices at all here?

Member

Also, I think set_range_inbounds is called in situations where the index is in-bounds of the logical size of the InitMap, but OOB of its actual size due to leaving off the tail. So I think the growing logic needs to move to set_range_inbounds.

Member

Ah, you have ensure_blocks for that. But then we have two growing logics? That seems odd.
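
To illustrate what a single growth path might look like, here is a hypothetical consolidation on the sketch type from above (not the PR's actual ensure_blocks):

```rust
impl InitMaskSketch {
    // One place that materializes missing blocks, used by every path that may
    // write initialized bits past the currently allocated tail.
    fn ensure_blocks(&mut self, last_block: usize) {
        if last_block >= self.blocks.len() {
            // new blocks start out all-zero, i.e. all-uninit, like the implicit tail
            self.blocks.resize(last_block + 1, 0);
        }
    }

    fn set_range(&mut self, start: usize, end: usize, new_state: bool) {
        assert!(end <= self.len);
        if start >= end {
            return;
        }
        if new_state {
            // only an init-write needs storage; uninit-writes into the
            // unallocated tail remain no-ops inside `set`
            self.ensure_blocks((end - 1) / BLOCK_SIZE);
        }
        for i in start..end {
            self.set(i, new_state);
        }
    }
}
```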

-     } else {
-         self.blocks[blocka] &= !range;
+     } else if let Some(block) = self.blocks.get_mut(blocka) {
+         *block &= !range;
Member

Please add a comment saying why it is okay to do nothing here if get_mut returns None.

@@ -673,15 +679,17 @@ impl InitMask {
          for block in (blocka + 1)..blockb {
              self.blocks[block] = u64::MAX;
          }
-     } else {
+     } else if let Some(blocka_val) = self.blocks.get_mut(blocka) {
Member

Same, please add a comment saying why it is okay to do nothing here if get_mut returns None.

}
let additional_blocks = block - self.blocks.len() + 1;
self.blocks.extend(
// FIXME(oli-obk): optimize this by repeating `new_state as Block`.
Member

I don't quite understand this FIXME. I see it just got moved around but still, it should be clarified or removed IMO.

Contributor Author

Ah, basically: instead of filling with uninit and then setting all of them to initialized, we can immediately fill with init.
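
In sketch form (a hypothetical helper, not the exact rustc code), the fill pattern would be chosen from new_state up front instead of pushing zeroed blocks and then flipping every bit:

```rust
// Illustrative only: fully covered blocks are written in one step; any partially
// covered first/last block still needs its individual bits adjusted afterwards.
fn extend_blocks(blocks: &mut Vec<u64>, additional_blocks: usize, new_state: bool) {
    let fill: u64 = if new_state { u64::MAX } else { 0 };
    blocks.extend(std::iter::repeat(fill).take(additional_blocks));
}
```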

oli-obk (Contributor, Author) commented Mar 28, 2022

So... this only makes sense imo if we also do the same optimization for data, but that's a lot harder. We'd have to

  1. reintroduce a size field, so we can track the actual size of the allocation
  2. rewrite all the APIs that return slices to return some type that works just like a slice, but returns 0 for everything beyond the actual data slice's size (a rough sketch follows below).

This seems very roundabout and fragile.
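
A very rough sketch of the kind of wrapper point 2 would require (hypothetical names; nothing like this exists in the PR):

```rust
// Hypothetical: a "slice" whose logical length can exceed the stored bytes;
// everything past the stored bytes reads as zero and only materializes on write.
struct TailZeroBytes {
    stored: Vec<u8>,
    logical_len: usize,
}

impl TailZeroBytes {
    fn get(&self, i: usize) -> u8 {
        assert!(i < self.logical_len);
        self.stored.get(i).copied().unwrap_or(0)
    }

    fn set(&mut self, i: usize, byte: u8) {
        assert!(i < self.logical_len);
        if i >= self.stored.len() {
            // writing past the stored bytes is what forces the tail to materialize
            self.stored.resize(i + 1, 0);
        }
        self.stored[i] = byte;
    }
}
```

Every API that hands out &[u8] today would have to go through something like this, which is where the "roundabout and fragile" concern comes from.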

I'm not sure how to proceed from here. I don't think [the issue linked here] can be solved this way.

We tried [an earlier approach, linked here] previously. That one only allows differentiating between

  • all zero
  • all uninit
  • everything else

and is slightly better to handle, but not sure by how much.

@RalfJung do you think it's worthwhile to pursue the trailing-zeros/uninit scheme as in this PR? Or should we start with the all-zeros/uninit scheme?

We could also look into funky schemes like making all uses of Allocation go through a trait and then using dynamic dispatch to implement the different schemes.
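
Sketched very loosely (a hypothetical trait and impls, nothing that exists in rustc):

```rust
// Hypothetical: every representation scheme implements one byte-level interface,
// and Allocation only ever talks to the trait object.
trait AllocBytes {
    fn len(&self) -> usize;
    fn read_byte(&self, offset: usize) -> u8;
    fn write_byte(&mut self, offset: usize, byte: u8);
}

struct Dense(Vec<u8>); // a dense representation, roughly like today's bytes field

struct AllZero {
    len: usize, // "all zero" scheme: stores no bytes at all
}

impl AllocBytes for Dense {
    fn len(&self) -> usize {
        self.0.len()
    }
    fn read_byte(&self, offset: usize) -> u8 {
        self.0[offset]
    }
    fn write_byte(&mut self, offset: usize, byte: u8) {
        self.0[offset] = byte;
    }
}

impl AllocBytes for AllZero {
    fn len(&self) -> usize {
        self.len
    }
    fn read_byte(&self, _offset: usize) -> u8 {
        0
    }
    fn write_byte(&mut self, _offset: usize, _byte: u8) {
        // a real implementation would have to switch to the dense scheme here
        unimplemented!()
    }
}

// An Allocation could then hold a `Box<dyn AllocBytes>` and dispatch dynamically.
```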

RalfJung (Member)

All zero/all-uninit has the same problem re: slices though, doesn't it?

oli-obk (Contributor, Author) commented Mar 28, 2022

All zero/all-uninit has the same problem re: slices though, doesn't it?

Yeah, but it's slightly easier to handle: you can usually just implement a default behaviour, instead of having to run the regular logic up to a certain point, then finish with zeros, and somehow handle the edge between the two.

RalfJung (Member)

In the end I don't have a strong opinion as long as we can keep the layers reasonably separated. It might make sense to factor the init mask (and data representation) code into a separate module to ensure it is suitably isolated from the rest of the Allocation logic?

oli-obk (Contributor, Author) commented Apr 12, 2022

I'll start with a new experiment, this time from just the API perspective, ignoring impl details for now

oli-obk closed this Apr 12, 2022