Don't allocate trailing uninit bits in the InitMap of CTFE Allocations #94936
Conversation
@bors try @rust-timer queue
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf
⌛ Trying commit 21b759a with merge d4a5b1dc3a8717b8d4eaece9f4b6ca0fca86aa5b...
☀️ Try build successful - checks-actions
Queued d4a5b1dc3a8717b8d4eaece9f4b6ca0fca86aa5b with parent bce19cf, future comparison URL.
Finished benchmarking commit (d4a5b1dc3a8717b8d4eaece9f4b6ca0fca86aa5b): comparison url. Summary: This benchmark run did not return any relevant results. 5 results were found to be statistically significant but too small to be relevant. If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR led to changes in compiler perf. @bors rollup=never
r? @RalfJung |
Hmm, since this doesn't seem to have resulted in any significant speed or max-rss improvements, is the increased complexity worth it? (Not sure if the description implies that you have additional changes dependent on this, or equivalent changes to other parts of the code)
            }
        }
    } else {
        if is_init {
This deserves a comment explaining the case it's handling: trailing uninit may not be allocated, so if the start block doesn't exist, the whole queried range is uninit.
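For illustration only, here is a minimal self-contained sketch (toy helper, not the actual `InitMask` code) of the lookup side of that case: a bit whose block was never allocated is simply reported as uninit.

```rust
const BLOCK_BITS: usize = 64;

// Toy lookup: any block index past `blocks.len()` stands for an unallocated,
// implicitly all-uninit tail.
fn is_bit_init(blocks: &[u64], bit: usize) -> bool {
    let (block, shift) = (bit / BLOCK_BITS, bit % BLOCK_BITS);
    match blocks.get(block) {
        Some(&b) => (b >> shift) & 1 != 0,
        // Start block doesn't exist: everything from here on is uninit.
        None => false,
    }
}

fn main() {
    let blocks = vec![0b1011_u64]; // bits 0, 1 and 3 initialized
    assert!(is_bit_init(&blocks, 0));
    assert!(!is_bit_init(&blocks, 2));
    // Bit 200 lives in a block that was never allocated: reported as uninit.
    assert!(!is_bit_init(&blocks, 200));
    println!("ok");
}
```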
@@ -886,6 +913,9 @@ impl InitMask {
          }
      }
  }
+ if !is_init && end_block_inclusive >= init_mask.blocks.len() {
This also deserves a comment, and perhaps ascii art about the case it's handling
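As a purely illustrative suggestion for that comment, the situation might be drawn roughly like this (block contents and widths are made up):

```rust
// blocks actually stored              implicit tail (never allocated, all uninit)
// ┌──────────┬──────────┐
// │ 10110100 │ 00000111 │ · · · · · · · · · · · · · · · · · ·
// └──────────┴──────────┘
//                        ^ end_block_inclusive may lie at or beyond blocks.len();
//                          when searching for an uninit bit (!is_init), the match
//                          can come from this implicit all-uninit tail.
```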
Yea, I don't expect our benchmarks to exercise this much, but there are some open issues for it that I want to test against this change in combination with the equivalent one for the bytes field.
I assume you mean last initialized byte?
  let mut m = InitMask { blocks: vec![], len: Size::ZERO };
- m.grow(size, state);
+ m.grow(size, true);
  m
We could keep the old API by using `if state { ... } else { ... }` -- that seems cleaner (not bothering the users with a somewhat cumbersome API)?
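A minimal sketch of that suggestion, assuming the `InitMask { blocks, len }` fields and the `grow` method visible in the diff above (not the PR's actual code):

```rust
impl InitMask {
    pub fn new(size: Size, state: bool) -> Self {
        if state {
            // Fully initialized: materialize the blocks as before.
            let mut m = InitMask { blocks: vec![], len: Size::ZERO };
            m.grow(size, true);
            m
        } else {
            // Fully uninitialized: with trailing uninit bits left unallocated,
            // an empty block list of the requested length already means "all uninit".
            InitMask { blocks: vec![], len: size }
        }
    }
}
```

Whether the all-uninit arm can construct the mask directly like this, or should instead go through `grow(size, false)`, depends on which invariants the PR settles on.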
This might be a good time to add some unit tests to
  pub fn set_range(&mut self, start: Size, end: Size, new_state: bool) {
      let len = self.len;
-     if end > len {
+     if end > len && new_state {
I am a bit surprised that we allow OOB indices at all here?
Also, I think `set_range_inbounds` is called in situations where the index is in-bounds of the logical size of the InitMap, but OOB of its actual size due to leaving off the tail. So I think the growing logic needs to move to `set_range_inbounds`.
Ah, you have `ensure_blocks` for that. But then we have two growing logics? That seems odd.
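To make the two growth paths in this discussion concrete, here is a self-contained toy model (hypothetical names, bit granularity, not the rustc code): setting a bit to init must first materialize any missing blocks, while clearing a bit whose block was never allocated can safely be a no-op.

```rust
const BLOCK_BITS: usize = 64;

// Toy stand-in for the init mask: `len` is the logical length in bits and may
// exceed the bits covered by `blocks` (the unallocated tail is implicitly uninit).
struct ToyMask {
    blocks: Vec<u64>,
    len: usize,
}

impl ToyMask {
    // Materialize blocks up to and including `block`; new blocks start all-uninit.
    fn ensure_blocks(&mut self, block: usize) {
        if block >= self.blocks.len() {
            self.blocks.resize(block + 1, 0);
        }
    }

    fn set_bit(&mut self, bit: usize, init: bool) {
        assert!(bit < self.len, "logically out of bounds");
        let (block, shift) = (bit / BLOCK_BITS, bit % BLOCK_BITS);
        if init {
            // In-bounds of the logical length, but possibly out-of-bounds of the
            // stored blocks: grow first, then set.
            self.ensure_blocks(block);
            self.blocks[block] |= 1 << shift;
        } else if let Some(b) = self.blocks.get_mut(block) {
            *b &= !(1 << shift);
        }
        // else: the block was never allocated, so the bit is already uninit.
    }
}

fn main() {
    let mut m = ToyMask { blocks: vec![], len: 1024 };
    m.set_bit(900, false); // no-op: nothing is allocated
    assert!(m.blocks.is_empty());
    m.set_bit(900, true); // materializes blocks 0..=14
    assert_eq!(m.blocks.len(), 15);
    println!("ok");
}
```

In this toy, `ensure_blocks` is the only place that grows the storage; keeping all growth inside a single helper like that is one way to avoid ending up with two separate growing logics.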
- } else {
-     self.blocks[blocka] &= !range;
+ } else if let Some(block) = self.blocks.get_mut(blocka) {
+     *block &= !range;
Please add a comment saying why it is okay to do nothing here if `get_mut` returns `None`.
@@ -673,15 +679,17 @@ impl InitMask {
      for block in (blocka + 1)..blockb {
          self.blocks[block] = u64::MAX;
      }
- } else {
+ } else if let Some(blocka_val) = self.blocks.get_mut(blocka) {
Same, please add a comment saying why it is okay to do nothing here if `get_mut` returns `None`.
  }
  let additional_blocks = block - self.blocks.len() + 1;
  self.blocks.extend(
      // FIXME(oli-obk): optimize this by repeating `new_state as Block`.
I don't quite understand this FIXME. I see it just got moved around but still, it should be clarified or removed IMO.
ah, basically instead of filling with uninit and then setting all of them to initialized, we can immediately fill with init.
So... this only makes sense imo if we also do the same optimization for
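For what it's worth, a sketch of that reading of the FIXME (a hypothetical free-standing helper, not the PR's code): extend the block vector directly with the final bit pattern instead of growing with all-uninit blocks and flipping every bit afterwards.

```rust
type Block = u64;

// Grow `blocks` by `additional_blocks`, filling directly with the target pattern
// (all-ones for init, all-zeros for uninit) in one pass. A real implementation
// would still need the usual bit-level handling for a partially covered last block.
fn grow_blocks(blocks: &mut Vec<Block>, additional_blocks: usize, new_state: bool) {
    let fill: Block = if new_state { Block::MAX } else { 0 };
    blocks.extend(std::iter::repeat(fill).take(additional_blocks));
}
```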
This seems very roundabout and fragile. I'm not sure how to proceed from here. I don't think it can be solved this way. We tried previously. That one only allows differentiating between
and is slightly better to handle, but not sure by how much. @RalfJung do you think it's worthwhile to pursue the trailing-zeros/uninit scheme as in this PR? Or should we start with the all-zeros/uninit scheme? We could also look into funky schemes like making all uses of
All zero/all-uninit has the same problem re: slices though, doesn't it?
Yea, but it's slightly easier to handle, as you can usually just implement a default behaviour instead of having to run the regular logic to a certain point and then finish with zeros and somehow handle the edge between those two.
In the end I don't have a strong opinion as long as we can keep the layers reasonably separated. It might make sense to factor the init mask (and
I'll start with a new experiment, this time from just the API perspective, ignoring impl details for now.
This is part of a set of similar refactorings for Allocation, but I want to benchmark all parts on their own.
This PR changes the InitMask to only allocate up to the first initialized byte. Everything beyond the allocated space is considered uninitialized. Reading from it or defining it as uninitialized will not allocate.
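Taking the description together with the clarification above ("last initialized byte"), the storage requirement boils down to something like the following toy calculation (hypothetical helper, 64-bit blocks; not the actual `InitMask` code): the mask only ever needs blocks up to the last initialized bit, and an entirely uninitialized allocation needs none at all.

```rust
// Number of 64-bit blocks the mask needs to store, given the position of the
// last initialized bit (if any); everything past that is implicitly uninit.
fn blocks_needed(last_init_bit: Option<usize>) -> usize {
    match last_init_bit {
        Some(bit) => bit / 64 + 1,
        None => 0, // nothing initialized: no blocks at all
    }
}

fn main() {
    assert_eq!(blocks_needed(None), 0);
    assert_eq!(blocks_needed(Some(3)), 1);  // fits in the first block
    assert_eq!(blocks_needed(Some(64)), 2); // spills into a second block
    println!("ok");
}
```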