Don't allocate trailing uninit bits in the InitMap of CTFE Allocations #94936

Closed
oli-obk wants to merge 1 commit

Conversation

oli-obk (Contributor) commented Mar 14, 2022

This is part of a set of similar refactorings for Allocation, but I want to benchmark all parts on their own.

This PR changes the InitMask to only allocate up to the first initialized byte. Everything beyond the allocated space is considered uninitialized. Reading from it or marking it as uninitialized will not allocate.
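
To make the idea concrete, here is a minimal, self-contained sketch of such a mask (illustrative only; `InitMaskSketch`, `Block`, and the helper methods are hypothetical names, not the rustc types):

```rust
// Illustrative sketch only -- not the actual rustc InitMask. A `true` bit means
// "initialized"; bits past the end of `blocks` are implicitly uninitialized and
// take up no memory at all.
type Block = u64;
const BLOCK_SIZE: usize = Block::BITS as usize;

struct InitMaskSketch {
    blocks: Vec<Block>,
    len: usize, // logical length in bits
}

impl InitMaskSketch {
    fn get(&self, i: usize) -> bool {
        assert!(i < self.len);
        let (block, bit) = (i / BLOCK_SIZE, i % BLOCK_SIZE);
        // A block that was never allocated holds only uninitialized bits.
        match self.blocks.get(block) {
            Some(b) => (*b >> bit) & 1 != 0,
            None => false,
        }
    }

    fn set(&mut self, i: usize, init: bool) {
        assert!(i < self.len);
        let (block, bit) = (i / BLOCK_SIZE, i % BLOCK_SIZE);
        if init {
            // Only marking a bit as initialized materializes the tail.
            if block >= self.blocks.len() {
                self.blocks.resize(block + 1, 0);
            }
            self.blocks[block] |= 1 << bit;
        } else if let Some(b) = self.blocks.get_mut(block) {
            // Marking a bit uninit inside the unallocated tail is a no-op.
            *b &= !(1 << bit);
        }
    }
}

fn main() {
    let mut m = InitMaskSketch { blocks: vec![], len: 128 };
    assert!(!m.get(100)); // reading the unallocated tail: uninit, no allocation
    m.set(100, false);    // still a no-op, still no allocation
    assert!(m.blocks.is_empty());
    m.set(3, true);       // now the first block is materialized
    assert!(m.get(3) && !m.get(127));
}
```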

rustbot added the T-compiler label (Relevant to the compiler team, which will review and decide on the PR/issue.) on Mar 14, 2022
oli-obk (Contributor, Author) commented Mar 14, 2022

@bors try @rust-timer queue

rust-timer (Collaborator)

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

rustbot added the S-waiting-on-perf label (Status: Waiting on a perf run to be completed.) on Mar 14, 2022
bors (Contributor) commented Mar 14, 2022

⌛ Trying commit 21b759a with merge d4a5b1dc3a8717b8d4eaece9f4b6ca0fca86aa5b...

bors (Contributor) commented Mar 14, 2022

☀️ Try build successful - checks-actions
Build commit: d4a5b1dc3a8717b8d4eaece9f4b6ca0fca86aa5b

rust-timer (Collaborator)

Queued d4a5b1dc3a8717b8d4eaece9f4b6ca0fca86aa5b with parent bce19cf, future comparison URL.

rust-timer (Collaborator)

Finished benchmarking commit (d4a5b1dc3a8717b8d4eaece9f4b6ca0fca86aa5b): comparison url.

Summary: This benchmark run did not return any relevant results. 5 results were found to be statistically significant but too small to be relevant.

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR led to changes in compiler perf.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf -perf-regression

rustbot added the S-waiting-on-review label (Status: Awaiting review from the assignee but also interested parties.) and removed the S-waiting-on-perf label (Status: Waiting on a perf run to be completed.) on Mar 14, 2022
oli-obk (Contributor, Author) commented Mar 14, 2022

r? @RalfJung

erikdesjardins (Contributor) left a comment

Hmm, since this doesn't seem to have resulted in any significant speed or max-rss improvements, is the increased complexity worth it? (Not sure if the description implies that you have additional changes dependent on this, or equivalent changes to other parts of the code)

}
}
} else {
if is_init {
Contributor

This deserves a comment explaining the case it's handling: trailing uninit may not be allocated, so if the start block doesn't exist, then it's all uninit.

@@ -886,6 +913,9 @@ impl InitMask {
}
}
}
if !is_init && end_block_inclusive >= init_mask.blocks.len() {
Contributor

This also deserves a comment, and perhaps ascii art about the case it's handling

oli-obk (Contributor, Author) commented Mar 16, 2022

Yeah, I don't expect our benchmarks to exercise this much, but there are some open issues for it that I want to test against this change in combination with the equivalent one for the bytes field.

RalfJung (Member)

This PR changes the InitMask to only allocate up to the first initialized byte.

I assume you mean last initialized byte?

      let mut m = InitMask { blocks: vec![], len: Size::ZERO };
-     m.grow(size, state);
+     m.grow(size, true);
      m
Member

We could keep the old API by using if state { ... } else { ... } -- that seems cleaner (not bothering the users with a somewhat cumbersome API)?
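
For illustration, the constructor of the sketch above could keep that two-argument shape (hypothetical code, not the real InitMask::new):

```rust
impl InitMaskSketch {
    // Keep the old two-argument constructor; callers don't need to know that
    // the all-uninit case allocates no blocks at all.
    fn new(len: usize, state: bool) -> Self {
        let mut m = InitMaskSketch { blocks: vec![], len };
        if state {
            // all-init: materialize every block up front
            let blocks_needed = (len + BLOCK_SIZE - 1) / BLOCK_SIZE;
            m.blocks = vec![Block::MAX; blocks_needed];
        }
        // all-uninit: leave `blocks` empty; the tail is implicitly uninit
        m
    }
}
```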

RalfJung (Member)

This might be a good time to add some unit tests to InitMask?
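
For example, against the illustrative sketch above (not the real InitMask), such tests could pin down the behaviour of the unallocated tail:

```rust
#[cfg(test)]
mod init_mask_sketch_tests {
    use super::*;

    #[test]
    fn tail_is_uninit_and_unallocated() {
        let m = InitMaskSketch::new(1000, false);
        assert!(m.blocks.is_empty());
        assert!(!m.get(999));
    }

    #[test]
    fn init_write_materializes_only_needed_blocks() {
        let mut m = InitMaskSketch::new(1000, false);
        m.set(0, true);
        assert!(m.get(0));
        assert!(!m.get(999));
        assert_eq!(m.blocks.len(), 1);
    }

    #[test]
    fn all_init_constructor_reads_back_init() {
        let m = InitMaskSketch::new(130, true);
        assert!(m.get(0) && m.get(129));
    }
}
```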

  pub fn set_range(&mut self, start: Size, end: Size, new_state: bool) {
      let len = self.len;
-     if end > len {
+     if end > len && new_state {
Member

I am a bit surprised that we allow OOB indices at all here?

Member

Also, I think set_range_inbounds is called in situations where the index is in-bounds of the logical size of the InitMap, but OOB of its actual size due to leaving off the tail. So I think the growing logic needs to move to set_range_inbounds.

Member

Ah, you have ensure_blocks for that. But then we have two growing logics? That seems odd.
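
To illustrate what a single growth path might look like, here is a hypothetical consolidation on the sketch type from above (not the PR's actual ensure_blocks):

```rust
impl InitMaskSketch {
    // One place that materializes missing blocks, used by every path that may
    // write initialized bits past the currently allocated tail.
    fn ensure_blocks(&mut self, last_block: usize) {
        if last_block >= self.blocks.len() {
            // new blocks start out all-zero, i.e. all-uninit, like the implicit tail
            self.blocks.resize(last_block + 1, 0);
        }
    }

    fn set_range(&mut self, start: usize, end: usize, new_state: bool) {
        assert!(end <= self.len);
        if start >= end {
            return;
        }
        if new_state {
            // only an init-write needs storage; uninit-writes into the
            // unallocated tail remain no-ops inside `set`
            self.ensure_blocks((end - 1) / BLOCK_SIZE);
        }
        for i in start..end {
            self.set(i, new_state);
        }
    }
}
```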

-     } else {
-         self.blocks[blocka] &= !range;
+     } else if let Some(block) = self.blocks.get_mut(blocka) {
+         *block &= !range;
Member

Please add a comment saying why it is okay to do nothing here if get_mut returns None.

@@ -673,15 +679,17 @@ impl InitMask {
          for block in (blocka + 1)..blockb {
              self.blocks[block] = u64::MAX;
          }
-     } else {
+     } else if let Some(blocka_val) = self.blocks.get_mut(blocka) {
Member

Same, please add a comment saying why it is okay to do nothing here if get_mut returns None.

}
let additional_blocks = block - self.blocks.len() + 1;
self.blocks.extend(
// FIXME(oli-obk): optimize this by repeating `new_state as Block`.
Member

I don't quite understand this FIXME. I see it just got moved around but still, it should be clarified or removed IMO.

Contributor Author

Ah, basically: instead of filling with uninit and then setting all of them to initialized, we can immediately fill with init.
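
In sketch form (a hypothetical helper, not the exact rustc code), the fill pattern would be chosen from new_state up front instead of pushing zeroed blocks and then flipping every bit:

```rust
// Illustrative only: fully covered blocks are written in one step; any partially
// covered first/last block still needs its individual bits adjusted afterwards.
fn extend_blocks(blocks: &mut Vec<u64>, additional_blocks: usize, new_state: bool) {
    let fill: u64 = if new_state { u64::MAX } else { 0 };
    blocks.extend(std::iter::repeat(fill).take(additional_blocks));
}
```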

oli-obk (Contributor, Author) commented Mar 28, 2022

So... this only makes sense imo if we also do the same optimization for data, but that's a lot harder. We'd have to

  1. reintroduce a size field, so we can track the actual size of the allocation
  2. rewrite all the APIs that return slices to return some type that works just like a slice, but returns 0 for everything beyond the actual data slice's size (a rough sketch follows below).

This seems very roundabout and fragile.
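
A very rough sketch of the kind of wrapper point 2 would require (hypothetical names; nothing like this exists in the PR):

```rust
// Hypothetical: a "slice" whose logical length can exceed the stored bytes;
// everything past the stored bytes reads as zero and only materializes on write.
struct TailZeroBytes {
    stored: Vec<u8>,
    logical_len: usize,
}

impl TailZeroBytes {
    fn get(&self, i: usize) -> u8 {
        assert!(i < self.logical_len);
        self.stored.get(i).copied().unwrap_or(0)
    }

    fn set(&mut self, i: usize, byte: u8) {
        assert!(i < self.logical_len);
        if i >= self.stored.len() {
            // writing past the stored bytes is what forces the tail to materialize
            self.stored.resize(i + 1, 0);
        }
        self.stored[i] = byte;
    }
}
```

Every API that hands out &[u8] today would have to go through something like this, which is where the "roundabout and fragile" concern comes from.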

I'm not sure how to proceed from here. I don't think [the issue linked here] can be solved this way.

We tried [an earlier approach, linked here] previously. That one only allows differentiating between

  • all zero
  • all uninit
  • everything else

and is slightly better to handle, but not sure by how much.

@RalfJung do you think it's worthwhile to pursue the trailing-zeros/uninit scheme as in this PR? Or should we start with the all-zeros/uninit scheme?

We could also look into funky schemes like making all uses of Allocation go through a trait and then using dynamic dispatch to implement the different schemes.
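
Sketched very loosely (a hypothetical trait and impls, nothing that exists in rustc):

```rust
// Hypothetical: every representation scheme implements one byte-level interface,
// and Allocation only ever talks to the trait object.
trait AllocBytes {
    fn len(&self) -> usize;
    fn read_byte(&self, offset: usize) -> u8;
    fn write_byte(&mut self, offset: usize, byte: u8);
}

struct Dense(Vec<u8>); // a dense representation, roughly like today's bytes field

struct AllZero {
    len: usize, // "all zero" scheme: stores no bytes at all
}

impl AllocBytes for Dense {
    fn len(&self) -> usize {
        self.0.len()
    }
    fn read_byte(&self, offset: usize) -> u8 {
        self.0[offset]
    }
    fn write_byte(&mut self, offset: usize, byte: u8) {
        self.0[offset] = byte;
    }
}

impl AllocBytes for AllZero {
    fn len(&self) -> usize {
        self.len
    }
    fn read_byte(&self, _offset: usize) -> u8 {
        0
    }
    fn write_byte(&mut self, _offset: usize, _byte: u8) {
        // a real implementation would have to switch to the dense scheme here
        unimplemented!()
    }
}

// An Allocation could then hold a `Box<dyn AllocBytes>` and dispatch dynamically.
```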

RalfJung (Member)

All zero/all-uninit has the same problem re: slices though, doesn't it?

oli-obk (Contributor, Author) commented Mar 28, 2022

All zero/all-uninit has the same problem re: slices though, doesn't it?

Yeah, but it's slightly easier to handle: you can usually just implement a default behaviour, instead of having to run the regular logic up to a certain point, then finish with zeros, and somehow handle the edge between the two.

RalfJung (Member)

In the end I don't have a strong opinion as long as we can keep the layers reasonably separated. It might make sense to factor the init mask (and data representation) code into a separate module to ensure it is suitably isolated from the rest of the Allocation logic?

oli-obk (Contributor, Author) commented Apr 12, 2022

I'll start with a new experiment, this time from just the API perspective, ignoring impl details for now

oli-obk closed this Apr 12, 2022