Skip to content

Commit 05083c2

Browse files
committed
Auto merge of #60187 - tmandry:generator-optimization, r=eddyb
Generator optimization: Overlap locals that never have storage live at the same time The specific goal of this optimization is to optimize async fns which use `await!`. Notably, `await!` has an enclosing scope around the futures it awaits ([definition](https://github.com/rust-lang/rust/blob/08bfe16129b0621bc90184f8704523d4929695ef/src/libstd/macros.rs#L365-L381)), which we rely on to implement the optimization. More generally, the optimization allows overlapping the storage of some locals which are never storage-live at the same time. **We care about storage-liveness when computing the layout, because knowing a field is `StorageDead` is the only way to prove it will not be accessed, either directly or through a reference.** To determine whether we can overlap two locals in the generator layout, we look at whether they might *both* be `StorageLive` at any point in the MIR. We use the `MaybeStorageLive` dataflow analysis for this. We iterate over every location in the MIR, and build a bitset for each local of the locals it might potentially conflict with. Next, we assign every saved local to one or more variants. The variants correspond to suspension points, and we include the set of locals live across a given suspension point in the variant. (Note that we use liveness instead of storage-liveness here; this ensures that the local has actually been initialized in each variant it has been included in. If the local is not live across a suspension point, then it doesn't need to be included in that variant.). It's important to note that the variants are a "view" into our layout. For the layout computation, we use a simplified approach. 1. Start with the set of locals assigned to only one variant. The rest are disqualified. 2. For each pair of locals which may conflict *and are not assigned to the same variant*, we pick one local to disqualify from overlapping. Disqualified locals go into a non-overlapping "prefix" at the beginning of our layout. This means they always have space reserved for them. All the locals that are allowed to overlap in each variant are then laid out after this prefix, in the "overlap zone". So, if A and B were disqualified, and X, Y, and Z were all eligible for overlap, our generator might look something like this: You can think of a generator as an enum, where some fields are shared between variants. e.g. ```rust enum Generator { Unresumed, Poisoned, Returned, Suspend0(A, B, X), Suspend1(B), Suspend2(A, Y, Z), } ``` where every mention of `A` and `B` refer to the same field, which does not move when changing variants. Note that `A` and `B` would automatically be sent to the prefix in this example. Assuming that `X` is never `StorageLive` at the same time as either `Y` or `Z`, it would be allowed to overlap with them. Note that if two locals (`Y` and `Z` in this case) are assigned to the same variant in our generator, their memory would never overlap in the layout. Thus they can both be eligible for the overlapping section, even if they are storage-live at the same time. --- Depends on: - [x] #59897 Multi-variant layouts for generators - [x] #60840 Preserve local scopes in generator MIR - [x] #61373 Emit StorageDead along unwind paths for generators Before merging: - [x] ~Wrap the types of all generator fields in `MaybeUninitialized` in layout::ty::field~ (opened #60889) - [x] Make PR description more complete (e.g. explain why storage liveness is important and why we have to check every location) - [x] Clean up TODO - [x] Fix the layout code to enforce that the same field never moves around in the generator - [x] Add tests for async/await - [x] ~Reduce # bits we store by half, since the conflict relation is symmetric~ (note: decided not to do this, for simplicity) - [x] Store liveness information for each yield point in our `GeneratorLayout`, that way we can emit more useful debuginfo AND tell miri which fields are definitely initialized for a given variant (see discussion at #59897 (comment))
2 parents 961a9d6 + aeefbc4 commit 05083c2

File tree

8 files changed

+960
-324
lines changed

8 files changed

+960
-324
lines changed

src/librustc/mir/mod.rs

+16
Original file line numberDiff line numberDiff line change
@@ -9,6 +9,7 @@ use crate::hir::def_id::DefId;
99
use crate::hir::{self, InlineAsm as HirInlineAsm};
1010
use crate::mir::interpret::{ConstValue, InterpError, Scalar};
1111
use crate::mir::visit::MirVisitable;
12+
use rustc_data_structures::bit_set::BitMatrix;
1213
use rustc_data_structures::fx::FxHashSet;
1314
use rustc_data_structures::graph::dominators::{dominators, Dominators};
1415
use rustc_data_structures::graph::{self, GraphPredecessors, GraphSuccessors};
@@ -2997,6 +2998,11 @@ pub struct GeneratorLayout<'tcx> {
29972998
/// be stored in multiple variants.
29982999
pub variant_fields: IndexVec<VariantIdx, IndexVec<Field, GeneratorSavedLocal>>,
29993000

3001+
/// Which saved locals are storage-live at the same time. Locals that do not
3002+
/// have conflicts with each other are allowed to overlap in the computed
3003+
/// layout.
3004+
pub storage_conflicts: BitMatrix<GeneratorSavedLocal, GeneratorSavedLocal>,
3005+
30003006
/// Names and scopes of all the stored generator locals.
30013007
/// NOTE(tmandry) This is *strictly* a temporary hack for codegen
30023008
/// debuginfo generation, and will be removed at some point.
@@ -3193,6 +3199,7 @@ BraceStructTypeFoldableImpl! {
31933199
impl<'tcx> TypeFoldable<'tcx> for GeneratorLayout<'tcx> {
31943200
field_tys,
31953201
variant_fields,
3202+
storage_conflicts,
31963203
__local_debuginfo_codegen_only_do_not_use,
31973204
}
31983205
}
@@ -3572,6 +3579,15 @@ impl<'tcx> TypeFoldable<'tcx> for GeneratorSavedLocal {
35723579
}
35733580
}
35743581

3582+
impl<'tcx, R: Idx, C: Idx> TypeFoldable<'tcx> for BitMatrix<R, C> {
3583+
fn super_fold_with<'gcx: 'tcx, F: TypeFolder<'gcx, 'tcx>>(&self, _: &mut F) -> Self {
3584+
self.clone()
3585+
}
3586+
fn super_visit_with<V: TypeVisitor<'tcx>>(&self, _: &mut V) -> bool {
3587+
false
3588+
}
3589+
}
3590+
35753591
impl<'tcx> TypeFoldable<'tcx> for Constant<'tcx> {
35763592
fn super_fold_with<'gcx: 'tcx, F: TypeFolder<'gcx, 'tcx>>(&self, folder: &mut F) -> Self {
35773593
Constant {

src/librustc/ty/layout.rs

+487-280
Large diffs are not rendered by default.

src/librustc_data_structures/bit_set.rs

+83-1
Original file line numberDiff line numberDiff line change
@@ -636,7 +636,7 @@ impl<T: Idx> GrowableBitSet<T> {
636636
///
637637
/// All operations that involve a row and/or column index will panic if the
638638
/// index exceeds the relevant bound.
639-
#[derive(Clone, Debug)]
639+
#[derive(Clone, Debug, Eq, PartialEq, RustcDecodable, RustcEncodable)]
640640
pub struct BitMatrix<R: Idx, C: Idx> {
641641
num_rows: usize,
642642
num_columns: usize,
@@ -658,6 +658,23 @@ impl<R: Idx, C: Idx> BitMatrix<R, C> {
658658
}
659659
}
660660

661+
/// Creates a new matrix, with `row` used as the value for every row.
662+
pub fn from_row_n(row: &BitSet<C>, num_rows: usize) -> BitMatrix<R, C> {
663+
let num_columns = row.domain_size();
664+
let words_per_row = num_words(num_columns);
665+
assert_eq!(words_per_row, row.words().len());
666+
BitMatrix {
667+
num_rows,
668+
num_columns,
669+
words: iter::repeat(row.words()).take(num_rows).flatten().cloned().collect(),
670+
marker: PhantomData,
671+
}
672+
}
673+
674+
pub fn rows(&self) -> impl Iterator<Item = R> {
675+
(0..self.num_rows).map(R::new)
676+
}
677+
661678
/// The range of bits for a given row.
662679
fn range(&self, row: R) -> (usize, usize) {
663680
let words_per_row = num_words(self.num_columns);
@@ -737,6 +754,49 @@ impl<R: Idx, C: Idx> BitMatrix<R, C> {
737754
changed
738755
}
739756

757+
/// Adds the bits from `with` to the bits from row `write`, and
758+
/// returns `true` if anything changed.
759+
pub fn union_row_with(&mut self, with: &BitSet<C>, write: R) -> bool {
760+
assert!(write.index() < self.num_rows);
761+
assert_eq!(with.domain_size(), self.num_columns);
762+
let (write_start, write_end) = self.range(write);
763+
let mut changed = false;
764+
for (read_index, write_index) in (0..with.words().len()).zip(write_start..write_end) {
765+
let word = self.words[write_index];
766+
let new_word = word | with.words()[read_index];
767+
self.words[write_index] = new_word;
768+
changed |= word != new_word;
769+
}
770+
changed
771+
}
772+
773+
/// Sets every cell in `row` to true.
774+
pub fn insert_all_into_row(&mut self, row: R) {
775+
assert!(row.index() < self.num_rows);
776+
let (start, end) = self.range(row);
777+
let words = &mut self.words[..];
778+
for index in start..end {
779+
words[index] = !0;
780+
}
781+
self.clear_excess_bits(row);
782+
}
783+
784+
/// Clear excess bits in the final word of the row.
785+
fn clear_excess_bits(&mut self, row: R) {
786+
let num_bits_in_final_word = self.num_columns % WORD_BITS;
787+
if num_bits_in_final_word > 0 {
788+
let mask = (1 << num_bits_in_final_word) - 1;
789+
let (_, end) = self.range(row);
790+
let final_word_idx = end - 1;
791+
self.words[final_word_idx] &= mask;
792+
}
793+
}
794+
795+
/// Gets a slice of the underlying words.
796+
pub fn words(&self) -> &[Word] {
797+
&self.words
798+
}
799+
740800
/// Iterates through all the columns set to true in a given row of
741801
/// the matrix.
742802
pub fn iter<'a>(&'a self, row: R) -> BitIter<'a, C> {
@@ -748,6 +808,12 @@ impl<R: Idx, C: Idx> BitMatrix<R, C> {
748808
marker: PhantomData,
749809
}
750810
}
811+
812+
/// Returns the number of elements in `row`.
813+
pub fn count(&self, row: R) -> usize {
814+
let (start, end) = self.range(row);
815+
self.words[start..end].iter().map(|e| e.count_ones() as usize).sum()
816+
}
751817
}
752818

753819
/// A fixed-column-size, variable-row-size 2D bit matrix with a moderately
@@ -1057,6 +1123,7 @@ fn matrix_iter() {
10571123
matrix.insert(2, 99);
10581124
matrix.insert(4, 0);
10591125
matrix.union_rows(3, 5);
1126+
matrix.insert_all_into_row(6);
10601127

10611128
let expected = [99];
10621129
let mut iter = expected.iter();
@@ -1068,6 +1135,7 @@ fn matrix_iter() {
10681135

10691136
let expected = [22, 75];
10701137
let mut iter = expected.iter();
1138+
assert_eq!(matrix.count(3), expected.len());
10711139
for i in matrix.iter(3) {
10721140
let j = *iter.next().unwrap();
10731141
assert_eq!(i, j);
@@ -1076,6 +1144,7 @@ fn matrix_iter() {
10761144

10771145
let expected = [0];
10781146
let mut iter = expected.iter();
1147+
assert_eq!(matrix.count(4), expected.len());
10791148
for i in matrix.iter(4) {
10801149
let j = *iter.next().unwrap();
10811150
assert_eq!(i, j);
@@ -1084,11 +1153,24 @@ fn matrix_iter() {
10841153

10851154
let expected = [22, 75];
10861155
let mut iter = expected.iter();
1156+
assert_eq!(matrix.count(5), expected.len());
10871157
for i in matrix.iter(5) {
10881158
let j = *iter.next().unwrap();
10891159
assert_eq!(i, j);
10901160
}
10911161
assert!(iter.next().is_none());
1162+
1163+
assert_eq!(matrix.count(6), 100);
1164+
let mut count = 0;
1165+
for (idx, i) in matrix.iter(6).enumerate() {
1166+
assert_eq!(idx, i);
1167+
count += 1;
1168+
}
1169+
assert_eq!(count, 100);
1170+
1171+
if let Some(i) = matrix.iter(7).next() {
1172+
panic!("expected no elements in row, but contains element {:?}", i);
1173+
}
10921174
}
10931175

10941176
#[test]

src/librustc_data_structures/stable_hasher.rs

+10
Original file line numberDiff line numberDiff line change
@@ -503,6 +503,16 @@ impl<I: indexed_vec::Idx, CTX> HashStable<CTX> for bit_set::BitSet<I>
503503
}
504504
}
505505

506+
impl<R: indexed_vec::Idx, C: indexed_vec::Idx, CTX> HashStable<CTX>
507+
for bit_set::BitMatrix<R, C>
508+
{
509+
fn hash_stable<W: StableHasherResult>(&self,
510+
ctx: &mut CTX,
511+
hasher: &mut StableHasher<W>) {
512+
self.words().hash_stable(ctx, hasher);
513+
}
514+
}
515+
506516
impl_stable_hash_via_hash!(::std::path::Path);
507517
impl_stable_hash_via_hash!(::std::path::PathBuf);
508518

src/librustc_mir/dataflow/at_location.rs

+5
Original file line numberDiff line numberDiff line change
@@ -131,6 +131,11 @@ where
131131
curr_state.subtract(&self.stmt_kill);
132132
f(curr_state.iter());
133133
}
134+
135+
/// Returns a bitset of the elements present in the current state.
136+
pub fn as_dense(&self) -> &BitSet<BD::Idx> {
137+
&self.curr_state
138+
}
134139
}
135140

136141
impl<'tcx, BD> FlowsAtLocation for FlowAtLocation<'tcx, BD>

0 commit comments

Comments
 (0)