Auto merge of #60187 - tmandry:generator-optimization, r=eddyb
Generator optimization: Overlap locals that never have storage live at the same time

The specific goal of this optimization is to shrink async fns that use `await!`. Notably, `await!` places an enclosing scope around the future it awaits ([definition](https://github.com/rust-lang/rust/blob/08bfe16129b0621bc90184f8704523d4929695ef/src/libstd/macros.rs#L365-L381)), and the optimization relies on that scope.
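For illustration, in a hypothetical async fn like the one below (helper names are made up; this assumes a nightly of that era with the `async_await` and `await_macro` features), each awaited future is a temporary that lives only inside the scope `await!` introduces, so it is already `StorageDead` by the time the next `await!` begins:

```rust
async fn example() -> u64 {
    // The `make_future_a()` temporary is owned by the block that `await!`
    // expands to, so its storage ends as soon as this await completes...
    let x = await!(make_future_a());
    // ...which means this second future can reuse the memory the first one
    // occupied inside the generator.
    let y = await!(make_future_b());
    x + y
}
```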

More generally, the optimization allows overlapping the storage of some locals which are never storage-live at the same time. **We care about storage-liveness when computing the layout, because knowing a field is `StorageDead` is the only way to prove it will not be accessed, either directly or through a reference.**
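As a hypothetical illustration of why ordinary liveness is not enough (the helper is made up), consider an async fn where a local's last direct use is a borrow:

```rust
async fn example() -> u8 {
    let buf = [0u8; 64];
    let r = &buf;                // last direct mention of `buf`...
    let n = await!(read_more()); // ...but `buf` must stay StorageLive across this
                                 // suspension, because it is still read through `r` below.
    // Only once `buf` is StorageDead (at the end of its scope) can we prove that no
    // access remains, so only then may another local overlap with its memory.
    r[n]
}
```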

To determine whether we can overlap two locals in the generator layout, we look at whether they might *both* be `StorageLive` at any point in the MIR. We use the `MaybeStorageLive` dataflow analysis for this. We iterate over every location in the MIR and build, for each local, a bitset of the locals it may conflict with.
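A rough sketch of that conflict computation, assuming the dataflow results can be queried per location (the real code in the generator transform differs in its details; `storage_live_at` is a stand-in for that query):

```rust
// One row per saved local; bit (a, b) means "a and b may be StorageLive at once".
let mut storage_conflicts: BitMatrix<GeneratorSavedLocal, GeneratorSavedLocal> =
    BitMatrix::new(saved_locals.len(), saved_locals.len());
for (block, data) in body.basic_blocks().iter_enumerated() {
    // `..=` so the terminator's location is visited as well.
    for statement_index in 0..=data.statements.len() {
        let location = Location { block, statement_index };
        // Placeholder: the MaybeStorageLive state at `location`, restricted to saved locals.
        let live: BitSet<GeneratorSavedLocal> = storage_live_at(location);
        for local in live.iter() {
            // Everything live here may conflict with `local`.
            storage_conflicts.union_row_with(&live, local);
        }
    }
}
```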

Next, we assign every saved local to one or more variants. The variants correspond to suspension points, and each variant includes the set of locals that are live across its suspension point. (Note that we use liveness rather than storage-liveness here; this ensures that the local has actually been initialized in every variant that includes it. If a local is not live across a suspension point, it doesn't need to be included in that variant.) It's important to note that the variants are only a "view" into our layout.
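As a hypothetical example of this assignment (helpers are made up):

```rust
async fn example() -> usize {
    let a = String::new();        // live across both awaits: included in Suspend0 and Suspend1
    let x = await!(make_first()); // the `make_first()` temporary is only in Suspend0
    await!(send(&a, x));          // the `send(..)` temporary is only in Suspend1
    x                             // `x` is live across the second await: included in Suspend1
}
```

In the layout computation described next, `a` would be disqualified from overlapping because it is assigned to more than one variant, while the two future temporaries and `x` remain candidates.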

For the layout computation, we use a simplified approach.

1. Start with the set of locals assigned to only one variant. The rest are disqualified.
2. For each pair of locals which may conflict *and are not assigned to the same variant*, we pick one local to disqualify from overlapping.

Disqualified locals go into a non-overlapping "prefix" at the beginning of our layout. This means they always have space reserved for them. All the locals that are allowed to overlap in each variant are then laid out after this prefix, in the "overlap zone".
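A hedged sketch of that pass (simplified; the real code, largely in the `src/librustc/ty/layout.rs` changes below, also has to worry about tie-breaking and field ordering), where `assignments[local]` is the list of variants a saved local was assigned to and `storage_conflicts` is the matrix built earlier:

```rust
let mut ineligible: BitSet<GeneratorSavedLocal> = BitSet::new_empty(saved_locals.len());

// Step 1: a local assigned to more than one variant always gets its own slot.
for (local, variants) in assignments.iter_enumerated() {
    if variants.len() != 1 {
        ineligible.insert(local);
    }
}

// Step 2: for each conflicting pair in *different* variants, disqualify one side.
for local_a in storage_conflicts.rows() {
    for local_b in storage_conflicts.iter(local_a) {
        if assignments[local_a] != assignments[local_b]
            && !ineligible.contains(local_a)
            && !ineligible.contains(local_b)
        {
            ineligible.insert(local_b); // which side to keep is an arbitrary choice
        }
    }
}
// `ineligible` locals form the non-overlapping prefix; everything else may overlap.
```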

So, if A and B were disqualified, while X, Y, and Z were all eligible for overlap, our generator might look something like the enum below. You can think of a generator as an enum in which some fields are shared between variants:

```rust
enum Generator {
  Unresumed,
  Poisoned,
  Returned,
  Suspend0(A, B, X),
  Suspend1(B),
  Suspend2(A, Y, Z),
}
```

where every mention of `A` or `B` refers to the same field, which does not move when the variant changes. Note that `A` and `B` would automatically be sent to the prefix in this example. Assuming that `X` is never `StorageLive` at the same time as either `Y` or `Z`, it would be allowed to overlap with them.

Note that if two locals (`Y` and `Z` in this case) are assigned to the same variant, they get distinct fields within that variant, so their memory never overlaps in the layout. They can therefore both be eligible for the overlap zone, even if they are storage-live at the same time.
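For a concrete sense of the savings (numbers purely illustrative): if the discriminant takes 8 bytes and every field above is a `u64`, the prefix would place `A` at offset 8 and `B` at offset 16 in every variant that contains them, while `X` (in `Suspend0`) and `Y` (in `Suspend2`) could share offset 24 and `Z` would sit at offset 32. The whole generator would then be 40 bytes instead of the 48 it would need if `X`, `Y`, and `Z` each had a dedicated slot.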

---

Depends on:
- [x] #59897 Multi-variant layouts for generators
- [x] #60840 Preserve local scopes in generator MIR
- [x] #61373 Emit StorageDead along unwind paths for generators

Before merging:

- [x] ~Wrap the types of all generator fields in `MaybeUninitialized` in layout::ty::field~ (opened #60889)
- [x] Make PR description more complete (e.g. explain why storage liveness is important and why we have to check every location)
- [x] Clean up TODO
- [x] Fix the layout code to enforce that the same field never moves around in the generator
- [x] Add tests for async/await
- [x] ~Reduce # bits we store by half, since the conflict relation is symmetric~ (note: decided not to do this, for simplicity)
- [x] Store liveness information for each yield point in our `GeneratorLayout`, that way we can emit more useful debuginfo AND tell miri which fields are definitely initialized for a given variant (see discussion at #59897 (comment))
bors committed Jun 12, 2019
2 parents 961a9d6 + aeefbc4 commit 05083c2
Showing 8 changed files with 960 additions and 324 deletions.
16 changes: 16 additions & 0 deletions src/librustc/mir/mod.rs
@@ -9,6 +9,7 @@ use crate::hir::def_id::DefId;
use crate::hir::{self, InlineAsm as HirInlineAsm};
use crate::mir::interpret::{ConstValue, InterpError, Scalar};
use crate::mir::visit::MirVisitable;
use rustc_data_structures::bit_set::BitMatrix;
use rustc_data_structures::fx::FxHashSet;
use rustc_data_structures::graph::dominators::{dominators, Dominators};
use rustc_data_structures::graph::{self, GraphPredecessors, GraphSuccessors};
@@ -2997,6 +2998,11 @@ pub struct GeneratorLayout<'tcx> {
/// be stored in multiple variants.
pub variant_fields: IndexVec<VariantIdx, IndexVec<Field, GeneratorSavedLocal>>,

/// Which saved locals are storage-live at the same time. Locals that do not
/// have conflicts with each other are allowed to overlap in the computed
/// layout.
pub storage_conflicts: BitMatrix<GeneratorSavedLocal, GeneratorSavedLocal>,

/// Names and scopes of all the stored generator locals.
/// NOTE(tmandry) This is *strictly* a temporary hack for codegen
/// debuginfo generation, and will be removed at some point.
@@ -3193,6 +3199,7 @@ BraceStructTypeFoldableImpl! {
impl<'tcx> TypeFoldable<'tcx> for GeneratorLayout<'tcx> {
field_tys,
variant_fields,
storage_conflicts,
__local_debuginfo_codegen_only_do_not_use,
}
}
@@ -3572,6 +3579,15 @@ impl<'tcx> TypeFoldable<'tcx> for GeneratorSavedLocal {
}
}

impl<'tcx, R: Idx, C: Idx> TypeFoldable<'tcx> for BitMatrix<R, C> {
fn super_fold_with<'gcx: 'tcx, F: TypeFolder<'gcx, 'tcx>>(&self, _: &mut F) -> Self {
self.clone()
}
fn super_visit_with<V: TypeVisitor<'tcx>>(&self, _: &mut V) -> bool {
false
}
}

impl<'tcx> TypeFoldable<'tcx> for Constant<'tcx> {
fn super_fold_with<'gcx: 'tcx, F: TypeFolder<'gcx, 'tcx>>(&self, folder: &mut F) -> Self {
Constant {
767 changes: 487 additions & 280 deletions src/librustc/ty/layout.rs

Large diffs are not rendered by default.

84 changes: 83 additions & 1 deletion src/librustc_data_structures/bit_set.rs
@@ -636,7 +636,7 @@ impl<T: Idx> GrowableBitSet<T> {
///
/// All operations that involve a row and/or column index will panic if the
/// index exceeds the relevant bound.
#[derive(Clone, Debug)]
#[derive(Clone, Debug, Eq, PartialEq, RustcDecodable, RustcEncodable)]
pub struct BitMatrix<R: Idx, C: Idx> {
num_rows: usize,
num_columns: usize,
@@ -658,6 +658,23 @@ impl<R: Idx, C: Idx> BitMatrix<R, C> {
}
}

/// Creates a new matrix, with `row` used as the value for every row.
pub fn from_row_n(row: &BitSet<C>, num_rows: usize) -> BitMatrix<R, C> {
let num_columns = row.domain_size();
let words_per_row = num_words(num_columns);
assert_eq!(words_per_row, row.words().len());
BitMatrix {
num_rows,
num_columns,
words: iter::repeat(row.words()).take(num_rows).flatten().cloned().collect(),
marker: PhantomData,
}
}

pub fn rows(&self) -> impl Iterator<Item = R> {
(0..self.num_rows).map(R::new)
}

/// The range of bits for a given row.
fn range(&self, row: R) -> (usize, usize) {
let words_per_row = num_words(self.num_columns);
@@ -737,6 +754,49 @@ impl<R: Idx, C: Idx> BitMatrix<R, C> {
changed
}

/// Adds the bits from `with` to the bits from row `write`, and
/// returns `true` if anything changed.
pub fn union_row_with(&mut self, with: &BitSet<C>, write: R) -> bool {
assert!(write.index() < self.num_rows);
assert_eq!(with.domain_size(), self.num_columns);
let (write_start, write_end) = self.range(write);
let mut changed = false;
for (read_index, write_index) in (0..with.words().len()).zip(write_start..write_end) {
let word = self.words[write_index];
let new_word = word | with.words()[read_index];
self.words[write_index] = new_word;
changed |= word != new_word;
}
changed
}

/// Sets every cell in `row` to true.
pub fn insert_all_into_row(&mut self, row: R) {
assert!(row.index() < self.num_rows);
let (start, end) = self.range(row);
let words = &mut self.words[..];
for index in start..end {
words[index] = !0;
}
self.clear_excess_bits(row);
}

/// Clear excess bits in the final word of the row.
fn clear_excess_bits(&mut self, row: R) {
let num_bits_in_final_word = self.num_columns % WORD_BITS;
if num_bits_in_final_word > 0 {
let mask = (1 << num_bits_in_final_word) - 1;
let (_, end) = self.range(row);
let final_word_idx = end - 1;
self.words[final_word_idx] &= mask;
}
}

/// Gets a slice of the underlying words.
pub fn words(&self) -> &[Word] {
&self.words
}

/// Iterates through all the columns set to true in a given row of
/// the matrix.
pub fn iter<'a>(&'a self, row: R) -> BitIter<'a, C> {
@@ -748,6 +808,12 @@ impl<R: Idx, C: Idx> BitMatrix<R, C> {
marker: PhantomData,
}
}

/// Returns the number of elements in `row`.
pub fn count(&self, row: R) -> usize {
let (start, end) = self.range(row);
self.words[start..end].iter().map(|e| e.count_ones() as usize).sum()
}
}

/// A fixed-column-size, variable-row-size 2D bit matrix with a moderately
@@ -1057,6 +1123,7 @@ fn matrix_iter() {
matrix.insert(2, 99);
matrix.insert(4, 0);
matrix.union_rows(3, 5);
matrix.insert_all_into_row(6);

let expected = [99];
let mut iter = expected.iter();
@@ -1068,6 +1135,7 @@

let expected = [22, 75];
let mut iter = expected.iter();
assert_eq!(matrix.count(3), expected.len());
for i in matrix.iter(3) {
let j = *iter.next().unwrap();
assert_eq!(i, j);
@@ -1076,6 +1144,7 @@

let expected = [0];
let mut iter = expected.iter();
assert_eq!(matrix.count(4), expected.len());
for i in matrix.iter(4) {
let j = *iter.next().unwrap();
assert_eq!(i, j);
@@ -1084,11 +1153,24 @@ fn matrix_iter() {

let expected = [22, 75];
let mut iter = expected.iter();
assert_eq!(matrix.count(5), expected.len());
for i in matrix.iter(5) {
let j = *iter.next().unwrap();
assert_eq!(i, j);
}
assert!(iter.next().is_none());

assert_eq!(matrix.count(6), 100);
let mut count = 0;
for (idx, i) in matrix.iter(6).enumerate() {
assert_eq!(idx, i);
count += 1;
}
assert_eq!(count, 100);

if let Some(i) = matrix.iter(7).next() {
panic!("expected no elements in row, but contains element {:?}", i);
}
}

#[test]
10 changes: 10 additions & 0 deletions src/librustc_data_structures/stable_hasher.rs
@@ -503,6 +503,16 @@ impl<I: indexed_vec::Idx, CTX> HashStable<CTX> for bit_set::BitSet<I>
}
}

impl<R: indexed_vec::Idx, C: indexed_vec::Idx, CTX> HashStable<CTX>
for bit_set::BitMatrix<R, C>
{
fn hash_stable<W: StableHasherResult>(&self,
ctx: &mut CTX,
hasher: &mut StableHasher<W>) {
self.words().hash_stable(ctx, hasher);
}
}

impl_stable_hash_via_hash!(::std::path::Path);
impl_stable_hash_via_hash!(::std::path::PathBuf);

5 changes: 5 additions & 0 deletions src/librustc_mir/dataflow/at_location.rs
@@ -131,6 +131,11 @@ where
curr_state.subtract(&self.stmt_kill);
f(curr_state.iter());
}

/// Returns a bitset of the elements present in the current state.
pub fn as_dense(&self) -> &BitSet<BD::Idx> {
&self.curr_state
}
}

impl<'tcx, BD> FlowsAtLocation for FlowAtLocation<'tcx, BD>