Skip to content

NLL: refine liveness with "maybe initialized" analysis #45665

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
nikomatsakis opened this issue Oct 31, 2017 · 5 comments
Closed

NLL: refine liveness with "maybe initialized" analysis #45665

nikomatsakis opened this issue Oct 31, 2017 · 5 comments
Labels
T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Milestone

Comments

@nikomatsakis
Copy link
Contributor

As part of #45538, we are incorporating knowledge of which variables are live (i.e., may be used later). We also distinguish "drop-live", which means that they may be dropped later.

However, in MIR, we sometimes generate drops that we can see statically will be a no-op. For example, consider this program:

// compile-flags:-Znll
fn main() {
    let mut v = 1;
    let p = Wrap { value: &v }; // freezes `v` so long as `p` is in use
    ignore(p); // moves `p`
    v += 1; // Errors, even in NLL! (At least as implemented now.)
} // <-- p is dropped here, but we **know** this to be a no-op

fn ignore<T>(x: T) { }

struct Wrap<T> { value: T }

impl<T> Drop for Wrap<T> {
    fn drop(&mut self) { }
}

The problem is that p is considered "drop-live" at the point where v += 1, because there is a DROP in the MIR, even though we know that p must have been moved (and hence will not be dropped) at that point.

One way to account for this is to do a maybe initialized analysis. This is a forwards analysis that tells you, at any given point, whether a given value could possibly be initialized that this point. The idea then would be to change the code which invokes the add_drop_live_constraint function to take the "maybe initialized" state into account. In particular, if the local variable that is being dropped cannot be initialized (i.e., maybe initialized is false), then we can ignore the fact that it is drop live and just not invoke add_drop_live_constraint with its type.

We are already computing a "maybe initialized" in the borrow checker (the variable flow_inits). However, this analysis works a bit differently than the NLL code. For example, it is computed at the level of fragments, which are paths like a.b.c. This is done because it is possible that a was initialized, but some subpart of it (e.g., a.b.c) was moved, while the rest remains intact.

Honestly, all the data structures are a bit out of cache for me, but the high-level idea then is that, when we know that a variable X is drop-live, we want to iterate over all of the subpaths of a that may be initialized (which the existing data structures should be able to tell us) and then invoke the existing add_drop_live_constraint function on the types of those fragments.

cc @arielb1 @pnkfelix -- any notes you care to leave on how to find the maybe initialized fragments given a mir::Local would be appreciated. Else I will dig into the data structures at some point.

@nikomatsakis nikomatsakis added E-needs-mentor T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Oct 31, 2017
@arielb1
Copy link
Contributor

arielb1 commented Nov 1, 2017

We should have the right sort of data in flow_inits, and I don't see why can't we have liveness fragment-by-fragment: structs with destructors can't be partially initialized anyway - see nikomatsakis/nll-rfc#42.

@nikomatsakis
Copy link
Contributor Author

nikomatsakis commented Nov 3, 2017

Let me leave some notes on how things work now. I've been digging about in the code a bit to get a better understanding of what's going to be required here. The NLL code approaches things just a hair differently from the existing borrowck data-flow operator, but I think the two should match up well enough.

First, a bit of background. In order to track which paths are moved etc, we currently construct up a tree of paths. Each such path is called a MovePath, and they are indexed by the MoveData struct. Typically though we don't pass around a MovePath directly, [but rather a MovePathIndex.

The move paths are arranged in a tree, so e.g. if you have a move path a, as well as a.b and a.c, you can walk from a to a.b and a.c -- basically, each move data has a link to its child (first_child) and then that child has links to its siblings (next_sibling). The helper has_any_child_of, for example, shows how to iterate through all the descendants of a node using these fields. (We'll need a similar helper, I think, but implemented on FlowInProgress<MaybeInitializedLvals<'b, 'gcx, 'tcx>> instead of FlowInProgress<MaybeUninitializedLvals<'b, 'gcx, 'tcx>> -- we should be able to share this code.)

It is instructive to look at where has_any_child_of is used today, since it is doing a similar thing. The check_if_path_is_moved function is invoked when there is an access to a path like a.b.c -- it checks whether that path is accessible or not. So, for example, it would be illegal to do some_function(a.b.c) if a.b.c.d has been moved, since a.b.c is not a "complete" value (it's also illegal to do some_function(a.b.c) if a or b has been moved).

In order to detect the case where a.b.c or some subpath of has been moved, the code works in two steps:

debug!("check_if_path_is_moved part2 lvalue: {:?}", lvalue);
if let Some(mpi) = self.move_path_for_lvalue(lvalue) {
if let Some(_) = maybe_uninits.has_any_child_of(mpi) {
self.report_use_of_moved(context, desired_action, lvalue_span);
return; // don't bother finding other problems.
}
}

First, it looks for a move-path for a.b.c, on line 738. There may be no such path, if the function only moves a or a.b (but never directly accesses the field a.b.c). If there is such a path, then on line 739 it looks for all maybe-uninitialized children of that path using the has_any_child_of function we saw earlier.

@nikomatsakis
Copy link
Contributor Author

Initial mentoring instructions: step 1

First, we are going to have to do a bit of surgery to the current control-flow, I think. Right now, the borrowck begins by invoking gather_moves as the first step, and then invokes the compute_regions call later on. This is actually a mistake, I believe, and we should be invoking compute_regions earlier -- we don't want to be doing the move analysis on the original MIR and then editing it, since the types and things will be changing, which could mess up some of the hashing etc.

Probably, also, it would be useful to make borrow_check into a directory, move the existing code into borrow_check/mod.rs, and move nll.rs from transform::nll into borrow_check::nll. That might be a good first PR.

Next, I think what we really want is to split up the compute_regions code into two steps: the "renumbering", which is what mutates the MIR, and then everything else. That way, the overall flow of borrowck can be:

OK, ran out of time, will write more later!

@spastorino
Copy link
Member

I'm taking this one 😄

@nikomatsakis
Copy link
Contributor Author

@Nashenas88 wound up doing this, but it's done. There's follow-up work to be done, I'll file a separate issue for that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

3 participants