Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scopes in mir #32428

Merged
merged 24 commits into from
Mar 25, 2016
Merged

Scopes in mir #32428

merged 24 commits into from
Mar 25, 2016

Conversation

nikomatsakis
Copy link
Contributor

This PR adds scopes to MIR. There is a tree of scopes (each represented by a ScopeId). Every statement, variable, and terminator now has an associated scope and span. It also adds a -Z dump-mir switch one can use to conveniently examine the MIR as optimizations proceed.

The intention is two-fold. First, to support MIR debug-info. This PR does not attempt to modify trans to make use of the scope information, however.

Second, in a more temporary capacity, to support the goal of moving regionck and borowck into the MIR. To that end, the PR also constructs a "scope auxiliary" table storing the extent of each span (this is kept separate from the main MIR, since it contains node-ids) and the dom/post-dom of the region in the graph where the scope occurs. When we move to non-lexical lifetimes, I expect this auxiliary information to be discarded, but that is still some ways in the future (requires, at minimum, an RFC, and there are some thorny details to work out -- though I've got an in-progress draft).

Right now, I'm just dropping this auxiliary information after it is constructed. I was debating for some time whether to add some sort of sanity tests, but decided to just open this PR instead, because I couldn't figure out what such a test would look like (and we don't have independent tests for this today beyond the regionck and borrowck tests).

I'd prefer not to store the auxiliary data into any kind of "per-fn" map. Rather, I'd prefer that we do regionck/borrowck/whatever-else immediately after construction -- that is, we build the MIR for fn X and immediately thereafter do extended correctness checking on it. This will reduce peak memory usage and also ensure that the auxiliary data doesn't exist once optimizations begin. It also clarifies the transition point where static checks are complete and MIR can be more freely optimized.

cc @rust-lang/compiler @nagisa

@rust-highfive
Copy link
Contributor

r? @Aatch

(rust_highfive has picked a reviewer for you, use r? to override)

}

#[derive(Copy, Clone, Debug, RustcEncodable, RustcDecodable)]
pub struct ScopeId(u32);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I almost feel like we should have a general-purpose Id<T> type for these kinds of things.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been wanting for some time to make a macro for this purpose.

On Tue, Mar 22, 2016 at 09:36:17AM -0700, Eduard-Mihai Burtescu wrote:

  • #[inline]
  • fn index(&self, index: ScopeId) -> &ScopeData {
  •    &self.vec[index.index()]
    
  • }
    +}

+impl IndexMut for ScopeDataVec {

  • #[inline]
  • fn index_mut(&mut self, index: ScopeId) -> &mut ScopeData {
  •    &mut self.vec[index.index()]
    
  • }
    +}

+#[derive(Copy, Clone, Debug, RustcEncodable, RustcDecodable)]
+pub struct ScopeId(u32);

I almost feel like we should have a general-purpose Id<T> type for these kinds of things.


You are receiving this because you authored the thread.
Reply to this email directly or view it on GitHub:
https://github.com/rust-lang/rust/pull/32428/files/9e8bcf4901c547a29ceed5e27606110ff26a9842#r57020384

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least in the HIR, it would be more practical to use a single generic type IMO.

let mut changed = true;
while changed {
pretty::dump_mir(tcx, "simplify_cfg", &counter, id, mir, None);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose that was for debugging purposes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nagisa

I suppose that was for debugging purposes?

I was debugging things, but I left it in on purpose. My intention is that eventually we will have the option to dump MIR at every potentially interesting point during execution.

@nikomatsakis
Copy link
Contributor Author

Haven't had time to review comments yet; I just added a kind of drive-by cleanup of scope dropping code to make it a bit less complex, and supply a better span to the resume terminator. Seems to work locally but I havent' done a -Z orbit build.

@nikomatsakis nikomatsakis force-pushed the scopes-in-mir branch 2 times, most recently from 443d02f to 07e4843 Compare March 23, 2016 09:59
resumeblk
} else {
// Diverging from any other scope chains up to the previous scope.
build_diverge_scope(tcx, cfg, unit_temp, earlier_scopes)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I originally avoided recursion, because the common case is to have many scopes each having few drops. This means that even for simple functions we would recurse a lot (on top of already big amount of recursion in other MIR build code), which potentially will not work out very well on small stack environments (i.e. musl).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will also needlessly recurse on empty scopes (which are quite common), which were previously ignored by the filter(|s| !s.drops.is_empty() || s.free.is_some()).

@nikomatsakis
Copy link
Contributor Author

@nagisa

I missed two of your comments, an you raise a good point! You wrote:

Is the arg_scope_id here the ID of the scope of whole function?

and

I suspect this might be not correct. AFAIR the loop above does not pop any scopes, therefore this will be the scope_id of the inner-most scope. I’m not sure if that’s the real intention.

The way I coded it up, every terminator etc must have a scope, but that leaves the question of what scope the return block itself is. Perhaps we want an outer-most scope that is never popped. This would I guess correspond to the ParameterScope extent. Return would target that extent (but not exit it).
Hmm, good catch, I think you are correct that the scope there is probably wrong. I wll ad

@nikomatsakis
Copy link
Contributor Author

er, not the ParameterScope, but the CallSiteScope

@bors
Copy link
Collaborator

bors commented Mar 23, 2016

☔ The latest upstream changes (presumably #32390) made this pull request unmergeable. Please resolve the merge conflicts.

We should probably settle on some conventions here. In MIR code, I have
generally been importing `*`, but perhaps borrowck does not want to do
that.
also, when exiting a scope, assign the final goto terminator with the
target scope's id
@nikomatsakis
Copy link
Contributor Author

Rebased, make check passes locally.

@nikomatsakis
Copy link
Contributor Author

@nagisa

Weren’t we supposed to execute drops first and then frees second (drops may depend on memory still being allocated, but frees do not otherwise care about things being (not-)dropped)?

I believe the code as written does execute drops first. Note that it builds up the drops in reverse.

@nikomatsakis
Copy link
Contributor Author

@nagisa btw, the review sort of "defaulted" to @Aatch, which is fine with me -- but I don't know how much time he has. You want to be reviewer? (seeing as you've already taken a look) Does anyone have objections to this basic approach?

One thing that occurred to me while discussing with @pnkfelix -- we should figure out how we plan to handle "lifetime start/lifetime end" intrinsics in MIR trans. If we want to emit them when we exit a scope, we have to keep the "scope-auxiliary" information up-to-date. This seems plausible, but it may be easier to do it by having a SCOPE_EXIT(x) statement (so it would be carried around automatically). If we added such an instruction, it would also be nice for dataflow, I think, since it would correspond to kills (e.g. for all loans tied to that scope). (In trans, it would correspond to emitting lifetime-end intrinsics).

An alternative would be added lifetime-end explicitly in MIR, but it seems like that will result in many more instructions -- you would need N*M "words" where N is number of variables and M is number of exits from a scope, vs N+M (one word per variable to tie it to a scope, one word per exit for the statement).

@nikomatsakis
Copy link
Contributor Author

(To be clear, if we were going to have SCOPE_EXIT statements, I'd remove the doms/postdoms fields from "scope auxiliary" -- and I guess just change it to ScopeExtents)

@nagisa
Copy link
Member

nagisa commented Mar 23, 2016

@nikomatsakis I don’t think I have much more outstanding comments other than my dislike about needlessly making exit_scope recursive again.

re lifetimes: it seems to me we already store the necessary information in scopes and could implicitly insert e.g. lifetime-end on scope exits. I’m not sure it is necessary to encode MIR variable lifetimes in MIR itself. At very least it would make writing MIR passes considerably more complex, because most of these do not need or want to care about variable lifetimes at all.

@nagisa
Copy link
Member

nagisa commented Mar 23, 2016

r? @nagisa

Sure.

@rust-highfive rust-highfive assigned nagisa and unassigned Aatch Mar 23, 2016
@nikomatsakis
Copy link
Contributor Author

re lifetimes: it seems to me we already store the necessary information in scopes and could implicitly insert e.g. lifetime-end on scope exits.

I don't quite follow what you mean. The current version of this branch stores the information in the doms/post-doms vectors, which are going to be a pain to maintain (much harder than pseudo-instructions, which virtually maintain themselves; even better if they take a vector of operands that link to the variables that are to be dropped, though that gets back to the N*M storage question). I was imagining that we would just throw them way once optimization begins, but of course we can't do that if we need them in mir-trans.

I was thinking that perhaps we could recompute the end-lifetime instructions, but that's not really true. The problem is borrows: we'd need (at least) the region information for each borrow, which I plan to erase. But even if we didn't erase it, those are just going to be regions of the CFG which must be maintained, which again complicates later optimizations and so forth.

Basically: the reason to contemplate adding statements is to ensure that other optimizations don't have to think about this. These statements essentially are the same as DROP, but they are not terminators (and hence are only needed for variables that are Copy).

@nikomatsakis
Copy link
Contributor Author

@nagisa

I don’t think I have much more outstanding comments other than my dislike about needlessly making exit_scope recursive again.

Heh. Recursion seemed pretty natural, but I guess I can rewrite it into a loop easily enough.

while I'm at it, remove the "extra caching" that I was doing for no good
reason except laziness. Basically before I was caching at each scope in
the chain, but there's not really a reason to do that, since the cached
entry point at level N is always equal to the last cached exit point
from level N-1.
@nikomatsakis
Copy link
Contributor Author

@nagisa made it iterative.

We now visit more things (e.g., types) and also follow a deliberate
style designed to reduce omissions.
return None;
}
assert!(!self.scopes.is_empty()); // or `all` above would be true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this assertion really necessary or, perhaps, it is misplaced and should go above the if?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nagisa it's more a comment than an assertion, I guess. Although I think that, in practice, the list of scopes cannot be empty here, so it could in principle go above the if

@nagisa
Copy link
Member

nagisa commented Mar 24, 2016

Commits up to 0769865 look okay to me. It seems you have more plans in mind for this PR, so just ping me once it is ready for final review.

@nikomatsakis
Copy link
Contributor Author

@nagisa meh, I think I may well just land what you reviewed, and we can iterate more in other PRs.

@nikomatsakis
Copy link
Contributor Author

(@pnkfelix reviewed the visitor commit)

@nikomatsakis
Copy link
Contributor Author

@bors r=nagisa

@bors
Copy link
Collaborator

bors commented Mar 24, 2016

📌 Commit 091a007 has been approved by nagisa

@bors
Copy link
Collaborator

bors commented Mar 25, 2016

⌛ Testing commit 091a007 with merge f1578d3...

bors added a commit that referenced this pull request Mar 25, 2016
Scopes in mir

This PR adds scopes to MIR. There is a tree of scopes (each represented by a `ScopeId`). Every statement, variable, and terminator now has an associated scope and span.  It also adds a `-Z dump-mir` switch one can use to conveniently examine the MIR as optimizations proceed.

The intention is two-fold. First, to support MIR debug-info. This PR does not attempt to modify trans to make use of the scope information, however.

Second, in a more temporary capacity, to support the goal of moving regionck and borowck into the MIR. To that end, the PR also constructs a "scope auxiliary" table storing the extent of each span (this is kept separate from the main MIR, since it contains node-ids) and the dom/post-dom of the region in the graph where the scope occurs. When we move to non-lexical lifetimes, I expect this auxiliary information to be discarded, but that is still some ways in the future (requires, at minimum, an RFC, and there are some thorny details to work out -- though I've got an in-progress draft).

Right now, I'm just dropping this auxiliary information after it is constructed. I was debating for some time whether to add some sort of sanity tests, but decided to just open this PR instead, because I couldn't figure out what such a test would look like (and we don't have independent tests for this today beyond the regionck and borrowck tests).

I'd prefer not to store the auxiliary data into any kind of "per-fn" map. Rather, I'd prefer that we do regionck/borrowck/whatever-else immediately after construction -- that is, we build the MIR for fn X and immediately thereafter do extended correctness checking on it. This will reduce peak memory usage and also ensure that the auxiliary data doesn't exist once optimizations begin. It also clarifies the transition point where static checks are complete and MIR can be more freely optimized.

cc @rust-lang/compiler @nagisa
@bors bors merged commit 091a007 into rust-lang:master Mar 25, 2016
@nikomatsakis nikomatsakis deleted the scopes-in-mir branch March 30, 2016 16:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants