Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove CTFE loop detector #54384

Closed
RalfJung opened this issue Sep 20, 2018 · 10 comments · Fixed by #70087
Closed

remove CTFE loop detector #54384

RalfJung opened this issue Sep 20, 2018 · 10 comments · Fixed by #70087
Labels
A-const-eval Area: Constant evaluation, covers all const contexts (static, const fn, ...) C-cleanup Category: PRs that clean code up or issues documenting cleanup. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@RalfJung
Copy link
Member

RalfJung commented Sep 20, 2018

The plan changed to removing the loop detector.

Original issue

The code at https://github.com/rust-lang/rust/blob/master/src/librustc_mir/interpret/snapshot.rs leaves me puzzled and confused after wasting about two hours on it. Open questions are:

  • Why is AllocIdSnapshot an Option? What does it mean when there is a None there and when does that happen?
  • Why does EvalSnapshot::new make a deep copy, and then it has a method EvalSnapshot::snapshot that does a snapshot (of the EvalSnapshot, confusing naming)? That method is called in PartialEq::eq, doesn't that mean we do tons of allocations and copies every time we compare?
  • Where and how does it avoid snapshotting the same allocation twice in one snapshot?

I tried fixing the second point (making a snapshot when EvalSnapshot is created, not when it is compared), but failed to do so as the thing produced by the Snapshot trait isn't actually a snapshot (which I would expect to be a deep, independent copy), but something full of references to somewhere (couldn't figure out where though, just saw the lifetime).

Documentation (inline in the code) should be improved to answer these questions, and if the part about allocating during comparison is correct, that should be fixed.

Cc @oli-obk @brunocodutra

@RalfJung
Copy link
Member Author

I am actually now also confused about why we are using "stable hash". According to what I learned in rust-lang/rustc-dev-guide#203, the purpose of "stable hash" is to not change between rustc invocations, which is important for incremental compilation where the hashes are stored to disk and loaded again by the next compiler invocation.

Why does this help / why do we want this with the loop detector?

@RalfJung
Copy link
Member Author

@brunocodutra at #52626 (comment)

At a high level, what it does is to crawl up the allocation stack and replace every weak reference (AllocId) by a strong shared reference, so that a trivial equality comparison does the right thing, comparing the actual allocations rather than their IDs.

I see now why we clone everything and use a reference at the time we compare for equality: it's to avoid duplicating clones. Imagine the same allocation is referred through its AllocId in several places. If we crawl up the allocation structure and replace every AllocId by a clone of the allocation, we will clone the same thing over and over again. Instead we clone all allocations only once and then just map AllocIds to references by the time we need to compare the state for equality. Does that make sense? This is actually not super expensive, the equality comparison needs to crawl up the structure either way, the only extra cost is querying Memory, which is a trade off to avoid duplicating clones.

Ah I think I get it! The term "snapshot" threw me off, I was sure that would mean "do a copy". It does not.

So "snapshot" is really, as you say, a "replace AllocId indirection by reference" pass. Actually its an Option<&Alloc> because an allocation could be missing. Now it makes sense.

However, I am not sure if that's what is actually happening. Imagine for a moment we have an allocation with a pointer to itself, i.e., the relocations contain the AllocId of the allocation itself. I claim your code will loop. Namely, it will reach Allocation::snapshot, which will call Relocations::snapshot, which will again call AllocId::snapshot, which will call Allocation::snapshot ad infinitum.

In case of such a cycle, you want to end up with a cyclic structure of shared references: The allocation reference in the relocations should point back to the allocation itself. Creating such a structure is not even possible without interior mutability, so I am pretty sure this is not what currently happens. In AllocId::snapshot, you will have to make a case distinction to figure out if this AllocId has already been snapshotted or not, and reuse the old snapshot if one exists already. Currently, you are creating a new snapshot of the same allocation each time you see it. The problem you were mentioning where you clone the same thing over and over? I think you have that same problem, just with snapshot.


However, I have some deeper architectural suggestions. Your snapshot is far from cheap! Notably, Relocations::snapshot creates a new allocation to hold the snapshotted allocations (that's the collect). Comparison better shouldn't need an allocation.

Moreover, if the program accumulates dead locations in memory, while your comparison step is blissfully ignorant of those, the actual copy done to create EvalSnapshot still becomes more and more expensive. Ideally the entire snapshotting -- both the copy and the comparison -- would only have complexity linear in the size of the reachable memory.

I see two ways to solve the first problem, and one of them also solves the second:

  • Currently, you are building up that entire "snapshot" only to compare it and then throw it away. What a waste! It would make much more sense to do the comparison "on the fly", i.e. to walk the entire thing and compare without ever constructing a full copy. So instead of allocating a new Vec in Relocations::snapshot, you would just iterate the relocations of the two allctions you are comparing, and compare them in pairs. You will still need to figure out how to compare two AllocId (it's not equality, they could have the same content without being the same ID!) without going in cycles, but that problem just doesn't go away no matter how you do it.
  • However, even better I think would be to not do a clone, but instead do a proper "canonicalizing copy" -- do a copy of live memory that only copies reachable allocations, and that replaces AllocId by a reference. As you remarked, this has to avoid going in cycles, but you'll have to solve that problem anyway.

The second solution has two advantages as I see it: First it actually achieves the goal of complexity linear in the size of reachable memory (the first solution still copies all of memory, even the unreachable part); second it is likely easier to implement since you only have to consider one machine at a time and how to make a copy of it; instead of having to consider two machines and how to compare their states.

@RalfJung
Copy link
Member Author

Uh okay my second solution is not quite as simple -- it still also needs care during the actual comparison. If you have two memory which both contain this cyclic allocation, if you are not careful, comparing them will loop endlessly. This will need some state to record pairs of allocations we have already reached and compared.

At that point, actually, I do not see the value of replacing AllocId by a reference. It makes everything more complicated, and I don't think you are gaining anything.

@brunocodutra
Copy link
Contributor

brunocodutra commented Sep 20, 2018

In case of such a cycle, you want to end up with a cyclic structure of shared references: The allocation reference in the relocations should point back to the allocation itself. Creating such a structure is not even possible without interior mutability, so I am pretty sure this is not what currently happens. In AllocId::snapshot, you will have to make a case distinction to figure out if this AllocId has already been snapshotted or not, and reuse the old snapshot if one exists already. Currently, you are creating a new snapshot of the same allocation each time you see it. The problem you were mentioning where you clone the same thing over and over? I think you have that same problem, just with snapshot.

That's absolutely true, I did not consider loops in the current implementation, the only thing references solve is the problem of cloning the same Allocation multiple times.

Currently, you are building up that entire "snapshot" only to compare it and then throw it away. What a waste! It would make much more sense to do the comparison "on the fly", i.e. to walk the entire thing and compare without ever constructing a full copy. So instead of allocating a new Vec in Relocations::snapshot, you would just iterate the relocations of the two allctions you are comparing, and compare them in pairs. You will still need to figure out how to compare two AllocId (it's not equality, they could have the same content without being the same ID!) without going in cycles, but that problem just doesn't go away no matter how you do it.

That was my first iteration at the solution, but it produced such scary handwritten equality code that @oli-obk suggested we attempted a different approach and that eventually led to the current design.
Mostly what makes it super messy is the fact that you have to crawl up the allocations in parallel for the two evaluation contexts you are comparing and that gets nasty because you have to match a few too many enums on the way. The idea is to keep all pairs of AllocIds visited in a hash set, so that cycles can be avoided. Also, for each such pair, the corresponding Allocations have to be resolved from the respective Memory and the recursion continues until there is no pair of AllocIds that haven't been visited.

I do want to point something out though: this whole cloning + equality comparison is expensive, however getting to actually executing it should be rare, because first of all, the loop detector only samples the evaluation context once every 256 steps and the whole thing is avoided if their hashes don't match.

And that leads to me answering the remaining open question:

I am actually now also confused about why we are using "stable hash". According to what I learned in rust-lang/rustc-dev-guide#203, the purpose of "stable hash" is to not change between rustc invocations, which is important for incremental compilation where the hashes are stored to disk and loaded again by the next compiler invocation.

Why does this help / why do we want this with the loop detector?

Stable hash is used because it pretty much does exactly what we need as far as hashing goes - it resolves AllocIds to their Allocations rather than hashing their actual value.

@RalfJung
Copy link
Member Author

Stable hash is used because it pretty much does exactly what we need as far as hashing goes - it resolves AllocIds to their Allocations rather than hashing their actual value.

Oh! That would explain it. Would be worth a comment. :) Though, couldn't the same be achieved by just manually implementing Hash for AllocationSnapshot? Then we could remove all those StableHash impls.

That was my first iteration at the solution, but it produced such scary handwritten equality code that @oli-obk suggested we attempted a different approach and that eventually led to the current design.

Yeah it's not trivial. But so far, I have not seen a design that works with loops in memory and entirely avoids that complication.

@brunocodutra
Copy link
Contributor

Couldn't the same be achieved by just manually implementing Hash for AllocationSnapshot?

We actually hash the stack, that is Vec<Frame>, which also means

  1. We would need a way resolve AllocId to the corresponding Allocation, which I'm not sure is possible without an equivalent to StableHashingContext?
  2. Dealing with Place, MemPlace, Scalar, ScalarMaybeUndef, Pointer, Allocation, etc, inline in order to find the AllocIds.

Conversely, reusing stable hash means the implementation for all of these types is already available, which means impl HashStable for Frame is trivial.

I have not seen a design that works with loops in memory and entirely avoids that complication.

Should I implement the algorithm I outlined so we can actually see how it looks?

@RalfJung
Copy link
Member Author

Ah dang I forgot again that we are hashing the original structure but comparing the snapshot.

But then... the StableHash for AllocId looks up allocations in the tcx, does it not? What if there is a pointer into a machine-specific allocation that has not been interned yet?

@brunocodutra
Copy link
Contributor

brunocodutra commented Sep 22, 2018

But then... the StableHash for AllocId looks up allocations in the tcx, does it not? What if there is a pointer into a machine-specific allocation that has not been interned yet?

It will increase the probability of a hash collision.

@jonas-schievink jonas-schievink added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. A-const-eval Area: Constant evaluation, covers all const contexts (static, const fn, ...) labels Jan 27, 2019
@RalfJung
Copy link
Member Author

So what is the plan here? We now have a more traditional step limit for CTFE. @ecstatic-morse said they'd submit a PR to remove the loop detector once that lands, so I guess doing so is all that remains to be tracked.

@RalfJung RalfJung changed the title Document and maybe fix CTFE loop detector remove CTFE loop detector Mar 12, 2020
@ecstatic-morse
Copy link
Contributor

Yep. It'll be a few days, however.

@JohnTitor JohnTitor added the C-cleanup Category: PRs that clean code up or issues documenting cleanup. label Mar 15, 2020
Centril added a commit to Centril/rust that referenced this issue Mar 23, 2020
…op-detector, r=RalfJung

Remove const eval loop detector

Now that there is a configurable instruction limit for CTFE (see rust-lang#67260), we can replace the loop detector with something much simpler. See rust-lang#66946 for more discussion about this. Although the instruction limit is nightly-only, the only practical way to reach the default limit uses nightly-only features as well (although CTFE will still execute code using such features inside an array initializer on stable).

This will at the very least require a crater run, since it will result in an error wherever the "long running const eval" warning appeared before. We may need to increase the default for `const_eval_limit` to work around this.

Resolves rust-lang#54384 cc rust-lang#49980
r? @oli-obk cc @RalfJung
@bors bors closed this as completed in 72c99f2 Mar 24, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-const-eval Area: Constant evaluation, covers all const contexts (static, const fn, ...) C-cleanup Category: PRs that clean code up or issues documenting cleanup. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants