Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

miri engine: basic support for pointer provenance tracking #54461

Merged
merged 12 commits into from
Oct 10, 2018

Conversation

RalfJung
Copy link
Member

@RalfJung RalfJung commented Sep 22, 2018

This enriches pointers with a new member, tag, that can be used to do provenance tracking. This is a new type parameter that propagates up through everything. It defaults to () (no tag), which is also the value used by CTFE -- but miri will use another type.

The only actually interesting piece here, I think, is what I had to do in the memory's get. The problem is that tcx (storing the allocations for statics) uses () for provenance information. But the machine might need another tag. The machine has a function to do the conversion, but if a conversion actually happened, we need to store the result of this somewhere -- we cannot return a pointer into tcx as we usually would.
So I introduced MonoHashMap which uses RefCell to be able to insert new entries even when we just have a shared ref. However, it is important that we can also return shared refs into the map without holding the RefCell opan. This is achieved by boxing the values stored in the map, so their addresses remain stable even when the map's table gets reallocated. This is all implemented in mono_hash_map.rs.

NOTE: This PR also contains the commits from #54380 (comment). Only the last two commits are new.

@rust-highfive

This comment has been minimized.

@Mark-Simulacrum
Copy link
Member

The mono hash map should almost certainly go in librustc_data_structures

@RalfJung
Copy link
Member Author

It is awfully specific in its API -- quirky and limited to what memory needs. Basically, it is in a separate module only because that reduces the amount of code that has to care about the invariant.

@eddyb
Copy link
Member

eddyb commented Sep 22, 2018

So I introduced MonoHashMap

As a general note, I'd like to not merge such changes until multiple people have looked at the problem, and decided that the new data structure is needed.

EDIT: I looked at it - we planned to do this ages ago, but we actually settled on using arenas instead - could you use a TypedArena here, and just have a HashMap with references from the arena as values?

@RalfJung
Copy link
Member Author

RalfJung commented Sep 22, 2018

I considered using an arena, but then when stack frames get popped or boxes dropped, their allocations would not get removed.

I could try recycling allocations from the arena, but at that point I'd also write a new data structure.

@eddyb
Copy link
Member

eddyb commented Sep 22, 2018

Okay, why not use Rc instead of Box then?

@RalfJung
Copy link
Member Author

You mean, we should use Rc<Allocation> instead of &Allocation as the return value of Memory::get? Not sure if that would overall be more or less efficient than what I got. I'd guess slightly less so because Rc has to check the refcount even when we have an &mut, which I currently avoid.

@eddyb
Copy link
Member

eddyb commented Sep 23, 2018

@RalfJung Ah, I was thinking in terms of tags there. We should probably discuss your plans there in more detail - what do you expect to use as the non-() type for Tag?
I would hope for something copyable, which would, I believe, remove the need for MonoHashMap?

Oops, I read your PR description again, so this is for allocating &Allocation.

Why not use a slightly different map than Relocation for tags, or just make them optional? Then, instead of (), you use ! (and I guess Pointer<!> would just contain Option<!>).

@RalfJung
Copy link
Member Author

RalfJung commented Sep 23, 2018

Why not use a slightly different map than Relocation for tags, or just make them optional? Then, instead of (), you use ! (and I guess Pointer<!> would just contain Option<!>).

I don't understand how that would help at all? It'd still be Option<!> in tcx and Option<Borrow> in the full miri machine.

I know you stroked that part of your comment, but my plans are described at https://www.ralfj.de/blog/2018/08/07/stacked-borrows.html :D

@TimNN TimNN added the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Sep 25, 2018
@eddyb
Copy link
Member

eddyb commented Sep 26, 2018

@RalfJung Sorry, I keep getting confused. Something workable could be keeping a map from AllocId to these tags, outside of the Allocation. That way, you don't have to do anything special when "importing" tcx allocations - they're just guaranteed to have trivial tags.

@RalfJung
Copy link
Member Author

RalfJung commented Sep 26, 2018 via email

@eddyb
Copy link
Member

eddyb commented Sep 28, 2018

Adding new unsafe code is not nice, IMO. However, if you want to use unsafe code...
For now, if you have a &HashMap<Foo, Box<!>>, you should be able to refer to it as &HashMap<Foo, Box<T>> for any T, as it's guaranteed to be empty anyway.

@RalfJung
Copy link
Member Author

Not sure where you think a HashMap<_, Box<!>> would come up here?

We could have a Relocations<Tag> in the allocation, so having two relocations instead of one with pairs, and then hope that the path with () gets optimized away. Not sure how that would help, though.

@RalfJung RalfJung mentioned this pull request Sep 29, 2018
@nikomatsakis
Copy link
Contributor

Having talked out the various options with @RalfJung, I have to say that MonoHashMap seems like a pretty reasonable solution to the existing constraints. Certainly the interface is pretty small and self-contained, and it permits retaining the convenient &-based API.

@bors

This comment has been minimized.

@RalfJung
Copy link
Member Author

RalfJung commented Oct 4, 2018

@eddyb @oli-obk @nikomatsakis This is currently blocking progress on my main internship project. How do we proceed?

Copy link
Member

@bjorn3 bjorn3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noticed two typos

fn static_with_default_tag(
alloc: &'_ Allocation
) -> Cow<'_, Allocation<Self::PointerTag>> {
// We do not use a tag so we can just cheapyl forward the reference
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*cheaply

/// the appropriate tags on each pointer.
///
/// This should avoid copying if no work has to be done! If this returns an owned
/// allocation (because a copy had to be done to add the tags), machibe memory will
Copy link
Member

@bjorn3 bjorn3 Oct 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*machine

@nikomatsakis
Copy link
Contributor

@RalfJung my vote is to land as is; we can always revisit it

@nikomatsakis
Copy link
Contributor

I do agree with @eddyb that it's a good idea to be very cautious with introducing new unsafe abstractions. However, this one feels quite self-contained, and I don't see any other solution to the problem that doesn't involve large scale refactorings (I think there are various approaches one could take if we did want to do large-scale refactoring, but we could also do that as a follow-up, and it's not clear that such a thing would be good).

Alternative solutions I can see:

  • Use a long-lived arena (e.g., with a lifetime like 'tcx). I believe @RalfJung was concerned about memory for these tags not being freed "promptly", so a long-lived arena may not be a very good idea.
  • Use indices: big change from using &Allocation.
  • Use a ref-counted value and return a Cow or some such thing that means the allocation may either be ref-counted or borrowed. Again, big refactoring, probably annoying and messy.
  • Use a ref-counted value and a shorter arena (e.g., scoped to a particular set of memory operations). This shorter arena can be used just to hold onto a "second ref" to the Rc<Allocation> such that we can always return a &Allocation (sort of like a Cocoa auto-release pool). Again, big refactoring. This feels potentially clean to me but (a) probably nobody but me understands what I mean and (b) it'd require invasive edits and is probably best done as a follow-up anywya.

Other than that, I don't think there are other options.

@RalfJung
Copy link
Member Author

RalfJung commented Oct 4, 2018

Rebased and fixed typos.

src/librustc/mir/interpret/value.rs Outdated Show resolved Hide resolved
fn static_with_default_tag(
alloc: &'_ Allocation
) -> Cow<'_, Allocation<Self::PointerTag>> {
// We do not use a tag so we can just cheapyl forward the reference
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cheapyl

src/librustc_mir/interpret/memory.rs Outdated Show resolved Hide resolved
src/librustc_mir/interpret/snapshot.rs Outdated Show resolved Hide resolved
src/librustc_mir/interpret/memory.rs Outdated Show resolved Hide resolved
@RalfJung
Copy link
Member Author

RalfJung commented Oct 5, 2018

Thanks to @oli-obk I was able to get rid of the unsafe code here. :) The actual map used by memory to manage its allocations is now abstracted away through the Machine trait. The miri tool will still need unsafe, but rustc itself and the CTFE engine do not.

@rust-highfive

This comment has been minimized.

@RalfJung
Copy link
Member Author

Rebased, and I hope tidy is also happy now.

write!(msg, "└{0:─^1$}┘ ", target, relocation_width as usize).unwrap();
pos = i + self.pointer_size();
}
trace!("{}", msg);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you'll probably want to do tag-printing here at some point, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think so, but I am not sure what the best way is to do that (and to avoid printing useless ()).

// Downcasts only change the layout
assert_eq!(base.extra, None);
assert!(base.meta.is_none());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the assert_eq was on purpose, to make sure that base.extra was debug printed if the assertion failed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to avoid an Eq bound on tags. But I now added that to be able to put them into HashMap so I guess I could change these all back.

@oli-obk
Copy link
Contributor

oli-obk commented Oct 10, 2018

@bors r+

@bors
Copy link
Contributor

bors commented Oct 10, 2018

📌 Commit bc9435d has been approved by oli-obk

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Oct 10, 2018
@bors
Copy link
Contributor

bors commented Oct 10, 2018

⌛ Testing commit bc9435d with merge 2243fab...

bors added a commit that referenced this pull request Oct 10, 2018
miri engine: basic support for pointer provenance tracking

This enriches pointers with a new member, `tag`, that can be used to do provenance tracking. This is a new type parameter that propagates up through everything. It defaults to `()` (no tag), which is also the value used by CTFE -- but miri will use another type.

The only actually interesting piece here, I think, is what I had to do in the memory's `get`. The problem is that `tcx` (storing the allocations for statics) uses `()` for provenance information. But the machine might need another tag. The machine has a function to do the conversion, but if a conversion actually happened, we need to store the result of this *somewhere* -- we cannot return a pointer into `tcx` as we usually would.
So I introduced `MonoHashMap` which uses `RefCell` to be able to insert new entries even when we just have a shared ref. However, it is important that we can also return shared refs into the map without holding the `RefCell` opan. This is achieved by boxing the values stored in the map, so their addresses remain stable even when the map's table gets reallocated. This is all implemented in `mono_hash_map.rs`.

NOTE: This PR also contains the commits from #54380 (comment). Only the [last two commits](https://github.com/rust-lang/rust/pull/54461/files/8e74ee0998a5b11f28d61600dbb881c7168a4a40..HEAD) are new.
@bors
Copy link
Contributor

bors commented Oct 10, 2018

☀️ Test successful - status-appveyor, status-travis
Approved by: oli-obk
Pushing 2243fab to master...

@bors bors merged commit bc9435d into rust-lang:master Oct 10, 2018
@rust-highfive
Copy link
Collaborator

📣 Toolstate changed by #54461!

Tested on commit 2243fab.
Direct link to PR: #54461

💔 miri on windows: test-pass → build-fail (cc @oli-obk @RalfJung @eddyb, @rust-lang/infra).
💔 miri on linux: test-pass → build-fail (cc @oli-obk @RalfJung @eddyb, @rust-lang/infra).

rust-highfive added a commit to rust-lang-nursery/rust-toolstate that referenced this pull request Oct 10, 2018
Tested on commit rust-lang/rust@2243fab.
Direct link to PR: <rust-lang/rust#54461>

💔 miri on windows: test-pass → build-fail (cc @oli-obk @RalfJung @eddyb, @rust-lang/infra).
💔 miri on linux: test-pass → build-fail (cc @oli-obk @RalfJung @eddyb, @rust-lang/infra).
@ljedrz
Copy link
Contributor

ljedrz commented Oct 12, 2018

@RalfJung FYI, this change has worsened the results of max-rss ctfe-family benchmarks by up to 40% (perf link). Is this expected?

@RalfJung
Copy link
Member Author

Uh, no, that's odd. It should just have added a bunch of ZST in a couple places.

How does one go about debugging such things?

@RalfJung
Copy link
Member Author

Oh wait. I got rid of interning vtables... that could be it. I left a FIXME to add a cache. I guess I should put implementing that cache further up on my list.^^

I am not sure if vtables really should be put into tcx -- my gut feeling is that anything that might happen during an execution of the miri tool, shouldn't affect tcx. The fact that allocations in tcx now have a different type (no provenance) than the ones in miri makes sure we don't accidentally do that. But caching vtables within a miri execution makes a lot of sense, I think.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants