Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prove the Rust type system sound #9883

Closed
brson opened this issue Oct 16, 2013 · 40 comments
Closed

Prove the Rust type system sound #9883

brson opened this issue Oct 16, 2013 · 40 comments
Labels
A-type-system Area: Type system E-hard Call for participation: Hard difficulty. Experience needed to fix: A lot. P-medium Medium priority

Comments

@brson
Copy link
Contributor

brson commented Oct 16, 2013

We want a reasonable assurance that Rust does what it says.

@brson
Copy link
Contributor Author

brson commented Oct 16, 2013

Nominating.

@nikomatsakis
Copy link
Contributor

To clarify: a proof of some subset of rust

@catamorphism
Copy link
Contributor

1.0, high

@brson
Copy link
Contributor Author

brson commented Jun 27, 2014

Nominating for removal. I expect that we won't be willing to actually block release on this.

@pnkfelix
Copy link
Member

pnkfelix commented Jul 3, 2014

P-high, taking off 1.0 milestone.

@pnkfelix pnkfelix removed this from the 1.0 milestone Jul 3, 2014
@dckc
Copy link
Contributor

dckc commented Nov 17, 2014

I'm interested to look at any recent work on this issue.

The #rust IRC channel topic refers to a Rust 1.0 resources spreadsheet that includes this issue. Is that spreadsheet no longer a good reference for 1.0 status?

Meanwhile, the spreadsheet has Eric Reed assigned. Eric, a pointer to your work would be appreciated.

@steveklabnik
Copy link
Member

Triage: @aturon is working with some external academics on some amount of formalization.

@aturon
Copy link
Member

aturon commented Sep 3, 2015

In particular, we're working with Derek Dreyer's group, which includes @RalfJung. They've made an incredible amount of progress already!

@aturon
Copy link
Member

aturon commented Sep 25, 2015

@RalfJung I wonder if you can put some effort into publicly tracking your progress? Either on this issue or a new one?

@RalfJung
Copy link
Member

I suppose I need something more like a blog for this ;-)

The trouble is, as I already mentioned last time you brought this up - I don't think I have the time to write a full tutorial on all the formal methods I am using. Which means that either I can't say all that much, or what I say will be pretty much incomprehensible.

@dckc
Copy link
Contributor

dckc commented Sep 29, 2015

"pretty much incomprehensible" is fine. I'd be very happy if you'd incrementally commit your work to someplace world-readable, even if you made zero effort to explain it to a wider audience.

@aturon
Copy link
Member

aturon commented Sep 30, 2015

@RalfJung A blog would of course be great, but a reasonable starting point would just be a page -- or even a tracking issue -- that spells out at a very high level what you're trying to accomplish, mentions the aspects of Rust that you're modeling, and tracks how far you've gotten in the various steps.

@RalfJung
Copy link
Member

So, I wrote a blog post on my Rust project right here: https://www.ralfj.de/blog/2015/10/12/formalizing-rust.html

I will probably not have the time to do real-time progress reports in such a detailed fashion. There might be more blog posts on some parts of the formalization, or maybe not - we will see.

Current status

The current status is that I have a formal definition of a language and type-system that have a "rust-y" flavor: There's borrowing and lifetimes, and everything allowed in safe Rust should be also allowed in that type system. The language "lambda-Rust" is a lambda calculus equipped with a heap, but no stack - still, MIR programs can be translated to lambda-Rust fairly directly (if you put the right glasses on, and like doing a CPS transform ;-).

Right now, I am fixing some places of that type system, to allow a few more programs to be type-checked.

I also started working on the semantic model. At this point, I have a first version of formal properties expressing what I want the borrowing system to be able to do. I know there are some issues with these properties which I'll try to fix later this week. I have some ideas for how to actually define the system such that it has all these desired properties, but I am still looking for some more common patterns here that I can abstract away. This will end up being a whole lot of bookkeeping, and I'd like to somehow avoid getting drowned in all the bureaucracy...

@michaelsproul
Copy link
Contributor

@RalfJung: Are you using a proof assistant to manage your model? I'm also interested in verifying Rust's type system.

@RalfJung
Copy link
Member

I plan to formalize the model, and the soundness of the syntactic type system, in Coq. These proofs often become so complicated that I don't really trust my own, hand-written proofs...

@michaelsproul
Copy link
Contributor

@RalfJung: Great! A Coq model of Rust's semantics is like a dream come true. I'm using Coq at the moment to verify language semantics too, and should probably devote all my time to that, but I'll be interested to watch your progress and help if I can :)

@emberian
Copy link
Member

I've been working on bringing up an Isabelle/HOL model of Rust as well, and recently discovered Beluga which may or may not be easier to use.

@emberian
Copy link
Member

(Though I'm not trying to capture as many things as @RalfJung is!)

@RalfJung
Copy link
Member

@cmr So you are talking about a formal model of the language (or some subset of it) and the type system, but without the semantic part of what I described above?

As I mentioned there, currently I am not trying to model exactly what the Rust compiler does. But at some point, I would like to connect this to the MIR and what rustc actually does. Something like: We can translate every MIR program to lambda-Rust, such that well-typed MIR programs become well-typed lambda-Rust programs. Connecting this with the theorems I hope to show about lambda-Rust would end up proving memory safety not just for a "rust-y" language, and for the concepts of ownership and borrowing, but for rustc itself! (Well, assuming the operational semantics of LLVM match with the one of lambda-Rust, which they most certainly don't. But whatever, it'd be a step.)

To do this, I will need a formal version of MIR and borrowck (and hopefully dropck ;-) ).

@emberian
Copy link
Member

@RalfJung What I planned on doing is defining a formal semantics for MIR (or a lower-level IR) to do analysis and verification of programs. I planned on assuming some soundness properties of the type system (so that, eg, I don't have to worry about data races). Sounds like we'll have some interesting things to talk about Wednesday :)

@Ericson2314
Copy link
Contributor

Excellent! Roughly, what does your model look like? I have a dream of getting a CompCert for Rust out of this someday :).

@RalfJung
Copy link
Member

RalfJung commented Nov 6, 2015

As I tried to make clear when I slightly edited my blog post a few hours after its publication, what I am looking for here is, roughly, the essence of borrowing. This means that many of the details that are crucial to an actual compiler of an actual language, are only of superficial interest to me. Of course, as part of my formal model, I need to give some kind of formal operational semantics to some kind of language - but I am fairly free here to make concessions.

So, the language I have could only be a rough starting spot for certifying the existing Rust compiler. That doesn't mean I am not interested in closing this gap, I very much am. Some of it is probably "just" a lot of work, but some of it will need some clever ideas. In particular, we'd have to figure out what the LLVM noalias annotation means, in a strict, formal sense. Nobody has any clue right now. (As far as I know.)

So, coming back to @Ericson2314's question - my model of the memory is inspired by CompCert, but much simpler. I assume all primitive types to have the same size, the size of a single memory location - this way, I don't have to get into the mess of alignment and overlapping reads/writes. My model of the language itself is a lambda-calculus with primitive operations to work on the heap. There is no stack.


Status update

I promised to give short status updates here, and then promptly forgot about this oops. So, here comes the first one.

I did quite some cleanup of the stuff I wrote up previously, trying to get rid of all redundancy and generally make things nicer and less ambiguous. I still use excessive amounts of syntactic sugar on all levels, but I think at least now I clearly define what I do ;-) . Speaking of excess, I also make excessive use of both the Latin and the Greek alphabet as source for variable names, and I'm running out of letters. Seems like I will have to figure out how to put Russian characters into LaTeX, or use multi-letter variable names 😱

I made the following type-check in my type system:

struct Two<'a> {
    f: &'a i32,
    g: &'a i32,
}

fn lazy_lft() {
    let (mut t, f, g) : (Two, i32, i32);
    f = 42;
    t = Two { f: &f, g: &f };
    *t.f; // The lifetime definitely is already active here
    g = 23; // And g is definitely not yet borrowed.
    t.g = &g; // But now I can borrow g at the *old* lifetime
}

The comments explain what is special about this example: When we borrow g in the last line, we borrow it at the lifetime that's used in the type of t - and that lifetime is already "active" long before g is borrowed! My type system can now do this, and I know I'm going to have lots of fun with this feature when I come to prove stuff correct...

For now, I still make one restriction here: When the lifetime starts (in the case of this example, that's where the variables are declared), every variable that will eventually be borrowed at that lifetime has to already exist, and be named. That is, if g would only be introduced in the second-to-last line (making it let g), then I still cannot type-check this code. Lucky enough, Rust agrees with me here. And I sure hope this will never be lifted, because doing so would get me in very serious technical trouble in the proof of correctness...

What else? I made sure with my advisor that it is okay for me to take the "easy" rather than the "general" solution, if that gets the job done :) . And just today, I extended my language to be able to prove data-race freedom for non-atomic locations, in a way that can deal with locations that are subject to both atomic and non-atomic accesses. This is a long and complicated story, but the short version is that I think we came up with a new and equivalent way to define data-race-freedom, such that this result should just fall out of the proofs I'd have done anyway. Yay :D

(Seeing how long this status-update got, I'm thinking maybe I should put them into my blog as well. I don't know what makes more sense, really.)

@mkpankov
Copy link
Contributor

mkpankov commented Nov 6, 2015

(I'm sorry to chime in with this slightly off-topic comment)

@RalfJung as I happened to write my master's in Russian 😄 I have something that might be useful for you:

https://github.com/mkpankov/adaptor/blob/master/doc/latex/DiplomaProject.tex

AFAIR, the important part there is

\usepackage[T2A]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage[english,russian]{babel}

...
% This insanity declares macros that emulate Russian letters, like \cyra

\DeclareSymbolFont{cyrillic}{T2A}{cmr}{m}{n}
\def\makecyrsymbol#1#2{%
    \begingroup\edef\temp{\endgroup
        \noexpand\DeclareMathSymbol{\noexpand#1}
        {\noexpand\mathalpha}{cyrillic}%
        {\expandafter\expandafter\expandafter
            \calccyr\expandafter\meaning\csname T2A\string#2\endcsname\end}}%
    \temp}
\expandafter\def\expandafter\calccyr\string\char#1\end{#1}

\makecyrsymbol\ca\cyra
\makecyrsymbol\be\cyrb
\makecyrsymbol\ve\cyrv
...

\makecyrsymbol\CA\CYRA
\makecyrsymbol\BE\CYRB
\makecyrsymbol\VE\CYRV
\makecyrsymbol\GE\CYRG
\makecyrsymbol\DE\CYRD
...

Of course you'll still have to fight font encoding problems but I guess it's a start. Examples of usage of this can be found in other files in the same directory.

@RalfJung
Copy link
Member

RalfJung commented Nov 6, 2015

😆 I wasn't entirely serious, but should I become really desperate, I may get back to this. Thanks :D

@RalfJung
Copy link
Member

RalfJung commented Jan 5, 2016

Long time no status update... I'm not good at doing this frequently, sorry ;-)

I finally started doing serious work on the semantic side of types - that is, figuring out which Iris assertions Rust types correspond to. It turned out that accounting for borrowed pointers ("borrows", as I call them) is way harder than I expected. The operations Rust lets you perform on borrows may seem harmless, but they are actually pretty much ruling out any simplification I could come up with. The worst part is stuff like

fn deref_borrowed_box<T>(b: &mut Box<T>) -> &mut T { *b }
fn deref_borrowed_borrow<T, 'a, 'b>(b: &'a mut &'b mut T) -> &'a mut T { *b }

So, after a few weeks of my idea of types becoming more and more complicated, I had to step back and figure out what the heck was going on. Turns out advisors are advisors for good reasons ;-) . Instead of directly trying to come up with formal definitions of owned and borrowed pointers (we're only talking about mutable borrows here, for now), I tried to somehow formalize my intuition for why all of this seems to be so obviously sound. In the end, I now have a set of proof rules that formalize the idea of "borrowing some permission for some (life)time", completely unrelated to pointers. I proved (yay, first results :D ) that from these proof rules, I can derive all that is going on with owned and mutable pointers. Since most of the complication is in the basic idea of borrowing, that was not extremely hard, but it gives me some confidence that my proof rules actually are useful.

Next step: Prove correctness of these proof rules! That will be hard, really hard - probably the most complicated protocol I have ever seen, and higher-order in too many crazy ways. This is stressing corners of our logic that I didn't even know existed, which means I have to be extra careful not to accidentally skip a proof step. I definitely will want a machine to check this proof for me, eventually. That's what I get for doing this generally :-/ . ("Protocol" is what we call a formal description of the interaction of everybody involved in borrowing permissions for some lifetime, using permissions that have been borrowed for an active lifetime, and getting back permissions out of a dead lifetime.)

I also realized that shared borrows are really subtle. Even with that lifetime logic done, I'll have some "fun" defining what a shared borrow is, in general - and then making instances of that for owned and mutable borrows. This is hard because we can do stuff like this:

fn deref_borrowed_box<T>(b: &Box<T>) -> &T { *b }
fn deref_borrowed_borrow<T, 'a, 'b>(b: &'a &'b mut T) -> &'a T { *b }

I will also have to make sure that my general definition of what constitutes a shared borrow, is wide enough to cover cases like Cell (should not be too hard, but I do want to prove that Cell could be Copy even though right now, it is not - and since this relies on Cell not being Sync, this opens an interesting can of worms) and RefCell (omg this will be crazy... the definition of RefCell, Ref and RefMut all interact in subtle ways, and they have to work for any other type T, e.g. giving you a shared borrow from a Ref - where a "shared borrow" can be lots of crazy things, exactly because types like RefCell are admissible, and can be nested without any restriction). So, yeah, more fun ahead ;-)

I'm also currently involved in two discussions that relate to my formal work: #30424 and https://internals.rust-lang.org/t/comparing-dangling-pointers/3019.

And finally, I found a way to encode in my definitions that data races yield undefined behavior. This means that whatever I do will prove not only absence of memory errors, but also absence of data races :) . The proofs won't get any more complicated, since data race freedom is a fairly natural consequence of the way Rust is set up - but the trick is to set up everything such that my theorems actually show that this is the case. This involves (a) detecting that a data race is happning, so that we can assign undefined behavior, and then (b) proving that there is no UB. Here, (a) was the interesting bit. (The first of the two discussions I mentioned in the previous paragraph is also about detecting UB, namely the UB arising from mutating through a shared borrow, or having aliasing on mutable borrows. This is related to the noalias trouble I already mentioned in earlier posts. My first model is not going to cover this, I already have enough to do with what I do plan to cover :D .)

@dckc
Copy link
Contributor

dckc commented Jan 5, 2016

I'd very much like to look at the coq sources of your work as it progresses, @RalfJung . Is it on the web? If not, would you stick it on the web somewhere, please?

If that's not your style... oh well. I've got plenty of other shiny toys to play with :)

@RalfJung
Copy link
Member

RalfJung commented Jan 5, 2016

There's no Coq formalization of my Rust work yet, it's all on paper only so far. We have to get our Iris formalization straight before I can build something large like the Rust work on top of it. That's a separate project, which I'm actually concurrently working on. That work is public, at https://gitlab.mpi-sws.org/FP/iris-coq/. (The "V2.0" branch is the one which is supposed to be scalable enough to eventually be the foundation for larger projects, like the Rust formalization.)

@dckc
Copy link
Contributor

dckc commented Jan 5, 2016

Ok. Thanks for clearing that up for me.

@RalfJung
Copy link
Member

RalfJung commented Jan 9, 2016

I wrote another blog post about unsafe Rust: https://www.ralfj.de/blog/2016/01/09/the-scope-of-unsafe.html.

@btrask
Copy link

btrask commented Jan 11, 2016

I believe the problem @RalfJung reported could be solved by tagging variables themselves as unsafe. Only unsafe {} code would be able to change unsafe variables, thus preserving "semantic type" rules within safe code.

I don't know enough about Rust to say how unsafe variable tagging would be implemented (traits?), but it seems like it would be backwards compatible. Any existing code would still compile and run, but it might have invariants that are invalidated by safe code (as is currently the case).

@emberian
Copy link
Member

There's an old idea of unsafe fields that would come up for datatypes. It can be emulated with a Unsafe wrapper that has unsafe methods to get references to its interior, or to read/write. It's not widely used, or used at all afaik, since the gains aren't that great. People writing unsafe code generally know the issues, and that privacy is the #1 way the abstraction boundary is maintained

On January 11, 2016 5:41:31 PM EST, Ben Trask notifications@github.com wrote:

I believe the problem @RalfJung reported could be solved by tagging
variables themselves as unsafe. Only unsafe {} code would be able to
change unsafe variables, thus preserving "semantic type" rules within
safe code.

I don't know enough about Rust to say how unsafe variable tagging would
be implemented (traits?), but it seems like it would be backwards
compatible. Any existing code would still compile and run, but it might
have variants that are invalidated by safe code (as is currently the
case).


Reply to this email directly or view it on GitHub:
#9883 (comment)

Sent from my Android device with K-9 Mail. Please excuse my brevity.

@btrask
Copy link

btrask commented Jan 11, 2016

Thanks for the info.

@solson
Copy link
Member

solson commented Jan 11, 2016

The subject of unsafe fields and @cmr's "privacy is the #1 way the abstraction boundary is maintained" got me thinking. It's pretty well known right now that privacy is necessary to encapsulate invariants which unsafe code assumes for its correctness, but with unsafe fields I think it might not be necessary.

Assuming, for example, that we were willing to publicize Vec's fields (thus stabilizing the representation), and ignoring details like RawVec and NonZero, we could have:

pub struct Vec<T> {
    pub unsafe data: *mut T,
    pub unsafe cap: usize,
    pub unsafe len: usize,
}

Now we could theoretically remove functions like set_len in favour of unsafe { vec.len = val; }. Obviously, code outside Vec's module could now make code inside the module go wrong (e.g. crash or cause undefined behaviour) by trampling its invariants, but this is already true of unsafe code outside the module. In fact, the only thing that would change is that safe code inside the module would no longer be able to be the cause of undefined behaviour, since vec.len = val is equally unsafe inside the module. To summarize:

Current safety-at-module-boundary Vec
Outside module Inside module
Safe code may not cause Vec code to go wrong may cause Vec code to go wrong
Unsafe code may cause Vec code to go wrong may cause Vec code to go wrong
Vec using unsafe fields
Outside module Inside module
Safe code may not cause Vec code to go wrong may not cause Vec code to go wrong
Unsafe code may cause Vec code to go wrong may cause Vec code to go wrong

Note that we actually get this win even if Vec's fields are private unsafe.

As @cmr mentioned on IRC,

the real trick is determining which fields must be unsafe

I think the rule would be:

  • Any field mentioned by any invariant assumed by any unsafe code is an unsafe field.
  • Otherwise, it's a safe field.

In other words, if a field can be set to any value by safe code and no unsafe code will go wrong as a result, it's a safe field.

If what I'm saying holds together, it would mean segfaults and the like would always be caused by code in unsafe blocks or fns — we would just have one new way for unsafe code to be at-fault: "it made an assumption about a safe field".


This possibly doesn't improve the language enough to be a worthwhile addition, but I thought it was an interesting observation. On the other hand, there may be holes in this I didn't consider, so please point them out if you see them.

tl;dr Add unsafe fields and an "unsafe code may not rely on safe fields to have any particular value" rule and you no longer need to rely on privacy to safely encapsulate.

P.S. This is a summary of an IRC discussion.

@RalfJung
Copy link
Member

I also just yesterday evening suggested something similar: https://internals.rust-lang.org/t/pre-rfc-unsafe-types/3073.

However, you mentioned "unsafe variables" above, and I think that's an important observation. Not just fields can have invariants that others rely on; the same holds for local variables. However, I am not sure if it is worth adding such an annotation, considering that local variables are, well, local, so it's not too hard to oversee the code that could modify them and violate their invariants. The story is different for private fields with their module-wide access.

Btw, there's a discussion about unsafe fields here rust-lang/rfcs#381, so it can be kept out of this thread. Unsafe fields are not about soundness proofs anyway ;-)

@glaebhoerl
Copy link
Contributor

Is changing the value of a variable or field the only way that safe code can indirectly break unsafe code? What if unsafe code calls a safe function, and it expects the safe function to return a particular result, but instead it returns something else? That seems to be a completely analogous situation as far as I can tell: there is nothing unsafe about the safe function in itself, except it violates invariants expected by the unsafe code. And as per @RalfJung's reddit comment that unsafety is "caused by" the code which has to be changed to fix it, it seems clear that the safe function would be at fault here. As far as I know this is the basic objection to adding unsafe fields: that it wouldn't actually be sufficient to ensure that only code inside unsafe can be at fault for undefined behavior.

@solson
Copy link
Member

solson commented Jan 12, 2016

Yeah, I think @glaebhoerl's right. I can't really think of a reasonable way to deal with that problem. It's tempting to claim that if safe code breaks an invariant which some unsafe code assumed, then the unsafe code is at fault for making the assumption, but that doesn't really work. As a user of Vec my unsafe code might make (perfectly reasonable) assumptions about the relationship between vec.len() and vec.set_len() and then if vec.len() was changed in some erroneous way that caused undefined behaviour in my code, it would be weird to say my code was at fault.

On another note, does it make any sense to have a way to say in code that std::vec is a module that upholds safety at a privacy boundary to make it more explicit, since finer grained unsafety encapsulation doesn't seem to work? Although I'm not sure if it could serve any purpose beyond documentation.

@RalfJung
Copy link
Member

You are absolutely right. If unsafe code calls other code, it may depend on that other code's functional correctness, not just on its memory safety. This is not restricted to the case where the depended-on code is safe. For example, even if HashMap had been proven safe, it could still be functionally wrong - and this could break unsafe code relying on HashMap being functionally correct.

I don't think it is feasible to make sure that the "code at fault" is guaranteed to be in an unsafe block. We could try to come closer to this goal, but we will never reach it. That's why I am not certain unsafe fields are a good idea.

(Btw, all this is still entirely off-topic here in this issue ;-)

@brson
Copy link
Contributor Author

brson commented Aug 25, 2016

This will be an open issue forever. Closing.

@brson brson closed this as completed Aug 25, 2016
@RalfJung
Copy link
Member

In case you are interested in the current status of my work on this, I recently gave a talk about this at the Paris Rust Meetup. See https://www.ralfj.de/blog/2017/01/20/paris-rust-meetup.html for more info and a recording.

@Mortal
Copy link

Mortal commented Feb 5, 2018

@RalfJung presented his paper "RustBelt: Securing the Foundations of the Rust Programming Language" at POPL2018 on Friday January 12, 2018, and the video is available on YouTube. The talk makes reference to Derek Dreyer's keynote at the same conference.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-type-system Area: Type system E-hard Call for participation: Hard difficulty. Experience needed to fix: A lot. P-medium Medium priority
Projects
None yet
Development

No branches or pull requests