Add datasort refinements to Rust #1679
In response to the last question: the difference is just that the compiler has special knowledge about datasort refinements and can use it to eliminate checks. In the example I already gave, if the compiler knows that `x` was built with the `bar` constructor, it can drop the failure case entirely. In general it's hard to see how arbitrary predicates applied to single immutable values can be exploited by the compiler to do optimizations.
Please don't. I agree that alts should be exhaustive by default, but some just aren't, and being able to use a non-exhaustive alt as an assertion is a very valuable thing to have. I assume you're just planning to add a warning or error for non-exhaustiveness; I'd much prefer if you also added a way to explicitly specify that an alt is non-exhaustive, which would disable the warning/error for non-exhaustiveness and make the compiler generate the implicit error path that current alts have.
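For concreteness, a minimal sketch of the "alt as assertion" pattern in today's syntax (`match` and `panic!` rather than `alt` and `fail`; the `Instr` enum and `step` function are invented for illustration):

```rust
// Hypothetical enum; only Add and Sub are expected to reach step().
enum Instr {
    Add(i32, i32),
    Sub(i32, i32),
    Nop,
}

// The catch-all arm acts as an assertion: "Nop can't happen here."
// Nothing documents *why* it can't happen, which is the complaint
// raised later in this thread.
fn step(i: Instr) -> i32 {
    match i {
        Instr::Add(a, b) => a + b,
        Instr::Sub(a, b) => a - b,
        _ => panic!("step: unexpected instruction"),
    }
}

fn main() {
    assert_eq!(step(Instr::Add(2, 3)), 5);
}
```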
Marijn: What do you think of the proposal above? It accomplishes the same goal as your suggestion ("a way to explicitly specify that an alt is non-exhaustive"), but with less clutter (because annotations are part of a type signature, there will be fewer of them), and with a static guarantee that there will be no non-exhaustive alt failures (because it becomes possible to check statically that all alts are actually exhaustive).

The problem is that proving that a given value was constructed with a specific variant (or subset of variants) will, in a lot of situations, be equivalent to the halting problem. So you get the same thing that makes typestate painful to use: you'll have to run a check on your value (and probably first put it in a local variable) to be able to pattern-match on it. In some programs this might work out, but I think there will be enough cases where it won't, in which case a non-statically-proven-safe form of non-exhaustive alt is still useful.

Another problem that I see is that inserting an "alt-unsafe" (or whatever) form doesn't encourage the programmer to document why it's actually safe to assume that the alt will match. Allowing the programmer to state invariants by giving more precise types to values than is currently possible encourages more rigorous reasoning, IMO. "alt-unsafe" could encourage people to just throw it in until the compiler is happy.

Where I'm coming from with this is that I'm looking at the many, many non-exhaustive alts in rustc and asking: "Why is this alt guaranteed to be exhaustive? What's the invariant?" And since in every single case there is no documentation of why the code author expected that to be true, it's always unclear whether the non-exhaustiveness is an error (that has coincidentally worked so far) or intentional. And I don't think having a distinct alt-unsafe form does much about that problem. I guess it does document one bit of information about the programmer's intent (that they believe there's an invariant), but not what the invariant is.

Well, at the risk of sounding somewhat snarky, the invariant is that one of the patterns in the alt will match. I agree this doesn't usually amount to a perfect explanation, but neither will a special type or a predicate being applied to the value. (Don't get me wrong, I think the feature proposed here would be awesome. I just don't think it's a full solution to the inexhaustive alt problem.)

Well, you could also say that in a unityped language, the invariant is that there will be no dynamic type errors. However, statically typed languages allow richer invariants to be specified through the type system. In this case, the reason the programmer believes that one of the patterns will match presumably has something to do with dataflow and control flow properties, and types are a way to express an approximation to these properties. As you pointed out, the properties that need to be specified may be undecidable in general, but I would say that if most of the invariants that currently underlie non-exhaustive alts in rustc aren't simple to state, then something is wrong. The whole principle behind static typing is that while some of the properties we might like to check are undecidable, many useful properties can be captured in a decidable type system.
I have a wondrous solution! You know how we have `check` expressions? I hereby propose `alt check`: an alt that is explicitly marked as non-exhaustive, for which the compiler inserts the implicit failure path and skips the exhaustiveness warning. This actually generalizes nicely to refutable patterns in other binding positions as well.

(Granted, this has the slightly non-intuitive behavior that you have to write `alt check` to get the form that can fail at runtime, while a plain `alt` is required to be exhaustive.)
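(A sketch of what the proposed form might look like, in the pre-1.0 syntax of the time; hypothetical, with invented names:)

```rust
enum token { lit_int(int), ident(str), eof }

fn int_value(t: token) -> int {
    // "check" marks this alt as intentionally non-exhaustive; the compiler
    // supplies the implicit fail arm for the variants not listed.
    alt check t {
      lit_int(n) { n }
    }
}
```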
Graydon: this looks OK to me, though I would prefer to call it something different from "alt check" (I'm not sure what). A construct like "alt check" would be necessary anyway for introducing values that have refined types. The only thing non-standard in what you're suggesting is that, basically, the "failure" case would be implicit (the compiler would insert an implicit fail in the case where none of the listed constructors match).

I think the name "alt check" is confusing because "if check" is the version of "check" that never aborts the program if the check fails -- whereas an "alt check" can abort the program.

Yeah. It's true, the thing is not exactly analogous to `check`.
I strongly disagree with @marijnh. I want to know when my alts are not exhaustive (it can be a warning, as long as we have -Werror :). Further, I want a way to prove to the type system that my alt IS in fact exhaustive, because I only need to consider a subset of cases here. We have neither of those things right now. I've found that Scala (which does support this sort of thing, through subtyping) made enums substantially more expressive than O'Caml. @pcwalton @dherman and I have been kicking around ideas on this for a while. An early proposal was something like:
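(Sketch only, not the original snippet: hypothetical syntax guessed from the complaints below about rightward drift and tree-only subsets:)

```rust
// Hypothetical syntax: variant groups declared by nesting, so a pattern
// could name either a leaf variant or a whole group such as `lvalue`.
// The nesting is what drifts rightward, and it can only express
// tree-shaped subsets.
enum expr {
    lit(int),
    lvalue {
        local(uint),
        field(@expr, str),
        index(@expr, @expr),
    },
}
```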
However, this was not satisfactory for several reasons. First, rightward drift. Second, it only supported trees. What if I had a case that was both an lvalue and tuple-like? @pcwalton just proposed an interesting idea where you can "expand" an enum, so you might write:
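(Again a sketch rather than the original snippet; hypothetical syntax, with the `lvalue` and `tuple_like` names taken from later comments:)

```rust
// Hypothetical syntax: smaller enums usable as types in their own right,
// "expanded" into a larger one. A variant (here `local`) can belong to
// more than one of the smaller enums, so the subsets need not form a tree.
enum lvalue { local(uint), field(@expr, str) }
enum tuple_like { lit(int), local(uint) }
enum expr = expand lvalue | expand tuple_like | call(@expr, [@expr]);
```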
This also neatly solves the problem of "open enums" that forced Scala to add `sealed`. It does imply subtyping between enums, but no solution to this problem gets around that. (Refinement types will have subtyping relations based on their predicates.) Anyway, this is a straw man.

If I may ask to calm the conversation a bit; the third adjective in our language summary is "practical", so I don't want to consider scala as our target-to-beat. It's blown the user complexity budget several times over. We have managed to avoid introducing general subtyping so far and I would like to continue trying to avoid it. Part of the purpose of the typestate system is to let off steam that would otherwise push us towards more-elaborate static reasoning and over-burdening the type system. Inference and exactness can remain out of reach; the motivated user can add explicit checks.

@graydon (or anyone) -- any comment on separating the implementation of datasort refinements from the existing typestate mechanism (so that they can be used to eliminate checks)?
Long story short: I would like to solve this problem. If it can be done well through typestate I'm interested. Otherwise, I think extensible enums seem promising. I think both refined types and extensible enums are a form of subtyping and are more-or-less equivalent in that regard.

> I don't want to consider scala as our target-to-beat. It's blown the user complexity budget several times over.

I agree. I just am not above borrowing a good idea.

> We have managed to avoid introducing general subtyping so far and I would like to continue trying to avoid it.

I don't understand how having subtyping only in refinements is not subtyping.

> Inference and exactness can remain out of reach; the motivated user can add explicit checks.

Why is this any less true here? The inference may not always pick the right thing. Maybe you have to specify the type manually or write a "downcast" (`let x = check x as tuple_like`). I mean, a check predicate is a downcast too.

All that said, I am sympathetic to wanting to use typestate to solve this problem so that we don't have too many tools in the toolshed. I'd like to see a more specific proposal: I imagine we'd want to have special "per-variant" predicates and then some way to declare a predicate that says "it's one of these 5 variants". These variant predicates would not be opaque to the compiler (unlike other predicates). I haven't read up on the past work you cited; I will soon (bit busy right now). I could imagine it being useful to have the dataflow propagation.

@catamorphism I am not too worried about the soundness issue. If you write a bogus predicate, the worst you should get is a runtime failure when an alt doesn't match.
@nikomatsakis What I'm worried about with bogus predicates is the compiler taking advantage of knowledge about impossible cases, and generating code that results in a segfault if the predicates return misleading results.

@catamorphism Yes, I know. That is something to keep in mind for sure. I just mean that allowing us to prune down exhaustiveness checking will not, in and of itself, lead to seg faults. I can imagine that other optimizations might.
@catamorphism A separate pass is plausible, I guess? Predicates are all supposed to be pure though. Redundant check elimination ought to apply equally well to any predicate, if it's statically known to hold in the typestate at the point of the check.

This gets a bit language-theoretical, but I'll try to state my beliefs as concisely as possible. Sorry if they're a bit unorthodox. Mainly I disagree with the broadest interpretation of the phrase "types are theorems". I think it is a bit of an oversimplification of language technology. Types certainly are theorems, but I think they're not just "any old theorems", and I think a "type checker" is not the right place to prove every theorem. Specifically, the theorems I think "belong" in types are those that describe the correctness of actual machine-level instructions such as arithmetic (are we operating on the right kind of number) and load/store (are we following pointers into dead memory). Correctness at that level is usually plausible to build a type system for with very nice properties: decidable, fast to check, and with comprehensible error messages.

Lots of type systems don't have those properties, but I think those languages are abusing their type systems and users often pay the price for mixing "nice" and very necessary machine-representation questions ("can I navigate the data structure") with higher-level program-correctness issues that have been "encoded" into types. Everyone has some favourite Haskell, ML, Scala or C++ error messages they can mention from such delightful "encoding" of deeper semantic checks in the type system. I think the cost in program comprehensibility and compile time from mixing those issues into the type system is not the right balance, and it's better to add secondary layers of static analysis beyond the types. Even though in a strict Curry-Howard sense the secondary layers are just "more types", they're types that may be expressed through a completely different language-level UI (unordered predicate lists), inference and checking rules (flow-sensitivity), set of manual overrides (runtime predicate-checks vs. reinterpret_cast) and error-reporting system. IMO those differences matter.
We should try to avoid debates getting too abstract in bug threads. Let's discuss feature concerns and proposals in our group meetings.

Neither of these conflicts with my proposal, so maybe you just didn't read it very well.

I read your proposal. I'm not necessarily opposed to having some sort of way to say that an alt is intentionally non-exhaustive. However, I think if we do our job right, it should be very rarely used. Therefore, I am not sure that I want to make it easy to write non-exhaustive alts; I'd rather encourage people to use the type system and prove that those cases are impossible. P.S. @graydon, I somehow missed your post regarding `alt check`.

Also, I don't see why we have to limit the forms of patterns that are allowed in an `alt check`.

That said, I think @dherman is probably right that we ought to continue this in another forum (which I am not respecting, sorry about that), and anyhow I'd like to let the various ideas sit for a bit. So I'll stop writing now. I am mostly just turning over this idea about typestate in my head. Anyway, I think the critical question is: "Besides datasort refinements, what else do we plan to use typestate for?" So far it hasn't seen much (productive) use, unfortunately. If we do find that we use it for a lot, it probably makes sense to use it for datasorts as well. If datasorts are all we use it for, I am not sure it's the simplest way to get these benefits.

It is only a way to flag an alt as intentionally non-exhaustive right now.
I see that I am a little late to the party. I won't comment on the various ways of dealing with non-exhaustiveness, but I think it is very important to have arbitrary refinements for subtyping enums, so the type system can let us match exhaustively when we only want to match on a subset of things. @nikomatsakis wrote:

> However, this was not satisfactory for several reasons. First, rightward drift. Second, it only supported trees. What if I had a case that was both an lvalue and tuple-like?

I propose the following syntax, which allows you to write one definition per useful subset instead of one per intersection-of-subsets (2^n?). (Said perhaps more intuitively: one per circle in a Venn diagram, instead of one per region in a Venn diagram.)
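(Sketch only, not the original snippet; hypothetical syntax, reusing the `expr`/`lvalue`/`tuple_like` names from earlier in the thread:)

```rust
// Hypothetical syntax: the full enum is declared once, and each refinement
// just lists which arms belong to it. A variant may appear in any number
// of refinements, so the subsets can overlap arbitrarily.
enum expr {
    lit(int),
    local(uint),
    field(@expr, str),
    index(@expr, @expr),
}

refine lvalue of expr { local, field, index }
refine tuple_like of expr { lit, local }
```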
All you have to write in the refinements are the arms - no need to repeat the stuff they carry (since it has to be the same). If @pcwalton adds his case-class-y idea of having fields shared in all arms, you wouldn't need to write duplicates of that in the refinements either, for the same reason.
@catamorphism I am not sure what you meant by this:

I guess I am not sure what you mean by "this". If we use the case class-like approach, in particular, this kind of correctness seems to just fall out of the standard soundness of the type system.

@nikomatsakis By "this" I meant any sort of datasort refinements, but I guess I'm still not familiar with the case class proposal (has it been written up somewhere?), so I wasn't thinking of that approach in particular.
I'm more curious about how you would actually use arbitrary-subset forms, not declare them. As in: what would the type checker infer for the value you're matching on? Alternatively, if it's really just about missed pattern cases, you could possibly have gotten by with other mechanisms. Macros:
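(The snippet is not preserved; a rough sketch of the idea in modern `macro_rules!`/`match` syntax, with an invented `Expr` enum:)

```rust
enum Expr {
    Lit(i32),
    Local(u32),
    Field(u32, &'static str),
}

// Wrap a match and spell out the "impossible" variants once, here in the
// macro, instead of at every use site. Because no `_` arm is used, adding
// a new variant to Expr makes every use of the macro fail to compile,
// which is the static exhaustiveness checking referred to below.
macro_rules! lvalue_alt {
    ($e:expr, { $($arms:tt)* }) => {
        match $e {
            $($arms)*
            Expr::Lit(..) => unreachable!("not an lvalue"),
        }
    };
}

fn lvalue_base(e: Expr) -> u32 {
    lvalue_alt!(e, {
        Expr::Local(id) => id,
        Expr::Field(base, _) => base,
    })
}

fn main() {
    assert_eq!(lvalue_base(Expr::Field(7, "f")), 7);
}
```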
Pattern guards:
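(Likewise a sketch in modern syntax, with the predicate and enum invented for illustration:)

```rust
enum Expr {
    Lit(i32),
    Local(u32),
    Field(u32, &'static str),
}

// An ordinary runtime predicate standing in for a refinement.
fn is_lvalue(e: &Expr) -> bool {
    matches!(e, Expr::Local(..) | Expr::Field(..))
}

fn describe(e: &Expr) -> &'static str {
    match e {
        // The guard can run arbitrary code, which is what makes this
        // route "turing-complete"; the cost is that the compiler cannot
        // reason about which variants the guarded arm actually covers.
        e if is_lvalue(e) => "lvalue",
        Expr::Lit(..) => "literal",
        _ => "other",
    }
}

fn main() {
    assert_eq!(describe(&Expr::Local(3)), "lvalue");
}
```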
The former gets you static exhaustiveness checking, the latter gets you turing-complete refinements. As far as the suggestion that all our current non-exhaustive alts should become statically provable goes: I agree that, in general, the reason an 'effectively-non-exhaustive' alt exists is that the value's type is coarser than the set of variants that can actually reach that point. Since it didn't get written down above, here's what @pcwalton suggested on IRC:
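(The snippet is not preserved; the sketch below, in modern compilable syntax with invented names, shows the nested-enum factoring that the "tree-shaped" description in the next comment points at; it may not match the original suggestion exactly:)

```rust
// The subset you care about becomes its own enum, and the outer enum
// carries it as a single variant. Matching on an Lvalue is then
// statically exhaustive without ever mentioning the non-lvalue cases.
enum Lvalue {
    Local(u32),
    Field(Box<Expr>, &'static str),
}

enum Expr {
    Lit(i32),
    Lval(Lvalue),
}

fn lvalue_base(lv: &Lvalue) -> u32 {
    match lv {
        Lvalue::Local(id) => *id,
        Lvalue::Field(..) => 0,
    }
}

fn main() {
    let e = Expr::Lval(Lvalue::Local(3));
    if let Expr::Lval(lv) = &e {
        assert_eq!(lvalue_base(lv), 3);
    }
}
```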
This combination lets you encode "tree-shaped" refinements, so to speak, by factoring your enum into nested sub-enums, one per subset you care about. I'm curious about an honest comparison between this and arbitrary-subsets. What's easy, what's hard. Try modeling cases that matter to you as each, see how they fare.
I think @graydon summarized the situation pretty well. I definitely agree that an honest comparison between the two approaches is worth doing. For what it's worth, in my PhD work I built a compiler in Scala, which does support arbitrary subset refinements. They do this by using a trait for each subset: a case class can extend any number of traits, and each trait then acts as a type you can require in a signature or match against.
@graydon Neither the macro nor the pattern-guard approach actually allows the compiler to omit checks safely, so far as I can see. (That is, even if you know you're matching on a value drawn from a particular subset of the variants, the generated code still has to test the tag and include a failure path.) I'd like to see a code example for the "partial support for refinements" proposal before I pass judgment on it.
@catamorphism the macro one would, assuming you mentioned enough subsets to totally cover the cases. And if you didn't, the compiler would tell you. That's why I said it gives static exhaustiveness checking. Of course the pattern-guard one would not.
@graydon Re: the macro one, I meant the case where you know the value can only be one of a proper subset of the variants. (I'll confess that I'm not in a huge rush to implement anything that's been discussed on this page, so it doesn't seem like a pressing question. Unless someone else wants to do it.)
I would still like to implement something along these lines, but not until after 1.0.

I was taking a look at #2896, and then I ran across this bug, which looked really similar. Most of this syntax/discussion far predates my involvement with Rust, but it looks to be mostly outdated now. It seems as if this issue boils down to what #2896 wants to do, and if so, one of them should probably be closed.

No, refinement types are different from pattern checking. The first one requires the programmer to make their intentions clear by writing their types differently. The second one is more about using fancy static analysis to let the programmer write something that looks like it might fail with a non-exhaustive match error, but in fact never will. Both are useful, but both are wishlist items that aren't required for 1.0. In other words, refinement types are about giving programmers the tools to express invariants and prove them to the compiler; whereas tools like Catch are about saying "I know it's safe and I want the compiler to do the work of proving that."
@catamorphism is this still relevant even though typestate is gone?

This is a different feature from typestate.

Ah, reading the full thread this becomes obvious; I had glanced through some of the earlier comments that heavily involved typestate.
Here is my latest proposal for this, for reference: http://smallcultfollowing.com/babysteps/blog/2012/08/24/datasort-refinements/

Also, I've been thinking that this would be fairly straightforward to implement and may be worth the trouble. There are a few thorny special cases that would go away if we could give certain values more precise, refined types. One concern I have is that while the design I laid out above is quite straightforward, it lacks the ability to parameterize types over the refinement. In other words, imagine I wanted to make a newtype'd option:
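(The example is not preserved; a sketch of the concern, using an invented `Option<T>@Some` notation purely in comments, not taken from the linked proposal:)

```rust
// A newtype wrapper around Option<T>.
struct Wrapped<T>(Option<T>);

// With a refinement notation such as Option<T>@Some, one could refer to an
// option known to be Some. But Wrapped<T> has no way to be generic over
// that part: there is no Wrapped<T>@Some meaning "the inner option is
// refined". The refinement would have to be duplicated as a second,
// parallel type written out by hand:
struct WrappedSome<T>(/* conceptually: Option<T>@Some */ Option<T>);
```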
Inelegant but perhaps adequate in practice. Anyway, I wanted to get this (rather old at this point) blog post linked to the issue.

A change like this needs to go through the (relatively new) RFC process: https://github.com/rust-lang/rfcs/ (And it is actually possible that one of the existing RFCs may cover the use case(s) documented here.) I am closing this ticket; if anyone wants to revive the request for this feature, please write up a proper RFC for it.

I think this issue should be moved to the RFCs repo (as an issue). cc @nick29581
@P1start it's already been closed. As @pnkfelix says:

> I am closing this ticket; if anyone wants to revive the request for this feature, please write up a proper RFC for it.

@steveklabnik Yes, but it was closed before we started having issues in the RFCs repo. I am under the impression that issues that should have RFCs written for them (such as this one) should be open issues in the RFCs repo, not closed issues in rust-lang/rust.

@P1start so, my thought process was basically "It's not like we're gonna dig through all the old issues and move any one that's relevant over, let's wait until someone says something," but that ignores that, uh, here, you're basically saying something. So you're right. Sorry about that!

I'm doing a massive triage effort to get us ready for 1.0. As part of this, I'm moving stuff that's wishlist-like to the RFCs repo, as that's where major new things should get discussed/prioritized. This issue has been moved to the RFCs repo: rust-lang/rfcs#754
Looking through the rustc source code, it seems like many potential areas for typestate predicates involve cases where a value has an enum type and is known to be built with one of a subset of that enum type's constructors. I'm especially noticing this now that I'm going through and making all `alt`s exhaustive. For example:
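(The original snippet is not reproduced here; the following is a minimal sketch of the shape of code being described, in the pre-1.0 syntax of the time, with `baz` as an assumed second constructor of `foo`:)

```rust
enum foo { bar(int), baz }

fn whee(x: foo) -> int {
    alt x {
      bar(n) { n }
      // x is always built with bar by the time whee is called,
      // but today nothing lets us say so in whee's signature:
      _ { fail }
    }
}
```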
We'd like to give `whee` a type that says `x` must be constructed with constructor `bar` -- in other words, a type that's a refinement on `foo` -- so that we don't need to write the `fail` case (because we can statically check that `x` is in the right subset of `foo`). We can do this right now with typestate predicates. However, datasort refinements seem to be a common enough case that it might be worth treating them separately. Also, datasort refinements can be implemented without needing to worry about any of the purity issues that have made typestate difficult, because once something of `enum` type is constructed, its tag is immutable.

The other reason to handle this separately from typestate is that the compiler never uses typestate information to eliminate any checks (since predicates may be unsound), whereas datasort refinements can be checked statically and used to eliminate unnecessary tests in `alt`s.

Thoughts?