Unsafe comparison traits (PartialEq, Eq, PartialOrd, Ord) #956
Conversation
- `a.le(b) <=> a.lt(b) || a.eq(b)`
- `a.ge(b) <=> a.gt(b) || a.eq(b)`
- `a.lt(b) <=> b.gt(a)`
- `a.lt(b) && b.lt(c) => a.lt(c)`
You should probably add this form of antisymmetry: `a.lt(b) => !(b.lt(a))`

This is in the library documentation already. Of course, the implication must be one-way, because for IEEE NaNs `a.lt(b)` and `b.lt(a)` will both be false.
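For concreteness, a small standalone illustration of the NaN case (plain Rust, nothing beyond the standard library):

```rust
fn main() {
    let a = std::f64::NAN;
    let b = 1.0_f64;

    // Neither direction of `<` holds when one operand is NaN...
    assert!(!(a < b) && !(b < a));

    // ...because `partial_cmp` reports the values as incomparable.
    assert_eq!(a.partial_cmp(&b), None);
}
```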
@quantheory It is implied by the other rules.

Proof:

Suppose that `a.lt(b)` and `b.lt(a)`.
From Rule 6 of `PartialOrd` (with `a` and `b` swapped) and `b.lt(a)`, we get `a.gt(b)`.
From Rule 1 of `PartialOrd` and `a.lt(b)`, we get `a.partial_cmp(b) == Some(Less)`.
From Rule 2 of `PartialOrd` and `a.gt(b)`, we get `a.partial_cmp(b) == Some(Greater)`.
Therefore, `a.partial_cmp(b) == Some(Less)` and `a.partial_cmp(b) == Some(Greater)`, which is absurd.
Thus, the assumption is false, and at least one of `a.lt(b)` and `b.lt(a)` is false.
Therefore, `a.lt(b) => !(b.lt(a))`.

Correct me if this proof is wrong.
You're right, I misread the `partial_cmp` bits as `=>`. (Well, there is a tiny problem with your proof, which is that `b.lt(a)` doesn't have to return at all, but that's not what I meant and not relevant here, probably.)

On a different note, I'm pretty sure that the transitive property is no good here (and as written in the current docs). The reason is that you can define two types in different crates that can compare to the same type in some third crate, but not to each other. Then if you use all three together there's suddenly a problem. It's a modularity hazard.

I think that transitivity should only be required if all of the comparisons are defined.
> `a.lt(b) && b.lt(c) => a.lt(c)`

It's impossible to guarantee this unless adding an implementation of `PartialOrd` is a breaking change.

Consider two crates `A` and `B`. `B` defines a type `Y`. `A` defines type `X` and implements the `<=` operator between `X` and `Y`. Now `B` defines a new type `Z` and implements the `<=` operator for `Y` and `Z`. Then `X <= Y` and `Y <= Z` are defined but `X <= Z` is not defined.
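To make the hazard concrete, here is a single-file sketch of the situation described above; the crate/type names `A`, `B`, `X`, `Y`, `Z` come from the comment, while the `i32` fields and the exact impls are invented purely for illustration:

```rust
use std::cmp::Ordering;

// Conceptually in crate `A`:
pub struct X(pub i32);

// Conceptually in crate `B`:
pub struct Y(pub i32);
pub struct Z(pub i32);

// Crate `A` implements comparison between its own `X` and `B`'s `Y`.
impl PartialEq<Y> for X {
    fn eq(&self, other: &Y) -> bool { self.0 == other.0 }
}
impl PartialOrd<Y> for X {
    fn partial_cmp(&self, other: &Y) -> Option<Ordering> { self.0.partial_cmp(&other.0) }
}

// Later, crate `B` adds comparison between `Y` and its new type `Z`.
impl PartialEq<Z> for Y {
    fn eq(&self, other: &Z) -> bool { self.0 == other.0 }
}
impl PartialOrd<Z> for Y {
    fn partial_cmp(&self, other: &Z) -> Option<Ordering> { self.0.partial_cmp(&other.0) }
}

fn main() {
    let (x, y, z) = (X(1), Y(2), Z(3));
    assert!(x <= y); // defined
    assert!(y <= z); // defined
    // assert!(x <= z); // does not compile: there is no `PartialOrd<Z> for X`,
    //                  // so the transitivity law cannot even be stated here.
}
```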
Oh, I see that @quantheory already mentioned this.
I don't think I could get behind this without a clearer motivating example at a minimum, since: …
Under "drawbacks" you're missing this case:
What about adding an additional trait which is safe:

```rust
trait SafeCmp {
    type Image;
    fn map(&self) -> <Self as SafeCmp>::Image;
}
```

Implementing `SafeCmp` for a type `T` is equivalent to implementing whichever of `PartialEq`, `Eq`, `PartialOrd`, `Ord` are implemented by `Image`.

This allows a good amount of flexibility for safe code to implement these traits by simply composing more primitive, unsafe implementations. By returning a tuple, for example, safe code can order by several fields. In addition, the standard library can provide some useful building blocks, such as a `Reverse` type which is a newtype of `T` and just inverts the ordering operations. In combination with tuples this allows the full set of orderings across a set of fields.
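A hedged sketch of how the proposed `SafeCmp` might be used, repeating its definition locally so the snippet stands alone; `Reverse` is the building block described above, and `Person` with its fields is a made-up example type:

```rust
// The proposed safe trait: map a value to an "image" whose existing (trusted)
// comparison impls define the ordering of the original type.
trait SafeCmp {
    type Image;
    fn map(&self) -> Self::Image;
}

// The building block described above: a newtype that inverts the ordering.
struct Reverse<T>(T);

struct Person {
    name: String,
    age: u32,
}

// Order people by age descending (via `Reverse`), then by name ascending,
// by composing already-implemented orderings through a tuple.
impl SafeCmp for Person {
    type Image = (Reverse<u32>, String);
    fn map(&self) -> Self::Image {
        (Reverse(self.age), self.name.clone())
    }
}
```

Under the proposal, blanket machinery (not shown here) would then forward whichever of `PartialEq`/`Eq`/`PartialOrd`/`Ord` the image type implements back to `Person`.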
I don't know what would happen, but I'm asserting that it shouldn't cause a memory safety issue (at worst a panic, unless an unsafe interface is used). Anyway, there are reasons to compare things that have nothing to do with sorted containers, so defining a comparison on internally mutable types is not such a ridiculous thing to do, even if they wouldn't play nice with some structures.
I don't see how this follows. Lots of containers provide both safe functions that perform lots of checks, and much faster but unsafe ones for niche cases where the extra speed trumps ergonomics. I'm not sure what cases cause this approach to fail. This is why I think a good example would help.
The comparison operators are different in that they have no inherent semantic relationship to representations in memory, and they have many uses outside of sorted structures. I think that it's much less intuitive, and less likely to aid debugging, to mark them as unsafe.
@Diggsey Your … @quantheory My point is that most types will simply …
👍 in general but if we're doing a breaking change anyway then we should fix the current operator system. I've written down why the current system is fundamentally broken and how I'd like to fix it:

**Why the current implementation is flawed**

Rust has … The last two are designed to correspond to the mathematical structures called …

An operator …

An operator …

Intuitively, total orders look like lines and partial orders look like … Unfortunately these concepts cannot be translated to a programming language …

**The equality operator**

In mathematics the … In programming languages, however, there are other concerns that prevent us from … We can see that the definition of partial order uses the mathematical comparison … Rust works around this problem by introducing the …

Before you are allowed to implement …

```rust
let a = std::f64::NAN;
assert!(a != a);
```

Since we want to implement …

**The one partial order and coherence rules**

Above we wrote the following definition: …

In mathematics you can have many different partial orders and call them … This is impossible … It's impossible because there is no after. I can implement a type … Consider the following dependency graph: …

I cannot implement … To fix this we would have to either change the graph to this: … or the other way around. Obviously this is impossible.

**Conclusion**

I believe we have demonstrated that trying to implement the fundamental …

**How to improve the situation**

What we are trying to accomplish: …

Unfortunately the rust trait system is not really powerful enough to express the … The …
I am not sure what you mean. However, I think I can define …
After some discussion in IRC, I have put the conditions for a trait to require …
Are the … Note: …
I'm starting to think that the …
I'm not sure of my current opinion on this. Some scattered thoughts: …
@nikomatsakis: The two types of compiler magic mentioned above are …

The second one already seems to be possible without compiler magic:

```rust
impl<U, T: StrictOrd<U>> WeakOrd<U> for T {
    fn f(&self, u: &U) {
        <T as StrictOrd<U>>::f(self, u)
    }
}
```

But making the implementation symmetric does not seem to work:

```rust
impl<T, U: WeakOrd<T>> WeakOrd<U> for T {
    fn f(&self, u: &U) {
        <U as WeakOrd<T>>::f(u, self)
    }
}
```

because it also affects … What would be needed is something like `impl<T: !WeakOrd<U>, U: WeakOrd<T>> WeakOrd<U> for T { … }`. Then no compiler magic would be required.
In any case, the compiler magic required doesn't look like something that cannot (later) be moved into the general language.
As a datapoint, nothing in the stdlib that I'm aware of would do something unsafe for an awful Ord impl like "false". This is in spite of the fact that much of the std code that works with orderings is unsafe. All ordering-based algorithms/structures I've ever seen have to be prepared for things like "this is the biggest/smallest", which is generally what incoherent Ord impls will slam into. Of course they will produce incoherent results like producing unsorted results or "losing" values for the purpose of lookup, but this will always happen in a safe way. In particular, for a collection, an all-element iterator will yield all inserted values, because iterators are always structural and don't actually invoke comparisons. Bad impls can also cause collections to have degenerately bad perf. Still not a safety issue. I would be very interested to see some real code that empirically runs faster on correct Ord impls by relying on Ord to be coherent.
@gankro As an example, I think that a fast sorting algorithm implementation could do an array out-of-bounds index given a bad Ord implementation. Dealing with those would likely lead to a slow-down (although by a constant factor). I don't have the time to empirically test this yet. Anybody want to try?
You'll need to be more concrete than that. QuickSort, MergeSort, InsertionSort, BubbleSort: none of those have this problem. All of them need to handle that an arbitrary element is the biggest/smallest for some subrange, which would prevent any out-of-bounds access. Non-comparison algorithms like RadixSort rely on something more powerful than Ord (bitwise representations). As for fancier algorithms like WikiSort and GrailSort, I'll be the first to admit I'm not the most experienced with them. From what I recall of WikiSort, there isn't anywhere where you "know" that you'll find an x with x < y or something. The sketchiest thing is maybe a sorting network that it uses as a subroutine. However, it's just based on hard-coded comparisons and swaps. It doesn't search at all. GrailSort's literature is pretty impenetrable, so I really have no idea. I think it's pretty similar to WikiSort, though.
The only thing I can think of is honestly that some other unsafe code is relying on a data structure or algorithm to not behave entirely insanely. Like if it doesn't find the given key in the map, or the element doesn't sort to be the minimum, it does something unsafe. But that's... super theoretical.
Another major problem here actually seems to be that it's extremely hard to figure out exactly what code is relying on what guarantees, and that code is likely to be extremely unpredictable when those guarantees are broken. In other languages, this is no big deal since the worst you can do is throw an exception, but we have to be careful with this situation, because in Rust you can cause memory unsafety. What we have to be most careful of is ruining our main guarantee: if you write safe Rust code, your code will be memory safe. If we have to amend that to "if you write safe Rust code, your code will be memory safe, as long as you don't write any incorrect code", then we've dealt ourselves a heavy blow. I think the only way this is really feasible is for us to create a guideline that unsafe code behind a safe interface (an unsafe interface can do anything) must not violate memory safety even if safe code does crazy things.
@gankro I am thinking about a quicksort implementation that goes out of bounds if …
@reem I suppose that your argument is in favor of this RFC, right? I think that the simplest way to solve this problem is to rely on code in an unsafe trait not doing crazy things. We already rely on sane …
@gankro I think I do not have to write a proof-of-concept any more, since I found this real-world example with C++. Now, you cannot say that it is just theoretical.
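To illustrate the pattern in question: the sketch below is a hypothetical "unguarded" insertion step in the spirit of libstdc++'s unguarded insertion sort, not code from Rust's standard library. It omits the lower-bound check because a correct partition step guarantees that a smaller-or-equal element exists to the left; an `Ord` whose `lt` always answers true breaks that guarantee and the unchecked accesses go out of bounds.

```rust
/// Hypothetical sketch: shift `val` left into its sorted position *without*
/// checking the lower bound. The unstated precondition is that some element at
/// an index `j <= i` already satisfies `v[j] <= val`, which a correct partition
/// (together with a coherent `Ord`) would guarantee.
unsafe fn unguarded_insert<T: Ord + Copy>(v: &mut [T], mut i: usize, val: T) {
    // If `Ord` is incoherent (e.g. `lt` always returns true), this loop never
    // stops: `i - 1` underflows (a panic in debug builds, wraparound in release
    // builds) and `get_unchecked` becomes the out-of-bounds access discussed above.
    while val < *v.get_unchecked(i - 1) {
        let prev = *v.get_unchecked(i - 1);
        *v.get_unchecked_mut(i) = prev;
        i -= 1;
    }
    *v.get_unchecked_mut(i) = val;
}
```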
@theemathas the difference is that …
It occurred to me just now that this could be handled in the same way that we handle "trusted iterator length" for …
Thinking about this, what would be wrong with

```rust
unsafe trait StrictEq: Eq {}
unsafe trait StrictOrd: StrictEq + Ord {}

// ...

unsafe impl StrictEq for i32 {}
unsafe impl StrictOrd for i32 {}
```

I guess that this is basically the same thing as the …

If someone forgets to implement …

```rust
#[derive(Ord, PartialOrd, Eq, PartialEq)]
struct Wrapper(ExternalType);

unsafe impl StrictEq for Wrapper {}
unsafe impl StrictOrd for Wrapper {}
```

Of course there's the whole issue with "derive", but I kind of feel like we should fix that anyway. Maybe the simplest thing would be to have a marker that says "derive not only this, but all derivable super-traits, recursively". E.g. to pick an arbitrary syntax (I'm sure that something like this has been proposed before):

```rust
// Instead of `#[derive(Ord, PartialOrd, Eq, PartialEq, Copy, Clone, Debug)]`
#[derive(Ord:*, Copy, Clone, Debug)]
struct MyType;

// Instead of `#[derive(StrictOrd, Ord, PartialOrd, StrictEq, Eq, PartialEq, Copy, Clone, Debug)]`
#[derive(StrictOrd:*, Copy, Clone, Debug)]
struct MyOtherType;
```
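For illustration, a hedged sketch of how such a marker might be consumed; the trait definitions are repeated from the comment above so the snippet compiles standalone, and the body just delegates to a normal sort, standing in for an implementation that would skip defensive checks:

```rust
unsafe trait StrictEq: Eq {}
unsafe trait StrictOrd: StrictEq + Ord {}

unsafe impl StrictEq for i32 {}
unsafe impl StrictOrd for i32 {}

// A sort that is allowed to *assume* a coherent total order, because the
// `unsafe` marker trait shifts responsibility to whoever wrote the impl.
fn sort_trusted<T: StrictOrd>(v: &mut [T]) {
    // A real implementation could elide the defensive checks discussed in this
    // thread; here we simply call the ordinary sort.
    v.sort();
}

fn main() {
    let mut xs = vec![3, 1, 2];
    sort_trusted(&mut xs);
    assert_eq!(xs, [1, 2, 3]);
}
```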
@huonw The problem is that I still don't know how we did / are going to solve the iterator problem, "whatever way that is". I also believe that the problem with comparisons is easier to solve than the problem with iterators.
@theemathas Could you go into more detail about why you think this problem is easier? AFAIK, the problems are isomorphic: there's a constraint about the behaviour of certain methods that …

E.g. if

```rust
trait Eq: PartialEq {
    unsafe fn is_correct() -> bool { false }
}

// in std:
impl Eq for u8 {
    unsafe fn is_correct() -> bool { true }
}

// in an external lib:
impl Eq for Dubious {}
```
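A self-contained sketch of how generic code might consume such an opt-in hook. It adapts the idea to `Ord` rather than `Eq` (so the trusted path has something concrete to do), drops the `unsafe` qualifier for brevity, and all names are hypothetical:

```rust
// Stand-in for the modified trait from the comment above: the default answer is
// "don't trust me"; only a deliberate override claims full correctness.
trait CheckedOrd: Ord {
    fn is_correct() -> bool { false }
}

impl CheckedOrd for u8 {
    fn is_correct() -> bool { true }
}

// `sorted` is assumed by the caller to be sorted. Generic code only dares to
// binary-search (which can silently misbehave under an incoherent `Ord`) when
// the impl has opted in; otherwise it falls back to a plain linear scan.
fn find_in_sorted<T: CheckedOrd>(sorted: &[T], needle: &T) -> Option<usize> {
    if T::is_correct() {
        sorted.binary_search(needle).ok()
    } else {
        sorted.iter().position(|x| x == needle)
    }
}

fn main() {
    let v: Vec<u8> = vec![1, 3, 5, 7];
    assert_eq!(find_in_sorted(&v, &5), Some(2));
}
```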
@huonw The main differences are that making …
I didn't address this only for the simple reason that I don't have any better ideas than what has already been floated. Basically the question is about where to put the … I think I might prefer … If we had the Glasgow Haskell-like ability to store evidence of trait bounds in datatypes and retrieve it by pattern matching (i.e. …)
Thinking out of the box, while it's not possible to prove that …
@huonw I retract my claim that the safety of comparisons is easier to deal with than that of iterators, due to rust-lang/rust#23452. I now think that comparison traits are harder than the …
For amusement purposes, you might like to read about this CTF challenge, where participants had to actually exploit the GNU libstdc++ …
That seems to offer a fairly reasonable way to tackle the comparison traits too, via something like the sketch in #956 (comment).
@huonw Your alternative is interesting, but should we put it in … By the way, it might be better for …
So we have a demonstrated 10% performance gain when removing the checks. What's the worst case if the comparator is not correct? If it's a security hole or seg-fault, I'm not sure if it's worth it... That said, this (very roughly, probably wrong property syntax) is what I'd propose as an alternative to unsafe traits:
There's an added (unnecessary) feature here: …
I like @huonw's approach with …
I would give it a different name, though.
I've been reading and re-reading this comment thread and RFC and I confess to feeling somewhat torn. On the one hand, I think that the current … On the other hand, I am uncomfortable with inching the line on where …

I guess what I think I really want is to go the opposite direction from this RFC and simplify the current hierarchy, since the "guarantees" we're trying to enforce aren't real guarantees. Drop the …

All that said, what I really don't want to do is derail the 1.0 effort over this question, whichever way it is decided. It feels like with specialization we could probably move to the simple, collapsed hierarchy approach via deprecation after 1.0, which is probably a more practical path. (For that matter, depending on the details, some form of specialization might also allow us to have a sort that takes advantage of unsafe traits when they are available and falls back to something else otherwise.)
@nikomatsakis you can get much more troublesome results using NaNs in a sorted map (unable to find things that are inserted).
To be clear, this isn't unsafe, it's just nastier than NaNs giving a stupid sort.
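As a concrete (and memory-safe) illustration of that failure mode, here is a sketch using a key type whose `Ord` is deliberately incoherent, standing in for NaN since `f64` is not `Ord`; the precise lookup behaviour is a `BTreeMap` implementation detail, but losing track of inserted keys is the typical outcome:

```rust
use std::cmp::Ordering;
use std::collections::BTreeMap;

// A key that lies: every comparison claims `Less`, even against itself.
#[derive(PartialEq, Eq)]
struct Chaotic(u32);

impl PartialOrd for Chaotic {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> { Some(self.cmp(other)) }
}
impl Ord for Chaotic {
    fn cmp(&self, _other: &Self) -> Ordering { Ordering::Less }
}

fn main() {
    let mut map = BTreeMap::new();
    for i in 0..10u32 {
        map.insert(Chaotic(i), i);
    }
    // Nothing is lost structurally: all ten entries are still stored...
    assert_eq!(map.len(), 10);
    // ...but point lookups can no longer find keys that were inserted.
    println!("found key 5? {:?}", map.get(&Chaotic(5)));
}
```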
@gankro oh, I know, things can go totally wrong, same as if you use a hashtable key that changes its hash mid-execution.
@nikomatsakis The difference is that a hashtable implementation would probably want to hash the value only once.
Shouldn't this RFC explicitly state "never panic" as part of the rules for the unsafe comparisons? Regarding the connection to the "finally" issue: …
The discussion on this RFC seems to have stalled without a clear consensus. In any case, now that beta is released, I don't think breaking changes like this are in the cards. It seems likely we can layer these kinds of improvements on later if necessary. To that end, I've opened issue #1051 and I'm going to close this RFC as postponed. Thanks all for the discussion and thoughts, and thanks @theemathas for the well-reasoned RFC.
rendered
See previous discussion at #926