-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New panics from sort detecting Ord/PartialOrd violations #129561
Comments
I mean, you can remove the quotes, they can only occur if the user implemented
I believe @Voultapher wrote that, and that he meant 'with high probability'. That said, I don't think we've ever properly evaluated the probability of the panic occurring if there's a bug in the In essence, if your I do remember asking people for advice which of the two we should choose, but not where or if the entire T-libs-api team was involved. At least @thomcc and @workingjubilee are aware, from the top of my head.
The advice is to fix the |
There is an open PR #129003 that aims to improve the |
"failure to comply with the requirements of The first time that the relnotes mention these requirements, they should also mention something like "and |
This has been included in the release notes and libs is satisfied with the wording, closing. |
FWIW I'm a bit surprised by this new behavior and worried that it will cause quite a few crashes in production where it was previously just returning buggy output. Did we consider shipping with a debug-only assertion for a couple of versions before going into full panic mode? |
@Turbo87 I am not sure if you are aware of the following, so apologies if I am belaboring you with things you already know: We cannot enable normal I do not mean to bag on any of Saethlin's work by saying this! We've wanted this power for some time, and we're rapidly getting comfortable with the idea it is an option, but it is still not a peer option with "simply put an So, my personal opinion is if I were maintaining a codebase that was gotcha'd by this, I'd rather just get the current situation. Maybe it's a punch in the mouth, but at least I could clearly see it coming. |
I see, thanks for the explanation! :) |
I am maintaining a codebase that kinda intentionally abused this. The crate in question is I am obviously going to change this, but from my experience there is a lot of code that relies on this behavior (for not ideal reasons). From my experience a lot of this comes from I do think that it's acceptable to make this change but I believe that at least the change to Old documentation:
New documentation:
From where I'm standing this is a backwards incompatible change that I did not expect. |
@mitsuhiko If we're being pedantic, you were violating the contract as well in the old wording. It doesn't say "you may use a non-total ordering, in such a scenario the order is unspecified", it says "the comparator function must define a total ordering for the elements in the slice". How I see it is that you were always using library UB, and the order of the elements being unspecified was just a clarification on what this could entail, not a guarantee that the library UB will always solely consist of this effect. |
Arguably the relevant docs are those for the trait
So it's not completely unreasonable to say users could have known that they are not getting some guaranteed behavior when violating |
@Voultapher The quoted docs was from |
For what it's worth, if we're being extra pedantic, |
I don't care much about the My point is on If you look for |
@mitsuhiko For the record, I understand your frustration, and I was just being pedantic. It's not like we actively sought out to insert a panic for bad ordering, but we ended up in a code path in our implementation where it would not be sound to continue the sort if we detected an invariant was not kept after a loop, which could only occur if the order was not total. At this point we had two options, have a code path in the sort that with full knowledge silently returns badly ordered data to the user, or panic. I initially chose the latter for
And they are all incorrect. I'm sure some uses of this pattern were done with full knowledge that (e.g. in the context of floats) if there exists a single NaN then the entire ordering is unspecified, but I doubt it's even a majority. I expect that most users don't even realize it's wrong, or if they do, that they believe only the position of NaNs would be corrupted rather than the entire order. |
@mitsuhiko I am kind of curious about this:
What was the intentional total order violation, and what is the "mostly stable order" you expected? |
Going at it a little more empirically, using grep.app to get some idea of the scale of the problem, the baseline search They want to sort In effect we have ~0.4% of the |
@mitsuhiko could you please elaborate why you think the |
(Note that everything in the public API here in question is safe functions and safe traits. There is no way to invoke UB using only safe functions and safe traits, or it is a bug in the library. If by "library UB" you mean something other than "misuse of a library that can lead to (language) UB but is not necessarily immediate (language) UB" (e.g. non-UTF-8 data in a The prior documentation of |
@zachs18 By "library UB" I mean violating the requirements of the standard library, in which case in principle any sound behavior is permitted (or any behavior at all for |
As I said before, the implementation on I'm happy if I'm the only idiot that did this wrong :)
In my case anything other than a panic. You would have noticed that the snapshots were probably somewhat unstable. There are a few things that make snapshots unstable and I deal with them as they appear. My general expectation would have been arbitrary order since this is also what other languages do. I'm generally not aware of a sufficiently high level language (please ignore C++ here :)) that errors if a comparator does not have total order but maybe I missed it. Usually you just get nonsense. Given how rare NaNs show up I think in practice you will most likely run into those panics very infrequently and end up surprised. |
Genuine question, why sort at all when arbitrary order is fine? |
I don't want to pile up on this too much but in an ideal world the way sorting would work is that you have a fallible operation you can respond to when it fails. Both random order and panics are bad because they happen rarely so you will have written code that is buggy for a long time until it starts doing the wrong thing once by either crashing or doing the wrong thing. A fallible operation makes it explicit what went wrong. Cases where I think crashing is worse than just not sorting would be computer games. Quite often in simulations you end up getting a It's not so much that it can be in arbitrary order in all times, but the failure more of it just being not sorted is preferable to outright crashing immediately. To be clear: in general I'm not against this change, I just was surprised that it was not seen as backwards incompatible as I would assume quite a few people took it for granted that at worse you get a bad sorting order, and not a panic. Insta is a special case I don't think is worth thinking too much about. It's a bug that needs fixing and in part is fixed, it just happens to be the crate I remembered as being affected so that's where I started. |
For what it's worth, if it turns out the 'panic on invalid I'd like to think we did our due diligence but perhaps we should've done more. Perhaps there could be a better formal process in the future for adding panics to existing function calls, although I guess it doesn't come up a lot as it's a very rare scenario in which it could even be considered. |
I see, thanks for explaining. In my experience, coming from C++ it's a breath of fresh air that Rust prides itself on being explicit and surfacing bugs to users earlier than later, and while a compile-time error for bad The word "crash" is a little unfortunate in this conversation, as it usually implies UB and a segfault, panics can be recovered from and don't necessarily kill a process. |
We add new panics all the time to the standard library. To not have the freedom to do so would prevent internal maintenance, nevermind actually reviewing issues with the API and solving them. I do not believe anything the standard library documents should be considered as a promise not to panic. I say this because there are quite a number of functions in the standard library that panic on certain inputs that do not document that they do so because what happens is they call code that then decides to panic on that input. We don't necessarily consider these calls to be stable interface details, so we don't document that panic. To believe that we have even documented all existing panics is incorrect. |
Explicitly throws: Merely ominously "requires": |
I think it's fine to panic, but I suspect that this change will be incredibly invasive for quite a while. On the one hand because of a lot of pre-existing code. Ruffle for instance recompiled with the latest version of rustc and seems to already cause failures in production: ruffle-rs/ruffle#17785 @workingjubilee since Java and C# defines total ordering for floats for a while I suppose I did not run into that. However for C# in particular I'm fairly certain that I have implemented non total compare order functions before and I don't think I have every run into an exception being thrown. Again, my memory might be hazy, but I have been writing code for .NET for quite a few years on and off and I definitely do not remember getting exceptions from sort. In general creating non total order functions is very simple so I would expect this to come up more more commonly if throwing exceptions/failing would be common. I do absolutely remember bad sort results a ton. But just to I guess hammer down this point: I'm not suggesting this should be changed, at least I do not have a strongly held opinion. I'm primarily surprised about how swift this was changed without much of a discussion ahead of it as I do expect regressions through this change. |
Hm. I'm curious, @mitsuhiko, did you primarily use |
I think both. I looked into this a bit more now out of curiosity and the ones with comparer can definitely fail. The sort code in roslyn basically appears to just catch when the sort algorithm would attempt to index out of bounds and then raises an "comparer is bogus" error. So I guess the situation is similar to here. |
Is there any chance for the old algorithm to be copied into a separate crate, so that people who explicitly relied on the old behavior could easily migrate to it, instead of freezing their Rust version at 1.80? |
@adrian17 If the only behavior you find concerning is the panic, you can catch the panic with At least if you don't run with panic = abort. |
Does WASM allow unwinding? Ruffle compiles to WASM and relied on this behavior. |
I don't know. Regardless, if we could detect |
IIUC as of #111322 one can configure their WASM build to use |
Dinnerbone said on the Ruffle Discord "Even if we could just catch the panic as they suggest, the behaviour will have changed and we'd just go from content working, to content panicking, to content not working" |
@danielhjacobs I don't follow. If you catch the panic the result is the same you would get in the past when the comparator is ill-formed: an array with an unspecified order. If you relied on some specific order being given back when sorting with ill-formed comparators... I'm sorry to say we've never guaranteed that behavior and I can safely say we never will. E.g. I quote from the 1.79
|
@orlp that's correct as per documentation but in reality that's usually not what happens. I'm not aware of what ruffle is doing but there is a difference between "some subslices are ordered" vs "everything is in the order we started with". For instance if you try to sort things by z-index there is a significant rendering artifact difference between random order and just some items being sorted wrong. There can be a significant difference between "random order" and "mostly sorted". And for ruffle in particular if I understand the code correctly they do not control the comparision function as it's user supplied. So the only choice here is to copy the/a sorting algorithm into a crate and to remove the panic. |
@mitsuhiko Even if we wanted to continue the sort instead of panicking, we can't. I'll remind you of this:
It currently only occurs in the smallsort I believe, so I suppose we could try to sort the rest of the recursive calls (which is what a return instead of a Besides, the old algorithms also by no means always returned a "mostly sorted" output, e.g. consider this running with the old sort on 1.80: use std::cmp::Ordering;
fn main() {
#[rustfmt::skip]
let v = vec![
f32::NAN, 85.0, 47.0, 17.0, 34.0, 18.0, 75.0, f32::NAN, f32::NAN, 22.0,
41.0, 38.0, f32::NAN, 36.0, 42.0, 91.0, f32::NAN, 62.0, 84.0, 31.0,
59.0, 31.0, 76.0, 77.0, f32::NAN, 22.0, 56.0, 26.0, 34.0, 81.0,
f32::NAN, 33.0, 92.0, 69.0, 27.0, 14.0, f32::NAN, 29.0, 33.0, 25.0,
81.0, f32::NAN, 98.0, 77.0, 89.0, 67.0, 84.0, 79.0, 33.0, 34.0
];
{
let mut v_clone = v.clone();
v_clone.sort_by(|a, b| a.partial_cmp(b).unwrap_or(Ordering::Equal));
println!("stable: {v_clone:?}\n");
}
{
let mut v_clone = v.clone();
v_clone.sort_unstable_by(|a, b| a.partial_cmp(b).unwrap_or(Ordering::Equal));
println!("unstable: {v_clone:?}");
}
} This gives output: stable: [NaN, 17.0, 18.0, 31.0, 34.0, 38.0, 41.0, 47.0, 59.0, 75.0, 76.0,
77.0, 85.0, NaN, NaN, 22.0, NaN, 36.0, 42.0, 91.0, NaN, 31.0, 62.0, 84.0, NaN,
22.0, 26.0, 34.0, 56.0, 81.0, NaN, 14.0, 27.0, 33.0, 69.0, 92.0, NaN, 25.0,
29.0, 33.0, 81.0, NaN, 33.0, 34.0, 67.0, 77.0, 79.0, 84.0, 89.0, 98.0]
unstable: [NaN, 36.0, 18.0, 34.0, 34.0, 33.0, 22.0, NaN, NaN, 22.0, NaN, 25.0,
NaN, NaN, 33.0, 29.0, NaN, NaN, 14.0, 31.0, 27.0, 31.0, 33.0, NaN, 17.0, 34.0,
26.0, 91.0, 47.0, 81.0, 77.0, 76.0, 62.0, 69.0, 59.0, 84.0, 56.0, 75.0, 85.0,
38.0, 81.0, 41.0, 42.0, 77.0, 89.0, 67.0, 84.0, 79.0, 92.0, 98.0] EDIT: I believe that specifically using |
For the record, ideally we shouldn't be using the stdlib sort anyway, we should eventually roll a custom algorithm. But so far we've been using stdlib sort so as a stopgap, we're indeed considering copying the old sorting algorithm.
From Ruffle's side, we're absolutely not suggesting changing the current algorithm, we're just looking for the easiest way to keep our current code working without freezing Rust to 1.80. |
Derive `Priority::Eq` from `Priority::PartialOrd`. After INC-875, the custom `Ord` implementation for `Priority` was a suspect for occurring panics rust-lang/rust#129561. It turned out the panic occurred somewhere else, but making `Eq` and `PartialOrd` consistent cannot hurt.
1.81 brings the new sort implementations, added in #124032. The new implementations will panic if they detect that Ord doesn't deliver a total order (with new
# Panics
sections added to documentation:These panics represent "bugs" in user code, but it seems plausible that the new source of panics in user programs may be rather unexpected. At minimum, it seems likely that we want some mention of this in the compatibility notes section of 1.81's release notes, but I'm not sure how to best phrase it; "total order violations" is probably considerably too opaque for most of our audience.
I didn't see any discussion of these panics in the PR landing this work, though it may have happened elsewhere, so opening this to (a) make T-libs-api aware, if they weren't already and (b) ask for guidance on the right way to communicate this and any longer-form advice we can recommend/provide on what to do when such a panic is encountered. (Note that this affected at least one sort in our own code, encountered during the bootstrap bump: #128083 (comment)).
I'd also love some clarity on the PR description's phrasing of "detecting strict weak ordering violations with a high chance" -- what does high chance here mean?
cc @Voultapher @orlp
The text was updated successfully, but these errors were encountered: