-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow comparisons between integers of different types #2021
Conversation
|
||
Implementation for signed-singed and unsigned-unsigned pairs should promote values to larger type first, then perform comparison. | ||
|
||
Implementation for signed-unsigned pairs should first check if signed value less than zero. If not, then it should promote both values to signed type with the same size as larger argument type and perform comparison. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that promotion should be to the "unsigned" type since an unsigned type can hold values larger than a signed type of the same size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both signed and unsigned types can hold values other type cannot.
You either have to check that signed value is non-negative then cast to unsigned, or have to check that unsigned value is not greater than maximum value of signed type.
Example:
fn less1(a: i32, b: u32) -> bool {
if a < 0 {
return true;
} else {
return (a as u32) < b;
}
}
fn less2(a: i32, b: u32) -> bool {
if b > std::u32::MAX {
return true;
} else {
return a < (b as i32);
}
}
I personally find check for non-negativeness a bit cleaner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean std::i32::MAX as u32
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally find check for non-negativeness a bit cleaner.
It's also less prone to the typo of checking whether a u32
is greater than std::u32::MAX
rather than std::i32::MAX
.
EDIT: ...and I just realized that's what @eddyb meant by his comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@VFLashM You have a typo in the RFC. It says "promote both values to signed type", which actually overflows for b > std::i32::MAX
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, darn. Your're right. That's a typo in RFC. I absolutely meant promotion to unsigned.
Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The RFC says:
If not, then it should promote both values to signed type
Your first code snippet says:
return (a as u32) < b;
I think this is all that @TimNN was saying: after checking for less-than-zero, the as
afterwards should be to the unsigned type, not the signed type as the RFC currently says.
EDIT: the previous two comments hadn't appeared when I wrote this :)
It is completely unnecessary (and impossible due to backwards compatibility) to change |
I suddenly found that operator overloading only requires you to implement partial traits. Makes sense in hindsight. You're absolutely right, changing It doesn't look impossible though. It obviously should be defined in backward compatible form:
But anyway, you're absolutely right, it's more about missing implementations. |
I didn't actually know that was possible. My bad. :) |
… questions Ord->PartialOrd typo is fixed
Any chance we could shoehorn in more operators? |
@le-jzr You just meant that informally as "those specific numeric types which implement |
Yes, I was thinking of primitive integer types only. Sorry for the confusion. :) |
I thought about compound operators too, but I think it's a separate change. |
Really old discussion about this and other ideas, for the record: https://internals.rust-lang.org/t/implicit-widening-polymorphic-indexing-and-similar-ideas/1141 |
By the way,
This is not entirely true. |
If the signed integer in a signed-unsigned pair is the larger type, converting the unsigned integer to the larger type and then comparing would avoid the branch. I don't know enough about assembly to know if this is what already happens at that level when comparing integers of different sizes, but it feels like it ought to be. There might be something to be said for always converting signed-unsigned pairs where the signed type isn't the larger type to an integer that's one level above it, then comparing. This would work for everything through 32 bits, though I imagine doing it from 64 to 128 might be prohibitively expensive. The solution proposed in this RFC concerns me because it adds an invisible branch inside a comparison of primitive types. Given that Rust is a systems language, being able to tell where your code might start causing tons of branch mispredictions is valuable for performance optimization. This approach may also prevent performance of autovecctorization through obfuscation of those patterns. It would be incredibly hard to teach people about this pitfall because we would have to explain these issues, neither of which are exactly newbie-level. The only way to see that it's happening is to read assembly. Putting it in the nomicon makes it hard to find. If I had my way, I would like Rust to allow comparison and mathematical operations between all pairs of signed types and all pairs of unsigned types, but not to invisibly cross between them. Essentially C's rules without the invisible viral infecting caused by unsigned. I'm mostly playing devil's advocate. This RFC would make my personal code more convenient. Nonetheless I have done some high performance programming and SIMD, and having otherwise-invisible branches just show up like this concerns me. |
Micro-optimization of generated assembly is not exactly a newbie-level concept either. I think that if someone is advanced enough to care about branch misprediction, they can easily understand that Other than that, most cases can be solved just by extending one or both sides, and I don't see a reason why that wouldn't happen whenever possible. |
You're right. Widening to signed will likely be faster up to machine word size. Branch prediction will likely be bad regardless. In most cases there's already a branch there, after all. |
…signed, more examples
Do not propose a particular implementation in the RFC and you’ll avoid the unnecessary yet infinite bikeshed. Instead list the requirements, expected behaviour and all that sort of stuff. Implementors will be able to figure out the best implementation for such a trivial feature themselves. |
Also, |
You're probably right. I just felt that the fact that it can be done without overhead in many cases is important enough.
Rust code snippets are just an illustration of approach, not an exact implementation. |
At first, I wanted to object to this, because I like how Rust prevents mixing integer types, and avoids automatic promotions/demotions, unlike C. However, if the compiler only does safe comparisons, never any that could allow overflow or similar, then I don't have a fundamental objection, and I like the idea of helping the user do the right thing rather than forcing a potentially incorrect use of What algorithm would this use to compare |
It has to be something along these lines:
Not sure about exact asm implementation, but it will be more expensive than regular comparisons. |
@le-jzr When looking at an integer comparison, it is useful to obviously know what is going on. I don't like the idea of implementations that do potentially nonobvious things with primitive types that presumably map directly to the hardware. Let's assume that I am advanced enough to read assembly. Where do we document this? How do I find out why my assembly is odd? When I look at Rust vs. equivalent C and have the knowledge to know what kinds of patterns should be efficient, how do I find out what strange difference of Rust makes it not work? How do I find the one spot that putting an explicit cast somehow has better performance than letting the compiler implicitly do it and understand why? If we want to do this, Rust should develop and have clearly documented implicit integer conversion rules. Dropping the implementation from the RFC is fine, just so long as what's going on remains as obvious as it does in C/C++ (or better, if we don't let unsigned start infecting and screwing up seemingly-straightforward computations). Specifically, I'd advocate for dropping i64 to u64 and potentially not allow promotion to i128. I wouldn't advocate for dropping implementation from the RFC or the discussion; this is the first time I've seen an RFC where I think that's actually important. |
You can also go with |
@camlorn |
@camlorn |
There likely won't be any difference at asm level. Then again, I'm not proposing an exact asm implementation. The important part is that it will be more expensive than regular cast. |
Can this maybe get some libs-team attention under the umbrella of the ergonomics initiative before the impl period begins? (I see @aturon just self-assigned it - possibly with that intent?) |
@glaebhoerl Yep, I will try to push on this. If anyone wants to summarize the current state (i.e. major open questions or tradeoffs), that'd be super helpful. cc @scottmcm :-) |
I think the major unknown here is still how to do this without huge inference breakage. To use an example from above, in The main back-and-forth is essentially whether this is useful. On one side is appreciation that this has intuitive behaviour and makes the safe-and-correct thing easy, since it's much easier to do this wrongly than correctly today, and comparison can be done without pitfalls like surprising overflows. Simultaneously there's criticism that this hides mistakes elsewhere (that are compilation errors today), makes the machine code emitted by the operators less obvious, leads Rust towards C-style fuzziness about types, and that the correct thing to do is to change types elsewhere so that there isn't a type mismatch. |
Providing a gist of my earlier comments: Integer types are different. Some can be negative, some positive, they have different overflowing behaviours. This is not something you can gloss over. I see the strong typing regarding to integer types as a feature of Rust, not a problem. This RFC would undermine that feature and is therefore bad. This strong typing of integer types allows to play "type lego". If you have a C/C++ doesn't follow that approach and I don't really like that. Of course, you sometimes have cases where you actually want to compare integers of different types. Right now this is not easy to do, e.g. comparing usize to isize. There is definitely demand for that, and having some way to do these comparisons is better than forcing people to solve the issue badly (there is no good way to compare usize to isize with casting only). However, for these cases I'd like if there was an opt-in way to do it, as in you somehow explicitly state you want a comparison of different types. End of gist. As an alternative proposal, you could have an |
@est31 I very much like the idea of having a convenient way to explicitly say "I'd like to compare these two numbers, please DTRT", while otherwise preserving integer type distinctions. |
Another alternative would be having the full set of |
@scottmcm ( and this actually answers to most points from @est31 as well )
This is a serious concern indeed. In theory type of integral literal should be polymorphic "any integral type wide enough to contain value specified". In practice I'd rather require explicit types for integrals not fitting into
I'd argue that it doesn't. It does help sometime, when it prevents you from comparing values of semantically different types that happened to be implemented by different integral types (as in comparing seconds stored as i32 and milliseconds stored as i64), but it's more of a coincidence than the rule (it won't prevent you from comparing seconds and milliseconds if both are stored as i32). The correct way to avoid such errors is to create a new distinct type for each semantically different type. On top of that, in reality there almost always will be some other operation, which is totally incorrect for different types, and that operation will be caught by type system (cross-type assignment at the very least).
That's true. On top of that it's not only less obvious, but also less efficient.
Strictness is only useful because it helps finding problems early. If change enhances ergonomics and don't introduce new classes of errors, then I see no reason in strictness for sake of strictness.
The whole point is that cross-type comparison is mathematically "correct". It may reduce performance, but overall your code must have some other problem for it to be incorrect. For example, in most such cases you'll also have cross-type assignment, which is still a compilation error. The only case that won't be caught somewhere else is logical type mishap, but then again, there isn't much you can do about it anyway. |
|
||
`PartialEq` and `PartialOrd` should be implemented for all pairs of signed/unsigend 8/16/32/64/(128) bit integers and `isize`/`usize` variants. | ||
|
||
Implementation for signed-singed and unsigned-unsigned pairs should promote values to larger type first, then perform comparison. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: "signed-singed" should be "signed-signed"
Thanks @VFLashM for the RFC and @scottmcm, for the summary, and others for adding their thoughts! We discussed this RFC in the lang team meeting, and while there's continued interest in improving ergonomics around numerics (without sacrificing reliability), the key pitfall here is unknowns around type inference. In particular, it seems feasible that we could implement something like this RFC with some kind of inference fallback scheme, but such schemes in the past have caused headaches. During the impl period, we will continue ongoing work overhauling the way type inference and the trait system work, one component of which is better support for "suggestions" as a general mechanism for fallback. At that point, we could experiment with adding more comparison impls with a designated "default" one to apply. Until then, though, we're really not in a position to make the proposed change. In the meantime, in the interest of keeping our tracker clear, I'm going to: @rfcbot fcp postpone |
Team member @aturon has proposed to postpone this. The next step is review by the rest of the tagged teams: No concerns currently listed. Once these reviewers reach consensus, this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up! See this document for info about what commands tagged team members can give me. |
I understand that the inference aspects are tricky, and that we probably can't make this change until we have some way to say "but default them to being the same if there are no other constraints are in play." That said, I have wanted this a lot considering the amount of Rust code I've written, and it's frustrating every single time because it just feels like the compiler is getting in my way for no reason. In every case, behavior described here is what I wanted, and it was quite annoying to have to jump through hoops to get it. It also came up when I was trying to introduce a friend to Rust by showing how I would translate some Python code of theirs into Rust. It was rather embarrassing to explain why I had to write my own function just to properly compare a signed integer to an array index. I would definitely rate this as one of my top paper cuts using the language, and I would rather have this before most if not all of the other currently proposed features (as much as I like some of them). |
Why did you have to write your own function? That is, isn't it an |
It's been a while, but I believe it was because I was comparing an i32 to a usize, and needed to do the right thing if the i32 was less than zero. Thus, it was something like if i32_val < 0 {
return Less;
} else {
return (i32_val as usize).cmp(&usize_val);
} Now, I think in this particular case (again, it's been a while), it would have been okay to instead cast both numbers to |
I'm personally pretty happy that this got postponed, because I think it shouldn't be done at all, at least not without an actual opt-in into the feature. |
🔔 This is now entering its final comment period, as per the review above. 🔔 |
Well, technically it's just a trait implementation. It doesn't have to be imported by default. Manual import to opt-in is reasonable, even though I personally would rather have an opt-out. |
You can't import trait impls, you get them all for free when you import the trait. |
Interesting. I thought that if |
@VFLashM To expand on it: |
Any concerns for threading? Aren't all primitive numeric types 'sync' and 'send'? Doesn't that mean I could send a primitive to another thread. Then couldn't that other thread change the value of the primitive between the checks and casts, etc? For example: if ( someI64 < 0 ) {
return true;
} else {
(someI64 as u64) < someU64;
} So, if another thread changes the value right after the first comparison, would that not potentially give the wrong result? |
@gbutler69 That's not possible – Rust prevents mutable aliasing except for some special cases (Values inside |
@golddranks - Ah, yes! Thanks. I'm still solidifying my understanding of the Rust rules and what they do and don't allow. I figured there was something I was missing. |
Closing as postponed -- thanks, all, for this discussion! We'll look forward to revisiting this topic after the dust settles with changes to type inference and the trait system during the impl period! |
Rendered
Edit: Fixed typo in implementation description (signed->unsigned), added an examples.
Edit: Mentioned that
Ord
andEq
changes as optional, added an example.Edit: Added widening to signed as a more efficient way to compare small signed/unsigned pairs.
Edit: Added note about branching in implementation. Explicit branching removed from example. Potential security risks mentioned.