Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow comparisons between integers of different types #2021

Closed
wants to merge 9 commits into from

Conversation

VFLashM
Copy link

@VFLashM VFLashM commented Jun 7, 2017

Rendered

Edit: Fixed typo in implementation description (signed->unsigned), added an examples.
Edit: Mentioned that Ord and Eq changes as optional, added an example.
Edit: Added widening to signed as a more efficient way to compare small signed/unsigned pairs.
Edit: Added note about branching in implementation. Explicit branching removed from example. Potential security risks mentioned.


Implementation for signed-singed and unsigned-unsigned pairs should promote values to larger type first, then perform comparison.

Implementation for signed-unsigned pairs should first check if signed value less than zero. If not, then it should promote both values to signed type with the same size as larger argument type and perform comparison.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that promotion should be to the "unsigned" type since an unsigned type can hold values larger than a signed type of the same size.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both signed and unsigned types can hold values other type cannot.
You either have to check that signed value is non-negative then cast to unsigned, or have to check that unsigned value is not greater than maximum value of signed type.

Example:

fn less1(a: i32, b: u32) -> bool {
    if a < 0 {
        return true;
    } else {
        return (a as u32) < b;
    }
}

fn less2(a: i32, b: u32) -> bool {
    if b > std::u32::MAX {
        return true;
    } else {
        return a < (b as i32);
    }
}

I personally find check for non-negativeness a bit cleaner.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean std::i32::MAX as u32?

Copy link

@ssokolow ssokolow Jun 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally find check for non-negativeness a bit cleaner.

It's also less prone to the typo of checking whether a u32 is greater than std::u32::MAX rather than std::i32::MAX.

EDIT: ...and I just realized that's what @eddyb meant by his comment.

Copy link

@le-jzr le-jzr Jun 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@VFLashM You have a typo in the RFC. It says "promote both values to signed type", which actually overflows for b > std::i32::MAX.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, darn. Your're right. That's a typo in RFC. I absolutely meant promotion to unsigned.
Thanks.

Copy link
Contributor

@glaebhoerl glaebhoerl Jun 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@VFLashM

The RFC says:

If not, then it should promote both values to signed type

Your first code snippet says:

return (a as u32) < b;

I think this is all that @TimNN was saying: after checking for less-than-zero, the as afterwards should be to the unsigned type, not the signed type as the RFC currently says.

EDIT: the previous two comments hadn't appeared when I wrote this :)

@le-jzr
Copy link

le-jzr commented Jun 7, 2017

It is completely unnecessary (and impossible due to backwards compatibility) to change Ord and Eq. Rather, the RFC should be just about adding the missing implementations of PartialOrd<T> and PartialEq<T>.

@VFLashM
Copy link
Author

VFLashM commented Jun 7, 2017

It is completely unnecessary (and impossible due to backwards compatibility) to change Ord and Eq.

I suddenly found that operator overloading only requires you to implement partial traits. Makes sense in hindsight. You're absolutely right, changing Ord and Eq is unnecessary.

It doesn't look impossible though. It obviously should be defined in backward compatible form:

pub trait Eq<Rhs = Self>: PartialEq<Rhs> { }

pub trait Ord<Rhs = Self>: Eq<Rhs> + PartialOrd<Rhs> {
    fn cmp(&self, other: &Rhs) -> Ordering;
}

But anyway, you're absolutely right, it's more about missing implementations.

@le-jzr
Copy link

le-jzr commented Jun 7, 2017

It doesn't look impossible though. It obviously should be defined in backward compatible form

I didn't actually know that was possible. My bad. :)

@le-jzr
Copy link

le-jzr commented Jun 7, 2017

Any chance we could shoehorn in more operators?
In particular, there is no reason whatsoever for AddAssign<T> and friends not to be implemented for at least the types where T: Into<Self>.

@glaebhoerl
Copy link
Contributor

@le-jzr You just meant that informally as "those specific numeric types which implement Into<Self>, i.e. which are cleanly promotable", right? Actually using where T: Into<Self> in the impl sounds like potentially a bad idea if there are any surprising impls for non-numeric types (as far as I can tell the IpV4Addr impl only goes in the other direction, which is what I originally thought of, but there may be others).

@le-jzr
Copy link

le-jzr commented Jun 7, 2017

Yes, I was thinking of primitive integer types only. Sorry for the confusion. :)

@VFLashM
Copy link
Author

VFLashM commented Jun 7, 2017

I thought about compound operators too, but I think it's a separate change.
You can find some discussion here.

@glaebhoerl
Copy link
Contributor

Really old discussion about this and other ideas, for the record: https://internals.rust-lang.org/t/implicit-widening-polymorphic-indexing-and-similar-ideas/1141

@le-jzr
Copy link

le-jzr commented Jun 7, 2017

By the way,

The rest of rust math prohibits mixing different types and requires explicit casts. Allowing heterogeneous comparisons (and only comparisons) makes rust math somewhat inconsistent.

This is not entirely true. Shl and Shr traits are implemented for all right-hand integer types. Div is very similar to those, in asymmetry and closedness, but is only implemented for Self. So the math is already somewhat inconsistent in this regard.

@ahicks92
Copy link

ahicks92 commented Jun 7, 2017

If the signed integer in a signed-unsigned pair is the larger type, converting the unsigned integer to the larger type and then comparing would avoid the branch. I don't know enough about assembly to know if this is what already happens at that level when comparing integers of different sizes, but it feels like it ought to be.

There might be something to be said for always converting signed-unsigned pairs where the signed type isn't the larger type to an integer that's one level above it, then comparing. This would work for everything through 32 bits, though I imagine doing it from 64 to 128 might be prohibitively expensive.

The solution proposed in this RFC concerns me because it adds an invisible branch inside a comparison of primitive types. Given that Rust is a systems language, being able to tell where your code might start causing tons of branch mispredictions is valuable for performance optimization. This approach may also prevent performance of autovecctorization through obfuscation of those patterns. It would be incredibly hard to teach people about this pitfall because we would have to explain these issues, neither of which are exactly newbie-level. The only way to see that it's happening is to read assembly. Putting it in the nomicon makes it hard to find.

If I had my way, I would like Rust to allow comparison and mathematical operations between all pairs of signed types and all pairs of unsigned types, but not to invisibly cross between them. Essentially C's rules without the invisible viral infecting caused by unsigned.

I'm mostly playing devil's advocate. This RFC would make my personal code more convenient. Nonetheless I have done some high performance programming and SIMD, and having otherwise-invisible branches just show up like this concerns me.

@le-jzr
Copy link

le-jzr commented Jun 7, 2017

It would be incredibly hard to teach people about this pitfall because we would have to explain these issues, neither of which are exactly newbie-level.

Micro-optimization of generated assembly is not exactly a newbie-level concept either. I think that if someone is advanced enough to care about branch misprediction, they can easily understand that u64 < i64 requires two branches to work correctly.

Other than that, most cases can be solved just by extending one or both sides, and I don't see a reason why that wouldn't happen whenever possible.

@VFLashM
Copy link
Author

VFLashM commented Jun 7, 2017

If the signed integer in a signed-unsigned pair is the larger type, converting the unsigned integer to the larger type and then comparing would avoid the branch. I don't know enough about assembly to know if this is what already happens at that level when comparing integers of different sizes, but it feels like it ought to be.

You're right. Widening to signed will likely be faster up to machine word size.

Branch prediction will likely be bad regardless. In most cases there's already a branch there, after all.

@nagisa
Copy link
Member

nagisa commented Jun 7, 2017

Do not propose a particular implementation in the RFC and you’ll avoid the unnecessary yet infinite bikeshed. Instead list the requirements, expected behaviour and all that sort of stuff.

Implementors will be able to figure out the best implementation for such a trivial feature themselves.

@nagisa
Copy link
Member

nagisa commented Jun 7, 2017

Also, PartialEq and PartialOrd are not exactly what is used to do comparisons on primitive built-in types. Instead the compiler translates the operations directly. This means that this RFC is only half-done if you are proposing an exact implementation.

@VFLashM
Copy link
Author

VFLashM commented Jun 7, 2017

Do not propose a particular implementation in the RFC and you’ll avoid the unnecessary yet infinite bikeshed. Instead list the requirements and all that sort of stuff.

You're probably right. I just felt that the fact that it can be done without overhead in many cases is important enough.

Also, PartialEq and PartialOrd are not exactly what is used to do comparisons on primitive built-in types.

Rust code snippets are just an illustration of approach, not an exact implementation.
I think it's quite obvious that this rfc changes the way built-ins comparisons are handled internally.

@joshtriplett
Copy link
Member

At first, I wanted to object to this, because I like how Rust prevents mixing integer types, and avoids automatic promotions/demotions, unlike C. However, if the compiler only does safe comparisons, never any that could allow overflow or similar, then I don't have a fundamental objection, and I like the idea of helping the user do the right thing rather than forcing a potentially incorrect use of as.

What algorithm would this use to compare i64 and u64?

@VFLashM
Copy link
Author

VFLashM commented Jun 7, 2017

What algorithm would this use to compare i64 and u64?

It has to be something along these lines:

fn less_than(a: i64, b: u64) -> bool {
    if a < 0 {
        true
    } else {
        (a as u64) < b
    }
}

Not sure about exact asm implementation, but it will be more expensive than regular comparisons.

@ahicks92
Copy link

ahicks92 commented Jun 7, 2017

@le-jzr
I kind of disagree. In C/C++, there are rules of thumb: don't put if statements inside for loops if you can avoid it, for instance. Simple loops that don't do funny tricks are more likely to vectorize. Etc. You don't have to read assembly to take advantage of these ideas. Benchmarks can show you if you're successful.

When looking at an integer comparison, it is useful to obviously know what is going on. I don't like the idea of implementations that do potentially nonobvious things with primitive types that presumably map directly to the hardware.

Let's assume that I am advanced enough to read assembly. Where do we document this? How do I find out why my assembly is odd? When I look at Rust vs. equivalent C and have the knowledge to know what kinds of patterns should be efficient, how do I find out what strange difference of Rust makes it not work? How do I find the one spot that putting an explicit cast somehow has better performance than letting the compiler implicitly do it and understand why?

If we want to do this, Rust should develop and have clearly documented implicit integer conversion rules. Dropping the implementation from the RFC is fine, just so long as what's going on remains as obvious as it does in C/C++ (or better, if we don't let unsigned start infecting and screwing up seemingly-straightforward computations). Specifically, I'd advocate for dropping i64 to u64 and potentially not allow promotion to i128. I wouldn't advocate for dropping implementation from the RFC or the discussion; this is the first time I've seen an RFC where I think that's actually important.

@eddyb
Copy link
Member

eddyb commented Jun 7, 2017

You can also go with (a < 0) | ((a as u64) < b).

@le-jzr
Copy link

le-jzr commented Jun 7, 2017

@camlorn
Comparing any signed integer with register-sized unsigned integer needs more instructions. Using anything bigger than gp register (e.g. i128) needs more (or at least different) instructions. Those are your rules of thumb, if you need them. It has nothing to do with Rust in particular, it's just general knowledge.

@le-jzr
Copy link

le-jzr commented Jun 7, 2017

@camlorn
It is especially worth pointing out that you are making the assumption that assembly translation is or should be obvious and 1:1 to language constructs. It is not, and it has never been. Not even in C. For example, any use of u64/uint64_t on 32b platforms will result in more complex code. How do you know? You need to learn how translation to assembly works. Can't/don't want to learn it? Someone else did and wrote a book with those rules of thumb. So what if rules for Rust are slightly different from C? It's a different language, it needs different rules.

@VFLashM
Copy link
Author

VFLashM commented Jun 7, 2017

You can also go with (a < 0) | ((a as u64) < b).

There likely won't be any difference at asm level. Then again, I'm not proposing an exact asm implementation. The important part is that it will be more expensive than regular cast.

@aturon aturon self-assigned this Aug 22, 2017
@glaebhoerl
Copy link
Contributor

Can this maybe get some libs-team attention under the umbrella of the ergonomics initiative before the impl period begins? (I see @aturon just self-assigned it - possibly with that intent?)

@aturon aturon added the Ergonomics Initiative Part of the ergonomics initiative label Aug 30, 2017
@aturon
Copy link
Member

aturon commented Aug 30, 2017

@glaebhoerl Yep, I will try to push on this. If anyone wants to summarize the current state (i.e. major open questions or tradeoffs), that'd be super helpful.

cc @scottmcm :-)

@scottmcm
Copy link
Member

I think the major unknown here is still how to do this without huge inference breakage. To use an example from above, in x == 1_000_000_000_000 how can the rules still use the type information from x to know what the literal should be? Defaulting the literal to i32 and changing behaviour due to truncation despite compiling is clearly unacceptable, worse even than compilation failure.

The main back-and-forth is essentially whether this is useful. On one side is appreciation that this has intuitive behaviour and makes the safe-and-correct thing easy, since it's much easier to do this wrongly than correctly today, and comparison can be done without pitfalls like surprising overflows. Simultaneously there's criticism that this hides mistakes elsewhere (that are compilation errors today), makes the machine code emitted by the operators less obvious, leads Rust towards C-style fuzziness about types, and that the correct thing to do is to change types elsewhere so that there isn't a type mismatch.

@est31
Copy link
Member

est31 commented Aug 30, 2017

Providing a gist of my earlier comments:

Integer types are different. Some can be negative, some positive, they have different overflowing behaviours. This is not something you can gloss over.

I see the strong typing regarding to integer types as a feature of Rust, not a problem. This RFC would undermine that feature and is therefore bad.

This strong typing of integer types allows to play "type lego". If you have a get_counter() function and it returns u32, and then you compare it with some cached value of some earlier call to check whether it increased, it works. You will get an error if you swap the cached value with an u64 from some get_timestamp() function.

C/C++ doesn't follow that approach and I don't really like that.

Of course, you sometimes have cases where you actually want to compare integers of different types. Right now this is not easy to do, e.g. comparing usize to isize. There is definitely demand for that, and having some way to do these comparisons is better than forcing people to solve the issue badly (there is no good way to compare usize to isize with casting only). However, for these cases I'd like if there was an opt-in way to do it, as in you somehow explicitly state you want a comparison of different types.

End of gist.

As an alternative proposal, you could have an ComparableInt<T>(T); type which wraps an int and provides ComparableInt<u32> <-> ComparableInt<isize> and similar type comparisons (also possibly including ComparableInt<u32> <-> isize). The proposed way of obtaining a wrapped int is to call a comparable(self) function on it.

@joshtriplett
Copy link
Member

@est31 I very much like the idea of having a convenient way to explicitly say "I'd like to compare these two numbers, please DTRT", while otherwise preserving integer type distinctions.

@le-jzr
Copy link

le-jzr commented Aug 30, 2017

Another alternative would be having the full set of less_than<T>(), less_eq<T>(), equal<T>(), etc. either as methods on integer types, or as functions in core::cmp. The compiler can then suggest use of these methods/functions on type-mismatched comparisons.

@VFLashM
Copy link
Author

VFLashM commented Aug 30, 2017

@scottmcm ( and this actually answers to most points from @est31 as well )

I think the major unknown here is still how to do this without huge inference breakage. To use an example from above, in x == 1_000_000_000_000 how can the rules still use the type information from x to know what the literal should be? Defaulting the literal to i32 and changing behavior due to truncation despite compiling is clearly unacceptable, worse even than compilation failure.

This is a serious concern indeed. In theory type of integral literal should be polymorphic "any integral type wide enough to contain value specified". In practice I'd rather require explicit types for integrals not fitting into i32. But then again, it will complicate type inference regardless of anything.

hides mistakes elsewhere (that are compilation errors today)

I'd argue that it doesn't. It does help sometime, when it prevents you from comparing values of semantically different types that happened to be implemented by different integral types (as in comparing seconds stored as i32 and milliseconds stored as i64), but it's more of a coincidence than the rule (it won't prevent you from comparing seconds and milliseconds if both are stored as i32). The correct way to avoid such errors is to create a new distinct type for each semantically different type. On top of that, in reality there almost always will be some other operation, which is totally incorrect for different types, and that operation will be caught by type system (cross-type assignment at the very least).

makes the machine code emitted by the operators less obvious

That's true. On top of that it's not only less obvious, but also less efficient.

leads Rust towards C-style fuzziness about types

Strictness is only useful because it helps finding problems early. If change enhances ergonomics and don't introduce new classes of errors, then I see no reason in strictness for sake of strictness.

correct thing to do is to change types elsewhere

The whole point is that cross-type comparison is mathematically "correct". It may reduce performance, but overall your code must have some other problem for it to be incorrect. For example, in most such cases you'll also have cross-type assignment, which is still a compilation error. The only case that won't be caught somewhere else is logical type mishap, but then again, there isn't much you can do about it anyway.


`PartialEq` and `PartialOrd` should be implemented for all pairs of signed/unsigend 8/16/32/64/(128) bit integers and `isize`/`usize` variants.

Implementation for signed-singed and unsigned-unsigned pairs should promote values to larger type first, then perform comparison.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: "signed-singed" should be "signed-signed"

@aturon
Copy link
Member

aturon commented Aug 31, 2017

Thanks @VFLashM for the RFC and @scottmcm, for the summary, and others for adding their thoughts!

We discussed this RFC in the lang team meeting, and while there's continued interest in improving ergonomics around numerics (without sacrificing reliability), the key pitfall here is unknowns around type inference. In particular, it seems feasible that we could implement something like this RFC with some kind of inference fallback scheme, but such schemes in the past have caused headaches.

During the impl period, we will continue ongoing work overhauling the way type inference and the trait system work, one component of which is better support for "suggestions" as a general mechanism for fallback. At that point, we could experiment with adding more comparison impls with a designated "default" one to apply. Until then, though, we're really not in a position to make the proposed change.

In the meantime, in the interest of keeping our tracker clear, I'm going to:

@rfcbot fcp postpone

@rfcbot rfcbot added the proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. label Aug 31, 2017
@rfcbot
Copy link
Collaborator

rfcbot commented Aug 31, 2017

Team member @aturon has proposed to postpone this. The next step is review by the rest of the tagged teams:

No concerns currently listed.

Once these reviewers reach consensus, this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rkjnsn
Copy link
Contributor

rkjnsn commented Sep 7, 2017

I understand that the inference aspects are tricky, and that we probably can't make this change until we have some way to say "but default them to being the same if there are no other constraints are in play."

That said, I have wanted this a lot considering the amount of Rust code I've written, and it's frustrating every single time because it just feels like the compiler is getting in my way for no reason. In every case, behavior described here is what I wanted, and it was quite annoying to have to jump through hoops to get it.

It also came up when I was trying to introduce a friend to Rust by showing how I would translate some Python code of theirs into Rust. It was rather embarrassing to explain why I had to write my own function just to properly compare a signed integer to an array index.

I would definitely rate this as one of my top paper cuts using the language, and I would rather have this before most if not all of the other currently proposed features (as much as I like some of them).

@steveklabnik
Copy link
Member

why I had to write my own function just to properly compare a signed integer to an array index.

Why did you have to write your own function? That is, isn't it an as?

@rkjnsn
Copy link
Contributor

rkjnsn commented Sep 7, 2017

Why did you have to write your own function? That is, isn't it an as?

It's been a while, but I believe it was because I was comparing an i32 to a usize, and needed to do the right thing if the i32 was less than zero. Thus, it was something like

if i32_val < 0 {
  return Less;
} else {
  return (i32_val as usize).cmp(&usize_val);
}

Now, I think in this particular case (again, it's been a while), it would have been okay to instead cast both numbers to isize, but that's not true in the general case, and it's annoying, nonetheless.

@est31
Copy link
Member

est31 commented Sep 8, 2017

I'm personally pretty happy that this got postponed, because I think it shouldn't be done at all, at least not without an actual opt-in into the feature.

@rfcbot
Copy link
Collaborator

rfcbot commented Sep 12, 2017

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot rfcbot added final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. and removed proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. labels Sep 12, 2017
@VFLashM
Copy link
Author

VFLashM commented Sep 12, 2017

at least not without an actual opt-in into the feature

Well, technically it's just a trait implementation. It doesn't have to be imported by default. Manual import to opt-in is reasonable, even though I personally would rather have an opt-out.

@matthewjasper
Copy link

It doesn't have to be imported by default.

You can't import trait impls, you get them all for free when you import the trait.

@VFLashM
Copy link
Author

VFLashM commented Sep 13, 2017

You can't import trait impls, you get them all for free when you import the trait.

Interesting. I thought that if impl is in a separate module, then it has to be imported. My bad.

@eddyb
Copy link
Member

eddyb commented Sep 13, 2017

@VFLashM To expand on it: impl doesn't have a name so there's nothing you could import.

@gbutler69
Copy link

Any concerns for threading? Aren't all primitive numeric types 'sync' and 'send'? Doesn't that mean I could send a primitive to another thread. Then couldn't that other thread change the value of the primitive between the checks and casts, etc? For example:

if ( someI64 < 0 ) {
    return true;
} else {
    (someI64 as u64) < someU64;
}

So, if another thread changes the value right after the first comparison, would that not potentially give the wrong result?

@golddranks
Copy link

@gbutler69 That's not possible – Rust prevents mutable aliasing except for some special cases (Values inside UnsafeCell). Being Sync and Send doesn't change that.

@gbutler69
Copy link

@golddranks - Ah, yes! Thanks. I'm still solidifying my understanding of the Rust rules and what they do and don't allow. I figured there was something I was missing.

@aturon
Copy link
Member

aturon commented Sep 19, 2017

Closing as postponed -- thanks, all, for this discussion! We'll look forward to revisiting this topic after the dust settles with changes to type inference and the trait system during the impl period!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Ergonomics Initiative Part of the ergonomics initiative final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. T-libs-api Relevant to the library API team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.