Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Checked integer conversions #1218

Closed
wants to merge 7 commits into from

Conversation

petrochenkov
Copy link
Contributor

Implement several new conversion methods for primitive integer types.

Rendered

@alexcrichton alexcrichton added the T-libs-api Relevant to the library API team, which will review and decide on the RFC. label Jul 20, 2015
@cesarb
Copy link
Contributor

cesarb commented Jul 21, 2015

A few suggestions.

First, the bikeshedding ones:

  • Name it NumCast (as in std::num) instead of IntegerCast, and make it work on floats too. If it can't work for floats, I'd suggest IntCast (as in std::num::ParseIntError).
  • How about truncating_cast instead of wrapping_cast?

Now, the non-bikeshedding ones:

  • You could add a checked_cast based on checked_add (that is, fn checked_cast(self) -> Option<Target>). That way, you can have a "panics when the conversion is lossy even in release mode" (similar to an assert! but often simpler to use) by doing a .checked_cast().unwrap(), or you can chain several operations using the Option monad (chaining with .and_then() and/or .map()).
  • The overflowing_cast variant doesn't say in which direction it overflowed (positive or negative). As far as I know, the main use for the "overflowing" operations is for multi-limb numbers (for instance, add the lower limbs, then add the upper limbs plus one if the previous addition overflowed, like in the extprim crate); I don't see how overflowing_cast would be used (other than uses which would be better served by the checked_cast I just suggested).

I have no opinion so far about the From/Into part, only about the IntegerCast/NumCast/IntCast/whatever trait.

@oli-obk
Copy link
Contributor

oli-obk commented Jul 21, 2015

I feel this RFC is tackling too many things at once, many of which can be done in external crates.

I'm all for the Into/From for lossless and overflow-safe casts

@petrochenkov
Copy link
Contributor Author

Name it NumCast (as in std::num) instead of IntegerCast, and make it work on floats too. If it can't work for floats, I'd suggest IntCast (as in std::num::ParseIntError).

Floating point is a separate non-trivial problem and I intentionally don't touch it in this RFC, so I don't use Num. I've chosen Integer instead of Int as a generic name for all integer types because I still have memories about now renamed int and uint. If int and uint never reappear in the language, then using Int instead of Integer is a good idea.

How about truncating_cast instead of wrapping_cast?

I try to keep direct correspondence between arithmetic and conversion methods. Moreover, something like -1i8 as u8 doesn't look like truncation.

You could add a checked_cast based on checked_add (that is, fn checked_cast(self) -> Option).

Oh, I've missed it. I thought it is now called overflowing_add. These two are the same semantically with overflowing_add looking a bit more promising performance-wise. I'll keep both to, again, keep the direct correspondence.

@cesarb
Copy link
Contributor

cesarb commented Jul 21, 2015

These two are the same semantically with overflowing_add looking a bit more promising performance-wise.

I've used the .checked_add(...).unwrap() idiom. On x86-64, it compiled to just two instructions in the "non-panic" case: an add and a "jump on overflow" instruction, which can be predicted not-taken. That's very efficient.

@kornelski
Copy link
Contributor

👍 I wholeheartedly agree with the Motivation section and think it's an important problem to solve.

Implementation of Into/From seems like a good uncontroversial step forward, and it could be a basis for further syntactic sugar if needed.

I like consistency with wrapped_add, checked_div, etc.

I don't have an opinion on conversion to pointers.

@DanielKeep
Copy link

I generally support more and better-defined conversion traits, but I'm not convinced that this design (one trait with a bunch of specific-to-numbers methods) is necessarily a good idea.

The other day (I only just found out about this RFC from TWiR), I got fed up with the lack of conversion traits and started a crate to try and address this. The current work is at https://github.com/DanielKeep/rust-conv (docs are in the code; no generated docs yet).

The first two traits I've implemented are ApproxFrom and ValueFrom (plus their *Into duals). Note that although the first targets are the builtin integer and float types, I am deliberately designing the traits to not be specific to numbers or the builtin types.

ValueFrom is for exact, value-preserving conversions. So, f32::value_from(16_777_216i32) works, but f32::value_from(16_777_217i32) returns an Err.

ApproxFrom is for approximate conversions. At the moment, because I can't see any way to actually change the floating point modes in Rust, it's limited to a single approximation scheme, but the goal is to allow things like <i32 as ApproxFrom<_, RoundToNearest>>::approx_from(some_f64).

Also, I've separated the "what to do on overflow/underflow" issue entirely: the result of conversions returns that in the error type (if possible), and there are some helper methods that let you do things like conv_result.unwrap_or_saturate() or conv_result.unwrap_or_inf().

So to map the above to your trait, they're effectively:

  • cast: no direct equivalent; I'm not a huge fan of "fails in debug builds, corrupts state in release builds" behaviours in general.
  • wrapping_cast: no equivalent yet; I've actually just had an idea on how to do this: ApproxFrom<_, Wrapping>. :)
  • checked_cast: src.value_into().ok()
  • overflowing_cast: no simple equivalent, but I'm not entirely sure when you'd want this: either a wrapped result is what you want (in which case you've already done it), or it isn't what you want (in which case, you don't care what the wrapped result is); what's the use case for this?
  • saturating_cast: src.value_into().unwrap_or_saturate() or src.approx_into().unwrap_or_saturate() depending on how lenient you want to be in the conversion.

Aside from the "implement From/Into for integer types", which I agree with, I think it might be easier to just have a crate-off. New traits can be implemented and tested externally. I don't want to give too much feedback on the other details since I feel I'm a bit too close to the subject right now (this person is doing it differently to me; burn the heretic!).

Either way, I'm glad this is being brought up. Having recently suffered through writing some pathologically generic code that had to deal with value conversions, I'm very much in favour of the situation being improved! :)

@petrochenkov
Copy link
Contributor Author

@DanielKeep

I'm not convinced that this design (one trait with a bunch of specific-to-numbers methods) is necessarily a good idea.

The motivation is that all the introduced operations - overflows, wrapping, saturation - lie in the domain of bounded integer arithmetic. If a type can implement some of these methods, then it can implement others too, so its logical to gather them under one trait. In fact, I think the good way to stabilize the analogous arithmetic operations is to gather them into traits too:

trait IntegerAdd<Rhs = Self>: Add<Rhs> {
    fn wrapping_add(self) -> Self::Output;
    fn checked_add(self) -> Option<Self::Output>;
    fn overflowing_add(self) -> (Self::Output, bool);
    fn saturating_add(self) -> Self::Output;
}

Floating point numbers have sufficiently distinct set of problems - wrapping doesn't exist, overflows / underflows are totally different, saturation is irrelevant (?), a lot of new stuff appears - inexact calculation, rounding modes, infinities / nans - a lot of rust-conv deals specifically with it. Note, that floats currently provide none of that wrapping-checking methods emerged after the adoption of #560. I think integers and floats are sufficiently different entities to not generalize over both of them.
And in general, this RFC is more about closing a hole in the existing arithmetic operations, than about creating a generic conversion framework.

cast: no direct equivalent; I'm not a huge fan of "fails in debug builds, corrupts state in release builds" behaviours in general.

I'll add an alternative to always panic in cast, but it's already possible to do it with checked_cast().unwrap()

overflowing_cast: no simple equivalent, but I'm not entirely sure when you'd want this

Yeah, I'm not sure either. It's there for consistency. If overflowing_* is removed (in favor of checked_*, for example), then it should be removed from all arithmetic operations.

@kornelski
Copy link
Contributor

I'm wondering whether overflowing_cast is a nice-to-have that isn't essential. AFAIK it can be replaced by (v.wrapping_cast(), v.checked_cast().is_none()), which is much more verbose, so I wonder how common that pattern is.

@kornelski
Copy link
Contributor

I'm not a huge fan of "fails in debug builds, corrupts state in release builds" behaviours in general.

I'm a fan of zero overhead. For example in graphics algorithms it's typical to do calculations on 16 or 32-bit values and save results in u8 pixels. The cast to u8 is not supposed to overflow if the algorithm is implemented correctly, so checks in final builds are not needed, but of course bugs often happen during development. Like with arithmetic overflow in Rust, it's a compromise to have extra checking and also zero overhead in release builds. I like it (I wouldn't object to rename of --release to --reckless ;)

@petrochenkov
Copy link
Contributor Author

@pornel
I guess the main reason why overflowing_ops exist is because LLVM intrinsics have this form:
http://llvm.org/docs/LangRef.html#arithmetic-with-overflow-intrinsics
They are used (almost) only to implement checked_ops.

@petrochenkov
Copy link
Contributor Author

I've made some editorial tweaks to the RFC and renamed IntegerCast to IntCast- there's enough precedent for abbreviating integers in general as Int.

@aturon
Copy link
Member

aturon commented Sep 15, 2015

@petrochenkov Nice RFC, thanks!

I am in support of implementing the normal conversion traits and the type parameter for sure -- these are things that only std can do, and I see little reason not to do them.

Regarding the new trait, while it seems like a reasonable design, I agree with others that this may be best prototyped out of tree, and perhaps executed in a more general way before landing in std.

At some point we would really like to revamp the rust-lang num crate (as per #1242); this kind of thing would be a good candidate for such a crate.

That said, I'm not strongly against landing such a trait directly in std, since we already have a similar suite of methods for other operations (adding etc).

@aturon
Copy link
Member

aturon commented Sep 15, 2015

cc @wycats @rust-lang/libs

@alexcrichton
Copy link
Member

I'd personally like to see some usage of From or Into before we add too many of those trait impls, but I agree with @aturon that they're pretty easy and harmless to add regardless. I also agree that we'd probably want to develop the trait outside of std first, and one concern I'd have is that a lot of times you may want to do something like:

let foo = bar.cast::<u16>();

Basically put the type parameter on the method call instead of the destination. The trait as-is doesn't currently allow for this (like From and Into also don't today). This isn't necessarily a blocker or anything, just food for thought!

@petrochenkov
Copy link
Contributor Author

you may want to do something like: let foo = bar.cast::<u16>();

cast is modeled after into and both are supposed to use type ascription:

let foo = bar.cast(): u16;

I still have hope, that "upcoming rebase by @nikomatsakis" is just around the corner.

Regarding the choice between std and out of tree, I prefer std because I'd like this trait to go through stabilization process synchronously with other wrapping/checking methods.

@aturon
Copy link
Member

aturon commented Oct 8, 2015

The libs team discussed this RFC a bit yesterday, which led to the following:

  • Everyone agrees that adding the From/Into impls makes sense, and the general feeling is that this doesn't even require an RFC (we often land direct PRs filling out impls for existing traits like that).
  • There was not consensus around whether the casting part of this RFC should start out of tree or not. Several people felt that they would not bother with an out-of-tree dependency for this functionality, and felt that it should be considered directly for std.

Would you consider making a PR for the basic conversions, and adjusting this RFC to focus purely on casting?

@petrochenkov
Copy link
Contributor Author

Would you consider making a PR for the basic conversions, and adjusting this RFC to focus purely on casting?

Sure

@petrochenkov
Copy link
Contributor Author

you may want to do something like: let foo = bar.cast::();

I haven't mentioned it here, but if we employ D-style parentheses-elision (discussed here) in addition to type ascription, then safe casting will feel like a breeze!

let foo = bar.into: u16;
let foo = bar.cast: u16;

@cristicbz
Copy link

@alexcrichton I agree that bar.cast::<i16> would be nice. How about making the trait IntCast<Source> instead with fn cast(source: Source) -> Self etc. enabling:

  let foo = u16::cast(bar);

Or maybe IntCastFrom<Source> for more consistency and clarity as to source vs dest:

  let foo = u16::cast_from(bar);

@alexcrichton
Copy link
Member

Hm I can see how avoiding as at all costs (to signify an audit was done) may be nice, but from an API surface area perspective I'd still prefer to take the conservative route of fewer methods. Just a personal opinion though!

I guess if saturating casts are more complication and they're indeed wanted then they can exist, which in some sense motivates all the others as you've got 3/4 methods anyway...

@comex
Copy link

comex commented Feb 27, 2016

FWIW, I'm not a big fan of the idea that foo.try_into() could mean, depending on the context, operations as different as a checked integer cast and parsing a string. Just because multiple things can be generalized into a single trait doesn't mean they should be.

@photino
Copy link

photino commented Feb 27, 2016

+1 for fewer inherent methods: cast and checked_cast are useful functionalities to have. Good programs should avoid wrapping, overflowing and saturation casts.

@petrochenkov
Copy link
Contributor Author

Good programs should avoid wrapping ... casts.

In the standard library wrapping casts are used for hashing/serialization (i32 is converted to u32 and then reuses its hashing/serialization methods) or for turning 64 bits of randomnes (u64) produced by an RNG into 32 bits (u32) by dropping the excessive bits.

@photino
Copy link

photino commented Feb 27, 2016

There are only a few cases like hashing/serialization which do not care about the excessive bits. These wrapping casts can also be done using as. Any use cases for overflowing and saturation casts?

@petrochenkov
Copy link
Contributor Author

These wrapping casts can also be done using as.

This is exactly what I'd like to avoid.

Any use cases for overflowing and saturation casts?

Personally I'm not aware of any. AFAIK, saturating arithmetic is widely used in few specific domains, but not much outside of them.
@mahkoh, why did you needed saturating casts here?

@photino
Copy link

photino commented Feb 27, 2016

If as is always there, some users maybe still tend to use it. Since cast is the same as wrapping_cast in the released mode, isn't it better to use cast?

@ticki
Copy link
Contributor

ticki commented Feb 27, 2016

@photino No, since in debug builds cast() panics, if the cast is lossy. Thus, depending on what you're doing, it can be undesirable.

@ticki
Copy link
Contributor

ticki commented Feb 27, 2016

@petrochenkov

Personally I'm not aware of any. AFAIK, saturating arithmetic is widely used in few specific domains, but not much outside of them.

Saturating arithmetic is extremely useful in many cases, since you get the best possible lossy result, by simply saturating the bound.

@alexcrichton
Copy link
Member

As a quick update on this RFC, we talked about it in triage yesterday and the conclusion was that @sfackler and @aturon are going to draft up an RFC for a generic trait to operate with fallible conversions. We believed that if we had such a trait that these methods may be much less motivated.


My personal take on the a fallible conversion trait is that I think it'd almost replace the need for these methods altogether. Going over the methods again in the mindset that we've got a fallible conversion trait:

  • cast - it seems like it's agreed on this thread that this is a relatively dangerous implementation for the "best name", and while useful it's not necessarily that much better than a fallible conversion + .unwrap()
  • checked_cast - this'd be replaced by a fallible conversion
  • wrapping_cast - the case for purely replacing all as casts in a codebase with something that's no as is somewhat weak in my opinion, and given a lack of other casting methods on integers I don't think it's worth having only this one.
  • overflowing_cast - this has been agreed that it's just added for consistency I believe?
  • saturating_cast - we're also currently lacking a strong motivating use caes for this.

So all in all, once you lose the motivation of checked_cast, I think you quickly lose motivation for {overflowing,saturating}_cast, after which you've basically lost motivation for cast and wrapping_cast as well. This allows us to sidestep the question about this unstable trait as well!

Note, however, that this is all pinned on an ergonomic fallible conversion trait (which doesn't exist). Depending on how that plays out we still may want these methods for ergonomics. This is when I at least eagerly await the RFC from @sfackler :)

@ollie27
Copy link
Member

ollie27 commented Mar 3, 2016

Both rust-lang/rust#31825 and rust-lang/rust#31858 are cases of replacing an incorrect wrapping_cast with a saturating_cast so maybe that's some motivation for saturating_cast.

@nagisa
Copy link
Member

nagisa commented Mar 4, 2016

Wrapping casts could be implemented as impl From<Wrapping<F>> for Wrapping<T>. Similarly for saturating casts, but would need to introduce Saturating<T>.

@digama0
Copy link
Contributor

digama0 commented Mar 22, 2016

I would also argue for saturating_cast.

@alexcrichton

saturating_cast - this seems so eerily similar to checked_cast::()>.unwrap_or(u32::max_value()) that we may want to just use that instead

Not quite. Depending on the source and target, such as i32 as u8, there may be two bounds to check, not just one. This makes it significantly more complicated than just using a simple function applied to checked_cast, because at that point the overflow/underflow information is already lost. It could be done with a fallible conversion, but since this info is not coming out of LLVM I'd hate to calculate the extra data when it is usually not going to be used (maybe you guys know how to optimize that out, though). But more to the point there is not much use for an Ok(n) / Err(Overflow) / Err(Underflow) result if there is not already a plan for handling these results, and saturating_cast is the only case I can think of where you would want to handle them differently.

My personal use-case, that lead to rediscovering this RFC via rust-lang/rust#32421, will be satisfied by just checked_cast, but I like the consistency of the original RFC approach.

I don't really think overflowing_cast is necessary, although it should only go if the other overflowing_* methods do. If you (intentionally) want wrapping, you won't care about overflow. Conversely, if you care about overflow, wrapping is almost never the desired response. Having the two together in one function is no more useful than checked_cast + wrapping_cast.

@DanielKeep
Copy link

@digama0 conv handles this by baking the necessary information into the error type; if a conversion fails due to a representable range violation and the target type can be saturated, you can directly saturate the Result.

Specifically, you can use .unwrap_or_saturate() if the range violation is the only issue (i.e. integer to integer conversions), or .saturate() to handle just the range error (i.e. float to integer conversions).

@digama0
Copy link
Contributor

digama0 commented Mar 24, 2016

@DanielKeep I agree that that's the right approach for an "everything you ever wanted" conversion library like conv, but I think the standard library should take the more light-weight approach with just checked_cast and saturating_cast (+wrapping_cast?) directly, even if this ends up as a Result-returning function, because that's what the hardware gives you. Keep the fancy stuff in the fancy library. (Not that I don't think it's a great crate, it just isn't necessarily appropriate for the standard library.)

@Zoxc
Copy link

Zoxc commented Mar 25, 2016

Assuming we add a conv::ValueFrom-like TryFrom. I'd like CoerceFrom and CoerceInto traits which is implemented for all types using TryFrom and TryInto.

Example of CoerceFrom:

pub trait CoerceFrom<T>: Sized {
    fn coerce_from(from: T) -> Self;
}

impl<A, T: TryFrom<A>> Coerce<T> for A {
    fn coerce_from(from: T) -> Self {
        match from.try_from() {
            Ok(value) => value,
            Err(value, error) => {
                panic!("Cannot coerce value {} of type {} into type {} because {}", display(value), type_name::<A>(), type_name::<B>(), error)
            }
        }
    }
}

It would replace cast in this RFC and it is shorter and gives better error messages than T::try_from(x).unwrap() making it more likely to be used.

I would add a truncate method which would allow conversions between unsigned numbers or between signed numbers, but no unsigned <-> signed conversions. It should sign-extend for signed numbers.

I would also add coerce_unsigned and coerce_signed methods which returns the integer bitcast to the type with sign requested that has the same size as the source type.

@DanielKeep conv needs something like this, if it doesn't have it already :)

I'd get rid of cast, wrapping_cast, checked_cast, overflowing_cast. I'm not sure about saturating_cast.

@DanielKeep
Copy link

@Zoxc Most of what coerce_* does can be done just by adjusting how the error displays itself. The one problem you'll have is that what you've written require T: Display, which might not be implemented. That was actually why I removed support for displaying the input value in conv's errors: I'd rather have it work for all types, rather than just those that impl Display. Specialisation will probably solve this in time, though.

Just, as an aside, conv deals with truncation/wrapping by making that a type parameter on generic approximation casts (so that it can also support the different floating point rounding modes).

@glaebhoerl
Copy link
Contributor

I'd avoid using the word "coerce" for this, just to avoid (further) confusion. It already has a specific meaning in Rust's type system, and a specific (different!) meaning in some type-theoretical circles, and this would be a third. (Lord knows we've already had enough confusion between those two, largely out of my own fault.)

@alexcrichton
Copy link
Member

🔔 This RFC is now entering its week-long final comment period 🔔

The libs team discussed this during triage and the conclusion was that with the recently-added TryFrom and TryInto traits we may not want to merge this RFC just yet. This does indeed include functionality that's not easily provided by the new fallible conversion traits, but it's up in the air how much that extra functionality belongs in libstd itself.

@alexcrichton alexcrichton added the final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. label May 24, 2016
@alexcrichton
Copy link
Member

Ok, the libs team got a chance to discuss this RFC yesterday and the conclusion was that we feel the same where TryFrom and TryInto should help tie us over with fallible conversions in the meantime, and otherwise while this is a possible extension to the numerics we would prefer to postpone it at this time.

Thanks regardless though for the RFC @petrochenkov!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. T-libs-api Relevant to the library API team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.