Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoiding integer overflow #45

Closed
wants to merge 1 commit into from

Conversation

bill-myers
Copy link

This is an RFC for avoiding integer overflow, which first discusses various options, and then focuses on and explains a specific proposal that does not make use of failure, and introduces a new kind of integer type to handle things like loop counters.

@brendanzab
Copy link
Member

One idea I had for making using the Checked* traits easier to work with was to add a Checked<T> adaptor struct. For example:

pub struct Checked<T>(pub T);

impl<T> Checked<T> {
    /// Retrieves the stored value
    pub fn get(self) -> T {
        match self { Checked(x) => x }
    }
}

impl<T: CheckedAdd> Add<Checked<T>, Option<T>> for Checked<T> {
    fn add(&self, other: &Checked<T>) -> Option<T> {
        match (self, other) { (&Checked(ref x), &Checked(ref y)) => x.checked_add(y) }
    }
}

impl<T: CheckedSub> Sub<Checked<T>, Option<T>> for Checked<T> {
    fn sub(&self, other: &Checked<T>) -> Option<T> {
        match (self, other) { (&Checked(ref x), &Checked(ref y)) => x.checked_sub(y) }
    }
}

impl<T: CheckedMul> Mul<Checked<T>, Option<T>> for Checked<T> {
    fn mul(&self, other: &Checked<T>) -> Option<T> {
        match (self, other) { (&Checked(ref x), &Checked(ref y)) => x.checked_mul(y) }
    }
}

impl<T: CheckedDiv> Div<Checked<T>, Option<T>> for Checked<T> {
    fn div(&self, other: &Checked<T>) -> Option<T> {
        match (self, other) { (&Checked(ref x), &Checked(ref y)) => x.checked_div(y) }
    }
}

Could this be a simpler solution?

Anyway, despite the above, I think both range and mod types would be extremely valuable for a safety conscious language like Rust.

I think it might be worth looking at how other languages do range and mod types. For example Ada:

type Score is range 1 .. 10;
type Hours is mod 24;

I asked a question on stack overflow a while back regarding how these are implemented. This is probably already known to you though.

@pczarn
Copy link

pczarn commented Apr 19, 2014

Don't forget about arithmetic with saturation, there's very basic Saturating trait.

@akdjka
Copy link

akdjka commented Apr 28, 2014

Nimrod has 2 types on integers. The default ones are not allowed to overflow. If they do in a debug build, there's warranted exception. In the release mode behaviour is undefined.
To be precise, these are just like in C, signed and unsigned. I call signed the default because unsigned is not even a part of the language, it's in the stdlib.

I like this dependable crash in debug with the default integer type, but no check in release. Very fast when it matters and much more secure than simple wrapping.

Also, I fully agree with bjz that both range and mod types would be very valuable.

@ghost
Copy link

ghost commented Apr 28, 2014

Relevant: the isl library used in GCC's graphite optimizations. I believe it's only used for optimizations because the generic case is exactly the Halting Problem.

As for alternatives, an i64 is big enough that an accidental overflow is a bug somewhere else. If it was a domain where such overflows are even possible, the programmer would use checks or a bigint. We could just warn/unsafe 32-bit types, and allow/wrap the unsafety in Checked<T>.

@rpjohnst
Copy link

The paper Dependent Types in Practical Programming (http://www.cs.cmu.edu/~fp/papers/popl99.pdf) is a good example of this (and some other things it would be useful for), although it doesn't really deal with the size of the integer types (especially the issues with runtime-closed range types, which feel rather unpleasant).

@thestinger
Copy link

@akdjka: Rust is never going to have checks that are left out at runtime and lead to undefined behaviour. It doesn't belong in the language outside of unsafe blocks. I don't think checks for debug builds are really relevant here because forgetting to handle overflow is almost by definition something you're not going to be testing for. A subset of these bugs could be caught by fuzzing, but not all. Even C has checks on overflow for debug builds via the GCC/clang sanitizers and it doesn't really change the safety of the language.

@akdjka
Copy link

akdjka commented Apr 29, 2014

@thestinger
I have not said that I'm for undefined behaviour but for detecting overflow in debug builds.
If you're testing edge conditions, you're likely to find a large part of your overflows.

@thestinger
Copy link

Rust isn't going to allow catching undefined behaviour only in debug builds and allowing it in release builds. Adding sanitizer-like flags to rustc for unsafe code is possible, but adding undefined behaviour to release builds is really not.

@dobkeratops
Copy link

Could rust get another build mode with equivalent (dangerous) functionality to c++ release, but in rust terminology it would be an 'unsafe-performance' or something.

I get that rusts focus is safe, you might add yet more checks for a 'debug' build' .. but for me it would be interesting to consider rust's default semantics as basically a "debug build on steroids" with lots of extra "assertions" built into the language semantics.. but for "unsafe-performance" matching C++ you dont want a single branch out of place.

.. changes would be things like unwrap not testing for failiure, no bounds-check on array acess, probably some different numeric assumptions ..

IMO Rust has a lot to offer besides safety - I looked for this language mostly because I dont like the way classes and headers work in c++, and was interested in 'a more FP style', whilst not being able to tolerate a GC language.
I'm actually ok with C++'s dangerous behaviour - its essential for some domains. Whats limiting is how the language has everything shoe-horned into C syntax (eg so you cant have propper tuples because of how ,'s are used.. or multiple return values.. issues with [] syntax..), header files, the asymetry between methods and functions.

Bear in mind the language shootouts... perhaps it would be nice to show that rust can match C++ 1:1, if you want it to. It shouldn't involve having to write uglier code than C++ (unwraps, not getting nice array sugar, extra nesting for unsafe or making your code incompatible with mainstream rust code, having to use nonstandard types like 'real' instead of float, etc)

@bstrie
Copy link
Contributor

bstrie commented May 6, 2014

@dobkeratops , an "unsafe-performance" compiler flag is not acceptable. Thoughtlessly disabling safety features for the sake of performance is contrary to the language's philosophy.

@nrc
Copy link
Member

nrc commented Jun 6, 2014

I would love some data on the performance impact of checked arithmetic. I don't think using the Clang flag is close enough. I would like to have fail on overflow implemented in Rust and profile Servo (or some other large code base) to see the performance impact. The downside to this is we might need some non-trivial optimisation in Rust code to make checking performant.

@thestinger
Copy link

Nothing short of hardware support is going to make it perform well. LLVM can't even deal with totally obvious value propagation within a local scope with only local pointers. It struggles just to optimize out the iterators built out of pairs of raw pointers because it's unable to optimize out a null check right next to code proving it's not null by dereferencing it. Doing something almost reasonable with code that looks like what a C programmer would write is about all you can expect from LLVM.

@thestinger
Copy link

Here's a trivial test case using the most efficient implementation of checked arithmetic that's available (LLVM intrinsics explicitly performing it and compiling down to near optimal machine code):

extern crate test;

use std::intrinsics::{abort, u32_add_with_overflow};
use test::Bencher;

#[inline(never)]
pub fn no_overflow_check_inner(xs: &mut [u32]) {
    for x in xs.mut_iter() {
        *x += 1
    }
}

#[inline(never)]
pub fn overflow_check_inner(xs: &mut [u32]) {
    for x in xs.mut_iter() {
        let i = *x;
        let (result, overflow) = unsafe { u32_add_with_overflow(i, 10) };
        if overflow { unsafe { abort() } }
        *x = result;
    }
}

#[bench]
pub fn no_overflow_check(b: &mut Bencher) {
    let mut xs = [0, ..300000];
    b.iter(|| {
        no_overflow_check_inner(xs);
    })
}

#[bench]
pub fn overflow_check(b: &mut Bencher) {
    let mut xs = [0, ..300000];
    b.iter(|| {
        overflow_check_inner(xs);
    })
}
test no_overflow_check ... bench:     63441 ns/iter (+/- 1123)
test overflow_check    ... bench:    291665 ns/iter (+/- 2326)

This is not atypical of what the hot loops in an application look like. Servo will likely have many loops like this (loops or series of scalar integer operations), and performing these checks makes it far slower than Java or even JavaScript. It's not possible to do this and claim Rust is a systems language - it will be fighting for the performance crown with PyPy, not C++. It's all well and good to pretend there's a sufficiently smart compiler but there is not; hardware support is required to make this perform well and even if it became available on modern x86 CPUs it wouldn't be used by the x86 binaries rustc needs to produce by default.

If someone fixes this test case, and the next one, and the next one, and many far more complicated cases with all kinds of pointer aliasing then perhaps it would start making more sense. This case should be trivial to optimize, because it's clear that every one of these is going to overflow at the same time from some interprocedural. Real world cases are far more complex than this, and LLVM isn't even able to begin tackling the Hello World of interprocedural optimization.

@nrc
Copy link
Member

nrc commented Jun 6, 2014

I don't think that a compiler will help us, not easily at least (the kind of optimisation I imagine are not the kind that LLVM would do, but that we would do in rustc, e.g., hoist the overflow check out of the loop in overflow_check_inner since all inputs are unsigned, but agree such things are difficult bordering on impossible).

I know that adding an overflow check for this kind of micro-benchmark will be ruinous to performance, but the question is really how many tight loops like this exist in Servo and how often they get executed. Even if performance in such a loop degrades 100% or more in the worst case, we have no idea how this will affect real performance in real software.

@thestinger
Copy link

don't think that a compiler will help us, not easily at least (the kind of optimisation I imagine are not the kind that LLVM would do, but that we would do in rustc

Optimizing in rustc means making another compiler IR, and compile-time will be even worse. It's already slower than C++, which is known as having incredible slow compile-time.

hoist the overflow check out of the loop in overflow_check_inner since all inputs are unsigned, but agree such things are difficult bordering on impossible)

You first have to prove that they're all equal via interprocedural analysis, which is trivial in this case but impossible in most cases.

Even if performance in such a loop degrades 100% or more in the worst case, we have no idea how this will affect real performance in real software.

This is far from the worst case. I can write a bigger program where the code bloat from all the branches on nearly every primitive operation causes far pipeline stalls.

@thestinger
Copy link

I don't see a point in Rust existing if it doesn't take performance seriously. In my opinion, Rust is already missing the performance and compile-time speed it needs to be a competitor to C and C++. At the moment, the abstractions hit too many missed optimizations and optimal code ends up requiring lots of custom unsafe blocks. I don't think the value prospect for the complexity it comes with (references with lifetimes, move/liveness checking) is there yet.

The array bounds checks are a killer performance issue and working around it is very ugly. The extensive use of unwinding is leading to ridiculously long compile-time and causing missed optimizations all over the place too. There's a very good reason for performance-conscious projects like LLVM disabling unwinding and Rust isn't viable in that niche right now. As far as I can tell, Swift completely avoids exceptions and unwinding and as a language designed by the lead developer of LLVM that says a lot.

Rust can bury its head in the sand and ignore the current compile-time and performance issues, but C++ programmers won't care about the excuses. They won't take a second look at the language if it branches on every single integer arithmetic operation. It's already pretty hard to sell the sacrifices the languages is making. I'm not sure how viable a C++ replacement the language is when anything doing array indexing is going to perform poorly without lots of unsafe code.

@nrc
Copy link
Member

nrc commented Jun 6, 2014

@thestinger if you are right (about the performance impact), than I would not support this, but I want to see evidence. Currently we have no data to indicate any performance hit in real software (and likewise no data to indicate there won't be a performance hit). Dismissing the feature because of poor performance in a micro-benchmark is premature optimisation of every program ever to be written in Rust.

Comments on array bounds checks, unwinding, and other performance concerns are off-topic for this RFC, please take them elsewhere.

@thestinger
Copy link

You're proposing unwinding via a branch on every arithmetic operation, so comments about the performance and compile-time woes caused by unwinding are very relevant. It's already a 2x compile-time hit, and adding more unwinding and lots of new branches will make it worse.

poor performance in a micro-benchmark is premature optimisation of every program ever to be written in Rust.

Rust should stop being advertised as a C competitor / systems language if it doesn't care about the obvious code mapping down to good machine code. If it's easier to write fast numeric code in Java, then the language isn't a systems language.

@bill-myers
Copy link
Author

In addition to performance, and the fact that it makes operations not associative, overflow checking is bad because it solves the wrong problem.

If one adds two integers that don't fit into a 32/64-bit integer, the ideal behavior is generally not to abort the program, but rather to simply store the result in a type that is large enough, even if that means the integer type is larger than 32/64 bits.

The RFC does propose that, and applied to @thestinger test case, would allow to easily express a program that at the LLVM bytecode level uses an array of 64-bit to 128-bit integers (depending on settings and whether the 1/10 addend is constant or not) instead of 32-bit integers, making overflow impossible.

@ghost
Copy link

ghost commented Jun 6, 2014

@bill-myers wasting half the registers and cache is not much better than excessive branching.

@bstrie
Copy link
Contributor

bstrie commented Jun 6, 2014

I'm in agreement with @thestinger. We've long since decided to eat the cost of bounds checking (which nobody is proposing that we remove), and that eats up the entirety of our imposed-overhead budget. I'm also starting to be convinced that the cost of unwinding will be unacceptable in the long run. Given these, we just don't have space for a decadent feature like this, especially one that doesn't close any memory safety holes. Moving optimization passes into the frontend sounds like a bad idea as well.

Having well-defined integer overflow is already a step forward from C and C++. Baby steps.

@brson
Copy link
Contributor

brson commented Jul 3, 2014

Per today's RFC triage we're going to close this. The default unchecked behavior is unlikely to change, and there are several other related proposals whose approach would be preferred over this.

@brson brson closed this Jul 3, 2014
withoutboats pushed a commit to withoutboats/rfcs that referenced this pull request Jan 15, 2017
Make stream::Fuse return None after finishing, not NotReady
wycats pushed a commit to wycats/rust-rfcs that referenced this pull request Mar 5, 2019
Solicit feedback about IE8 and IE9 support in Ember 2.x
wycats pushed a commit to wycats/rust-rfcs that referenced this pull request Mar 5, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants