Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change abs() to return unsigned integers #1017

Closed
wants to merge 3 commits into from
Closed

Change abs() to return unsigned integers #1017

wants to merge 3 commits into from

Conversation

pitdicker
Copy link
Contributor

Change the function abs() to return unsigned integers, instead of signed integers as it does today.
See http://internals.rust-lang.org/t/pre-rfc-change-abs-to-return-unsigned-integers/1747/5

@pitdicker pitdicker changed the title http://internals.rust-lang.org/t/pre-rfc-change-abs-to-return-unsigned-i... Change abs() to return unsigned integers Mar 26, 2015
@oli-obk
Copy link
Contributor

oli-obk commented Mar 27, 2015

Another alternative: return an Option<T> that only contains the value if a positive value could be obtained (e.g. fails for float NaN, or int min value). Note: I'm not advocating this alternative, just listing it for completeness

@Gankra
Copy link
Contributor

Gankra commented Mar 28, 2015

Do any languages with a signed-unsigned dichotomy actually do this? Java and C++ both do Self -> Self (Java only recently got support for dealing with unsigned values, though).

@oli-obk
Copy link
Contributor

oli-obk commented Mar 28, 2015

Ada's standard library complex numbers return a real value from the abs function.

   -- the interface definition
   function "abs"   (Right : Complex) return Real'Base;
   function "abs"   (Right : Imaginary) return Real'Base;

Then there are the vectors and matrices (http://www.adaic.org/resources/add_content/standards/05rat/html/Rat-7-6.html)

There are two operations "abs" applying to a Real_Vector thus 
function "abs"(Right: Real_Vector) return Real_Vector;
function "abs"(Right: Real_Vector) return Real'Base;

One returns a vector each of whose elements is the absolute value of the corresponding element of the parameter (rather boring) and the other returns a scalar which is the so-called L2-norm of the vector. This is the square root of the inner product of the vector with itself or √(Σxixi) – or just √(xixi) using the summation convention (which will be familiar to those who dabble in the relative world of tensors). This is provided as a distinct operation in order to avoid any intermediate overflow that might occur if the user were to compute it directly using the inner product "*".

Note: the Integer type's abs function does not yield a Natural type result. (Natural is defined as a subtype of Integer, which is like restricting an i32 to only zero or positive values by automatically inserting runtime-checks). This is most likely an oversight and still there for backwards compatibility, but I could not find any sources proving that.

@oli-obk
Copy link
Contributor

oli-obk commented Mar 30, 2015

Another alternative: insert debug checks similar to overflow checks to prevent Int::min_value() to be used in abs. Note: you get this for free if abs returns a unsigned type, since you then have to cast, which again checks for overflow.

@pitdicker
Copy link
Contributor Author

Interestingly, Firefox does not use the abs function of the C standard library, but a custom one that returns unsigned integers:
https://bugzilla.mozilla.org/show_bug.cgi?id=847480
http://whereswalden.com/2013/04/30/introducing-mozillaabs-to-mfbt/

These links also explain why C returns signed integers by default (because when the result is used in a formula, other values are automatically promoted to unsigned). This is a problem that Rust does not have.

@llogiq
Copy link
Contributor

llogiq commented Apr 1, 2015

@gankro: Java not only has incomplete support for working with unsigned integers. As of Java 8 there is no notion of (un)signedness in the type system, and the defined operations on java.lang.Long/java.lang.Integer don't even fully support the usual arithmetics.

C/C++ has the problem of implicit coercions, as pitdicker already pointed out.

This change would make the result strictly more correct, also the type would give a strong indication of the possible return values. Of course this would require explicit casts in those cases where calculations should continue with signed values, but this serves to make the problematic case (min_value) more obvious. I can also imagine it would help debugging if the failure comes from the cast operation instead of the abs function.

@rkjnsn
Copy link
Contributor

rkjnsn commented Apr 2, 2015

👍 I have seen some of the problems abs can cause in C. It can even lead to subtle security vulnerabilities when an algorithm assumes a positive result, and an attacker is able to construct an input that results in INT_MIN being passed. I think the proposal here is definitely the way to go. It makes the default behavior correct, while still leaving all of the other options open to the programmer.

@aturon
Copy link
Member

aturon commented Apr 2, 2015

@pnkfelix @nikomatsakis Thoughts?

@nikomatsakis
Copy link
Contributor

👍 I've been bitten by abs(INT_MIN) == INT_MIN causing real-world bugs before as well, and an unsigned result seems fairly logical. The analysis of actual uses suggests that this wouldn't really break much code "in the wild" -- and it's pretty easy to workaround.

@pnkfelix
Copy link
Member

pnkfelix commented Apr 7, 2015

Regarding the RFC itself, this seems ... incomplete. In particular, support for an fn abs method like this seems like it would require adding an associated type to the trait, something like this:

trait SignedInt {
    type UnsignedVariant;
    fn abs(&self) -> Self::UnsignedVariant;
    ...
}

right?

@pnkfelix
Copy link
Member

pnkfelix commented Apr 7, 2015

@aturon my gut reaction is that I usually expect operations like abs to be transformations that stay within their domain: abs: N -> N.

But it seems like a fair number of people here are in favor of this change, and it sounds like the real world bugs here may be relevant.

@nikomatsakis when you say "pretty easy to workaround", I wonder if you are expecting people to use as, or a checked-variant ... (where using as just reinjects the bug here ...)

Anyway, I do not object to this RFC, assuming the oversight noted above is addressed in some way.

@barosl
Copy link
Contributor

barosl commented Apr 8, 2015

I think that we should make sure whether we want the unsigned result or not. I don't quite remember, but in most cases the results have been more likely to be casted to signed. In that case, the occurrence of min_value() will be catched only in a debug build, as the overflow behavior is only enabled in it.

I'm more fond of doing a runtime check and panicking, or even making abs() return Option<T>. These two methods are a bit more expensive, but it certainly ensures that you will never meet such an invalid value even in a debug build.

@oli-obk
Copy link
Contributor

oli-obk commented Apr 8, 2015

@barosl if you look at http://internals.rust-lang.org/t/pre-rfc-change-abs-to-return-unsigned-integers/1747/5?u=ker you can see that there's not a single case in rustc that casts it back to signed (except for the abs unit tests, but that's not a use case).

I could check out the most popular crates and make the same analysis on them...

I'm more fond of doing a runtime check and panicking, or even making abs() return Option. These two methods are a bit more expensive, but it certainly ensures that you will never meet such an invalid value even in a debug build.

could this be combined? by defining a trait for the abs function that is implemented twice. once returning a signed version with all the necessary checks in place, and once returning the unsigned version without any checks?

@ftxqxd
Copy link
Contributor

ftxqxd commented Apr 8, 2015

I've always thought that abs was the most obvious use of unsigned integers in any language, so it's consistently surprised me that basically every language seems to have abs return a signed integer instead. In fact, I remember trying to use abs as a way of converting a signed integer to an unsigned one, where the exact means wasn't important. (I don't quite remember the exact reason, but it was probably for testing purposes where I was, for example, trying to draw a sine wave onto a canvas that expected positive coordinates.)

@pnkfelix SignedInt and the other numeric traits are deprecated in favour of inherent methods on the built-in numeric types, so I don't think the RFC really needs any more detail in that regard.

@pnkfelix
Copy link
Member

pnkfelix commented Apr 8, 2015

The team discussed this in the weekly meeting last night: https://github.com/rust-lang/meeting-minutes/blob/master/weekly-meetings/2015-04-07.md#change-abs-to-return-unsigned

There was opposition to changing the type signature of abs.

(I did suggest the more conservative option of treating INT_MIN.abs() as an overflow condition -- i.e. it would be allowed to panic when debug-assertions are enabled. The response to this was that it would not accomplish the actual goal here, which I can accept.)

It seems like this is not going to be accepted, and instead one should write abs(a) as u32, which I believe will always work going forward. (Note that as does not check for overflow; see the updated RFC 560 for discussion.)

@nikomatsakis
Copy link
Contributor

(I did suggest the more conservative option of treating INT_MIN.abs() as an overflow condition -- i.e. it would be allowed to panic when debug-assertions are enabled. The response to this was that it would not accomplish the actual goal here, which I can accept.)

It's true that we talked about this in the meeting, but I think we didn't give it the consideration it deserves. It feels to me like abs should employ a debug-assert that the input is not INT_MIN.

@pnkfelix
Copy link
Member

pnkfelix commented Apr 9, 2015

@nikomatsakis but then at that point we should probably offer both variants (one that returns a signed int that may panic when given the minimum value, and another that returns an unsigned int that is guaranteed not to panic), right?

@bombless
Copy link

bombless commented Apr 9, 2015

more like

trait SignedInt {
    fn abs(&self) -> <Self as ToUnsigned>::Target where Self: ToUnsigned {
        let x = if self < 0 { -self } else { self };
        ToUnsigned::to_unsigned(x)
    }
    ...
}

@rkjnsn
Copy link
Contributor

rkjnsn commented Apr 9, 2015

I can understand not wanting to change the return type of abs now that beta has been released, though it's unfortunate. I still think the best solution is to change the return type of abs. It brings the edge case to the user's attention, but the user can easily cast back to signed if that's what he/she needs.

If that is not possible at this stage, I think it is very important that abs does a debug assert, and that we provide a uabs or similar variant that returns an unsigned integer. Furthermore, the documentation for abs should make it clear that the user needs to check for INT_MIN before calling it, and strongly recommend that he/she uses uabs instead if at all possible.

@pitdicker
Copy link
Contributor Author

Just added the alternatives discussed above to the rfc rendered.

@pnkfelix You are right that the rfc seems incomplete. But I did not know what to put in the detailed design section, as the implementation is only a couple of lines :).

@nrc nrc assigned brson Apr 9, 2015
@iopq
Copy link
Contributor

iopq commented Apr 10, 2015

I think this is a very good idea. This statically guarantees no overflow vs. the debug assert solution that is SOMETIMES triggered by tests, but sometimes not covered.

@theemathas
Copy link

I believe that uabs is very important, since you can easily implement the old abs in terms of uabs (just add as), but not vice versa.

@Gankra
Copy link
Contributor

Gankra commented Apr 10, 2015

@theemathas this is simply incorrect. uabs is easily implemented by abs due to the fact that two's complement INTEGER_TYPE::MIN has the same bitwise representation as the positive unsigned value. That is -128i8 and 128u8 are the same bitwise representation: 1000...000.

This runs successfully:

use std::i8;

fn main() {
    for (i, u) in (i8::MIN..1).rev().zip(0u8..) {
        assert_eq!(u, i.abs() as u8);
    }
}

@liigo
Copy link
Contributor

liigo commented Apr 10, 2015 via email

@pnkfelix
Copy link
Member

@gankro if we adopt the potential-panic-on-int-min semantics for abs, then one will indeed not be able to implement uabs in terms of abs. (I admit that the text of @theemathas was ambiguous about whether this was what was meant.)

@theemathas
Copy link

@gankro @pnkfelix I thought that the plan is that as will panic on debug builds in the case of overflow.

In any case, you can always have a branch checking for the minimum integer, but if you are going that far, you might as well implement abs() yourself

@pnkfelix
Copy link
Member

@theemathas No, plans changed. as does not and will not check for overflow; see the updated RFC 560 for discussion.

@pnkfelix
Copy link
Member

So, while mistakenly looking for this thread on the rust repo, I did find these old links:

which is mostly a historical curiosity.


My immediate question is this: Should be mark the fn abs method as unstable now on the 1.0 beta channel, so that we do not rush into a decision on this matter now?

Or should we just make a decision, period?

(Update: Or should we add stable fn uabs method, mark fn abs as unstable, and plan to change fn abs to potentially panic?)

@pitdicker
Copy link
Contributor Author

@pnkfelix
What would be the effects of marking abs as unstable? It would be unavailable on beta and stable. Probably libraries that use abs would just implement it themselves (I have already seen 2 that provide their own wrapper).
And it may break more crates than implementing this RFC. I would vote for a decision :)

I have my doubts about adding a seperate fn uabs. Why would integer types need two functions, but float and big integers only one? How would this work together with a possible trait for abs?
And, altough I started the RFC and would like to see it 'right', I think this function is not important enough to have two versions of.

@nikomatsakis
Copy link
Contributor

@pitdicker well, fixed-size integers are the only ones vulnerable to this discontinuity. One can make the same argument but about types: why should integer types change to "unsigned" variances but not floating-point or bigint?

My current feeling is that abs should be implemented as:

if value < 0 { -abs } else { abs }

this implies that it may be panic on overflow.

I have no objection to adding uabs. I do agree this a lot of fns for such a simple thing, but I like that the name is different and the type signature is different (uabs is not T -> T, it's T -> U).

@pitdicker
Copy link
Contributor Author

Previous weekend brson was kind enough to test what impact this change would have on the crates on crates.io.
Of the 989 crates tested, 492 build succesfully before the change. I had to modify 7 crates to get all to build again:

  • glm-0.1.16
  • image-0.3.7
  • nalgebra-0.2.10
  • num-0.1.22
  • pitch_calc-0.9.3
  • quickcheck-0.2.12
  • utp-0.2.3

glm and nalgebra contain a wrapper for abs, so if the return type of abs changes, so must the return type of these functions (or not, depending on the goals of the authors).

num uses abs for the implementations of fn gcd. This function is plagued by the same corner case as abs, as gcd(MIN, MIN)==MAX+1. So this highlights a potential error.

image, pitch_calc and utp use abs when calculating the difference between two integers (e.g. (a-b).abs). The values are supposed to be very small compared to the integer size, so changing abs to unsigned does not have much benefit here. However, this way of calculating a difference can easily overflow if the values are larger, and I have a WIP crate just for cases like this.

For quickcheck, this code goes over my head... :)

Conclusion: I think the impact is small, but could be worth it especially when a crate fails to build.
And as a quick fix for a crate, a single cast is all that is needed to restore the old behaviour.

@rkjnsn
Copy link
Contributor

rkjnsn commented Apr 17, 2015

I think there should definitely be an abs that returns unsigned. A version correct for all inputs should be easily accessible and usable. I think changing the current abs function makes the most sense. The default behavior will be correct, and any user who needs to can still easily cast back to a signed it, explicitly acknowledging the range limitations.

If, however, the current return type of abs isn't changed (due to proximity to our final release or any other reason), I think we absolutely should introduce a separate uabs function. Again, correct behavior should be easily accessible.

@alexcrichton alexcrichton added the T-libs-api Relevant to the library API team, which will review and decide on the RFC. label May 18, 2015
@alexcrichton
Copy link
Member

This RFC unfortunately did not reach consensus before the 1.0 release, and these methods have now been stabilized. The abs method does indeed panic on overflow (following the panic semantics of the rest of the standard library), however. Due to this API being unable to change now I'm going to close this RFC.

Thanks regardless for the RFC @pitdicker!

@rkjnsn
Copy link
Contributor

rkjnsn commented Jun 2, 2015

To propose the addition of a uabs method, would it be better to start with an RFC or a pull request? (I know there was some discussion about major vs. minor changes, etc.)

@alexcrichton
Copy link
Member

I'd recommend an RFC for that API. The libraries subteam hasn't quite settled on what concretely needs an RFC and what doesn't, but I suspect that the guidelines would recommend an RFC for a method like uabs.

@Gankra
Copy link
Contributor

Gankra commented Jun 3, 2015

(I would personally accept a PR for it; but that's just me)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-libs-api Relevant to the library API team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.