Integer Guidelines RFC #741

wycats · 2015-01-27T05:11:06Z

aturon · 2015-01-27T05:28:34Z

1fish2 · 2015-01-27T05:32:03Z

text/0000-int-guidelines.md

+The goal of this RFC is to help people decide what integer types to use when they need
+to make a decision for a new API.
+
+It builds on [https://github.com/rust-lang/rfcs/pull/560](the integer overflow RFC),


^^ the URL and link text are reversed in this markup

Good catch.

petrochenkov · 2015-01-27T07:43:21Z

Bit sizes are not covered. (rust-lang/rust#20211 (comment))
There should be a convention - to keep using usize for them or switch to the "usize * 2" types

mahkoh · 2015-01-27T10:21:32Z

text/0000-int-guidelines.md

+  addressable memory.
+* Using of unsigned integers is traditionally thought to be error-prone, and
+  style guides often suggest avoiding them. That said, Rust's unsigned integers
+  have built-in underflow assertions, which changes the analysis.


You seem to forget that that RFC is still open. Or maybe that's just distraction.

arielb1 · 2015-01-27T19:59:43Z

@thestinger

Big integers mean that all your structs become non-Copy, which is for many applications a cure worse than the disease.

nrc · 2015-01-27T20:01:43Z

text/0000-int-guidelines.md

+  be appropriate, but use of `usize` in general can introduce portability
+  hazards when the use-case is not proportional to the amount of
+  addressable memory.
+* Using of unsigned integers is traditionally thought to be error-prone, and


I am somewhat cynical of this constraint - I have never come across this school of thought outside the Google style guide linked above. Sure they have invariants you need to be careful not to violate, but no more so than signed integers (they both overflow and underflow, just at different places). Using an unsigned int inappropriately is error prone, but the same goes for signed ints.

More aspects about unsigned integers:

Unsigned integers are fitting for bit patterns, hash codes, wraparound mod 2^N, and values that aren't really numbers.

Since Rust requires explicit conversions from signed to unsigned integers, some C/C++ cautionary cases don't apply. [My mistake in not realizing this sooner.] E.g. it's tempting to make a C function take an uint32_t so it doesn't have to deal with negative inputs but it's easy in C to accidentally pass in a negative integer. Now the function can't detect that bad input unless it has a suitable upper bound.
[BTW, "polymorphic array indexing" would make it OK to index a Rust array with a signed integer. Rustc could implement a signed bounds check with a single unsigned comparison (at usize or larger width) since the upper bound cannot exceed isize::MAX.]

Hopefully rustc will always complain about i >= 0 for unsigned i.

There are plenty of cases to be careful about exceeding the unsigned conceptual domain. Making a value unsigned looks better than it works out.

It's easy for values (esp. intermediate values) to go negative.

Be careful. Don't code like my brother.

thestinger · 2015-01-27T20:03:19Z

@arielb1: Incorrect code doesn't qualify as a viable option in the first place.

arielb1 · 2015-01-27T20:03:36Z

If you are using integers for counting, you essentially can't overflow an i64. If you're using integers with a size-limited datastore, you will have problems when they get too large anyway. It isn't so easy to accidentally overflow an i64.

thestinger · 2015-01-27T20:04:34Z

The fact that the overflows won't occur in the common case but rather in edge cases with pathological inputs doesn't mean it is correct.

nagisa · 2015-01-27T20:05:11Z

text/0000-int-guidelines.md

+value.
+
+For example, it is generally safe to cast an `i32` to an `i64`. It is also
+generally safe to case a `usize` to a `u64`, since it will never be bigger


Not expected to exceed u64 in a foreseeable future, but certainly will… at some time.

arielb1 · 2015-01-27T20:21:23Z

@thestinger

Actually, I meant more like s/anyone/a typical Python or Go developer/.

arielb1 · 2015-01-27T20:23:20Z

Also, in systems programming, 99% of integers are used for counting things, in which case i64 is a definitely correct type (and u32 is an incorrect-with-decent-probability type).

thestinger · 2015-01-27T20:24:12Z

No, just you. The default state is non-Copy and most types are non-Copy. There is also a push to make things non-Copy when they don't have to be for style reasons (which is stupid, but it will win). There are very few or no additional clone calls required by big integers in most cases, and they don't have a significant impact on the code cleanliness.

arielb1 · 2015-01-27T20:33:15Z

@thestinger

If we introduce a decent += operator (and make the regular impl of Add take both arguments by reference), my opinion might change. However, with the current state of Rust everything becomes lost in the noise.

On the "intuition" front, my integer-sizing method is:

Is it going to take a sizable amount of memory? Then do a detailed range analysis and pick the smallest type that fits.
Can I think of it as being bigger than a million ("laughably smaller")? If not, use a 32-bit type.
Is it bounded by a count of things in address space? Then use usize.
Is it bounded by a count of things? Use a 64-bit type
Can I introduce a bound? Add it and continue.
Use a bigint.

thestinger · 2015-01-27T20:38:48Z

That's a perfectly sane system as long as by count you mean an integer that you only ever increment (as in += 1 or similarly tiny values) and decrement. It can be very wrong if you apply it to counts of things in the real world.

thestinger · 2015-01-27T20:41:37Z

@arielb1: I'm not recommending anything different than you just stated. I'm simply stating that when you don't have a clear understanding of the bounds, you need to fix that by implementing clear bounds or use a big integer. In many cases, bailing out at the 64-bit boundaries isn't an option. This proposal pushes for using intuition to pick the types, which is a poor substitute for correctness.

arielb1 · 2015-01-27T20:41:38Z

I guess it won't work for a count of atoms ;-), and it won't work if you can receive "counts" of things from untrusted servers, but (of course) incrementing it by more than 1 at a time is OK, as long as you did "access" that many things (e.g. a count of bytes received over some connection).

thestinger · 2015-01-27T21:29:53Z

@arielb1: Sure, it's reasonable as long as the work is closely tied to the count. It can rapidly become a real problem as the ratio of work (in some measure like a CPU instruction) to magnitude of the increments drops below 1.

brson · 2015-01-27T21:39:31Z

@thestinger Please try to keep your arguments in good faith. Accusing others of 'vague nonsense', writing 'brain-dead' guidelines, and not being 'sane' are overly vitriolic and don't help your argument.

thestinger · 2015-01-27T21:56:42Z

@brson: I'm not interested in your self-righteous bullshit. People who endlessly spread FUD and play nasty political games don't deserve any respect. You're not a neutral party and I don't consider any of you as having any credibility, so you're wasting your time.

thestinger · 2015-01-27T21:58:26Z

Being passive aggressive and manipulative in ways that seriously damage the project is the modus operandi of the core developers. Then you dare come here and act as if you're superior because clearly nasty politics is so much better than honest speech.

Valloric · 2015-01-27T22:21:27Z

The guidelines make sense to me, but I am concerned about the following:

unsigned integers have underflow checking assertions built-into the type

From what I understand, those assertions will be in debug-only builds so I wouldn't rely on it. In my experience, it makes the most sense to treat unsigned integers as just a bit-sequence. While I've never been bit by an integer overflow, I've been bit by underflow hundreds of times, always in esoteric ways.

The general rule of thumb I (and others I work with) use is "use a signed integer unless you have a really good reason to use an unsigned one." The "mental default" is a signed 32 bit.

WRT comments here regarding "intuition" being the wrong approach, that's a strawman. There's tons of use-cases where the programmer knows their data will fit in 32 bits. Examples:

number of RPCs my system has outstanding,
number of users of my CRUD app,
number of HTML nodes in a document,
number of retry attempts I'll need to make until my data is committed to the database,
number of TPS reports the user wants to create in the application at once,
number of entries in the user's email contact list,
number of logical CPU cores on the machine my program can talk to,
etc.

All of these numbers are "laughably smaller" than the ~2 billion that fit in a 32 bit signed int. For all of them, you could say "I imagine a ridiculous situation where a 32bit int isn't enough" but it would be absurd. A document with 2 billion HTML nodes is going to crash your browser no matter what your counter is. 2 billion outstanding RPCs will crash your server (and will probably consume all of your RAM). 2 billion retry attempts to commit data and you already have bigger problems. A user trying to create 2 billion TPS reports at once in a webapp can't achieve this without crashing their browser. Nobody has 2 billion emails in their contact list.

In other words, the corner cases aren't reasonable when looked at holistically. If someone writes a script to add more than 2 billion emails to their contact list, other parts of your system should catch such malicious behavior. If you don't have that, a 64bit int won't help you. You'll have bigger problems much sooner than the overflow.

The other thing I'd also change is I'd add a stronger admonishment against the usage of isize/usize. They should be used in extremely rare circumstances and I'm worried they'll be more widely used than that, leading to code portability problems.

Valloric · 2015-01-27T22:23:35Z

@thestinger Comments like the following:

I'm not interested in your self-righteous bullshit. People who endlessly spread FUD and play nasty political games don't deserve any respect. You're not a neutral party and I don't consider any of you as having any credibility, so you're wasting your time.

do not motivate others to agree with you and are just unkind. Even if you truly believe you have been so wronged that such a comment would be appropriate, please refrain from making it either way. Nothing good will come of it.

thestinger · 2015-01-27T22:25:39Z

Ignoring edge cases because they aren't seen as problems in practice is why software sucks so much.

In other words, the corner cases aren't reasonable when looked at holistically. If someone writes a script to add more than 2 billion emails to their contact list, other parts of your system should catch such malicious behavior. If you don't have that, a 64bit int won't help you. You'll have bigger problems much sooner than the overflow.

Explicitly enforcing bounds that make overflow impossible was exactly what I recommended. If you aren't enforcing the bounds and there aren't clear limits that are within the range of the integer, then the code isn't correct and needs to be fixed.

The other thing I'd also change is I'd add a stronger admonishment against the usage of isize/usize. They should be used in extremely rare circumstances and I'm worried they'll be more widely used than that, leading to code portability problems.

Data limited in size by the available address space is not rare.

mahkoh · 2015-01-27T22:28:00Z

The other thing I'd also change is I'd add a stronger admonishment against the usage of isize/usize. They should be used in extremely rare circumstances and I'm worried they'll be more widely used than that, leading to code portability problems.

You've given lots of examples where isize or usize are perfect because you know that the value will be limited by the maximal size of an array:

number of RPCs my system has outstanding,

number of users of my CRUD app,

number of HTML nodes in a document,

number of TPS reports the user wants to create in the application at once,

number of entries in the user's email contact list,

thestinger · 2015-01-27T22:28:17Z

do not motivate others to agree with you and are just unkind. Even if you truly believe you have been so wronged that such a comment would be appropriate, please refrain from making it either way. Nothing good will come of it.

I don't respect people who don't respect me and have treated me like shit. I don't expect any good to come out of any of my involvement with Rust, but I'm here anyway. If someone wants to seek out confrontation then that's what they can have.

nrc · 2015-01-27T22:30:35Z

@Valloric I think your point about underflow assertions only being on in debug builds is important. Programmers should be reasoning based on checks which are always available, and any extra checks regarded as a safety net.

re 'intuition' your argument here going in to the detail of why these examples are 'laughably smaller' is pretty much what I think programmers should be doing and is the opposite of what the 'laughably smaller' guideline suggests to me - to take one of your examples a bit further, a programmer might think "lol, no one has 2 million" emails in their inbox, but of course then the code gets used in some automatic mailer and there are more than that. Put it another way, it is easy to mistakenly believe a number is laughably smaller than the limit, when it is not.

Valloric · 2015-01-27T22:38:02Z

Ignoring edge cases because they aren't seen as problems in practice is why software sucks so much.

I'm not at all supporting ignoring edge cases; I am talking about risk management. It is neither reasonable nor useful to worry about a database retry commit counter overflowing 32 bits. There's like 50 different much worse things that would happen before that, from the client hanging for what must be forever to the database dropping your connection etc.

Same thing with 2 billion nodes in an HTML document. Your node counter overflowing will be the least of your worries. Etc.

to take one of your examples a bit further, a programmer might think "lol, no one has 2 million" emails in their inbox,

That's explicitly an example I chose not to use, because I can easily see that happening.

Programmers should be reasoning based on checks which are always available, and any extra checks regarded as a safety net.

Agreed 100%. I don't trust debug-only checks (although I use a ton of them and they save me lots of time). They're a nice-to-have, not something to rely on.

You've given lots of cases where isize or usize are perfect because you know that the value will be limited by the maximal size of an array:

If the data is being put in an array, then I agree. But that's not a given.

thestinger · 2015-01-27T22:47:42Z

If the data is being put in an array, then I agree. But that's not a given.

It's large enough to count in-memory objects that aren't zero-size, which is a very common case. It works for collection sizes and uses like reference counting in general.

thestinger · 2015-01-27T22:52:31Z

I'm not at all supporting ignoring edge cases; I am talking about risk management. It is neither reasonable nor useful to worry about a database retry commit counter overflowing 32 bits. There's like 50 different much worse things that would happen before that, from the client hanging for what must be forever to the database dropping your connection etc.

If you know what happens when it does overflow and the consequences are not serious, then you've already done the right thing. You could even make test cases to verify that it's a sane soft failure instead of something disastrous. Ignoring the case because it doesn't seem like it would happen in practice is designing for insecurity and unreliability. The usual design paradigm is to throw a 32-bit or 64-bit integer at the problem because it feels like it's enough, and then it turns out to be a catastrophic bug.

Valloric · 2015-01-27T23:01:20Z

I don't respect people who don't respect me and have treated me like shit.

I can understand that, but do you think your "scorched earth" approach is doing you or anyone else any good? I doubt it's making you happier and it certainly isn't making people come around to your position. So what's the point? How is it useful? Plenty of others (myself included) have told you over time that seeing those kinds of responses makes them less likely to agree with you even if they think you have a point.

Further, people make mistakes. Even if the core team has treated you disrespectfully in the past, they aren't doing it now. You are treating them that way currently and have been for a while now. Is there really no room for "forgive, forget, move on"? Are you just going to treat them like crap until they finally ban you? Because it's probably going to happen; I've banned people from projects I personally manage for far less. Will that do you any good? Will it make you happier?

Daniel, you have excellent technical insight but you then wrap it in such caustic delivery that you might as well not have that insight to begin with, for all the good it's doing you.

Life is about managing when others treat you like shit.

I don't expect any good to come out of any of my involvement with Rust, but I'm here anyway. If someone wants to seek out confrontation then that's what they can have.

Nobody is seeking out a confrontation here, but you certainly seem to be trying to instigate one with your behavior. The only thing brson asked you was to try to remain civil, which is an entirely reasonable request and one that shouldn't be necessary in the first place. I commend him for the attempt; it was obviously needed.

You keep mentioning how you have been treated poorly by the core team and all I have ever witnessed in the ~2 years I have been (closely) following Rust is you repeatedly treating them like crap and them taking it stoically. All I continue to see is them looking past your unkind behavior and not banning you, even though the behavior has warranted it on countless occasions (I don't think anyone can objectively disagree with that).

So just tone it down a notch. That's all.

thestinger · 2015-01-27T23:36:16Z

I can understand that, but do you think your "scorched earth" approach is doing you or anyone else any good? I doubt it's making you happier and it certainly isn't making people come around to your position. So what's the point? How is it useful? Plenty of others (myself included) have told you over time that seeing those kinds of responses makes them less likely to agree with you even if they think you have a point.

I'm not trying to win a popularity contest. I'm not interested in playing two-faced public relations games like them.

Further, people make mistakes. Even if the core team has treated you disrespectfully in the past, they aren't doing it now. You are treating them that way currently and have been for a while now. Is there really no room for "forgive, forget, move on"? Are you just going to treat them like crap until they finally ban you? Because it's probably going to happen; I've personally banned people from projects I managed for far less. Will that do you any good? Will it make you happier?

They certainly are still doing it now. You obviously buy into their portrayal of the situation, and that's your prerogative. I'm not sure why you're talking to me about it when you're starting off with the assumption it's a one-sided situation. I'm obviously not interested in the opinion of someone who thinks that I'm an idiot.

Daniel, you have excellent technical insight but you then wrap it in such caustic delivery that you might as well not have that insight to begin with, for all the good it's doing you.

Life is about managing when others treat you like shit.

It seems to be working out just fine.

Nobody is seeking out a confrontation here, but you certainly seem to be trying to instigate one with your behavior. The only thing brson asked you was to try to remain civil, which is an entirely reasonable request and one that shouldn't be necessary in the first place. I commend him for the attempt; it was obviously needed.

Calling me incompetent and trashing my contributions is looking for confrontation. I really don't care what someone thinks when their sole mechanism of interaction with me is to come up with ways of blocking my contributions and talking down to me, so @brson doing that yet again is not a very interesting event.

You keep mentioning how you have been treated poorly by the core team and all I have ever witnessed in the ~2 years I have been (closely) following Rust is you repeatedly treating them like crap and them taking it stoically. All I continue to see is them looking past your unkind behavior and not banning you, even though the behavior has warranted it on countless occasions (I don't think anyone can objectively disagree with that).

This only demonstrates that being passive aggressive and two-faced has been successful. It's sad that you worship authority to the point where you can't see through their nonsense, but it doesn't really bother me.

So just tone it down a notch. That's all.

You're the one escalating things to a personal level. That's toning it up a notch, not down. I'm really not sure what you hope to accomplish by patronizing me. At this point, you're just trolling.

thestinger · 2015-01-27T23:50:22Z

If you can't win on the facts, speak down to people from a feigned position of impartiality and imagined authority. It's really a joyous experience to participate in this community. Stating that you think an idea is stupid is off limits, but making condescending personal attacks on people is totally acceptable. Rust Logic.

srh · 2015-01-28T03:57:59Z

text/0000-int-guidelines.md

+
+In Rust, unsigned integers have underflow checking assertions built-into
+the type (assuming that RFC XXX is accepted), so using a `u32` is equivalent
+to the advice in Google style guide (with a larger maximum value).


You can often find situations where a break statement in such a for loop means that it rarely hits zero (and thus rarely decrements below zero). Where, by "often"... I've seen such happen. Other examples include correct code that "subtracts first", e.g. x - 1 + y, where y is positive, where the ephemeral negative value isn't a bad thing. Signed types avoid such edge cases. You can consider me firmly in the camp of signed favoritism. Ideally, use signed types everywhere and never encounter unsigned types. Unfortunately the real situation is that you should use whatever matches best with the libraries and interfaces you're using.

It's ridiculous that some RFC would deign to decide this question.

It's reasonable for there to be general guidelines for stuff like this when there's a strong consensus, but I don't think there can be one for these issues. There is little to no evidence in favour of any specific choices, just a lot of dubious claims from every side.

sanxiyn · 2015-01-28T06:24:02Z

I am strongly in favor of "decide not to issue guidance here" alternative listed in the RFC, because the RFC does not seem to be well thought out enough to have universal agreement. The drawback of that alternative is real, but in my opinion it is best addressed, at the very least, after we decide on widening, for example.

aturon · 2015-02-13T03:39:31Z

I'm going to close this RFC for the time being. We'll hash out working guidelines as part of the ongoing std integer audit, which may shed some light on the right avenue to take here.

Create 0000-int-guidelines.md

2ec59ed

1fish2 reviewed Jan 27, 2015
View reviewed changes

mahkoh reviewed Jan 27, 2015
View reviewed changes

nrc reviewed Jan 27, 2015
View reviewed changes

nagisa reviewed Jan 27, 2015
View reviewed changes

nrc assigned wycats Jan 27, 2015

aturon mentioned this pull request Jan 28, 2015

Stabilization for 1.0-alpha2 rust-lang/rust#20761

Closed

38 tasks

srh reviewed Jan 28, 2015
View reviewed changes

alexcrichton mentioned this pull request Feb 9, 2015

Make Int::pow() take exp as u32 instead usize rust-lang/rust#22087

Merged

aturon closed this Feb 13, 2015

Integer Guidelines RFC #741

Integer Guidelines RFC #741

Conversation

wycats commented Jan 27, 2015

aturon commented Jan 27, 2015

Choose a reason for hiding this comment

Choose a reason for hiding this comment

petrochenkov commented Jan 27, 2015

Choose a reason for hiding this comment

arielb1 commented Jan 27, 2015

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

thestinger commented Jan 27, 2015

arielb1 commented Jan 27, 2015

thestinger commented Jan 27, 2015

Choose a reason for hiding this comment

arielb1 commented Jan 27, 2015

arielb1 commented Jan 27, 2015

thestinger commented Jan 27, 2015

arielb1 commented Jan 27, 2015

thestinger commented Jan 27, 2015

thestinger commented Jan 27, 2015

arielb1 commented Jan 27, 2015

thestinger commented Jan 27, 2015

brson commented Jan 27, 2015

thestinger commented Jan 27, 2015

thestinger commented Jan 27, 2015

Valloric commented Jan 27, 2015

Valloric commented Jan 27, 2015

thestinger commented Jan 27, 2015

mahkoh commented Jan 27, 2015

thestinger commented Jan 27, 2015

nrc commented Jan 27, 2015

Valloric commented Jan 27, 2015

thestinger commented Jan 27, 2015

thestinger commented Jan 27, 2015

Valloric commented Jan 27, 2015

thestinger commented Jan 27, 2015

thestinger commented Jan 27, 2015

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sanxiyn commented Jan 28, 2015

aturon commented Feb 13, 2015