Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integer Guidelines RFC #741

Closed
wants to merge 1 commit into from
Closed

Conversation

wycats
Copy link
Contributor

@wycats wycats commented Jan 27, 2015

@aturon
Copy link
Member

aturon commented Jan 27, 2015

cc @Valloric @CloudiDust @1fish2

The goal of this RFC is to help people decide what integer types to use when they need
to make a decision for a new API.

It builds on [https://github.com/rust-lang/rfcs/pull/560](the integer overflow RFC),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^^ the URL and link text are reversed in this markup

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch.

@petrochenkov
Copy link
Contributor

Bit sizes are not covered. (rust-lang/rust#20211 (comment))
There should be a convention - to keep using usize for them or switch to the "usize * 2" types

addressable memory.
* Using of unsigned integers is traditionally thought to be error-prone, and
style guides often suggest avoiding them. That said, Rust's unsigned integers
have built-in underflow assertions, which changes the analysis.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You seem to forget that that RFC is still open. Or maybe that's just distraction.

@arielb1
Copy link
Contributor

arielb1 commented Jan 27, 2015

@thestinger

Big integers mean that all your structs become non-Copy, which is for many applications a cure worse than the disease.

be appropriate, but use of `usize` in general can introduce portability
hazards when the use-case is not proportional to the amount of
addressable memory.
* Using of unsigned integers is traditionally thought to be error-prone, and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am somewhat cynical of this constraint - I have never come across this school of thought outside the Google style guide linked above. Sure they have invariants you need to be careful not to violate, but no more so than signed integers (they both overflow and underflow, just at different places). Using an unsigned int inappropriately is error prone, but the same goes for signed ints.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More aspects about unsigned integers:

  • Unsigned integers are fitting for bit patterns, hash codes, wraparound mod 2^N, and values that aren't really numbers.
  • Since Rust requires explicit conversions from signed to unsigned integers, some C/C++ cautionary cases don't apply. [My mistake in not realizing this sooner.] E.g. it's tempting to make a C function take an uint32_t so it doesn't have to deal with negative inputs but it's easy in C to accidentally pass in a negative integer. Now the function can't detect that bad input unless it has a suitable upper bound.
    [BTW, "polymorphic array indexing" would make it OK to index a Rust array with a signed integer. Rustc could implement a signed bounds check with a single unsigned comparison (at usize or larger width) since the upper bound cannot exceed isize::MAX.]
  • Hopefully rustc will always complain about i >= 0 for unsigned i.
  • There are plenty of cases to be careful about exceeding the unsigned conceptual domain. Making a value unsigned looks better than it works out.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • It's easy for values (esp. intermediate values) to go negative.
  • Be careful. Don't code like my brother.

@thestinger
Copy link

@arielb1: Incorrect code doesn't qualify as a viable option in the first place.

@arielb1
Copy link
Contributor

arielb1 commented Jan 27, 2015

If you are using integers for counting, you essentially can't overflow an i64. If you're using integers with a size-limited datastore, you will have problems when they get too large anyway. It isn't so easy to accidentally overflow an i64.

@thestinger
Copy link

The fact that the overflows won't occur in the common case but rather in edge cases with pathological inputs doesn't mean it is correct.

value.

For example, it is generally safe to cast an `i32` to an `i64`. It is also
generally safe to case a `usize` to a `u64`, since it will never be bigger
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not expected to exceed u64 in a foreseeable future, but certainly will… at some time.

@arielb1
Copy link
Contributor

arielb1 commented Jan 27, 2015

@thestinger

Actually, I meant more like s/anyone/a typical Python or Go developer/.

@arielb1
Copy link
Contributor

arielb1 commented Jan 27, 2015

Also, in systems programming, 99% of integers are used for counting things, in which case i64 is a definitely correct type (and u32 is an incorrect-with-decent-probability type).

@thestinger
Copy link

No, just you. The default state is non-Copy and most types are non-Copy. There is also a push to make things non-Copy when they don't have to be for style reasons (which is stupid, but it will win). There are very few or no additional clone calls required by big integers in most cases, and they don't have a significant impact on the code cleanliness.

@arielb1
Copy link
Contributor

arielb1 commented Jan 27, 2015

@thestinger

If we introduce a decent += operator (and make the regular impl of Add take both arguments by reference), my opinion might change. However, with the current state of Rust everything becomes lost in the noise.

On the "intuition" front, my integer-sizing method is:

  • Is it going to take a sizable amount of memory? Then do a detailed range analysis and pick the smallest type that fits.
  • Can I think of it as being bigger than a million ("laughably smaller")? If not, use a 32-bit type.
  • Is it bounded by a count of things in address space? Then use usize.
  • Is it bounded by a count of things? Use a 64-bit type
  • Can I introduce a bound? Add it and continue.
  • Use a bigint.

@thestinger
Copy link

That's a perfectly sane system as long as by count you mean an integer that you only ever increment (as in += 1 or similarly tiny values) and decrement. It can be very wrong if you apply it to counts of things in the real world.

@thestinger
Copy link

@arielb1: I'm not recommending anything different than you just stated. I'm simply stating that when you don't have a clear understanding of the bounds, you need to fix that by implementing clear bounds or use a big integer. In many cases, bailing out at the 64-bit boundaries isn't an option. This proposal pushes for using intuition to pick the types, which is a poor substitute for correctness.

@arielb1
Copy link
Contributor

arielb1 commented Jan 27, 2015

I guess it won't work for a count of atoms ;-), and it won't work if you can receive "counts" of things from untrusted servers, but (of course) incrementing it by more than 1 at a time is OK, as long as you did "access" that many things (e.g. a count of bytes received over some connection).

@thestinger
Copy link

@arielb1: Sure, it's reasonable as long as the work is closely tied to the count. It can rapidly become a real problem as the ratio of work (in some measure like a CPU instruction) to magnitude of the increments drops below 1.

@brson
Copy link
Contributor

brson commented Jan 27, 2015

@thestinger Please try to keep your arguments in good faith. Accusing others of 'vague nonsense', writing 'brain-dead' guidelines, and not being 'sane' are overly vitriolic and don't help your argument.

@thestinger
Copy link

@brson: I'm not interested in your self-righteous bullshit. People who endlessly spread FUD and play nasty political games don't deserve any respect. You're not a neutral party and I don't consider any of you as having any credibility, so you're wasting your time.

@thestinger
Copy link

Being passive aggressive and manipulative in ways that seriously damage the project is the modus operandi of the core developers. Then you dare come here and act as if you're superior because clearly nasty politics is so much better than honest speech.

@Valloric
Copy link

The guidelines make sense to me, but I am concerned about the following:

unsigned integers have underflow checking assertions built-into the type

From what I understand, those assertions will be in debug-only builds so I wouldn't rely on it. In my experience, it makes the most sense to treat unsigned integers as just a bit-sequence. While I've never been bit by an integer overflow, I've been bit by underflow hundreds of times, always in esoteric ways.

The general rule of thumb I (and others I work with) use is "use a signed integer unless you have a really good reason to use an unsigned one." The "mental default" is a signed 32 bit.

WRT comments here regarding "intuition" being the wrong approach, that's a strawman. There's tons of use-cases where the programmer knows their data will fit in 32 bits. Examples:

  • number of RPCs my system has outstanding,
  • number of users of my CRUD app,
  • number of HTML nodes in a document,
  • number of retry attempts I'll need to make until my data is committed to the database,
  • number of TPS reports the user wants to create in the application at once,
  • number of entries in the user's email contact list,
  • number of logical CPU cores on the machine my program can talk to,
  • etc.

All of these numbers are "laughably smaller" than the ~2 billion that fit in a 32 bit signed int. For all of them, you could say "I imagine a ridiculous situation where a 32bit int isn't enough" but it would be absurd. A document with 2 billion HTML nodes is going to crash your browser no matter what your counter is. 2 billion outstanding RPCs will crash your server (and will probably consume all of your RAM). 2 billion retry attempts to commit data and you already have bigger problems. A user trying to create 2 billion TPS reports at once in a webapp can't achieve this without crashing their browser. Nobody has 2 billion emails in their contact list.

In other words, the corner cases aren't reasonable when looked at holistically. If someone writes a script to add more than 2 billion emails to their contact list, other parts of your system should catch such malicious behavior. If you don't have that, a 64bit int won't help you. You'll have bigger problems much sooner than the overflow.

The other thing I'd also change is I'd add a stronger admonishment against the usage of isize/usize. They should be used in extremely rare circumstances and I'm worried they'll be more widely used than that, leading to code portability problems.

@Valloric
Copy link

@thestinger Comments like the following:

I'm not interested in your self-righteous bullshit. People who endlessly spread FUD and play nasty political games don't deserve any respect. You're not a neutral party and I don't consider any of you as having any credibility, so you're wasting your time.

do not motivate others to agree with you and are just unkind. Even if you truly believe you have been so wronged that such a comment would be appropriate, please refrain from making it either way. Nothing good will come of it.

@thestinger
Copy link

Ignoring edge cases because they aren't seen as problems in practice is why software sucks so much.

In other words, the corner cases aren't reasonable when looked at holistically. If someone writes a script to add more than 2 billion emails to their contact list, other parts of your system should catch such malicious behavior. If you don't have that, a 64bit int won't help you. You'll have bigger problems much sooner than the overflow.

Explicitly enforcing bounds that make overflow impossible was exactly what I recommended. If you aren't enforcing the bounds and there aren't clear limits that are within the range of the integer, then the code isn't correct and needs to be fixed.

The other thing I'd also change is I'd add a stronger admonishment against the usage of isize/usize. They should be used in extremely rare circumstances and I'm worried they'll be more widely used than that, leading to code portability problems.

Data limited in size by the available address space is not rare.

@mahkoh
Copy link
Contributor

mahkoh commented Jan 27, 2015

The other thing I'd also change is I'd add a stronger admonishment against the usage of isize/usize. They should be used in extremely rare circumstances and I'm worried they'll be more widely used than that, leading to code portability problems.

You've given lots of examples where isize or usize are perfect because you know that the value will be limited by the maximal size of an array:

  • number of RPCs my system has outstanding,
  • number of users of my CRUD app,
  • number of HTML nodes in a document,
  • number of TPS reports the user wants to create in the application at once,
  • number of entries in the user's email contact list,

@thestinger
Copy link

do not motivate others to agree with you and are just unkind. Even if you truly believe you have been so wronged that such a comment would be appropriate, please refrain from making it either way. Nothing good will come of it.

I don't respect people who don't respect me and have treated me like shit. I don't expect any good to come out of any of my involvement with Rust, but I'm here anyway. If someone wants to seek out confrontation then that's what they can have.

@nrc
Copy link
Member

nrc commented Jan 27, 2015

@Valloric I think your point about underflow assertions only being on in debug builds is important. Programmers should be reasoning based on checks which are always available, and any extra checks regarded as a safety net.

re 'intuition' your argument here going in to the detail of why these examples are 'laughably smaller' is pretty much what I think programmers should be doing and is the opposite of what the 'laughably smaller' guideline suggests to me - to take one of your examples a bit further, a programmer might think "lol, no one has 2 million" emails in their inbox, but of course then the code gets used in some automatic mailer and there are more than that. Put it another way, it is easy to mistakenly believe a number is laughably smaller than the limit, when it is not.

@Valloric
Copy link

Ignoring edge cases because they aren't seen as problems in practice is why software sucks so much.

I'm not at all supporting ignoring edge cases; I am talking about risk management. It is neither reasonable nor useful to worry about a database retry commit counter overflowing 32 bits. There's like 50 different much worse things that would happen before that, from the client hanging for what must be forever to the database dropping your connection etc.

Same thing with 2 billion nodes in an HTML document. Your node counter overflowing will be the least of your worries. Etc.

to take one of your examples a bit further, a programmer might think "lol, no one has 2 million" emails in their inbox,

That's explicitly an example I chose not to use, because I can easily see that happening.

Programmers should be reasoning based on checks which are always available, and any extra checks regarded as a safety net.

Agreed 100%. I don't trust debug-only checks (although I use a ton of them and they save me lots of time). They're a nice-to-have, not something to rely on.

You've given lots of cases where isize or usize are perfect because you know that the value will be limited by the maximal size of an array:

If the data is being put in an array, then I agree. But that's not a given.

@thestinger
Copy link

If the data is being put in an array, then I agree. But that's not a given.

It's large enough to count in-memory objects that aren't zero-size, which is a very common case. It works for collection sizes and uses like reference counting in general.

@thestinger
Copy link

I'm not at all supporting ignoring edge cases; I am talking about risk management. It is neither reasonable nor useful to worry about a database retry commit counter overflowing 32 bits. There's like 50 different much worse things that would happen before that, from the client hanging for what must be forever to the database dropping your connection etc.

If you know what happens when it does overflow and the consequences are not serious, then you've already done the right thing. You could even make test cases to verify that it's a sane soft failure instead of something disastrous. Ignoring the case because it doesn't seem like it would happen in practice is designing for insecurity and unreliability. The usual design paradigm is to throw a 32-bit or 64-bit integer at the problem because it feels like it's enough, and then it turns out to be a catastrophic bug.

@Valloric
Copy link

I don't respect people who don't respect me and have treated me like shit.

I can understand that, but do you think your "scorched earth" approach is doing you or anyone else any good? I doubt it's making you happier and it certainly isn't making people come around to your position. So what's the point? How is it useful? Plenty of others (myself included) have told you over time that seeing those kinds of responses makes them less likely to agree with you even if they think you have a point.

Further, people make mistakes. Even if the core team has treated you disrespectfully in the past, they aren't doing it now. You are treating them that way currently and have been for a while now. Is there really no room for "forgive, forget, move on"? Are you just going to treat them like crap until they finally ban you? Because it's probably going to happen; I've banned people from projects I personally manage for far less. Will that do you any good? Will it make you happier?

Daniel, you have excellent technical insight but you then wrap it in such caustic delivery that you might as well not have that insight to begin with, for all the good it's doing you.

Life is about managing when others treat you like shit.

I don't expect any good to come out of any of my involvement with Rust, but I'm here anyway. If someone wants to seek out confrontation then that's what they can have.

Nobody is seeking out a confrontation here, but you certainly seem to be trying to instigate one with your behavior. The only thing brson asked you was to try to remain civil, which is an entirely reasonable request and one that shouldn't be necessary in the first place. I commend him for the attempt; it was obviously needed.

You keep mentioning how you have been treated poorly by the core team and all I have ever witnessed in the ~2 years I have been (closely) following Rust is you repeatedly treating them like crap and them taking it stoically. All I continue to see is them looking past your unkind behavior and not banning you, even though the behavior has warranted it on countless occasions (I don't think anyone can objectively disagree with that).

So just tone it down a notch. That's all.

@thestinger
Copy link

I can understand that, but do you think your "scorched earth" approach is doing you or anyone else any good? I doubt it's making you happier and it certainly isn't making people come around to your position. So what's the point? How is it useful? Plenty of others (myself included) have told you over time that seeing those kinds of responses makes them less likely to agree with you even if they think you have a point.

I'm not trying to win a popularity contest. I'm not interested in playing two-faced public relations games like them.

Further, people make mistakes. Even if the core team has treated you disrespectfully in the past, they aren't doing it now. You are treating them that way currently and have been for a while now. Is there really no room for "forgive, forget, move on"? Are you just going to treat them like crap until they finally ban you? Because it's probably going to happen; I've personally banned people from projects I managed for far less. Will that do you any good? Will it make you happier?

They certainly are still doing it now. You obviously buy into their portrayal of the situation, and that's your prerogative. I'm not sure why you're talking to me about it when you're starting off with the assumption it's a one-sided situation. I'm obviously not interested in the opinion of someone who thinks that I'm an idiot.

Daniel, you have excellent technical insight but you then wrap it in such caustic delivery that you might as well not have that insight to begin with, for all the good it's doing you.

Life is about managing when others treat you like shit.

It seems to be working out just fine.

Nobody is seeking out a confrontation here, but you certainly seem to be trying to instigate one with your behavior. The only thing brson asked you was to try to remain civil, which is an entirely reasonable request and one that shouldn't be necessary in the first place. I commend him for the attempt; it was obviously needed.

Calling me incompetent and trashing my contributions is looking for confrontation. I really don't care what someone thinks when their sole mechanism of interaction with me is to come up with ways of blocking my contributions and talking down to me, so @brson doing that yet again is not a very interesting event.

You keep mentioning how you have been treated poorly by the core team and all I have ever witnessed in the ~2 years I have been (closely) following Rust is you repeatedly treating them like crap and them taking it stoically. All I continue to see is them looking past your unkind behavior and not banning you, even though the behavior has warranted it on countless occasions (I don't think anyone can objectively disagree with that).

This only demonstrates that being passive aggressive and two-faced has been successful. It's sad that you worship authority to the point where you can't see through their nonsense, but it doesn't really bother me.

So just tone it down a notch. That's all.

You're the one escalating things to a personal level. That's toning it up a notch, not down. I'm really not sure what you hope to accomplish by patronizing me. At this point, you're just trolling.

@thestinger
Copy link

If you can't win on the facts, speak down to people from a feigned position of impartiality and imagined authority. It's really a joyous experience to participate in this community. Stating that you think an idea is stupid is off limits, but making condescending personal attacks on people is totally acceptable. Rust Logic.


In Rust, unsigned integers have underflow checking assertions built-into
the type (assuming that RFC XXX is accepted), so using a `u32` is equivalent
to the advice in Google style guide (with a larger maximum value).
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can often find situations where a break statement in such a for loop means that it rarely hits zero (and thus rarely decrements below zero). Where, by "often"... I've seen such happen. Other examples include correct code that "subtracts first", e.g. x - 1 + y, where y is positive, where the ephemeral negative value isn't a bad thing. Signed types avoid such edge cases. You can consider me firmly in the camp of signed favoritism. Ideally, use signed types everywhere and never encounter unsigned types. Unfortunately the real situation is that you should use whatever matches best with the libraries and interfaces you're using.

It's ridiculous that some RFC would deign to decide this question.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's reasonable for there to be general guidelines for stuff like this when there's a strong consensus, but I don't think there can be one for these issues. There is little to no evidence in favour of any specific choices, just a lot of dubious claims from every side.

@sanxiyn
Copy link
Member

sanxiyn commented Jan 28, 2015

I am strongly in favor of "decide not to issue guidance here" alternative listed in the RFC, because the RFC does not seem to be well thought out enough to have universal agreement. The drawback of that alternative is real, but in my opinion it is best addressed, at the very least, after we decide on widening, for example.

@aturon
Copy link
Member

aturon commented Feb 13, 2015

I'm going to close this RFC for the time being. We'll hash out working guidelines as part of the ongoing std integer audit, which may shed some light on the right avenue to take here.

@aturon aturon closed this Feb 13, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.