Integer Guidelines RFC #741
The goal of this RFC is to help people decide what integer types to use when they need | ||
to make a decision for a new API. | ||
|
||
It builds on [https://github.com/rust-lang/rfcs/pull/560](the integer overflow RFC), |
^^ the URL and link text are reversed in this markup
Good catch.
Bit sizes are not covered. (rust-lang/rust#20211 (comment)) |
addressable memory. | ||
* Use of unsigned integers is traditionally thought to be error-prone, and | ||
style guides often suggest avoiding them. That said, Rust's unsigned integers | ||
have built-in underflow assertions, which changes the analysis. |
You seem to forget that that RFC is still open. Or maybe that's just distraction.
Big integers mean that all your structs become non- |
be appropriate, but use of `usize` in general can introduce portability | ||
hazards when the use-case is not proportional to the amount of | ||
addressable memory. | ||
* Use of unsigned integers is traditionally thought to be error-prone, and |
I am somewhat cynical of this constraint - I have never come across this school of thought outside the Google style guide linked above. Sure they have invariants you need to be careful not to violate, but no more so than signed integers (they both overflow and underflow, just at different places). Using an unsigned int inappropriately is error prone, but the same goes for signed ints.
More aspects about unsigned integers:
- Unsigned integers are fitting for bit patterns, hash codes, wraparound mod 2^N, and values that aren't really numbers.
- Since Rust requires explicit conversions from signed to unsigned integers, some C/C++ cautionary cases don't apply. [My mistake in not realizing this sooner.] E.g. it's tempting to make a C function take a `uint32_t` so it doesn't have to deal with negative inputs, but it's easy in C to accidentally pass in a negative integer. Now the function can't detect that bad input unless it has a suitable upper bound.
  [BTW, "polymorphic array indexing" would make it OK to index a Rust array with a signed integer. Rustc could implement a signed bounds check with a single unsigned comparison (at `usize` or larger width) since the upper bound cannot exceed `isize::MAX`.]
- Hopefully rustc will always complain about `i >= 0` for unsigned `i`.
- There are plenty of cases to be careful about exceeding the unsigned conceptual domain. Making a value unsigned looks better than it works out.
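The single-comparison bounds check mentioned in the comment above can be sketched as follows. This is only an illustration: `get_signed` is a hypothetical helper, not an existing Rust API, and the sketch assumes the element type is not zero-sized (so the slice length cannot exceed `isize::MAX`).

```rust
/// Hypothetical helper (not part of std): index a slice with a signed integer.
///
/// Casting a negative `isize` to `usize` reinterprets the bits and yields a
/// value greater than `isize::MAX`. Assuming non-zero-sized elements, a slice
/// length never exceeds `isize::MAX`, so one unsigned comparison rejects both
/// negative and too-large indices.
fn get_signed<T>(items: &[T], i: isize) -> Option<&T> {
    let idx = i as usize; // explicit conversion; Rust never does this implicitly
    if idx < items.len() {
        Some(&items[idx])
    } else {
        None
    }
}

fn main() {
    let v = [10, 20, 30];
    assert_eq!(get_signed(&v, 1), Some(&20));
    assert_eq!(get_signed(&v, -1), None); // negative index is rejected
    assert_eq!(get_signed(&v, 3), None); // past the end is rejected
}
```

Note that the cast has to be spelled out with `as`, which is the point made above about explicit signed-to-unsigned conversions.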
- It's easy for values (esp. intermediate values) to go negative.
- Be careful. Don't code like my brother.
@arielb1: Incorrect code doesn't qualify as a viable option in the first place. |
If you are using integers for counting, you essentially can't overflow an |
The fact that the overflows won't occur in the common case but rather in edge cases with pathological inputs doesn't mean it is correct. |
value. | ||
|
||
For example, it is generally safe to cast an `i32` to an `i64`. It is also | ||
generally safe to cast a `usize` to a `u64`, since it will never be bigger |
Not expected to exceed u64 in a foreseeable future, but certainly will… at some time.
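A minimal sketch of the widening casts under discussion, using only standard-library conversions. The function names `widen_len` and `narrow` are hypothetical. The `usize` to `u64` conversion goes through `try_from` because it is only lossless as long as `usize` is at most 64 bits wide, which is the assumption questioned in the comment above.

```rust
use std::convert::TryFrom;

/// Hypothetical helper: widen a length to `u64`, failing loudly if the
/// platform's `usize` is ever wider than 64 bits.
fn widen_len(len: usize) -> u64 {
    u64::try_from(len).expect("usize value does not fit in u64 on this target")
}

/// Hypothetical helper: narrowing is the risky direction, so it returns
/// `None` instead of silently truncating.
fn narrow(value: i64) -> Option<i32> {
    i32::try_from(value).ok()
}

fn main() {
    // i32 -> i64 is always lossless, so `From` is implemented for it.
    let small: i32 = -7;
    let wide: i64 = i64::from(small);
    assert_eq!(wide, -7);

    assert_eq!(widen_len(4096), 4096u64);

    assert_eq!(narrow(123), Some(123));
    assert_eq!(narrow(i64::from(i32::MAX) + 1), None);
}
```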
Actually, I meant more like s/anyone/a typical Python or Go developer/. |
Also, in systems programming, 99% of integers are used for counting things, in which case |
No, just you. The default state is non- |
If we introduce a decent On the "intuition" front, my integer-sizing method is:
|
That's a perfectly sane system as long as by count you mean an integer that you only ever increment (as in |
@arielb1: I'm not recommending anything different than you just stated. I'm simply stating that when you don't have a clear understanding of the bounds, you need to fix that by implementing clear bounds or use a big integer. In many cases, bailing out at the 64-bit boundaries isn't an option. This proposal pushes for using intuition to pick the types, which is a poor substitute for correctness. |
I guess it won't work for a count of atoms ;-), and it won't work if you can receive "counts" of things from untrusted servers, but (of course) incrementing it by more than 1 at a time is OK, as long as you did "access" that many things (e.g. a count of bytes received over some connection). |
@arielb1: Sure, it's reasonable as long as the work is closely tied to the count. It can rapidly become a real problem as the ratio of work (in some measure like a CPU instruction) to magnitude of the increments drops below 1. |
@thestinger Please try to keep your arguments in good faith. Accusing others of 'vague nonsense', writing 'brain-dead' guidelines, and not being 'sane' are overly vitriolic and don't help your argument. |
@brson: I'm not interested in your self-righteous bullshit. People who endlessly spread FUD and play nasty political games don't deserve any respect. You're not a neutral party and I don't consider any of you as having any credibility, so you're wasting your time. |
Being passive aggressive and manipulative in ways that seriously damage the project is the modus operandi of the core developers. Then you dare come here and act as if you're superior because clearly nasty politics is so much better than honest speech. |
The guidelines make sense to me, but I am concerned about the following:
From what I understand, those assertions will be in debug-only builds so I wouldn't rely on it. In my experience, it makes the most sense to treat unsigned integers as just a bit-sequence. While I've never been bit by an integer overflow, I've been bit by underflow hundreds of times, always in esoteric ways. The general rule of thumb I (and others I work with) use is "use a signed integer unless you have a really good reason to use an unsigned one." The "mental default" is a signed 32 bit.
WRT comments here regarding "intuition" being the wrong approach, that's a strawman. There's tons of use-cases where the programmer knows their data will fit in 32 bits. Examples:
All of these numbers are "laughably smaller" than the ~2 billion that fit in a 32 bit signed int. For all of them, you could say "I imagine a ridiculous situation where a 32bit int isn't enough" but it would be absurd. A document with 2 billion HTML nodes is going to crash your browser no matter what your counter is. 2 billion outstanding RPCs will crash your server (and will probably consume all of your RAM). 2 billion retry attempts to commit data and you already have bigger problems. A user trying to create 2 billion TPS reports at once in a webapp can't achieve this without crashing their browser. Nobody has 2 billion emails in their contact list.
In other words, the corner cases aren't reasonable when looked at holistically. If someone writes a script to add more than 2 billion emails to their contact list, other parts of your system should catch such malicious behavior. If you don't have that, a 64bit int won't help you. You'll have bigger problems much sooner than the overflow.
The other thing I'd also change is I'd add a stronger admonishment against the usage of isize/usize. They should be used in extremely rare circumstances and I'm worried they'll be more widely used than that, leading to code portability problems. |
@thestinger Comments like the following:
do not motivate others to agree with you and are just unkind. Even if you truly believe you have been so wronged that such a comment would be appropriate, please refrain from making it either way. Nothing good will come of it. |
Ignoring edge cases because they aren't seen as problems in practice is why software sucks so much.
Explicitly enforcing bounds that make overflow impossible was exactly what I recommended. If you aren't enforcing the bounds and there aren't clear limits that are within the range of the integer, then the code isn't correct and needs to be fixed.
Data limited in size by the available address space is not rare. |
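As an illustration of "explicitly enforcing bounds that make overflow impossible", here is a minimal sketch. The `RetryCounter` type, the `LimitExceeded` error, and the `MAX_RETRIES` value are all hypothetical names chosen for the example; the point is only that the limit is checked before the increment, so the addition can never reach the integer's maximum.

```rust
/// Hypothetical limit, far below `u32::MAX`.
const MAX_RETRIES: u32 = 1_000;

/// Hypothetical error returned once the explicit bound is hit.
#[derive(Debug, PartialEq)]
struct LimitExceeded;

/// Hypothetical counter whose range is enforced rather than assumed.
#[derive(Debug, PartialEq)]
struct RetryCounter {
    count: u32,
}

impl RetryCounter {
    fn new() -> Self {
        RetryCounter { count: 0 }
    }

    /// Records one attempt, or reports that the bound was reached.
    fn record_attempt(&mut self) -> Result<u32, LimitExceeded> {
        if self.count >= MAX_RETRIES {
            return Err(LimitExceeded);
        }
        // Cannot overflow: count < MAX_RETRIES <= u32::MAX.
        self.count += 1;
        Ok(self.count)
    }
}

fn main() {
    let mut counter = RetryCounter::new();
    assert_eq!(counter.record_attempt(), Ok(1));
    assert_eq!(counter.record_attempt(), Ok(2));
}
```

With a bound like this, the behaviour at the limit is a defined error path that can be covered by a test, independent of whether overflow checks are compiled in.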
You've given lots of examples where isize or usize are perfect because you know that the value will be limited by the maximal size of an array:
|
I don't respect people who don't respect me and have treated me like shit. I don't expect any good to come out of any of my involvement with Rust, but I'm here anyway. If someone wants to seek out confrontation then that's what they can have. |
@Valloric I think your point about underflow assertions only being on in debug builds is important. Programmers should be reasoning based on checks which are always available, and any extra checks should be regarded as a safety net. Re 'intuition': your argument here, going into the detail of why these examples are 'laughably smaller', is pretty much what I think programmers should be doing, and is the opposite of what the 'laughably smaller' guideline suggests to me. To take one of your examples a bit further, a programmer might think "lol, no one has 2 million emails in their inbox", but of course then the code gets used in some automatic mailer and there are more than that. Put another way, it is easy to mistakenly believe a number is laughably smaller than the limit when it is not. |
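For reference, checks that are available in every build profile can be sketched with the standard `checked_*`, `wrapping_*`, and `saturating_*` integer methods. In current Rust, plain `-` on integers panics on overflow only when overflow checks are enabled (the default for debug builds), so the sketch avoids relying on it. The `remaining` function is a hypothetical example.

```rust
/// Hypothetical example: how much capacity is left, with the underflow case
/// made explicit in the return type instead of left to a debug assertion.
fn remaining(capacity: u32, used: u32) -> Option<u32> {
    capacity.checked_sub(used)
}

fn main() {
    // The check behaves the same in debug and release builds.
    assert_eq!(remaining(10, 3), Some(7));
    assert_eq!(remaining(3, 10), None);

    // When wraparound or clamping is what is actually wanted, say so.
    assert_eq!(3u32.wrapping_sub(10), u32::MAX - 6);
    assert_eq!(3u32.saturating_sub(10), 0);
}
```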
I'm not at all supporting ignoring edge cases; I am talking about risk management. It is neither reasonable nor useful to worry about a database retry commit counter overflowing 32 bits. There's like 50 different much worse things that would happen before that, from the client hanging for what must be forever to the database dropping your connection etc. Same thing with 2 billion nodes in an HTML document. Your node counter overflowing will be the least of your worries. Etc.
That's explicitly an example I chose not to use, because I can easily see that happening.
Agreed 100%. I don't trust debug-only checks (although I use a ton of them and they save me lots of time). They're a nice-to-have, not something to rely on.
If the data is being put in an array, then I agree. But that's not a given. |
It's large enough to count in-memory objects that aren't zero-size, which is a very common case. It works for collection sizes and uses like reference counting in general. |
If you know what happens when it does overflow and the consequences are not serious, then you've already done the right thing. You could even make test cases to verify that it's a sane soft failure instead of something disastrous. Ignoring the case because it doesn't seem like it would happen in practice is designing for insecurity and unreliability. The usual design paradigm is to throw a 32-bit or 64-bit integer at the problem because it feels like it's enough, and then it turns out to be a catastrophic bug. |
I can understand that, but do you think your "scorched earth" approach is doing you or anyone else any good? I doubt it's making you happier and it certainly isn't making people come around to your position. So what's the point? How is it useful? Plenty of others (myself included) have told you over time that seeing those kinds of responses makes them less likely to agree with you even if they think you have a point. Further, people make mistakes. Even if the core team has treated you disrespectfully in the past, they aren't doing it now. You are treating them that way currently and have been for a while now. Is there really no room for "forgive, forget, move on"? Are you just going to treat them like crap until they finally ban you? Because it's probably going to happen; I've banned people from projects I personally manage for far less. Will that do you any good? Will it make you happier? Daniel, you have excellent technical insight but you then wrap it in such caustic delivery that you might as well not have that insight to begin with, for all the good it's doing you. Life is about managing when others treat you like shit.
Nobody is seeking out a confrontation here, but you certainly seem to be trying to instigate one with your behavior. The only thing brson asked you was to try to remain civil, which is an entirely reasonable request and one that shouldn't be necessary in the first place. I commend him for the attempt; it was obviously needed. You keep mentioning how you have been treated poorly by the core team and all I have ever witnessed in the ~2 years I have been (closely) following Rust is you repeatedly treating them like crap and them taking it stoically. All I continue to see is them looking past your unkind behavior and not banning you, even though the behavior has warranted it on countless occasions (I don't think anyone can objectively disagree with that). So just tone it down a notch. That's all. |
I'm not trying to win a popularity contest. I'm not interested in playing two-faced public relations games like them.
They certainly are still doing it now. You obviously buy into their portrayal of the situation, and that's your prerogative. I'm not sure why you're talking to me about it when you're starting off with the assumption it's a one-sided situation. I'm obviously not interested in the opinion of someone who thinks that I'm an idiot.
It seems to be working out just fine.
Calling me incompetent and trashing my contributions is looking for confrontation. I really don't care what someone thinks when their sole mechanism of interaction with me is to come up with ways of blocking my contributions and talking down to me, so @brson doing that yet again is not a very interesting event.
This only demonstrates that being passive aggressive and two-faced has been successful. It's sad that you worship authority to the point where you can't see through their nonsense, but it doesn't really bother me.
You're the one escalating things to a personal level. That's toning it up a notch, not down. I'm really not sure what you hope to accomplish by patronizing me. At this point, you're just trolling. |
If you can't win on the facts, speak down to people from a feigned position of impartiality and imagined authority. It's really a joyous experience to participate in this community. Stating that you think an idea is stupid is off limits, but making condescending personal attacks on people is totally acceptable. Rust Logic. |
|
||
In Rust, unsigned integers have underflow checking assertions built into | ||
the type (assuming that RFC XXX is accepted), so using a `u32` is equivalent | ||
to the advice in Google style guide (with a larger maximum value). |
You can often find situations where a break statement in such a for loop means that it rarely hits zero (and thus rarely decrements below zero). Where, by "often"... I've seen such happen. Other examples include correct code that "subtracts first", e.g. x - 1 + y, where y is positive, where the ephemeral negative value isn't a bad thing. Signed types avoid such edge cases. You can consider me firmly in the camp of signed favoritism. Ideally, use signed types everywhere and never encounter unsigned types. Unfortunately the real situation is that you should use whatever matches best with the libraries and interfaces you're using.
It's ridiculous that some RFC would deign to decide this question.
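The "subtracts first" case `x - 1 + y` from the comment above can be sketched in Rust as follows. The function names are hypothetical. The unsigned version reorders the arithmetic so the intermediate value never drops below zero, while the signed version keeps the original order; the last lines show one way to write a countdown loop over an unsigned range without decrementing past zero.

```rust
/// Hypothetical helper: computes x - 1 + y for unsigned inputs by adding
/// first, so the intermediate value cannot go negative. Returns `None` if
/// the sum overflows or the true result would be negative.
fn offset_unsigned(x: u32, y: u32) -> Option<u32> {
    x.checked_add(y)?.checked_sub(1)
}

/// Hypothetical helper: the same expression with signed integers, where the
/// ephemeral negative intermediate is harmless for small inputs.
fn offset_signed(x: i64, y: i64) -> i64 {
    x - 1 + y
}

fn main() {
    // Subtract-first with unsigned integers fails at the first step.
    assert_eq!(0u32.checked_sub(1), None);

    assert_eq!(offset_unsigned(0, 5), Some(4));
    assert_eq!(offset_signed(0, 5), 4);

    // Counting down to zero without ever computing 0 - 1.
    let visited: Vec<u32> = (0..3u32).rev().collect();
    assert_eq!(visited, vec![2, 1, 0]);
}
```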
It's reasonable for there to be general guidelines for stuff like this when there's a strong consensus, but I don't think there can be one for these issues. There is little to no evidence in favour of any specific choices, just a lot of dubious claims from every side.
I am strongly in favor of "decide not to issue guidance here" alternative listed in the RFC, because the RFC does not seem to be well thought out enough to have universal agreement. The drawback of that alternative is real, but in my opinion it is best addressed, at the very least, after we decide on widening, for example. |
I'm going to close this RFC for the time being. We'll hash out working guidelines as part of the ongoing |