explicit wrapping integer operations instead of w types #159
explicit wrapping integer operations instead of wrapping integer types (closes #159)
Not a fan of the syntax. |
Proposed alternative? |
checked { ... } or unchecked { ... } blocks. |
C# does those, works well |
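To make the proposal concrete, here is a sketch of how such a block might have read in Zig. This is hypothetical syntax modeled on C#, never part of the language:

    // NOT valid Zig: a hypothetical unchecked block, for discussion only.
    fn hashStep(h: u32, b: u8) u32 {
        unchecked {
            // inside the block, + and * would silently wrap
            return (h ^ b) * 16777619;
        }
    }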
There's a principle here, which is to communicate intent. @ofelas, do you have an opinion on what would work better for you: a block that makes things have wraparound semantics, or the status quo? Looking at https://github.com/ofelas/zigtest/blob/master/securehash/securehash.zig, it looks like you would mostly want to put entire functions inside a wraparound block.
That line would be a bit awkward with wraparound blocks, but still, maybe worth it?
Actually, this expressiveness wouldn't even be possible without a way to undo the wraparound. So we'd need a way to opt back out inside such a block too.
On the other hand, the expression could be broken up like this:

    const x = (3 - (cidx & 3));
    ctx.buffer[cidx >> 2] |= u32(ctx.charbuf[cidx]) << wraparound { u32(x <<% 3) };
|
I wouldn't call it wraparound { ... } but rather overflow { ... } if you prefer that. |
Also, mixing overflow and non-overflow math in the same expression seems like it wouldn't happen very often, and you could just split the expression up a bit to get cleaner code. |
agreed. I realized that after I posted, and edited ^ |
Also, the OP doesn't talk about overflowing with multiplication, which should also be possible. |
I started typing
So this is why I proposed |
I don't think your scenarios are entirely realistic. I think zig should play into people's expectations, and systems programmers expect an unsigned integer to overflow via scenario 1. |
Side note, for systems programming, not overflowing by default would make Zig unusably verbose. |
Zig's main three design goals are, in priority order:
Derived from these are more specific design principles (not all included):
The way I see it, it's slightly more important that programmers don't accidentally overflow integers than that programmers coming from C find integer addition has exactly the same semantics. If verbosity is a huge problem, that seems to be an argument in favor of the status quo, where wrapping operations add a single character to each respective operation. |
Overflow is how numbers on computers actually work. It's not about how C works. If you chose to pretend computers work differently than they actually do, then yes, you should use an operator. But that would be a bad design. |
To be honest I don't see preventing overflow as productive and I think it's not Zig's business to do anything about it. There are so many applications that are made so much worse to implement in Zig by having the language involved in this. Overflow is something you just need to learn about to write systems code. |
Zig, like C, is a portable language. This means that there is a small abstraction layer between the machine and the language. It's why clang has compiler-rt.a and gcc has libgcc.a. Zig has ways to express intent that closely match what most target hardware supports. The language maps programmer intent to hardware instructions. It's important that the available expressions of intent are convenient to use, and yet don't contain extra baggage, like unnecessarily requiring that the hardware has certain behavior that it very well might not have. Beyond portability, it limits optimization. The closer the intent the programmer is able to express to their actual intent, the more the optimizer is able to change while remaining confident that the semantics of the program are preserved. It's the solution to the undefined behavior problem that people are complaining about when they feel that the optimizer mangles their program. An extra character added to single character operations is not unusably verbose. |
All of those arguments are in favor of overflow by default. |
Writing past the bounds of a local array and clobbering your return address is also how computers actually work. That doesn't mean it's not a bug. There are some cases where you want wraparound semantics and some cases where you don't. If you're not thinking about overflow, then which one should be default? The "actual" behavior, or the safe behavior? One of Zig's design principles is to have safety when it doesn't compromise the more important objectives.
You must be talking about abstractions that have a runtime cost, like Java's ArrayOutOfBoundsException and NullPointerException. Even when programmers are careful to avoid ever causing those exceptions, the checks for them still slow you down. When Zig compiles for release mode, there are no runtime costs for non-wraparound arithmetic. Zig lets the undefined behavior on overflow be whatever the hardware says it is. The only time Zig would introduce a runtime cost is to guarantee wraparound semantics when the hardware doesn't support it, and I believe that in that scenario, you wouldn't really be able to do any better than a software wraparound implementation anyway. The assumption here is that you've got to know if you want wraparound or not on each operation that can cause it. If you want it, it's guaranteed to work. If you don't want it, then in debug mode there will be assertions to make sure it's not happening by accident. So the two different ways to handle overflow are: guaranteed wraparound (the % operators), and ordinary arithmetic that asserts in debug mode and is undefined on overflow in release mode.
We've got two different operators, one of them is the normal looking one, and one of them is special. Which behavior should be normal, and which should be special? I'd like to argue in favor of status quo in two ways: a) wraparound is more rarely required, so that should be the special operator. b) if you don't think about special operators or about overflow, you should get the behavior that tells you if you should have been thinking about overflow. |
Not having integer overflow doesn't save you from this.
I can't think of any time where I wouldn't want overflow. (Unsigned) integer overflow isn't unsafe. By the time you would hit overflow you're likely already far beyond the end of your buffer.
You know that x86 and amd64 are among the architectures that don't have non-overflowing arithmetic instructions, right? You'll have to implement the protection in software, which will be non-trivial. If it goes away in release mode it might as well not be there, because the software that gets attacked is the software in release mode. You only help if you can determine for certain that the integers are not going to overflow at compile time. And like I said, unsigned integer overflow has zip to do with buffer overflows, which is what you claim to be protecting against. |
Well I apologize for not communicating clearly. I think you understood nothing that I was trying to explain. I'm sorry. The reason I mentioned buffer overflows was to give an example of something that's easy for hardware to do, but that's certainly a bug if it ever happens. Zig has buffer overflow checks in debug mode as well as integer overflow checks in debug mode. Those are separate features; I'm not claiming they interact.
Understood. Debug mode is meant to have runtime costs for extra checks.
If you turn off the checks in release mode, then you can't rely on them to catch security bugs. That shouldn't surprise anyone. If you want safety checks at runtime that try to stop attacks, then there's going to be a runtime cost. It's up to you to decide if you want the checks on or off, and if you don't like the ones built into the language, you can write your own. Runtime safety has runtime costs.
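For what it's worth, rolling your own check is short. A sketch in present-day Zig (the @addWithOverflow builtin shown here postdates this thread; std.math.add in the standard library offers the same thing):

    const std = @import("std");

    // An overflow check that stays on in every build mode.
    fn checkedAdd(a: u32, b: u32) error{Overflow}!u32 {
        const res = @addWithOverflow(a, b);
        if (res[1] != 0) return error.Overflow; // res[1] is the carry bit
        return res[0];
    }

    test "checkedAdd reports overflow instead of wrapping" {
        try std.testing.expectEqual(@as(u32, 3), try checkedAdd(1, 2));
        try std.testing.expectError(error.Overflow, checkedAdd(std.math.maxInt(u32), 1));
    }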
How about this:

    {
        var i = usize(0);
        while (i < buf.len; i += 1) {
            // when i == 0, buf[i - 1] underflows usize
            if (buf[i] == '"' && buf[i - 1] != '\\') {
                return i;
            }
        }
    }
    return -1;

This code, which is kinda like searching for the end of a quoted string literal, checks for a quote that isn't preceded by a backslash; when i is 0, buf[i - 1] underflows. |
You're telling me you want to increase the cost of all arithmetic 5-10x to get this "safety"?
I don't mind checking for out of bounds array access. Totally on board with that. But that integer should overflow and the array should do the bounds check. |
Precisely. And programmers can turn off this default safety for any given block of code with |
I agree that overflow should be disallowed by default and checked for: it prevents a range of vulnerabilities that you might not think of checking for. A 5-10x overhead (assuming it's actually that much) isn't a problem in my opinion. You can always turn it off in performance-critical code or when you discover that it's causing your program to run slow. As for the decision between a keyword and a special operator syntax, I favor the keyword (my favorite being |
What kind of purported C replacement is half as fast or worse than the equivalent C program? This doesn't even fix your example. Here's the example, slightly modified:
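The modified snippet itself did not survive extraction; judging from the sentence that follows, it was presumably the same scan with the bounds test dropped, something like:

    {
        var i = usize(1);
        // no i < buf.len test: if the closing quote never appears,
        // buf[i] reads far past the end of the buffer while i itself
        // is nowhere near overflowing
        while (buf[i] != '"' || buf[i - 1] == '\\'; i += 1) {}
        return i;
    }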
Now this is a buffer overflow without an integer overflow. You need to be checking array bounds, not integer overflows, to address this issue. |
It's a fair point that my example should really be a case for array bounds checking rather than arithmetic overflow checking. Here are a few examples of real-life overflow/underflow bugs from video games:
These are just a few examples of bugs that can happen with silent arithmetic wraparound. |
Now we're no longer talking about security vulnerabilities, we're talking about niche low-impact bugs. Niche, low-impact bugs happen all the time for lots of reasons, and they don't justify this design IMO. We're no longer talking about safety or security concerns. |
Also note that these are all old bugs, mostly written in assembly on machines with small word sizes. |
@SirCmpwn can you give examples of situations where you want wraparound on overflow? Here are some situations that I can think of:

- hash functions
- pseudorandom number generation
- cryptographic primitives
And these are all pretty closely related, where the math is trying to behave in an arbitrary and unpredictable way. Possibly also include compression algorithms? I'm not too familiar with how those work though. I understand that you would want unchecked overflow in situations where you want performance, but I mean situations where you actually want the numbers to wrap around in a realistic usecase. In other words, when would you use the wrapping operators? |
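To make the question concrete: a hash such as FNV-1a is the kind of code where the wraparound itself is the intent. A sketch in modern Zig syntax (a standard algorithm, not code from this thread):

    // FNV-1a: the multiply is supposed to be modular.
    fn fnv1a32(bytes: []const u8) u32 {
        var hash: u32 = 2166136261; // FNV offset basis
        for (bytes) |b| {
            hash ^= b;
            hash *%= 16777619; // FNV prime; wrapping multiply
        }
        return hash;
    }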
There are more I'm sure that don't come to mind at the moment. |
What do you plan to do when an overflow does happen in release mode, by the way? Crash? DoS vulnerability. Change the arithmetic to avoid overflow? Could cause countless other unexpected behaviors. |
Also, regarding |
Thanks for your input, especially the justification ("clearer and less Perl-y"). Can you explain your second proposal more with a code example? |
I see you asked my opinion, had to read through this, 8). I tried to summarize my understanding in the snippet below (listing the alternatives somewhere in the middle). More important for me would be that the various mechanisms/keywords/syntax are well documented and reasonably easy to understand (optionally compared to other languages) and undefined behaviour avoided if possible. Hope this makes sense...
|
Alright everyone thank you for your input. I have concluded that status quo is the best solution. Reminder that choosing between wrapping operations and non-overflow operations is about semantics; use the correct operation for the behavior you're trying to accomplish. Meanwhile |
Still unanswered. All of my other arguments still hold up, but I suppose you don't care. Would still like to know exactly what behavior you're going to use. |
So you don't actually get any practical benefit from this design whatsoever because you (1) don't have overflow checks at runtime, so it (2) only shows up in debug mode where (3) if the user is testing for overflow then they already have the presence of mind to handle it if the language didn't have overflow checks which (1) it doesn't anyway in release mode. |
In case this issue gets revisited later, I'll just add that I think the decision made here was correct. That the machine is overflowing under the hood is irrelevant. That's not because we want overflowing/wrapping when doing arithmetic; it's because that's what gives you a good, flexible and cheap instruction set. The instruction set is not meant to dictate what kind of operations you do, at least with RISC-like architectures. Its instructions are pieces from which you construct the desired behaviour, which often takes more than a single instruction. Disabling bounds checking is clearly an optimization (a very common one though). This is related to the act of re-using libraries: maybe I want to make a super safe application and keep bounds checking on everywhere. What then if the library writer didn't wrap all his arithmetic with whatever incantation is needed to get safe behavior (or vice-versa, if safety is the default)? |
Sorry for the necro but I think a lot of this conversation missed an important note about why overflow being UB is useful: overflow is UB in C and C++ in large part because it allows the optimizer to apply mathematical identities to the code. Zig inherits this behavior. This means that if a non-modular add overflows, it is ACTUAL UB and the optimizer may have transformed your program into something that doesn't work correctly. It is this property that makes the safety check necessary in debug modes, and it is these optimizations (in addition to many others) that make Zig competitive with C and C++ performance in release modes. Here's a specific example where this matters:

    var index = (x + 1) * 5;
    var value = array[index - 5];

This is the sort of code that the compiler sees after inlining functions.

    // distribute multiplication
    // only works because x+1 may not overflow
    var index = x*5 + 5;
    // inline calculation
    var value = array[x*5 + 5 - 5];
    // cancel terms
    var value = array[x*5];
    // remove multiplication by constant
    var value = array[(x<<2) + x];

This optimization is only possible because overflow is UB. If overflow were defined, this code would do something totally incorrect in the case of overflow, and the program's execution speed would suffer in all cases to make sure it did that incorrect thing. |
@SpexGuy your explanation is correct but easy to misread in this context. var index: u32 = (x + 1) * 5 can be transformed to var index: u32 = 5*x + 5 under undefined behaviour because, by ignoring overflow, we can keep pretending that u32 is a mathematical integer in which the distributive law holds.

But what does overflowing mean? It means that we do our computation in modular arithmetic mod @pow(2, 32), a.k.a. two's complement arithmetic (i.e. using +%, -% and *% and the native instructions of the processor), where the distributive law holds just as well. Modular arithmetic is mathematically perfectly sane and satisfies all the usual laws of arithmetic (associativity, distributivity, commutativity), i.e.

    5 *% x +% 5 == 5 *% (x +% 1)

is a strict identity with no undefined behaviour in sight.

What is no longer true is that inequalities keep working as expected, mainly because modular arithmetic is arithmetic on a circle: of any two distinct modular integers, each comes "before" the other going around, so there is no consistent order. What is not (always) true is that the non-negative mathematical integer < @pow(2, 32) whose lowest 32 bits are those of 5 *% x +% 5 equals the mathematical integer 5*x + 5. The pretence of doing modular arithmetic while treating it as "true" arithmetic in the integers works only so well. Likewise, if the non-negative mathematical integer < @pow(2, 32) whose lowest 32 bits are those of a is less than the one for b, it is not true (in general) that the one for (a +% c) is less than the one for (b +% c).

In #7512 I proposed to define modular integers as a primitive datatype. It is perfectly fine to insist on writing +%, -% and *% for their arithmetic operations so that it is visible that modular arithmetic is knowingly used. It has perfectly sane semantics as long as you don't expect to do inequalities, or, if you do use inequalities, are forced to take care. |
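A quick check of that identity at a value where the unchecked form would be UB (modern Zig syntax, not from the thread):

    const std = @import("std");

    test "distributivity survives wraparound" {
        var x: u32 = std.math.maxInt(u32); // (x + 1) * 5 would overflow
        _ = &x; // keep x runtime-known
        try std.testing.expectEqual(5 *% x +% 5, 5 *% (x +% 1));
    }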
Ah, thanks for pointing this out, my example was bad. A better example is comparison:

    if (x > 4) {
        if (x + 1 > 4) {
            // ...
        }
    }

In this example, with mathematical ints, the optimizer can remove the inner runtime check. With wrapping arithmetic it cannot. |
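The wrap case that blocks the optimization, as a runnable check (modern syntax, not from the thread):

    const std = @import("std");

    test "wrapping invalidates: x > 4 implies x + 1 > 4" {
        var x: u32 = std.math.maxInt(u32);
        _ = &x;
        if (x > 4) {
            // with mathematical integers this would be implied true;
            // with wraparound, x +% 1 == 0, so the inner check must stay
            try std.testing.expect(!(x +% 1 > 4));
        }
    }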
@SpexGuy I don't want to be annoying, but it is precisely because so many people seem to think that the modular operations +%, -%, *% are "flawed" but hardware optimised, instead of better behaved but different (since you lose a strict ordering compatible with arithmetic*), that I think a modular data type (with +%, -% and *% as the only arithmetic operators) both avoids subtle bugs (see "practical advantages" in #7512) and is conceptually useful.

*well, you lose it compared to proper mathematical integers. Finite size types like u32 and i32 just reinstate compatibility with arithmetic through undefined behaviour (and, of course, hopefully making sure that overflow does not occur). |
Thanks @RogierBrussee and @SpexGuy for the examples. This is finally slowly sinking in for me, after too many decades of programming C. I think it is worth reiterating: on almost all modern processors, you are doing modular arithmetic whether you want it or not. In other words, arithmetic integers are unsafe. They are a lie. They may allow some optimizations, but (at least in my experience) very few coders actually use them correctly. This kind of thing is partly why people like John Regehr and Dan Bernstein started talking about having a sane dialect of C supported by the main compilers. Zig already has polymorphic arithmetic operators, so just do as in @RogierBrussee's proposal, #7512, and have different integer types.
Use of types for this is safer IMHO. You can accidentally type + when you meant +%, but you cannot accidentally declare the wrong type. I should be able to program in Zig without ever having UB, if I so desire. With a distinct modular type, you say what you mean. Safe and clear behavior should not be opt-in. You should opt out of it for speed in the few cases you actually need it. |
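As an illustration of the types-over-operators idea (my sketch, not the actual #7512 design): a userspace modular type whose only arithmetic is wrapping, and which deliberately offers no order comparison:

    fn Modular(comptime T: type) type {
        return struct {
            raw: T,

            const Self = @This();

            pub fn add(a: Self, b: Self) Self {
                return .{ .raw = a.raw +% b.raw };
            }

            pub fn mul(a: Self, b: Self) Self {
                return .{ .raw = a.raw *% b.raw };
            }

            // deliberately no lessThan: "<" has no arithmetic-compatible
            // meaning on a circle
        };
    }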
(Deleted, added this to the wrong issue!) |
@kyle-github: much as I agree that modular integers are useful and should be part of the language, the "ordinary" integers (the i and u types) that pretend that no wrapping occurs are easier to reason about (which avoids bugs), and at least in debug mode you get warned if wrapping does occur. Of course it is even better if one can make sure wrapping cannot occur (and it would be nice if the compiler could tell you when it figures that out). |
@RogierBrussee, not sure what you mean about ordinary integers being easier to reason about. It is really hard to figure out if wrapping/overflow/underflow is going to occur without either runtime checks or magic. John Regehr's team at the University of Utah has been working on things like this for years. He's got some good articles about it on his web site with specific descriptions of what happens in LLVM/clang. So there is some magic, but it does not cover a lot of cases. |
@kyle-github thanks for the refs. It is easier to reason about the ordinary mathematical integers than about modular integers, if for no other reason than familiarity, and e.g. because they have a definite signedness. Just because "wrapping" is well defined behaviour does not mean it is typically expected behaviour. E.g. I seriously doubt most programmers use unsigned int in C or C++ fully expecting it to wrap for every addition they do, and taking that into account in all their inequalities. I would expect people to be a lot more careful if they opt in by writing n: m32; if (@lts(n +% 1, MAX_INT)), though. |
Rip out the -w variants of integer types. We are going to follow Swift's example and have explicit wrapping operations:
- +% - wrapping integer addition
- -% - wrapping integer subtraction (or negation)
- *% - wrapping integer multiplication
- <<% - wrapping left shift
- +%=
- -%=
- *%=
- <<%=
We'll also follow this pattern for volatile reads and writes, likely with builtin function(s) rather than operators.
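A quick demonstration of the arithmetic forms in present-day Zig syntax (a sketch added for illustration, not part of the original issue text):

    const std = @import("std");

    test "explicit wrapping operations" {
        var x: u8 = 255;
        x +%= 1; // wraps to 0 instead of tripping a debug assertion
        try std.testing.expectEqual(@as(u8, 0), x);

        const y: i8 = -128;
        try std.testing.expectEqual(@as(i8, -128), -%y); // wrapping negation

        const z: u8 = 200;
        try std.testing.expectEqual(@as(u8, 144), z *% 2); // 400 mod 256 == 144
    }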