-
Notifications
You must be signed in to change notification settings - Fork 12.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
incorrect transformation around icmp: unclear semantics of "lifetime" intrinsics leads to miscompilation #45725
Comments
Since %p and %q are different allocas, they will have two different values, including different integer representations (both alloca are 4 bytes). InstCombine is making the deduction %p = alloca i32, align 4 to %c2 = false because of this. It does not change the compare function because InstCombine is not interprocedural. The basic block A is dead code, it will never be executed. Dead code can be arbitrarily wrong. |
The alloca have disjoint lifetimes. That means LLVM may actually allocate them to the same stack slot. |
I think the issue you're talking about is something like the following? void z(long*);
int a() {
long raddr1, raddr2;
{
long r; z(&r); raddr1 = (long)&r;
}
{
long r; z(&r); raddr2 = (long)&r;
}
return raddr1 == raddr2;
} clangs allocate the two versions of "r" at the same address, but folds the compare to "false". This isn't really an integer icmp vs. pointer icmp thing; it's more of a "our representation of local variable lifetimes is wrong" thing. Interestingly, both gcc and msvc do the same thing. |
Correct, I missed that. However, I wonder why/whether it is legal to shorten the lifetime even though the pointers are still used for the compare call. I case of Eli's example, I think C++20 justifies it in http://eel.is/c++draft/basic.stc#4:
Note that all compilers optimize the function even without the ptr2int cast: |
If you're talking about C/C++ variables, the lifetime is defined by the standard. In my example, the two variables aren't live at the same time, so the addresses may or may not be numerically equal. (The problem is just the contradiction.) If two variables are live at the same time, the addresses must not be numerically equal. If you're talking about the semantics of LLVM, according to the LangRef, two allocas can't have the same address. The rules for alloca specify that the memory is not freed until the function returns. And the rules for llvm.lifetime.start/end don't say anything about the address, only that accessing the memory is undefined outside the lifetime. Therefore, it's illegal for stack coloring to allocate two variables whose address escapes at the same address. Obviously stack coloring does in fact allocate variables at the same address. And we can't fix it to match LangRef without causing a significant regression for C++ code.
raddr1 and raddr2 are pointers, not integers; the rule in question doesn't apply. |
=
Somehow got this backwards; meant to say "integers, not pointers". |
Not sure if "you" is me here, but I agree that is a similar example -- z could observe the two addresses and so we have a contradiction, like in my example.
Agreed: whether or not the addresses are equal is non-deterministic. That's why I said my example has two possible executions. It's just, "f(true); f(false);" is not one of them.
The rule in question is a C++ rule, so it applies to Eli's example, but AFAIK LLVM has no such rule and thus it does not apply to my example. Is that not right?
Oh, that is surprising. Why would LLVM explicitly say that about alloca and not about malloc? And what about popping and then pushing the same stack frame, so that a stack slot ends up at the same physical address -- that would also be two allocas with the same address (just different in provenance, similar to malloc reusing memory), why is that not a problem? I thought the entire point of these lifetime annotations was to let LLVM re-use the same stack slots for different variables, and this rule seems to make that unnecessarily hard.
I wonder if it would be possible to adjust lifetime.end semantics to say that this may (or may not) free the memory for that alloca, to permit reusing addresses. |
malloc'ed memory is allocated until you call free(), according to the C standard. This is a similar thing, except that it's freed implicitly when the function returns. To be clear, I only mean allocas which are live at the same time; obviously later function calls can reuse the same stack. (For dynamic allocas, there's also stacksave/stackrestore; a stackrestore also frees memory. But those are pretty rare in most code.)
I think it's implied by "allocating" memory; two allocations can't overlap. This roughly matches the semantics of the languages we're trying to model.
Maybe. It would be pretty inconvenient to reason about, since the uses of the memory aren't tied to the actual allocation. And it would be difficult to describe the rules in an intuitive way. But I can't think of any showstoppers. |
These are not live at the same time though? I do not recall the exact wording, but as far as I know C does not require local variables to be live for the entire function; when they are declared in a sub-scope like in your example, they can be deallocated earlier. So given that LLVM encodes these source-level scopes into its IR using lifetime.start/end, it seems odd to say that allocas are "freed implicitly when the function returns". It seems more useful, and closer to C, to instead say they are "freed implicitly when their lifetime ends". This also reflects what stack coloring is doing.
In my mental model, an "alloca" is only really allocated once it becomes live (lifetime.start), and it gets deallocated when its live range ends. LLVM does not have "scopes"/"blocks" like C or many other source languages do, so what should those languages do when they want stack variables to be allocated only when their scope begins, and deallocated when their scope ends (which I think is common)? They add lifetime.start/end. (There is a special case for alloca that don't have lifetime intrinsics associated with those; those are indeed live until the function returns. This does create some nasty corner cases, which has been the subject of discussion in Rust for quite a while, but progress has been hard as it is not very clear what exactly LLVM's requirements are. See e.g. rust-lang/rust#42371.) |
(I got a sendmail error from the web frontend, re-triggering notifications.) |
(More email errors.) |
(There have been posts but no notifications.) |
FYI: In Alive2, the semantics of alloca depends on whether it has lifetime intrisic uses at parsing time. :) |
For more details which might be related with the github link:
|
That is funny, this is exactly what Miri also does.^^
Interesting, Miri does not even reserve a block ID. Each new lifetime.start generates a fresh block ID. This models that we want pointers to be invalidated when the lifetime of a local ends, even if the same local becomes live again later. (Compilation from that to Alive2's semantics seems fine, it just restricts allocator choice.) |
Hmm, interesting. Alive2 actually treats comparsions differently if there are lifetime intrinsics: https://alive2.llvm.org/ce/z/xWcu-2 . |
In theory it doesn't (modulo bugs, of course). In this case, since the 2nd alloca is only allocated after the first one is deallocated, the comparison may return either true or false non-deterministically (depending on where the 2nd alloca ends up). If you check, Alive2 is ok with optimizing that function to either return true or false. If there are no lifetime intrinsics, then it's guaranteed that the two allocas must be in disjoint memory, hence the pointer comparison can only be folded to false. |
A Rust user found another case where comparison with icmp is non-deterministic: https://play.rust-lang.org/?version=stable&mode=release&edition=2018&gist=8affd2a65b02a35e16dbab4682cb8886. What this does is it take pointers to two functions that are marked as "unnamed", casts those pointers to integers, and then compares them twice -- once after passing them through a "black box" noinline function, and once immediately. The compiler seemingly optimized the latter comparison to "false", but later makes the two function's addresses identical. If we describe the semantics of this program at the LLVM IR level, then I think we have to either accept that "icmp" can produce different results for the same input (it is non-deterministic), or the program has to have UB. But it's not UB to compare "unnamed function" pointers, right? My expectation was that unnamed function can end up having the same or different addresses depending on optimizer decisions etc, but whether it is one or the other has to be consistent throughout a single execution of the program. |
You're right, whether two addresses are the same or different has to be chosen consistently. This is also why the merging transform on unnamed_addr is "one-way": you can merge two globals into one global, but you can't split one global into two. (I remember discussing this at some point for constants, but I can't seem to find it.) This doesn't really come up in C: the only unnamed_addr globals with a user-accessible address are strings, and we fold identical strings in the frontend. This is only loosely related to the alloca issue; please file a separate bug. |
All right, opened llvm/llvm-bugzilla-archive#47090 . |
For Eli F example:
At the "return raddr1 == raddr2;" isn't both raddr1 and raddr2 undefined because r is out of scope. So, comparison of two "undefined" is probably undefined also. |
@James I don't think they are poison values, which would mean they are initially non-poison, but become poison when leaving the scop. |
mentioned in issue llvm/llvm-bugzilla-archive#52570 |
FWIW this is now tracked as a soundness issue on the Rust side, since the miscompilations resulting from this can break language guarantees, even in safe code: rust-lang/rust#107975. Is there any agreed-upon plan for how to resolve these inconsistencies in LLVM? |
Between nikic throwing in the towel, and the utter silence in this thread for the past two-and-a-half years, I'll hazard a guess that the answer is “no”. |
For local variable allocations, we want to ensure:
The problems with the current "lifetime" intrinsics:
I don't think there's a good way to retrofit reasonable semantics onto llvm.lifetime.start and llvm.lifetime.end; there's a significant mismatch between the way "alloca" is defined, and how we really want to allocate memory on the stack. I think the answer here is to introduce new intrinsics. There are a couple of ways you could go about it. But probably, it's conceptually simplest to just have intrinsics that have semantics similar to "malloc" and "free", with the restriction that the allocations have to be properly nested. I'm not going to try to formally define what exactly "properly nested" means, but it's comparable to llvm.call.preallocated.setup. Each allocation uniquely happens at one point in the function, and you have to free allocations in the reverse order they were allocated. Lowering from that form to a static stack allocation is then a straightforward walk through the function: you "push" an entry on the stack when you see an alloc, and "pop" it when you see a free. (Haven't thought through how this interacts with exception handling; probably landing pads need some sort of annotation.) Once we have that, it should be straightforward to implement the correct rule: "icmp eq" of with two allocations with overlapping lifetime folds to false, and "icmp eq" with two allocations with non-overlapping lifetimes doesn't fold. Most optimizations wouldn't need to be aware the intrinsics are different than just "malloc" and "free"; probably optimizations that currently check for "token" need to be aware. This representation also has some other minor benefits: alias analysis is more accurate because passes like the inliner don't introduce multiple variables using the same allocation, and we aren't so dependent on "coloring" to get stack usage proportional to what the user intuitively expects. (I don't expect I'll have time to try to implement this.) |
Here's another problem with the "lifetime" intrinsics: #51838. |
I believe this issue is responsible for the following end-to-end miscompilation for C11: https://godbolt.org/z/vfTE1Kv4c #include <stdint.h>
#if defined(__GNUC__)
#define OPTIMIZATION_BARRIER(x) __asm__("" : "+r"(x))
#elif defined(_M_IX86)
#define OPTIMIZATION_BARRIER(x) __asm add x, 0
#else
#define OPTIMIZATION_BARRIER(x)
#endif
int main() {
uintptr_t x, y;
{ int a[2]; x = (uintptr_t)a; }
{ int b[2]; y = (uintptr_t)b; }
uintptr_t equal = x == y; // 0
OPTIMIZATION_BARRIER(equal);
uintptr_t diff = x-y; // 0
return equal == diff; // returns 1 with -O
} Longer version without inline assembly, which exhibits the issue even if all non-portable constructs are removed:https://godbolt.org/z/a4Yz6Tdn7#include <stdint.h>
struct pair { uintptr_t x; uintptr_t y; };
static inline struct pair equal_integers_from_different_allocations(void) {
struct pair p = {0, 0};
{ int a[2] = {0, 0}; p.x = (uintptr_t)a; }
{ int b[2] = {0, 0}; p.y = (uintptr_t)b; }
return p;
}
int nonequality(void) {
struct pair p = equal_integers_from_different_allocations();
return p.x != p.y; // 1
}
int difference(void) {
struct pair p = equal_integers_from_different_allocations();
return p.x - p.y; // 0
}
#if defined(__clang__)
__attribute__((optnone))
#elif defined(__GNUC__)
__attribute__((noipa))
#elif defined(_MSC_VER)
__declspec(noinline)
#endif
int launch_missiles() { return 1; }
static inline int equal(uintptr_t x, uintptr_t y) {
return x-y == y-x && x-y <= -(uintptr_t)1/2; // 1 iff x == y, but weird
}
int main(void) {
struct pair p = equal_integers_from_different_allocations();
if (p.x != p.y && equal(p.x, p.y)) { return launch_missiles(); }
// "For unsigned integer types other than unsigned char, the bits of the object representation shall be divided into two groups: value bits and padding bits (there need not be any of the latter). If there are N value bits, each bit shall represent a different power of 2 [...]"
// "Two values (other than NaNs) with the same object representation compare equal, but values that compare equal may have different object representations."
// https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1548.pdf
} I am moderately interested in getting this issue fixed. I want to be able to honestly claim that my tooling for formal proofs about C programs is sound with respect to how real-world compilers execute these programs, and allocation patterns breaking integer arithmetic is beyond what I can handle. (Note that the pointer-provenance discuss does not change the desired behavior here, both PVI and PNVI-* have numeric results of integer operations be independent of provenance.) |
See also rust-lang/rust#107975 (comment). Issue #54002 might be related as well. As an alternative proposal, it would be sound for |
That is illegal in C, but AFAIK it is not documented as illegal in LLVM. It is legal in Rust, so if LLVM performs such folds without documenting its assumption that would be a major problem. The code you link has a comment saying "Different non-empty allocations that exist at the same time have different addresses" (emphasis mine). It would assume this means that the transformation only applies when the lifetimes of the allocations overlap? |
Yes, but this is almost fine -- (I am saying "almost" because removing |
That's what the comment says, but I don't see where that condition is actually checked; and the comment llvm-project/llvm/lib/Analysis/InstructionSimplify.cpp Lines 2672 to 2673 in 0db5d8e
Thank you for the clarifications about llvm and rust semantics. |
Yeah for alloca LLVM is internally inconsistent, as discussed above:
This can lead to arbitrary nonsense even in safe Rust code (since as you say, it's just a bunch of integer arithmetic), so I am not surprised that C code verified/generated by your tool is also affected. Your post mentioned "malloc" and it would be news to me that those are also affected, but who knows. (Now it has been edited so I guess that was a mistake.) |
Thank you for the summary, it makes sense now. I agree that the reasoning based on which I suspected malloc having the same issue was wrong (I made the edit even clearer now). I also confirmed that a direct analog to my example does not break with malloc. Based on last comment I tried compiling with Do you think the status of this issue is clear enough to say something like "llvm developers accept that this behavior is a compiler bug", or might we be in for another rabbithole to the tune of strict aliasing and pointer provenance? |
Nobody suggested it's not a bug, and this comment is fairly clear:
I'll take the liberty to remove the question mark from the title. It's just a hard bug to fix, from what I understand. Also see this comment. |
Extended Description
(Bugzilla made me pick a component, so I made a wild guess. I do not have the slightest idea which of these internal LLVM components is responsible here. Would be nice if I could select "unknown"...)
In #33896 #c99, Eli Friedman wrote
I think Juneyoung found a transformation that is indeed not consistent with this, which I adjusted as follows (https://godbolt.org/z/XYQ7Vx):
The function "src" compares "p" and "q" twice, once inside "compare". It calls "f" twice with the two results of the comparison. The first comparison is passed via indirect information flow, i.e., the equivalent of "if p == q { f(true) } else { f(false) }" in Rust. "p" and "q" could be equal or not, so this function has two possible executions: either "f" gets called twice with "true" as argument, or it gets called twice with "false" as argument.
The transformed program (with "opt -instsimplify") is
Notice how in block A, "f" gets called with two different values, which should be impossible because the original program only calls f with two times the same value.
The text was updated successfully, but these errors were encountered: