-
Notifications
You must be signed in to change notification settings - Fork 13.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SROA conflicts with the way clang lowers unions #64081
Comments
Reduced: https://llvm.godbolt.org/z/hEbPGz69q ; RUN: opt -S -passes=sroa
@.str = global [2 x i8] [i8 97, i8 97]
define void @f(i1 %cond) {
entry:
%u = alloca { i6, i8 }, align 1
%a = alloca [2 x i8], align 1
call void @llvm.memcpy.p0.p0.i64(ptr align 1 %u, ptr align 1 @.str, i64 2, i1 false)
br i1 %cond, label %if.then, label %if.end
if.then: ; preds = %entry
store i6 -31, ptr %u, align 1
%b = getelementptr inbounds i8, ptr %u, i64 1
store i8 0, ptr %b, align 1
br label %if.end
if.end: ; preds = %if.then, %entry
call void @llvm.memcpy.p0.p0.i64(ptr align 1 %a, ptr align 1 %u, i64 2, i1 false)
call void @puts(ptr %a)
ret void
}
declare void @llvm.memcpy.p0.p0.i64(ptr, ptr, i64, i1)
declare void @puts(ptr) From a brief glance at the code, llvm-project/llvm/lib/Transforms/Scalar/SROA.cpp Line 3153 in 992fa7b
|
The issue here looks like SROA is specifically getting confused by integer types that aren't multiples of 8; you could probably make your frontend avoid emitting allocas with such types as a workaround. (The "largest alignment" thing is just to try to make the generated types look simpler; it's not actually necessary for correctness.) Really, almost every place that's accessing AllocaInst::getAllocatedType() should be using AllocaInst::getAllocationSize() instead. |
I have reduced yet another issue to this bug: %A = type { i32 }
%B = type { i1, i3 }
define i32 @main() {
%1 = alloca %A, align 4
%2 = alloca %B, align 1
%3 = alloca i8, align 1
store i8 42, ptr %3, align 1
%4 = load i8, ptr %3, align 1
%5 = trunc i8 %4 to i7
%6 = getelementptr inbounds { i1, i8 }, ptr %2, i64 0, i32 1
store i7 %5, ptr %6, align 1
%7 = getelementptr inbounds { <{ %B, [2 x i8] }>, i1, [3 x i8] }, ptr %1, i64 0, i32 0, i32 0
call void @llvm.memcpy.p0.p0.i64(ptr align 1 %7, ptr align 1 %2, i64 2, i1 false)
%8 = call i8 @getC(ptr nonnull readonly align 4 %1)
%9 = zext i8 %8 to i32
ret i32 %9
}
define i8 @getC(ptr nonnull readonly align 4 %0) {
%2 = getelementptr inbounds %A, ptr %0, i32 0, i32 0
%3 = getelementptr inbounds %B, ptr %2, i32 0, i32 1
%4 = load i8, ptr %3, align 1
ret i8 %4
}
declare void @llvm.memcpy.p0.p0.i64(ptr noalias nocapture writeonly, ptr noalias nocapture readonly, i64, i1 immarg)
|
We got some regressions downstream after the fix in 54067c5 I think this test program show the problem (or parts of it):
When the frontend is emitting memcpy to perform an aggregate copy (this could for example be a EmitAggExprToLValue) such a memcpy would end up copying padding. That doesn't mean that the padding is of much interest, and the IR might just be reading the aggregate members without any type punning(?) as in this issue. But with this patch such memcpy:s seem to prevent SROA from eliminating the alloca. And we also get an annoying memcpy inside the loop. Not quite sure how to deal with this. What you say is that the type used in the alloca shouldn't really matter, it's only the store size that is of importance. In other words, it is fully legal to do things like
and
Thus, SROA can't look at the alloca type to know how the temporary variable is accessed. But maybe it can look at the users to figure out how the stack slot is used. And if it is accessed using a single single-value-type, there won't be a need to use memcpy? |
Hm. Looks like the above example wasn't really regressed by this patch but maybe something else that has changed since clang 17. But if changing X to 28 like this: https://godbolt.org/z/35eqh5EG8 then it will show a difference depending on if I revert 54067c5 or not. |
This is probably due to introduction of |
Done in #77116. I confirmed that this removes the memcpy in your example. |
… types The bugfix in commit 54067c5, related to llvm#64081 limited the ability of SROA to handle non byte-sized types when used in aggregates that are memcpy'd. Adding some test cases involving tbaa.struct annotations that might allow us to eliminate more memcpy/alloca.
…a.struct The bugfix in commit 54067c5, related to llvm#64081 limited the ability of SROA to handle non byte-sized types when used in aggregates that are memcpy'd. Main problem was that the LLVM types used in an alloca doesn't always reflect if the stack slot can be used to for multiple types (typically unions). So even if we for example have %p = alloca i6 that stack slot may be used for other types than i6. And it would be legal to for example store an i8 to that stack slot. Thus, if %p was dereferenced in a memcpy we needed to consider that also padding bits (seen from the i6 perspective) could be of importance. The limitation added to SROA in commit 54067c5 resulted in huge regressions for a downstream target. Since the frontend typically emit memcpy for aggregate copy it seems quite normal that one end up with a memcpy that is copying padding bits even when there are no unions or type punning. So that limitation seem unfortunate in general. In this patch we try to lift the restrictions a bit. If the memcpy is decorated with tbaa.struct metadata we look at that metadata. If we find out if the slice used by our new alloca is touching memory described by a single scalar type according to the tbaa.struct, then we assume that the type derived for the new alloca is correct for all accesses made to the stack slot. And then we can allow replacing the memcpy by regular load/store instructions operating on that type (disregarding any padding bits).
godbolt
I initially reduced to a simpler LLVM IR repro for this bug, but it's not clear to me if that is a bug in SROA or just invalid IR, so I backported it to C to demonstrate that clang is able to produce equivalent IR.
The issue appears to be that clang selects the union member with the largest alignment as the representation for the union, which in this case is arbitrary since they have the same alignment, and indeed swapping the order of the union fields avoids the bug. Clang then uses this as the type of the
alloca
, which SROA takes to mean that the top two bits ofu.a.a
are not meaningful, when the source language says that those bits are perfectly meaningful when accessingu.b[0]
.Since this was reduced from another language, I can either change the way unions are lowered in that compiler (which currently uses the same method as clang, and I'm not sure what a better way would look like), or wait for an LLVM release that doesn't miscompile when lowering to the current IR.
The text was updated successfully, but these errors were encountered: