Skip to content

Clang generates incorrect code for unions of types with padding #76017

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
ivafanas opened this issue Dec 20, 2023 · 5 comments
Open

Clang generates incorrect code for unions of types with padding #76017

ivafanas opened this issue Dec 20, 2023 · 5 comments
Labels
clang:codegen IR generation bugs: mangling, exceptions, etc. miscompilation

Comments

@ivafanas
Copy link
Contributor

Reproducer:

#include <cstdio>
#include <cstdint>

struct my_struct_1 {
  float a;
  float b;
};

struct my_struct_2 {
  float x;
  double y;
};

union my_union {
  my_struct_1 s1;
  my_struct_2 s2;
};

my_union my_func() {
  my_union u;
  u.s1 = my_struct_1{100.f, 200.f};
  return u;
}

int main() {
  my_union u = my_func();

  if (u.s1.a != 100.f)
    std::puts("a ooops");

  if (u.s1.b != 200.f)
    std::puts("b ooops");

  return 0;
}

Please, note that my_struct_1 has field b which fits into padding of my_struct_2 and sizeof(my_struct_2) > sizeof(my_struct_1).

Bug is reproduced on X86 on clang 16 release and on the latest main branch (clang++-17 is not tested, but I assume its behaviour is the same):

# clang 16 release
clang++-16 -O1 example.cpp && ./a.out
b ooops

# clang main 5caae72d1a4f
clang++ -O1 example.cpp && ./a.out
b ooops

The problem is that C++ unions are represented in IR level as a struct of member with the largets size:

%union.my_union = type { %struct.my_struct_2 }
%struct.my_struct_2 = type { float, double }
%struct.my_struct_1 = type { float, float }

Information about my_struct_1 layout is lost when union llvm::StructType is constructed. SROA pass operates on my_struct_2 layout only.

my_func IR before SROA:

*** IR Dump After SimplifyCFGPass on _Z7my_funcv ***
; Function Attrs: mustprogress nounwind uwtable
define dso_local { float, double } @_Z7my_funcv() #0 {
entry:
  %retval = alloca %union.my_union, align 8
  %ref.tmp = alloca %struct.my_struct_1, align 4
  call void @llvm.lifetime.start.p0(i64 8, ptr %ref.tmp) #5
  %a = getelementptr inbounds %struct.my_struct_1, ptr %ref.tmp, i32 0, i32 0
  store float 1.000000e+02, ptr %a, align 4, !tbaa !5
  %b = getelementptr inbounds %struct.my_struct_1, ptr %ref.tmp, i32 0, i32 1
  store float 2.000000e+02, ptr %b, align 4, !tbaa !10
  call void @llvm.memcpy.p0.p0.i64(ptr align 8 %retval, ptr align 4 %ref.tmp, i64 8, i1 false), !tbaa.struct !11
  call void @llvm.lifetime.end.p0(i64 8, ptr %ref.tmp) #5
  %coerce.dive = getelementptr inbounds %union.my_union, ptr %retval, i32 0, i32 0
  %0 = load { float, double }, ptr %coerce.dive, align 8
  ret { float, double } %0
}

my_func IR after SROA:

*** IR Dump After SROAPass on _Z7my_funcv ***
; Function Attrs: mustprogress nounwind uwtable
define dso_local { float, double } @_Z7my_funcv() #0 {
entry:
  %.fca.0.insert = insertvalue { float, double } poison, float 1.000000e+02, 0
  %.fca.1.insert = insertvalue { float, double } %.fca.0.insert, double undef, 1
  ret { float, double } %.fca.1.insert
}

As you can see, 200.f value is lost and undef value insertion happens here.
On the other side, IR before SROA also looks suspicious, because 32bits of double value are undef -> whole double is undef.

gcc works well on this example.

Possibly related discussion:
https://discourse.llvm.org/t/struct-copy/11330

Possibly related issue:
#53710

@github-actions github-actions bot added the clang Clang issues not falling into any other category label Dec 20, 2023
@EugeneZelenko EugeneZelenko added llvm:optimizations and removed clang Clang issues not falling into any other category labels Dec 20, 2023
@ivafanas
Copy link
Contributor Author

Fix idea 1: lower C/C++ unions into byte arrays of size of the maximum union member.

Fix idea 2: lower C/C++ unions into intermediate llvm::StructType which is the union member of maximum size with padding being filled by extra members (byte arrays or i1 / i2 / i4 / i16 / i32 types)

I can try to implement fix if there are no objections.

?

@nikic
Copy link
Contributor

nikic commented Dec 21, 2023

I think part of the issue here is a SROA bug (#64081), but part of it is that the ABI is just crazy. Just comparing to what GCC does, it looks like the result is returned in xmm0 and xmm1. For s1 a pair of floats is returned in xmm0. For s2 a float is returned in xmm0 and a double in xmm1. I don't think we even have the capability of representing this ABI in LLVM IR. cc @efriedma-quic

@efriedma-quic
Copy link
Collaborator

If the clang ABI code makes the function's return type {double, double}, the rest of the code should do the right thing, I think. (The clang calling convention code is designed to allow casting this way, and SROA should preserve values stored in padding, barring bugs.) Maybe end up with some ugly bitcasting in some cases, but it should do the right thing functionally.

Briefly glancing at the code, it looks like the bug is in X86_64ABIInfo::GetSSETypeAtOffset.

@nikic
Copy link
Contributor

nikic commented Dec 21, 2023

Oh yeah, that makes sense. I didn't consider that we can just bitcast the float pair to a double.

@nikic nikic added clang:codegen IR generation bugs: mangling, exceptions, etc. and removed llvm:optimizations labels Dec 22, 2023
@llvmbot
Copy link
Member

llvmbot commented Dec 22, 2023

@llvm/issue-subscribers-clang-codegen

Author: Afanasyev Ivan (ivafanas)

Reproducer:
#include &lt;cstdio&gt;
#include &lt;cstdint&gt;

struct my_struct_1 {
  float a;
  float b;
};

struct my_struct_2 {
  float x;
  double y;
};

union my_union {
  my_struct_1 s1;
  my_struct_2 s2;
};

my_union my_func() {
  my_union u;
  u.s1 = my_struct_1{100.f, 200.f};
  return u;
}

int main() {
  my_union u = my_func();

  if (u.s1.a != 100.f)
    std::puts("a ooops");

  if (u.s1.b != 200.f)
    std::puts("b ooops");

  return 0;
}

Please, note that my_struct_1 has field b which fits into padding of my_struct_2 and sizeof(my_struct_2) &gt; sizeof(my_struct_1).

Bug is reproduced on X86 on clang 16 release and on the latest main branch (clang++-17 is not tested, but I assume its behaviour is the same):

# clang 16 release
clang++-16 -O1 example.cpp &amp;&amp; ./a.out
b ooops

# clang main 5caae72d1a4f
clang++ -O1 example.cpp &amp;&amp; ./a.out
b ooops

The problem is that C++ unions are represented in IR level as a struct of member with the largets size:

%union.my_union = type { %struct.my_struct_2 }
%struct.my_struct_2 = type { float, double }
%struct.my_struct_1 = type { float, float }

Information about my_struct_1 layout is lost when union llvm::StructType is constructed. SROA pass operates on my_struct_2 layout only.

my_func IR before SROA:

*** IR Dump After SimplifyCFGPass on _Z7my_funcv ***
; Function Attrs: mustprogress nounwind uwtable
define dso_local { float, double } @<!-- -->_Z7my_funcv() #<!-- -->0 {
entry:
  %retval = alloca %union.my_union, align 8
  %ref.tmp = alloca %struct.my_struct_1, align 4
  call void @<!-- -->llvm.lifetime.start.p0(i64 8, ptr %ref.tmp) #<!-- -->5
  %a = getelementptr inbounds %struct.my_struct_1, ptr %ref.tmp, i32 0, i32 0
  store float 1.000000e+02, ptr %a, align 4, !tbaa !5
  %b = getelementptr inbounds %struct.my_struct_1, ptr %ref.tmp, i32 0, i32 1
  store float 2.000000e+02, ptr %b, align 4, !tbaa !10
  call void @<!-- -->llvm.memcpy.p0.p0.i64(ptr align 8 %retval, ptr align 4 %ref.tmp, i64 8, i1 false), !tbaa.struct !11
  call void @<!-- -->llvm.lifetime.end.p0(i64 8, ptr %ref.tmp) #<!-- -->5
  %coerce.dive = getelementptr inbounds %union.my_union, ptr %retval, i32 0, i32 0
  %0 = load { float, double }, ptr %coerce.dive, align 8
  ret { float, double } %0
}

my_func IR after SROA:

*** IR Dump After SROAPass on _Z7my_funcv ***
; Function Attrs: mustprogress nounwind uwtable
define dso_local { float, double } @<!-- -->_Z7my_funcv() #<!-- -->0 {
entry:
  %.fca.0.insert = insertvalue { float, double } poison, float 1.000000e+02, 0
  %.fca.1.insert = insertvalue { float, double } %.fca.0.insert, double undef, 1
  ret { float, double } %.fca.1.insert
}

As you can see, 200.f value is lost and undef value insertion happens here.
On the other side, IR before SROA also looks suspicious, because 32bits of double value are undef -> whole double is undef.

gcc works well on this example.

Possibly related discussion:
https://discourse.llvm.org/t/struct-copy/11330

Possibly related issue:
#53710

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:codegen IR generation bugs: mangling, exceptions, etc. miscompilation
Projects
None yet
Development

No branches or pull requests

6 participants