Clang should lower unions to LLVM as byte array, not structure type #107239

mkuron · 2024-09-04T13:18:00Z

Consider a C++ union like this:

struct S1 {
  char s;
  int i;
};
struct S2 {
  int a;
  int b;
};
union U {
  S1 s1;
  S2 s2;
};

Clang lowers it to LLVM-IR like this as LLVM has no concept of unions:

%union.U = type { %struct.S1 }
%struct.S1 = type { i8, i32 }

This is dangerous whenever one of the structs contains alignment padding that doesn't coincide with the other structs' padding. In the above example, LLVM's perspective is that the second, third, and fourth byte of the union are unused, but if the union happens to contain an S2, they are in fact used. Whenever something in LLVM iterates over structure members, there is thus a danger of data loss. This has come up before in various contexts (#53710, #64081, #76017, probably others that I haven't seen) and the workaround tends to be to make LLVM locally reinterpret the structure type as an opaque byte array or fix up its size. This does not scale, however, and there are likely more places that need this workaround but don't have it yet.

@Artem-B pointed out on #53710 that the documentation says

If it sees an aggregate, it assumes that loading/storing all fields of an aggregate is equivalent of loading/storing complete aggregate.

From the LLVM docs for load https://llvm.org/docs/LangRef.html#id211 :

If the value being loaded is of aggregate type, the bytes that correspond to padding may be accessed but are ignored, because it is impossible to observe padding from the loaded aggregate value.

For store it says: https://llvm.org/docs/LangRef.html#id216

If is of aggregate type, padding is filled with undef.

and suggested I open this issue. Clang's way of lowering unions as structure types thus seems to be in violation of the assumptions that LLVM makes about structure types. Instead of applying workarounds throughout LLVM, the preferable solution would thus be to change Clang's union lowering to use byte arrays instead of structure types. The above example would then be represented as

%union.U = type { [8 x i8] }

which does not entice LLVM to make any assumptions that certain bytes might be padding.

The code in Clang that is responsible for lowering unions is in CGRecordLowering::lowerUnion:

llvm-project/clang/lib/CodeGen/CGRecordLayoutBuilder.cpp

Lines 313 to 378 in 4497ec2

    
           void CGRecordLowering::lowerUnion(bool isNoUniqueAddress) { 
        
             CharUnits LayoutSize = 
        
                 isNoUniqueAddress ? Layout.getDataSize() : Layout.getSize(); 
        
             llvm::Type *StorageType = nullptr; 
        
             bool SeenNamedMember = false; 
        
             // Iterate through the fields setting bitFieldInfo and the Fields array. Also 
        
             // locate the "most appropriate" storage type.  The heuristic for finding the 
        
             // storage type isn't necessary, the first (non-0-length-bitfield) field's 
        
             // type would work fine and be simpler but would be different than what we've 
        
             // been doing and cause lit tests to change. 
        
             for (const auto *Field : D->fields()) { 
        
               if (Field->isBitField()) { 
        
                 if (Field->isZeroLengthBitField(Context)) 
        
                   continue; 
        
                 llvm::Type *FieldType = getStorageType(Field); 
        
                 if (LayoutSize < getSize(FieldType)) 
        
                   FieldType = getByteArrayType(LayoutSize); 
        
                 setBitFieldInfo(Field, CharUnits::Zero(), FieldType); 
        
               } 
        
               Fields[Field->getCanonicalDecl()] = 0; 
        
               llvm::Type *FieldType = getStorageType(Field); 
        
               // Compute zero-initializable status. 
        
               // This union might not be zero initialized: it may contain a pointer to 
        
               // data member which might have some exotic initialization sequence. 
        
               // If this is the case, then we aught not to try and come up with a "better" 
        
               // type, it might not be very easy to come up with a Constant which 
        
               // correctly initializes it. 
        
               if (!SeenNamedMember) { 
        
                 SeenNamedMember = Field->getIdentifier(); 
        
                 if (!SeenNamedMember) 
        
                   if (const auto *FieldRD = Field->getType()->getAsRecordDecl()) 
        
                     SeenNamedMember = FieldRD->findFirstNamedDataMember(); 
        
                 if (SeenNamedMember && !isZeroInitializable(Field)) { 
        
                   IsZeroInitializable = IsZeroInitializableAsBase = false; 
        
                   StorageType = FieldType; 
        
                 } 
        
               } 
        
               // Because our union isn't zero initializable, we won't be getting a better 
        
               // storage type. 
        
               if (!IsZeroInitializable) 
        
                 continue; 
        
               // Conditionally update our storage type if we've got a new "better" one. 
        
               if (!StorageType || 
        
                   getAlignment(FieldType) >  getAlignment(StorageType) || 
        
                   (getAlignment(FieldType) == getAlignment(StorageType) && 
        
                   getSize(FieldType) > getSize(StorageType))) 
        
                 StorageType = FieldType; 
        
             } 
        
             // If we have no storage type just pad to the appropriate size and return. 
        
             if (!StorageType) 
        
               return appendPaddingBytes(LayoutSize); 
        
             // If our storage size was bigger than our required size (can happen in the 
        
             // case of packed bitfields on Itanium) then just use an I8 array. 
        
             if (LayoutSize < getSize(StorageType)) 
        
               StorageType = getByteArrayType(LayoutSize); 
        
             FieldTypes.push_back(StorageType); 
        
             appendPaddingBytes(LayoutSize - getSize(StorageType)); 
        
             // Set packed if we need it. 
        
             const auto StorageAlignment = getAlignment(StorageType); 
        
             assert((Layout.getSize() % StorageAlignment == 0 || 
        
                     Layout.getDataSize() % StorageAlignment) && 
        
                    "Union's standard layout and no_unique_address layout must agree on " 
        
                    "packedness"); 
        
             if (Layout.getDataSize() % StorageAlignment) 
        
               Packed = true; 
        
           }

It even says that the complicated heuristic for deciding which one of the union's members the type should be based on is unnecessary. It already contains some fallback paths where it generates an opaque byte array if it can't identify an appropriate member.

Pinging the people involved in #64081 and #76017: @jacobly0, @nikic, @efriedma-quic, @ivafanas.

The text was updated successfully, but these errors were encountered:

nikic · 2024-09-04T13:30:24Z

I think it's okay to switch to a byte array to make things more robust and obvious. But I do want to point out that whenever some piece of code emits a load/store of an aggregate type, it's that code that is wrong. E.g. looking at #53710 it's a bug in the NVPTX backend that it copies byval arguments using a load+store pair instead of using memcpy. Generally, the only information you're allowed to take from a byval argument is its size and alignment.

llvmbot · 2024-09-04T13:49:26Z

@llvm/issue-subscribers-clang-codegen

Author: Michael Kuron (mkuron)

Consider a C++ union like this: ```c++ struct S1 { char s; int i; }; struct S2 { int a; int b; }; union U { S1 s1; S2 s2; }; ``` Clang lowers it to LLVM-IR like this as LLVM has no concept of unions: ```llvm %union.U = type { %struct.S1 } %struct.S1 = type { i8, i32 } ``` This is dangerous whenever one of the structs contains alignment padding that doesn't coincide with the other structs' padding. In the above example, LLVM's perspective is that the second, third, and fourth byte of the union are unused, but if the union happens to contain an `S2`, they are in fact used. Whenever something in LLVM iterates over structure members, there is thus a danger of data loss. This has come up before in various contexts (#53710, #64081, #76017, probably others that I haven't seen) and the workaround tends to be to make LLVM locally reinterpret the structure type as an opaque byte array or fix up its size. This does not scale, however, and there are likely more places that need this workaround but don't have it yet.

@Artem-B pointed out on #53710 that the documentation says
> If it sees an aggregate, it assumes that loading/storing all fields of an aggregate is equivalent of loading/storing complete aggregate.
>
> From the LLVM docs for load https://llvm.org/docs/LangRef.html#id211 :
>
> > If the value being loaded is of aggregate type, the bytes that correspond to padding may be accessed but are ignored, because it is impossible to observe padding from the loaded aggregate value.
>
> For store it says: https://llvm.org/docs/LangRef.html#id216
>
> > If <value> is of aggregate type, padding is filled with undef.

and suggested I open this issue. Clang's way of lowering unions as structure types thus seems to be in violation of the assumptions that LLVM makes about structure types. Instead of applying workarounds throughout LLVM, the preferable solution would thus be to change Clang's union lowering to use byte arrays instead of structure types. The above example would then be represented as

%union.U = type { [8 x i8] }

which does not entice LLVM to make any assumptions that certain bytes might be padding.

The code in Clang that is responsible for lowering unions is in CGRecordLowering::lowerUnion:

llvm-project/clang/lib/CodeGen/CGRecordLayoutBuilder.cpp

Lines 313 to 378 in 4497ec2

    
           void CGRecordLowering::lowerUnion(bool isNoUniqueAddress) { 
        
             CharUnits LayoutSize = 
        
                 isNoUniqueAddress ? Layout.getDataSize() : Layout.getSize(); 
        
             llvm::Type *StorageType = nullptr; 
        
             bool SeenNamedMember = false; 
        
             // Iterate through the fields setting bitFieldInfo and the Fields array. Also 
        
             // locate the "most appropriate" storage type.  The heuristic for finding the 
        
             // storage type isn't necessary, the first (non-0-length-bitfield) field's 
        
             // type would work fine and be simpler but would be different than what we've 
        
             // been doing and cause lit tests to change. 
        
             for (const auto *Field : D->fields()) { 
        
               if (Field->isBitField()) { 
        
                 if (Field->isZeroLengthBitField(Context)) 
        
                   continue; 
        
                 llvm::Type *FieldType = getStorageType(Field); 
        
                 if (LayoutSize < getSize(FieldType)) 
        
                   FieldType = getByteArrayType(LayoutSize); 
        
                 setBitFieldInfo(Field, CharUnits::Zero(), FieldType); 
        
               } 
        
               Fields[Field->getCanonicalDecl()] = 0; 
        
               llvm::Type *FieldType = getStorageType(Field); 
        
               // Compute zero-initializable status. 
        
               // This union might not be zero initialized: it may contain a pointer to 
        
               // data member which might have some exotic initialization sequence. 
        
               // If this is the case, then we aught not to try and come up with a "better" 
        
               // type, it might not be very easy to come up with a Constant which 
        
               // correctly initializes it. 
        
               if (!SeenNamedMember) { 
        
                 SeenNamedMember = Field->getIdentifier(); 
        
                 if (!SeenNamedMember) 
        
                   if (const auto *FieldRD = Field->getType()->getAsRecordDecl()) 
        
                     SeenNamedMember = FieldRD->findFirstNamedDataMember(); 
        
                 if (SeenNamedMember && !isZeroInitializable(Field)) { 
        
                   IsZeroInitializable = IsZeroInitializableAsBase = false; 
        
                   StorageType = FieldType; 
        
                 } 
        
               } 
        
               // Because our union isn't zero initializable, we won't be getting a better 
        
               // storage type. 
        
               if (!IsZeroInitializable) 
        
                 continue; 
        
               // Conditionally update our storage type if we've got a new "better" one. 
        
               if (!StorageType || 
        
                   getAlignment(FieldType) >  getAlignment(StorageType) || 
        
                   (getAlignment(FieldType) == getAlignment(StorageType) && 
        
                   getSize(FieldType) > getSize(StorageType))) 
        
                 StorageType = FieldType; 
        
             } 
        
             // If we have no storage type just pad to the appropriate size and return. 
        
             if (!StorageType) 
        
               return appendPaddingBytes(LayoutSize); 
        
             // If our storage size was bigger than our required size (can happen in the 
        
             // case of packed bitfields on Itanium) then just use an I8 array. 
        
             if (LayoutSize < getSize(StorageType)) 
        
               StorageType = getByteArrayType(LayoutSize); 
        
             FieldTypes.push_back(StorageType); 
        
             appendPaddingBytes(LayoutSize - getSize(StorageType)); 
        
             // Set packed if we need it. 
        
             const auto StorageAlignment = getAlignment(StorageType); 
        
             assert((Layout.getSize() % StorageAlignment == 0 || 
        
                     Layout.getDataSize() % StorageAlignment) && 
        
                    "Union's standard layout and no_unique_address layout must agree on " 
        
                    "packedness"); 
        
             if (Layout.getDataSize() % StorageAlignment) 
        
               Packed = true; 
        
           }

It even says that the complicated heuristic for deciding which one of the union's members the type should be based on is unnecessary. It already contains some fallback paths where it generates an opaque byte array if it can't identify an appropriate member.

Pinging the people involved in #64081 and #76017: @jacobly0, @nikic, @efriedma-quic, @ivafanas.

efriedma-quic · 2024-09-04T18:17:40Z

The way forward here is to just remove these types from the IR completely, I think. AllocaInst::getAllocatedType() and Argument::getParamByValType aren't returning useful information; if you look at the LangRef rules, only the size is semantically significant. So we should just be using AllocaInst::getAllocationSize() etc. Once we do that, clang's lowering of unions becomes irrelevant.

We could change the way clang emits these types in the meantime, I guess, but we don't get much benefit out of it.

(Also, this isn't at all relevant to #76017.)

Artem-B · 2024-09-04T18:28:47Z

just remove these types from the IR completely,

+1. This seems to be the simplest consistent way to keep everyone happy.

@nikic

whenever some piece of code emits a load/store of an aggregate type, it's that code that is wrong

Semantics of loads/stores of aggregates is currently documented in the LLVM IR and as such should be valid, no?
If it's wrong, perhaps IR spec should be updated to reflect that.

efriedma-quic · 2024-09-04T18:47:50Z

Semantics of loads/stores of aggregates is currently documented in the LLVM IR and as such should be valid, no? If it's wrong, perhaps IR spec should be updated to reflect that.

Aggregate loads/stores work as documented... the issue is that the documented semantics aren't the semantics you want for copying an alloca/byval value.

Artem-B · 2024-09-05T00:03:48Z

Aggregate loads/stores work as documented... the issue is that the documented semantics aren't the semantics you want for copying an alloca/byval value.

I agree that there's a disconnect, but I think I'm missing something here.

User's IR says "here's an aggregate of type T". IR spec says -- you're free to load/store that aggregate, but you only get the fields, not the padding. Presumably, if one loads/stores/loads that aggregate, the values of both loads will be the same, but not necessarily the padding.

Clang clearly assumes that everything must be copied. I want to understand what is that assumption based on, because LLVM does not seem to make such a promise. At least I have not found it in the docs so far.
I may have overlooked or misinterpreted something. Does IR spec say that aggregates must copy all bytes? I'd appreciate if you could point me in the right direction.

nikic · 2024-09-05T08:02:35Z

Clang clearly assumes that everything must be copied. I want to understand what is that assumption based on, because LLVM does not seem to make such a promise. At least I have not found it in the docs so far. I may have overlooked or misinterpreted something. Does IR spec say that aggregates must copy all bytes? I'd appreciate if you could point me in the right direction.

IR loads and stores of aggregates do not copy padding. However, clang will not actually (or at least should not) emit load/stores of aggregate type, so this is not a problem for union lowering. Our frontend guidelines explicitly discourage the creation of values of aggregate type unless strictly necessary, e.g. for return ABI handling, and clang follows that.

The type argument of alloca and byval arguments is only meaningful for its size and alignment. alloca {i32, i8}, align 4 and alloca [8 x i8], align 4 are semantically strictly equivalent (assuming the usual alignments). The same is true for byval({i32, i8}) align 4 and byval([8 x i8]) align 4. I think the alloca docs are pretty clear on this point, while the byval ones could maybe be clarified to say that the word "copy" there is meant in the sense of memcpy, not the sense of "load+store with the byval type".

The way forward here is to just remove these types from the IR completely, I think.

Yes, I agree.

Artem-B · 2024-09-05T18:36:15Z

clang will not actually (or at least should not) emit load/stores of aggregate type, so this is not a problem for union lowering.

OK, so when clang emits initial unoptimized IR, the IR is fine. But it's just the beginning of the story of those aggregate values.
If LLVM wants/needs to materialize a copy of it for whatever reason, does it promise to always make a complete copy, or does it only guarantees to preserve the defined fields of the aggregate type?

github-actions bot added the clang label Sep 4, 2024

mkuron changed the title ~~Clang should lower unions to LLVM as byte array, not structure~~ Clang should lower unions to LLVM as byte array, not structure type Sep 4, 2024

llvm deleted a comment Sep 4, 2024

EugeneZelenko added clang:codegen and removed clang labels Sep 4, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Clang should lower unions to LLVM as byte array, not structure type #107239

Clang should lower unions to LLVM as byte array, not structure type #107239

mkuron commented Sep 4, 2024

nikic commented Sep 4, 2024

llvmbot commented Sep 4, 2024

efriedma-quic commented Sep 4, 2024

Artem-B commented Sep 4, 2024

efriedma-quic commented Sep 4, 2024

Artem-B commented Sep 5, 2024

nikic commented Sep 5, 2024

Artem-B commented Sep 5, 2024

Clang should lower unions to LLVM as byte array, not structure type #107239

Clang should lower unions to LLVM as byte array, not structure type #107239

Comments

mkuron commented Sep 4, 2024

nikic commented Sep 4, 2024

llvmbot commented Sep 4, 2024

efriedma-quic commented Sep 4, 2024

Artem-B commented Sep 4, 2024

efriedma-quic commented Sep 4, 2024

Artem-B commented Sep 5, 2024

nikic commented Sep 5, 2024

Artem-B commented Sep 5, 2024