Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LICM sinks/hoist memory accesses to dead stack objects (wrong code at -O1 on x86_64-linux-gnu) #51838

Open
zhendongsu opened this issue Nov 13, 2021 · 11 comments
Labels
bugzilla Issues migrated from bugzilla loopoptim miscompilation

Comments

@zhendongsu
Copy link

zhendongsu commented Nov 13, 2021

Bugzilla Link 52496
Version unspecified
OS All
CC @fhahn,@MaskRay,@preames,@RKSimon,@rnk,@rotateright,@xiangzh1

Extended Description

It appears to be a regression from 12.*.

[758] % clangtk -v
clang version 14.0.0 (https://github.com/llvm/llvm-project.git 8ed8d370880b5c4e7bbf52b50791710a9f4f834b)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /local/suz-local/opfuzz/bin
Found candidate GCC installation: /usr/lib/gcc/i686-linux-gnu/8
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/6
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/6.5.0
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/7
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/7.5.0
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/8
Selected GCC installation: /usr/lib/gcc/x86_64-linux-gnu/7.5.0
Candidate multilib: .;@m64
Candidate multilib: 32;@m32
Candidate multilib: x32;@mx32
Selected multilib: .;@m64
[759] % 
[759] % clangtk -O0 -w small.c; ./a.out
[760] % 
[760] % clangtk -O1 -w small.c
[761] % ./a.out
Aborted
[762] % 
[762] % cat small.c
int printf(const char *, ...);
int a, *b;
int main() {
  int *c, *d, e;
  while (a) {
    int f[1];
    while (a)
      printf(c);
    c = f;
    b = (int *)&d;
  }
L:
  e = 1;
  if (a) {
    printf("%d", a);
    b = &e;
    (*c)++;
  }
  if (a)
    goto L;
  if (!e)
    __builtin_abort();
  return 0;
}
@rotateright
Copy link
Contributor

The bug is either in or exposed by CGP. I'm attaching a slight IR reduction that shows this diff:

% llc -o - -disable-cgp=0 52496.ll | clang -x assembler - && ./a.out; echo $?
0
% llc -o - -disable-cgp=1 52496.ll | clang -x assembler - && ./a.out; echo $?
1

@rotateright
Copy link
Contributor

@rotateright
Copy link
Contributor

I see that CGP is duplicating/sinking casts of alloca'd pointers around lifetime markers and that seems suspicious, but I'm not familiar with how that (or StackColoring?) works.

cc'ing some other contributors based on commit logs in those areas.

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 11, 2021
@fhahn
Copy link
Contributor

fhahn commented Mar 21, 2022

The original C reproducer works correctly on current trunk: https://godbolt.org/z/nM3ca1hr4

The IR reproducer from @rotateright still shows a difference with and without codegenprepare, but I think this may be due to the reproducer already having accesses to locations after lifetime.end, e.g. in L.preheader.

@fhahn
Copy link
Contributor

fhahn commented Mar 21, 2022

Ok, looks like the original program has an stack-use-after-scope bug and hence the program invalid (f is declared in the while() scope but escapss and is used outside that scope): https://clang.godbolt.org/z/fjEqbETnG

@fhahn fhahn closed this as completed Mar 21, 2022
@fhahn fhahn added the invalid Resolved as invalid, i.e. not a bug label Mar 21, 2022
@zhendongsu
Copy link
Author

Ok, looks like the original program has an stack-use-after-scope bug and hence the program invalid (f is declared in the while() scope but escapss and is used outside that scope): https://clang.godbolt.org/z/fjEqbETnG

Florian, this looks like an address sanitizer bug for 13.0.* at -O1 as the code should be valid:

  • 13.0.* at -O0 okay
  • 12.0.* and trunk at -O1 also okay

@fhahn

@fhahn
Copy link
Contributor

fhahn commented Mar 22, 2022

You are right, the access to the out-of-scope stack entry is never executed in the original program!

The issue is that LICM hoists a load of and sinks a store to an alloca out of a loop and executes it unconditionally, even though the lifetime of the alloca has been ended. This means we now execute a load/store of a dead stack object. It's not entirely clear from LangRef what this means, but According to Alive2 this is UB: https://alive2.llvm.org/ce/z/QkhiTM (simplified version).

As a consequence, the backend will use the same stack slot for %e and %f. I don't think the load is problematic, but the store may trash a valid different stack value.

llvm::isSafeToSpeculativelyExecute may need updating to consider that the alloca is dead. CC'ing some people for additional thoughts @preames @rotateright, @alinas @nikic

IR reproducer for LICM (https://clang.godbolt.org/z/Pzn5Mf86v):

target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
define dso_local i32 @main(i32 %n) {
entry:
  %e = alloca i32, align 4
  %f = alloca i32, align 4
  store i32 0, i32* %e
  %f.cast = bitcast i32* %f to i8*
  call void @llvm.lifetime.start.p0i8(i64 4, i8* nonnull %f.cast)
  call void @llvm.lifetime.end.p0i8(i64 4, i8* nonnull %f.cast)
  br label %L

L:                                                ; preds = %if.end, %L.preheader
  %iv = phi i32 [ 0, %entry ], [ %iv.next, %if.end ]
  %tobool5.not = icmp eq i32 %iv, %n
  br i1 %tobool5.not, label %if.end, label %if.then

if.then:                                          ; preds = %L
  %i8 = load i32, i32* %f, align 4
  %inc = add nsw i32 %i8, 1
  store i32 %inc, i32* %f, align 4
  br label %if.end

if.end:                                           ; preds = %if.then, %L
  %tobool7.not = icmp eq i32 %iv, 1999
  %iv.next = add i32 %iv, 1
  br i1 %tobool7.not, label %if.end9, label %L

if.end9:                                          ; preds = %if.end
  %l = load i32, i32* %e
  ret i32 %l
}

; Function Attrs: argmemonly nofree nosync nounwind willreturn
declare void @llvm.lifetime.start.p0i8(i64 immarg, i8* nocapture) #1

; Function Attrs: argmemonly nofree nosync nounwind willreturn
declare void @llvm.lifetime.end.p0i8(i64 immarg, i8* nocapture) #1

@fhahn fhahn reopened this Mar 22, 2022
@fhahn fhahn changed the title wrong code at -O1 on x86_64-linux-gnu LICM sinks/hoist memory accesses to dead stack objects (wrong code at -O1 on x86_64-linux-gnu) Mar 22, 2022
@fhahn fhahn added miscompilation loopoptim and removed invalid Resolved as invalid, i.e. not a bug labels Mar 22, 2022
@nikic
Copy link
Contributor

nikic commented Mar 23, 2022

We have this wording in LangRef:

A lifetime of a memory object is a property that decides its accessibility. Unless stated otherwise, a memory object is alive since its allocation, and dead after its deallocation. It is undefined behavior to access a memory object that isn’t alive, but operations that don’t dereference it such as getelementptr, ptrtoint and icmp return a valid result. This explains code motion of these instructions across operations that impact the object’s lifetime. A stack object’s lifetime can be explicitly specified using llvm.lifetime.start and llvm.lifetime.end intrinsic function calls.

I think we should adjust the semantics here to say that loading from a dead stack object results in a poison value (while storing results in immediate undefined behavior). Otherwise either a) not all unordered, dereferenceable, aligned loads are speculatable or b) allocas are non-dereferenceable after the lifetime ends. I guess b) is viable from a semantics perspective, but not really viable from an implementation perspective right now, because it would make dereferenceability of allocas context-sensitive.

@nunoplopes Do you see any issue with making load after alloca lifetime end poison rather than UB?

In any case, the store clearly has to be UB, so I guess LICM scalar promotion needs an additional check to not use an unconditional store if it doesn't know that the alloca is live.

@preames
Copy link
Collaborator

preames commented Mar 23, 2022

My take: Interpreting lifetime intrinsics as changing whether the memory dereferenceable is a bad idea. An access outside of the lifetime can certainly be defined as returning poison for loads, or leaving the memory in an unspecified state, but allowing them to trap is an extremely profound change to the optimizer which is simply a bad idea.

The entire point of an alloca is that it's stack storage with a lifetime the compiler understands. If we allow lifetime intrinsics to change that lifetime, everything which moves code has to be aware of that, and we have a problem which is current context-free which must become context-sensitive. (This is @nikic's option b above. Option a must be rejected out of hand.)

I think this is simply a case where the LangRef is out of sync with reality, and we need to fix LangRef. That was my feedback on the change which added this wording to the LangRef, and my position here has not changed.

An additional argument in favor of this is that current LangRef assigns different semantics to a lifetime marker on an alloca than a lifetime marker on heap allocated storage. That inconsistency is a strong hint that the specified behavior is wrong. Allowing the distinct disallows e.g. heap to stack conversion without stripping all lifetime markers.

Another such hint is that allowing lifetime markers to change dereferenceability implies that an alloca without markers can be more aggressively optimized than one with. That sounds very suspect to say the least.

If we do accept my view and change langref, we do have to also go fix stack coloring. Last I looked at that code - it's been a while, so my memory might be off - it took lifetime markers as gospel truth, not a strong hint. With the interpretation I'm suggesting (that the middle end already uses), that becomes a clear bug.

@nunoplopes
Copy link
Member

nunoplopes commented Mar 23, 2022

Some history: lifetime markers were added to allow stack coloring. The semantics was not well documented nor well thought off. Later we (me & Juneyong) tried to improve the description in LangRef. Johannes asked us if we could also define the semantics of lifetime markers for heap objects while at it. We didn't really want that because LLVM doesn't use lifetime markers on heap objects but he insisted that he would use such a feature, so eventually we gave in.

The semantics of the lifetime markers for stack objects must be defined by how the stack coloring works. I agree it is never ok to allow stores after liveness ends and that allowing loads beforehand might be ok.
Why is it potentially not ok to allow loads outside the respective lifetime? For example, if you want to support segmented stacks. In that case, the stack may have been already deallocated when you access a dead object. I don't know if any of the JIT implementations uses such a thing right now, but changing the semantics would forbid such aggressive stack implementations to ever exist. Is that bad? I don't know. Nor do I care personally, I just want a consistent LangRef so I can check it against the implementation🙂

@nikic
Copy link
Contributor

nikic commented Jul 18, 2022

If we do accept my view and change langref, we do have to also go fix stack coloring. Last I looked at that code - it's been a while, so my memory might be off - it took lifetime markers as gospel truth, not a strong hint. With the interpretation I'm suggesting (that the middle end already uses), that becomes a clear bug.

The core problem with fixing stack coloring is that it makes it effectively impossible to (non-trivially) color escaped allocas. You might have a store to the alloca behind every call or indirect store.

StackColoring actually already has an option to gracefully handle cases like in this bug (

/// The user may write code that uses allocas outside of the declared lifetime
/// zone. This can happen when the user returns a reference to a local
/// data-structure. We can detect these cases and decide not to optimize the
/// code. If this flag is enabled, we try to save the user. This option
/// is treated as overriding LifetimeStartOnFirstUse below.
static cl::opt<bool>
ProtectFromEscapedAllocas("protect-from-escaped-allocas",
cl::init(false), cl::Hidden,
cl::desc("Do not optimize lifetime zones that "
"are broken"));
), where there is a direct use of an alloca outside its lifetime region (but I don't think it handles indirect uses of escaped allocas).

As far as stack coloring is concerned, I'd assume that supporting escaped allocas is the one and only purpose of lifetime markers (otherwise deriving lifetimes from uses shouldn't be particularly hard), so losing support for that would be problematic.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla Issues migrated from bugzilla loopoptim miscompilation
Projects
None yet
Development

No branches or pull requests

6 participants