Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AMDGPU stepthomas atomic csub no rtn forms ver2 #1

Closed
wants to merge 467 commits into from

Conversation

stepthomas
Copy link
Owner

philnik777 and others added 30 commits October 6, 2023 11:21
…ability macros

Adding additional instantiations to the dylib isn't actually an ABI break as long as programs targeting an older dylib don't start to depend on them. Making additional instantiations a matter of availability allows us to add them without an ABI break.

Reviewed By: #libc, ldionne, Mordante

Spies: arichardson, ldionne, Mordante, libcxx-commits

Differential Revision: https://reviews.llvm.org/D154796
- Add test coverage for sext/zext boolean additions
- [InstCombine] Fold comparison of adding two z/sext booleans

Fixes llvm#64859.
Reviewed By: #libc, ldionne

Spies: ldionne, Mordante, libcxx-commits

Differential Revision: https://reviews.llvm.org/D155411
…stinations (llvm#68074)

Extend `bufferization.materialize_in_destination` to support memref
destinations. This op can now be used to indicate that a tensor
computation should materialize in a given buffer (that may have been
allocated by another component/runtime). The op still participates in
"empty tensor elimination".

Example:
```mlir
func.func @test(%out: memref<10xf32>) {
  %t = tensor.empty() : tensor<10xf32>
  %c = linalg.generic ... outs(%t: tensor<10xf32>) -> tensor<10xf32>
  bufferization.materialize_in_destination %c in restrict writable %out : (tensor<10xf32>, memref<10xf32>) -> ()
  return
}
```
After "empty tensor elimination", the above IR can bufferize without an
allocation:
```mlir
func.func @test(%out: memref<10xf32>) {
  linalg.generic ... outs(%out: memref<10xf32>)
  return
}
```

This change also clarifies the meaning of the `restrict` unit attribute
on `bufferization.to_tensor` ops.
Some large AVR programs (for devices without long jump) may exceed
128KiB, and lld should give explicit errors other than generate wrong
executables silently.
…mber detection (llvm#68405)

Fixes misleading comment introduced in
`f74aaca63202cabb512c78fe19196ff348d436a8`
…llvm#67120)

The transfrom interpreter accepts an argument to a "library" file with
named sequences. This patch exteneds this functionality such that (1)
several such individual files are accepted and (2) folders can be passed
in, in which all `*.mlir` files are loaded.
…ffles

Allow length changing shuffle masks in the "bitcast (shuf V, MaskC) --> shuf (bitcast V), MaskC'" fold.

It also exposes some poor shuffle mask detection for extract/insert subvector cases inside improveShuffleKindFromMask

First stage towards addressing Issue llvm#67803
This patch tries to canonicalise add + gep to gep + gep.

Co-authored-by: Paul Walker <paul.walker@arm.com>

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D155688
…erErrors member variable correctly based on the PP diagnostics. (llvm#68127)""

This reverts commit a6acf3f and
relands a50e63b. The original revert
was done by mistake.
…8382)

While working on llvm#68377 inspecting `Allocate()` calls, I found out that
there are couple of places where we forget to use placement-new to
create objects in the allocated memory.
/llvm-project/llvm/include/llvm/CodeGen/BasicTTIImpl.h:948:33: error: comparison of integers of different signs: 'size_t' (aka 'unsigned long') and 'int' [-Werror,-Wsign-compare]
          (Index + Mask.size()) <= NumSrcElts) {
           ~~~~~~~~~~~~~~~~~~~  ^  ~~~~~~~~~~
…ing a simple transform dialect interpreter pass (llvm#68330)
These were missed as I didn't expect clang codegen to be updated
…lvm#65852)

This patch folds the pattern `a ne/eq (zext/sext (a ne/eq c))` into a boolean constant or a compare.
Clang vs GCC: https://godbolt.org/z/4ro817WE8
Proof for `zext`: https://alive2.llvm.org/ce/z/6z9NRF
Proof for `sext`: https://alive2.llvm.org/ce/z/tv5wuE
Fixes llvm#65073.
Introduced by 5b657f5 that moved
LICM after AMDGPUCodeGenPrepare. Some instructions are no longer
sunk during ir optimizations but in machine-sinking instead.
If vgpr instruction used sgpr defined inside the cycle is sunk outside
of the cycle we end up with not-handled case of temporal divergence.
Add test for theoretical case when SALU instruction (represents
uniform value) is sunk outside of the cycle.
Add a test when SALU instruction can be sunk if it edits lane mask.
Temporal divergence that was present in input or introduced in IR
transforms, like code-sinking or LICM, is handled in SIFixSGPRCopies
by changing sgpr source instr to vgpr instr.
After 5b657f5, that moved LICM after AMDGPUCodeGenPrepare,
machine-sinking can introduce temporal divergence by sinking
instructions outside of the cycle.
Add isSafeToSink callback in TargetInstrInfo.
Chen Zheng and others added 26 commits October 10, 2023 09:13
…er (llvm#67284)

For now, data location expression is hard coded to little endian. We are
going to support sanitizers on AIX which is big endian. Support big
endian too in the data location expression parser of llvm-symbolizer.
llvm#68501)

…ailabl externally

A workaround for llvm#60996

As the title suggested, we can avoid emitting available externally
functions which is marked as noinline already. Such functions should
contribute nothing for optimizations.

The update for docs will be sent seperately if this got approved.
1. Map R16-R31 to DWARF registers 130-145.
2. Make R16-R31 caller-saved registers.
3. Make R16-31 allocatable only when feature EGPR is supported
4. Make R16-31 availabe for instructions in legacy maps 0/1 and EVEX
space, except XSAVE*/XRSTOR

RFC:
https://discourse.llvm.org/t/rfc-design-for-apx-feature-egpr-and-ndd-support/73031/4

Explanations for some seemingly unrelated changes:

inline-asm-registers.mir, statepoint-invoke-ra-enter-at-end.mir:
The immediate (TargetInstrInfo.cpp:1612) used for the regdef/reguse is
the encoding for the register
  class in the enum generated by tablegen. This encoding will change
  any time a new register class is added. Since the number is part
  of the input, this means it can become stale.

seh-directive-errors.s:
   R16-R31 makes ".seh_pushreg 17" legal

musttail-varargs.ll:
It seems some LLVM passes use the number of registers rather the number
of allocatable registers as heuristic.
1. Use `Ext.PrimaryVT` in `PatSetCC_m `
2. Merge `PatFprFprDynFrm` from Zfh/Zhinx two locations into
`PatFprFprDynFrm_m`.
…vm#68578)

Fixes llvm#68481, In the following scenario, the conversion fails:
1. resultType of tosa.slice is UnrankedTensorType
2. tosa.slice.getsize().size() < resultType.getRank()
This restores the pre-b9383a86b8f behavior. Most platforms / compilers don't
add relocations for CIEs, however they're not prohibited and we want objects
that contain them to remain loadable.
llvm#68320)

This function has several overloads that allow to specify the symbol
that should be renamed and the scope for that renaming in different
ways. The overloads were inconsistent in the following way (quoted
strings are `StringAttr`s, other variables are `Operation *`):

* `replaceAllSymbolUses(symbolOp, "new_symbol", scopeOp)` would traverse
into the nested regions of `scopeOp` and hence rename the symbol inside
of `scopeOp`.
* `replaceAllSymbolUses("symbol", "new_symbol", scopeOp)` would *not*
traverse into the nested regions of `scopeOp` and hence *not* rename the
symbol.

The underlying behavior was spread over different places and is somewhat
hard to understand. The two overloads above mainly differed by what
`collectSymbolScopes` computed, which is itself overloaded. If `scopeOp`
is a top-level module, then the overload on `(Operation *, Operation
*)`, which is used in the first of the above cases, computes a scope
where the body region of the module is the `limit`; however, the
overload on `(StringAttr, Operation *)` computed the module op itself as
the `limit`. Later, `walkSymbolTable` would walk the body of the module
if it was given as a region but it would *not* enter the regions of the
module op because that op has a symbol table (which was assumed to be a
*different* scope).

The fix in this commit is change the behavior of `collectSymbolScopes`
such that the `(StringAttr, Operation *)` overload returns a scope for
each region in the `limit` argument.
The -fsanitize=alignment implementation follows the model that we allow
forming unaligned pointers but disallow accessing unaligned pointers.
See [RFC: Enforcing pointer type alignment in Clang](https://lists.llvm.org/pipermail/llvm-dev/2016-January/094012.html)
for detail.

memcpy is a memory access and we require an `int *` argument to be aligned.
Similar to https://reviews.llvm.org/D9673 , emit -fsanitize=alignment check for
arguments of builtin memcpy and memmove functions to catch misaligned load like:
```
// Check the alignment of a but ignore the alignment of b
void unaligned_load(int *a, void *b) { memcpy(a, b, sizeof(*a)); }
```

For a reference parameter, we emit a -fsanitize=alignment check as well, which
can be optimized out by InstCombinePass. We rely on the call site
`TCK_ReferenceBinding` check instead.
```
// The alignment check of a will be optimized out.
void unaligned_load(int &a, void *b) { memcpy(&a, b, sizeof(a)); }
```

The diagnostic message looks like
```
runtime error: store to misaligned address [[PTR:0x[0-9a-f]*]] for type 'int *'
```

We could use a better message for memcpy, but we don't do it for now as it would
require a new check name like misaligned-pointer-use, which is probably not
necessary. *RFC: Enforcing pointer type alignment in Clang* is not well documented,
but this patch does not intend to change the that.

Technically builtin memset functions can be checked for -fsanitize=alignment as
well, but it does not seem too useful.
1. The generated file contained a lot of duplicate switch cases, e.g.:
```
switch (Syntax) {
case AttributeCommonInfo::Syntax::AS_GNU:
  return llvm::StringSwitch<int>(Name)
...
    .Case("error", 1)
    .Case("warning", 1)
    .Case("error", 1)
    .Case("warning", 1)
```

2. Some attributes were listed in wrong places, e.g.:
```
case AttributeCommonInfo::Syntax::AS_CXX11: {
if (ScopeName == "") {
  return llvm::StringSwitch<int>(Name)
...
    .Case("warn_unused_result", LangOpts.CPlusPlus11 ? 201907 : 0)
```

`warn_unused_result` is a non-standard attribute and should not be
available as [[warn_unused_result]].

3. Some attributes had the wrong version, e.g.:
```
case AttributeCommonInfo::Syntax::AS_CXX11: {
} else if (ScopeName == "gnu") {
  return llvm::StringSwitch<int>(Name)
...
    .Case("fallthrough", LangOpts.CPlusPlus11 ? 201603 : 0)
```

[[gnu::fallthrough]] is a non-standard spelling and should not have the
standard version. Instead, __has_cpp_attribute should return 1 for it.

There is another issue with attributes that share spellings, e.g.:
```
    .Case("interrupt", true && (T.getArch() == llvm::Triple::arm || ...) ? 1 : 0)
    .Case("interrupt", true && (T.getArch() == llvm::Triple::avr) ? 1 : 0)
...
    .Case("interrupt", true && (T.getArch() == llvm::Triple::riscv32 || ...) ? 1 : 0)
```
As can be seen, __has_attribute(interrupt) would only return true for
ARM targets. This patch does not address this issue.

Differential Revision: https://reviews.llvm.org/D159393
…tof` (llvm#65246)

Fixes llvm#64619

Clang warns diagnostic for non-standard layout types in `offsetof` only
if they are in evaluated context. With this patch, you'll also get
diagnostic if you use `offsetof` on non-standard layout types in any
other contexts
… on downstream projects that may define additional opcodes.
I would put this into the implementation of verify for tosa.slice
…68571)

On MinGW targets, the .ctors section is always used for constructors.

When using the .ctors section, the constructors need to be emitted in
reverse order to get them execute in the right order. (Constructors with
a specific priority are sorted separately by the linker later.) In LLVM,
in CodeGen/AsmPrinter/AsmPrinter.cpp, there's code that reverses them
before writing them out, executed when using the .ctors section. This
logic is done whenever TM.Options.UseInitArray is set to false. Thus,
make sure to set UseInitArray to false for this target.

This fixes llvm#55938.
…vm#68287)

A number of useful constants can be encoded with a 64-bit ORR followed
by a 64-bit EOR, including all remaining repeated byte patterns, some
useful repeated 16-bit patterns, and some irregular masks. This patch
prioritizes that encoding over three or four instruction encodings.
Encoding with MOV + MOVK or ORR + MOVK is still preferred for fast
literal generation and readability respectively.

The method devises three candidate values, and checks if both Candidate
and (Imm ^ Candidate) are valid logical immediates. If so, Imm is
materialized with:

```
ORR Xd, XZR, #(Imm ^ Candidate)
EOR Xd, Xd, #(Candidate)
```

The method has been exhaustively tested to ensure it can solve all
possible values (excluding 0, ~0, and plain logical immediates, which
are handled earlier).
)

This allows tooling to properly show diffs for files in the
llvm/test/tools/llvm-rc/Inputs directory.

Keep the actual icon/cursor/bitmap files marked as binary.
llvm#67791)

The primary ISA-independent justification for using PC-relative
addressing is that it makes code position-independent and therefore
allows sharing of .text pages between processes.

When not sharing .text pages, we can use absolute relocations instead,
which will possibly prevent a bubble introduced by s_getpc_b64.

Co-authored-by: Thomas Symalla <thomas.symalla@amd.com>
llvm#68646)

Fixes:llvm#68542
It‘s meaningless to diagnose further error for invalid function
declaration.
Similar to D159254, this fixes the order of WriteAdr operands on post/pre-inc
loads/stores in the Cortex-A510 scheduling model.

I will add the same for other models too, this will be the most impactful due
to it being the default cpu scheduling model.

Closes llvm#68518
When converting to ConstantRange, we should treat undef like a full range.
Fixes llvm#68381.
These will be used in future pass to ensure that loads/stores of masks
are legal (as the LLVM backend does not support this for any type
smaller
than an svbool, which is vector<[16]xi1>).

Depends on llvm#68399
…C_CSUB instructions

The BUFFER_ATOMIC_CSUB and GLOBAL_ATOMIC_CSUB instructions have encodings for
non-value-returning forms, although actually using them isn't supported by
hardware. However, these encodings aren't supported by the backend, meaning
that they can't even be assembled or disassembled.

Add support for the non-returning encodings, but gate actually using them
in instruction selection behind a new feature FeatureAtomicCSubNoRtnInsts,
which no target uses. This does allow the non-returning instructions to be
tested manually and llvm.amdgcn.atomic.csub.ll is extended to cover them.
The feature does not gate assembling or disassembling them, this is now
not an error, and encoding and decoding tests have been adapted accordingly.
stepthomas pushed a commit that referenced this pull request Oct 10, 2023
This reverts commit a1e81d2.

Revert "Fix test hip-offload-compress-zlib.hip"

This reverts commit ba01ce6.

Revert due to sanity fail at

https://lab.llvm.org/buildbot/#/builders/5/builds/37188

https://lab.llvm.org/buildbot/#/builders/238/builds/5955

/b/sanitizer-aarch64-linux-bootstrap-ubsan/build/llvm-project/clang/lib/Driver/OffloadBundler.cpp:1012:25: runtime error: load of misaligned address 0xaaaae2d90e7c for type 'const uint64_t' (aka 'const unsigned long'), which requires 8 byte alignment
0xaaaae2d90e7c: note: pointer points here
  bc 00 00 00 94 dc 29 9a  89 fb ca 2b 78 9c 8b 8f  77 f6 71 f4 73 8f f7 77  73 f3 f1 77 74 89 77 0a
              ^
    #0 0xaaaaba125f70 in clang::CompressedOffloadBundle::decompress(llvm::MemoryBuffer const&, bool) /b/sanitizer-aarch64-linux-bootstrap-ubsan/build/llvm-project/clang/lib/Driver/OffloadBundler.cpp:1012:25
    #1 0xaaaaba126150 in clang::OffloadBundler::ListBundleIDsInFile(llvm::StringRef, clang::OffloadBundlerConfig const&) /b/sanitizer-aarch64-linux-bootstrap-ubsan/build/llvm-project/clang/lib/Driver/OffloadBundler.cpp:1089:7

Will reland after fixing it.
@stepthomas stepthomas closed this Oct 10, 2023
@stepthomas stepthomas deleted the AMDGPU-stepthomas-atomic-csub-no-rtn-forms-ver2 branch October 10, 2023 09:46
@stepthomas stepthomas restored the AMDGPU-stepthomas-atomic-csub-no-rtn-forms-ver2 branch October 10, 2023 10:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment