Feature/merge upstream 20210528 #59

kaz7 · 2021-07-18T23:45:34Z

Merges upstream changes up to 2021/5/28.
This requires clang as the compiler because of upstream modifications.
This will be solved later, after merging upstream fixes.

Passes regression tests.

- Currently, host CPU information is not as easily available on z/OS as it is on other platforms.
- This information is stored in the Communications Vector Table (https://www.ibm.com/docs/en/zos/2.2.0?topic=information-cvt-mapping).
Reviewed By: uweigand Differential Revision: https://reviews.llvm.org/D102793

Currently, BPF only has three relocations:
- R_BPF_NONE for no relocation
- R_BPF_64_64 for LD_imm64 and normal 64-bit data relocation
- R_BPF_64_32 for call insn and normal 32-bit data relocation

Also, the .BTF and .BTF.ext sections contain symbols in allocated program and data sections. These two sections reserve 32-bit space to hold the offset relative to the symbol's section. When LLVM JIT is used, the LLVM ExecutionEngine RuntimeDyld may attempt to resolve relocations for .BTF and .BTF.ext, which we want to prevent. So we used R_BPF_NONE for such relocations.

This all works fine until we try to link multiple objects:
- R_BPF_64_64 handling of LD_imm64 vs. normal 64-bit data is different, so lld target->relocate() needs more context to do a correct job.
- The same holds for R_BPF_64_32: more context is needed for lld target->relocate() to differentiate call insn vs. normal 32-bit data relocation.
- Since relocations in .BTF and .BTF.ext are set to R_BPF_NONE, they will not be relocated properly when multiple .BTF/.BTF.ext sections are merged by lld.

This patch addresses these issues by adding additional relocation kinds:
- R_BPF_64_ABS64 for normal 64-bit data relocation
- R_BPF_64_ABS32 for normal 32-bit data relocation
- R_BPF_64_NODYLD32 for .BTF and .BTF.ext style relocations

The old R_BPF_64_{64,32} semantics become:
- R_BPF_64_64 for LD_imm64 relocation
- R_BPF_64_32 for call insn relocation

The existing mapping of R_BPF_64_64/R_BPF_64_32 to numeric values is maintained. They are the most common use cases for BPF programs and we want to maintain backward compatibility as much as possible.

ExecutionEngine RuntimeDyld BPF relocations are adjusted as well: R_BPF_64_{ABS64,ABS32} relocations will be resolved properly and other relocations will be ignored. Two tests are added for RuntimeDyld. Not handling R_BPF_64_NODYLD32 in RuntimeDyldELF.cpp will result in a "Relocation type not implemented yet!" fatal error.

FK_SecRel_4 usages in BPFAsmBackend.cpp and BPFELFObjectWriter.cpp are removed, as they are not triggered in the BPF backend. The BPF backend uses FK_SecRel_8 for LD_imm64 instruction operands. Differential Revision: https://reviews.llvm.org/D102712
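The RuntimeDyld policy described above (resolve R_BPF_64_ABS64 and R_BPF_64_ABS32, ignore every other kind) can be sketched roughly as follows. This is a minimal illustration, not the code in RuntimeDyldELF.cpp: the `BpfReloc` enum and `applyBpfReloc` function are hypothetical, and the enumerators deliberately do not encode the real ELF relocation numbers.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical enum mirroring the relocation kinds named above. The
// enumerators are illustrative; they do not carry the real ELF numbers.
enum class BpfReloc { None, R64_64, R64_32, R64_ABS64, R64_ABS32, R64_NODYLD32 };

// Sketch of the JIT-side policy: write the resolved value for the ABS kinds
// and leave everything else untouched. Values are stored in host byte order.
// Returns true if the relocation was applied, false if it was ignored.
bool applyBpfReloc(BpfReloc Kind, uint8_t *Target, uint64_t Value,
                   int64_t Addend) {
  switch (Kind) {
  case BpfReloc::R64_ABS64: {
    uint64_t V = Value + static_cast<uint64_t>(Addend);
    std::memcpy(Target, &V, sizeof(V));
    return true;
  }
  case BpfReloc::R64_ABS32: {
    uint32_t V = static_cast<uint32_t>(Value + static_cast<uint64_t>(Addend));
    std::memcpy(Target, &V, sizeof(V));
    return true;
  }
  case BpfReloc::None:
  case BpfReloc::R64_64:
  case BpfReloc::R64_32:
  case BpfReloc::R64_NODYLD32:
    // Ignored by the JIT; NODYLD32 in particular keeps the .BTF/.BTF.ext
    // offsets relative to the symbol's section.
    return false;
  }
  return false;
}
```

The point of a dedicated R_BPF_64_NODYLD32 kind is visible in the last case group: a linker such as lld can still relocate it when merging sections, while the JIT explicitly skips it instead of hitting an unimplemented-relocation error.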

Said function had a few shortfalls:
- it didn't set an abort message on Android
- it was logged on several lines
- it didn't provide extra information, like the requested size when OOM'ing

This improves the function to address those points. Differential Revision: https://reviews.llvm.org/D103034

…to intptr_t

A test in ir.c casts a void* to an integer type to print its address. This cast is currently done with the datatype `long`, however, whose width matches the pointer width on LP64 systems but is not guaranteed to elsewhere. 64-bit Windows, for example, uses a 32-bit `long`, which does not match its 64-bit pointers. This also results in a clang warning due to `-Wvoid-pointer-to-int-cast`. Technically speaking, since the test only passes the value 42, it does not cause any issues, but it'd be nice to fix the warning at least. Differential Revision: https://reviews.llvm.org/D103085
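A pointer-to-integer conversion that stays correct on both LP64 and LLP64 targets can be sketched in C++ as below. The helper names `pointerToInt` and `printAddress` are hypothetical; the sketch assumes the platform provides the optional `std::intptr_t` type, which mainstream targets do.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>

// intptr_t can hold any valid object pointer, unlike `long`, which is only
// 32 bits wide on 64-bit Windows.
std::intptr_t pointerToInt(const void *P) {
  return reinterpret_cast<std::intptr_t>(P);
}

void printAddress(const void *P) {
  // PRIdPTR is the printf conversion that matches intptr_t.
  std::printf("%" PRIdPTR "\n", pointerToInt(P));
}
```

Because the integer is as wide as the pointer, the value round-trips back to the original pointer, which a `long` cast cannot guarantee on LLP64.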

Match what's documented in the Intel AOM: the XMM variant of PSHUFB requires BOTH ports, but this was being incorrectly modelled as EITHER port. Now that we can use in-order models in llvm-mca, the Atom model is a good "worst case scenario" analysis for x86.

We are using TOCEntry symbols like `LC..0` in TOC loads. These are hard to read: figuring out which symbol is loaded requires at least one additional step. We should print out the name in comments. Reviewed By: #powerpc, shchenz Differential Revision: https://reviews.llvm.org/D102949

Removed some of the older raw "MLIRized" versions that are no longer needed, now that the sparse runtime support library can focus on the proper sparse tensor types rather than the opaque pointer approach of the past. This avoids legacy... Reviewed By: penpornk Differential Revision: https://reviews.llvm.org/D102960

The parseInputFile function returns an empty unique_ptr to signal an error, like when the input file doesn't exist, or is malformed. In this case, the tool should exit immediately rather than segfault by dereferencing the unique_ptr later. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D102891
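The fix boils down to checking the returned `unique_ptr` before its first use. A minimal sketch of that pattern, using hypothetical names (`InputFile`, `parseInputText`, `runTool`) rather than the tool's real API, and parsing a string instead of a file to keep it self-contained:

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <memory>
#include <string>

// Hypothetical parsed-input type; not the tool's real data structure.
struct InputFile {
  std::string Contents;
};

// Hypothetical parser following the same convention as described above:
// an empty unique_ptr signals an error (here: empty input = malformed).
std::unique_ptr<InputFile> parseInputText(const std::string &Text) {
  if (Text.empty())
    return nullptr;
  auto Parsed = std::make_unique<InputFile>();
  Parsed->Contents = Text;
  return Parsed;
}

// Check for the error case before any dereference and report failure,
// instead of crashing on a null pointer later on.
int runTool(const std::string &Text) {
  std::unique_ptr<InputFile> In = parseInputText(Text);
  if (!In) {
    std::fprintf(stderr, "error: could not parse input\n");
    return EXIT_FAILURE;
  }
  std::printf("parsed %zu bytes\n", In->Contents.size());
  return EXIT_SUCCESS;
}
```

A real tool would return this status from `main` (or call `exit`) so that it terminates immediately on bad input.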

The 2nd test is based on the fuzzer example in post-commit comments of D101191 - https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=34661 The 1st test shows that we don't deal with this symmetrically. We should be able to reduce both examples (possibly in instsimplify instead of instcombine).

A recent fix for problems with ENTRY statement handling didn't get the case of a procedure dummy argument on an ENTRY statement in an executable part right; the code presumed that those dummy arguments would be objects, not entities that might be objects or procedures. Fix. Differential Revision: https://reviews.llvm.org/D103098

llvm-profgen uses a profile-summary-based cold threshold to merge and trim cold context profiles. This is to strike a good balance between profile size and performance. We've been using 99.9% as the cutoff to save profile size without affecting performance. This change switches the default cold threshold cutoff for llvm-profgen from 99.9999% to 99.9%. The redundant switch csprof-cold-thres is also removed and tests are cleaned up. Differential Revision: https://reviews.llvm.org/D103071
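To see why lowering the cutoff shrinks profiles, here is a simplified sketch of how a summary-based threshold can be derived: sort counts in descending order and take the count at which the running sum first covers the cutoff fraction of the total. This is an illustration under stated assumptions, not llvm-profgen's or ProfileSummaryBuilder's code; `coldThreshold` is a hypothetical name, and cutoffs are given in parts per million (999000 = 99.9%, 999999 = 99.9999%).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Returns the smallest count still needed for the hottest samples to cover
// CutoffPPM / 1e6 of the total; counts below it would be treated as cold.
// Assumes totals small enough that Total * CutoffPPM fits in uint64_t.
uint64_t coldThreshold(std::vector<uint64_t> Counts, uint64_t CutoffPPM) {
  std::sort(Counts.begin(), Counts.end(), std::greater<uint64_t>());
  uint64_t Total = 0;
  for (uint64_t C : Counts)
    Total += C;
  uint64_t Accumulated = 0;
  for (uint64_t C : Counts) {
    Accumulated += C;
    if (Accumulated * 1000000 >= Total * CutoffPPM)
      return C;
  }
  return 0; // Empty input: nothing is hot.
}
```

With counts {1000000, 500, 1}, a 99.9% cutoff yields a threshold of 1000000 (so the 500 and 1 entries are cold and can be merged or trimmed), while 99.9999% yields 500 (so only the entry with count 1 is cold). A lower cutoff therefore raises the threshold and classifies more contexts as cold.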

…ly infinite loops into finite ones

Nowadays LLVM does not assume that all loops are finite, so if we want to produce a finite loop from a potentially-infinite one, we must ensure that the original loop is known to be a finite one. For this transform, it only matters for arithmetic right-shifts. For them, either the function or the loop must be known to be `mustprogress`, or the original value being shifted must be known to be non-negative (because iff the sign bit was set, it will never become zero, but will become `-1` in the "end"). It would be really good for alive2 to actually complain about this, but it currently does not: AliveToolkit/alive2#726
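The sign-bit argument above can be checked directly: repeatedly applying an arithmetic right shift drives a non-negative value to zero, but a negative value gets stuck at -1. The sketch below (hypothetical helper `shiftsUntilZero`) bounds the loop so it terminates either way; it assumes `>>` on a negative `int64_t` is an arithmetic shift, which C++20 guarantees and mainstream compilers already did before that.

```cpp
#include <cassert>
#include <cstdint>

// Counts arithmetic right shifts needed to reach zero, giving up after
// MaxIters and returning -1 if zero was never reached.
int shiftsUntilZero(int64_t X, int MaxIters) {
  for (int I = 0; I < MaxIters; ++I) {
    if (X == 0)
      return I;
    X >>= 1;
  }
  return X == 0 ? MaxIters : -1;
}
```

For example, 8 reaches zero after four shifts (8, 4, 2, 1, 0), while -8 goes -4, -2, -1 and then stays at -1 forever, which is why the unbounded loop is only finite when the shifted value is known non-negative or the loop is `mustprogress`.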

Now that we can fold some transposes into multiplies (CM: A * B^t and RM: A^t * B), we want to move them around to create the optimal expressions:
* fold away double transposes while still using them to assert the shape
* sink transposes hoping they cancel out
* lift transposes when both operands are transposed

This also modifies the matrix remarks to include the number of exposed transposes (i.e. transposes that we couldn't fold into a multiply). The adjustment to the test remarks-inlining is a bit subtle: I am changing the double transpose to a single transpose so that we don't remove it completely. More importantly, this changes some of the total instruction counts, most notably stores, because we can no longer use a vector store. Differential Revision: https://reviews.llvm.org/D102733
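The three rewrites listed above rest on standard transpose identities: a double transpose cancels, the transpose of a product can be sunk onto its operands with their order reversed, and a product of two transposed operands can be lifted into a single transpose of the reversed product.

$$
(A^\top)^\top = A, \qquad (AB)^\top = B^\top A^\top, \qquad A^\top B^\top = (BA)^\top
$$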

This patch is the third in a series of patches fixing markdown links and references inside the mlir documentation. This patch addresses all broken references to other markdown files and sections inside the Tutorials folder. Differential Revision: https://reviews.llvm.org/D103017

When lowering the dynamic, guided, auto and runtime types of scheduling, there is an optional monotonic or non-monotonic modifier. This patch adds support in the OMP IR Builder to pass this down to the runtime functions. Also implements tests for the variants. Differential Revision: https://reviews.llvm.org/D102008

…(5/n)

This revision refactors and simplifies the pattern detection logic: thanks to SSA value properties, we can actually look at all the uses of a given value and avoid having to pattern-match specific chains of operations. A bufferization pattern for subtensor is added and specific inplaceability analysis is implemented for the simple case of subtensor. More advanced use cases will follow. Differential revision: https://reviews.llvm.org/D102512

WG14 adopted N2645 and WG21 EWG has accepted P2334 in principle (still subject to full EWG vote + CWG review + plenary vote), which add support for #elifdef as shorthand for #elif defined and #elifndef as shorthand for #elif !defined. This patch adds support for the new preprocessor directives.
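The new directives are pure shorthand. The sketch below is written in the long spelling so that it compiles on compilers that predate the directives; the comments show the equivalent `#elifdef`/`#elifndef` forms. The macro names `HAS_FOO`, `HAS_BAR`, `HAS_BAZ` and the functions are hypothetical.

```cpp
#include <cassert>
#include <cstring>

#define HAS_FOO 1

const char *fooState() {
#if defined(HAS_BAR)
  return "bar";
#elif defined(HAS_FOO)   // shorthand: #elifdef HAS_FOO
  return "foo";
#else
  return "none";
#endif
}

const char *bazState() {
#if defined(HAS_BAR)
  return "bar";
#elif !defined(HAS_BAZ)  // shorthand: #elifndef HAS_BAZ
  return "no-baz";
#else
  return "baz";
#endif
}
```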

For uniform ReplicateRecipes, only the first lane should be used, so sinking them would mean we have to compute the value of the first lane multiple times. Also, at the moment, sinking them causes a crash because the value of the first lane is re-used by all users. Reported post-commit for D100258.

The vector calling convention dictates that when the vector argument registers are exhausted, GPRs are used to pass the address via the stack. When the GPRs themselves are exhausted, at best we would previously crash with an assertion, and at worst we'd generate incorrect code. This patch addresses this issue by passing fixed-length vectors via the stack with their full fixed-length size and aligned to their element type size. Since the calling convention lowering can't yet handle scalable vector types, this patch adds a fatal error to make it clear that we are lacking in this regard. Reviewed By: HsiangKai Differential Revision: https://reviews.llvm.org/D102422

DAGCombine's `mergeStoresOfConstantsOrVecElts` optimization is told whether it's to use vector types and also whether it's to issue a truncating store. However, the truncating store code path assumes a scalar integer `ConstantSDNode`, and when using vector types it creates either a `BUILD_VECTOR` or `CONCAT_VECTORS` to store: neither of which is a constant. The `riscv64` target is able to expose a crash here because it switches on both code paths at the same time. The `f32` is stored as `i32`, which must be promoted to `i64`, necessitating a truncating store. It also decides later that it prefers a vector store of `v2f32`. While vector truncating stores are legal, this combine is not able to emit them. We also don't have a test case. This patch adds an assert to catch this case more gracefully, and updates one of the function's callers to turn off the use of truncating stores when preferring vectors. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D103173

The original version of this was reverted, and @rjmcall provided some advice to architect a new solution. This is that solution. This implements a builtin to provide a unique name that is stable across compilations of this TU for the purposes of implementing the library component of the unnamed kernel feature of SYCL. It does this by running the Itanium mangler with a few modifications. Because it is somewhat common to wrap non-kernel-related lambdas in macros that aren't present on the device (such as for logging), this uniquely generates an ID for all lambdas involved in the naming of a kernel. It uses the lambda-mangling number to do this, except it replaces this with its own number (starting at 10000 for readability reasons) for lambdas used to name a kernel. Additionally, this implements itself as constexpr with a slight catch: if a name would be invalidated by the use of this lambda in a later kernel invocation, it is diagnosed as an error (see the Sema tests). Differential Revision: https://reviews.llvm.org/D103112

Summary: Make the file name and descriptors static so that they are reused by print-changed=diff. This avoids errors about being unable to create temporary files when doing the later comparisons in a large compile. Author: Jamie Schmeiser <schmeise@ca.ibm.com> Reviewed By: aeubanks (Arthur Eubanks) Differential Revision: https://reviews.llvm.org/D100116
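The fix amounts to creating the temporary files once and reusing them for every later comparison. A minimal C++ sketch of that pattern follows; it assumes two files (before/after IR), and the names `createTempFile` and `diffTempFiles` are hypothetical, not the code from D100116. The creation step is simulated with a counter so the reuse is observable.

```cpp
#include <array>
#include <cassert>
#include <string>

// Counts how often the (hypothetical) creation step runs.
static int TempFilesCreated = 0;

// Hypothetical stand-in for creating a temporary file and returning its name.
std::string createTempFile(int Index) {
  ++TempFilesCreated;
  return "print-changed-diff-" + std::to_string(Index) + ".tmp";
}

// Function-local statics are initialized exactly once (thread-safely since
// C++11), so every later comparison reuses the same two files instead of
// creating new ones each time.
const std::array<std::string, 2> &diffTempFiles() {
  static const std::array<std::string, 2> Files = {createTempFile(0),
                                                   createTempFile(1)};
  return Files;
}
```

However many comparisons a large compile performs, only two files are ever created, which avoids exhausting temporary-file creation.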

kaz7 · 2021-07-19T23:04:11Z

FYI, gcc-11 works fine with this version.
For gcc-9, cherry-picking 886e291 is required.

…… (#67069) We noticed a performance issue in lldb-vscode when grabbing the name of an SBValue. Profiling shows that SBValue::GetName() can cause the synthetic children provider of shared/unique_ptr to dereference the underlying object and complete its type. This patch moves the dereference from the synthetic child provider's Update() method to GetChildAtIndex(), so that it happens lazily and SBValue::GetName() won't trigger the slow code path. Here is the culprit slow code path:

```
...
frame #59: 0x00007ff4102e0660 liblldb.so.15`SymbolFileDWARF::CompleteType(this=<unavailable>, compiler_type=0x00007ffdd9829450) at SymbolFileDWARF.cpp:1567:25 [opt]
...
frame #67: 0x00007ff40fdf9bd4 liblldb.so.15`lldb_private::ValueObject::Dereference(this=0x0000022bb5dfe980, error=0x00007ffdd9829970) at ValueObject.cpp:2672:41 [opt]
frame #68: 0x00007ff41011bb0a liblldb.so.15`(anonymous namespace)::LibStdcppSharedPtrSyntheticFrontEnd::Update(this=0x000002298fb94380) at LibStdcpp.cpp:403:40 [opt]
frame #69: 0x00007ff41011af9a liblldb.so.15`lldb_private::formatters::LibStdcppSharedPtrSyntheticFrontEndCreator(lldb_private::CXXSyntheticChildren*, std::shared_ptr<lldb_private::ValueObject>) [inlined] (anonymous namespace)::LibStdcppSharedPtrSyntheticFrontEnd::LibStdcppSharedPtrSyntheticFrontEnd(this=0x000002298fb94380, valobj_sp=<unavailable>) at LibStdcpp.cpp:371:5 [opt]
...
frame #78: 0x00007ff40fdf6e42 liblldb.so.15`lldb_private::ValueObject::CalculateSyntheticValue(this=0x000002296c66a500) at ValueObject.cpp:1836:27 [opt]
frame #79: 0x00007ff40fdf1939 liblldb.so.15`lldb_private::ValueObject::GetSyntheticValue(this=<unavailable>) at ValueObject.cpp:1867:3 [opt]
frame #80: 0x00007ff40fc89008 liblldb.so.15`ValueImpl::GetSP(this=0x0000022c71b90de0, stop_locker=0x00007ffdd9829d00, lock=0x00007ffdd9829d08, error=0x00007ffdd9829d18) at SBValue.cpp:141:46 [opt]
frame #81: 0x00007ff40fc7d82a liblldb.so.15`lldb::SBValue::GetSP(ValueLocker&) const [inlined] ValueLocker::GetLockedSP(this=0x00007ffdd9829d00, in_value=<unavailable>) at SBValue.cpp:208:21 [opt]
frame #82: 0x00007ff40fc7d817 liblldb.so.15`lldb::SBValue::GetSP(this=0x00007ffdd9829d90, locker=0x00007ffdd9829d00) const at SBValue.cpp:1047:17 [opt]
frame #83: 0x00007ff40fc7da6f liblldb.so.15`lldb::SBValue::GetName(this=0x00007ffdd9829d90) at SBValue.cpp:294:32 [opt]
...
```

Differential Revision: https://reviews.llvm.org/D159542
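The change is an instance of lazy evaluation: Update() only invalidates state, and the expensive dereference runs on first request for the pointee child, then is cached. A minimal sketch with hypothetical names (`FakeValue`, `LazySmartPtrFrontEnd`); these are not lldb's classes, and the child layout (index 0 = raw pointer, index 1 = pointee) is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical stand-in for the pointee value; Dereferences counts how often
// the expensive path (dereference + type completion) runs.
struct FakeValue {
  std::string Name;
  int Dereferences = 0;
  std::string Dereference() {
    ++Dereferences;
    return Name + "->pointee";
  }
};

class LazySmartPtrFrontEnd {
public:
  explicit LazySmartPtrFrontEnd(FakeValue &V) : Val(V) {}

  // Cheap: only drops the cached pointee; no dereference happens here.
  void Update() { Pointee.reset(); }

  // Child 0 (raw pointer) is cheap; child 1 (pointee) is computed on first
  // request and cached until the next Update().
  std::string GetChildAtIndex(std::size_t Idx) {
    if (Idx == 0)
      return Val.Name + ".ptr";
    if (!Pointee)
      Pointee = Val.Dereference();
    return *Pointee;
  }

private:
  FakeValue &Val;
  std::optional<std::string> Pointee;
};
```

Callers that only need cheap information (like a name) never pay for the dereference; callers that expand the pointee pay once.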

Sisyph and others added 30 commits May 25, 2021 10:58

[AMDGPU] Allow no-modifier operands in cvtDPP

b67ea3d

NFC, since no instructions have their AsmMatchConverter changed, but prepares for that to happen. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D103046 Change-Id: I6afefad899076de7b9a412374d09b95b29e012fa

[libc++] [test] Format some C++20 iterator_traits tests. NFCI.

148c19a

cxx20_iterator_traits.compile.pass.cpp actually depends on implementation details of libc++, which is not great; but I just left a comment and moved on.

[libc++] [test] Make iter_difference_t.pass.cpp into a .compile.pass.…

bb523cc

…cpp. NFCI.

[SystemZ] Return true from preferZeroCompareBranch().

e77cb4a

Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D103057

[VectorCombine] Remove unneeded InsertPointGuard (NFCI).

8e83ff5

All users of the builder should set an insert point before using the builder. There should be no need for using InsertPointGuard here.

[CostModel][X86] Improve accuracy of 256-bit non-uniform vector shift…

def6269

…s on AVX1

Determined from llvm-mca analysis: AVX1-capable targets have a higher throughput for VPBLENDVB and shuffle ops, making it cheaper to perform shift+shuffle/select shift patterns.

[libc++] Try to fix the oss-fuzz failure

d95a4b9

[SCEV] Remove unused parameter from computeBECount [NFC]

a47b2d4

All callers pass "false" for the Equality parameter. Kill the dead code, and update the function block comment.

[SCEV] Cleanup doesIVOverflowOnX checks [NFC]

aabca2d

Stylistic changes only. 1) Don't pass a parameter just to do an early exit. 2) Use a name which matches actual behavior.

Revert "[OpaquePtr] Make atomicrmw work with opaque pointers"

0bbb502

This reverts commit 0bebda1. Causing "Invalid record" errors.

[AMDGPU][GlobalISel] Stop foldInsertEltToCmpSelect from changing reg …

18c5444

…banks

This function can change the regbank of registers that already have a selected bank. Depending on the instructions where these registers are used, this can cause instruction selection to fail.

[mlir][linalg] Update Linalg.md (NFC).

6779fcb

Update the paragraph on generic / indexed_generic to reflect the unification of these operations. Differential Revision: https://reviews.llvm.org/D102775

[Hexagon] Improve argument packing in vector shuffle selection

e7c839b

[scudo] Consistent setting of SCUDO_DEBUG

6a84d37

Make sure that if SCUDO_DEBUG=1 in tests then we had the same in the scudo library itself. Reviewed By: cryptoad, hctim Differential Revision: https://reviews.llvm.org/D103061

[scudo] Fix CHECK implementation

8e30b55

Cast of signed types to u64 breaks comparison. Also remove double () around operands. Reviewed By: cryptoad, hctim Differential Revision: https://reviews.llvm.org/D103060
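To see why casting to u64 breaks the comparison, consider checking that -1 < 1: once -1 is converted to `uint64_t` it becomes 2^64 - 1, so the check fails. A hedged sketch contrasting the two approaches; the function names are hypothetical and this is not scudo's actual CHECK macro.

```cpp
#include <cassert>
#include <cstdint>

// Buggy pattern: widening both operands to uint64_t before comparing.
// A negative signed operand wraps around to a huge unsigned value.
template <typename A, typename B>
bool checkLtViaU64(A X, B Y) {
  return static_cast<uint64_t>(X) < static_cast<uint64_t>(Y);
}

// Sketch of a fix: compare the operands in their own (shared) type.
template <typename T>
bool checkLt(T X, T Y) {
  return X < Y;
}
```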

[mlir] Add an optional distributionTypes attribute to TiledLoopOp.

2ea6e13

Differential Revision: https://reviews.llvm.org/D103104

fhahn and others added 26 commits May 27, 2021 10:57

[Matrix] Include matrix pipeline for new PM in new-pm-defaults.ll.

9a4506e

-enable-matrix just adds a single pass, so it's easier to just check in new-pm-default.ll rather than duplicating the full checks for -O3 with the new pass manager. Suggested post-commit by @aeubanks.

[lit][test] Improve testing of use_llvm_tool

2ae5843

Reviewed by: MaskRay Differential Revision: https://reviews.llvm.org/D103154

Add triples to a bunch of x86-specific tests that currently fail on PPC

1546c52

[clang-format] [NFC] realign documentation in Format.h...

7faffde

... and ClangFormatStyleOptions.rst for EmptyLineAfterAccessModifier Differential-Revision: https://reviews.llvm.org/D102989

[ARM] Extra test for reverted WLS memset. NFC

1d5b976

[AMDGPU][Libomptarget][NFC] Remove atmi_mem_place_t

8b79dfb

This struct was used to specify the device on which memory was being allocated/freed in atmi_malloc/free. It has now been replaced with int DeviceId. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D103239

Revert "[OpenMP]Add support for workshare loop modifier in lowering"

86627be

This reverts commit ea4c5fb.

Add --quiet option to llvm-gsymutil to suppress output of warnings.

5f2d4b2

Differential Revision: https://reviews.llvm.org/D102829

Fix -Wswitch warning; NFC

ce276b7

AMDGPU/GlobalISel: Remove redundant parameter from function

8a203ac

AMDGPU/GlobalISel: Lower constant-32-bit zextload/sextload consistently

ee35900

We were accidentally relying on code in lowerLoad that expands extending loads, and that code should be removed.

Speculatively fix a -Woverloaded-virtual diagnostic; NFC

758f51c

Speculatively fix this harder and with improved spelling capabilities.

caf86d2

Correct the 'KEYALL' mask.

023fbf3

It should technically be a 1, since we are only setting the first bit.

Hopefully fix the Clang sphinx doc build.

96ef4f4

This was broken several days ago in 8269057.

Merge commit '66963bf3819df4f47bd874a946af058f0c1c4ec0' into develop

7754372

Merge commit '9091ecdae0290d8c425d48a2c86bbdd4876d6507' into feature/…

4813124

…merge-upstream-20210528

kaz7 merged commit 838573d into develop Jul 18, 2021

kaz7 deleted the feature/merge-upstream-20210528 branch July 18, 2021 23:46
