LLVM and SPIRV-LLVM-Translator pulldown (WW18) #3616

vmaksimo · 2021-04-26T09:35:20Z

LLVM: llvm/llvm-project@fcb45b5
SPIRV-LLVM-Translator: KhronosGroup/SPIRV-LLVM-Translator@a9d3819

It failed in https://lab.llvm.org/buildbot/#/builders/68/builds/10912 and was caused by https://reviews.llvm.org/rG64f47c1e58a1

This change adds debug information about whether PGO is being used or not. Microsoft performance tooling (e.g. xperf, WPA) uses this information to show whether functions are optimized with PGO or not, as well as whether PGO information is invalid. This information is useful for validating whether training scenarios are providing good coverage of real world scenarios, showing if profile data is out of date, etc. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D99994

The new layout more closely matches the layout used by other compilers. This is only used when LLVM_ENABLE_PER_TARGET_RUNTIME_DIR is enabled. Differential Revision: https://reviews.llvm.org/D100869

Found by https://lab.llvm.org/buildbot/#/builders/96/builds/6936

…table if -fasynchronous-unwind-tables

On ELF targets, if a function has uwtable or personality, or does not have nounwind (`needsUnwindTableEntry`), it marks that `.eh_frame` is needed in the module. Then, a function gets `.eh_frame` if `needsUnwindTableEntry` or `-g[123]` is specified. (i.e. If -g[123], every function gets `.eh_frame`. This behavior is strange but that is the status quo on GCC and Clang.)

Let's take asan as an example. Other sanitizers are similar. `asan.module_[cd]tor` has no attribute. `needsUnwindTableEntry` returns true, so every function gets `.eh_frame` if `-g[123]` is specified. This is the root cause that `-fno-exceptions -fno-asynchronous-unwind-tables -g` produces .debug_frame while `-fno-exceptions -fno-asynchronous-unwind-tables -g -fsanitize=address` produces .eh_frame.

This patch

* sets the nounwind attribute on sanitizer module ctor/dtor.
* lets Clang emit a module flag metadata "uwtable" for -fasynchronous-unwind-tables. If "uwtable" is set, sanitizer module ctor/dtor additionally get the uwtable attribute.

The "uwtable" mechanism is generic: synthesized functions not cloned/specialized from existing ones should consider `Function::createWithDefaultAttr` instead of `Function::Create` if they want to get some default attributes which have more of module semantics. Other candidates: "frame-pointer" (ClangBuiltLinux/linux#955 ClangBuiltLinux/linux#1238), dso_local, etc.

Differential Revision: https://reviews.llvm.org/D100251

Example:

```
%0 = linalg.init_tensor : tensor<...>
%1 = linalg.generic ... outs(%0: tensor<...>)
%2 = linalg.generic ... outs(%0: tensor<...>)
```

Memref allocated as a result of `init_tensor` bufferization can be incorrectly overwritten by the second linalg.generic operation.

Reviewed By: silvas
Differential Revision: https://reviews.llvm.org/D100921

__hwasan_tag_memory expects untagged pointers, so make sure our pointer is untagged.
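A minimal sketch of what untagging means here, assuming a 64-bit address whose top byte (bits 56..63) carries the tag, as HWASan does on AArch64. `withTag`, `untag`, and the constants are hypothetical names chosen for illustration, not the sanitizer runtime's API.

```cpp
#include <cstdint>

// Assumption: the tag lives in the top byte of a 64-bit address.
constexpr unsigned kTagShift = 56;
constexpr std::uint64_t kTagMask = 0xFFull << kTagShift;

// Place `tag` in the top byte, replacing whatever tag was there before.
constexpr std::uint64_t withTag(std::uint64_t addr, std::uint8_t tag) {
  return (addr & ~kTagMask) | (static_cast<std::uint64_t>(tag) << kTagShift);
}

// Clear the top byte so only the plain address bits remain.
constexpr std::uint64_t untag(std::uint64_t addr) {
  return addr & ~kTagMask;
}
```

`untag(withTag(addr, tag))` gives back `addr` for any address that fits in the low 56 bits, which is the untagged form the message says `__hwasan_tag_memory` expects.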

…real samples.

Report dangling probes for frames that have real samples collected. Dangling probes are the probes associated with an empty block. When reported, the sample count on a dangling probe will not be trusted by the compiler, and we will rely on the counts inference algorithm to give the probe a reasonable count. This fixes a bug where previously only those dangling probes with samples collected were reported.

This patch also fixes two existing issues. Pseudo probes are stored in `Address2ProbesMap` and their pointers are used in `PseudoProbeInlineTree`. Previously `std::vector` was used to store probes, and the pointers to probes could be invalidated as the vector grew. I'm changing `std::vector` to `std::list` instead. The other issue is that all outlined functions previously shared the same inline frame because the `Index` value used as the dummy inlineSite identifier was left unchanged.

Good results seen for SPEC2017 in general regarding profile quality.

Reviewed By: wenlei, wlei
Differential Revision: https://reviews.llvm.org/D100235
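The container change can be illustrated with a small standalone sketch. `Probe` and the helper functions below are hypothetical stand-ins, not the profile generator's types. The sketch only demonstrates the standard-library guarantee involved: growing a `std::list` never moves existing elements, while growing a `std::vector` past its capacity relocates them.

```cpp
#include <cstddef>
#include <cstdint>
#include <list>
#include <vector>

struct Probe { int Index; };  // hypothetical stand-in for a pseudo probe

// Record an address as an integer so a stale pointer is never used directly.
std::uintptr_t addressAsInt(const Probe &P) {
  return reinterpret_cast<std::uintptr_t>(&P);
}

// std::list: appending elements leaves the first element where it was.
bool listElementStaysPut(std::size_t Extra) {
  std::list<Probe> Probes;
  Probes.push_back(Probe{0});
  const std::uintptr_t Before = addressAsInt(Probes.front());
  for (std::size_t I = 1; I <= Extra; ++I)
    Probes.push_back(Probe{static_cast<int>(I)});
  return Before == addressAsInt(Probes.front());
}

// std::vector: once the capacity changes, the elements live in new storage.
bool vectorElementMovesOnReallocation() {
  std::vector<Probe> Probes;
  Probes.push_back(Probe{0});
  const std::uintptr_t Before = addressAsInt(Probes.front());
  const std::size_t OldCapacity = Probes.capacity();
  while (Probes.capacity() == OldCapacity)
    Probes.push_back(Probe{1});
  return Before != addressAsInt(Probes.front());
}
```

Any structure that keeps raw pointers into the probe storage, as the message says `PseudoProbeInlineTree` does, is therefore safe with the list but not with a growing vector.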

Each of the cases marked as legal here have an imported pattern in AArch64GenGlobalISel.inc. So, if we mark them as legal, we get selection for free. Technically this is only supposed to happen if we have NEON support. But, we fall back if we don't have that in the legalizer right now. I suppose it'd be better to have a FIXME so we can write the testcase when the time comes. (Plus, it'd just fall back in selection if NEON isn't available, so it's not *wrong*, I guess?) This fixes some fallbacks in the test suite. (Also use `isScalar` from LegalityPredicates.cpp while we're here just to tidy things a little bit.) Differential Revision: https://reviews.llvm.org/D100916

…ations These should always go to a FPR, since they always use the vector registers. Differential Revision: https://reviews.llvm.org/D100885

They are unused now. Note: NaCl is still used and is currently expected to be needed until 2022-06 (https://blog.chromium.org/2020/08/changes-to-chrome-app-support-timeline.html). Differential Revision: https://reviews.llvm.org/D100981

This revision simplifies Clang codegen for parallel regions in OpenMP GPU target offloading and corresponding changes in libomptarget: SPMD/non-SPMD parallel calls are unified under a single `kmpc_parallel_51` runtime entry point for parallel regions (which will be commonized between target, host-side parallel regions), data sharing is internalized to the runtime. Tests have been auto-generated using `update_cc_test_checks.py`. Also, the revision contains changes to OpenMPOpt for remark creation on target offloading regions. Reviewed By: jdoerfert, Meinersbur Differential Revision: https://reviews.llvm.org/D95976

Added the float lowerings for avg pool with corresponding tests. Differential Revision: https://reviews.llvm.org/D100793

Ever since Dave Zarzycki's patch to sort test start times based on prior test timing data (https://reviews.llvm.org/D98179) the test suite aborts with a SIGHUP. I don't believe his patch is to blame, but rather that it uncovers a preexisting issue by making test runs more deterministic. I was able to narrow down the issue to TestSimulatorPlatform.py. The issue also manifests itself on the standalone bot on GreenDragon [1]. This patch disables the test until we can figure this out. [1] http://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake-standalone/ rdar://76995109

Don't phrase the semantics in terms of the optimizer. Instead have a more straightforward execution based semantic. Reviewed By: ebrevnov Differential Revision: https://reviews.llvm.org/D63439

…one coro.destroy

Summary: The original logic appears to collect a CoroBegin if any one of the terminators is dominated by any one of the coro.destroy calls, which doesn't make sense. This patch rewrites the logic to collect a CoroBegin only if all of the terminators are dominated by a single coro.destroy. If there is no such coro.destroy, we call hasEscapePath to decide whether to collect it.

Test Plan: check-llvm
Reviewed by: lxfind
Differential Revision: https://reviews.llvm.org/D100614
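The difference between the two conditions can be sketched as two predicates. This is an illustration only: instructions are modeled as plain integers, `Dominates` is a hypothetical callback standing in for a dominator-tree query, and the `hasEscapePath` fallback is left out.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Hypothetical dominance query: Dom(D, T) is true when D dominates T.
using Dominates = std::function<bool(int, int)>;

// Old condition as described: some terminator is dominated by some destroy.
bool anyTerminatorDominated(const std::vector<int> &Destroys,
                            const std::vector<int> &Terminators,
                            const Dominates &Dom) {
  return std::any_of(Destroys.begin(), Destroys.end(), [&](int D) {
    return std::any_of(Terminators.begin(), Terminators.end(),
                       [&](int T) { return Dom(D, T); });
  });
}

// New condition as described: one destroy dominates every terminator.
bool oneDestroyDominatesAll(const std::vector<int> &Destroys,
                            const std::vector<int> &Terminators,
                            const Dominates &Dom) {
  return std::any_of(Destroys.begin(), Destroys.end(), [&](int D) {
    return std::all_of(Terminators.begin(), Terminators.end(),
                       [&](int T) { return Dom(D, T); });
  });
}
```

With a destroy that dominates only one of two terminators, the first predicate holds and the second does not, which is the case the rewritten logic is meant to reject.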

…n extended from Lo. This recognizes the case when Hi is (sra Lo, 31). We can use SPLAT_VECTOR_I64 rather than splatting the high bits and combining them in the vector register.
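The condition being recognized can be written out in scalar form. The sketch below is plain C++ rather than SelectionDAG code, and the function names are hypothetical. It checks that Hi holds nothing but copies of Lo's sign bit, which is what `(sra Lo, 31)` produces, and shows that in this case the 64-bit value is just Lo sign-extended.

```cpp
#include <cstdint>

// True when Hi equals (sra Lo, 31): all ones if Lo is negative, zero otherwise.
constexpr bool hiIsSignExtensionOfLo(std::int32_t Lo, std::int32_t Hi) {
  return Hi == (Lo < 0 ? -1 : 0);
}

// The 64-bit pattern formed by the two 32-bit halves.
constexpr std::uint64_t combineHalves(std::int32_t Lo, std::int32_t Hi) {
  return (static_cast<std::uint64_t>(static_cast<std::uint32_t>(Hi)) << 32) |
         static_cast<std::uint32_t>(Lo);
}

// The 64-bit pattern obtained by sign-extending Lo alone.
constexpr std::uint64_t signExtendLo(std::int32_t Lo) {
  return static_cast<std::uint64_t>(static_cast<std::int64_t>(Lo));
}
```

Whenever `hiIsSignExtensionOfLo(Lo, Hi)` holds, `combineHalves(Lo, Hi)` equals `signExtendLo(Lo)`, so Lo alone determines the splatted value and the high half does not need to be splatted and combined separately.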

This helps expose more fusion opportunities. Differential Revision: https://reviews.llvm.org/D100685

Differential Revision: https://reviews.llvm.org/D90854

This patch implements -gstrict-dwarf option in clang FE. Reviewed By: dblaikie, probinson, aprantl Differential Revision: https://reviews.llvm.org/D100809

Since we already have a tagged pointer available to us, we can just extract the tag from it and avoid an LDG instruction. Differential Revision: https://reviews.llvm.org/D101014
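A rough model of the idea, assuming an MTE-style layout with a 4-bit tag in bits 56..59 of the pointer and one tag per 16-byte granule. `FakeTagMemory` is a hypothetical stand-in for the tag storage an LDG instruction would read, and none of the names below come from scudo.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr unsigned kTagShift = 56;       // assumed tag position in the pointer
constexpr std::uint64_t kTagBits = 0xF;  // assumed 4-bit tag
constexpr std::uint64_t kGranule = 16;   // assumed bytes covered by one tag

// Toy tag storage for a tiny address space; load() models the memory access.
struct FakeTagMemory {
  std::array<std::uint8_t, 64> Tags{};
  std::size_t slot(std::uint64_t Addr) const {
    return static_cast<std::size_t>((Addr / kGranule) % Tags.size());
  }
  void store(std::uint64_t Addr, std::uint8_t Tag) {
    Tags[slot(Addr)] = static_cast<std::uint8_t>(Tag & kTagBits);
  }
  std::uint8_t load(std::uint64_t Addr) const { return Tags[slot(Addr)]; }
};

// Address bits of a tagged pointer.
constexpr std::uint64_t addressBits(std::uint64_t Ptr) {
  return Ptr & ~(kTagBits << kTagShift);
}

// Tag read straight from the pointer bits, with no memory access.
constexpr std::uint8_t tagFromPointer(std::uint64_t Ptr) {
  return static_cast<std::uint8_t>((Ptr >> kTagShift) & kTagBits);
}
```

As long as the pointer was tagged to match its memory, `tagFromPointer(Ptr)` and `Memory.load(addressBits(Ptr))` agree, so the shift-and-mask can replace the load.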

The #pragma clang section can be used at a coarse granularity to specify the section used for bss/data/text/rodata for global objects. When split functions is enabled, the function may be split into two parts violating user expectations. Reference: https://clang.llvm.org/docs/LanguageExtensions.html#specifying-section-names-for-global-objects-pragma-clang-section Differential Revision: https://reviews.llvm.org/D101004

This patch adds a new clang tool named amdgpu-arch, which uses HSA to detect the installed AMDGPU and report its march. The tool is built only if the system has HSA installed. The value printed by amdgpu-arch is used to fill -march when it is not explicitly provided in -Xopenmp-target. Reviewed By: JonChesterfield, gregrodgers Differential Revision: https://reviews.llvm.org/D99949

This patch allows PRE of the following type of loads:

```
preheader:
  br label %loop

loop:
  br i1 ..., label %merge, label %clobber

clobber:
  call foo() // Clobbers %p
  br label %merge

merge:
  ...
  br i1 ..., label %loop, label %exit
```

Into

```
preheader:
  %x0 = load %p
  br label %loop

loop:
  %x.pre = phi(x0, x2)
  br i1 ..., label %merge, label %clobber

clobber:
  call foo() // Clobbers %p
  %x1 = load %p
  br label %merge

merge:
  x2 = phi(x.pre, x1)
  ...
  br i1 ..., label %loop, label %exit
```

So instead of loading from %p on every iteration, we load only when the actual clobber happens. The typical pattern this is trying to address is a hot loop, with all code inlined and provably having no side effects, and some side-effecting calls on a cold path.

The worst overhead is that, if we always take the clobber block, we make 1 more load overall (in the preheader). This only matters if the loop has very few iterations. If the clobber block is skipped on at least one iteration, the transform is neutral or profitable.

This opens up several prospects for improvement:

- We can sometimes be smarter in loop-exiting blocks via split of critical edges;
- If we have block frequency info, we can handle multiple clobbers. The only obstacle now is that we don't know if their sum is colder than the header.

Differential Revision: https://reviews.llvm.org/D99926
Reviewed By: reames
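The load-count argument can be checked with a small source-level simulation. This is a sketch under assumptions, not the GVN implementation: `Counts`, `runBefore`, and `runAfter` are hypothetical names, incrementing `*P` stands in for the clobbering `call foo()`, and the loop body simply sums the loaded value.

```cpp
#include <cstddef>
#include <vector>

struct Counts {
  std::size_t Loads = 0;  // number of loads from *P
  long Sum = 0;           // observable result of the loop
};

// Before: the value is loaded from *P on every iteration.
Counts runBefore(int *P, const std::vector<bool> &ClobberTaken) {
  Counts C;
  for (bool Clobber : ClobberTaken) {
    if (Clobber)
      ++*P;        // stands in for `call foo()`, which clobbers %p
    int X = *P;    // load on every iteration
    ++C.Loads;
    C.Sum += X;
  }
  return C;
}

// After: one load in the preheader, plus a reload only in the clobber block.
Counts runAfter(int *P, const std::vector<bool> &ClobberTaken) {
  Counts C;
  int X = *P;      // %x0 in the preheader
  ++C.Loads;
  for (bool Clobber : ClobberTaken) {
    if (Clobber) {
      ++*P;        // the clobbering call
      X = *P;      // %x1, reloaded only on this path
      ++C.Loads;
    }
    C.Sum += X;    // plays the role of the merged value x2
  }
  return C;
}
```

For N iterations with K trips through the clobber block, the first version performs N loads and the second performs K + 1, which matches the message: one extra load when the clobber block is always taken, and break-even or better as soon as it is skipped once.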

The value is always an immediate and can never be in a register. This is the kind of thing TargetConstant is for. Saves a step in GenDAGISel to convert a Constant to a TargetConstant.

We shouldn't print IR when seeing these passes.

…ument clang-tidy should not generate warnings for the goto argument without parentheses, because it would be a syntax error. The only valid case where an argument can be enclosed in parentheses is "Labels as Values" gcc extension: https://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html. This commit adds support for the label-as-values extension as implemented in clang. Fixes bugzilla issue 49634. Reviewed By: aaron.ballman Differential Revision: https://reviews.llvm.org/D99924
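The two forms of `goto` mentioned above can be seen side by side in the sketch below. It relies on the GNU "Labels as Values" extension, so it assumes GCC or Clang with extensions enabled. The function names are made up for illustration.

```cpp
// Plain goto: the operand is a label name, so it cannot be wrapped in
// parentheses; `goto (done);` would not parse.
int plainGoto(int X) {
  if (X < 0)
    goto done;
  X *= 2;
done:
  return X;
}

// Labels as Values (GNU extension): `&&label` yields a void* and `goto *expr`
// jumps to it. Here the operand is an expression, so parentheses are valid.
int computedGoto(int X) {
  void *Target = (X < 0) ? &&negative : &&nonNegative;
  goto *(Target);
negative:
  return -1;
nonNegative:
  return 1;
}
```

Only the second form gives a tool any room to suggest parentheses around the operand, which is why the plain form has to be left alone.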

The straightforward `AddLinkFlag('-lc++experimental')` approach doesn't work on e.g. MSVC. For linking to libc++ itself, a more convoluted logic is used (see configure_link_flags_cxx_library). Differential Revision: https://reviews.llvm.org/D99177

vmaksimo · 2021-04-27T13:39:59Z

Actually, there was a bunch of changes to SLP that are included in this PR, so it's hard to say which was the guilty one:

| | * 18c61fc498c7 2021-04-22 | [SLP]Skip undefs trying to find perfect/shuffled tree entries matching. [Alexey Bataev]
| | * d4f5f23bbbe5 2021-04-22 | [SLP]Replace more `TTI` with `TTIRef`, NFC. [Alexey Bataev]
| | * da2cdfd4211a 2021-04-22 | [SLP]Added explicit ref to TargetTransformInfo to try to pacify MSVC buildbots, NFC. [Alexey Bataev]
| | * e99b98cb1bca 2021-04-06 | [SLP]Improve cost model for the vectorized extractelements. [Alexey Bataev]
| | * 07c236f3c3fa 2021-04-21 | [SLP]Add a test with broadcast shuffle kind in SLP, NFC. [Alexey Bataev]
| | * | | af870e11aed7 2021-04-20 | [SLP] Add detection of shuffled/perfect matching of tree entries. [Alexey Bataev]
| | * | | b82344a01949 2021-04-20 | Revert "[SLP] Add detection of shuffled/perfect matching of tree entries." [Alexey Bataev]
| | * | | daf6e18c55c2 2021-04-20 | [SLP] Add detection of shuffled/perfect matching of tree entries. [Alexey Bataev]
| | * | | cf00cb8bed72 2021-04-20 | Revert "[SLP] Add detection of shuffled/perfect matching of tree entries." [Alexey Bataev]
| | * | | b232771acad6 2021-04-20 | [SLP] Add detection of shuffled/perfect matching of tree entries. [Alexey Bataev]
| | * | | 803048106533 2021-04-19 | Revert "[SLP]Add detection of shuffled/perfect matching of tree entries." [Alexey Bataev]
| | * | | d6fde913790d 2021-04-14 | [SLP]Add detection of shuffled/perfect matching of tree entries. [Alexey Bataev]
| | * | | | | | | 72142b909d63 2021-04-14 | [SLP]Added a tests for shuffled matched tree entries, NFC. [Alexey Bataev]
| | * | | | | | | b49c41afbaa2 2021-04-14 | [SLP] createOp - fix null dereference warning. NFCI. [Simon Pilgrim]

bader · 2021-04-27T13:46:39Z

> Actually, there was a bunch of changes to SLP that are included in this PR, so it's hard to say which was the guilty one:
>
> | | * 18c61fc498c7 2021-04-22 | [SLP]Skip undefs trying to find perfect/shuffled tree entries matching. [Alexey Bataev]
> | | * d4f5f23bbbe5 2021-04-22 | [SLP]Replace more `TTI` with `TTIRef`, NFC. [Alexey Bataev]
> | | * da2cdfd4211a 2021-04-22 | [SLP]Added explicit ref to TargetTransformInfo to try to pacify MSVC buildbots, NFC. [Alexey Bataev]
> | | * e99b98cb1bca 2021-04-06 | [SLP]Improve cost model for the vectorized extractelements. [Alexey Bataev]
> | | * 07c236f3c3fa 2021-04-21 | [SLP]Add a test with broadcast shuffle kind in SLP, NFC. [Alexey Bataev]
> | | * | | af870e11aed7 2021-04-20 | [SLP] Add detection of shuffled/perfect matching of tree entries. [Alexey Bataev]
> | | * | | b82344a01949 2021-04-20 | Revert "[SLP] Add detection of shuffled/perfect matching of tree entries." [Alexey Bataev]
> | | * | | daf6e18c55c2 2021-04-20 | [SLP] Add detection of shuffled/perfect matching of tree entries. [Alexey Bataev]
> | | * | | cf00cb8bed72 2021-04-20 | Revert "[SLP] Add detection of shuffled/perfect matching of tree entries." [Alexey Bataev]
> | | * | | b232771acad6 2021-04-20 | [SLP] Add detection of shuffled/perfect matching of tree entries. [Alexey Bataev]
> | | * | | 803048106533 2021-04-19 | Revert "[SLP]Add detection of shuffled/perfect matching of tree entries." [Alexey Bataev]
> | | * | | d6fde913790d 2021-04-14 | [SLP]Add detection of shuffled/perfect matching of tree entries. [Alexey Bataev]
> | | * | | | | | | 72142b909d63 2021-04-14 | [SLP]Added a tests for shuffled matched tree entries, NFC. [Alexey Bataev]
> | | * | | | | | | b49c41afbaa2 2021-04-14 | [SLP] createOp - fix null dereference warning. NFCI. [Simon Pilgrim]

Based on this list, @alexey-bataev might be the right person to investigate this crash.

vmaksimo · 2021-04-29T09:12:47Z

/summary:run

llvm/test/Transforms/InstCombine/trunc-extractelement-spir.ll

bader · 2021-04-29T17:01:47Z

llvm/test/Transforms/OpenMP/hide_mem_transfer_latency.ll

@@ -1,5 +1,7 @@
 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: -p --function-signature --scrub-attributes
-; RUN: opt -S -openmp-opt-cgscc -aa-pipeline=basic-aa -openmp-hide-memory-transfer-latency < %s | FileCheck %s
+; Pass is not designed for the old pass manager -- CallGraph is not updated.


IIRC, @AlexeySachkov, created an issue about enabling new PM. Can you add a link to the issue here and in llvm/test/Transforms/OpenMP/values_in_offload_arrays.ll test, please?

@bader I can't find this issue, unfortunately.
I hope these comments will help to fix test failures once we enable the new PM.

@AlexeySachkov, how do we track changes we did to llorg sources to disable NewPM?

IIRC, @AlexeySachkov, created an issue about enabling new PM.

I didn't create such an issue.

bader · 2021-04-29T17:03:10Z

llvm/test/Transforms/FunctionAttrs/2008-09-03-ReadOnly.ll

@@ -1,5 +1,7 @@
-; RUN: opt < %s -basic-aa -function-attrs -S | FileCheck %s
-; RUN: opt < %s -aa-pipeline=basic-aa -passes=function-attrs -S | FileCheck %s
+; First test run uses module pass that is not designed for the old pass manager.


vmaksimo · 2021-04-30T17:01:32Z

/summary:run


vladimirlaz · 2021-05-04T07:28:39Z

@vmaksimo, the fix for llvm-test-suite on CUDA is available under intel/llvm-test-suite#264
It will be submitted together with the current PR


walter-erquinigo and others added 30 commits April 21, 2021 15:20

Fix TestVSCode_runInTerminal

c4a83c4

It failed in https://lab.llvm.org/buildbot/#/builders/68/builds/10912 and was caused by https://reviews.llvm.org/rG64f47c1e58a1

[libcxx] Stop using use c++ subdirectory for libc++ library

f749550

The new layout more closely matches the layout used by other compilers. This is only used when LLVM_ENABLE_PER_TARGET_RUNTIME_DIR is enabled. Differential Revision: https://reviews.llvm.org/D100869

Fix VSCode/TestOptions.test

875654f

Found by https://lab.llvm.org/buildbot/#/builders/96/builds/6936

[IR] Add doc about Function::createWithDefaultAttr. NFC

ac30379

[HWASan] Untag argument to __hwasan_tag_memory.

3511022

__hwasan_tag_memory expects untagged pointers, so make sure our pointer is untagged.

[AArch64][GlobalISel] Fix regbankselect for G_FCMP with vector destin…

3011aa1

…ations These should always go to a FPR, since they always use the vector registers. Differential Revision: https://reviews.llvm.org/D100885

Delete le32/le64 targets

77ac823

They are unused now. Note: NaCl is still used and is currently expected to be needed until 2022-06 (https://blog.chromium.org/2020/08/changes-to-chrome-app-support-timeline.html). Differential Revision: https://reviews.llvm.org/D100981

AMDGPU: Fix assert when trying to fold reg_sequence of physreg copies

987e528

[mlir][tosa] Add tosa.avg_pool2d lowering

648dfdf

Added the float lowerings for avg pool with corresponding tests. Differential Revision: https://reviews.llvm.org/D100793

Wordsmith the semantics of invariant.load

b9e9e2e

Don't phrase the semantics in terms of the optimizer. Instead have a more straightforward execution based semantic. Reviewed By: ebrevnov Differential Revision: https://reviews.llvm.org/D63439

[RISCV] Teach lowerSPLAT_VECTOR_PARTS to detect cases where Hi is sig…

f6d8cf7

…n extended from Lo. This recognizes the case when Hi is (sra Lo, 31). We can use SPLAT_VECTOR_I64 rather than splatting the high bits and combining them in the vector register.

[mlir][linalg] Add pattern to push reshape after elementwise operation

d40a19c

This helps expose more fusion opportunities. Differential Revision: https://reviews.llvm.org/D100685

[RISCV] Custom lowering of FLT_ROUNDS_

6e63dfd

Differential Revision: https://reviews.llvm.org/D90854

[Debug-Info] implement -gstrict-dwarf

26f138e

This patch implements -gstrict-dwarf option in clang FE. Reviewed By: dblaikie, probinson, aprantl Differential Revision: https://reviews.llvm.org/D100809

scudo: Obtain tag from pointer instead of loading it from memory. NFCI.

e4fa0b3

Since we already have a tagged pointer available to us, we can just extract the tag from it and avoid an LDG instruction. Differential Revision: https://reviews.llvm.org/D101014

[RISCV] Use TargetConstant for condition code of RISCVISD::SELECT_CC.

58c5b4c

The value is always an immediate and can never be in a register. This is the kind of thing TargetConstant is for. Saves a step in GenDAGISel to convert a Constant to a TargetConstant.

[NewPM] Mark some more wrapper passes as ignored

1dfb52a

We shouldn't print IR when seeing these passes.

vmaksimo and others added 3 commits April 27, 2021 20:42

Disable test cases which check NewPM pass (OpenMPOpt)

89c9b6d

[SLP] Fix the cost calculation if non-compatible vectors shuffled

007575b

Change test which checks NewPM module pass (function-attrs)

d4627f2

vmaksimo force-pushed the llvmspirv_pulldown branch from 5b82bd5 to d4627f2 Compare April 29, 2021 09:11

vmaksimo marked this pull request as ready for review April 29, 2021 14:11

vmaksimo requested review from AaronBallman, AGindinson, AlexeySachkov, AlexeySotkin, bader, elizabethandrews, kbobrovs, mdtoguchi, mlychkov, premanandrao and sndmitriev as code owners April 29, 2021 14:11

bader reviewed Apr 29, 2021


llvm/test/Transforms/InstCombine/trunc-extractelement-spir.ll (outdated)

bader reviewed Apr 29, 2021


vladimirlaz added a commit to vladimirlaz/llvm-test-suite that referenced this pull request May 4, 2021

[SYCL] Remove XFAIL for passing test

7455f63

After intel/llvm#3616

This was referenced May 4, 2021

[SYCL] Remove XFAIL for passing test vladimirlaz/llvm-test-suite#2

Closed

[SYCL] Remove XFAIL for passing test intel/llvm-test-suite#264

Merged

vladimirlaz merged commit 2e65058 into intel:sycl May 4, 2021

vladimirlaz added a commit to intel/llvm-test-suite that referenced this pull request May 4, 2021

[SYCL] Remove XFAIL for passing test (#264)

2e66331

After intel/llvm#3616

aelovikov-intel pushed a commit to aelovikov-intel/llvm that referenced this pull request Mar 27, 2023

[SYCL] Remove XFAIL for passing test (intel/llvm-test-suite#264)

59c65e7

After intel#3616
