Feature/merge upstream 20210929 #96

kaz7 · 2021-10-15T10:39:03Z

Merge upstream/main to 4da744a.

The only sched models for CPUs that support AVX2 but not AVX512 are: haswell, broadwell, skylake, zen1-3.
For load we have: https://godbolt.org/z/1j3nf3dro - for intels `Block RThroughput: =2.0`; for ryzens, `Block RThroughput: <=1.0`. So pick cost of `2`.
For store we have: https://godbolt.org/z/4n1zvP37j - for intels `Block RThroughput: =1.0`; for ryzens, `Block RThroughput: <=0.5`. So pick cost of `1`.
I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D110504

The only sched models for CPUs that support AVX2 but not AVX512 are: haswell, broadwell, skylake, zen1-3.
For load we have: https://godbolt.org/z/e5YE99a4P - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: =2.0`. So pick cost of `6`.
For store we have: https://godbolt.org/z/3vM4KsE1n - for intels `Block RThroughput: =3.0`; for ryzens, `Block RThroughput: <=2.0`. So pick cost of `3`.
I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D110505

The only sched models for CPUs that support AVX2 but not AVX512 are: haswell, broadwell, skylake, zen1-3.
For load we have: https://godbolt.org/z/Y1E7qnjz8 - for intels `Block RThroughput: =9.0`; for ryzens, `Block RThroughput: <=3.5`. So pick cost of `9`.
For store we have: https://godbolt.org/z/Y1E7qnjz8 - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0`. So pick cost of `4`.
I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D110506

The only sched models for CPUs that support AVX2 but not AVX512 are: haswell, broadwell, skylake, zen1-3.
For load we have: https://godbolt.org/z/q6GbK89br - for intels `Block RThroughput: =18.0`; for ryzens, `Block RThroughput: <=7.0`. So pick cost of `18`.
For store we have: https://godbolt.org/z/Yzfoo5TnW - for intels `Block RThroughput: =8.0`; for ryzens, `Block RThroughput: <=4.0`. So pick cost of `8`.
I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D110507
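
The four cost-model messages above appear to pick each cost the same way: take the slowest (largest) `Block RThroughput` reported across the scheduling models. Below is a minimal Python sketch of that rule, using only the figures quoted above. The function name `pick_cost` and the rounding up to a whole number are assumptions made for illustration and are not part of the LLVM patches.

```python
import math


def pick_cost(intel_rthroughput, ryzen_rthroughput):
    """Return the worst-case reciprocal throughput, rounded up to an integer.

    Hypothetical helper: the commits only state the chosen numbers, and the
    "take the maximum" rule is inferred from them.
    """
    return math.ceil(max(intel_rthroughput, ryzen_rthroughput))


# (intel, ryzen upper bound) pairs as quoted in the commit messages.
QUOTED = {
    "D110504 load": (2.0, 1.0),
    "D110504 store": (1.0, 0.5),
    "D110505 load": (6.0, 2.0),
    "D110505 store": (3.0, 2.0),
    "D110506 load": (9.0, 3.5),
    "D110506 store": (4.0, 2.0),
    "D110507 load": (18.0, 7.0),
    "D110507 store": (8.0, 4.0),
}

for name, (intel, ryzen) in QUOTED.items():
    print(name, pick_cost(intel, ryzen))
```

In every quoted case the Intel figure dominates, so the chosen cost equals the Intel `Block RThroughput`.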

When rebase_exec=true in DidAttach(), all modules are loaded before the rendezvous breakpoint is set, which means the LoadInterpreterModule() method is not called and m_interpreter_module is not initialized. This causes the very first rendezvous breakpoint hit with m_initial_modules_added=false to accidentally unload the module_sp that corresponds to the dynamic loader. This bug (introduced in D92187) was causing the rendezvous mechanism to not work in Android 28. The mechanism works fine on older/newer versions of Android. Test: Verified rendezvous on Android 28 and 29 Test: Added dlopen test Reviewed By: labath Differential Revision: https://reviews.llvm.org/D109797

Let the calling pass or pattern replace the uses of the original root operation. Internally, the tileAndFuse still replaces uses and updates operands but only of newly created operations. Reviewed By: nicolasvasilache Differential Revision: https://reviews.llvm.org/D110169

…iceRTL. Use the in-project clang, llvm-link and opt if available and unless CMake cache variables specify to use a different compiler. This applies D101265 to the new DeviceRTL's CMakeLists.txt which was copied before D101265 was applied. Fixes the openmp-offloading-cuda-runtime builder which was failing since D110006. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D110251

Refactor the XML converting attribute and text getters to use LLVM API. While at it, remove some redundant error and missing XML support handling, as the called base functions do that anyway. Add tests for these methods.
Note that this patch changes the getter behavior to be IMHO more correct. In particular:
- negative and overflowing integers are now reported as failures to convert, rather than being wrapped over or capped
- digits followed by text are now reported as failures to convert to double, rather than their numeric part being converted
Differential Revision: https://reviews.llvm.org/D110410
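
The behavior change described above can be illustrated with a small Python analog of strict conversion. This is only a sketch: the helper names `to_uint64_strict` and `to_double_strict` are hypothetical, the 64-bit unsigned range is an assumption, and the real getters are C++ code built on LLVM's own conversion functions.

```python
import re
from typing import Optional

UINT64_MAX = 2**64 - 1
_DECIMAL = re.compile(r"[0-9]+")
_FLOAT = re.compile(r"[+-]?(?:[0-9]+\.?[0-9]*|\.[0-9]+)(?:[eE][+-]?[0-9]+)?")


def to_uint64_strict(text: str) -> Optional[int]:
    """Convert text to an unsigned 64-bit integer, or return None on failure."""
    if not _DECIMAL.fullmatch(text):
        return None  # rejects "-1", "12abc", and the empty string
    value = int(text)
    # Overflow is a failure to convert; the value is not wrapped or capped.
    return value if value <= UINT64_MAX else None


def to_double_strict(text: str) -> Optional[float]:
    """Convert text to a float only if the whole string is numeric."""
    if not _FLOAT.fullmatch(text):
        return None  # rejects "1.5abc" instead of converting the "1.5" prefix
    return float(text)


print(to_uint64_strict("42"), to_uint64_strict("-1"), to_double_strict("1.5abc"))
```

Returning `None` instead of a wrapped or partially parsed value lets the caller tell a malformed attribute apart from a legitimate one.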

Keeping all the checks in one place for future simplification. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D110513

The StringConvert API is no longer used anywhere but in debugserver. Since debugserver does not use LLVM API, we cannot replace it with llvm::to_integer() and llvm::to_float() there. Let's just move the sources into debugserver. Differential Revision: https://reviews.llvm.org/D110478

…en switch cover all possible values.""" This reverts commit 8ba2adc.

… command guide
This change adds some missing details to the help text and command guide:
- Added a note to the command guide that --debug-macro also dumps .debug_macinfo.
- Added a note to the command guide that --debug-frame and --eh_frame are aliases, and in cases where both sections are present one command outputs both.
- Changed the wording in the help output for --ignore-case and --regex to more closely match the command guide.

The argument is always used with its default value, so remove the argument entirely.

Apparently I gave a ll file a .patch extension. Oops.

Function specialization was crashing on poison values and constexpr values. The problem is that these values are not added to the solver, so it crashes when a lookup is performed for these values. This fixes that by not specialising on these values. For poison that is obvious, but for constexpr this is a change in behaviour. Thus, in one way this is a bit of a stopgap, but specialising on constexpr values wasn't done very intentionally, and needs some more work and tests if we wanted to support this. As a follow up, we need to look at whether the solver should exit more gracefully and return a "don't know", or whether it should really support these constexprs. This should fix PR51600 (https://bugs.llvm.org/show_bug.cgi?id=51600). Differential Revision: https://reviews.llvm.org/D110529

…egisters Add a convenience method to add supplementary registers that takes care of adding invalidate_regs to all (potentially) overlapping registers. Differential Revision: https://reviews.llvm.org/D110023

Use MemCount instead of hard-coded value 7. Reviewed By: melver Differential Revision: https://reviews.llvm.org/D110532

We used to put the canonical spelling of flags after alias processing on that line. For clang-cl in particular, that meant that we put flags on that line that the clang-cl driver doesn't even accept, and the "Driver args:" line wasn't usable. Differential Revision: https://reviews.llvm.org/D110458

KILL instructions are sometimes present and prevented hard clauses from being formed. Fix this by ignoring all meta instructions in clauses. Differential Revision: https://reviews.llvm.org/D106042

Currently detection of races with TLS/stack initialization is broken because we imitate the write before thread initialization, so it's modelled with a wrong thread/epoch. Fix that and add a test. Reviewed By: melver Differential Revision: https://reviews.llvm.org/D110538

Depends on D110538. Reviewed By: melver Differential Revision: https://reviews.llvm.org/D110539

The trace tests crashed on darwin because of some thread initialization issues (thread initialization is somewhat different on darwin). Instead of starting real threads, create a new ThreadState in the main thread. This makes the tests more unit-testy and hopefully won't crash on darwin (there is almost no platform-specific code involved now). This will also help with future trace tests that will need more than 1 thread. Creating more than 1 real thread and dispatching test actions across multiple threads in the required deterministic order is painful. Depends on D110539. Reviewed By: melver Differential Revision: https://reviews.llvm.org/D110546

Workaround for SystemZ ABI problem: https://bugs.llvm.org/show_bug.cgi?id=51898 Reviewed By: ftynse Differential Revision: https://reviews.llvm.org/D110550

…atabase.c test case. It appears that this test assumes that the toolchain utilizes the integrated assembler by default, since the expected output in the CHECKs are compilation_database.o. However, this test fails on AIX as AIX does not utilize the integrated assembler. On AIX, the output instead is of the form /tmp/compilation_database-*.s. Thus, this patch explicitly adds the -fintegrated-as option to match the assumption that the integrated assembler is used by default. Differential Revision: https://reviews.llvm.org/D110431

Similar to: 29c09c7 Planned follow-up is to add a transform here to allow removing a common shift fold that is conflicting with D110170.


This is another step towards trying to re-apply D110170 by eliminating conflicting transforms that cause infinite loops. a47c8e4 was a previous patch in this direction.
The diffs here are mostly cosmetic, but intentional:
1. The existing code that would handle this pattern in FoldShiftByConstant() is limited to 'shl' only now. The formatting change to IsLeftShift shows that we could move several transforms into visitShl() directly for efficiency because they are not common shift transforms.
2. The tests are regenerated to show new instruction names to prove that we are getting (almost) identical logic results.
3. The one case where we differ ("trunc_sandwich_small_shift1") shows that we now use a narrow 'and' instruction. Previously, we relied on another transform to do that, but it is limited to legal types. That seems to be a legacy constraint from when IR analysis and codegen were less robust.
https://alive2.llvm.org/ce/z/JxyGA4
  declare void @llvm.assume(i1)
  define i8 @src(i32 %x, i32 %c0, i8 %c1) {
    ; The sum of the shifts must not overflow the source width.
    %z1 = zext i8 %c1 to i32
    %sum = add i32 %c0, %z1
    %ov = icmp ult i32 %sum, 32
    call void @llvm.assume(i1 %ov)
    %sh1 = lshr i32 %x, %c0
    %tr = trunc i32 %sh1 to i8
    %sh2 = lshr i8 %tr, %c1
    ret i8 %sh2
  }
  define i8 @tgt(i32 %x, i32 %c0, i8 %c1) {
    %z1 = zext i8 %c1 to i32
    %sum = add i32 %c0, %z1
    %maskc = lshr i8 -1, %c1
    %s = lshr i32 %x, %sum
    %t = trunc i32 %s to i8
    %a = and i8 %t, %maskc
    ret i8 %a
  }
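
The `@src`/`@tgt` pair from the Alive2 link can also be spot-checked with plain integer arithmetic. The following Python sketch models the two IR functions on unsigned integers and compares them over a handful of sample inputs; the sample values and the function names `src`, `tgt`, and `check` are chosen here for illustration, and this is a sanity check rather than a replacement for the Alive2 proof.

```python
MASK32 = 0xFFFFFFFF
MASK8 = 0xFF


def src(x, c0, c1):
    sh1 = (x & MASK32) >> c0   # lshr i32 %x, %c0
    tr = sh1 & MASK8           # trunc i32 %sh1 to i8
    return tr >> c1            # lshr i8 %tr, %c1


def tgt(x, c0, c1):
    total = c0 + c1            # add i32 %c0, (zext %c1)
    maskc = MASK8 >> c1        # lshr i8 -1, %c1
    s = (x & MASK32) >> total  # lshr i32 %x, %sum
    t = s & MASK8              # trunc i32 %s to i8
    return t & maskc           # and i8 %t, %maskc


def check(samples):
    """Compare src and tgt for every shift pair allowed by the assume."""
    for x in samples:
        for c1 in range(8):            # an i8 shift by 8 or more is poison
            for c0 in range(32 - c1):  # the assume: c0 + c1 < 32
                if src(x, c0, c1) != tgt(x, c0, c1):
                    return False
    return True


SAMPLES = [0, 1, 0xFF, 0x80000000, 0xFFFFFFFF, 0xDEADBEEF, 0x12345678]
print(check(SAMPLES))
```

Both functions select bits `c0 + c1` through `c0 + 7` of `x`, which is why the narrow `and` with `maskc` can stand in for the second shift.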

These functions transfer ownership to the caller. Make this clear in the type system. No behavior change.

This plugin parses Fortran files and creates a YAML report with all the OpenMP constructs and clauses seen in the file.
The following tests have been modified to be compatible for testing the plugin, hence why they are not reused from another directory:
- omp-atomic.f90
- omp-declarative-directive.f90
- omp-device-constructs.f90
The plugin outputs a single file in the same directory as the source file in the following format: `<source-file-name>.yaml`
Building the plugin: `ninja flangOmpReport`
Running the plugin: `./bin/flang-new -fc1 -load lib/flangOmpReport.so -plugin flang-omp-report -fopenmp <source_file.f90>`
Co-authored-by: Kiran Chandramohan <kiran.chandramohan@arm.com>
Co-authored-by: Stuart Ellis <stuart.ellis@arm.com>
Reviewed By: awarzynski, kiranchandramohan
Differential Revision: https://reviews.llvm.org/D109890

The ARM backend was explicitly setting global binding on the personality symbol. This was added without any comment in a7ec2dc, which introduced EHABI support (back in 2011). None of the other backends do anything equivalent, as far as I can tell. This causes problems when attempting to wrap the personality symbol. Wrapped symbols are marked as weak inside LTO to inhibit IPO (see https://reviews.llvm.org/D33621). When we wrap the personality symbol, it initially gets weak binding, and then the ARM backend attempts to change the binding to global, which causes an error in MC because of attempting to change the binding of a symbol from non-global to global (the error was added in https://reviews.llvm.org/D90108). Simply drop the ARM backend's explicit global binding setting to fix this. This matches all the other backends, and a large internal application successfully linked and ran with this change, so it shouldn't cause any problems. Test via LLD, since wrapping is required to exhibit the issue. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D110609

Quantized int type should include I32 types as it's the output of a quantized convolution or matmul operation. Reviewed By: NatashaKnk Differential Revision: https://reviews.llvm.org/D110651

To avoid using the AST when emitting diagnostics, split the "dontcall" attribute into "dontcall-warn" and "dontcall-error", and also add the frontend attribute value as the LLVM attribute value. This gives us all the information to report diagnostics we need from within the IR (aside from access to the original source). One downside is we directly use LLVM's demangler rather than using the existing Clang diagnostic pretty printing of symbols. Previous revisions didn't properly declare the new dependencies. Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D110364

We weren't retaining the ctypes closures that the ExecutionEngine was calling back into, leading to mysterious errors. Open to feedback about how to test this. And an extra pair of eyes to make sure I caught all the places that need to be aware of this. Differential Revision: https://reviews.llvm.org/D110661

Differential Revision: https://reviews.llvm.org/D110644

Bug 51926 identified an issue where a dangling comma caused the cell count to be off by one. Differential Revision: https://reviews.llvm.org/D110481

…ad of string Differential Revision: https://reviews.llvm.org/D110635

Also, this adds unit tests to check that limits.h complies with the C standard. Reviewed By: sivachandra Differential Revision: https://reviews.llvm.org/D110643

Similar to what SDAG does when it sees a smulo/umulo against 2 (see: `DAGCombiner::visitMULO`) This pattern is fairly common in Swift code AFAICT. Here's an example extracted from a Swift testcase: https://godbolt.org/z/6cT8Mesx7 Differential Revision: https://reviews.llvm.org/D110662
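
In SelectionDAG, the fold referred to above rewrites a multiply-with-overflow by 2 as an add-with-overflow of the value with itself. Assuming this patch performs the same rewrite, the Python sketch below checks exhaustively on 8-bit values that both forms give the same result and the same overflow flag. The 8-bit width and all helper names are chosen here for illustration.

```python
BITS = 8
MASK = (1 << BITS) - 1


def wrap_signed(value):
    """Wrap an integer to a signed BITS-wide two's complement value."""
    value &= MASK
    return value - (1 << BITS) if value >= 1 << (BITS - 1) else value


def smulo(a, b):
    exact = a * b
    return wrap_signed(exact), exact != wrap_signed(exact)


def saddo(a, b):
    exact = a + b
    return wrap_signed(exact), exact != wrap_signed(exact)


def umulo(a, b):
    exact = a * b
    return exact & MASK, exact > MASK


def uaddo(a, b):
    exact = a + b
    return exact & MASK, exact > MASK


def mul_by_two_matches_add():
    """True if mulo(x, 2) and addo(x, x) agree for every 8-bit x."""
    signed_ok = all(smulo(x, 2) == saddo(x, x) for x in range(-128, 128))
    unsigned_ok = all(umulo(x, 2) == uaddo(x, x) for x in range(256))
    return signed_ok and unsigned_ok


print(mul_by_two_matches_add())
```

The equivalence holds because `x * 2` and `x + x` have the same exact mathematical value, so they wrap and overflow identically.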

Tests fail on Windows otherwise.

In looking at the disk space used by a ninja check-all, I found that a few of the largest files were copies of clang and lld made into temp directories by a couple of tests. These tests were added in D53021 and D74811. Clean up these copies after usage. Differential Revision: https://reviews.llvm.org/D110276

When we have code with truncates, those truncates may be changed into G_ANDs with constants. These may, in turn, feed into other G_AND instructions. Running this combine post-legalize allows us to optimize examples like this one: https://godbolt.org/z/zrGY4dfEW SDAG currently optimizes the example above so that there is only one `and`. GISel doesn't optimize it, because the G_AND we'd optimize here is translated as a G_TRUNC. Later, that G_TRUNC is turned into a G_AND during legalization. Differential Revision: https://reviews.llvm.org/D110667

…ions. CompactUnwindSplitter splits compact-unwind sections on record boundaries and adds keep-alive edges from target functions back to their respective records. In MachO_arm64.cpp, a CompactUnwindSplitter pass is added to the pre-prune pass list when setting up the standard pipeline. This patch does not provide runtime support for compact-unwind, but is a first step towards enabling it.

We generate symbols like `profc`/`profd` for each function, and put them into csects. When there are weak functions, we generate weak symbols for the functions as well. With ELF (and some others), the linker (binder) will discard the duplicates and keep only one copy of the weak symbols. However, on AIX, the current binder can NOT discard the weak symbols if we put all of them into the same csect, as the binder can NOT discard a subset of a csect. This creates a unique challenge for using those symbols to calculate some relative offsets. This patch changes the linkage of `profc`/`profd` symbols to private, so that all the profc/profd for each weak symbol will be *local* to objects and all kept in the csect, so we won't have a problem. Although only one of the counters will be used, all the pointers in the profd are correct. The downside is that we won't be able to discard the duplicated counters and profile data, but those cannot be discarded even if we keep the weak linkage, due to the binder limitation of not discarding a subset of the csect either. Reviewed By: Whitney, MaskRay Differential Revision: https://reviews.llvm.org/D110422

Comment says:
  // If the operand is larger than the shift count type but the shift
  // count type has enough bits to represent any shift value ...
It clearly talks about the shifted operand, not the shift-amount operand, but the comparison is performed against Log2_32_Ceil(Op2.getValueSizeInBits()) where Op2 is the shift amount operand. This comparison also doesn't make sense in the context of the previous one (ShiftsSize > Op2Size) because Op2Size == Op2.getValueSizeInBits(). Fix to use Op1.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D110509
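
To see why the width of the shifted operand is the relevant quantity, note that a W-bit value only has valid shift amounts from 0 to W-1, which need ceil(log2(W)) bits. The Python sketch below paraphrases that check. `log2_32_ceil` is a Python analog of LLVM's `Log2_32_Ceil` for inputs of at least 1, and `shift_type_can_hold_any_amount` is a hypothetical name for the condition described in the quoted comment, not the actual code in the patch.

```python
def log2_32_ceil(value: int) -> int:
    """Ceiling of log2(value) for value >= 1 (analog of llvm::Log2_32_Ceil)."""
    return (value - 1).bit_length()


def shift_type_can_hold_any_amount(shift_type_bits: int,
                                   shifted_operand_bits: int) -> bool:
    """True if the shift count type can represent every valid shift amount.

    Valid shift amounts for a W-bit operand are 0..W-1, so the bound depends
    on the shifted operand's width, not on the shift-amount operand's width.
    """
    return shift_type_bits >= log2_32_ceil(shifted_operand_bits)


for width in (8, 32, 64):
    print(width, log2_32_ceil(width))
```

For example, shifting a 64-bit value needs a 6-bit shift amount, so an 8-bit shift count type is wide enough regardless of how wide the original shift-amount operand was.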

…eration"" This reverts commit 73a196a. Causes crashes as reported in https://reviews.llvm.org/D109963

ASan device library functions (those that start with the prefix __asan_) currently undergo undesired optimizations due to internalization. To avoid such optimizations on ASan device library functions, do not internalize them in the first place. Reviewed By: yaxunl Differential Revision: https://reviews.llvm.org/D110468

On AIX, we relied on LTO to merge the csects for profiling data/counter sections. The AIX binder now has namedcsect support for the merging, so we can enable PGO without LTO with the new binder. Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D110671

…tions. Follow-up to fc734da to enable compact-unwind splitting on x86-64.

check-orc-rt had no cmake target dependency on orc or llvm-jitlink, which could lead to regression test failures in compiler-rt. This patch should fix the issue. Patch by Jack Andersen (jackoalan@gmail.com). Thanks Jack! Reviewed By: lhames Differential Revision: https://reviews.llvm.org/D110659

Though this is a full port of the example, it is not yet fully functional due to a threading issue in the SimpleRemoteEPC implementation. The issue was discussed in D110530, but it needs a more thorough solution. For now we are dropping the dependency to the old `OrcRPC` here (it's been the last use-case in-tree). The test for the example is under review in ... and will be re-enabled once the threading issue is solved.

Rename `lenParams` to `typeparams` to be in sync with fir-dev. This patch is part of the upstreaming effort from fir-dev branch. Co-authored-by: Jean Perier <jperier@nvidia.com> Co-authored-by: Valentin Clement <clementval@gmail.com> Reviewed By: jeanPerier Differential Revision: https://reviews.llvm.org/D110645

With -fpreserve-vec3-type enabled, a cast was not created when converting from a non-vec3 type to a vec3 type, even though a conversion to vec3 was performed. This resulted in creation of invalid store instructions. Differential Revision: https://reviews.llvm.org/D108470

…merge-upstream-20210929

Should fix test failure on Arm/AArch64 quick bots which only build those targets. https://lab.llvm.org/buildbot/#/builders/171/builds/4077

kaz7 · 2021-10-15T11:23:09Z

Pass internal regression tests.

LebedevRI and others added 30 commits September 27, 2021 14:15

[AMDGPU][OpenMP] Add memory pool size check to isValidMemoryPool

b1695c2

Keeping all the checks in one place for future simplification. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D110513

[gn build] Port 9da2fa2

e2eb651

Revert "Recommit "Revert "[CVP] processSwitch: Remove default case wh…

3a998c0

…en switch cover all possible values.""" This reverts commit 8ba2adc.

[LoopFlatten] Precommit new test widen-iv2.ll for D110234.

a588ae4

Removing a default constructor argument; NFC

38d0908

The argument is always used with its default value, so remove the argument entirely.

[AArch64] Fix neon-reverseshuffle test extension. NFC

ebee606

Apparently I gave a ll file a .patch extension. Oops.

[lldb] [DynamicRegisterInfo] Add a convenience method to add suppl. r…

3303154

…egisters Add a convenience method to add supplementary registers that takes care of adding invalidate_regs to all (potentially) overlapping registers. Differential Revision: https://reviews.llvm.org/D110023

tsan: de-hardcode MemCount const

1455b55

Use MemCount instead of hard-coded value 7. Reviewed By: melver Differential Revision: https://reviews.llvm.org/D110532

[AMDGPU] Ignore KILLs when forming clauses

bf98093

KILL instructions are sometimes present and prevented hard clauses from being formed. Fix this by ignoring all meta instructions in clauses. Differential Revision: https://reviews.llvm.org/D106042

tsan: add a test for stack init race

b72176b

Depends on D110538. Reviewed By: melver Differential Revision: https://reviews.llvm.org/D110539

[mlir] AsyncRuntime: use int64_t for ref counting operations

92db09c

Workaround for SystemZ ABI problem: https://bugs.llvm.org/show_bug.cgi?id=51898 Reviewed By: ftynse Differential Revision: https://reviews.llvm.org/D110550

[InstCombine] match variable names and code comments; NFC

025a805

Similar to: 29c09c7 Planned follow-up is to add a transform here to allow removing a common shift fold that is conflicting with D110170.

[llvm] ConvertOption::accept(), acceptInternal() to std::unique_ptr<>

2f95542

These functions transfer ownership to the caller. Make this clear in the type system. No behavior change.

[llvm] Convert OptTable::parseOneArgGrouped() to std::unique_ptr<>

7789a68

Stuart Ellis and others added 27 commits September 28, 2021 22:57

[mlir][tosa] Add i32 to supported quantized type

4f38f06

Quantized int type should include I32 types as it's the output of a quantized convolution or matmul operation. Reviewed By: NatashaKnk Differential Revision: https://reviews.llvm.org/D110651

[NFC][sanitizer] Return StackDepotStats by value

7c1128f

Differential Revision: https://reviews.llvm.org/D110644

fixes bug #51926 where dangling comma caused overrun

a36227c

Bug 51926 identified an issue where a dangling comma caused the cell count to be off by one. Differential Revision: https://reviews.llvm.org/D110481

[clang] Let PPCallbacks::PragmaWarning() pass specifier as enum inste…

5cf0606

…ad of string Differential Revision: https://reviews.llvm.org/D110635

[libc] Add support for 128 bit ints in limits.h

b62d72f

Also, this adds unit tests to check that limits.h complies with the C standard. Reviewed By: sivachandra Differential Revision: https://reviews.llvm.org/D110643

[test] Specify triple in backend-attribute-error-warning.cpp

2d56fbf

Tests fail on Windows otherwise.

Revert "Recommit "[AArch64] Split bitmask immediate of bitwise AND op…

c07f709

…eration"" This reverts commit 73a196a. Causes crashes as reported in https://reviews.llvm.org/D109963

[gn build] Port c07f709

fd9a5b9

[JITLink][MachO][x86-64] Add support for splitting compact-unwind sec…

1f2f1a4

…tions. Follow-up to fc734da to enable compact-unwind splitting on x86-64.

Merge commit '4da744a20ff58c9b3d8df0e2eb9e8b69d9e5cc3d' into feature/…

791d7e3

…merge-upstream-20210929

[AMDGPU] Require AMDGPU target for ASAN instrumentation tests

fa80fb9

Should fix test failure on Arm/AArch64 quick bots which only build those targets. https://lab.llvm.org/buildbot/#/builders/171/builds/4077

kaz7 merged commit ebd8823 into develop Oct 15, 2021

kaz7 deleted the feature/merge-upstream-20210929 branch October 15, 2021 11:31
