Feature/merge upstream 20210929 #96
Merged
The only sched models for CPUs that support AVX2 but not AVX512 are: haswell, broadwell, skylake, zen1-3. For load we have: https://godbolt.org/z/1j3nf3dro - for Intel, `Block RThroughput: =2.0`; for Ryzen, `Block RThroughput: <=1.0`. So pick a cost of `2`. For store we have: https://godbolt.org/z/4n1zvP37j - for Intel, `Block RThroughput: =1.0`; for Ryzen, `Block RThroughput: <=0.5`. So pick a cost of `1`. I'm directly using the shuffling asm that llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D110504
The only sched models for CPUs that support AVX2 but not AVX512 are: haswell, broadwell, skylake, zen1-3. For load we have: https://godbolt.org/z/e5YE99a4P - for Intel, `Block RThroughput: =6.0`; for Ryzen, `Block RThroughput: =2.0`. So pick a cost of `6`. For store we have: https://godbolt.org/z/3vM4KsE1n - for Intel, `Block RThroughput: =3.0`; for Ryzen, `Block RThroughput: <=2.0`. So pick a cost of `3`. I'm directly using the shuffling asm that llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D110505
The only sched models for CPUs that support AVX2 but not AVX512 are: haswell, broadwell, skylake, zen1-3. For load we have: https://godbolt.org/z/Y1E7qnjz8 - for Intel, `Block RThroughput: =9.0`; for Ryzen, `Block RThroughput: <=3.5`. So pick a cost of `9`. For store we have: https://godbolt.org/z/Y1E7qnjz8 - for Intel, `Block RThroughput: =4.0`; for Ryzen, `Block RThroughput: <=2.0`. So pick a cost of `4`. I'm directly using the shuffling asm that llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D110506
The only sched models for CPUs that support AVX2 but not AVX512 are: haswell, broadwell, skylake, zen1-3. For load we have: https://godbolt.org/z/q6GbK89br - for Intel, `Block RThroughput: =18.0`; for Ryzen, `Block RThroughput: <=7.0`. So pick a cost of `18`. For store we have: https://godbolt.org/z/Yzfoo5TnW - for Intel, `Block RThroughput: =8.0`; for Ryzen, `Block RThroughput: <=4.0`. So pick a cost of `8`. I'm directly using the shuffling asm that llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D110507
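The cost picked in each of the four commits above matches the larger of the two reported `Block RThroughput` figures, rounded up to an integer. The following is a minimal Python sketch of that selection rule, assuming this worst-case rule is what was intended (the commit messages only state the measurements and the result). The name `pick_cost` and the table layout are illustrative, and the `<=` bounds reported for the Ryzen models are treated as their upper limit.

```python
import math

# (Intel RThroughput, Ryzen RThroughput upper bound), as quoted in each commit message.
MEASURED = {
    "D110504": {"load": (2.0, 1.0), "store": (1.0, 0.5)},
    "D110505": {"load": (6.0, 2.0), "store": (3.0, 2.0)},
    "D110506": {"load": (9.0, 3.5), "store": (4.0, 2.0)},
    "D110507": {"load": (18.0, 7.0), "store": (8.0, 4.0)},
}


def pick_cost(intel, ryzen):
    """Assumed rule: take the worst case across models, rounded up to an integer."""
    return math.ceil(max(intel, ryzen))


# Cost per revision and operation, derived from the measurements above.
COSTS = {
    rev: {op: pick_cost(*values) for op, values in ops.items()}
    for rev, ops in MEASURED.items()
}
```

Under this rule the Intel figures always dominate, which is why every picked cost equals the Intel number.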
When rebase_exec=true in DidAttach(), all modules are loaded before the rendezvous breakpoint is set, which means the LoadInterpreterModule() method is not called and m_interpreter_module is not initialized. This causes the very first rendezvous breakpoint hit with m_initial_modules_added=false to accidentally unload the module_sp that corresponds to the dynamic loader. This bug (introduced in D92187) was causing the rendezvous mechanism to not work in Android 28. The mechanism works fine on older/newer versions of Android. Test: Verified rendezvous on Android 28 and 29 Test: Added dlopen test Reviewed By: labath Differential Revision: https://reviews.llvm.org/D109797
Let the calling pass or pattern replace the uses of the original root operation. Internally, the tileAndFuse still replaces uses and updates operands but only of newly created operations. Reviewed By: nicolasvasilache Differential Revision: https://reviews.llvm.org/D110169
…iceRTL. Use the in-project clang, llvm-link and opt if available and unless CMake cache variables specify to use a different compiler. This applies D101265 to the new DeviceRTL's CMakeLists.txt which was copied before D101265 was applied. Fixes the openmp-offloading-cuda-runtime builder which was failing since D110006. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D110251
Refactor the XML converting attribute and text getters to use LLVM API. While at it, remove some redundant error and missing XML support handling, as the called base functions do that anyway. Add tests for these methods. Note that this patch changes the getter behavior to be IMHO more correct. In particular: - negative and overflowing integers are now reported as failures to convert, rather than being wrapped over or capped - digits followed by text are now reported as failures to convert to double, rather than their numeric part being converted Differential Revision: https://reviews.llvm.org/D110410
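The stricter getter behavior described above can be illustrated with a small Python sketch. This is not the LLDB or LLVM code; `to_unsigned` and `to_double` are hypothetical helpers that only mimic the described outcomes: negative or overflowing integers and digits followed by text are reported as conversion failures (here, `None`) instead of being wrapped, capped, or partially converted.

```python
UINT64_MAX = 2**64 - 1


def to_unsigned(text, max_value=UINT64_MAX):
    """Hypothetical strict parser: return the value, or None on any failure."""
    # Reject empty input, signs, whitespace, and trailing text in one check.
    if not (text.isascii() and text.isdigit()):
        return None
    value = int(text)
    # Overflow is a failure, not a wrap-around or a cap.
    return value if value <= max_value else None


def to_double(text):
    """Hypothetical strict parser: the whole string must be a number."""
    try:
        return float(text)
    except ValueError:
        return None
```

For example, `to_unsigned("-1")` and `to_double("12abc")` both yield `None`, whereas a lenient parser might have returned a wrapped value or `12.0`.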
Keeping all the checks in one place for future simplification. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D110513
The StringConvert API is no longer used anywhere but in debugserver. Since debugserver does not use LLVM API, we cannot replace it with llvm::to_integer() and llvm::to_float() there. Let's just move the sources into debugserver. Differential Revision: https://reviews.llvm.org/D110478
…en switch cover all possible values.""" This reverts commit 8ba2adc.
… command guide This change is to add some missing details to the help text and command guide: - Added a note to the command guide that --debug-macro also dumps .debug_macinfo. - Added a note to the command guide that --debug-frame and --eh_frame are aliases, and in cases where both sections are present one command outputs both. - Changed the wording in the help output for --ignore-case and --regex to more closely match the command guide.
The argument is always used with its default value, so remove the argument entirely.
Apparently I gave an .ll file a .patch extension. Oops.
Function specialization was crashing on poison values and constexpr values. The problem is that these values are not added to the solver, so it crashes when a lookup is performed for these values. This fixes that by not specialising on these values. For poison that is obvious, but for constexpr this is a change in behaviour. Thus, in one way this is a bit of a stopgap, but specialising on constexpr values wasn't done very intentionally, and needs some more work and tests if we want to support it. As a follow-up, we need to look at whether the solver should exit more gracefully and return a "don't know", or whether it should really support these constexprs. This should fix PR51600 (https://bugs.llvm.org/show_bug.cgi?id=51600). Differential Revision: https://reviews.llvm.org/D110529
…egisters Add a convenience method to add supplementary registers that takes care of adding invalidate_regs to all (potentially) overlapping registers. Differential Revision: https://reviews.llvm.org/D110023
Use MemCount instead of hard-coded value 7. Reviewed By: melver Differential Revision: https://reviews.llvm.org/D110532
We used to put the canonical spelling of flags after alias processing on that line. For clang-cl in particular, that meant that we put flags on that line that the clang-cl driver doesn't even accept, and the "Driver args:" line wasn't usable. Differential Revision: https://reviews.llvm.org/D110458
KILL instructions are sometimes present and prevented hard clauses from being formed. Fix this by ignoring all meta instructions in clauses. Differential Revision: https://reviews.llvm.org/D106042
Currently detection of races with TLS/stack initialization is broken because we imitate the write before thread initialization, so it's modelled with a wrong thread/epoch. Fix that and add a test. Reviewed By: melver Differential Revision: https://reviews.llvm.org/D110538
Depends on D110538. Reviewed By: melver Differential Revision: https://reviews.llvm.org/D110539
The trace tests crashed on darwin because of some thread initialization issues (thread initialization is somewhat different on darwin). Instead of starting real threads, create a new ThreadState in the main thread. This makes the tests more unit-testy and hopefully won't crash on darwin (there is almost no platform-specific code involved now). This will also help with future trace tests that will need more than 1 thread. Creating more than 1 real thread and dispatching test actions across multiple threads in the required deterministic order is painful. Depends on D110539. Reviewed By: melver Differential Revision: https://reviews.llvm.org/D110546
Workaround for SystemZ ABI problem: https://bugs.llvm.org/show_bug.cgi?id=51898 Reviewed By: ftynse Differential Revision: https://reviews.llvm.org/D110550
…atabase.c test case. It appears that this test assumes that the toolchain utilizes the integrated assembler by default, since the expected output in the CHECKs is compilation_database.o. However, this test fails on AIX, as AIX does not utilize the integrated assembler. On AIX, the output is instead of the form /tmp/compilation_database-*.s. Thus, this patch explicitly adds the -fintegrated-as option to match the assumption that the integrated assembler is used by default. Differential Revision: https://reviews.llvm.org/D110431
Similar to: 29c09c7 Planned follow-up is to add a transform here to allow removing a common shift fold that is conflicting with D110170.
This is another step towards trying to re-apply D110170 by eliminating conflicting transforms that cause infinite loops. a47c8e4 was a previous patch in this direction. The diffs here are mostly cosmetic, but intentional:
1. The existing code that would handle this pattern in FoldShiftByConstant() is limited to 'shl' only now. The formatting change to IsLeftShift shows that we could move several transforms into visitShl() directly for efficiency because they are not common shift transforms.
2. The tests are regenerated to show new instruction names to prove that we are getting (almost) identical logic results.
3. The one case where we differ ("trunc_sandwich_small_shift1") shows that we now use a narrow 'and' instruction. Previously, we relied on another transform to do that, but it is limited to legal types. That seems to be a legacy constraint from when IR analysis and codegen were less robust.
https://alive2.llvm.org/ce/z/JxyGA4
  declare void @llvm.assume(i1)
  define i8 @src(i32 %x, i32 %c0, i8 %c1) {
    ; The sum of the shifts must not overflow the source width.
    %z1 = zext i8 %c1 to i32
    %sum = add i32 %c0, %z1
    %ov = icmp ult i32 %sum, 32
    call void @llvm.assume(i1 %ov)
    %sh1 = lshr i32 %x, %c0
    %tr = trunc i32 %sh1 to i8
    %sh2 = lshr i8 %tr, %c1
    ret i8 %sh2
  }
  define i8 @tgt(i32 %x, i32 %c0, i8 %c1) {
    %z1 = zext i8 %c1 to i32
    %sum = add i32 %c0, %z1
    %maskc = lshr i8 -1, %c1
    %s = lshr i32 %x, %sum
    %t = trunc i32 %s to i8
    %a = and i8 %t, %maskc
    ret i8 %a
  }
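The `@src`/`@tgt` pair from the Alive2 link can also be spot-checked numerically. The sketch below models both functions with plain Python integers and compares them over sampled inputs; it is an illustration rather than a proof (Alive2 provides that). It only covers shift amounts where the IR is defined: `%c1 < 8` (an `i8` `lshr` by 8 or more is poison) and `%c0 + %c1 < 32` (the `llvm.assume`). The helper name `check` is an assumption of this sketch.

```python
import random

MASK32 = 0xFFFFFFFF
MASK8 = 0xFF


def src(x, c0, c1):
    """lshr i8 (trunc (lshr i32 x, c0)), c1"""
    sh1 = (x & MASK32) >> c0
    tr = sh1 & MASK8
    return tr >> c1


def tgt(x, c0, c1):
    """and i8 (trunc (lshr i32 x, c0 + c1)), (lshr i8 -1, c1)"""
    maskc = MASK8 >> c1
    s = (x & MASK32) >> (c0 + c1)
    return (s & MASK8) & maskc


def check(samples=200, seed=0):
    """Compare src and tgt on edge cases plus random 32-bit values."""
    rng = random.Random(seed)
    xs = [0, 1, MASK32, 0x80000000, 0x12345678]
    xs += [rng.getrandbits(32) for _ in range(samples)]
    for x in xs:
        for c1 in range(8):              # defined i8 shift amounts only
            for c0 in range(32 - c1):    # honors the assume: c0 + c1 < 32
                if src(x, c0, c1) != tgt(x, c0, c1):
                    return False
    return True
```

Both forms extract bits `c0 + c1` up to `c0 + 8` of `x`, which is why the narrow `and` with `lshr i8 -1, %c1` is sufficient.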
These functions transfer ownership to the caller. Make this clear in the type system. No behavior change.
This plugin parses Fortran files and creates a YAML report with all the OpenMP constructs and clauses seen in the file. The following tests have been modified to be compatible for testing the plugin, hence why they are not reused from another directory: - omp-atomic.f90 - omp-declarative-directive.f90 - omp-device-constructs.f90 The plugin outputs a single file in the same directory as the source file in the following format: `<source-file-name>.yaml` Building the plugin: `ninja flangOmpReport` Running the plugin: `./bin/flang-new -fc1 -load lib/flangOmpReport.so -plugin flang-omp-report -fopenmp <source_file.f90>` Co-authored-by: Kiran Chandramohan <kiran.chandramohan@arm.com> Co-authored-by: Stuart Ellis <stuart.ellis@arm.com> Reviewed By: awarzynski, kiranchandramohan Differential Revision: https://reviews.llvm.org/D109890
The ARM backend was explicitly setting global binding on the personality symbol. This was added without any comment in a7ec2dc, which introduced EHABI support (back in 2011). None of the other backends do anything equivalent, as far as I can tell. This causes problems when attempting to wrap the personality symbol. Wrapped symbols are marked as weak inside LTO to inhibit IPO (see https://reviews.llvm.org/D33621). When we wrap the personality symbol, it initially gets weak binding, and then the ARM backend attempts to change the binding to global, which causes an error in MC because of attempting to change the binding of a symbol from non-global to global (the error was added in https://reviews.llvm.org/D90108). Simply drop the ARM backend's explicit global binding setting to fix this. This matches all the other backends, and a large internal application successfully linked and ran with this change, so it shouldn't cause any problems. Test via LLD, since wrapping is required to exhibit the issue. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D110609
Quantized int type should include I32 types, as it is the output of a quantized convolution or matmul operation. Reviewed By: NatashaKnk Differential Revision: https://reviews.llvm.org/D110651
To avoid using the AST when emitting diagnostics, split the "dontcall" attribute into "dontcall-warn" and "dontcall-error", and also add the frontend attribute value as the LLVM attribute value. This gives us all the information to report diagnostics we need from within the IR (aside from access to the original source). One downside is we directly use LLVM's demangler rather than using the existing Clang diagnostic pretty printing of symbols. Previous revisions didn't properly declare the new dependencies. Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D110364
We weren't retaining the ctypes closures that the ExecutionEngine was calling back into, leading to mysterious errors. Open to feedback about how to test this. And an extra pair of eyes to make sure I caught all the places that need to be aware of this. Differential Revision: https://reviews.llvm.org/D110661
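The underlying ctypes rule is that a callback object created from a Python function must stay referenced for as long as native code may call it; once it is garbage collected, the raw function pointer dangles. The sketch below shows one way to retain closures, assuming a simple registry keyed by name. `CallbackRegistry` and its methods are hypothetical and are not the MLIR Python bindings API.

```python
import ctypes

# Signature of the callbacks in this sketch: int (*)(int).
CALLBACK_TYPE = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int)


class CallbackRegistry:
    """Hypothetical holder that keeps ctypes closures alive."""

    def __init__(self):
        self._closures = {}

    def register(self, name, py_func):
        closure = CALLBACK_TYPE(py_func)
        # Retain the closure: dropping this reference would free the thunk
        # that the raw address below points at.
        self._closures[name] = closure
        return ctypes.cast(closure, ctypes.c_void_p).value

    def lookup(self, name):
        return self._closures[name]
```

A caller would hand the returned address to native code and keep the registry alive for at least as long as that code can call back.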
Differential Revision: https://reviews.llvm.org/D110644
Bug 51926 identified an issue where a dangling comma caused the cell count to be off by one. Differential Revision: https://reviews.llvm.org/D110481
…ad of string Differential Revision: https://reviews.llvm.org/D110635
Also, this adds unit tests to check that limits.h complies with the C standard. Reviewed By: sivachandra Differential Revision: https://reviews.llvm.org/D110643
Similar to what SDAG does when it sees a smulo/umulo against 2 (see: `DAGCombiner::visitMULO`) This pattern is fairly common in Swift code AFAICT. Here's an example extracted from a Swift testcase: https://godbolt.org/z/6cT8Mesx7 Differential Revision: https://reviews.llvm.org/D110662
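The rewrite this refers to is presumably the one `DAGCombiner::visitMULO` performs: a multiply-with-overflow by 2 becomes an add-with-overflow of the value with itself. Assuming that is the fold, the sketch below checks exhaustively at 8 bits that both forms give the same wrapped result and the same overflow flag. The helper names are illustrative and model the semantics only, not any LLVM API.

```python
def wrap_signed(value, bits):
    """Wrap an integer into the signed two's-complement range for `bits`."""
    half = 1 << (bits - 1)
    return ((value + half) % (1 << bits)) - half


def umulo(x, y, bits):
    full = x * y
    mask = (1 << bits) - 1
    return full & mask, full > mask


def uaddo(x, y, bits):
    full = x + y
    mask = (1 << bits) - 1
    return full & mask, full > mask


def smulo(x, y, bits):
    full = x * y
    wrapped = wrap_signed(full, bits)
    return wrapped, wrapped != full


def saddo(x, y, bits):
    full = x + y
    wrapped = wrap_signed(full, bits)
    return wrapped, wrapped != full


def mulo_by_two_matches_addo(bits=8):
    """Exhaustively compare mulo(x, 2) with addo(x, x) at the given width."""
    half = 1 << (bits - 1)
    unsigned_ok = all(umulo(x, 2, bits) == uaddo(x, x, bits)
                      for x in range(1 << bits))
    signed_ok = all(smulo(x, 2, bits) == saddo(x, x, bits)
                    for x in range(-half, half))
    return unsigned_ok and signed_ok
```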
Tests fail on Windows otherwise.
In looking at the disk space used by a ninja check-all, I found that a few of the largest files were copies of clang and lld made into temp directories by a couple of tests. These tests were added in D53021 and D74811. Clean up these copies after usage. Differential Revision: https://reviews.llvm.org/D110276
When we have code with truncates, those truncates may be changed into G_ANDs with constants. These may, in turn, feed into other G_AND instructions. Running this combine post-legalize allows us to optimize examples like this one: https://godbolt.org/z/zrGY4dfEW SDAG currently optimizes the example above so that there is only one `and`. GISel doesn't optimize it, because the G_AND we'd optimize here is translated as a G_TRUNC. Later, that G_TRUNC is turned into a G_AND during legalization. Differential Revision: https://reviews.llvm.org/D110667
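The message does not name the combine, but the effect it describes (two chained `and`s with constants collapsing into one) rests on the identity `(x & c1) & c2 == x & (c1 & c2)`. A minimal sketch of that identity follows, assuming the first mask comes from a truncate that legalization turned into an `and`; `merge_masks` is a hypothetical helper, not a GlobalISel function.

```python
def merge_masks(c1, c2):
    """Single constant equivalent to applying mask c1 and then mask c2."""
    return c1 & c2


def chained_and(x, c1, c2):
    """Two and operations, as seen after a truncate is legalized to an and."""
    return (x & c1) & c2


def single_and(x, c1, c2):
    """The same computation with one and and a pre-merged constant."""
    return x & merge_masks(c1, c2)
```

For instance, a truncate to 8 bits (mask `0xFF`) followed by `and 0x0F` needs only a single `and 0x0F`.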
…ions. CompactUnwindSplitter splits compact-unwind sections on record boundaries and adds keep-alive edges from target functions back to their respective records. In MachO_arm64.cpp, a CompactUnwindSplitter pass is added to the pre-prune pass list when setting up the standard pipeline. This patch does not provide runtime support for compact-unwind, but is a first step towards enabling it.
We generate symbols like `profc`/`profd` for each function and put them into csects. When there are weak functions, we generate weak symbols for the functions as well. With ELF (and some others), the linker (binder) will discard the duplicates and keep only one copy of the weak symbols. However, on AIX, the current binder can NOT discard the weak symbols if we put all of them into the same csect, as the binder can NOT discard a subset of a csect. This creates a unique challenge for using those symbols to calculate some relative offsets. This patch changes the linkage of the `profc`/`profd` symbols to private, so that all the profc/profd for each weak symbol will be *local* to objects and all kept in the csect, which avoids the problem. Although only one of the counters will be used, all the pointers in the profd are correct. The downside is that we won't be able to discard the duplicated counters and profile data, but those cannot be discarded even if we keep the weak linkage, due to the binder limitation of not discarding a subset of a csect. Reviewed By: Whitney, MaskRay Differential Revision: https://reviews.llvm.org/D110422
Comment says: // If the operand is larger than the shift count type but the shift // count type has enough bits to represent any shift value ... It clearly talks about the shifted operand, not the shift-amount operand, but the comparison is performed against Log2_32_Ceil(Op2.getValueSizeInBits()) where Op2 is the shift amount operand. This comparison also doesn't make sense in the context of the previous one (ShiftsSize > Op2Size) because Op2Size == Op2.getValueSizeInBits(). Fix to use Op1. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D110509
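The check at issue asks whether the shift-count type has enough bits to represent every valid shift amount for the shifted operand, which depends on the width of `Op1`, not of `Op2`. A small sketch of that arithmetic follows; `log2_32_ceil` mirrors the result of LLVM's `Log2_32_Ceil` for positive inputs, while `shift_type_can_hold` is a hypothetical name for the comparison.

```python
def log2_32_ceil(value):
    """Ceiling of log2(value) for value >= 1."""
    return (value - 1).bit_length()


def shift_type_can_hold(shift_ty_bits, op1_bits):
    """True if a shift-count type of shift_ty_bits bits can represent
    every shift amount in [0, op1_bits) for the shifted operand."""
    return shift_ty_bits >= log2_32_ceil(op1_bits)
```

A 64-bit shifted operand needs a count type of at least 6 bits, so an 8-bit count type suffices, while a 4-bit count type is too narrow even for a 32-bit operand.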
…eration"" This reverts commit 73a196a. Causes crashes as reported in https://reviews.llvm.org/D109963
ASan device library functions (those that start with the prefix __asan_) are currently subject to undesired optimizations due to internalization. To avoid this, do not internalize them in the first place. Reviewed By: yaxunl Differential Revision: https://reviews.llvm.org/D110468
On AIX, we relied on LTO to merge the csects for profiling data/counter sections. The AIX binder now has namedcsect support for this merging, so we can enable PGO without LTO when the new binder is used. Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D110671
…tions. Follow-up to fc734da to enable compact-unwind splitting on x86-64.
check-orc-rt had no cmake target dependency on orc or llvm-jitlink, which could lead to regression test failures in compiler-rt. This patch should fix the issue. Patch by Jack Andersen (jackoalan@gmail.com). Thanks Jack! Reviewed By: lhames Differential Revision: https://reviews.llvm.org/D110659
Though this is a full port of the example, it is not yet fully functional due to a threading issue in the SimpleRemoteEPC implementation. The issue was discussed in D110530, but it needs a more thorough solution. For now we are dropping the dependency on the old `OrcRPC` here (it was the last in-tree use case). The test for the example is under review in ... and will be re-enabled once the threading issue is solved.
Rename `lenParams` to `typeparams` to be in sync with fir-dev. This patch is part of the upstreaming effort from fir-dev branch. Co-authored-by: Jean Perier <jperier@nvidia.com> Co-authored-by: Valentin Clement <clementval@gmail.com> Reviewed By: jeanPerier Differential Revision: https://reviews.llvm.org/D110645
With -fpreserve-vec3-type enabled, a cast was not created when converting from a non-vec3 type to a vec3 type, even though a conversion to vec3 was performed. This resulted in creation of invalid store instructions. Differential Revision: https://reviews.llvm.org/D108470
…merge-upstream-20210929
Should fix test failure on Arm/AArch64 quick bots which only build those targets. https://lab.llvm.org/buildbot/#/builders/171/builds/4077
Pass internal regression tests.
Merge upstream/main to 4da744a.