Feature/merge upstream 20210528 #59
Merged
NFC, since no instructions have their AsmMatchConverter changed, but prepares for that to happen. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D103046 Change-Id: I6afefad899076de7b9a412374d09b95b29e012fa
cxx20_iterator_traits.compile.pass.cpp actually depends on implementation details of libc++, which is not great; but I just left a comment and moved on.
- Currently, the host CPU information is not as easily available on z/OS as on other platforms.
- This information is stored in the Communications Vector Table (https://www.ibm.com/docs/en/zos/2.2.0?topic=information-cvt-mapping)

Reviewed By: uweigand
Differential Revision: https://reviews.llvm.org/D102793
Currently, BPF only contains three relocations:
  R_BPF_NONE   for no relocation
  R_BPF_64_64  for LD_imm64 and normal 64-bit data relocation
  R_BPF_64_32  for call insn and normal 32-bit data relocation

Also, .BTF and .BTF.ext sections contain symbols in allocated program and data sections. These two sections reserve 32-bit space to hold the offset relative to the symbol's section. When LLVM JIT is used, the LLVM ExecutionEngine RuntimeDyld may attempt to resolve relocations for .BTF and .BTF.ext, which we want to prevent. So we used R_BPF_NONE for such relocations.

This all works fine until we try to link multiple objects.
  . R_BPF_64_64 handling of LD_imm64 vs. normal 64-bit data is different, so lld target->relocate() needs more context to do a correct job.
  . The same for R_BPF_64_32. More context is needed for lld target->relocate() to differentiate call insn vs. normal 32-bit data relocation.
  . Since relocations in .BTF and .BTF.ext are set to R_BPF_NONE, they will not be relocated properly when multiple .BTF/.BTF.ext sections are merged by lld.

This patch intends to address this issue by adding additional relocation kinds:
  R_BPF_64_ABS64     for normal 64-bit data relocation
  R_BPF_64_ABS32     for normal 32-bit data relocation
  R_BPF_64_NODYLD32  for .BTF and .BTF.ext style relocations.

The old R_BPF_64_{64,32} semantics:
  R_BPF_64_64  for LD_imm64 relocation
  R_BPF_64_32  for call insn relocation

The existing R_BPF_64_64/R_BPF_64_32 mapping to numeric values is maintained. They are the most common use cases for bpf programs and we want to maintain backward compatibility as much as possible.

ExecutionEngine RuntimeDyld BPF relocations are adjusted as well. R_BPF_64_{ABS64,ABS32} relocations will be resolved properly and other relocations will be ignored. Two tests are added for RuntimeDyld. Not handling R_BPF_64_NODYLD32 in RuntimeDyldELF.cpp will result in a "Relocation type not implemented yet!" fatal error.
FK_SecRel_4 usages in BPFAsmBackend.cpp and BPFELFObjectWriter.cpp are removed as they are not triggered in BPF backend. BPF backend used FK_SecRel_8 for LD_imm64 instruction operands. Differential Revision: https://reviews.llvm.org/D102712
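As a reading aid, the relocation split described in this commit message can be summarised in a small table-like sketch. Everything here is illustrative: the `BpfReloc` enum, `describe` and `resolvedByRuntimeDyld` are hypothetical names, and the enumerator values are plain ordinals, not the ELF relocation numbers.

```cpp
#include <string_view>

// Relocation kinds named in the commit message. The enumerator values are
// ordinals for illustration only; they are NOT the ELF relocation numbers.
enum class BpfReloc { None, R64_64, R64_ABS64, R64_ABS32, R64_NODYLD32, R64_32 };

// Intended use of each kind after the patch, as stated in the message.
constexpr std::string_view describe(BpfReloc r) {
  switch (r) {
  case BpfReloc::None:         return "no relocation";
  case BpfReloc::R64_64:       return "LD_imm64 relocation";
  case BpfReloc::R64_ABS64:    return "normal 64-bit data relocation";
  case BpfReloc::R64_ABS32:    return "normal 32-bit data relocation";
  case BpfReloc::R64_NODYLD32: return ".BTF and .BTF.ext style relocation";
  case BpfReloc::R64_32:       return "call insn relocation";
  }
  return "unknown";
}

// Per the message, RuntimeDyld resolves only the two ABS kinds and ignores
// the others (this helper is a hypothetical summary, not LLVM code).
constexpr bool resolvedByRuntimeDyld(BpfReloc r) {
  return r == BpfReloc::R64_ABS64 || r == BpfReloc::R64_ABS32;
}
```

The point of the split is that a linker can tell from the relocation kind alone whether it is patching an instruction or plain data.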
Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D103057
Said function had a few shortfalls:
- didn't set an abort message on Android
- was logged on several lines
- didn't provide extra information like the size requested if OOM'ing

This improves the function to address those points.

Differential Revision: https://reviews.llvm.org/D103034
…to intptr_t A test in ir.c makes use of casting a void* to an integer type to print its address. This cast is currently done with the datatype `long`, however, which is only guaranteed to be equal to the pointer width on LP64 systems. Other platforms may use a length not equal to the pointer width. 64-bit Windows, for example, uses 32 bits for `long`, which does not match its 64-bit pointers. This also results in a clang warning due to `-Wvoid-pointer-to-int-cast`. Technically speaking, since the test only passes the value 42, it does not cause any issues, but it'd be nice to fix the warning at least. Differential Revision: https://reviews.llvm.org/D103085
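A minimal sketch of the portable form of such a cast, assuming a hypothetical helper `addressAsDecimal` (the actual test code in ir.c is not shown here and may look different):

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Format a pointer value as a decimal integer without assuming that `long`
// is as wide as a pointer (it is not on 64-bit Windows).
std::string addressAsDecimal(const void *p) {
  char buf[32];
  // intptr_t is wide enough to hold a void*; PRIdPTR is its printf format.
  std::snprintf(buf, sizeof buf, "%" PRIdPTR,
                reinterpret_cast<std::intptr_t>(p));
  return buf;
}
```

Casting through `long` instead would silently truncate on platforms where `long` is narrower than a pointer, which is what the warning flags.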
All users of the builder should set an insert point before using the builder. There should be no need for using InsertPointGuard here.
…s on AVX1 Determined from llvm-mca analysis, AVX1 capable targets have a higher throughput for VPBLENDVB and shuffle ops, making it cheaper to perform shift+shuffle/select shift patterns.
Match what's documented in the Intel AOM - the XMM variant of PSHUFB requires BOTH ports - this was being incorrectly modelled as EITHER port. Now that we can use in-order models in llvm-mca, the atom model is a good "worst case scenario" analysis for x86.
We are using TOCEntry symbols like `LC..0` in TOC loads. This is hard to read, at least requiring an additional step to figure out the loaded symbols. We should print out the name in comments. Reviewed By: #powerpc, shchenz Differential Revision: https://reviews.llvm.org/D102949
Removed some of the older raw "MLIRized" versions that are no longer needed now that the sparse runtime support library can focus on the proper sparse tensor types rather than the opaque pointer approach of the past. This avoids legacy... Reviewed By: penpornk Differential Revision: https://reviews.llvm.org/D102960
All callers pass "false" for the Equality parameter. Kill the dead code, and update the function block comment.
The parseInputFile function returns an empty unique_ptr to signal an error, like when the input file doesn't exist, or is malformed. In this case, the tool should exit immediately rather than segfault by dereferencing the unique_ptr later. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D102891
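The pattern described here can be sketched as follows. Only the name `parseInputFile` comes from the commit message; its signature, the `Module` stand-in and `runTool` are hypothetical and chosen just to make the example self-contained.

```cpp
#include <memory>
#include <string>

// Hypothetical stand-in for whatever the real tool parses.
struct Module {
  std::string name;
};

// Hypothetical parser: an empty unique_ptr signals an error, e.g. a missing
// or malformed input file (modelled here as an empty path).
std::unique_ptr<Module> parseInputFile(const std::string &path) {
  if (path.empty())
    return nullptr;
  return std::make_unique<Module>(Module{path});
}

// Returns a process exit code. The null check must come before any
// dereference of the returned pointer.
int runTool(const std::string &path) {
  std::unique_ptr<Module> m = parseInputFile(path);
  if (!m)
    return 1; // exit immediately instead of crashing later
  return m->name.empty() ? 1 : 0;
}
```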
Stylistic changes only. 1) Don't pass a parameter just to do an early exit. 2) Use a name which matches actual behavior.
This reverts commit 0bebda1. Causing "Invalid record" errors.
The 2nd test is based on the fuzzer example in post-commit comments of D101191 - https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=34661 The 1st test shows that we don't deal with this symmetrically. We should be able to reduce both examples (possibly in instsimplify instead of instcombine).
…banks This function can change regbank for registers which already have a selected bank. Depending on the instruction where these registers were used it can cause instruction selection to fail.
A recent fix for problems with ENTRY statement handling didn't get the case of a procedure dummy argument on an ENTRY statement in an executable part right; the code presumed that those dummy arguments would be objects, not entities that might be objects or procedures. Fix. Differential Revision: https://reviews.llvm.org/D103098
llvm-profgen uses a profile-summary-based cold threshold to merge and trim cold context profiles. This is to strike a good balance between profile size and performance. We've been using 99.9% as the cutoff to save profile size without affecting performance. This change switches to 99.9% instead of 99.9999% as the default cold threshold cutoff for llvm-profgen. The redundant switch csprof-cold-thres is also removed and tests are cleaned up. Differential Revision: https://reviews.llvm.org/D103071
Update the paragraph on generic / indexed_generic to reflect the unification of these operations. Differential Revision: https://reviews.llvm.org/D102775
Make sure that if SCUDO_DEBUG=1 is set in tests, the same is set in the scudo library itself. Reviewed By: cryptoad, hctim Differential Revision: https://reviews.llvm.org/D103061
Cast of signed types to u64 breaks comparison. Also remove double () around operands. Reviewed By: cryptoad, hctim Differential Revision: https://reviews.llvm.org/D103060
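To illustrate why the cast matters, here is a small sketch. The helpers `lessViaU64` and `lessTyped` are hypothetical and are not Scudo's actual macros; they only contrast the two comparison styles.

```cpp
#include <cstdint>

using u64 = std::uint64_t;

// Problematic form: both operands are cast to u64 before comparing, so a
// negative value such as -1 wraps around to 2^64 - 1.
template <typename T> bool lessViaU64(T a, T b) {
  return static_cast<u64>(a) < static_cast<u64>(b);
}

// Comparing in the operands' own type keeps signed ordering intact.
template <typename T> bool lessTyped(T a, T b) { return a < b; }
```

With `a = -1` and `b = 1`, the typed comparison reports `-1 < 1` while the u64 version reports the opposite.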
…ly infinite loops into finite ones Nowadays LLVM does not assume that all loops are finite, so if we want to produce a finite loop from a potentially-infinite one, we must ensure that the original loop is known to be a finite one. For this transform, it only matters for arithmetic right-shifts. For them, either the function or the loop must be known to be `mustprogress`, or the original value being shifted must be known to be non-negative (because iff the sign bit was set, it will never become zero, but will become `-1` in the "end"). It would be really good for alive2 to actually complain about this, but it currently does not: AliveToolkit/alive2#726
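The sign-bit argument can be checked with a small sketch. `shiftsUntilZero` is a hypothetical helper with an iteration budget so the demonstration itself terminates; it assumes `>>` on a negative `int` is an arithmetic shift (guaranteed since C++20, and the behaviour of mainstream compilers before that).

```cpp
// Counts iterations of `while (x != 0) x >>= 1`, giving up after maxIter.
// Returns -1 if the loop has not reached zero within the budget.
int shiftsUntilZero(int x, int maxIter = 64) {
  int n = 0;
  while (x != 0) {
    if (n == maxIter)
      return -1; // still non-zero: treated as "does not terminate"
    x >>= 1;     // arithmetic shift: the sign bit is replicated
    ++n;
  }
  return n;
}
```

A non-negative input reaches zero after finitely many shifts, while a negative input settles at `-1` and stays there, which is why the transform needs `mustprogress` or a known non-negative value.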
Now that we can fold some transposes into multiplies (CM: A * B^t and RM: A^t * B), we want to move them around to create the optimal expressions:
* fold away double transposes while still using them to assert the shape
* sink transposes hoping they cancel out
* lift transposes when both operands are transposed

This also modifies the matrix remarks to include the number of exposed transposes (i.e. transposes that we couldn't fold into a multiply).

The adjustment to the test remarks-inlining is a bit subtle: I am changing the double transpose to a single transpose so that we don't remove it completely. More importantly, this changes some of the total instruction counts, most notably stores, because we can no longer use a vector store.

Differential Revision: https://reviews.llvm.org/D102733
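For reference, rewrites of this kind presumably rest on the standard transpose identities below. These are general linear algebra facts, not statements taken from the patch itself:

```latex
\begin{align*}
(A^{\mathsf T})^{\mathsf T} &= A \\
(A B)^{\mathsf T} &= B^{\mathsf T} A^{\mathsf T} \\
A^{\mathsf T} B^{\mathsf T} &= (B A)^{\mathsf T}
\end{align*}
```

The last line is the case where both operands are transposed: two transposes on the inputs can be traded for a single transpose on the product.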
This patch is the third in a series of patches fixing markdown links and references inside the mlir documentation. This patch addresses all broken references to other markdown files and sections inside the Tutorials folder. Differential Revision: https://reviews.llvm.org/D103017
-enable-matrix just adds a single pass, so it's easier to just check in new-pm-default.ll rather than duplicating the full checks for -O3 with the new pass manager. Suggested post-commit by @aeubanks.
Reviewed by: MaskRay Differential Revision: https://reviews.llvm.org/D103154
... and ClangFormatStyleOptions.rst for EmptyLineAfterAccessModifier Differential-Revision: https://reviews.llvm.org/D102989
When lowering the dynamic, guided, auto and runtime types of scheduling, there is an optional monotonic or non-monotonic modifier. This patch adds support in the OMP IR Builder to pass this down to the runtime functions. Also implements tests for the variants. Differential Revision: https://reviews.llvm.org/D102008
This struct was used to specify the device on which memory was being allocated/free in atmi_malloc/free. It has now been replaced with int DeviceId. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D103239
This reverts commit ea4c5fb.
…(5/n) This revision refactors and simplifies the pattern detection logic: thanks to SSA value properties, we can actually look at all the uses of a given value and avoid having to pattern-match specific chains of operations. A bufferization pattern for subtensor is added and specific inplaceability analysis is implemented for the simple case of subtensor. More advanced use cases will follow. Differential revision: https://reviews.llvm.org/D102512
WG14 adopted N2645 and WG21 EWG has accepted P2334 in principle (still subject to full EWG vote + CWG review + plenary vote), which add support for #elifdef as shorthand for #elif defined and #elifndef as shorthand for #elif !defined. This patch adds support for the new preprocessor directives.
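A short sketch of the equivalence, written in the long-hand form so it compiles on toolchains without the new directives. The macro names `USE_FOO`, `USE_BAR`, `USE_BAZ` and the constant `kBackend` are hypothetical.

```cpp
// With the new directives the chain below could be spelled:
//   #ifdef USE_FOO ... #elifdef USE_BAR ... #elifndef USE_BAZ ... #endif
// The long-hand form is used here for portability.
#define USE_BAR 1

#if defined(USE_FOO)
constexpr int kBackend = 1;
#elif defined(USE_BAR) // shorthand: #elifdef USE_BAR
constexpr int kBackend = 2;
#elif !defined(USE_BAZ) // shorthand: #elifndef USE_BAZ
constexpr int kBackend = 3;
#else
constexpr int kBackend = 0;
#endif
```

Since only `USE_BAR` is defined, the second branch is the one selected.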
For uniform ReplicateRecipes, only the first lane should be used, so sinking them would mean we have to compute the value of the first lane multiple times. Also, at the moment, sinking them causes a crash because the value of the first lane is re-used by all users. Reported post-commit for D100258.
The vector calling convention dictates that when the vector argument registers are exhausted, GPRs are used to pass the address via the stack. When the GPRs themselves are exhausted, at best we would previously crash with an assertion, and at worst we'd generate incorrect code. This patch addresses this issue by passing fixed-length vectors via the stack with their full fixed-length size and aligned to their element type size. Since the calling convention lowering can't yet handle scalable vector types, this patch adds a fatal error to make it clear that we are lacking in this regard. Reviewed By: HsiangKai Differential Revision: https://reviews.llvm.org/D102422
DAGCombine's `mergeStoresOfConstantsOrVecElts` optimization is told whether it's to use vector types and also whether it's to issue a truncating store. However, the truncating store code path assumes a scalar integer `ConstantSDNode`, and when using vector types it creates either a `BUILD_VECTOR` or `CONCAT_VECTORS` to store: neither of which is a constant. The `riscv64` target is able to expose a crash here because it switches on both code paths at the same time. The `f32` is stored as `i32` which must be promoted to `i64`, necessitating a truncating store. It also decides later that it prefers a vector store of `v2f32`. While vector truncating stores are legal, this combine is not able to emit them. We also don't have a test case. This patch adds an assert to catch this case more gracefully, and updates one of the function's callers to turn off the use of truncating stores when preferring vectors. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D103173
We were accidentally leaning on code in lowerLoad which expands extending loads which should be removed.
The original version of this was reverted, and @rjmcall provided some advice to architect a new solution. This is that solution. This implements a builtin to provide a unique name that is stable across compilations of this TU for the purposes of implementing the library component of the unnamed kernel feature of SYCL. It does this by running the Itanium mangler with a few modifications. Because it is somewhat common to wrap non-kernel-related lambdas in macros that aren't present on the device (such as for logging), this uniquely generates an ID for all lambdas involved in the naming of a kernel. It uses the lambda-mangling number to do this, except replaces this with its own number (starting at 10000 for readability reasons) for lambdas used to name a kernel. Additionally, this implements itself as constexpr with a slight catch: if a name would be invalidated by the use of this lambda in a later kernel invocation, it is diagnosed as an error (see the Sema tests). Differential Revision: https://reviews.llvm.org/D103112
Summary: Make the file name and descriptors static so that they are reused by print-changed=diff. This avoids errors about being unable to create temporary files when doing the later comparisons in a large compile. Author: Jamie Schmeiser <schmeise@ca.ibm.com> Reviewed By: aeubanks (Arthur Eubanks) Differential Revision: https://reviews.llvm.org/D100116
It should technically be a 1, since we are only setting the first bit.
This was broken several days ago in 8269057.
When lowering the dynamic, guided, auto and runtime types of scheduling, there is an optional monotonic or non-monotonic modifier. This patch adds support in the OMP IR Builder to pass this down to the runtime functions. Also implements tests for the variants. Differential Revision: https://reviews.llvm.org/D102008
…merge-upstream-20210528
FYI, gcc-11 works fine with this version.
kaz7 pushed a commit that referenced this pull request Sep 25, 2023
…… (#67069)

We noticed a performance issue in lldb-vscode when grabbing the name of an SBValue. Profiling shows SBValue::GetName() can cause the synthetic children provider of shared/unique_ptr to dereference the underlying object and complete its type.

This patch lazily moves the dereference from the synthetic child provider's Update() method to GetChildAtIndex() so that SBValue::GetName() won't trigger the slow code path.

Here is the culprit slow code path:
```
...
frame #59: 0x00007ff4102e0660 liblldb.so.15`SymbolFileDWARF::CompleteType(this=<unavailable>, compiler_type=0x00007ffdd9829450) at SymbolFileDWARF.cpp:1567:25 [opt]
...
frame #67: 0x00007ff40fdf9bd4 liblldb.so.15`lldb_private::ValueObject::Dereference(this=0x0000022bb5dfe980, error=0x00007ffdd9829970) at ValueObject.cpp:2672:41 [opt]
frame #68: 0x00007ff41011bb0a liblldb.so.15`(anonymous namespace)::LibStdcppSharedPtrSyntheticFrontEnd::Update(this=0x000002298fb94380) at LibStdcpp.cpp:403:40 [opt]
frame #69: 0x00007ff41011af9a liblldb.so.15`lldb_private::formatters::LibStdcppSharedPtrSyntheticFrontEndCreator(lldb_private::CXXSyntheticChildren*, std::shared_ptr<lldb_private::ValueObject>) [inlined] (anonymous namespace)::LibStdcppSharedPtrSyntheticFrontEnd::LibStdcppSharedPtrSyntheticFrontEnd(this=0x000002298fb94380, valobj_sp=<unavailable>) at LibStdcpp.cpp:371:5 [opt]
...
frame #78: 0x00007ff40fdf6e42 liblldb.so.15`lldb_private::ValueObject::CalculateSyntheticValue(this=0x000002296c66a500) at ValueObject.cpp:1836:27 [opt]
frame #79: 0x00007ff40fdf1939 liblldb.so.15`lldb_private::ValueObject::GetSyntheticValue(this=<unavailable>) at ValueObject.cpp:1867:3 [opt]
frame #80: 0x00007ff40fc89008 liblldb.so.15`ValueImpl::GetSP(this=0x0000022c71b90de0, stop_locker=0x00007ffdd9829d00, lock=0x00007ffdd9829d08, error=0x00007ffdd9829d18) at SBValue.cpp:141:46 [opt]
frame #81: 0x00007ff40fc7d82a liblldb.so.15`lldb::SBValue::GetSP(ValueLocker&) const [inlined] ValueLocker::GetLockedSP(this=0x00007ffdd9829d00, in_value=<unavailable>) at SBValue.cpp:208:21 [opt]
frame #82: 0x00007ff40fc7d817 liblldb.so.15`lldb::SBValue::GetSP(this=0x00007ffdd9829d90, locker=0x00007ffdd9829d00) const at SBValue.cpp:1047:17 [opt]
frame #83: 0x00007ff40fc7da6f liblldb.so.15`lldb::SBValue::GetName(this=0x00007ffdd9829d90) at SBValue.cpp:294:32 [opt]
...
```

Differential Revision: https://reviews.llvm.org/D159542
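The idea of deferring the dereference can be sketched independently of lldb. The class `LazyChildProvider` below is hypothetical and only borrows the method names `Update`, `GetName` and `GetChildAtIndex` from the description; it is not lldb's synthetic front-end API.

```cpp
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <utility>

// Hypothetical provider: the expensive dereference is supplied as a callback
// and is only invoked when a child is actually requested.
class LazyChildProvider {
public:
  explicit LazyChildProvider(std::function<std::string()> dereference)
      : dereference_(std::move(dereference)) {}

  // Called whenever the underlying value may have changed. It only drops the
  // cached child and deliberately does not dereference.
  void Update() { child_.reset(); }

  // The name is available without touching the pointee.
  std::string GetName() const { return "ptr"; }

  // The dereference runs on first access and the result is cached.
  const std::string &GetChildAtIndex(std::size_t /*index*/) {
    if (!child_)
      child_ = dereference_();
    return *child_;
  }

private:
  std::function<std::string()> dereference_;
  std::optional<std::string> child_;
};
```

With this shape, asking only for the name never pays for the dereference, and repeated child lookups pay for it once per update.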
Merge up to 2021/5/28.
This requires clang as compiler because of upstream modifications.
This will be solved later after merging upstream fixes.
Pass regression tests.