forked from llvm/llvm-project
[pull] main from llvm:main #746
Merged
Part of D110978.
This patch enables a multi-use demanded bits fold (motivated by issue #57576): https://alive2.llvm.org/ce/z/DsZakh This mimics transforms that we already do on the single-use path. Originally, this patch did not include the last part to form a constant, but that can be removed independently to reduce risk. It's not clear what the effect of either change will be when viewed end-to-end. This is expected to be neutral or a slight win for compile-time. See the "add-demand2" series for experimental timing results: https://llvm-compile-time-tracker.com/?config=NewPM-O3&stat=instructions&remote=rotateright Differential Revision: https://reviews.llvm.org/D133788
The current code for generating nontemporal loads outputs the wrong assembly for big-endian architectures. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D133789
D131437 caused heap-use-after-free failures when testing TestCreateAfterAttach.py in asan mode, and "regular" crashes outside of asan. This appears to be due to a mismatch in a couple places where we choose to clear the DIEs. When we clear the DIE of a skeleton unit, we unconditionally clear the DIE of the DWO unit if it exists. However, `~ScopedExtractDIEs()` only looks at the skeleton unit when deciding to clear. If we decide to clear the skeleton unit because it is now unused, we end up clearing the DWO unit that _is_ used. This change adds a guard by checking `m_cancel_scopes` to prevent clearing the DWO unit. This is 100% reproducible by running TestCreateAfterAttach.py in asan mode, although it only seems to reproduce in our internal build, so no test case is added here. If someone has suggestions on how to write one, I can add it. Reviewed By: labath Differential Revision: https://reviews.llvm.org/D133790
The tools are called e.g. `toyc-ch1`, not `toy-ch1`. Add missing toyc-ch6/7. It turns out that the other substitutions are unnecessary because of specific circumstances rather than by design: the lit test exec root is set to build/mlir/test, which is where CMake places all the test tools, so we wouldn't need to substitute them at all. We shouldn't rely on this assumption though, because it will make things harder for standalone tests and other build systems. Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D133842
Instead of checking if any of the new indices has a non-zero coefficient before using the constraint, do this directly when constructing the constraint.
Currently, FunctionModRefBehavior tracks whether the function reads or writes memory (ModRefInfo) and which locations it can access (argmem, inaccessiblemem and other). This patch changes it to track ModRef information per-location instead. To give two examples of why this is useful: * D117095 highlights a weakness of ModRef modelling in the presence of operand bundles. For a memcpy call with deopt operand bundle, we want to say that it can read any memory, but only write argument memory. This would allow them to be treated like any other calls. However, we currently can't express this and have to say that it can read or write any memory. * D127383 would ideally be modelled as a separate threadid location, where threadid Refs outside pre-split coroutines can be ignored (like other accesses to constant memory). The current representation does not allow modelling this precisely. The patch as implemented is intended to be NFC, but there are some obvious opportunities for improvements and simplification. To fully capitalize on this we would also want to change the way we represent memory attributes on functions, but that's a larger change, and I think it makes sense to separate out the FunctionModRefBehavior refactoring. Differential Revision: https://reviews.llvm.org/D130896
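To make the memcpy-with-deopt-bundle example concrete, here is a minimal Python sketch of per-location tracking. It is only a model of the idea: `PerLocationBehavior`, `may_write`, and `flattened` are hypothetical names chosen for illustration and are not the LLVM API.

```python
from enum import IntFlag


class ModRefInfo(IntFlag):
    NoModRef = 0
    Ref = 1      # may read
    Mod = 2      # may write
    ModRef = 3   # may read and write


# Location kinds named in the commit message.
LOCATIONS = ("argmem", "inaccessiblemem", "other")


class PerLocationBehavior:
    """Hypothetical model: one ModRefInfo per location kind."""

    def __init__(self, **per_location):
        self.info = {
            loc: ModRefInfo(per_location.get(loc, ModRefInfo.NoModRef))
            for loc in LOCATIONS
        }

    def may_write(self, loc):
        return bool(self.info[loc] & ModRefInfo.Mod)

    def flattened(self):
        """Coarse form: a single ModRefInfo plus the set of touched locations."""
        overall = ModRefInfo.NoModRef
        for value in self.info.values():
            overall |= value
        touched = {loc for loc, value in self.info.items() if value}
        return overall, touched


# memcpy with a deopt bundle: may read anything, but only writes argument memory.
memcpy_deopt = PerLocationBehavior(
    argmem=ModRefInfo.ModRef,
    inaccessiblemem=ModRefInfo.Ref,
    other=ModRefInfo.Ref,
)
```

In the flattened form the same call collapses to "ModRef on every location", which is the loss of precision the commit message describes.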
The old device runtime had a "simplified" version that prevented many of the runtime features from being initialized. The old device runtime was deleted in LLVM 14 and is no longer in use. Selectively deactivating features is now done using specific flags rather than the old technique. This patch simply removes the extra logic required for handling the old simple runtime scheme. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D133802
Previously, we linked in the ROCm device libraries which provide math and other utility functions late. This is not strictly correct as this library contains several flags that are only set per-TU, such as fast math or denormalization. This patch changes this to pass the bitcode libraries per-TU using the same method we use for the CUDA libraries. This has the advantage that we correctly propagate attributes making this implementation more correct. Additionally, many annoying unused functions were not being fully removed during LTO. This led to erroneous warning messages and remarks on unused functions. I am not sure if not finding these libraries should be a hard error. Let me know if it should be demoted to a warning saying that some device utilities will not work without them. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D133726
…ption Downstream users who don't make use of the clang cc1 frontend for command-line argument parsing won't benefit from the Marshalling-provided default initialization of the AnalyzerOptions entries. More about this later. Those analyzer option fields, as they are bitfields, cannot be default initialized at the declaration (prior to C++20), hence they are initialized in the constructor. The only problem is that `ShouldEmitErrorsOnInvalidConfigValue` was forgotten. In this patch I'm proposing to initialize that field with the rest. Note that this value is read by `CheckerRegistry.cpp:insertAndValidate()`. The analyzer options are initialized by the marshalling at `CompilerInvocation.cpp:GenerateAnalyzerArgs()` by the expansion of the `ANALYZER_OPTION_WITH_MARSHALLING` xmacro to the appropriate default value, regardless of the constructor initializer list which I'm touching. Because this only affects users using CSA as a library, I believe we cannot test this without serious effort. Reviewed By: martong Differential Revision: https://reviews.llvm.org/D133851
…ead of int64_t Only the main Presburger library under the Presburger directory has been switched to use arbitrary precision. Users have been changed to just cast returned values back to int64_t or to use newly added convenience functions that perform the same cast internally. The performance impact of this has been tested by checking test runtimes after copy-pasting 100 copies of each function. Affine/simplify-structures.mlir goes from 0.76s to 0.80s after this patch. Its performance sees no regression compared to its original performance at commit 18a06d4 before a series of patches that I landed to offset the performance overhead of switching to arbitrary precision. Affine/canonicalize.mlir and SCF/canonicalize.mlir show no noticeable difference, staying at 2.02s and about 2.35s respectively. Also, for Affine and SCF tests as a whole (no copy-pasting), the runtime remains about 0.09s on average before and after. Reviewed By: bondhugula Differential Revision: https://reviews.llvm.org/D129510
… better-suited, part 1 A simple sed doing these substitutions: - `${LLVM_BINARY_DIR}/\$\{CMAKE_CFG_INTDIR}/lib(${LLVM_LIBDIR_SUFFIX})?\>` -> `${LLVM_LIBRARY_DIR}` - `${LLVM_BINARY_DIR}/\$\{CMAKE_CFG_INTDIR}/bin\>` -> `${LLVM_TOOLS_BINARY_DIR}` where `\>` means "word boundary". The only manual modifications were reverting changes in - `compiler-rt/cmake/Modules/CompilerRTUtils.cmake` because these were "entry points" where we wanted to tread carefully not to introduce a "loop" which would end with an undefined variable being expanded to nothing. There are many more occurrences without `CMAKE_CFG_INTDIR`, but those are left for D132316 as they have proved somewhat tricky to fix. This hopefully increases readability overall, and also decreases the usages of `LLVM_LIBDIR_SUFFIX`, preparing us for D130586. Reviewed By: sebastian-ne Differential Revision: https://reviews.llvm.org/D133828
Summary: A previous patch removed the user of this function but did not remove the function causing unused function warnings. Remove it.
… Hardening is used SLH will fall back to a different technique if X16 is being used, so there is no need to warn for inline asm use. Only prevent other codegen from using it. Reviewed By: kristof.beyls Differential Revision: https://reviews.llvm.org/D133766
Handle the case where both operands are negated in matrix multiplication Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D133695
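The fold presumably rests on the identity (-A)(-B) = AB, since each term of the product picks up two sign flips. A small Python check of that identity follows; `matmul` and `negate` are helper names introduced here for illustration, not anything from the patch.

```python
def matmul(a, b):
    """Plain row-by-column product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def negate(m):
    """Element-wise negation."""
    return [[-x for x in row] for row in m]


A = [[1, -2, 3],
     [0, 4, -5]]
B = [[7, 1],
     [-3, 2],
     [6, 0]]

# Negating both operands leaves the product unchanged.
same = matmul(negate(A), negate(B)) == matmul(A, B)
```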
Tests don't work on PPC since the `return` instruction isn't called `ret` (apparently). Reviewed By: awarzynski Differential Revision: https://reviews.llvm.org/D133859
This patch adds Hermite normal form computation to Matrix. Part of this algorithm lived in LinearTransform, being used for computing column echelon form. This patch moves the implementation to Matrix::hermiteNormalForm and generalises it to compute the Hermite normal form. Reviewed By: arjunp Differential Revision: https://reviews.llvm.org/D133510
LibM implementations differ, so the folders can have different results on different platforms. For instance, the `cos` folder was failing on M1 mac. I chose to match the constant floats to 2(.5) significant digits. Reviewed By: jacquesguan Differential Revision: https://reviews.llvm.org/D133797
dependence checking, NFC. Part of D110978
This allows matching ops by additionally providing an idiomatic spec for a unique return type. Differential Revision: https://reviews.llvm.org/D133862
…it is iterative and supports output fusion. This revision revisits the implementation of `transform.fuse_into_containing_op` so that it iterates on producers one use at a time. Support is added to fuse a producer through a foreach_thread shared tensor argument, in which case we tile and fuse the op inside the containing op and update the shared tensor argument to the unique destination operand. If one cannot find such a unique destination operand the transform fails.
…g_op so it is iterative and supports output fusion." This reverts commit 54a5f60 which is a WIP that was pushed by mistake.
Previously, the iteration graph was computed without priority. This patch adds a heuristic when computing the iteration graph by starting with Reduction iterators when doing the topo sort, which makes Reduction iterators (likely) appear as late in the sorted array as possible. The current sparse compiler also failed to compile the newly added case. Reviewed By: aartbik Differential Revision: https://reviews.llvm.org/D133738
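One way to see why the starting node affects the final position is a DFS-based topological sort that reverses the post-order: whatever is visited first is emitted first and therefore ends up last after the reversal. The Python sketch below assumes such a DFS formulation and an acyclic graph; `topo_sort` and its parameters are illustrative names, not the sparse compiler's code.

```python
def topo_sort(num_nodes, edges, start_order):
    """Reverse post-order DFS. `edges` holds (before, after) pairs.

    Assumes the graph is acyclic; no cycle detection is performed.
    """
    successors = {n: [] for n in range(num_nodes)}
    for src, dst in edges:
        successors[src].append(dst)

    visited = set()
    post_order = []

    def visit(node):
        if node in visited:
            return
        visited.add(node)
        for nxt in successors[node]:
            visit(nxt)
        post_order.append(node)

    for node in start_order:
        visit(node)
    return post_order[::-1]


# Loops 0 and 2 are parallel, loop 1 is a reduction; loop 0 must precede loop 2.
edges = [(0, 2)]
reductions = {1}
all_nodes = [0, 1, 2]

plain = topo_sort(3, edges, all_nodes)
# Heuristic: start the DFS from the reduction iterators.
prioritized = topo_sort(
    3, edges,
    [n for n in all_nodes if n in reductions]
    + [n for n in all_nodes if n not in reductions],
)
```

With this input, `plain` places the reduction loop first and `prioritized` places it last, while both orders respect the single edge.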
Adds the accessor methods for I[2|4|16] types to the Builder. Differential Revision: https://reviews.llvm.org/D133793
…f lld is presented" This reverts commit 44075cc. Broke check-clang, see comments on https://reviews.llvm.org/D133841
This reverts commit a0fb69d. This broke the windows lldb bot: https://lab.llvm.org/buildbot/#/builders/83/builds/23666
A previous patch (https://reviews.llvm.org/D132810) introduced a test that fails on systems where the linker executable (`ld`) has a `.exe` extension. This patch updates the regex in the test so that lit can look for both `ld` as well as `ld.exe`. Reviewed By: stella.stamenova Differential Revision: https://reviews.llvm.org/D133773
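The kind of pattern change described can be sketched with an optional `.exe` suffix. The actual regex in the test is not shown here, so the pattern below is a hypothetical stand-in written with Python's `re` module rather than lit/FileCheck syntax.

```python
import re

# Hypothetical pattern: accept the linker name with or without ".exe".
LINKER_NAME = re.compile(r"ld(\.exe)?")


def is_linker_name(name):
    """True only if the whole name is `ld` or `ld.exe`."""
    return LINKER_NAME.fullmatch(name) is not None
```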
In non-pie binaries BOLT unconditionally converted type encoding from indirect to absptr, which broke std exceptions since pointers to their typeinfo were only assigned at runtime in .data section. In this patch we preserve original encoding so that indirect remains indirect and can be resolved at runtime, and absolute remains absolute. Reviewed By: rafauler, maksfb Differential Revision: https://reviews.llvm.org/D132484
This reverts commit 6bf6730. Breaks tests if LLD isn't being built, see comments on https://reviews.llvm.org/D133092
Reduce the number of subintervals that need a lookup table and optimize the evaluation steps.

Currently, `exp2f` is computed by reducing to `2^hi * 2^mid * 2^lo` where `-16/32 <= mid <= 15/32` and `-1/64 <= lo <= 1/64`, and `2^lo` is then approximated by a degree 6 polynomial.

Experiments with Sollya showed that by using a degree 6 polynomial, we can approximate `2^lo` for a bigger range with reasonable errors:

```
> P = fpminimax((2^x - 1)/x, 5, [|D...|], [-1/64, 1/64]);
> dirtyinfnorm(2^x - 1 - x*P, [-1/64, 1/64]);
0x1.e18a1bc09114def49eb851655e2e5c4dd08075ac2p-63
> P = fpminimax((2^x - 1)/x, 5, [|D...|], [-1/32, 1/32]);
> dirtyinfnorm(2^x - 1 - x*P, [-1/32, 1/32]);
0x1.05627b6ed48ca417fe53e3495f7df4baf84a05e2ap-56
```

So we can optimize the implementation a bit with:
# Reduce the range to `mid = i/16` for `i = 0..15` and `-1/32 <= lo <= 1/32`
# Store the table `2^mid` in bits, and add `hi` directly to its exponent field to compute `2^hi * 2^mid`
# Rearrange the order of evaluating the polynomial approximating `2^lo`.
Performance benchmark using perf tool from the CORE-MATH project on Ryzen 1700:

```
$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh exp2f
GNU libc version: 2.35
GNU libc release: stable
CORE-MATH reciprocal throughput   : 9.534
System LIBC reciprocal throughput : 6.229

BEFORE:
LIBC reciprocal throughput        : 21.405
LIBC reciprocal throughput        : 15.241 (with `-msse4.2` flag)
LIBC reciprocal throughput        : 11.111 (with `-mfma` flag)

AFTER:
LIBC reciprocal throughput        : 18.617
LIBC reciprocal throughput        : 12.852 (with `-msse4.2` flag)
LIBC reciprocal throughput        : 9.253 (with `-mfma` flag)

$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh exp2f --latency
GNU libc version: 2.35
GNU libc release: stable
CORE-MATH latency   : 40.869
System LIBC latency : 30.580

BEFORE
LIBC latency        : 64.888
LIBC latency        : 61.027 (with `-msse4.2` flag)
LIBC latency        : 48.778 (with `-mfma` flag)

AFTER
LIBC latency        : 48.803
LIBC latency        : 45.047 (with `-msse4.2` flag)
LIBC latency        : 37.487 (with `-mfma` flag)
```

Reviewed By: sivachandra, orex
Differential Revision: https://reviews.llvm.org/D133870
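The new `exp2f` range reduction (`mid = i/16`, `-1/32 <= lo <= 1/32`) can be illustrated with a short Python sketch in double precision. Two things here are assumptions made for illustration: the polynomial is a plain degree 6 Taylor expansion rather than the Sollya minimax polynomial, and `math.ldexp` stands in for adding `hi` to the exponent bits of the table entry. `exp2_sketch` and `TABLE` are hypothetical names.

```python
import math

# 2^(i/16) for i = 0..15, the 16-entry table from the description.
TABLE = [2.0 ** (i / 16) for i in range(16)]


def exp2_sketch(x):
    """Approximate 2^x as 2^hi * 2^(i/16) * 2^lo with |lo| <= 1/32."""
    k = round(x * 16)        # nearest multiple of 1/16, as an integer count
    lo = x - k / 16          # remainder, |lo| <= 1/32
    hi, i = divmod(k, 16)    # k = 16 * hi + i with 0 <= i <= 15

    # Degree 6 Taylor polynomial of 2^lo = exp(lo * ln 2), in Horner form.
    # (Assumption: the patch uses a minimax polynomial instead.)
    t = lo * math.log(2.0)
    p = 1 + t * (1 + t / 2 * (1 + t / 3 * (1 + t / 4 * (1 + t / 5 * (1 + t / 6)))))

    # Scaling by 2^hi only changes the exponent of the result.
    return math.ldexp(TABLE[i] * p, hi)
```

Python's floor-style `divmod` keeps `i` in `0..15` for negative inputs as well, which matches the `i = 0..15` table indexing.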
This is in preparation for adding more gmodules tests. Differential Revision: https://reviews.llvm.org/D133876
When .gnu.version_r is empty (allowed by readelf but warned by objdump), llvm-objdump -p may decode the next section as .gnu.version_r and may crash due to out-of-bounds C string reference. ELFFile<ELFT>::getVersionDependencies handles 0-entry .gnu.version_r gracefully. Just use it. Fix #57707 Differential Revision: https://reviews.llvm.org/D133751
Get some load-store forwarding cases for big-endian where a larger store covers a smaller load, and the offset would be 0 and handled on little-endian but on big-endian the offset is adjusted to be non-zero. The idea is just to shift the data to make it look like the offset 0 case. Differential Revision: https://reviews.llvm.org/D130115
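The shift idea can be modeled with plain integers: on little-endian, a narrower load at the store's address is a truncation of the stored value, while on big-endian the same load sees the high-order bytes, so the stored value is shifted right first. The Python sketch below is an illustration under that model; `forward_load` and `reference_load` are hypothetical names, not code from the patch.

```python
def forward_load(stored, store_bytes, load_bytes, big_endian):
    """Value seen by a `load_bytes`-wide load at the address of a wider store."""
    # Big-endian: the load covers the most significant bytes, so shift them down.
    shift = 8 * (store_bytes - load_bytes) if big_endian else 0
    mask = (1 << (8 * load_bytes)) - 1
    return (stored >> shift) & mask


def reference_load(stored, store_bytes, load_bytes, big_endian):
    """Same result, computed by laying the value out in memory explicitly."""
    order = "big" if big_endian else "little"
    memory = stored.to_bytes(store_bytes, order)
    return int.from_bytes(memory[:load_bytes], order)
```

For a 4-byte store of `0x11223344` followed by a 2-byte load at the same address, this yields `0x3344` on little-endian and `0x1122` on big-endian.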
… better-suited, part 2 A simple sed doing these substitutions: - `${LLVM_BINARY_DIR}/lib${LLVM_LIBDIR_SUFFIX}\>` -> `${LLVM_LIBRARY_DIR}` - `${LLVM_BINARY_DIR}/bin\>` -> `${LLVM_TOOLS_BINARY_DIR}` where `\>` means "word boundary". The only manual modifications were reverting changes in - `runtimes/CMakeLists.txt` because these were "entry points" where we wanted to tread carefully not to introduce a "loop" which would end with an undefined variable being expanded to nothing. There are some `${LLVM_BINARY_DIR}/lib` without the `${LLVM_LIBDIR_SUFFIX}`, but these refer to the lib subdirectory of the source (`llvm/lib`). That `lib` is automatically appended to make the local `CMAKE_CURRENT_BINARY_DIR` value by `add_subdirectory`; since the directory name in the source tree is fixed without any suffix, the corresponding `CMAKE_CURRENT_BINARY_DIR` will also be. We therefore do not replace it but leave it as-is. This picks up where D133828 left off, getting the occurrences with*out* `CMAKE_CFG_INTDIR`. But this is difficult to do correctly and so not done in the (retroactively) previous diff. This hopefully increases readability overall, and also decreases the usages of `LLVM_LIBDIR_SUFFIX`, preparing us for D130586. Reviewed By: sebastian-ne Differential Revision: https://reviews.llvm.org/D132316
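The role of the word boundary in the `bin` substitution can be shown with Python's `re` module, where `\b` plays the part of sed's `\>` at this position (the preceding character is a word character, so both match only when the word ends there). This is a sketch of the substitution's effect, not the command that was run; `rewrite_bin_dir` is a hypothetical helper.

```python
import re

# `${LLVM_BINARY_DIR}/bin` followed by a word boundary.
BIN_DIR = re.compile(r"\$\{LLVM_BINARY_DIR\}/bin\b")


def rewrite_bin_dir(line):
    """Replace the hand-built bin path with the dedicated variable."""
    return BIN_DIR.sub("${LLVM_TOOLS_BINARY_DIR}", line)
```

The boundary keeps a longer path component such as `binutils` from being rewritten by accident.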
Enable the -Wsizeof-array-div and -Wsizeof-pointer-div compiler warnings. Also, replace -Wmemset-transposed-args with -Wsuspicious-memaccess. The latter automatically enables the former and a few other warnings. Differential Revision: https://reviews.llvm.org/D133783
The Windows build requires brackets on switch cases that initialize variables. Reviewed By: hanchung Differential Revision: https://reviews.llvm.org/D133889
This patch adds the `llvm.intr.lifetime.start` and `llvm.intr.lifetime.end` intrinsics which are used to indicate to LLVM the lifetimes of allocated memory. These ops have the requirement that the first argument (the size) be an "immediate argument". I added an OpTrait to check this, but it is possible that an approach like GEPArg would work too. Reviewed By: rriddle, dcaballe Differential Revision: https://reviews.llvm.org/D133867
Restore GlobalsAA if sanitizers inserted at early optimize callback. The analysis can be useful for the following FunctionPassManager. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D133537
libdispatch uses its own heap (_dispatch_main_heap) for some allocations, including the dispatch_continuation_t that holds a dispatch source's event handler. Objective-C block trampolines (creating methods at runtime with a block as the implementations) use the VM_MEMORY_FOUNDATION region (see https://github.com/apple-oss-distributions/objc4/blob/8701d5672d3fd3cd817aeb84db1077aafe1a1604/runtime/objc-block-trampolines.mm#L371). This change scans both regions to fix false positives. See tests for details; unfortunately I was unable to reduce the trampoline example with imp_implementationWithBlock on a new class, so I'm resorting to something close to the bug as seen in the wild. Differential Revision: https://reviews.llvm.org/D129385
-opt-bisect-print-ir-path=foo will dump the IR to foo when opt-bisect-limit starts skipping passes. Currently we don't print the IR if the opt-bisect-limit is higher than the total number of times opt-bisect is called. This makes getting the IR right before a bad transform easier. Reviewed By: hans Differential Revision: https://reviews.llvm.org/D133809
Reviewed By: michaelrj Differential Revision: https://reviews.llvm.org/D133888
…erRead vector.transfer_read ops with a minor identity indexing map are rank-reducing, with implicit leading unit dimensions. This should be a natural extension to support in addition to full identity indexing maps. Reviewed By: ThomasRaoux Differential Revision: https://reviews.llvm.org/D133883
This patch fixes three warnings of the form: mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp:1436:5: error: default label in switch which covers all enumeration values [-Werror,-Wcovered-switch-default]
Found this when adding verifier rules. The case which arises is that we have a DefMBBI which has a VecPolicy operand. The code was not expecting this, and the unconditional copy of the last two operands resulted in the SEW and VecPolicy fields being added to the VMV_V_V as AVL and SEW respectively. Oddly, this appears to be silent in practice. There's no test change despite verifier changes proving that we definitely hit this in existing tests. Differential Revision: https://reviews.llvm.org/D133868
Copy the asserts from the printing code, and turn them into actual verifier rules. Doing this revealed an existing bug - see 0a14551. Differential Revision: https://reviews.llvm.org/D133869
…parsing commands."" This reverts commit ac05bc0. I had incorrectly removed one set of checks in the option handling in Options::ParseAlias because I couldn't see what it is for. It was a bit obscure, but it handled the case where you pass "-something=other --" as the input_line, which caused the built-in "run" alias not to return the right value for IsDashDashCommand, causing TestHelp.py to fail.
I used RV32 so I didn't have to write RV32I and RV32E. Ideally these builtins will be wrapped in a header someday so long term I don't expect users to see these errors. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D133444
According to logs, ClInstrumentationWithCallThreshold is a workaround for a slow backend with a large number of basic blocks. However, I can't reproduce that one, but I see a significant slowdown after ClCheckConstantShadow. Without ClInstrumentationWithCallThreshold the compiler is able to eliminate many of the branches. So maybe we should drop ClInstrumentationWithCallThreshold completely. For now I just change the logic to ignore constant shadow so it will not trigger the callback fallback too early. Reviewed By: kstoimenov Differential Revision: https://reviews.llvm.org/D133880
Split these changes out from https://reviews.llvm.org/D133780.
This is what ld64 does (though it doesn't use ICF to do this; instead it always dedups selrefs by default). We'll want to dedup implicitly-defined selrefs as well, but I will leave that for future work. Additionally, I'm not *super* happy with the current LLD implementation because I think it is rather janky and inefficient. But at least it moves us toward the goal of closing the size gap with ld64. I've described ideas for cleaning up our implementation here: #57714 Differential Revision: https://reviews.llvm.org/D133780
Pre-commit for D133898 Reviewed By: reames Differential Revision: https://reviews.llvm.org/D133899
IntrArgMemOnly is only valid for intrinsics that use a scalar pointer argument. These intrinsics use a vector of pointer. Alias analysis will try to find a scalar pointer argument and will return incorrect alias results when it doesn't find one. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D133898
The comment was referring to a now nonexistent function that was removed in 93e3cf0. Differential Revision: https://reviews.llvm.org/D133098
See Commits and Changes for more details.
Created by pull[bot]