forked from llvm/llvm-project
Merge upstream/main into amd-trunk-dev (17.12.2024) #232
Merged
I am trying to switch to keeping the reduction value in a temporary scalar location so that I can use hlfir::genLoopNest easily. This also allows using omp.loop_nest with worksharing for OpenMP.
This is an NFC change. Remove the duplicated test line in the gfx11/gfx12 vop1 test file using the latest update_mc_test_script.py --unique option. This also prepares for the upcoming true16 change.
) This change affects non-relocation mode only. Prior to having the CheckLargeFunctions pass, we could have emitted code for functions that were discarded at the end due to size limitations. Since we didn't know at the time of emission whether the code would be discarded, we had to emit jump tables in separate sections and handle them separately. However, now we always run CheckLargeFunctions and make sure all emitted code is used. Thus, we can get rid of the special jump table handling.
…lvm#119776) Mask-only instructions like vmand and vmsbf should always have 0 for their Log2SEW operand. Non-mask instructions should only have 3, 4, 5, or 6 for their Log2SEW operand. Split the operand type so we can verify these cases separately. I had to fix the SEW for the whole register move to vmv.v.v copy optimization and update an MIR test. The vmv.v.v change isn't functional since we have already done vsetvli insertion before and nothing else uses the field after copy expansion. I can split these changes off if desired.
…97158) Generalize hoistCommonCodeFromSuccessors's `EqTermsOnly` to `AllInstsEqOnly` and always allow hoisting if all instructions match. In that case, all instructions can be hoisted and the original branch will be replaced and selects for PHIs are added. This allows preserving metadata in more cases, using the existing hoisting logic, whereas previously FoldTwoEntryPHINode would drop the metadata. https://llvm-compile-time-tracker.com/compare.php?from=716360367fbdabac2c374c19b8746f4de49a5599&to=986b2c47df516b31d998c055400e4f62aa76edc6&stat=instructions:u PR: llvm#97158
llvm#119920) This refactoring will allow making this function weak later on so that it can be overloaded by a client. See llvm#119242.
…lvm#119919) Fix the condition so the implicit device data attribute is not applied when the routine has `attribute(host)`
This is another new clause specific to 'exit data' that takes a pointer argument. This patch implements this the same way we do a few other clauses (like attach) that have the same restrictions.
In these tests, we just want to add one instance of IndexedMemProfRecord to MemProfData.Records and retrieve it from MemProfReader. There is no particular reason to associate F1.hash() with the IndexedMemProfRecord instance. A fake value suffices. While I am at it, I'm switching to try_emplace so that I can move FakeRecord.
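The move into `try_emplace` described above can be sketched outside of LLVM. This is a minimal illustration only: `Record`, `addFakeRecord`, and the key `0x1234` are hypothetical stand-ins, and `std::map` is assumed here in place of whatever container `MemProfData.Records` actually is.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for IndexedMemProfRecord; not the LLVM type.
struct Record {
  std::vector<std::string> CallStacks;
};

// Inserts FakeRecord under an arbitrary fake key (no real function hash is
// needed) and reports whether the insertion actually happened.
inline bool addFakeRecord(std::map<std::uint64_t, Record> &Records,
                          Record &&FakeRecord) {
  constexpr std::uint64_t FakeKey = 0x1234; // assumed value, any key works
  // try_emplace constructs the mapped value from the moved argument only if
  // the key is absent; otherwise the argument is left untouched.
  return Records.try_emplace(FakeKey, std::move(FakeRecord)).second;
}
```

One reason to prefer `try_emplace` over `insert` with a moved value is that the standard guarantees the argument is not moved from when the key already exists.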
…ate (NFC) (llvm#119831) This patch makes the following functions private:

- InstrProfWriter::addMemProfRecord
- InstrProfWriter::addMemProfFrame
- InstrProfWriter::addMemProfCallStack

These days, we add MemProf profile to the writer context via addMemProfData. We no longer add individual items.
Strip hash_value() for CmpPredicate, as different callers have different hashing use-cases. In this case, there is just one caller, namely EarlyCSE, which calls hash_combine() on a CmpPredicate, which used to call hash_combine() on a CmpInst::Predicate prior to 4a0d53a (PatternMatch: migrate to CmpPredicate). This has uncovered a bug where two icmp instructions differing in just the fact that one of them has the samesign flag on it are hashed differently, leading to divergent hashing, and a crash. Fix this crash by dropping samesign information on icmp instructions before hashing them, preserving the former behavior. Fixes llvm#119893.
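The underlying rule is that a hash must ignore whatever the matching equality check ignores. A simplified sketch, assuming (as the crash description suggests) that the CSE-style equality treats two compares differing only in samesign as equal; `Pred`, `CmpPred`, `isEqualForCSE`, and `hashForCSE` are hypothetical names, not LLVM's API:

```cpp
#include <cstddef>
#include <functional>

// Hypothetical, simplified model of a compare predicate carrying a samesign
// flag. This is not LLVM's CmpPredicate.
enum class Pred { EQ, NE, ULT, SLT };

struct CmpPred {
  Pred P;
  bool SameSign;
};

// Equality as the assumed CSE-like client sees it: samesign is ignored.
inline bool isEqualForCSE(const CmpPred &A, const CmpPred &B) {
  return A.P == B.P;
}

// The hash drops samesign too, so values that compare equal hash equally.
inline std::size_t hashForCSE(const CmpPred &C) {
  return std::hash<int>{}(static_cast<int>(C.P));
}
```

If the hash mixed in `SameSign` while equality ignored it, equal keys could land in different buckets, which is the kind of divergence the commit describes.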
…VALUE (llvm#119927) Dummy arguments with the VALUE attribute do not need the implicit data attribute.
…ing in client code (llvm#119242)
…m#119642) Update integer range narrowing to handle negative values. The previous restriction to only narrowing known-non-negative values wasn't needed, as both the signed and unsigned ranges represent bounds on the values of each variable in the program ... except that one might be more accurate than the other. So, if either the signed or unsigned interpretation of the inputs and outputs allows for integer narrowing, the narrowing is permitted. This commit also updates the integer optimization rewrites to preserve the state of constant-like operations and those that are narrowed so that rewrites of other operations don't lose that range information.
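The "either interpretation suffices" rule can be sketched in isolation. This is an illustration under stated assumptions, not the MLIR implementation: `IntRange`, `fitsSigned`, `fitsUnsigned`, and `canNarrow` are hypothetical names, values are modeled as 64-bit, and the target width is assumed to be between 1 and 63 bits.

```cpp
#include <cstdint>

// Hypothetical range record: inclusive bounds on one 64-bit value under both
// the signed and the unsigned interpretation.
struct IntRange {
  std::int64_t SMin, SMax;
  std::uint64_t UMin, UMax;
};

// True if the signed bounds fit in a signed integer of `Bits` bits.
// Assumes 1 <= Bits <= 63.
inline bool fitsSigned(const IntRange &R, unsigned Bits) {
  const std::int64_t Lo = -(std::int64_t{1} << (Bits - 1));
  const std::int64_t Hi = (std::int64_t{1} << (Bits - 1)) - 1;
  return R.SMin >= Lo && R.SMax <= Hi;
}

// True if the unsigned bounds fit in an unsigned integer of `Bits` bits.
// Assumes 1 <= Bits <= 63.
inline bool fitsUnsigned(const IntRange &R, unsigned Bits) {
  const std::uint64_t Hi = (std::uint64_t{1} << Bits) - 1;
  return R.UMax <= Hi;
}

// Narrowing is allowed when either interpretation proves the value fits.
inline bool canNarrow(const IntRange &R, unsigned Bits) {
  return fitsSigned(R, Bits) || fitsUnsigned(R, Bits);
}
```

For a value in [-5, 3], the unsigned bounds span the whole 64-bit range, yet the signed bounds still permit narrowing to 8 bits; for a value in [200, 250], only the unsigned bounds do.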
…119759) Currently, when dumping the contents of a GSYM there are three issues:

- Callsite information is not displayed for merged functions - this is because of a bug in `CallSiteInfoLoader::buildFunctionMap` where when enumerating through `Func.MergedFunctions` - we enumerate by value instead of by reference.
- There is no variable indent for printing callsite info - meaning that when printing callsites for merged functions, the indent will be different than the other info of the merged function. To address this we add configurable indent for printing callsite info.
- Callsite info is printed right after merged function info. Meaning that if the merged function also has call site information, the parent's callsite info will appear right after the merged function's callsite info - leading to confusion. To address this we print the callsite info first, then the merged functions info.

This change addresses all the above 3 issues.

Example of old vs new:

<img width="1074" alt="image" src="https://github.com/user-attachments/assets/d039ad69-fa79-4abb-9816-eda9cc2eda53" />
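The by-value versus by-reference issue in the first bullet can be sketched as follows. This is a simplified model and not the GSYM code: `FunctionInfo` and `FunctionMap` are hypothetical types, and the map is assumed to store pointers to the enumerated entries.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified function record with nested merged functions.
struct FunctionInfo {
  std::string Name;
  std::vector<FunctionInfo> MergedFunctions;
};

using FunctionMap = std::unordered_map<std::string, const FunctionInfo *>;

// Records a pointer to Func and, recursively, to each merged function.
inline void buildFunctionMap(const FunctionInfo &Func, FunctionMap &Map) {
  Map[Func.Name] = &Func;
  // Iterating with `const FunctionInfo &` binds to the stored element.
  // Iterating with a plain `FunctionInfo` would copy each element, and any
  // pointer taken to that copy would dangle once the iteration ends.
  for (const FunctionInfo &Merged : Func.MergedFunctions)
    buildFunctionMap(Merged, Map);
}
```

With the reference loop, every pointer in the map refers to an element owned by the original structure, so it stays valid for as long as that structure does.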
This was preventing the containers from being pushed to the registry.
The windows container push was not tested in the pull request and had a couple of typos that prevented it from functioning. This patch fixes that so we can actually push the container to GHCR.
and forward it to LinkerDriver's ctor so that some uses of the global `config` can be dropped. This is similar to how the ELF port migrates away from the global `config`. Pull Request: llvm#119829
…lvm#119687)" Causes bot failure: https://lab.llvm.org/buildbot/#/builders/55/builds/4246/steps/11/logs/stdio This reverts commit 7a64855.
Around shifting negative values.
This reverts commit 49c2207. This breaks on big-endian, again: https://lab.llvm.org/buildbot/#/builders/154/builds/9018
…#119938) Uppercase each word in title and toctree _Originally posted by @nicovank in llvm#119842 (comment). --------- Co-authored-by: Nicolas van Kempen <nvankemp@gmail.com>
Reverts llvm#118734. There are currently some specific versions of MSVC that are miscompiling this code (we think). We don't know why, as all the other build bots and at least some folks' local Windows builds work fine. This is a candidate revert to help the relevant folks catch their builders up and have time to debug the issue. However, the expectation is to roll forward at some point with a workaround if at all possible.
This patch essentially replaces:

    std::pair<const std::vector<Frame> *, unsigned>

with:

    ArrayRef<Frame>

This way, we can store and pass ArrayRef<Frame>, conceptually one item, instead of the pointer and index. The only problem is that we don't have an existing hash function for ArrayRef<Frame>, so we provide a custom one, namely CallStackHash.
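A rough sketch of the idea, using a small hand-written view type in place of `ArrayRef<Frame>`. Everything here is an assumption for illustration: the simplified `Frame`, the `FrameView` and `suffix` helpers, the guess that the old index marked where the relevant frames begin, and the hash-combining scheme, which is not the one LLVM uses.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical, simplified frame; the real Frame has more fields.
struct Frame {
  std::uint64_t Function;
  std::uint32_t Line;
};

// Minimal non-owning view that plays the role of ArrayRef<Frame> here.
struct FrameView {
  const Frame *Data = nullptr;
  std::size_t Size = 0;
};

// Turns a (vector, start index) pair into a single view object.
// The caller must ensure Begin <= Frames.size().
inline FrameView suffix(const std::vector<Frame> &Frames, std::size_t Begin) {
  return FrameView{Frames.data() + Begin, Frames.size() - Begin};
}

// Custom hasher over the viewed contents, so that views with equal frames
// hash equally regardless of which vector backs them.
struct CallStackHash {
  std::size_t operator()(FrameView V) const {
    std::size_t H = 0;
    for (std::size_t I = 0; I < V.Size; ++I) {
      H = H * 31 + std::hash<std::uint64_t>{}(V.Data[I].Function);
      H = H * 31 + std::hash<std::uint32_t>{}(V.Data[I].Line);
    }
    return H;
  }
};
```

Hashing the contents rather than the pointer is the design choice that lets one view object stand in for the pointer and index together.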
…19957) The macho-gsym-merged-callsites-dsym is failing on some hosts. Disabling for now while we come up with a fix.
This patch sets the default user in the linux CI container to a non-root user, which enables properly testing a couple of features, particularly in libcxx.
- Put the element size field in the same place for all non-pointer types.
- Put the element size and address space fields in the same place for all pointer types.
- Put the number of elements and scalable fields in the same place for all vector types.

This simplifies initialization and accessor methods isScalable, getElementCount, getScalarSizeInBits and getAddressSpace.
FreeListHeap uses the _end symbol, which conflicts with the _end symbol defined by GPU start.cpp files, so for now we exclude the test and the fuzzer on GPU.
This patch adds a Github Actions workflow for Linux premerge. This currently just calls into the existing CI scripts as a starting point.
…vm#118549) This patch implements the following intrinsics:

Multi-vector 8-bit floating-point multiply-add long.

``` c
// Only if __ARM_FEATURE_SME_F8F16 != 0
void svmla_lane_za16[_mf8]_vg2x1_fpm(uint32_t slice, svmfloat8_t zn,
                                     svmfloat8_t zm, uint64_t imm_idx,
                                     fpm_t fpm) __arm_streaming __arm_inout("za");
void svmla_lane_za16[_mf8]_vg2x2_fpm(uint32_t slice, svmfloat8x2_t zn,
                                     svmfloat8_t zm, uint64_t imm_idx,
                                     fpm_t fpm) __arm_streaming __arm_inout("za");
void svmla_lane_za16[_mf8]_vg2x4_fpm(uint32_t slice, svmfloat8x4_t zn,
                                     svmfloat8_t zm, uint64_t imm_idx,
                                     fpm_t fpm) __arm_streaming __arm_inout("za");

// Only if __ARM_FEATURE_SME_F8F32 != 0
void svmla_lane_za32[_mf8]_vg4x1_fpm(uint32_t slice, svmfloat8_t zn,
                                     svmfloat8_t zm, uint64_t imm_idx,
                                     fpm_t fpm) __arm_streaming __arm_inout("za");
void svmla_lane_za32[_mf8]_vg4x2_fpm(uint32_t slice, svmfloat8x2_t zn,
                                     svmfloat8_t zm, uint64_t imm_idx,
                                     fpm_t fpm) __arm_streaming __arm_inout("za");
void svmla_lane_za32[_mf8]_vg4x4_fpm(uint32_t slice, svmfloat8x4_t zn,
                                     svmfloat8_t zm, uint64_t imm_idx,
                                     fpm_t fpm) __arm_streaming __arm_inout("za");
```

In accordance with: ARM-software/acle#323
…vm#118624)

Summary: This and previously extracted `CloneFunction*Into` functions will be used in later diffs.

Test Plan: ninja check-llvm-unit check-llvm
I noticed this while working on something else, these are supposed to be privately inherited.
…base API (llvm#115752) This patch reimplements the locale base support for Windows flavors in a way that is more modules-friendly and without defining non-internal names. Since this changes the name of some types and entry points in the built library, this is effectively an ABI break on Windows (which is acceptable after checking with the Windows/libc++ maintainers).
Instead of storing an auxiliary structure with the information from the DXIL resource target extension types duplicated, access the information that we can via the type itself. This also means we need to handle some of the target extension types we haven't fully defined yet, like Texture and CBuffer. For now we make an educated guess as to what those should look like based on llvm/wg-hlsl#76, and we can update them fairly easily when we've defined them more thoroughly. First part of llvm#118400
This patch enables the new premerge workflow postcommit so that we can start testing it at a reasonable scale with minimal disruption.
Creates a new toctree "Support" under which we have distinct links to arch, platform, and compiler support.

* Moved "Platform Support" from index landing page to new doc.
* Created explicit "Architecture Support". Requested in llvm#118964 (comment).
* Moved "Compiler Support" from Status toctree to new Support toctree.

---------

Co-authored-by: Carlo Cabrera <github@carlo.cab>
This is causing miscompiles in SPEC2017 on AArch64 after b3cba9b.
Update VPReductionPHIRecipe::execute to use the start value from the start value operand of the recipe. This is needed to make sure we resume from the correct value during epilogue vectorization. At the moment, the start value is set to the sentinel value in adjustRecipesForReductions, as the original start value needs to be used when creating ResumePhi recipes. Fixes a mis-compile introduced by b3cba9b in SPEC2017 on AArch64.
Fix issue introduced by llvm#118839.
Resolves llvm#99161

- [x] Implement `WaveActiveAllTrue` clang builtin,
- [x] Link `WaveActiveAllTrue` clang builtin with `hlsl_intrinsics.h`
- [x] Add sema checks for `WaveActiveAllTrue` to `CheckHLSLBuiltinFunctionCall` in `SemaChecking.cpp`
- [x] Add codegen for `WaveActiveAllTrue` to `EmitHLSLBuiltinExpr` in `CGBuiltin.cpp`
- [x] Add codegen tests to `clang/test/CodeGenHLSL/builtins/WaveActiveAllTrue.hlsl`
- [x] Add sema tests to `clang/test/SemaHLSL/BuiltIns/WaveActiveAllTrue-errors.hlsl`
- [x] Create the `int_dx_WaveActiveAllTrue` intrinsic in `IntrinsicsDirectX.td`
- [x] Create the `DXILOpMapping` of `int_dx_WaveActiveAllTrue` to `114` in `DXIL.td`
- [x] Create the `WaveActiveAllTrue.ll` and `WaveActiveAllTrue_errors.ll` tests in `llvm/test/CodeGen/DirectX/`
- [x] Create the `int_spv_WaveActiveAllTrue` intrinsic in `IntrinsicsSPIRV.td`
- [x] In SPIRVInstructionSelector.cpp create the `WaveActiveAllTrue` lowering and map it to `int_spv_WaveActiveAllTrue` in `SPIRVInstructionSelector::selectIntrinsic`.
- [x] Create SPIR-V backend test case in `llvm/test/CodeGen/SPIRV/hlsl-intrinsics/WaveActiveAllTrue.ll`
…nt and use it as linkLibs for ModuleToObject (llvm#120116) This change allows exposing, through an interface, attributes that wrap content as external resources. The usage inside ModuleToObject shows how we will be able to provide runtime libraries without relying on the filesystem.
…lvm#117487) Essentially, this makes this ill-formed:

```c++
using mat4 = _BitInt(12) [[clang::matrix_type(3, 3)]];
```

This matches preexisting behaviour for vector types (e.g. `ext_vector_type`), and given that LLVM IR intrinsics for matrices also take vector types, it seems like a sensible thing to do. This is currently especially problematic since we sometimes lower matrix types to LLVM array types instead, and while e.g. `[4 x i32]` and `<4 x i32>` *probably* have the same or a similar memory layout (though I don’t think it’s sound to rely on that either, see llvm#117486), `[4 x i12]` and `<4 x i12>` definitely don’t.
/llvm-project/clang/lib/CodeGen/CGBuiltin.cpp:19441:17: error: unused variable 'Ty' [-Werror,-Wunused-variable]
    llvm::Type *Ty = Op->getType();
                ^
1 error generated.
According to https://docs.github.com/en/rest/using-the-rest-api/github-event-types?apiVersion=2022-11-28, when we look at the push event payload, github.event.push.head is a string containing the SHA. This is currently causing new commits on main to cancel the premerge pipeline of older commits.
Remove unused collection of context size information that was likely leftover from debugging / testing.
…m#120039) VPInstruction has a definition of mayWriteToMemory, which seems to only be used by VPlanSLP. However VPInstructions are already handled in VPRecipeBase::mayWriteToMemory, and everywhere else seems to use this definition. I think these should be the same for all intents and purposes. The VPRecipeBase definition is more conservative but returns true for stores/calls/invokes/SLPStores.
This follows the GCC behavior of allowing a trailing immediate that is ignored by the assembler.
Co-authored-by: Tobias Gysi <tobias.gysi@nextsilicon.com>
Fix bazel build after llvm#120116
No description provided.