merge main into amd-staging #481
Merged
z1-cciauto
merged 62 commits into
amd-staging
from
amd/merge/upstream_merge_20251103184739
Nov 4, 2025
Upstream the FPToFP Builtin CeilOp
…lvm#165731) Use a flag to determine whether this macro should be set when initializing the preprocessor. This macro was added to the driver in 9d117e7 because it can be conditionally disabled, but before that, the flag gating this behavior had been removed in b5b622a under the assumption that it wasn't conditional. This patch connects the macro to the preexisting flag.
…tweak (llvm#163726) Prevents the tweak from splitting **qualified names** (e.g., `foo::Type`) by incorrectly inserting a space around the scope resolution (`::`).

**Before:**
```cpp
// input: virtual foo::Type::func() = 0
// output: foo :: Type :: func()
```

**After:**
```cpp
// input: virtual foo::Type::func() = 0
// output: foo::Type::func()
```
The code to apply relocations was sometimes creating unaligned destination pointers. Instead of giving them an explicit type (i.e. `uint64_t *`) and forcing the compiler to generate unaligned stores, mark the pointer as `void *`. The compiler will figure out the correct series of store instructions.
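The idea can be sketched as follows. This is a minimal illustration and not the patched code: `writeReloc64` and `readReloc64` are hypothetical names, and the use of `std::memcpy` is an assumption about how a store through an untyped `void *` destination might be written so that it stays valid at any alignment.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: store a 64-bit relocation value at an address that
// may not be 8-byte aligned. Writing through a uint64_t* would be undefined
// behavior on a misaligned address; memcpy has no alignment requirement, so
// the compiler is free to pick a store sequence that is valid for the target.
inline void writeReloc64(void *Dest, std::uint64_t Value) {
  assert(Dest && "destination must not be null");
  std::memcpy(Dest, &Value, sizeof(Value));
}

// Companion helper (also hypothetical) that reads the value back the same way.
inline std::uint64_t readReloc64(const void *Src) {
  assert(Src && "source must not be null");
  std::uint64_t Value;
  std::memcpy(&Value, Src, sizeof(Value));
  return Value;
}
```

Calling `writeReloc64(Buf + 1, V)` on a byte buffer touches exactly the eight bytes starting at offset 1, regardless of how `Buf` itself is aligned.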
…#166055) Report a diagnostic in case vector_size or ext_vector_type attributes are used with a negative size. The same evaluation result can be used for other checks, for example, the too big a size. Issue llvm#165463
We had a couple in the llvm/actions repository that were pinned to main. Pin them to the latest SHA in main to keep them consistent with everything else. This also ensures we are compliant with our own CI best practices and cleans up the remaining CodeQL findings for this specific issue.
8dae17b refactors basic_string for more code reuse. This makes sense in most cases, but has performance overhead in the case of ~basic_string. The refactoring of ~basic_string to call __reset_internal_buffer() added a redundant (inside the destructor) reset of the object, which the optimizer is unable to optimize away in many cases. This patch prevents a ~1% regression we observed on an internal workload when applying the original refactoring. This does slightly pessimize the code readability, but I think this change is worth it given the performance impact. I'm hoping to add one or more benchmarks to the upstream libc++ benchmark suite around string construction/destruction to ensure that this case does not regress, as it seems common in real-world applications. I will put up a separate PR for that when I figure out a reasonable way to write it.
Nikita Popov reported an issue ([1]) where a dangling weak symbol __bpf_trap is in the final binary, which caused libbpf to fail as shown below:

    $ veristat -v ./t.o
    Processing 't.o'...
    libbpf: elf: skipping unrecognized data section(4) .eh_frame
    libbpf: elf: skipping relo section(5) .rel.eh_frame for section(4) .eh_frame
    libbpf: failed to find BTF for extern '__bpf_trap': -3
    Failed to open './t.o': -3

In llvm, the dag selection phase generates __bpf_trap in code. Later the UnreachableBlockElim pass removes __bpf_trap from the code, but the __bpf_trap symbol survives in the symbol table. Having a dangling __bpf_trap weak symbol is not good for old kernels, as seen in the above veristat failure. Although users could use the compiler flag `-mllvm -bpf-disable-trap-unreachable` to work around the issue, this patch fixes the issue by removing the dangling __bpf_trap.

[1] llvm#165696
…PRs (llvm#165801) When a PR is submitted the macos-14 workflow will run with LTO/PGO disabled. This makes it possible to run the workflow on the free runners with the six hour timeout and will allow us to test the workflow on pull requests.
All of the users of this function are guarded by LLVM_ON_UNIX and LLVM_ENABLE_THREADS ifdefs, so wrap the function itself in these guards as well to avoid the unused function warning.
[llvm#159474](llvm#159474)
- All printf variants set errno and consistently return -1 on error, instead of returning various predefined error codes
- Return value overflow handling is added
It did compute the length only on the first line, and thus the following lines could be (and in the test example were) moved over the column limit, when the = was aligned.
Fix the parameter name in the BuildExtVectorType function, also updating the code style to be consistent with BuildVectorType. Discovered in llvm#166055
…t for callbr instruction with inline-asm (llvm#152161) (llvm#166195) Reapply llvm#152161 with fixed 'changed' flags.
… path (llvm#164893) This is a follow up to llvm#162509. Using the `SearchPathW` API, we can ensure that the correct version of Python is installed before `liblldb` is loaded (and `python.dll` subsequently). If it's not, we try to add it to the search path with the methods introduced in llvm#162509. If that fails or if that method is `#ifdef`'d out, we print an error which will appear before lldb crashes due to the missing dll. Before llvm#162509, when invoked from PowerShell, lldb would silently crash (no error message/crash report). After llvm#162509, it crashes without any indication that the root cause is the missing python.dll. With this patch, we print the error before crashing.
llvm#166081 forgot to actually use this as the body.
…166226) Due to me not double-checking my PR, an overly eager AI auto-completion made it into my previous PR :/
…64281) This commit adds move constructor, move assignment and `swap` to `exception_ptr`. Adding those operators allows us to avoid unnecessary calls to `__cxa_{inc,dec}rement_refcount`.

Performance results (from libc++'s CI):

```
Benchmark                               Baseline    Candidate    Difference    % Difference
------------------------------------  ----------  -----------  ------------  --------------
bm_exception_ptr_copy_assign_nonnull        9.77         9.94          0.18           1.79%
bm_exception_ptr_copy_assign_null          10.29        10.65          0.35           3.42%
bm_exception_ptr_copy_ctor_nonnull          7.02         7.01         -0.01          -0.13%
bm_exception_ptr_copy_ctor_null            10.54        10.60          0.06           0.56%
bm_exception_ptr_move_assign_nonnull       16.92        13.76         -3.16         -18.70%
bm_exception_ptr_move_assign_null          10.61        10.76          0.14           1.36%
bm_exception_ptr_move_ctor_nonnull         13.31        10.25         -3.06         -23.02%
bm_exception_ptr_move_ctor_null            10.28         7.30         -2.98         -28.95%
bm_exception_ptr_swap_nonnull              19.22         0.63        -18.59         -96.74%
bm_exception_ptr_swap_null                 20.02         7.79        -12.23         -61.07%
```

As expected, the `bm_exception_ptr_copy_*` benchmarks are not influenced by this change. `bm_exception_ptr_move_*` benefits between 18% and 30%. The `bm_exception_ptr_swap_*` tests show the biggest improvements since multiple calls to the copy constructor are replaced by a simple pointer swap.

While `bm_exception_ptr_move_assign_null` did not show a regression in the CI measurements, local measurements showed a regression from 3.98 to 4.71, i.e. by 18%. This is due to the additional `__tmp` inside `operator=`. The destructor of `__other` is a no-op after the move because `__other.__ptr` will be a nullptr. However, the compiler does not realize this, since the destructor is not inlined and is lacking a fast-path. As such, the swap-based implementation leads to an additional destructor call. `bm_exception_ptr_move_assign_nonnull` still benefits because the swap-based move constructor avoids unnecessary `__cxa_{in,de}crement_refcount` calls. As soon as we inline the destructor, this regression should disappear again.
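Why moves and swaps avoid reference-count traffic can be illustrated with a toy handle. This is a sketch under stated assumptions, not libc++'s `exception_ptr`: `Payload`, `Handle`, and `refcountOps` are hypothetical names, and the counter stands in for calls to the `__cxa_*_refcount` functions.

```cpp
#include <cassert>
#include <utility>

// Toy stand-in for a reference-counted exception object (hypothetical layout).
struct Payload {
  int RefCount = 1;
};

// Counts refcount updates so the cost of copy vs. move is observable.
inline int &refcountOps() {
  static int Ops = 0;
  return Ops;
}

class Handle {
  Payload *Ptr = nullptr;

  static void retain(Payload *P) {
    if (!P)
      return;
    ++P->RefCount;
    ++refcountOps();
  }
  static void release(Payload *P) {
    if (!P)
      return; // fast path: nothing to do for a null (moved-from) handle
    assert(P->RefCount > 0 && "released more often than retained");
    ++refcountOps();
    if (--P->RefCount == 0)
      delete P;
  }

public:
  Handle() = default;
  explicit Handle(Payload *P) : Ptr(P) {} // adopts P, count is already 1
  Handle(const Handle &O) : Ptr(O.Ptr) { retain(Ptr); }
  // Move: steal the pointer, no refcount update at all.
  Handle(Handle &&O) noexcept : Ptr(O.Ptr) { O.Ptr = nullptr; }
  Handle &operator=(const Handle &O) {
    Handle Tmp(O);
    swap(Tmp);
    return *this;
  }
  // Swap-based move assignment: the temporary releases the old value.
  Handle &operator=(Handle &&O) noexcept {
    Handle Tmp(std::move(O));
    swap(Tmp);
    return *this;
  }
  ~Handle() { release(Ptr); }

  // Swap: exchange two pointers, no refcount update.
  void swap(Handle &O) noexcept { std::swap(Ptr, O.Ptr); }
  int useCount() const { return Ptr ? Ptr->RefCount : 0; }
};
```

In this sketch, only the copy constructor and the destructor of a non-null handle touch the counter, which mirrors the pattern in the benchmark results: copies are unchanged, while moves and swaps get cheaper.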
Works towards llvm#44892
…member function from a CallExpr. (llvm#166101) There's a bug illustrated by this example:

```
template <typename T>
struct Holder {
  T value;

  T& operator*() { return value; }
};

struct X {
  using Dispatch = float (X::*)() [[clang::nonblocking]];

  void fails(Holder<Dispatch>& holder) [[clang::nonblocking]]
  {
    (this->*(*holder))(); // <<< the expression is incorrectly determined not to be nonblocking
  }

  void succeeds(Holder<Dispatch>& holder) [[clang::nonblocking]]
  {
    auto func = *holder;
    (this->*func)();
  }
};
```

In both cases we have a `CXXMemberCallExpr`. In `succeeds`, the expression refers to a `Decl` (`func`) and gets a useful PTMF type. In `fails`, the expression does not refer to a `Decl` and its type is special, printed as `bound member function`. `Expr` provides a method for extracting the true type so we can use that in this situation.

---------

Co-authored-by: Doug Wyatt <dwyatt@apple.com>
Co-authored-by: Sirraide <aeternalmail@gmail.com>
…name. NFC (llvm#165912) Note, X86 forces a frame pointer for stackmaps/patchpoint. So they use RBP where we use SP.
…#165293) The default behavior is to _not_ copy such swiftmodules into the dSYM, as previously implemented in 96f95c9. This patch adds the option to override the behavior, so that such swiftmodules can be copied into the dSYM. This is useful when the dSYM will be used on a machine which has a different Xcode/SDK than where the swiftmodules were built. Without this, when LLDB is asked to "p/po" a Swift variable, the underlying Swift compiler code would rebuild the dependent `.swiftmodule` files of the Swift stdlibs, which takes ~1 minute in some cases. See PR for tests.
All builtin Clang headers need to be covered by the modulemap. This fixes llvm#166173
…ainerGlobals.cpp (llvm#166231) This PR fixes the appearance of the following warning message when building LLVM with clang (21.1.2)

```
[48/100] Building CXX object lib/Target/DirectX/CMakeFiles/LLVMDirectXCodeGen.dir/DXContainerGlobals.cpp.o
In file included from /nix/store/ffrg0560kj0066s4k9pznjand907nlnz-gcc-14.3.0/include/c++/14.3.0/cassert:44,
                 from /workspace/llvm-project/llvm/include/llvm/Support/Endian.h:19,
                 from /workspace/llvm-project/llvm/include/llvm/Support/MD5.h:33,
                 from /workspace/llvm-project/llvm/lib/Target/DirectX/DXContainerGlobals.cpp:28:
/workspace/llvm-project/llvm/lib/Target/DirectX/DXContainerGlobals.cpp: In lambda function:
/workspace/llvm-project/llvm/lib/Target/DirectX/DXContainerGlobals.cpp:198:78: warning: suggest parentheses around ‘&&’ within ‘||’ [-Wparentheses]
  198 | (uint64_t)Binding.LowerBound + Binding.Size - 1 <= UINT32_MAX &&
      | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
  199 | "Resource range is too large");
      | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```

I marked this PR as an NFC because it only modifies an assertion condition to remove a compiler warning.
This commit aggregates the following changes:
1. Fix the format (i.e., indentation) when printing stop hooks via `target stop-hook list`.
2. Add `IndentScope Stream::MakeIndentScope()` to make managing (and restoring!) of the indentation level on `Stream` instances more ergonomic and less error prone.
3. Simplify printing of stop hooks using the new `IndentScope`.
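The scope-guard pattern behind an `IndentScope` can be sketched as below. This is a simplified illustration, not LLDB's `Stream`: `MiniStream`, `makeIndentScope`, `printLine`, and the default indent amount of 2 are assumptions made for the example.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified stand-in for an indenting output stream.
class MiniStream {
  std::string Buffer;
  unsigned IndentLevel = 0;

public:
  // RAII guard: raises the indentation on construction and restores the
  // previous level on destruction, including on early returns.
  class IndentScope {
    MiniStream *S;
    unsigned Saved;

  public:
    IndentScope(MiniStream &Stream, unsigned Amount)
        : S(&Stream), Saved(Stream.IndentLevel) {
      Stream.IndentLevel += Amount;
    }
    // Movable so it can be returned from a factory; the moved-from guard
    // is disarmed and restores nothing.
    IndentScope(IndentScope &&Other) noexcept
        : S(Other.S), Saved(Other.Saved) {
      Other.S = nullptr;
    }
    IndentScope(const IndentScope &) = delete;
    IndentScope &operator=(const IndentScope &) = delete;
    ~IndentScope() {
      if (!S)
        return;
      assert(S->IndentLevel >= Saved && "scopes must unwind in LIFO order");
      S->IndentLevel = Saved;
    }
  };

  IndentScope makeIndentScope(unsigned Amount = 2) {
    return IndentScope(*this, Amount);
  }

  // Emit one line prefixed with the current indentation.
  void printLine(const std::string &Text) {
    Buffer.append(IndentLevel, ' ');
    Buffer += Text;
    Buffer += '\n';
  }

  const std::string &str() const { return Buffer; }
  unsigned indentLevel() const { return IndentLevel; }
};
```

A caller writes `auto Scope = S.makeIndentScope();` inside a block and prints nested lines; when the block ends, the previous indentation is restored without any manual bookkeeping.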
Excluding test areas that (1) don't really pertain to the profcheck effort, and (2) are easier to maintain this way.
…6040) LowerCallTo handles all of the ABI details, including the load of implicit sret return to the expected result positions.
Match the IR type that clang uses here: https://godbolt.org/z/KzbodEcxh
This was manually selecting the IR legal type. Instead just set the flag to ensure legal types.
…m#165737) Using invalid syncscopes on certain NVVM intrinsics causes an obscure error to appear: (error 9: NVVM_ERROR_COMPILATION), libNVVM extra log: Could not find scope ID=5. This is not a very helpful error. A much more useful error would be something like 'NVPTX does not support syncscope "agent"' This would immediately make it clear that the issue is not NVPTX specific, but actually from code being fed to NVPTX. This would save users time in debugging issues related to this.
…8710) Currently, Scudo always returns the exact size allocated when calling getUsableSize. This can be a performance issue where some programs will get the usable size and do unnecessary calls to realloc since they think there isn't enough space in the allocation. By default, usable size will still return the exact size of the allocation. Note that if the exact behavior is disabled and MTE is on, then the code will still give an exact usable size.
Instead of relying on any pass manager to schedule Polly's passes, add Polly's own pipeline manager which is seen as a monolithic pass in LLVM's pass manager. Polly's former passes are now phases of the new PhaseManager component.

Relying on LLVM's pass manager (the legacy as well as the New Pass Manager) to manage Polly's phases was never a good fit. The PhaseManager resolves the following problems:

* Polly passes were modifying analysis results, in particular RegionInfo and ScopInfo. This means that there was not just one unique and "definite" analysis result: the actual result depended on which analyses ran prior, and the pass manager was not allowed to throw away cached analyses or prior SCoP optimizations would have been forgotten. The LLVM pass manager's persistence of analysis results is not contractual but designed for caching.
* Polly depends on a particular execution order of passes and regions (e.g. regression tests, invalidation of consecutive SCoPs). LLVM's pass manager does not guarantee any execution order.
* Polly does not completely preserve DominatorTree, RegionInfo, LoopInfo, or ScalarEvolution, but only as-needed for Polly's own uses. Because the ScopDetection object stores references to those analyses, it still had to lie to the pass manager that they would be preserved, or the pass manager would have released and recomputed the invalidated analysis objects that ScopDetection/ScopInfo was still referencing. To ensure that no non-Polly pass would see these not-completely-preserved analyses, all analyses still had to be thrown away after the ScopPassManager, respectively with a BarrierNoopPass in case of the LPM.
* The NPM's PassInstrumentation wraps the IR unit into an `llvm::Any` object, but implementations such as PrintIRInstrumentation call llvm_unreachable on encountering an unknown IR unit, such as SCoPs, with no extension points to add support. Hence LLVM crashes when dumping IR between SCoP passes (such as `-print-before-changed` with Polly being active).
The new PhaseManager uses some command line options that previously belonged to Polly's legacy passes, such as `-polly-print-detect` (so the option will continue to work). Hence the LPM support is incompatible with the new approach and support for it is removed.
Haven't yet addressed this pass
…riantUnswitchConditionalBranch` (llvm#164270) A new branch is created on the same condition as a branch for which we have a profile. We can reuse that profile in this case. Issue llvm#147390
llvm#165907) We currently do not handle errors in task_set_exc_guard_behavior. If this fails, mmap can unexpectedly crash. We also do not currently provide a clear warning if no external symbolizers are found. rdar://163798535
This is especially important for writing i32 values larger than 2 GB, which need to be encoded as negative SLEB values in the binary. Without this change, offsets over 2 GB are wrongly encoded and cause validation errors. Fixes: emscripten-core/emscripten#25706
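The encoding issue can be made concrete with a small sketch of the standard signed LEB128 scheme. This is not the patched code: `encodeSLEB128` and `encodeI32Offset` are hypothetical helpers, and the example assumes an arithmetic right shift for negative values, which is what mainstream compilers implement.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Standard signed LEB128: emit 7 bits per byte and stop once the remaining
// value is pure sign extension of the last emitted group.
inline std::vector<std::uint8_t> encodeSLEB128(std::int64_t Value) {
  std::vector<std::uint8_t> Out;
  bool More = true;
  while (More) {
    std::uint8_t Byte = static_cast<std::uint8_t>(Value & 0x7f);
    Value >>= 7; // assumed to be an arithmetic shift for negative values
    bool SignBit = (Byte & 0x40) != 0;
    More = !((Value == 0 && !SignBit) || (Value == -1 && SignBit));
    if (More)
      Byte |= 0x80; // continuation bit
    Out.push_back(Byte);
  }
  return Out;
}

// Hypothetical helper: an i32 offset is encoded from its signed 32-bit
// interpretation, so bit patterns at or above 0x80000000 become negative.
inline std::vector<std::uint8_t> encodeI32Offset(std::uint32_t Offset) {
  std::int64_t Signed = static_cast<std::int64_t>(Offset);
  if (Signed >= 0x80000000LL)
    Signed -= 0x100000000LL; // reinterpret as two's-complement int32
  assert(Signed >= INT32_MIN && Signed <= INT32_MAX);
  return encodeSLEB128(Signed);
}
```

For the offset `0x80000000`, encoding the signed interpretation yields the bytes `80 80 80 80 78`, whereas encoding the same number as a positive 64-bit value yields `80 80 80 80 08`, which does not decode to a valid i32.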
Avoiding the cloning of constant islands helps to reduce app size, especially for BOLT optimization, in which cloning would happen when a function is split into multiple fragments. Add an option to make the cloning optional, and we will introduce a new pass to handle the reference too far error that may result from disabling constant island cloning (llvm#165787).
…165736) Added NVVM dialect operations for stochastic rounding (.rs) conversions from F32 to various packed floating-point formats. These operations map to existing PTX instructions and LLVM intrinsics.

Supported conversions:
- F32x2 to F16x2/BF16x2 (with optional relu and satfinite modifiers)
- F32x4 to packed F8 formats (E4M3, E5M2)
- F32x4 to packed F6 formats (E2M3, E3M2)
- F32x4 to packed F4 format (E2M1)

All operations support stochastic rounding with randomness provided via an rbits parameter, and optional relu and saturation modifiers.
…lvm#165885) Guest exit thunks serve as glue for performing direct calls, so they shouldn’t treat the target as an indirect one. Spotted by @coneco-cy in llvm#165504.
This would trigger an error in ptxas.
Drop hfsort in favor of a more modern function reordering algorithm.
…#165180) When converting a function, convert only the entry block signature. The remaining block signatures should be converted by the respective branching ops. The `FuncToLLVM` / `ControlFlowToLLVM` patterns already use that design.

```c++
struct BranchOpLowering : public ConvertOpToLLVMPattern<cf::BranchOp> {
  LogicalResult
  matchAndRewrite(cf::BranchOp op, OneToNOpAdaptor adaptor,
                  ConversionPatternRewriter &rewriter) const override {
    // Convert successor block.
    SmallVector<Value> flattenedAdaptor = flattenValues(adaptor.getOperands());
    FailureOr<Block *> convertedBlock =
        getConvertedBlock(rewriter, getTypeConverter(), op, op.getSuccessor(),
                          TypeRange(ValueRange(flattenedAdaptor)));
    // ...
  }
};
```

This is consistent with the fact that operations from unreachable blocks are not put on the initial worklist.

With this change, parent ops are no longer recursively legalized when inserting a block, simplifying the conversion driver a bit.

Note for LLVM integration: If you are seeing failures, make sure to:
- Drop `converter.isLegal(&op.getBody())` when checking the legality of a function op. Only the entry block signature / function type should be taken into account.
- If you need to convert all reachable blocks and are using `cf` branching ops, add `populateCFStructuralTypeConversionsAndLegality`.
- If you need to convert all reachable blocks and are using custom branching ops, implement and populate custom structural type conversion patterns, similar to `populateCFStructuralTypeConversionsAndLegality`.
…lvm#166040)" (llvm#166262) This reverts commit a522ae3. The ABI handling doesn't account for matching the C ABI, only implicit sret.
Make sure the iOS cases with and without sincos_stret are tested
…xits (llvm#164990) The "!$acc loop" directive may be placed above loops with early exits. Currently, flang lowers loops with early exits to explicit control flow (this may be revisited when MLIR allows early exits in structured regions). The acc loop directive cannot simply be ignored in such a case during lowering because it may hold data clauses that should be applied when reaching that point. This patch adds an "unstructured" attribute to acc.loop to support that case. An acc.loop with this attribute may hold data operands but must have no controls. It is expected that the loop logic is implemented in its body in a way that the acc dialect may not understand. Such an acc.loop is just a container, and the loop with early exit will be executed sequentially.
dpalermo
approved these changes
Nov 4, 2025
No description provided.