Conversation

@ronlieb
Collaborator

@ronlieb ronlieb commented Nov 4, 2025

No description provided.

AmrDeveloper and others added 30 commits November 3, 2025 18:08
Upstream the FPToFP Builtin CeilOp
…lvm#165731)

Use a flag to determine whether this macro should be set when
initializing the preprocessor.

This macro was added to the driver in
9d117e7 because it can be conditionally
disabled, but before that, the flag to gate behavior was removed under
the assumption it wasn't conditional in
b5b622a. This patch connects the macro
with the preexisting flag.
…tweak (llvm#163726)

Prevents the tweak from splitting **qualified names** (e.g.,
`foo::Type`) by incorrectly inserting a space around the scope
resolution (`::`).

**Before:**

```cpp
// input:
virtual foo::Type::func() = 0 
// output:
foo :: Type :: func()
```

**After:**

```cpp
// input:
virtual foo::Type::func() = 0 
// output:
foo::Type::func()
```
The code to apply relocations was sometimes creating unaligned
destination pointers. Instead of giving them an explicit type (i.e.
`uint64_t *`) and forcing the compiler to generate unaligned stores,
mark the pointer as `void *`. The compiler will figure out the correct
series of store instructions.
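
A minimal sketch of the idea described above, not the patch's actual code: writing through a `void *` with `std::memcpy` lets the compiler pick stores that are valid for any alignment. The helper names `writeU64` and `readU64` are assumptions made for illustration.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: store a 64-bit value at a possibly unaligned address.
// Taking `void *` avoids promising the compiler the 8-byte alignment that a
// `uint64_t *` parameter would imply.
inline void writeU64(void *dst, std::uint64_t value) {
  std::memcpy(dst, &value, sizeof(value));
}

// Hypothetical helper: read the value back the same way.
inline std::uint64_t readU64(const void *src) {
  std::uint64_t value;
  std::memcpy(&value, src, sizeof(value));
  return value;
}
```

Calling `writeU64(buf + 1, v)` on a byte buffer is well defined even though `buf + 1` is typically not 8-byte aligned; optimizing compilers usually lower the `memcpy` to the cheapest store sequence the target allows.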
…#166055)

Report a diagnostic when the vector_size or ext_vector_type attribute
is used with a negative size. The same evaluation result can be reused
for other checks, such as the check for a size that is too large.

Issue llvm#165463
We had a couple in the llvm/actions repository that were pinned to main.
Pin them to the latest SHA in main to keep them consistent with
everything else. This also ensures we are compliant with our own CI
best practices and cleans up the remaining CodeQL findings for this
specific issue.
8dae17b refactors basic_string for more
code reuse. This makes sense in most cases, but has performance overhead
in the case of ~basic_string. The refactoring of ~basic_string to call
__reset_internal_buffer() added a redundant (inside the destructor)
reset of the object, which the optimizer is unable to optimize away in
many cases. This patch prevents a ~1% regression we observed on an
internal workload when applying the original refactoring. This does
slightly pessimize the code readability, but I think this change is
worth it given the performance impact.

I'm hoping to add one or more benchmarks to the upstream libc++ benchmark suite
around string construction/destruction to ensure that this case does not
regress, as it seems common in real-world applications. I will put up a
separate PR for that when I figure out a reasonable way to write it.
Nikita Popov reported an issue ([1]) where a dangling weak symbol
__bpf_trap remains in the final binary, which causes libbpf to fail as
shown below:

  $ veristat -v ./t.o
  Processing 't.o'...
  libbpf: elf: skipping unrecognized data section(4) .eh_frame
  libbpf: elf: skipping relo section(5) .rel.eh_frame for section(4) .eh_frame
  libbpf: failed to find BTF for extern '__bpf_trap': -3
  Failed to open './t.o': -3

In LLVM, the DAG selection phase generates __bpf_trap in code. Later, the
UnreachableBlockElim pass removes __bpf_trap from the code, but the
__bpf_trap symbol survives in the symbol table.

Having a dangling __bpf_trap weak symbol is a problem for old kernels, as
seen in the veristat failure above. Although users could use the compiler
flag `-mllvm -bpf-disable-trap-unreachable` to work around the issue,
this patch fixes it by removing the dangling __bpf_trap symbol.

  [1] llvm#165696
…PRs (llvm#165801)

When a PR is submitted the macos-14 workflow will run with LTO/PGO
disabled. This makes it possible to run the workflow on the free runners
with the six hour timeout and will allow us to test the workflow on pull
requests.
All of the users of this function are guarded by LLVM_ON_UNIX and
LLVM_ENABLE_THREADS ifdefs, so wrap the function itself in these guards
as well to avoid the unused function warning.
[llvm#159474](llvm#159474)

- All printf variants set errno and consistently return -1 on error,
instead of returning various predefined error codes
- Return value overflow handling is added
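
The return convention listed above can be sketched as follows. This is an illustration, not the libc code: the helper name `toPrintfResult` is hypothetical, and the use of `EOVERFLOW` follows what POSIX specifies for the printf family rather than anything stated in this commit.

```cpp
#include <cerrno>
#include <climits>
#include <cstddef>

// Hypothetical helper: convert an internal "characters written" count into
// the int that a printf-style function returns. A count that does not fit
// in int is reported as -1 with errno set (EOVERFLOW is an assumption here).
inline int toPrintfResult(std::size_t written) {
  if (written > static_cast<std::size_t>(INT_MAX)) {
    errno = EOVERFLOW;
    return -1;
  }
  return static_cast<int>(written);
}
```

With this shape, callers only need to test for a negative result and then consult `errno`, instead of decoding several distinct error codes.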
The length was computed only for the first line, so the following
lines could be (and in the test example were) moved past the column
limit when the = was aligned.
Fix the parameter name in the BuildExtVectorType function and update
the code style to be consistent with BuildVectorType.

Discovered in llvm#166055
…t for callbr instruction with inline-asm (llvm#152161) (llvm#166195)

Reapply llvm#152161 with fixed 'changed' flags.
… path (llvm#164893)

This is a follow up to llvm#162509.

Using the `SearchPathW` API, we can ensure that the correct version of
Python is installed before `liblldb` is loaded (and `python.dll`
subsequently). If it's not, we try to add it to the search path with the
methods introduced in llvm#162509.
If that fails or if that method is `#ifdef`'d out, we print an error
which will appear before lldb crashes due to the missing dll.

Before llvm#162509, when invoked
from PowerShell, lldb would silently crash (no error message or crash
report). After llvm#162509, it
crashes without any indication that the root cause is the missing
python.dll. With this patch, we print the error before crashing.
llvm#166081 forgot to actually use this as the body.
…166226)

Due to me not double-checking my PR, an overly eager AI auto-completion
made it into my previous PR :/
…64281)

This commit adds move constructor, move assignment and `swap`
to `exception_ptr`. Adding those operators allows us to avoid
unnecessary calls to `__cxa_{inc,dec}rement_refcount`.

Performance results (from libc++'s CI):

```
Benchmark                               Baseline    Candidate    Difference    % Difference
------------------------------------  ----------  -----------  ------------  --------------
bm_exception_ptr_copy_assign_nonnull        9.77         9.94          0.18           1.79%
bm_exception_ptr_copy_assign_null          10.29        10.65          0.35           3.42%
bm_exception_ptr_copy_ctor_nonnull          7.02         7.01         -0.01          -0.13%
bm_exception_ptr_copy_ctor_null            10.54        10.60          0.06           0.56%
bm_exception_ptr_move_assign_nonnull       16.92        13.76         -3.16         -18.70%
bm_exception_ptr_move_assign_null          10.61        10.76          0.14           1.36%
bm_exception_ptr_move_ctor_nonnull         13.31        10.25         -3.06         -23.02%
bm_exception_ptr_move_ctor_null            10.28         7.30         -2.98         -28.95%
bm_exception_ptr_swap_nonnull              19.22         0.63        -18.59         -96.74%
bm_exception_ptr_swap_null                 20.02         7.79        -12.23         -61.07%
```

As expected, the `bm_exception_ptr_copy_*` benchmarks are not influenced by
this change. `bm_exception_ptr_move_*` benefits between 18% and 30%. The
`bm_exception_ptr_swap_*` tests show the biggest improvements since multiple
calls to the copy constructor are replaced by a simple pointer swap.

While `bm_exception_ptr_move_assign_null` did not show a regression in the CI
measurements, local measurements showed a regression from 3.98 to 4.71, i.e. by
18%. This is due to the additional `__tmp` inside `operator=`. The destructor
of `__other` is a no-op after the move because `__other.__ptr` will be a
nullptr. However, the compiler does not realize this, since the destructor is
not inlined and is lacking a fast-path. As such, the swap-based implementation
leads to an additional destructor call. `bm_exception_ptr_move_assign_nonnull`
still benefits because the swap-based move constructor avoids unnecessary
__cxa_{in,de}crement_refcount calls. As soon as we inline the destructor, this
regression should disappear again.

Works towards llvm#44892
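
To illustrate why move and swap avoid reference-count traffic, here is a toy reference-counted handle. It is a sketch under stated assumptions, not libc++'s `exception_ptr`: `Counted` and `Handle` are hypothetical names, and the counters stand in for calls to the refcount functions mentioned above.

```cpp
#include <utility>

// Hypothetical stand-in for a reference-counted exception object.
struct Counted {
  int refs = 1; // current reference count
  int incs = 0; // number of increments performed
  int decs = 0; // number of decrements performed
};

// Toy handle; it never frees the object, so the example stays small.
class Handle {
  Counted *ptr_ = nullptr;

public:
  Handle() = default;
  explicit Handle(Counted *p) : ptr_(p) {} // adopts one existing reference
  Handle(const Handle &o) : ptr_(o.ptr_) {
    if (ptr_) { ++ptr_->refs; ++ptr_->incs; } // copying must bump the count
  }
  Handle(Handle &&o) noexcept : ptr_(o.ptr_) { o.ptr_ = nullptr; } // pointer steal
  Handle &operator=(const Handle &o) { Handle tmp(o); swap(tmp); return *this; }
  // Swap-based move assignment: the temporary's destructor releases the old value.
  Handle &operator=(Handle &&o) noexcept { Handle tmp(std::move(o)); swap(tmp); return *this; }
  ~Handle() {
    if (ptr_) { --ptr_->refs; ++ptr_->decs; }
  }
  void swap(Handle &o) noexcept { std::swap(ptr_, o.ptr_); } // no refcount traffic
  explicit operator bool() const { return ptr_ != nullptr; }
};
```

In this model, moving or swapping a `Handle` leaves `incs` and `decs` untouched, while each copy costs one increment and, later, one decrement. The swap-based move assignment also shows where the extra temporary discussed above comes from.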
…member function from a CallExpr. (llvm#166101)

There's a bug illustrated by this example:

```
template <typename T>
struct Holder {
	T value;
	
	T& operator*() { return value; }
};

struct X {
	using Dispatch = float (X::*)() [[clang::nonblocking]];
    
	void fails(Holder<Dispatch>& holder) [[clang::nonblocking]]
	{
		(this->*(*holder))(); // <<< the expression is incorrectly determined not to be nonblocking
	}

	void succeeds(Holder<Dispatch>& holder) [[clang::nonblocking]]
	{
		auto func = *holder;
		(this->*func)();
	}
};
```

In both cases we have a `CXXMemberCallExpr`. In `succeeds`, the
expression refers to a `Decl` (`func`) and gets a useful PTMF type. In
`fails`, the expression does not refer to a `Decl` and its type is
special, printed as `bound member function`. `Expr` provides a method
for extracting the true type so we can use that in this situation.

---------

Co-authored-by: Doug Wyatt <dwyatt@apple.com>
Co-authored-by: Sirraide <aeternalmail@gmail.com>
…name. NFC (llvm#165912)

Note, X86 forces a frame pointer for stackmaps/patchpoint. So they use
RBP where we use SP.
…#165293)

The default behavior is to _not_ copy such swiftmodules into the dSYM,
as previously implemented in 96f95c9.
This patch adds the option to override the behavior, so that such
swiftmodules can be copied into the dSYM.

This is useful when the dSYM will be used on a machine which has a
different Xcode/SDK than where the swiftmodules were built. Without
this, when LLDB is asked to "p/po" a Swift variable, the underlying
Swift compiler code would rebuild the dependent `.swiftmodule` files of
the Swift stdlibs, which takes ~1 minute in some cases.

See PR for tests.
All builtin Clang headers need to be covered by the modulemap.

This fixes llvm#166173
…ainerGlobals.cpp (llvm#166231)

This PR fixes the appearance of the following warning message when
building LLVM with clang (21.1.2)
```
[48/100] Building CXX object lib/Target/DirectX/CMakeFiles/LLVMDirectXCodeGen.dir/DXContainerGlobals.cpp.o
In file included from /nix/store/ffrg0560kj0066s4k9pznjand907nlnz-gcc-14.3.0/include/c++/14.3.0/cassert:44,
                 from /workspace/llvm-project/llvm/include/llvm/Support/Endian.h:19,
                 from /workspace/llvm-project/llvm/include/llvm/Support/MD5.h:33,
                 from /workspace/llvm-project/llvm/lib/Target/DirectX/DXContainerGlobals.cpp:28:
/workspace/llvm-project/llvm/lib/Target/DirectX/DXContainerGlobals.cpp: In lambda function:
/workspace/llvm-project/llvm/lib/Target/DirectX/DXContainerGlobals.cpp:198:78: warning: suggest parentheses around ‘&&’ within ‘||’ [-Wparentheses]
  198 |                (uint64_t)Binding.LowerBound + Binding.Size - 1 <= UINT32_MAX &&
      |                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
  199 |                    "Resource range is too large");
      |                    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 ```
 
I marked this PR as NFC because it only modifies an assertion condition to remove a compiler warning.
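
The kind of change that silences this warning can be sketched as follows. The field names mirror the warning text above, but the `Binding` struct, the "unbounded" convention for `Size`, and both function names are assumptions for illustration; the actual condition in `DXContainerGlobals.cpp` may differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical binding record for this sketch.
struct Binding {
  std::uint32_t LowerBound;
  std::uint32_t Size; // UINT32_MAX is treated as "unbounded" here (assumption)
};

// The range check on its own, so it can be tested without aborting.
inline bool rangeIsValid(const Binding &B) {
  return B.Size == UINT32_MAX ||
         static_cast<std::uint64_t>(B.LowerBound) + B.Size - 1 <= UINT32_MAX;
}

inline void checkRange(const Binding &B) {
  // `&&` binds tighter than `||`, so without the outer parentheses the message
  // string would attach only to the last operand and trigger -Wparentheses.
  assert((B.Size == UINT32_MAX ||
          static_cast<std::uint64_t>(B.LowerBound) + B.Size - 1 <= UINT32_MAX) &&
         "Resource range is too large");
}
```

Because a string literal always converts to `true`, the grouping does not change the result of the condition; it only makes the intended precedence explicit.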
This commit aggregates the following changes:
1. Fix the format (i.e., indentation) when printing stop hooks via `target
   stop-hook list`.
2. Add `IndentScope Stream::MakeIndentScope()` to make managing (and restoring!)
   of the indentation level on `Stream` instances more ergonomic and less error
   prone.
3. Simplify printing of stop hooks using the new `IndentScope`.
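
A rough sketch of what an RAII indentation guard like the one in item 2 might look like. This is a toy `Stream`, not LLDB's class: only the names `IndentScope` and `MakeIndentScope` come from the commit message, while the default amount of 2, `PutLine`, `GetIndentLevel`, and `str` are assumptions.

```cpp
#include <string>

// Toy stand-in for a stream that tracks an indentation level.
class Stream {
public:
  // RAII guard: restores the saved indentation level when it goes out of scope.
  class IndentScope {
  public:
    IndentScope(Stream &stream, unsigned amount)
        : stream_(stream), saved_(stream.indent_), active_(true) {
      stream_.indent_ += amount;
    }
    // Move-only, so the level is restored exactly once.
    IndentScope(IndentScope &&other) noexcept
        : stream_(other.stream_), saved_(other.saved_), active_(other.active_) {
      other.active_ = false;
    }
    ~IndentScope() {
      if (active_)
        stream_.indent_ = saved_;
    }

  private:
    Stream &stream_;
    unsigned saved_;
    bool active_;
  };

  IndentScope MakeIndentScope(unsigned amount = 2) {
    return IndentScope(*this, amount);
  }

  // Write one line, prefixed with the current indentation.
  void PutLine(const std::string &text) {
    out_.append(indent_, ' ');
    out_ += text;
    out_ += '\n';
  }

  unsigned GetIndentLevel() const { return indent_; }
  const std::string &str() const { return out_; }

private:
  std::string out_;
  unsigned indent_ = 0;
};
```

A caller writes `auto scope = s.MakeIndentScope();` inside a block; every `PutLine` in that block is indented, and the previous level comes back on every exit path without a manual restore call.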
mtrofin and others added 22 commits November 3, 2025 22:12
Excluding test areas that (1) don't really pertain to the profcheck effort, and (2) are easier to maintain this way.
…6040)

LowerCallTo handles all of the ABI details, including the load of
implicit sret return to the expected result positions.
Match the IR type that clang uses here: https://godbolt.org/z/KzbodEcxh

This was manually selecting the IR legal type. Instead just set the
flag to ensure legal types.
…m#165737)

Using invalid syncscopes on certain NVVM intrinsics causes an obscure
error to appear: (error 9: NVVM_ERROR_COMPILATION), libNVVM extra log:
Could not find scope ID=5.

This is not a very helpful error. A much more useful error would be
something like 'NVPTX does not support syncscope "agent"'

This would immediately make it clear that the issue is not NVPTX
specific, but comes from the code being fed to NVPTX. This would save
users time when debugging related issues.
…8710)

Currently, Scudo always returns the exact size allocated when calling
getUsableSize. This can be a performance issue where some programs will
get the usable size and do unnecessary calls to realloc since they think
there isn't enough space in the allocation. By default, usable size will
still return the exact size of the allocation.

Note that if the exact behavior is disabled and MTE is on, then the code
will still give an exact usable size.
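
The performance concern can be modeled with a toy calculation. Everything here is hypothetical: the 16-byte size classes, the function names, and the assumption that the non-exact mode reports the full underlying block are for illustration only and do not describe Scudo's real size-class tables or API.

```cpp
#include <cstddef>

// Hypothetical 16-byte size classes; real allocators use their own tables.
constexpr std::size_t kClassSize = 16;

// Size of the block that would back a request in this toy model.
constexpr std::size_t blockSize(std::size_t requested) {
  return (requested + kClassSize - 1) / kClassSize * kClassSize;
}

// Toy model of the two reporting modes: exact request size vs. full block.
constexpr std::size_t usableSize(std::size_t requested, bool exact) {
  return exact ? requested : blockSize(requested);
}

// A caller that reallocates only when the reported usable size is too small.
constexpr bool wouldRealloc(std::size_t requested, std::size_t needed, bool exact) {
  return needed > usableSize(requested, exact);
}
```

In this model, a program that allocated 20 bytes and now needs 30 calls realloc when the exact size is reported, but not when the 32-byte block size is reported.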
Instead of relying on any pass manager to schedule Polly's passes, add
Polly's own pipeline manager which is seen as a monolithic pass in
LLVM's pass manager. Polly's former passes are now phases of the new
PhaseManager component.

Relying on LLVM's pass manager (the legacy as well as the New Pass
Manager) to manage Polly's phases was never a good fit, for the following
reasons, which the PhaseManager resolves:

* Polly passes were modifying analysis results, in particular RegionInfo
and ScopInfo. This means that there was not just one unique and
"definite" analysis result, the actual result depended on which analyses
ran prior, and the pass manager was not allowed to throw away cached
analyses or prior SCoP optimizations would have been forgotten. The LLVM
pass manager's persistence of analysis results is not contractual but
designed for caching.

* Polly depends on a particular execution order of passes and regions
(e.g. regression tests, invalidation of consecutive SCoPs). LLVM's pass
manager does not guarantee any execution order.

* Polly does not completely preserve DominatorTree, RegionInfo,
LoopInfo, or ScalarEvolution, but only as-needed for Polly's own uses.
Because the ScopDetection object stores references to those analyses, it
still had to lie to the pass manager that they would be preserved, or
the pass manager would have released and recomputed the invalidated
analysis objects that ScopDetection/ScopInfo was still referencing. To
ensure that no non-Polly pass would see these not-completely-preserved
analyses, all analyses still had to be thrown away after the
ScopPassManager, respectively with a BarrierNoopPass in case of the LPM.
 
* The NPM's PassInstrumentation wraps the IR unit into an `llvm::Any`
object, but implementations such as PrintIRInstrumentation call
llvm_unreachable on encountering an unknown IR unit, such as SCoPs, with
no extension points to add support. Hence LLVM crashes when dumping IR
between SCoP passes (such as `-print-before-changed` with Polly being
active).

The new PhaseManager uses some command line options that previously
belonged to Polly's legacy passes, such as `-polly-print-detect` (so the
option will continue to work). Hence the LPM support is incompatible
with the new approach and support for it is removed.
…riantUnswitchConditionalBranch` (llvm#164270)

A new branch is created on the same condition as a branch for which we have a profile. We can reuse that profile in this case.

Issue llvm#147390
llvm#165907)

We currently do not handle errors in task_set_exc_guard_behavior. If
this fails, mmap can unexpectedly crash.
We also do not currently provide a clear warning if no external
symbolizers are found.

rdar://163798535
This is especially important for writing i32 values larger than 2 GB,
which need to be encoded as negative SLEB values in the binary.

Without this change, offsets over 2 GB are wrongly encoded and cause
validation errors.

Fixes: emscripten-core/emscripten#25706
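
To make the encoding issue concrete, here is a small sketch of signed LEB128 encoding. It is not the code from the patch: `encodeSLEB128` is the standard algorithm written out for illustration, and `encodeI32Offset` is a hypothetical helper showing the reinterpretation as a signed 32-bit value.

```cpp
#include <cstdint>
#include <vector>

// Standard signed LEB128 encoding: 7 payload bits per byte, high bit set on
// every byte except the last.
inline std::vector<std::uint8_t> encodeSLEB128(std::int64_t value) {
  std::vector<std::uint8_t> out;
  bool more = true;
  while (more) {
    std::uint8_t byte = static_cast<std::uint8_t>(value & 0x7f);
    value >>= 7; // arithmetic shift (guaranteed since C++20, universal in practice)
    bool signBit = (byte & 0x40) != 0;
    // Stop once the remaining bits are pure sign extension of this byte.
    more = !((value == 0 && !signBit) || (value == -1 && signBit));
    if (more)
      byte |= 0x80;
    out.push_back(byte);
  }
  return out;
}

// Hypothetical helper: an i32 immediate given as an unsigned offset is
// reinterpreted as signed first, so offsets >= 2^31 are encoded as negative.
// (The cast is modular since C++20 and on all two's complement targets.)
inline std::vector<std::uint8_t> encodeI32Offset(std::uint32_t offset) {
  return encodeSLEB128(static_cast<std::int32_t>(offset));
}
```

For the offset `0x80000000`, the signed reinterpretation encodes as `80 80 80 80 78`, whereas encoding the same number as a positive 64-bit value gives `80 80 80 80 08`, which decodes to a value outside the signed 32-bit range.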
Avoiding the cloning of constant islands helps reduce app size, especially for
BOLT optimization, in which cloning happens when a function is split
into multiple fragments. Add an option to make the cloning optional; a
new pass will be introduced to handle the reference-too-far error that
may result from disabling constant island cloning (llvm#165787).
…165736)

Added NVVM dialect operations for stochastic rounding (.rs) conversions
from F32 to various packed floating-point formats. These operations map
to existing PTX instructions and LLVM intrinsics.

Supported conversions:
- F32x2 to F16x2/BF16x2 (with optional relu and satfinite modifiers)
- F32x4 to packed F8 formats (E4M3, E5M2)
- F32x4 to packed F6 formats (E2M3, E3M2)
- F32x4 to packed F4 format (E2M1)

All operations support stochastic rounding with randomness provided via
an rbits parameter, and optional relu and saturation modifiers.
…lvm#165885)

Guest exit thunks serve as glue for performing direct calls, so they
shouldn’t treat the target as an indirect one.

Spotted by @coneco-cy in llvm#165504.
Drop hfsort in favor of a more modern function reordering algorithm.
…#165180)

When converting a function, convert only the entry block signature. The
remaining block signatures should be converted by the respective
branching ops. The `FuncToLLVM` / `ControlFlowToLLVM` patterns already
use that design.

```c++
struct BranchOpLowering : public ConvertOpToLLVMPattern<cf::BranchOp> {

  LogicalResult
  matchAndRewrite(cf::BranchOp op, OneToNOpAdaptor adaptor,
                  ConversionPatternRewriter &rewriter) const override {
    // Convert successor block.
    SmallVector<Value> flattenedAdaptor = flattenValues(adaptor.getOperands());
    FailureOr<Block *> convertedBlock =
        getConvertedBlock(rewriter, getTypeConverter(), op, op.getSuccessor(),
                          TypeRange(ValueRange(flattenedAdaptor)));
    // ...
  }
};
```

This is consistent with the fact that operations from unreachable blocks
are not put on the initial worklist.

With this change, parent ops are no longer recursively legalized when
inserting a block, simplifying the conversion driver a bit.

Note for LLVM integration: If you are seeing failures, make sure to:
- Drop `converter.isLegal(&op.getBody())` when checking the legality of
a function op. Only the entry block signature / function type should be
taken into account.
- If you need to convert all reachable blocks and are using `cf`
branching ops, add `populateCFStructuralTypeConversionsAndLegality`.
- If you need to convert all reachable blocks and are using custom
branching ops, implement and populate custom structural type conversion
patterns, similar to `populateCFStructuralTypeConversionsAndLegality`.
…lvm#166040)" (llvm#166262)

This reverts commit a522ae3.

The ABI handling doesn't account for matching the C ABI, only implicit
sret.
Make sure the iOS cases with and without sincos_stret are tested.
…xits (llvm#164990)

"!$acc loop" directive may be placed above loops with early exits.

Currently, flang lowers loops with early exits to explicit control flow
(this may be revisited when MLIR allows early exits in structured
regions). In that case, the acc loop directive cannot simply be ignored during
lowering, because it may hold data clauses that should be applied when
reaching that point.

This patch adds an "unstructured" attribute to acc.loop to support that
case.
An acc.loop with this attribute may hold data operands but must have no
controls. It is expected that the loop logic is implemented in its body
in a way that the acc dialect may not understand.

Such an acc.loop is just a container, and the loop with early exit will be
executed sequentially.
@ronlieb ronlieb requested review from a team and dpalermo November 4, 2025 01:09
@z1-cciauto
Collaborator

@z1-cciauto z1-cciauto merged commit 264a8fc into amd-staging Nov 4, 2025
24 checks passed
@z1-cciauto z1-cciauto deleted the amd/merge/upstream_merge_20251103184739 branch November 4, 2025 04:06