Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ModuleSplitter] llvm module splitter for parallel llvm compilation. #2

Draft
wants to merge 479 commits into
base: main
Choose a base branch
from

Conversation

weiweichen
Copy link
Owner

No description provided.

spall and others added 30 commits March 27, 2025 09:34
Add int overloads which cast the various ints to a float and call the
float builtin.
These overloads are conditional on hlsl version 202x or earlier.
Add tests and puts tests in own files, including some of the tests added
for double overloads.
Closes llvm#128229
Since Clang 20 has been release we no longer support Clang 18 per our
policy.

Note the Clang 18 workarounds will be removed in a follow-up patch.
This adds some initial documentation about freestanding requirements for
Clang. The most critical part of the documentation is spelling out that
a conforming freestanding C Standard Library is required; Clang will not
be providing the headers for <string.h> in C23 which expose a number of
symbols in freestanding mode.

The docs also make it clear that in addition to a conforming
freestanding C standard library, the library must provide some
additional symbols which LLVM requires.

These docs are not comprehensive, this is just getting the bare bones in
place so that they can be expanded later.

This also updates the C status page to make it clear that we don't have
anything to do for WG14 N2524 which adds string interfaces to
freestanding mode.
C2y adds the `_Countof` operator which returns the number of elements in
an array. As with `sizeof`, `_Countof` either accepts a parenthesized
type name or an expression. Its operand must be (of) an array type. When
passed a constant-size array operand, the operator is a constant
expression which is valid for use as an integer constant expression.

This is being exposed as an extension in earlier C language modes, but
not in C++. C++ already has `std::extent` and `std::size` to cover these
needs, so the operator doesn't seem to get the user enough benefit to
warrant carrying this as an extension.

Fixes llvm#102836
The main goal of this patch is to improve the
performance of concept subsumption by

- Making sure literal (atomic) clauses are de-duplicated (Whether 2
atomic constraint is established during the initial normal form
production).
 - Eagerly removing duplicated clauses.

This should minimize the risks of exponentially large formulas that can
be produced by a naive {C,D}NF transformation.

While at it, I restructured that part of the code to be a bit clearer.

Subsumption of fold expanded constraint is also cached.

---

Note that removing duplicated clauses seems to be necessary and
sufficient to have acceptable performance on anything that could be
construed as reasonable code.

Ultimately, the number of clauses is always going to be fairly small
(but $2^{fairly\ small}$ is quickly *fairly large*..).

I went too far in the rabbit hole of Tseitin transformations etc, which
was much faster but would then require to check satisfiabiliy to
establish subsumption between some constraints (although it was good
enough to pass all but ones of our tests...).

It doesn't help that the C++ standard has a very specific definition of
subsumption that is really more of an implication...

While that sort of musing is fascinating, it was ultimately a fool's
errand, at least until such time that there is more motivation for a SAT
solver in clang (clang-tidy can after all use z3!).

Here be dragons.

Fixes llvm#122581
We have a lot of missing Codesize costs for vector operations. This
patch starts things off by adding codesize costs for getVectorInstrCost,
returning a single cost instead of the VectorInsertExtractBaseCost
(which is typically 2). Insert of a load are given a cost of 0 as they
use ld1, otherwise the cost is 1.
)

Current usage of alignstack is restricted to LLVM pointer types, whereas
when it's used in parameters it's possible to use it for other types,
see examples like `{i8, i8}, [2 x float], etc` in `llvm/test/CodeGen`.
This PR lifts the restriction and add testcases.
…duling model (llvm#133165)

P500-series cores should have a floating point load latency closer to 5
cycles, just like P400- and P600-series cores.
…rface (llvm#133274)

The primary reason is that if you pass a TypeSize without explicitly
converting to LocationSize, you otherwise implicit convert to uint64_t
to call the respective LocationSize constructor. This means that any
scalable value becomes a runtime assertion failure.

By replacing uint64_t with TypeSize in this API, we avoid the implicit
conversion for TypeSize. uint64_t callers implicit convert to
LocationSize (via the raw constructor) which should have unchanged
behavior.
Test just needs an explicit triple that was missed.
Commit 20b7f59 includes a case that checks diagnostics for for
loops using thread locals.
This fails on platforms which do not support TLS.
This change adds guards to run this part of the test iff the feature is
supported.
llvm#132690)

Keep the start value as operand of ComputeFindLastIVResult. A follow-up
patch will use this to make sure the start value is frozen if needed.

Depends on llvm#132689

PR: llvm#132690
…llvm#132742)

This patch adds the `IsolatedFromAbove` trait as a dependent trait to
the `DataLayoutOpInterface` op interface.

The motivation behind this change comes from the implementation of the
`ptr` dialect, specifically the `ptr.type_offset` op. This op produces
an int-like value that equates to the size of a memory element. This is
useful for ptr arithmetic and indexing arrays. For example:

```mlir
%f32_off = ptr.type_offset f32 : index
%addr = ptr.ptradd %ptr, %f32_off : !ptr, index
%x = ptr.load %addr : !ptr -> f32 // Read ptr[1]
```

Without the `IsolatedFromAvobe` trait in the DL interface, the
`ptr.type_offset` cannot be `ConstantLike`. Why?
Take the example:
```mlir
op {DL1} {
  %f32_off0 = ptr.type_offset f32 : index
  op {DL2} {
    %f32_off1 = ptr.type_offset f32 : index
  }
}
```
If `ptr.type_offset` were to be `ConstantLike` then `canonicalize` would
hoist and unique the value. However, that could be wrong as DL2 could
have an entry to specify the size that's different from the size in DL1.

The best solution to the above problem is to make
`DataLayoutOpInterface` require the `IsolatedFromAbove` trait, as it
preserves the constness of values in the DL with respect to the
canonicalizer.
…and [nfc]

These were added in d584cea.  This change runs through existing uses and
simplifies where obvious.
Summary:
I am probably the person most familiar with the offloading pipeline in
clang at this point.
This patch bumps the CI container to the latest LLVM Release and gets rid of
the patch that we were carrying that is in 20.1.1.

Reviewers: tstellar

Reviewed By: tstellar

Pull Request: llvm#132567
This helps keep things up to date, and should not cause any issues given we do
not need to care about binary compatibility for things built in the CI
container. This patch also changes the name of the container which allows
incrementally moving jobs over after this lands.

Reviewers: tstellar

Reviewed By: tstellar

Pull Request: llvm#132568
…llvm#133283)

Summary:
The llvm#128509 patch introduced
`--flto-partitions`. This was marked as a HIP only argument, and was
also spelled and handled incorrectly for an `-f` option. This patch
makes the handling generic for `ld.lld` consumers.

This also fixes some issues with emitting the flags being put after the
default arguments, preventing users from overriding them. Also, forwards
things properly for the new driver so we can test this.
The new function will return `std::nullopt` when any error occurs.
This patch refactors the generate_test_report script, namely turning it
into a proper library, and pulling the script/unittests out into
separate files, as is standard with most python scripts. The main
purpose of this is to enable reusing the library for the new Github
premerge.

Reviewers: tstellar, DavidSpickett, Keenuts, lnihlen

Reviewed By: DavidSpickett

Pull Request: llvm#133196
Adds wide integer emulation support for the `arith.subi` op. `(i2N, i2N)
-> (i2N)` ops are emulated as `(vector<2xiN>, vector<2xiN>) ->
(vector<2xiN>)`, just as the other emulation patterns.

The emulation uses the following scheme:

```
resLow = lhsLow - rhsLow;      // carry = 1 if rhsLow > lhsLow
resHigh = lhsLow - carry - rhsLow;
```

Signed-off-by: Ege Beysel <beysel@roofline.ai>
`env -u` is not supported by the system `env` utility on AIX.

`/opt/freeware/bin/env` is the standard path for the GNU coreutils `env`
utility as distributed by the AIX Toolbox for Open Source Software.

Adding `/opt/freeware/bin` to `PATH` causes issues by picking up other
utilities that are less capable, in an AIX context, than the system
ones.

This patch modifies the relevant usage of `env` to use (on AIX) the full
path to `/opt/freeware/bin/env`.
Emit progress events from SymbolFileDWARFDebugMap. Because we know the
number of OSOs, we can show determinate progress. This is based on a
patch from Adrian, and part of what prompted me to look into improving
how LLDB shows progress events. Before the statusline, all these
progress events would get shadowed and never displayed on the command
line.
…lvm#129262)

Added a new setting called `lldb-dap.arguments` and a debug
configuration attribute called `debugAdapterArgs` that can be used to
set the arguments used to launch the debug adapter. Right now this is
mostly useful for debugging purposes to add the `--wait-for-debugger`
option to lldb-dap.

Additionally, the extension will now check for a changed lldb-dap
executable or arguments when launching a debug session in server mode. I
had to add a new `DebugConfigurationProvider` to do this because VSCode
will show an unhelpful error modal when the
`DebugAdapterDescriptorFactory` returns `undefined`.

In order to facilitate this, I had to add two new properties to the
launch configuration that are used by the
`DebugAdapterDescriptorFactory` to tell VS Code how to launch the debug
adapter:

- `debugAdapterHostname` - the hostname for an existing lldb-dap server
- `debugAdapterPort` - the port for an existing lldb-dap server

I've also removed the check for the `executable` argument in
`LLDBDapDescriptorFactory.createDebugAdapterDescriptor()`. This argument
is only set by VS Code when the debug adapter executable properties are
set in the `package.json`. The LLDB DAP extension does not currently do
this (and I don't think it ever will). So, this makes the debug adapter
descriptor factory a little easier to read.

The check for whether or not `lldb-dap` exists has been moved into the
new `DebugConfigurationProvider` as well. This way the extension won't
get in the user's way unless they actually try to start a debugging
session. The error will show up as a modal which will also make it more
obvious when something goes wrong, rather than popping up as a warning
at the bottom right of the screen.
…lvm#133176)

Currently only ctor/dtor list and their priorities are supported. This
PR adds support for the missing data field.

Few implementation notes:
- The assembly printer has a fixed form because previous `attr_dict`
will sort the dict by key name, making global_dtor and global_ctor
differ in the order of printed arguments.
- LLVM's `ptr null` is being converted to `#llvm.zero` otherwise we'd
have to create a region to use the default operation conversion from
`ptr null`, which is silly given that the field only support null or a
symbol.
farzonl and others added 30 commits March 29, 2025 00:45
Do cleanup in DXILFinalizeLinkage.cpp where intrinsic declares are getting orphaned.

This change reduces "Unsupported intrinsic for DXIL lowering" errors
when compiling DML shaders from 12218 to 415. and improves our
compilation success rate from less than 1% to 44%.
Closes llvm#99156.


Tasks completed:
- Implement `smoothstep` using HLSL source in `hlsl_intrinsics.h`
- Implement the `smoothstep` SPIR-V target built-in in
`clang/include/clang/Basic/BuiltinsSPIRV.td`
- Add sema checks for `smoothstep` to `CheckSPIRVBuiltinFunctionCall` in
`clang/lib/Sema/SemaSPIRV.cpp`
- Add codegen for spv `smoothstep` to `EmitSPIRVBuiltinExpr` in
`clang/lib/CodeGen/TargetBuiltins/SPIR.cpp`
- Add codegen tests to `clang/test/CodeGenHLSL/builtins/smoothstep.hlsl`
- Add spv codegen test to
`clang/test/CodeGenSPIRV/Builtins/smoothstep.c`
- Add sema tests to
`clang/test/SemaHLSL/BuiltIns/smoothstep-errors.hlsl`
- Add spv sema tests to
`clang/test/SemaSPIRV/BuiltIns/smoothstep-errors.c`
- Create the `int_spv_smoothstep` intrinsic in `IntrinsicsSPIRV.td`
- In SPIRVInstructionSelector.cpp create the `smoothstep` lowering and
map it to `int_spv_smoothstep` in
`SPIRVInstructionSelector::selectIntrinsic`
- Create SPIR-V backend test case in
`llvm/test/CodeGen/SPIRV/hlsl-intrinsics/smoothstep.ll`
- Create SPIR-V backend test case in
`llvm/test/CodeGen/SPIRV/opencl/smoothstep.ll`
This patch migrates the CI over to the new compute_projects.py script
for calculating what projects need to be tested based on a change to
LLVM.

Reviewers: lnihlen, ldionne, tstellar, Endilll, joker-eph, Keenuts

Reviewed By: Keenuts, tstellar

Pull Request: llvm#132642
Currently when someone touches a docs directory in a subproject, it is
treated as if the source code of that project got touched, so the
project is built, it is tested, and the same for all of its enumerated
dependents. This is wasteful, particularly for patches just touching
docs in places like LLVM where we might spend an hour of node time to do
nothing useful given changes in the docs shouldn't cause test failures
and there is already another workflow that tests the documentation build
completes successfully.

Reviewers: Keenuts, tstellar, lnihlen

Reviewed By: tstellar

Pull Request: llvm#133185
This patch adds rich test failure information to the Github output,
using the same library that is used for the buildkite pipeline.
Eventually I think we want to add more information like reproduction
information using the containers, but that is very divergent between
Github and Buildkite, so we probably want to wait until we've switched
over before doing that.

Reviewers: Keenuts, tstellar, lnihlen, DavidSpickett

Reviewed By: DavidSpickett, Keenuts

Pull Request: llvm#133197
The coding standards states that error messages should start with
a lowercase. Also use WithColor, and add missing test coverage for
the failed to write to output file case.
Avoid capitalized Error. This loses the "Error constructing pass pipeline"
part, and just forwards the error to the default report_fatal_error case. Not
sure if it's worth trying to keep.
- Moves the verification logic to the `verifyRegions` method of the
parent operation.
- Fixes a crash during verification when the last block lacks a
terminator.

Fixes llvm#132850.
Add an entry for pcsections Metadata that references the PC Sections
Metadata document.

Fixes: llvm#130552
Both traits do the same thing. This patch renames __can_reference to
__referenceable and moves it to the is_referenceable header.
…33466)

- rename `GISelKnownBits` to `GISelValueTracking` to analyze more than
just `KnownBits` in the future
Implement asinh for Float16 along with tests. Closes llvm#131001
Setting `EnableLoopTermFold` enables `loop-term-fold` pass.
…e DSA (llvm#133232)

Issue: Cray Pointer is not associated to Cray Pointee, leading to
Segmentation fault

Fix: GetUltimate, retrieves the base symbol in the current scope, which
gets passed all the references and returns the original symbol

---------

Co-authored-by: Michael Klemm <michael.klemm@amd.com>
Unlike `flat_map` and `flat_multimap`, The two function template
overloads `flat_set::insert`'s wording strongly suggest we should use
the transparent comparator
https://eel.is/c++draft/flat.set#modifiers-1

Both the code and the tests were not using the transparent comparator,
which needs to be fixed
This patch implements a simple printing pass for regions. This is meant
to be used in tests and for debugging.
…demanded elts across all EXTRACT_SUBVECTOR uses (REAPPLIED) (llvm#133401)

Similar to what is done for visitEXTRACT_VECTOR_ELT - if all uses of a vector are EXTRACT_SUBVECTOR, then determine the accumulated demanded elts across all users and call SimplifyDemandedVectorElts in "AssumeSingleUse" use.

Second try after llvm#133130 was reverted by llvm#133331 due to it affecting reverted test files
…pcrel

clang -fexperimental-relative-c++-abi-vtables might generate `@plt` and
`@gotpcrel` specifiers in data directives. The syntax is not used in
humand-written assembly code, and is not supported by GNU assembler.
Note: the `@plt` in `.word foo@plt` is different from
the legacy `call func@plt` (where `@plt` is simply ignored).

The `@plt` syntax was selected was simply due to a quirk of AsmParser:
the syntax was supported by all targets until I updated it
to be an opt-in feature in a067175

RISC-V favors the `%specifier(expr)` syntax following MIPS and Sparc,
and we should follow this convention.

This PR adds support for `.word %pltpcrel(foo+offset)` and
`.word %gotpcrel(foo)`, and drops `@plt` and `@gotpcrel`.

* MCValue::SymA can no longer have a SymbolVariant. Add an assert
  similar to that of AArch64ELFObjectWriter.cpp before
  https://reviews.llvm.org/D81446 (see my analysis at
  https://maskray.me/blog/2025-03-16-relocation-generation-in-assemblers
  if intrigued)
* `jump foo@plt, x31` now has a different diagnostic.

Pull Request: llvm#132569
…se-time constants

An immediate operand is encoded as an `MCExpr`, with `RISCVMCExpr`
specifying an operand that includes a relocation specifier. When
https://reviews.llvm.org/D23568 added initial fixup and relocation
support in 2017, it adapted code from `PPCMCExpr` (for `@l` `@ha`) to
evaluate the `RISCVMCExpr` operand. (PPCAsmParser had considerable
technical debt, though I’ve recently streamlined it somewhat, e.g.
8560da2)

Evaluating RISCVMCExpr during parsing is unnecessary. For example,
there's no need to treat `lui a0, %hi(2)` differently from `lui a0,
%hi(foo)` when foo has not been defined yet.

This evaluation introduces unnecessary complexity. For instance, parser
functions require an extra check like `VK == RISCVMCExpr::VK_None`, as
seen in these examples:

```
 if (!evaluateConstantImm(getImm(), Imm, VK) || VK != RISCVMCExpr::VK_None)

 return IsConstantImm && isUInt<N>(Imm) && VK == RISCVMCExpr::VK_None;
```

This PR eliminates the parse-time evaluation of `RISCVMCExpr`, aligning
it more closely with other targets.

---

`abs = 0x12345; lui t3, %hi(abs)` now generates
R_RISCV_HI20/R_RISCV_RELAX with linker relaxation.
(Tested by test/MC/RISCV/linker-relaxation.s)

(Notably, since commit ba2de8f in
lld/ELF, the linker can handle R_RISCV_HI relocations with a symbol
index of 0 in -pie mode.)

Pull Request: llvm#133377
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment