Conversation

matthiaskrgr
Member

@matthiaskrgr matthiaskrgr commented Sep 28, 2025

Successful merges:

r? @ghost
@rustbot modify labels: rollup

Kivooeo and others added 28 commits August 31, 2025 17:23
Signed-off-by: Karan Janthe <karanjanthe@gmail.com>
  - Add F128 support to TypeTree Kind enum
  - Implement TypeTree FFI bindings and conversion functions
  - Add typetree.rs module for metadata attachment to LLVM functions
  - Integrate TypeTree generation with autodiff intrinsic pipeline
  - Support scalar types: f32, f64, integers, f16, f128
  - Attach enzyme_type attributes as LLVM string metadata for Enzyme

Signed-off-by: Karan Janthe <karanjanthe@gmail.com>
  - Fix nott-flag test to emit LLVM IR and check enzyme_type attributes
  - Replace TODO comments with actual TypeTree metadata verification
  - Test that NoTT flag properly disables TypeTree generation
  - Test that TypeTree enabled generates proper enzyme_type attributes

Signed-off-by: Karan Janthe <karanjanthe@gmail.com>
  - Add specific tests for f32, f64, i32, f16, f128 TypeTree generation
  - Verify correct enzyme_type metadata for each scalar type
  - Ensure TypeTree metadata matches expected Enzyme format

Signed-off-by: Karan Janthe <karanjanthe@gmail.com>
Signed-off-by: Karan Janthe <karanjanthe@gmail.com>
This patch introduces an LSX-optimized version of `analyze_source_file`
for the `loongarch64` target. Similar to the existing SSE2 implementation
for x86, this version:

- Processes 16-byte chunks at a time using LSX vector intrinsics.
- Quickly identifies newlines in ASCII-only chunks.
- Falls back to the generic implementation when multi-byte UTF-8
  characters are detected or in the tail portion.
The current implementation uses `bin_name` to check whether the binary exists,
but it should use `tool_root_dir/tool_bin_dir/bin_name` instead.
Otherwise the check fails every time, so the function always falls back to
installing the binary.
…n, r=joboet

std::net: update tcp deferaccept delay type to Duration.

See comment [here](rust-lang#119639 (comment)).
…iler-errors

Allow `&raw [mut | const]` for union field in safe code

fixes rust-lang#141264

r? ``@Veykril``

Unresolved questions:

- [x] Any edge cases?
- [x] How does this work with rust-analyzer? (All I did is prevent the compiler from emitting an error in `&raw` context.) (rust-lang/rust-analyzer#19867)
- [x] Should we allow `addr_of!` and `addr_of_mut!` as well? In the current version both `&raw` and `addr_of!` are allowed (they are the same).
- [x] Is a chain of union fields safe? (Yes)
TypeTree support in autodiff

# TypeTrees for Autodiff

## What are TypeTrees?
TypeTrees are memory layout descriptors for Enzyme. They tell Enzyme exactly how types are laid out in memory so it can compute derivatives efficiently.

## Structure
```rust
TypeTree(Vec<Type>)

Type {
    offset: isize,  // byte offset (-1 = everywhere)
    size: usize,    // size in bytes
    kind: Kind,     // Float, Integer, Pointer, etc.
    child: TypeTree // nested structure
}
```

## Example: `fn compute(x: &f32, data: &[f32]) -> f32`

**Input 0: `x: &f32`**
```rust
TypeTree(vec![Type {
    offset: -1, size: 8, kind: Pointer,
    child: TypeTree(vec![Type {
        offset: -1, size: 4, kind: Float,
        child: TypeTree::new()
    }])
}])
```

**Input 1: `data: &[f32]`**
```rust
TypeTree(vec![Type {
    offset: -1, size: 8, kind: Pointer,
    child: TypeTree(vec![Type {
        offset: -1, size: 4, kind: Float,  // -1 = all elements
        child: TypeTree::new()
    }])
}])
```

**Output: `f32`**
```rust
TypeTree(vec![Type {
    offset: -1, size: 4, kind: Float,
    child: TypeTree::new()
}])
```
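
The shapes above can be written out as a small, self-contained sketch. The `Kind`, `Type`, and `TypeTree` definitions below are hypothetical stand-ins that mirror the description in this PR, not the compiler's actual types, and the `depth` helper is added purely for illustration.

```rust
// Hypothetical stand-ins mirroring the structure described above;
// these are not the definitions used inside rustc.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Kind {
    Float,
    Integer,
    Pointer,
}

#[derive(Debug, Clone, PartialEq)]
struct Type {
    offset: isize,   // byte offset (-1 = everywhere)
    size: usize,     // size in bytes
    kind: Kind,      // what kind of data lives here
    child: TypeTree, // nested structure (empty for scalars)
}

#[derive(Debug, Clone, PartialEq, Default)]
struct TypeTree(Vec<Type>);

impl TypeTree {
    fn new() -> Self {
        TypeTree(Vec::new())
    }

    // Illustrative helper: number of nested levels, 0 for an empty tree.
    fn depth(&self) -> usize {
        self.0.iter().map(|t| 1 + t.child.depth()).max().unwrap_or(0)
    }
}

// Tree for a plain `f32`: one 4-byte float with no children.
fn f32_tree() -> TypeTree {
    TypeTree(vec![Type { offset: -1, size: 4, kind: Kind::Float, child: TypeTree::new() }])
}

// Tree for `&f32`: an 8-byte pointer whose child describes the pointee.
fn ref_f32_tree() -> TypeTree {
    TypeTree(vec![Type { offset: -1, size: 8, kind: Kind::Pointer, child: f32_tree() }])
}

fn main() {
    let input = ref_f32_tree();
    let output = f32_tree();
    println!("input depth = {}, output depth = {}", input.depth(), output.depth());
}
```

Keeping `child` as a `TypeTree` rather than an `Option` means a scalar is simply a node with an empty child tree, which matches the `TypeTree::new()` calls in the examples above.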

## Why Needed?
- Enzyme can't deduce complex type layouts from LLVM IR
- Prevents slow memory pattern analysis
- Enables correct derivative computation for nested structures
- Tells Enzyme which bytes are differentiable vs metadata

## What Enzyme Does With This Information:

Without TypeTrees (current state):
```llvm
; Enzyme sees generic LLVM IR:
define float @distance(ptr %p1, ptr %p2) {
; Has to guess what these pointers point to
; Slow analysis of all memory operations
; May miss optimization opportunities
}
```

With TypeTrees (our implementation):
```llvm
define "enzyme_type"="{[]:Float@float}" float @distance(
    ptr "enzyme_type"="{[]:Pointer}" %p1,
    ptr "enzyme_type"="{[]:Pointer}" %p2
) {
; Enzyme knows exact type layout
; Can generate efficient derivative code directly
}
```

# TypeTrees - Offset and -1 Explained

## Type Structure

```rust
Type {
    offset: isize, // WHERE this type starts
    size: usize,   // HOW BIG this type is
    kind: Kind,    // WHAT KIND of data (Float, Integer, Pointer)
    child: TypeTree // WHAT'S INSIDE (for pointers/containers)
}
```

## Offset Values

### Regular Offset (0, 4, 8, etc.)
**Specific byte position within a structure**

```rust
struct Point {
    x: f32, // offset 0, size 4
    y: f32, // offset 4, size 4
    id: i32, // offset 8, size 4
}
```

TypeTree for `&Point` (internal representation):
```rust
TypeTree(vec![
    Type { offset: 0, size: 4, kind: Float },   // x at byte 0
    Type { offset: 4, size: 4, kind: Float },   // y at byte 4
    Type { offset: 8, size: 4, kind: Integer }  // id at byte 8
])
```

Generates LLVM:
```llvm
"enzyme_type"="{[]:Float@float}"
```
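
The byte offsets quoted for `Point` can be checked directly with `std::mem::offset_of!`. This is a minimal sketch, assuming `#[repr(C)]` (added here so the field order is fixed; the example above does not specify a representation) and a hypothetical `point_layout` helper.

```rust
use std::mem::{offset_of, size_of};

// `#[repr(C)]` is an assumption made for this sketch so that the
// field order, and therefore the offsets, are predictable.
#[allow(dead_code)]
#[repr(C)]
struct Point {
    x: f32,
    y: f32,
    id: i32,
}

// Hypothetical helper: (offset, size) for each field, in declaration order.
fn point_layout() -> [(usize, usize); 3] {
    [
        (offset_of!(Point, x), size_of::<f32>()),
        (offset_of!(Point, y), size_of::<f32>()),
        (offset_of!(Point, id), size_of::<i32>()),
    ]
}

fn main() {
    for (offset, size) in point_layout() {
        println!("offset {offset}, size {size}");
    }
}
```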

### Offset -1 (Special: "Everywhere")
**Means "this pattern repeats for ALL elements"**

#### Example 1: Array `[f32; 100]`
```rust
TypeTree(vec![Type {
    offset: -1, // ALL positions
    size: 4,    // each f32 is 4 bytes
    kind: Float, // every element is float
}])
```

This replaces 100 separate `Type` entries with offsets `0, 4, 8, 12, ..., 396`.
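
To make the saving concrete, the sketch below compares the two encodings for `[f32; 100]`. It reduces each entry to a hypothetical `(offset, size)` pair; both function names are illustrative and not part of the PR.

```rust
// Each entry is a simplified (offset, size) pair; the real `Type`
// also carries a kind and a child tree.
fn explicit_entries(len: usize, elem_size: usize) -> Vec<(isize, usize)> {
    (0..len)
        .map(|i| ((i * elem_size) as isize, elem_size))
        .collect()
}

// A single entry with offset -1 stands for "every element".
fn wildcard_entry(elem_size: usize) -> Vec<(isize, usize)> {
    vec![(-1, elem_size)]
}

fn main() {
    let explicit = explicit_entries(100, 4);
    let wildcard = wildcard_entry(4);
    println!(
        "explicit: {} entries, last offset {}",
        explicit.len(),
        explicit.last().unwrap().0
    );
    println!("wildcard: {} entry", wildcard.len());
}
```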

#### Example 2: Slice `&[i32]`
```rust
// Pointer to slice data
TypeTree(vec![Type {
    offset: -1, size: 8, kind: Pointer,
    child: TypeTree(vec![Type {
        offset: -1, // ALL slice elements
        size: 4,    // each i32 is 4 bytes
        kind: Integer
    }])
}])
```

#### Example 3: Mixed Structure
```rust
struct Container {
    header: i64,        // offset 0
    data: [f32; 1000],  // offset 8, but elements use -1
}
```

```rust
TypeTree(vec![
    Type { offset: 0, size: 8, kind: Integer }, // header
    Type { offset: 8, size: 4000, kind: Pointer,
        child: TypeTree(vec![Type {
            offset: -1, size: 4, kind: Float // ALL array elements
        }])
    }
])
```
… r=Mark-Simulacrum

Allow shared access to `Exclusive<T>` when `T: Sync`

Addresses libs-api request in rust-lang#98407 (comment).

Adds the following trait impls to `Exclusive<T>`, all bounded on `T: Sync`:

- `AsRef<T>`
- `Clone`
- `Copy`
- `PartialEq`
- `StructuralPartialEq`
- `Eq`
- `Hash`
- `PartialOrd`
- `Ord`
- `Fn`

``@rustbot`` label T-libs-api
Reland "Add LSX accelerated implementation for source file analysis"

This patch introduces an LSX-optimized version of `analyze_source_file` for the `loongarch64` target. Similar to the existing SSE2 implementation for x86, this version:

- Processes 16-byte chunks at a time using LSX vector intrinsics.
- Quickly identifies newlines in ASCII-only chunks.
- Falls back to the generic implementation when multi-byte UTF-8 characters are detected or in the tail portion.
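
The control flow in the list above can be sketched in portable Rust. This is a scalar stand-in, not the LSX code: it uses no vector intrinsics, tracks only newline positions, and hands everything after the first non-ASCII chunk to the generic path. All function names are hypothetical.

```rust
const CHUNK: usize = 16;

// Fast path: walk whole 16-byte chunks while they are pure ASCII,
// recording the offset just past each newline. Returns the position
// at which the generic path has to take over.
fn fast_ascii_prefix(src: &[u8], line_starts: &mut Vec<usize>) -> usize {
    let mut pos = 0;
    for chunk in src.chunks_exact(CHUNK) {
        if !chunk.is_ascii() {
            break; // multi-byte UTF-8 seen: leave the rest to the fallback
        }
        for (i, &byte) in chunk.iter().enumerate() {
            if byte == b'\n' {
                line_starts.push(pos + i + 1);
            }
        }
        pos += CHUNK;
    }
    pos
}

// Generic path: byte-by-byte scan, used for non-ASCII input and the tail.
fn generic_scan(src: &[u8], start: usize, line_starts: &mut Vec<usize>) {
    for (i, &byte) in src.iter().enumerate().skip(start) {
        if byte == b'\n' {
            line_starts.push(i + 1);
        }
    }
}

fn line_starts(src: &str) -> Vec<usize> {
    let bytes = src.as_bytes();
    let mut starts = Vec::new();
    let resume_at = fast_ascii_prefix(bytes, &mut starts);
    generic_scan(bytes, resume_at, &mut starts);
    starts
}

fn main() {
    println!("{:?}", line_starts("0123456789abcde\nxy\nz"));
}
```

A byte-wise newline scan is safe in the fallback because the byte `0x0A` never occurs inside a multi-byte UTF-8 sequence.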

Reland rust-lang#145963

r? ``@lqd``
Fix --extra-checks=spellcheck to prevent cargo install every time

Fixes rust-lang#147105

## Background
The current implementation of `ensure_version_of_cargo_install` uses `bin_name` to check whether the binary exists, but it should use `<tool_root_dir>/<tool_bin_dir>/<bin_name>` instead. Otherwise the check fails every time, so the function always falls back to installing the binary.

## Change
Move the lines that define `bin_path` to the top of the function, and use `bin_path` for the existence check.
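
The difference between the two checks can be shown with a small sketch. The helper names, the `bin` directory, and the `demo-tool` binary are all hypothetical; this is not the code from `ensure_version_of_cargo_install`, only the path logic described above.

```rust
use std::fs;
use std::path::{Path, PathBuf};

// Hypothetical helper: the full path at which an installed tool is expected.
fn installed_bin_path(tool_root_dir: &Path, tool_bin_dir: &str, bin_name: &str) -> PathBuf {
    tool_root_dir.join(tool_bin_dir).join(bin_name)
}

// Checking the joined path (rather than the bare `bin_name`, which is
// resolved against the current directory) avoids reinstalling every time.
fn needs_install(tool_root_dir: &Path, tool_bin_dir: &str, bin_name: &str) -> bool {
    !installed_bin_path(tool_root_dir, tool_bin_dir, bin_name).exists()
}

fn main() -> std::io::Result<()> {
    let root = std::env::temp_dir().join(format!("bin-path-demo-{}", std::process::id()));
    fs::create_dir_all(root.join("bin"))?;
    println!("before install: needs_install = {}", needs_install(&root, "bin", "demo-tool"));

    // Simulate an installed binary with an empty file.
    fs::write(root.join("bin").join("demo-tool"), b"")?;
    println!("after install: needs_install = {}", needs_install(&root, "bin", "demo-tool"));

    fs::remove_dir_all(&root)
}
```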
@rustbot rustbot added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-run-make Area: port run-make Makefiles to rmake.rs labels Sep 28, 2025
@rustbot rustbot added A-tidy Area: The tidy tool F-autodiff `#![feature(autodiff)]` F-explicit_tail_calls `#![feature(explicit_tail_calls)]` S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. WG-trait-system-refactor The Rustc Trait System Refactor Initiative (-Znext-solver) rollup A PR which is a rollup labels Sep 28, 2025
@matthiaskrgr
Member Author

@bors r+ rollup=never p=5

@bors
Collaborator

bors commented Sep 28, 2025

📌 Commit 4eb6b8f has been approved by matthiaskrgr

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 28, 2025
@bors
Collaborator

bors commented Sep 28, 2025

⌛ Testing commit 4eb6b8f with merge c8905ea...

@bors
Collaborator

bors commented Sep 28, 2025

☀️ Test successful - checks-actions
Approved by: matthiaskrgr
Pushing c8905ea to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Sep 28, 2025
@bors bors merged commit c8905ea into rust-lang:master Sep 28, 2025
11 checks passed
@rustbot rustbot added this to the 1.92.0 milestone Sep 28, 2025
@rust-timer
Collaborator

📌 Perf builds for each rolled up PR:

| PR# | Message | Perf Build Sha |
|---|---|---|
| #140482 | std::net: update tcp deferaccept delay type to Duration. | 37c7bc28c30ba66236a7d15966647b966cf1b5d1 (link) |
| #141469 | Allow &raw [mut \| const] for union field in safe code | be75a939c573021fc3b9cd5bb66b97322cc86ad4 (link) |
| #144197 | TypeTree support in autodiff | f2e1b2b93a5ec5198f11f91fd53682e25e6ac4f8 (link) |
| #146675 | Allow shared access to Exclusive<T> when T: Sync | 4a0f000a1c37c77848eb5a05361de87ecd99155a (link) |
| #147113 | Reland "Add LSX accelerated implementation for source file … | afd65bbc8bb246d525838303e3354acbec7519f4 (link) |
| #147120 | Fix --extra-checks=spellcheck to prevent cargo install ever… | 715744e78d311fc40a19a297219b5a94a6f46358 (link) |

previous master: 8d72d3e1e9

In the case of a perf regression, run the following command for each PR you suspect might be the cause: @rust-timer build $SHA

Contributor

What is this? This is an experimental post-merge analysis report that shows differences in test outcomes between the merged PR and its parent PR.

Comparing 8d72d3e (parent) -> c8905ea (this PR)

Test differences

Show 79 test diffs

Stage 1

  • [codegen] tests/codegen-llvm/autodiff/typetree.rs: [missing] -> ignore (ignored when LLVM Enzyme is disabled or LLVM is not the default codegen backend) (J0)
  • [run-make] tests/run-make/autodiff/type-trees/array-typetree: [missing] -> ignore (ignored when LLVM Enzyme is disabled or LLVM is not the default codegen backend) (J0)
  • [run-make] tests/run-make/autodiff/type-trees/memcpy-typetree: [missing] -> ignore (ignored when LLVM Enzyme is disabled or LLVM is not the default codegen backend) (J0)
  • [run-make] tests/run-make/autodiff/type-trees/mixed-struct-typetree: [missing] -> ignore (ignored when LLVM Enzyme is disabled or LLVM is not the default codegen backend) (J0)
  • [run-make] tests/run-make/autodiff/type-trees/nott-flag: [missing] -> ignore (ignored when LLVM Enzyme is disabled or LLVM is not the default codegen backend) (J0)
  • [run-make] tests/run-make/autodiff/type-trees/recursion-typetree: [missing] -> ignore (ignored when LLVM Enzyme is disabled or LLVM is not the default codegen backend) (J0)
  • [run-make] tests/run-make/autodiff/type-trees/scalar-types/f128-typetree: [missing] -> ignore (ignored when LLVM Enzyme is disabled or LLVM is not the default codegen backend) (J0)
  • [run-make] tests/run-make/autodiff/type-trees/scalar-types/f16-typetree: [missing] -> ignore (ignored when LLVM Enzyme is disabled or LLVM is not the default codegen backend) (J0)
  • [run-make] tests/run-make/autodiff/type-trees/scalar-types/f32-typetree: [missing] -> ignore (ignored when LLVM Enzyme is disabled or LLVM is not the default codegen backend) (J0)
  • [run-make] tests/run-make/autodiff/type-trees/scalar-types/f64-typetree: [missing] -> ignore (ignored when LLVM Enzyme is disabled or LLVM is not the default codegen backend) (J0)
  • [run-make] tests/run-make/autodiff/type-trees/scalar-types/i32-typetree: [missing] -> ignore (ignored when LLVM Enzyme is disabled or LLVM is not the default codegen backend) (J0)
  • [run-make] tests/run-make/autodiff/type-trees/slice-typetree: [missing] -> ignore (ignored when LLVM Enzyme is disabled or LLVM is not the default codegen backend) (J0)
  • [run-make] tests/run-make/autodiff/type-trees/struct-typetree: [missing] -> ignore (ignored when LLVM Enzyme is disabled or LLVM is not the default codegen backend) (J0)
  • [run-make] tests/run-make/autodiff/type-trees/tuple-typetree: [missing] -> ignore (ignored when LLVM Enzyme is disabled or LLVM is not the default codegen backend) (J0)
  • [ui] tests/ui/autodiff/flag_nott.rs: [missing] -> ignore (ignored when LLVM Enzyme is disabled or LLVM is not the default codegen backend) (J0)

Stage 2

  • [run-make] tests/run-make/autodiff/type-trees/array-typetree: [missing] -> ignore (ignored when LLVM Enzyme is disabled or LLVM is not the default codegen backend) (J1)
  • [run-make] tests/run-make/autodiff/type-trees/memcpy-typetree: [missing] -> ignore (ignored when LLVM Enzyme is disabled or LLVM is not the default codegen backend) (J1)
  • [run-make] tests/run-make/autodiff/type-trees/mixed-struct-typetree: [missing] -> ignore (ignored when LLVM Enzyme is disabled or LLVM is not the default codegen backend) (J1)
  • [run-make] tests/run-make/autodiff/type-trees/nott-flag: [missing] -> ignore (ignored when LLVM Enzyme is disabled or LLVM is not the default codegen backend) (J1)
  • [run-make] tests/run-make/autodiff/type-trees/recursion-typetree: [missing] -> ignore (ignored when LLVM Enzyme is disabled or LLVM is not the default codegen backend) (J1)
  • [run-make] tests/run-make/autodiff/type-trees/scalar-types/f128-typetree: [missing] -> ignore (ignored when LLVM Enzyme is disabled or LLVM is not the default codegen backend) (J1)
  • [run-make] tests/run-make/autodiff/type-trees/scalar-types/f16-typetree: [missing] -> ignore (ignored when LLVM Enzyme is disabled or LLVM is not the default codegen backend) (J1)
  • [run-make] tests/run-make/autodiff/type-trees/scalar-types/f32-typetree: [missing] -> ignore (ignored when LLVM Enzyme is disabled or LLVM is not the default codegen backend) (J1)
  • [run-make] tests/run-make/autodiff/type-trees/scalar-types/f64-typetree: [missing] -> ignore (ignored when LLVM Enzyme is disabled or LLVM is not the default codegen backend) (J1)
  • [run-make] tests/run-make/autodiff/type-trees/scalar-types/i32-typetree: [missing] -> ignore (ignored when LLVM Enzyme is disabled or LLVM is not the default codegen backend) (J1)
  • [run-make] tests/run-make/autodiff/type-trees/slice-typetree: [missing] -> ignore (ignored when LLVM Enzyme is disabled or LLVM is not the default codegen backend) (J1)
  • [run-make] tests/run-make/autodiff/type-trees/struct-typetree: [missing] -> ignore (ignored when LLVM Enzyme is disabled or LLVM is not the default codegen backend) (J1)
  • [run-make] tests/run-make/autodiff/type-trees/tuple-typetree: [missing] -> ignore (ignored when LLVM Enzyme is disabled or LLVM is not the default codegen backend) (J1)
  • [ui] tests/ui/autodiff/flag_nott.rs: [missing] -> ignore (ignored when LLVM Enzyme is disabled or LLVM is not the default codegen backend) (J2)
  • [run-make] tests/run-make/autodiff/type-trees/array-typetree: [missing] -> ignore (ignored when cross-compiling) (J3)
  • [run-make] tests/run-make/autodiff/type-trees/memcpy-typetree: [missing] -> ignore (ignored when cross-compiling) (J3)
  • [run-make] tests/run-make/autodiff/type-trees/mixed-struct-typetree: [missing] -> ignore (ignored when cross-compiling) (J3)
  • [run-make] tests/run-make/autodiff/type-trees/nott-flag: [missing] -> ignore (ignored when cross-compiling) (J3)
  • [run-make] tests/run-make/autodiff/type-trees/recursion-typetree: [missing] -> ignore (ignored when cross-compiling) (J3)
  • [run-make] tests/run-make/autodiff/type-trees/scalar-types/f128-typetree: [missing] -> ignore (ignored when cross-compiling) (J3)
  • [run-make] tests/run-make/autodiff/type-trees/scalar-types/f16-typetree: [missing] -> ignore (ignored when cross-compiling) (J3)
  • [run-make] tests/run-make/autodiff/type-trees/scalar-types/f32-typetree: [missing] -> ignore (ignored when cross-compiling) (J3)
  • [run-make] tests/run-make/autodiff/type-trees/scalar-types/f64-typetree: [missing] -> ignore (ignored when cross-compiling) (J3)
  • [run-make] tests/run-make/autodiff/type-trees/scalar-types/i32-typetree: [missing] -> ignore (ignored when cross-compiling) (J3)
  • [run-make] tests/run-make/autodiff/type-trees/slice-typetree: [missing] -> ignore (ignored when cross-compiling) (J3)
  • [run-make] tests/run-make/autodiff/type-trees/struct-typetree: [missing] -> ignore (ignored when cross-compiling) (J3)
  • [run-make] tests/run-make/autodiff/type-trees/tuple-typetree: [missing] -> ignore (ignored when cross-compiling) (J3)
  • [codegen] tests/codegen-llvm/autodiff/typetree.rs: [missing] -> ignore (ignored when LLVM Enzyme is disabled or LLVM is not the default codegen backend) (J4)

Additionally, 36 doctest diffs were found. These are ignored, as they are noisy.

Job group index

Test dashboard

Run

cargo run --manifest-path src/ci/citool/Cargo.toml -- \
    test-dashboard c8905eaa66e0c35a33626e974b9ce6955c739b5b --output-dir test-dashboard

And then open test-dashboard/index.html in your browser to see an overview of all executed tests.

Job duration changes

  1. pr-check-1: 1364.2s -> 1757.8s (28.9%)
  2. dist-aarch64-linux: 8634.8s -> 6185.8s (-28.4%)
  3. aarch64-gnu-llvm-20-2: 2166.9s -> 2690.6s (24.2%)
  4. aarch64-gnu-llvm-20-1: 3159.1s -> 3708.3s (17.4%)
  5. tidy: 162.7s -> 189.8s (16.7%)
  6. i686-gnu-2: 5533.0s -> 6382.0s (15.3%)
  7. x86_64-rust-for-linux: 2711.2s -> 3103.8s (14.5%)
  8. pr-check-2: 2348.7s -> 2633.3s (12.1%)
  9. x86_64-gnu-tools: 3254.0s -> 3614.0s (11.1%)
  10. i686-gnu-nopt-1: 7277.4s -> 8057.3s (10.7%)
How to interpret the job duration changes?

Job durations can vary a lot, based on the actual runner instance
that executed the job, system noise, invalidated caches, etc. The table above is provided
mostly for t-infra members, for simpler debugging of potential CI slow-downs.

@rust-timer
Collaborator

Finished benchmarking commit (c8905ea): comparison URL.

Overall result: ✅ improvements - no action needed

@rustbot label: -perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -0.2% | [-0.3%, -0.0%] | 2 |
| All ❌✅ (primary) | - | - | 0 |

Max RSS (memory usage)

Results (primary 0.5%, secondary 1.5%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 3.6% | [3.6%, 3.6%] | 1 |
| Regressions ❌ (secondary) | 1.5% | [0.6%, 2.4%] | 4 |
| Improvements ✅ (primary) | -1.0% | [-1.1%, -1.0%] | 2 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 0.5% | [-1.1%, 3.6%] | 3 |

Cycles

Results (primary 2.6%, secondary -2.7%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 2.6% | [2.3%, 2.8%] | 3 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -2.7% | [-3.2%, -2.2%] | 2 |
| All ❌✅ (primary) | 2.6% | [2.3%, 2.8%] | 3 |

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 469.737s -> 470.874s (0.24%)
Artifact size: 387.61 MiB -> 387.66 MiB (0.01%)

@bors bors mentioned this pull request Sep 28, 2025
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-run-make Area: port run-make Makefiles to rmake.rs A-tidy Area: The tidy tool F-autodiff `#![feature(autodiff)]` F-explicit_tail_calls `#![feature(explicit_tail_calls)]` merged-by-bors This PR was explicitly merged by bors. rollup A PR which is a rollup S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. WG-trait-system-refactor The Rustc Trait System Refactor Initiative (-Znext-solver)