-
Notifications
You must be signed in to change notification settings - Fork 13.8k
Rollup of 6 pull requests #147128
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rollup of 6 pull requests #147128
Conversation
Signed-off-by: Karan Janthe <karanjanthe@gmail.com>
- Add F128 support to TypeTree Kind enum - Implement TypeTree FFI bindings and conversion functions - Add typetree.rs module for metadata attachment to LLVM functions - Integrate TypeTree generation with autodiff intrinsic pipeline - Support scalar types: f32, f64, integers, f16, f128 - Attach enzyme_type attributes as LLVM string metadata for Enzyme Signed-off-by: Karan Janthe <karanjanthe@gmail.com>
- Fix nott-flag test to emit LLVM IR and check enzyme_type attributes - Replace TODO comments with actual TypeTree metadata verification - Test that NoTT flag properly disables TypeTree generation - Test that TypeTree enabled generates proper enzyme_type attributes Signed-off-by: Karan Janthe <karanjanthe@gmail.com>
- Add specific tests for f32, f64, i32, f16, f128 TypeTree generation - Verify correct enzyme_type metadata for each scalar type - Ensure TypeTree metadata matches expected Enzyme format Signed-off-by: Karan Janthe <karanjanthe@gmail.com>
Signed-off-by: Karan Janthe <karanjanthe@gmail.com>
This patch introduces an LSX-optimized version of `analyze_source_file` for the `loongarch64` target. Similar to existing SSE2 implementation for x86, this version: - Processes 16-byte chunks at a time using LSX vector intrinsics. - Quickly identifies newlines in ASCII-only chunks. - Falls back to the generic implementation when multi-byte UTF-8 characters are detected or in the tail portion.
Current implementation uses bin_name to check if it exists, but it should use tool_root_dir/tool_bin_dir/bin_name instead. Otherwise the check fails every time, hence the function falls back to install the binary.
…n, r=joboet std::net: update tcp deferaccept delay type to Duration. See comment [here](rust-lang#119639 (comment)).
…iler-errors Allow `&raw [mut | const]` for union field in safe code fixes rust-lang#141264 r? ``@Veykril`` Unresolved questions: - [x] Any edge cases? - [x] How this works with rust-analyzer (because all I've did is prevent compiler from emitting error in `&raw` context) (rust-lang/rust-analyzer#19867) - [x] Should we allow `addr_of!` and `addr_of_mut!` as well? In current version they both (`&raw` and `addr_of!`) are allowed (They are the same) - [x] Is chain of union fields is a safe? (Yes)
TypeTree support in autodiff # TypeTrees for Autodiff ## What are TypeTrees? Memory layout descriptors for Enzyme. Tell Enzyme exactly how types are structured in memory so it can compute derivatives efficiently. ## Structure ```rust TypeTree(Vec<Type>) Type { offset: isize, // byte offset (-1 = everywhere) size: usize, // size in bytes kind: Kind, // Float, Integer, Pointer, etc. child: TypeTree // nested structure } ``` ## Example: `fn compute(x: &f32, data: &[f32]) -> f32` **Input 0: `x: &f32`** ```rust TypeTree(vec![Type { offset: -1, size: 8, kind: Pointer, child: TypeTree(vec![Type { offset: -1, size: 4, kind: Float, child: TypeTree::new() }]) }]) ``` **Input 1: `data: &[f32]`** ```rust TypeTree(vec![Type { offset: -1, size: 8, kind: Pointer, child: TypeTree(vec![Type { offset: -1, size: 4, kind: Float, // -1 = all elements child: TypeTree::new() }]) }]) ``` **Output: `f32`** ```rust TypeTree(vec![Type { offset: -1, size: 4, kind: Float, child: TypeTree::new() }]) ``` ## Why Needed? - Enzyme can't deduce complex type layouts from LLVM IR - Prevents slow memory pattern analysis - Enables correct derivative computation for nested structures - Tells Enzyme which bytes are differentiable vs metadata ## What Enzyme Does With This Information: Without TypeTrees (current state): ```llvm ; Enzyme sees generic LLVM IR: define float ``@distance(ptr*`` %p1, ptr* %p2) { ; Has to guess what these pointers point to ; Slow analysis of all memory operations ; May miss optimization opportunities } ``` With TypeTrees (our implementation): ```llvm define "enzyme_type"="{[]:Float@float}" float ``@distance(`` ptr "enzyme_type"="{[]:Pointer}" %p1, ptr "enzyme_type"="{[]:Pointer}" %p2 ) { ; Enzyme knows exact type layout ; Can generate efficient derivative code directly } ``` # TypeTrees - Offset and -1 Explained ## Type Structure ```rust Type { offset: isize, // WHERE this type starts size: usize, // HOW BIG this type is kind: Kind, // WHAT KIND of data (Float, Int, Pointer) child: TypeTree // WHAT'S INSIDE (for pointers/containers) } ``` ## Offset Values ### Regular Offset (0, 4, 8, etc.) **Specific byte position within a structure** ```rust struct Point { x: f32, // offset 0, size 4 y: f32, // offset 4, size 4 id: i32, // offset 8, size 4 } ``` TypeTree for `&Point` (internal representation): ```rust TypeTree(vec![ Type { offset: 0, size: 4, kind: Float }, // x at byte 0 Type { offset: 4, size: 4, kind: Float }, // y at byte 4 Type { offset: 8, size: 4, kind: Integer } // id at byte 8 ]) ``` Generates LLVM: ```llvm "enzyme_type"="{[]:Float@float}" ``` ### Offset -1 (Special: "Everywhere") **Means "this pattern repeats for ALL elements"** #### Example 1: Array `[f32; 100]` ```rust TypeTree(vec![Type { offset: -1, // ALL positions size: 4, // each f32 is 4 bytes kind: Float, // every element is float }]) ``` Instead of listing 100 separate Types with offsets `0,4,8,12...396` #### Example 2: Slice `&[i32]` ```rust // Pointer to slice data TypeTree(vec![Type { offset: -1, size: 8, kind: Pointer, child: TypeTree(vec![Type { offset: -1, // ALL slice elements size: 4, // each i32 is 4 bytes kind: Integer }]) }]) ``` #### Example 3: Mixed Structure ```rust struct Container { header: i64, // offset 0 data: [f32; 1000], // offset 8, but elements use -1 } ``` ```rust TypeTree(vec![ Type { offset: 0, size: 8, kind: Integer }, // header Type { offset: 8, size: 4000, kind: Pointer, child: TypeTree(vec![Type { offset: -1, size: 4, kind: Float // ALL array elements }]) } ]) ```
… r=Mark-Simulacrum Allow shared access to `Exclusive<T>` when `T: Sync` Addresses libs-api request in rust-lang#98407 (comment). Adds the following trait impls to `Exclusive<T>`, all bounded on `T: Sync`: - `AsRef<T>` - `Clone` - `Copy` - `PartialEq` - `StructuralPartialEq` - `Eq` - `Hash` - `PartialOrd` - `Ord` - `Fn` ``@rustbot`` label T-libs-api
Reland "Add LSX accelerated implementation for source file analysis" This patch introduces an LSX-optimized version of `analyze_source_file` for the `loongarch64` target. Similar to existing SSE2 implementation for x86, this version: - Processes 16-byte chunks at a time using LSX vector intrinsics. - Quickly identifies newlines in ASCII-only chunks. - Falls back to the generic implementation when multi-byte UTF-8 characters are detected or in the tail portion. Reland rust-lang#145963 r? ``@lqd``
Fix --extra-checks=spellcheck to prevent cargo install every time Fixes rust-lang#147105 ## Background Current implementation of `ensure_version_of_cargo_install` uses `bin_name` to check if it exists, but it should use `<tool_root_dir>/<tool_bin_dir>/<bin_name>` instead. Otherwise the check fails every time, hence the function falls back to install the binary. ## Change Move lines which define bin_path at the top of the function, and use bin_path for the check
@bors r+ rollup=never p=5 |
☀️ Test successful - checks-actions |
📌 Perf builds for each rolled up PR:
previous master: 8d72d3e1e9 In the case of a perf regression, run the following command for each PR you suspect might be the cause: |
What is this?This is an experimental post-merge analysis report that shows differences in test outcomes between the merged PR and its parent PR.Comparing 8d72d3e (parent) -> c8905ea (this PR) Test differencesShow 79 test diffsStage 1
Stage 2
Additionally, 36 doctest diffs were found. These are ignored, as they are noisy. Job group index
Test dashboardRun cargo run --manifest-path src/ci/citool/Cargo.toml -- \
test-dashboard c8905eaa66e0c35a33626e974b9ce6955c739b5b --output-dir test-dashboard And then open Job duration changes
How to interpret the job duration changes?Job durations can vary a lot, based on the actual runner instance |
Finished benchmarking commit (c8905ea): comparison URL. Overall result: ✅ improvements - no action needed@rustbot label: -perf-regression Instruction countOur most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)Results (primary 0.5%, secondary 1.5%)A less reliable metric. May be of interest, but not used to determine the overall result above.
CyclesResults (primary 2.6%, secondary -2.7%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary sizeThis benchmark run did not return any relevant results for this metric. Bootstrap: 469.737s -> 470.874s (0.24%) |
Successful merges:
&raw [mut | const]
for union field in safe code #141469 (Allow&raw [mut | const]
for union field in safe code)Exclusive<T>
whenT: Sync
#146675 (Allow shared access toExclusive<T>
whenT: Sync
)r? @ghost
@rustbot modify labels: rollup
Create a similar rollup