forked from rust-lang/rust
-
Notifications
You must be signed in to change notification settings - Fork 8
Enzyme tracking issue #1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
It basically works by now, though it could need some more love. |
bytesnake
pushed a commit
that referenced
this issue
May 27, 2023
…t, r=tmiasko Encode def span for foreign return-position `impl Trait` in trait Fixes rust-lang#111031, yet another def-span encoding issue :/ Includes a smaller repro than the issue, but I can confirm it ICEs: ``` query stack during panic: #0 [def_span] looking up span for `rpitit::Foo::bar::{opaque#0}` #1 [object_safety_violations] determining object safety of trait `rpitit::Foo` #2 [check_is_object_safe] checking if trait `rpitit::Foo` is object safe #3 [typeck] type-checking `main` #4 [used_trait_imports] finding used_trait_imports `main` #5 [analysis] running analysis passes on this crate ``` Luckily since this only affects nightly, this desn't need to be backported.
bytesnake
pushed a commit
that referenced
this issue
May 27, 2023
Add Terminator conversion from MIR to SMIR, part #1 This adds internal MIR TerminatorKind to SMIR Terminator conversion. r? ```@oli-obk```
bytesnake
pushed a commit
that referenced
this issue
May 27, 2023
…iaskrgr Rollup of 7 pull requests Successful merges: - rust-lang#110577 (Use fulfillment to check `Drop` impl compatibility) - rust-lang#110610 (Add Terminator conversion from MIR to SMIR, part #1) - rust-lang#110985 (Fix spans in LLVM-generated inline asm errors) - rust-lang#110989 (Make the BUG_REPORT_URL configurable by tools ) - rust-lang#111167 (debuginfo: split method declaration and definition) - rust-lang#111230 (add hint for =< as <=) - rust-lang#111279 (More robust debug assertions for `Instance::resolve` on built-in traits with non-standard trait items) Failed merges: r? `@ghost` `@rustbot` modify labels: rollup
ZuseZ4
pushed a commit
that referenced
this issue
Sep 23, 2024
better implementation of signed div_floor/ceil Tracking issue for signed `div_floor`/`div_ceil`: rust-lang#88581. This PR improves the implementation of those two functions by adding a better branchless algorithm. Side-by-side comparison of `i32::div_floor` on x86-64: ```asm div_floor_new: div_floor_old: push rax push rax test esi, esi test esi, esi je .LBB0_3 je .LBB1_6 mov eax, esi mov eax, esi not eax not eax lea ecx, [rdi - 2147483648] lea ecx, [rdi - 2147483648] or ecx, eax or ecx, eax je .LBB0_2 je .LBB1_7 mov eax, edi mov eax, edi cdq cdq idiv esi idiv esi xor esi, edi test edx, edx sar esi, 31 setg cl test edx, edx test esi, esi cmove esi, edx sets dil add eax, esi test dil, cl pop rcx jne .LBB1_4 ret test edx, edx .LBB0_3: setns cl lea rdi, [rip + .L__unnamed_1] test esi, esi call qword ptr [rip + panic...] setle dl .LBB0_2: or dl, cl lea rdi, [rip + .L__unnamed_1] jne .LBB1_5 call qword ptr [rip + panic...] .LBB1_4: dec eax .LBB1_5: pop rcx ret .LBB1_6: lea rdi, [rip + .L__unnamed_2] call qword ptr [rip + panic...] .LBB1_7: lea rdi, [rip + .L__unnamed_2] call qword ptr [rip + panic...] ``` And on Aarch64: ```asm _div_floor_new: _div_floor_old: stp x29, x30, [sp, #-16]! stp x29, x30, [sp, #-16]! mov x29, sp mov x29, sp cbz w1, LBB0_4 cbz w1, LBB1_9 mov w8, #-2147483648 mov x8, x0 cmp w0, w8 mov w9, #-2147483648 b.ne LBB0_3 cmp w0, w9 cmn w1, #1 b.ne LBB1_3 b.eq LBB0_5 cmn w1, #1 LBB0_3: b.eq LBB1_10 sdiv w8, w0, w1 LBB1_3: msub w9, w8, w1, w0 sdiv w0, w8, w1 eor w10, w1, w0 msub w8, w0, w1, w8 asr w10, w10, #31 tbz w1, #31, LBB1_5 cmp w9, #0 cmp w8, #0 csel w9, wzr, w10, eq b.gt LBB1_7 add w0, w9, w8 LBB1_5: ldp x29, x30, [sp], #16 cmp w1, #1 ret b.lt LBB1_8 LBB0_4: tbz w8, #31, LBB1_8 adrp x0, l___unnamed_1@PAGE LBB1_7: add x0, x0, l___unnamed_1@PAGEOFF sub w0, w0, #1 bl panic... LBB1_8: LBB0_5: ldp x29, x30, [sp], #16 adrp x0, l___unnamed_1@PAGE ret add x0, x0, l___unnamed_1@PAGEOFF LBB1_9: bl panic... adrp x0, l___unnamed_2@PAGE add x0, x0, l___unnamed_2@PAGEOFF bl panic... LBB1_10: adrp x0, l___unnamed_2@PAGE add x0, x0, l___unnamed_2@PAGEOFF bl panic... ```
ZuseZ4
pushed a commit
that referenced
this issue
Oct 27, 2024
…raheemdev Optimize `Box::default` and `Arc::default` to construct more types in place Both the `Arc` and `Box` `Default` impls currently call `T::default()` before allocating, and then moving the resulting `T` into the allocation. Most `Default` impls are trivial, which should in theory allow LLVM to construct `T: Default` directly in the `Box` allocation when calling `<Box<T>>::default()`. However, the allocation may fail, which necessitates calling `T`'s destructor if it has one. If the destructor is non-trivial, then LLVM has a hard time proving that it's sound to elide, which makes it construct `T` on the stack first, and then copy it into the allocation. Change both of these impls to allocate first, and then call `T::default` into the uninitialized allocation, so that LLVM doesn't have to prove that it's sound to elide the destructor/initial stack copy. For example, given the following Rust code: ```rust #[derive(Default, Clone)] struct Foo { x: Vec<u8>, z: String, y: Vec<u8>, } #[no_mangle] pub fn src() -> Box<Foo> { Box::default() } ``` <details open> <summary>Before this PR:</summary> ```llvm `@__rust_no_alloc_shim_is_unstable` = external global i8 ; drop_in_place() generated in case the allocation fails ; core::ptr::drop_in_place<playground::Foo> ; Function Attrs: nounwind nonlazybind uwtable define internal fastcc void `@"_ZN4core3ptr36drop_in_place$LT$playground..Foo$GT$17hff376aece491233bE"(ptr` noalias nocapture noundef readonly align 8 dereferenceable(72) %_1) unnamed_addr #0 personality ptr `@rust_eh_personality` { start: %_1.val = load i64, ptr %_1, align 8 %0 = icmp eq i64 %_1.val, 0 br i1 %0, label %bb6, label %"_ZN63_$LT$alloc..alloc..Global$u20$as$u20$core..alloc..Allocator$GT$10deallocate17heaa87468709346b1E.exit.i.i.i4.i" "_ZN63_$LT$alloc..alloc..Global$u20$as$u20$core..alloc..Allocator$GT$10deallocate17heaa87468709346b1E.exit.i.i.i4.i": ; preds = %start %1 = getelementptr inbounds i8, ptr %_1, i64 8 %_1.val6 = load ptr, ptr %1, align 8, !nonnull !3, !noundef !3 tail call void `@__rust_dealloc(ptr` noundef nonnull %_1.val6, i64 noundef %_1.val, i64 noundef 1) #8 br label %bb6 bb6: ; preds = %"_ZN63_$LT$alloc..alloc..Global$u20$as$u20$core..alloc..Allocator$GT$10deallocate17heaa87468709346b1E.exit.i.i.i4.i", %start %2 = getelementptr inbounds i8, ptr %_1, i64 24 %.val9 = load i64, ptr %2, align 8 %3 = icmp eq i64 %.val9, 0 br i1 %3, label %bb5, label %"_ZN63_$LT$alloc..alloc..Global$u20$as$u20$core..alloc..Allocator$GT$10deallocate17heaa87468709346b1E.exit.i.i.i4.i.i11" "_ZN63_$LT$alloc..alloc..Global$u20$as$u20$core..alloc..Allocator$GT$10deallocate17heaa87468709346b1E.exit.i.i.i4.i.i11": ; preds = %bb6 %4 = getelementptr inbounds i8, ptr %_1, i64 32 %.val10 = load ptr, ptr %4, align 8, !nonnull !3, !noundef !3 tail call void `@__rust_dealloc(ptr` noundef nonnull %.val10, i64 noundef %.val9, i64 noundef 1) #8 br label %bb5 bb5: ; preds = %"_ZN63_$LT$alloc..alloc..Global$u20$as$u20$core..alloc..Allocator$GT$10deallocate17heaa87468709346b1E.exit.i.i.i4.i.i11", %bb6 %5 = getelementptr inbounds i8, ptr %_1, i64 48 %.val4 = load i64, ptr %5, align 8 %6 = icmp eq i64 %.val4, 0 br i1 %6, label %"_ZN4core3ptr46drop_in_place$LT$alloc..vec..Vec$LT$u8$GT$$GT$17hb5ca95423e113cf7E.exit16", label %"_ZN63_$LT$alloc..alloc..Global$u20$as$u20$core..alloc..Allocator$GT$10deallocate17heaa87468709346b1E.exit.i.i.i4.i15" "_ZN63_$LT$alloc..alloc..Global$u20$as$u20$core..alloc..Allocator$GT$10deallocate17heaa87468709346b1E.exit.i.i.i4.i15": ; preds = %bb5 %7 = getelementptr inbounds i8, ptr %_1, i64 56 %.val5 = load ptr, ptr %7, align 8, !nonnull !3, !noundef !3 tail call void `@__rust_dealloc(ptr` noundef nonnull %.val5, i64 noundef %.val4, i64 noundef 1) #8 br label %"_ZN4core3ptr46drop_in_place$LT$alloc..vec..Vec$LT$u8$GT$$GT$17hb5ca95423e113cf7E.exit16" "_ZN4core3ptr46drop_in_place$LT$alloc..vec..Vec$LT$u8$GT$$GT$17hb5ca95423e113cf7E.exit16": ; preds = %bb5, %"_ZN63_$LT$alloc..alloc..Global$u20$as$u20$core..alloc..Allocator$GT$10deallocate17heaa87468709346b1E.exit.i.i.i4.i15" ret void } ; Function Attrs: nonlazybind uwtable define noalias noundef nonnull align 8 ptr `@src()` unnamed_addr #1 personality ptr `@rust_eh_personality` { start: ; alloca to place `Foo` in. %_1 = alloca [72 x i8], align 8 call void `@llvm.lifetime.start.p0(i64` 72, ptr nonnull %_1) store i64 0, ptr %_1, align 8 %_2.sroa.4.0._1.sroa_idx = getelementptr inbounds i8, ptr %_1, i64 8 store ptr inttoptr (i64 1 to ptr), ptr %_2.sroa.4.0._1.sroa_idx, align 8 %_2.sroa.5.0._1.sroa_idx = getelementptr inbounds i8, ptr %_1, i64 16 %_3.sroa.4.0..sroa_idx = getelementptr inbounds i8, ptr %_1, i64 32 call void `@llvm.memset.p0.i64(ptr` noundef nonnull align 8 dereferenceable(16) %_2.sroa.5.0._1.sroa_idx, i8 0, i64 16, i1 false) store ptr inttoptr (i64 1 to ptr), ptr %_3.sroa.4.0..sroa_idx, align 8 %_3.sroa.5.0..sroa_idx = getelementptr inbounds i8, ptr %_1, i64 40 %_4.sroa.4.0..sroa_idx = getelementptr inbounds i8, ptr %_1, i64 56 call void `@llvm.memset.p0.i64(ptr` noundef nonnull align 8 dereferenceable(16) %_3.sroa.5.0..sroa_idx, i8 0, i64 16, i1 false) store ptr inttoptr (i64 1 to ptr), ptr %_4.sroa.4.0..sroa_idx, align 8 %_4.sroa.5.0..sroa_idx = getelementptr inbounds i8, ptr %_1, i64 64 store i64 0, ptr %_4.sroa.5.0..sroa_idx, align 8 %0 = load volatile i8, ptr `@__rust_no_alloc_shim_is_unstable,` align 1, !noalias !4 %_0.i.i.i = tail call noalias noundef align 8 dereferenceable_or_null(72) ptr `@__rust_alloc(i64` noundef 72, i64 noundef 8) #8, !noalias !4 %1 = icmp eq ptr %_0.i.i.i, null br i1 %1, label %bb2.i, label %"_ZN5alloc5boxed12Box$LT$T$GT$3new17h0864de14f863a27aE.exit" bb2.i: ; preds = %start ; invoke alloc::alloc::handle_alloc_error invoke void `@_ZN5alloc5alloc18handle_alloc_error17h98142d0d8d74161bE(i64` noundef 8, i64 noundef 72) #9 to label %.noexc unwind label %cleanup.i .noexc: ; preds = %bb2.i unreachable cleanup.i: ; preds = %bb2.i %2 = landingpad { ptr, i32 } cleanup ; call core::ptr::drop_in_place<playground::Foo> call fastcc void `@"_ZN4core3ptr36drop_in_place$LT$playground..Foo$GT$17hff376aece491233bE"(ptr` noalias noundef nonnull align 8 dereferenceable(72) %_1) #10 resume { ptr, i32 } %2 "_ZN5alloc5boxed12Box$LT$T$GT$3new17h0864de14f863a27aE.exit": ; preds = %start ; Copy from stack to heap if allocation is successful call void `@llvm.memcpy.p0.p0.i64(ptr` noundef nonnull align 8 dereferenceable(72) %_0.i.i.i, ptr noundef nonnull align 8 dereferenceable(72) %_1, i64 72, i1 false) call void `@llvm.lifetime.end.p0(i64` 72, ptr nonnull %_1) ret ptr %_0.i.i.i } ``` </details> <details> <summary>After this PR</summary> ```llvm ; Notice how there's no `drop_in_place()` generated as well define noalias noundef nonnull align 8 ptr `@src()` unnamed_addr #0 personality ptr `@rust_eh_personality` { start: ; no stack allocation %0 = load volatile i8, ptr `@__rust_no_alloc_shim_is_unstable,` align 1 %_0.i.i.i.i.i = tail call noalias noundef align 8 dereferenceable_or_null(72) ptr `@__rust_alloc(i64` noundef 72, i64 noundef 8) #5 %1 = icmp eq ptr %_0.i.i.i.i.i, null br i1 %1, label %bb3.i, label %"_ZN5alloc5boxed16Box$LT$T$C$A$GT$13new_uninit_in17h80d6355ef4b73ea3E.exit" bb3.i: ; preds = %start ; call alloc::alloc::handle_alloc_error tail call void `@_ZN5alloc5alloc18handle_alloc_error17h98142d0d8d74161bE(i64` noundef 8, i64 noundef 72) #6 unreachable "_ZN5alloc5boxed16Box$LT$T$C$A$GT$13new_uninit_in17h80d6355ef4b73ea3E.exit": ; preds = %start ; construct `Foo` directly into the allocation if successful store i64 0, ptr %_0.i.i.i.i.i, align 8 %_8.sroa.4.0._1.sroa_idx = getelementptr inbounds i8, ptr %_0.i.i.i.i.i, i64 8 store ptr inttoptr (i64 1 to ptr), ptr %_8.sroa.4.0._1.sroa_idx, align 8 %_8.sroa.5.0._1.sroa_idx = getelementptr inbounds i8, ptr %_0.i.i.i.i.i, i64 16 %_8.sroa.7.0._1.sroa_idx = getelementptr inbounds i8, ptr %_0.i.i.i.i.i, i64 32 tail call void `@llvm.memset.p0.i64(ptr` noundef nonnull align 8 dereferenceable(16) %_8.sroa.5.0._1.sroa_idx, i8 0, i64 16, i1 false) store ptr inttoptr (i64 1 to ptr), ptr %_8.sroa.7.0._1.sroa_idx, align 8 %_8.sroa.8.0._1.sroa_idx = getelementptr inbounds i8, ptr %_0.i.i.i.i.i, i64 40 %_8.sroa.10.0._1.sroa_idx = getelementptr inbounds i8, ptr %_0.i.i.i.i.i, i64 56 tail call void `@llvm.memset.p0.i64(ptr` noundef nonnull align 8 dereferenceable(16) %_8.sroa.8.0._1.sroa_idx, i8 0, i64 16, i1 false) store ptr inttoptr (i64 1 to ptr), ptr %_8.sroa.10.0._1.sroa_idx, align 8 %_8.sroa.11.0._1.sroa_idx = getelementptr inbounds i8, ptr %_0.i.i.i.i.i, i64 64 store i64 0, ptr %_8.sroa.11.0._1.sroa_idx, align 8 ret ptr %_0.i.i.i.i.i } ``` </details>
ZuseZ4
pushed a commit
that referenced
this issue
Oct 27, 2024
…Kobzol shave 150ms off bootstrap This starts `git` commands inside `GitInfo`and the submodule updates in parallel. Git should already perform internal locking in cases where it needs to serialize a modification. ``` OLD Benchmark #1: ./x check core Time (mean ± σ): 608.7 ms ± 4.4 ms [User: 368.3 ms, System: 455.1 ms] Range (min … max): 602.3 ms … 618.8 ms 10 runs NEW Benchmark #1: ./x check core Time (mean ± σ): 462.8 ms ± 2.6 ms [User: 350.2 ms, System: 485.1 ms] Range (min … max): 457.5 ms … 465.6 ms 10 runs ``` This should help with the rust-analyzer setup which issues many individual `./x check` calls. There's more that could be done but these were the lowest-hanging fruits that I saw.
ZuseZ4
pushed a commit
that referenced
this issue
Nov 21, 2024
The failure output is: ``` SplitVectorOperand Op #1: t51: i32 = llvm.wasm.alltrue TargetConstant:i32<12408>, t50 rustc-LLVM ERROR: Do not know how to split this operator's operand! ```
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Based on our HackMD, steps in order to differentiate code using this rust fork. First iteration:
Add enzyme flag to x.py. Propagate that flag trough. We only request it at build time, not when running cargo/rustc.
Usage:
./configure --enable-llvm-link-shared --enable-llvm-plugins --enable-llvm-enzyme --release-channel=nightly --enable-llvm-assertions --enable-clang --enable-lld --enable-option-checking --enable-ninja
andx build --stage 1
Add enzyme as tree along with pre-build wrappers conditionally.
Add a trait file to
rustc_codegen_ssa/src/traits/autodiff.rs
@ZuseZ4, @bytesnakeWrite a special proc-macro which ~ follows this poc here, Add a hidden node similar to the asm macro and propagate the macro "parameters" trough. @bytesnake
Tighten macro assertions if possible, drop
extern "C"
generation, further improvements. @bytesnakeAdd (cfg-gated) code in
rustc_codegen_llvm/src/lib.rs
andrustc_codegen_ssa/src/back/write.rs
to differentiate full-LTO builds.Testing
Notes / changes:
The text was updated successfully, but these errors were encountered: