Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Autodiff Upstreaming - rustc_codegen_llvm changes #178

Open
wants to merge 1,960 commits into
base: master
Choose a base branch
from
Open

Conversation

ZuseZ4
Copy link
Member

@ZuseZ4 ZuseZ4 commented Sep 7, 2024

Now that the autodiff/Enzyme backend is merged, this is an upstream PR for the rustc_codegen_llvm changes.
It also includes small changes to three files under compiler/rustc_ast, which overlap with my frontend PR.
Here I only include minimal definitions of structs and enums to be able to build this backend code.
The same goes for minimal changes to compiler/rustc_codegen_ssa, the majority of changes there will be in another PR, once either this or the frontend gets merged.

We currently have 68 files left to merge, 19 in the frontend PR, 21 (+3 from the frontend) in this PR, and then ~30 in the middle-end.

To already highlight the things which reviewers might want to discuss:

  1. enzyme_ffi.rs: I do have a fallback module to make sure that we don't link rustc against Enzyme when we build rustc without autodiff support.

  2. add_panic_msg_to_global was a pain to write and I currently can't even use it. Enzyme writes gradients into shadow memory. Pass in one float scalar? We'll allocate and return an extra float telling you how this float affected the output. Pass in a slice of floats? We'll let you allocate the vector and pass in a mutable reference to a float slice, we'll then write the gradient into that slice. It should be at least as large as your original slice, so we check that and panic if not. Currently we panic silently, but I already generate a nicer panic message with this function. I just don't know how to print it to the user. yet. I discussed this with a few rustc devs and the best we could come up with (for now), was to look for mangled panic calls in the IR and pick one, which works surprising reliably. If someone knows a good way to clean this up and print the panic message I'm all in, otherwise I can remove the code that writes the nicer panic message and keep the silent panic, since it's enough for soundness. Especially since this PR is already a bit larger.

  3. SanitizeHWAddress: When differentiating C++, Enzyme can use TBAA to "understand" enums/unions, but for Rust we don't have this information. LLVM might to speculative loads which (without TBAA) confuse Enzyme, so we disable those with this attribute. This attribute is only set during the first opt run before Enzyme differentiates code. We then remove it again once we are done with autodiff and run the opt pipeline a second time. Since enums are everywhere in Rust, support for them is crucial, but if this looks too cursed I can remove these ~100 lines and keep them in my fork for now, we can then discuss them separately to make this PR simpler?

  4. Duplicated llvm-opt runs: Differentiating already optimized code (and being able to do additional optimizations on the fly, e.g. for GPU code) is the reason why Enzyme is so fast, so the compile time is acceptable for autodiff users: https://enzyme.mit.edu/talks/Publications/ (There are also algorithmic issues in Enzyme core which are more serious than running opt twice).

  5. I assume that if we merge these minimal cg_ssa changes here already, I also need to fix the other backends (GCC and cliff) to have dummy implementations, correct?

  6. I'm happy to split this PR up further if reviewers have recommendations on how to.

For the full implementation, see: rust-lang#129175

Tracking:

bors and others added 25 commits November 15, 2024 03:29
…kingjubilee

Rollup of 8 pull requests

Successful merges:

 - rust-lang#132790 (Add as_slice/into_slice for IoSlice/IoSliceMut.)
 - rust-lang#132905 ([AIX] Add crate "unwind" to link with libunwind)
 - rust-lang#132977 (Fix compilation error on Solaris due to flock usage)
 - rust-lang#132984 ([illumos] use pipe2 to create anonymous pipes)
 - rust-lang#133019 (docs: Fix missing period and colon in methods for primitive types)
 - rust-lang#133048 (use `&raw` in `{read, write}_unaligned` documentation)
 - rust-lang#133050 (Always inline functions signatures containing `f16` or `f128`)
 - rust-lang#133053 (tests: Fix the SIMD FFI tests with certain x86 configuration)

r? `@ghost`
`@rustbot` modify labels: rollup
Over in Zed we've noticed that loading crates for a large-ish workspace can take almost 200ms. We've pinned it down to how rustc searches for paths, as it performs a linear search over the list of candidate paths. In our case the candidate list had about 20k entries which we had to iterate over for each dependency being loaded.

This commit introduces a simple FilesIndex that's just a sorted Vec under the hood. Since crates are looked up by both prefix and suffix, we perform a range search on said Vec (which constraints the search space based on prefix) and follow up with a linear scan of entries with matching suffixes.
FilesIndex is also pre-filtered before any queries are performed using available target information; query prefixes/sufixes are based on the target we are compiling for, so we can remove entries that can never match up front.

Overall, this commit brings down build time for us in dev scenarios by about 6%.
100ms might not seem like much, but this is a constant cost that each of our workspace crates has to pay, even when said crate is miniscule.
…s, r=saethlin

rustc_metadata: Preprocess search paths for better performance

Over in Zed we've noticed that loading crates for a large-ish workspace (~100 members of workspace, over 1000 crates being built for the main binary target) can take almost 200ms. We've pinned it down to how rustc searches for paths to dependency files, as it performs a linear search over the list of candidate paths. In our case the candidate list had about 20k entries which we had to iterate over for each dependency being loaded. Our workspace is also pretty bottom-heavy, e.g. most of the workspace members pull in most of the transitive dependencies one way or another, which means that we spend quite some time loading crates at rustc startup.

This commit introduces a simple FilesIndex that's just a BTreeMap under the hood. Since crates are looked up by both prefix and suffix, we perform a range search on said BTree (which constraints the search space based on prefix) and follow up with a linear scan of entries with matching suffixes.

Overall, this commit brings down build time for us in dev scenarios by about 6%. 100ms might not seem like much, but this is a constant cost that each of our workspace crates has to pay, even when said crate is miniscule.
Extend the test for pac-ret with clang and LTO by checking that
different branch protection flags are preserved after the LTO step.
There was an issue in older LLVM versions that was causing this to
behave incorrectly.

Tests the LLVM behaviour added in:
llvm/llvm-project@1782810
the internal representation of `std::sync::Mutex` depends on the compilation target. due to this,
the compiler produces different number of errors for UI test
`issue-17431-6.rs` depending on the compilation target.

for example, when compiling the UI test to an `*-apple-*` or `*-qnx7*` target, the "cycle detected"
error is not reported

``` console
$ cat src/lib.rs
use std::sync::Mutex;

enum Foo {
    X(Mutex<Option<Foo>>),
}

impl Foo {
    fn bar(self) {}
}

fn main() {}

$ cargo check --target x86_64-apple-ios 2>&1 | rg '^error\['
error[E0072]: recursive type `Foo` has infinite size
```

whereas rustc produces two errors for other OSes, like Linux, which is what the UI test expects

``` console
$ cargo check --target x86_64-unknown-linux-gnu 2>&1 | rg '^error\['
error[E0072]: recursive type `Foo` has infinite size
error[E0391]: cycle detected when computing when `Foo` needs drop
```

this commit replaces the problematic `Mutex` with `UnsafeCell`, which has the same internal
representation regardless of the compilation target. with that change, rustc reports two errors for
all compilation targets.

``` console
$ cat src/lib.rs
use std::cell::UnsafeCell;

enum Foo {
    X(UnsafeCell<Option<Foo>>),
}

impl Foo {
    fn bar(self) {}
}

fn main() {}

$ cargo check --target x86_64-apple-ios 2>&1 | rg '^error\['
error[E0072]: recursive type `Foo` has infinite size
error[E0391]: cycle detected when computing when `Foo` needs drop
```

with this change, we can remove the `ignore-apple` directive as the UI test now also passes on apple
targets.
…r=compiler-errors

check_consts: fix error requesting feature gate when that gate is not actually needed

When working on rust-lang/hashbrown#586 I noticed that the compiler asks for the `rustc_private` feature to be enabled if one forgets to set `rustc_const_stable_indirect` on a function -- but enabling `rustc_private` would not actually help. This fixes the diagnostics.

r? `@compiler-errors`
Separate `RemoveLet` span into primary span for `let` and removal
suggestion span for `let `, so that primary span does not include
whitespace.

Fixes: rust-lang#133031

Signed-off-by: Tyrone Wu <wudevelops@gmail.com>
…tures-apit, r=BoxyUwU

Recurse into APITs in `impl_trait_overcaptures`

We were previously not detecting cases where an RPIT was located in the return type of an async function, leading to underfiring of the `impl_trait_overcaptures`. This PR does this recursion properly now.

cc rust-lang#132809
…otatable, r=petrochenkov

Refactor `configure_annotatable`

This PR streamlines `configure_annotatable` and nearby code considerably.

r? `@petrochenkov`
…ag-merge, r=jieyouxu

tests: Test pac-ret flag merging on clang with LTO

Extend the test for pac-ret with clang and LTO by checking that different branch protection flags are preserved after the LTO step. There was an issue in older LLVM versions that was causing this to behave incorrectly.

try-job: aarch64-gnu-debug
…g_arg, r=compiler-errors

Change Visitor::visit_precise_capturing_arg so it returns a Visitor::Result

r? `@petrochenkov`

related to rust-lang#128974
…rder_fields

Migrate `reorder_fields` Assist to Use `SyntaxFactory`
This fixes a problem where code generated by an external macro with an
RPIT would end up using the call-site edition instead of the macro's
edition for the RPIT. When used from a 2024 crate, this caused the code
to change behavior to the 2024 capturing rules, which we don't want.

This was caused by the impl-trait lowering code would replace the span
with one marked with `DesugaringKind::OpaqueTy` desugaring. However, it
was also overriding the edition of the span with the edition of the
local crate. Instead it should be using the edition of the span itself.

Fixes rust-lang#132917
…iaskrgr

Rollup of 4 pull requests

Successful merges:

 - rust-lang#132817 (Recurse into APITs in `impl_trait_overcaptures`)
 - rust-lang#133021 (Refactor `configure_annotatable`)
 - rust-lang#133045 (tests: Test pac-ret flag merging on clang with LTO)
 - rust-lang#133049 (Change Visitor::visit_precise_capturing_arg so it returns a Visitor::Result)

r? `@ghost`
`@rustbot` modify labels: rollup
fixes rust-lang#129707

this can be used to show all items in a module,
or all associated items for a type.
currently sufferes slightly due to case insensitivity,
so `Option::` will also show items in the `option::` module.

it disables the checking of the last path element,
otherwise only items with short names will be shown
For expr `return (_ = 42);` unused_paren lint should not be triggered

fixes rust-lang#131989
…ochenkov

Add visit_coroutine_kind to ast::Visitor

r? ``@petrochenkov``

related to rust-lang#128974
RalfJung and others added 9 commits November 20, 2024 11:05
Update cargo

5 commits in 69e595908e2c420e7f0d1be34e6c5b984c8cfb84..66221abdeca2002d318fde6efff516aab091df0e
2024-11-16 01:26:11 +0000 to 2024-11-19 21:30:02 +0000
- Docs for optional registry JSON fields (rust-lang/cargo#14839)
- Allow registries to omit empty/default fields in JSON (rust-lang/cargo#14838)
- docs(unstable): Link to -Zwarnings issue, tracking issue (rust-lang/cargo#14836)
- fix(): error context for git_fetch refspec not found (rust-lang/cargo#14806)
- you we distinction (rust-lang/cargo#14829)
take 2

open up coroutines

tweak the wordings

the lint works up until 2021

We were missing one case, for ADTs, which was
causing `Result` to yield incorrect results.

only include field spans with significant types

deduplicate and eliminate field spans

switch to emit spans to impl Drops

Co-authored-by: Niko Matsakis <nikomat@amazon.com>

collect drops instead of taking liveness diff

apply some suggestions and add explantory notes

small fix on the cache

let the query recurse through coroutine

new suggestion format with extracted variable name

fine-tune the drop span and messages

bugfix on runtime borrows

tweak message wording

filter out ecosystem types earlier

apply suggestions

clippy

check lint level at session level

further restrict applicability of the lint

translate bid into nop for stable mir

detect cycle in type structure
…ss35

Stabilize const_pin_2

Tracking issue rust-lang#76654; review before FCP in progress.
…in7-windows-msvc, r=ChrisDenton

Fix LLVM target triple for `x86_64-win7-windows-msvc`

The vendor field needs to be `pc` rather than `win7`.
interpret: make typing_env field private

This was made public in rust-lang#133212 but IMO it should remain private. (Specifically, this prevents it from being mutated.)

r? `@lcnr`
…iaskrgr

Rollup of 3 pull requests

Successful merges:

 - rust-lang#131904 (Stabilize const_pin_2)
 - rust-lang#133239 (Fix LLVM target triple for `x86_64-win7-windows-msvc`)
 - rust-lang#133241 (interpret: make typing_env field private)

r? `@ghost`
`@rustbot` modify labels: rollup
…t-2, r=nikomatsakis

Reduce false positives of tail-expr-drop-order from consumed values (attempt #2)

r? `@nikomatsakis`

Tracked by rust-lang#123739.

Related to rust-lang#129864 but not replacing, yet.

Related to rust-lang#130836.

This is an implementation of the approach suggested in the [Zulip stream](https://rust-lang.zulipchat.com/#narrow/stream/213817-t-lang/topic/temporary.20drop.20order.20changes). A new MIR statement `BackwardsIncompatibleDrop` is added to the MIR syntax. The lint now works by inspecting possibly live move paths before at the `BackwardsIncompatibleDrop` location and the actual drop under the current edition, which should be one before Edition 2024 in practice.
@ZuseZ4 ZuseZ4 force-pushed the enzyme-cg-llvm branch 12 times, most recently from 469aaf0 to 79a3e2b Compare November 24, 2024 04:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.