forked from rust-lang/rust
-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Autodiff Upstreaming - rustc_codegen_llvm changes #178
Open
ZuseZ4
wants to merge
1,960
commits into
master
Choose a base branch
from
enzyme-cg-llvm
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
ZuseZ4
force-pushed
the
master
branch
3 times, most recently
from
October 28, 2024 02:32
4f0389e
to
012062a
Compare
…kingjubilee Rollup of 8 pull requests Successful merges: - rust-lang#132790 (Add as_slice/into_slice for IoSlice/IoSliceMut.) - rust-lang#132905 ([AIX] Add crate "unwind" to link with libunwind) - rust-lang#132977 (Fix compilation error on Solaris due to flock usage) - rust-lang#132984 ([illumos] use pipe2 to create anonymous pipes) - rust-lang#133019 (docs: Fix missing period and colon in methods for primitive types) - rust-lang#133048 (use `&raw` in `{read, write}_unaligned` documentation) - rust-lang#133050 (Always inline functions signatures containing `f16` or `f128`) - rust-lang#133053 (tests: Fix the SIMD FFI tests with certain x86 configuration) r? `@ghost` `@rustbot` modify labels: rollup
fix REGISTRY_USERNAME to reuse cache between auto and pr jobs see https://rust-lang.zulipchat.com/#narrow/channel/242791-t-infra/topic/reuse.20.28some.29.20docker.20images.20for.20pr.2Fauto.3F
Over in Zed we've noticed that loading crates for a large-ish workspace can take almost 200ms. We've pinned it down to how rustc searches for paths, as it performs a linear search over the list of candidate paths. In our case the candidate list had about 20k entries which we had to iterate over for each dependency being loaded. This commit introduces a simple FilesIndex that's just a sorted Vec under the hood. Since crates are looked up by both prefix and suffix, we perform a range search on said Vec (which constraints the search space based on prefix) and follow up with a linear scan of entries with matching suffixes. FilesIndex is also pre-filtered before any queries are performed using available target information; query prefixes/sufixes are based on the target we are compiling for, so we can remove entries that can never match up front. Overall, this commit brings down build time for us in dev scenarios by about 6%. 100ms might not seem like much, but this is a constant cost that each of our workspace crates has to pay, even when said crate is miniscule.
…s, r=saethlin rustc_metadata: Preprocess search paths for better performance Over in Zed we've noticed that loading crates for a large-ish workspace (~100 members of workspace, over 1000 crates being built for the main binary target) can take almost 200ms. We've pinned it down to how rustc searches for paths to dependency files, as it performs a linear search over the list of candidate paths. In our case the candidate list had about 20k entries which we had to iterate over for each dependency being loaded. Our workspace is also pretty bottom-heavy, e.g. most of the workspace members pull in most of the transitive dependencies one way or another, which means that we spend quite some time loading crates at rustc startup. This commit introduces a simple FilesIndex that's just a BTreeMap under the hood. Since crates are looked up by both prefix and suffix, we perform a range search on said BTree (which constraints the search space based on prefix) and follow up with a linear scan of entries with matching suffixes. Overall, this commit brings down build time for us in dev scenarios by about 6%. 100ms might not seem like much, but this is a constant cost that each of our workspace crates has to pay, even when said crate is miniscule.
Extend the test for pac-ret with clang and LTO by checking that different branch protection flags are preserved after the LTO step. There was an issue in older LLVM versions that was causing this to behave incorrectly. Tests the LLVM behaviour added in: llvm/llvm-project@1782810
the internal representation of `std::sync::Mutex` depends on the compilation target. due to this, the compiler produces different number of errors for UI test `issue-17431-6.rs` depending on the compilation target. for example, when compiling the UI test to an `*-apple-*` or `*-qnx7*` target, the "cycle detected" error is not reported ``` console $ cat src/lib.rs use std::sync::Mutex; enum Foo { X(Mutex<Option<Foo>>), } impl Foo { fn bar(self) {} } fn main() {} $ cargo check --target x86_64-apple-ios 2>&1 | rg '^error\[' error[E0072]: recursive type `Foo` has infinite size ``` whereas rustc produces two errors for other OSes, like Linux, which is what the UI test expects ``` console $ cargo check --target x86_64-unknown-linux-gnu 2>&1 | rg '^error\[' error[E0072]: recursive type `Foo` has infinite size error[E0391]: cycle detected when computing when `Foo` needs drop ``` this commit replaces the problematic `Mutex` with `UnsafeCell`, which has the same internal representation regardless of the compilation target. with that change, rustc reports two errors for all compilation targets. ``` console $ cat src/lib.rs use std::cell::UnsafeCell; enum Foo { X(UnsafeCell<Option<Foo>>), } impl Foo { fn bar(self) {} } fn main() {} $ cargo check --target x86_64-apple-ios 2>&1 | rg '^error\[' error[E0072]: recursive type `Foo` has infinite size error[E0391]: cycle detected when computing when `Foo` needs drop ``` with this change, we can remove the `ignore-apple` directive as the UI test now also passes on apple targets.
…r=compiler-errors check_consts: fix error requesting feature gate when that gate is not actually needed When working on rust-lang/hashbrown#586 I noticed that the compiler asks for the `rustc_private` feature to be enabled if one forgets to set `rustc_const_stable_indirect` on a function -- but enabling `rustc_private` would not actually help. This fixes the diagnostics. r? `@compiler-errors`
Separate `RemoveLet` span into primary span for `let` and removal suggestion span for `let `, so that primary span does not include whitespace. Fixes: rust-lang#133031 Signed-off-by: Tyrone Wu <wudevelops@gmail.com>
…tures-apit, r=BoxyUwU Recurse into APITs in `impl_trait_overcaptures` We were previously not detecting cases where an RPIT was located in the return type of an async function, leading to underfiring of the `impl_trait_overcaptures`. This PR does this recursion properly now. cc rust-lang#132809
…otatable, r=petrochenkov Refactor `configure_annotatable` This PR streamlines `configure_annotatable` and nearby code considerably. r? `@petrochenkov`
…ag-merge, r=jieyouxu tests: Test pac-ret flag merging on clang with LTO Extend the test for pac-ret with clang and LTO by checking that different branch protection flags are preserved after the LTO step. There was an issue in older LLVM versions that was causing this to behave incorrectly. try-job: aarch64-gnu-debug
…g_arg, r=compiler-errors Change Visitor::visit_precise_capturing_arg so it returns a Visitor::Result r? `@petrochenkov` related to rust-lang#128974
…rder_fields Migrate `reorder_fields` Assist to Use `SyntaxFactory`
This fixes a problem where code generated by an external macro with an RPIT would end up using the call-site edition instead of the macro's edition for the RPIT. When used from a 2024 crate, this caused the code to change behavior to the 2024 capturing rules, which we don't want. This was caused by the impl-trait lowering code would replace the span with one marked with `DesugaringKind::OpaqueTy` desugaring. However, it was also overriding the edition of the span with the edition of the local crate. Instead it should be using the edition of the span itself. Fixes rust-lang#132917
…iaskrgr Rollup of 4 pull requests Successful merges: - rust-lang#132817 (Recurse into APITs in `impl_trait_overcaptures`) - rust-lang#133021 (Refactor `configure_annotatable`) - rust-lang#133045 (tests: Test pac-ret flag merging on clang with LTO) - rust-lang#133049 (Change Visitor::visit_precise_capturing_arg so it returns a Visitor::Result) r? `@ghost` `@rustbot` modify labels: rollup
fixes rust-lang#129707 this can be used to show all items in a module, or all associated items for a type. currently sufferes slightly due to case insensitivity, so `Option::` will also show items in the `option::` module. it disables the checking of the last path element, otherwise only items with short names will be shown
For expr `return (_ = 42);` unused_paren lint should not be triggered fixes rust-lang#131989
…ochenkov Add visit_coroutine_kind to ast::Visitor r? ``@petrochenkov`` related to rust-lang#128974
Update cargo 5 commits in 69e595908e2c420e7f0d1be34e6c5b984c8cfb84..66221abdeca2002d318fde6efff516aab091df0e 2024-11-16 01:26:11 +0000 to 2024-11-19 21:30:02 +0000 - Docs for optional registry JSON fields (rust-lang/cargo#14839) - Allow registries to omit empty/default fields in JSON (rust-lang/cargo#14838) - docs(unstable): Link to -Zwarnings issue, tracking issue (rust-lang/cargo#14836) - fix(): error context for git_fetch refspec not found (rust-lang/cargo#14806) - you we distinction (rust-lang/cargo#14829)
take 2 open up coroutines tweak the wordings the lint works up until 2021 We were missing one case, for ADTs, which was causing `Result` to yield incorrect results. only include field spans with significant types deduplicate and eliminate field spans switch to emit spans to impl Drops Co-authored-by: Niko Matsakis <nikomat@amazon.com> collect drops instead of taking liveness diff apply some suggestions and add explantory notes small fix on the cache let the query recurse through coroutine new suggestion format with extracted variable name fine-tune the drop span and messages bugfix on runtime borrows tweak message wording filter out ecosystem types earlier apply suggestions clippy check lint level at session level further restrict applicability of the lint translate bid into nop for stable mir detect cycle in type structure
…ss35 Stabilize const_pin_2 Tracking issue rust-lang#76654; review before FCP in progress.
…in7-windows-msvc, r=ChrisDenton Fix LLVM target triple for `x86_64-win7-windows-msvc` The vendor field needs to be `pc` rather than `win7`.
interpret: make typing_env field private This was made public in rust-lang#133212 but IMO it should remain private. (Specifically, this prevents it from being mutated.) r? `@lcnr`
…iaskrgr Rollup of 3 pull requests Successful merges: - rust-lang#131904 (Stabilize const_pin_2) - rust-lang#133239 (Fix LLVM target triple for `x86_64-win7-windows-msvc`) - rust-lang#133241 (interpret: make typing_env field private) r? `@ghost` `@rustbot` modify labels: rollup
…t-2, r=nikomatsakis Reduce false positives of tail-expr-drop-order from consumed values (attempt #2) r? `@nikomatsakis` Tracked by rust-lang#123739. Related to rust-lang#129864 but not replacing, yet. Related to rust-lang#130836. This is an implementation of the approach suggested in the [Zulip stream](https://rust-lang.zulipchat.com/#narrow/stream/213817-t-lang/topic/temporary.20drop.20order.20changes). A new MIR statement `BackwardsIncompatibleDrop` is added to the MIR syntax. The lint now works by inspecting possibly live move paths before at the `BackwardsIncompatibleDrop` location and the actual drop under the current edition, which should be one before Edition 2024 in practice.
ZuseZ4
force-pushed
the
enzyme-cg-llvm
branch
12 times, most recently
from
November 24, 2024 04:09
469aaf0
to
79a3e2b
Compare
ZuseZ4
force-pushed
the
enzyme-cg-llvm
branch
from
December 3, 2024 02:53
b8fa359
to
119beb4
Compare
ZuseZ4
force-pushed
the
enzyme-cg-llvm
branch
from
December 12, 2024 18:34
67a6b87
to
3d2e36c
Compare
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Now that the autodiff/Enzyme backend is merged, this is an upstream PR for the
rustc_codegen_llvm
changes.It also includes small changes to three files under
compiler/rustc_ast
, which overlap with my frontend PR.Here I only include minimal definitions of structs and enums to be able to build this backend code.
The same goes for minimal changes to
compiler/rustc_codegen_ssa
, the majority of changes there will be in another PR, once either this or the frontend gets merged.We currently have 68 files left to merge, 19 in the frontend PR, 21 (+3 from the frontend) in this PR, and then ~30 in the middle-end.
To already highlight the things which reviewers might want to discuss:
enzyme_ffi.rs
: I do have a fallback module to make sure that we don't link rustc against Enzyme when we build rustc without autodiff support.add_panic_msg_to_global
was a pain to write and I currently can't even use it. Enzyme writes gradients into shadow memory. Pass in one float scalar? We'll allocate and return an extra float telling you how this float affected the output. Pass in a slice of floats? We'll let you allocate the vector and pass in a mutable reference to a float slice, we'll then write the gradient into that slice. It should be at least as large as your original slice, so we check that and panic if not. Currently we panic silently, but I already generate a nicer panic message with this function. I just don't know how to print it to the user. yet. I discussed this with a few rustc devs and the best we could come up with (for now), was to look for mangled panic calls in the IR and pick one, which works surprising reliably. If someone knows a good way to clean this up and print the panic message I'm all in, otherwise I can remove the code that writes the nicer panic message and keep the silent panic, since it's enough for soundness. Especially since this PR is already a bit larger.SanitizeHWAddress
: When differentiating C++, Enzyme can use TBAA to "understand" enums/unions, but for Rust we don't have this information. LLVM might to speculative loads which (without TBAA) confuse Enzyme, so we disable those with this attribute. This attribute is only set during the first opt run before Enzyme differentiates code. We then remove it again once we are done with autodiff and run the opt pipeline a second time. Since enums are everywhere in Rust, support for them is crucial, but if this looks too cursed I can remove these ~100 lines and keep them in my fork for now, we can then discuss them separately to make this PR simpler?Duplicated llvm-opt runs: Differentiating already optimized code (and being able to do additional optimizations on the fly, e.g. for GPU code) is the reason why Enzyme is so fast, so the compile time is acceptable for autodiff users: https://enzyme.mit.edu/talks/Publications/ (There are also algorithmic issues in Enzyme core which are more serious than running opt twice).
I assume that if we merge these minimal cg_ssa changes here already, I also need to fix the other backends (GCC and cliff) to have dummy implementations, correct?
I'm happy to split this PR up further if reviewers have recommendations on how to.
For the full implementation, see: rust-lang#129175
Tracking: