Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Autodiff Upstreaming - rustc_codegen_llvm changes #130060

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

ZuseZ4
Copy link
Contributor

@ZuseZ4 ZuseZ4 commented Sep 7, 2024

Now that the autodiff/Enzyme backend is merged, this is an upstream PR for the rustc_codegen_llvm changes.
It also includes small changes to three files under compiler/rustc_ast, which overlap with my frontend PR (#129458).
Here I only include minimal definitions of structs and enums to be able to build this backend code.
The same goes for minimal changes to compiler/rustc_codegen_ssa, the majority of changes there will be in another PR, once either this or the frontend gets merged.

We currently have 68 files left to merge, 19 in the frontend PR, 21 (+3 from the frontend) in this PR, and then ~30 in the middle-end.

This PR is large because it includes two of my three large files (~800 loc each). I could also first only upstream enzyme_ffi.rs, but I think people might want to see some use of these bindings in the same PR?

To already highlight the things which reviewers might want to discuss:

  1. enzyme_ffi.rs: I do have a fallback module to make sure that we don't link rustc against Enzyme when we build rustc without autodiff support.

  2. add_panic_msg_to_global was a pain to write and I currently can't even use it. Enzyme writes gradients into shadow memory. Pass in one float scalar? We'll allocate and return an extra float telling you how this float affected the output. Pass in a slice of floats? We'll let you allocate the vector and pass in a mutable reference to a float slice, we'll then write the gradient into that slice. It should be at least as large as your original slice, so we check that and panic if not. Currently we panic silently, but I already generate a nicer panic message with this function. I just don't know how to print it to the user. yet. I discussed this with a few rustc devs and the best we could come up with (for now), was to look for mangled panic calls in the IR and pick one, which works surprisingly reliably. If someone knows a good way to clean this up and print the panic message I'm all in, otherwise I can remove the code that writes the nicer panic message and keep the silent panic, since it's enough for soundness. Especially since this PR is already a bit larger.

  3. SanitizeHWAddress: When differentiating C++, Enzyme can use TBAA to "understand" enums/unions, but for Rust we don't have this information. LLVM might to speculative loads which (without TBAA) confuse Enzyme, so we disable those with this attribute. This attribute is only set during the first opt run before Enzyme differentiates code. We then remove it again once we are done with autodiff and run the opt pipeline a second time. Since enums are everywhere in Rust, support for them is crucial, but if this looks too cursed I can remove these ~100 lines and keep them in my fork for now, we can then discuss them separately to make this PR simpler?

  4. Duplicated llvm-opt runs: Differentiating already optimized code (and being able to do additional optimizations on the fly, e.g. for GPU code) is the reason why Enzyme is so fast, so the compile time is acceptable for autodiff users: https://enzyme.mit.edu/talks/Publications/ (There are also algorithmic issues in Enzyme core which are more serious than running opt twice).

  5. I assume that if we merge these minimal cg_ssa changes here already, I also need to fix the other backends (GCC and cliff) to have dummy implementations, correct?

  6. I'm happy to split this PR up further if reviewers have recommendations on how to.

For the full implementation, see: #129175

Tracking:

@rustbot
Copy link
Collaborator

rustbot commented Sep 7, 2024

r? @fee1-dead

rustbot has assigned @fee1-dead.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot
Copy link
Collaborator

rustbot commented Sep 7, 2024

⚠️ Warning ⚠️

  • These commits modify submodules.

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Sep 7, 2024
@rustbot
Copy link
Collaborator

rustbot commented Sep 7, 2024

This PR modifies config.example.toml.

If appropriate, please update CONFIG_CHANGE_HISTORY in src/bootstrap/src/utils/change_tracker.rs.

Some changes occurred in cfg and check-cfg configuration

cc @Urgau

@rust-log-analyzer

This comment has been minimized.

@@ -176,6 +176,8 @@ pub(crate) fn default_configuration(sess: &Session) -> Cfg {
// NOTE: These insertions should be kept in sync with
// `CheckCfg::fill_well_known` below.

ins_none!(sym::autodiff_fallback);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't be insta stable, it should be at least gated behind nightly compiler.

Suggested change
ins_none!(sym::autodiff_fallback);
if sess.is_nightly_build() {
ins_none!(sym::autodiff_fallback);
}

Please also follow all the steps regarding a new cfg as defined in the top of this file (as well as the tests files):

//! ## Adding a new cfg
//!
//! Adding a new feature requires two new symbols one for the cfg it-self
//! and the second one for the unstable feature gate, those are defined in
//! `rustc_span::symbol`.
//!
//! As well as the following points,
//! - Add the activation logic in [`default_configuration`]
//! - Add the cfg to [`CheckCfg::fill_well_known`] (and related files),
//! so that the compiler can know the cfg is expected
//! - Add the cfg in [`disallow_cfgs`] to disallow users from setting it via `--cfg`
//! - Add the feature gating in `compiler/rustc_feature/src/builtin_attrs.rs`

@jieyouxu jieyouxu added the F-autodiff `#![feature(autodiff)]` label Sep 7, 2024
@fee1-dead
Copy link
Member

r? compiler

@rust-log-analyzer

This comment has been minimized.

@bors

This comment was marked as resolved.

@michaelwoerister
Copy link
Member

r? compiler

@rustbot rustbot assigned davidtwco and unassigned michaelwoerister Oct 7, 2024
@davidtwco
Copy link
Member

There's very little chance of this being merged in one PR with one commit of this size. You'll need to split this up into well-commented/motivated PRs that can be landed one at a time. I haven't spent much time looking at this PR, so I don't have any suggestions on how to split this up. I'd recommend finding someone on the compiler team who is interested in these changes and who you can work with to do the reviews.

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have time to review this, so just one drive-by note: It looks like a decent part of the extra FFI APIs in enzyme_ffi.rs are essentially duplicates of things that we already have bindings for under slightly different names and signatures. Like we already have LLVMRustAddFunctionAttributes/LLVMRustAddCallSiteAttributes and this introduces LLVMRustAddEnumAttributeAtIndex. It also looks like the code doesn't make use of the Builder abstraction and instead calls FFI APIs directly everywhere, which is probably also where the duplication comes from.

@davidtwco davidtwco added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 11, 2024
@alex-semenyuk alex-semenyuk added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Nov 13, 2024
@ZuseZ4 ZuseZ4 force-pushed the enzyme-cg-llvm branch 3 times, most recently from 11c3bae to 78297a9 Compare November 21, 2024 01:21
@bors
Copy link
Contributor

bors commented Dec 13, 2024

⌛ Testing commit 3d2e36c with merge 3ac8ed9...

bors added a commit to rust-lang-ci/rust that referenced this pull request Dec 13, 2024
Autodiff Upstreaming - rustc_codegen_llvm changes

Now that the autodiff/Enzyme backend is merged, this is an upstream PR for the `rustc_codegen_llvm` changes.
It also includes small changes to three files under `compiler/rustc_ast`, which overlap with my frontend PR (rust-lang#129458).
Here I only include minimal definitions of structs and enums to be able to build this backend code.
The same goes for minimal changes to `compiler/rustc_codegen_ssa`, the majority of changes there will be in another PR, once either this or the frontend gets merged.

We currently have 68 files left to merge, 19 in the frontend PR, 21 (+3 from the frontend) in this PR, and then ~30 in the middle-end.

This PR is large because it includes two of my three large files (~800 loc each). I could also first only upstream enzyme_ffi.rs, but I think people might want to see some use of these bindings in the same PR?

To already highlight the things which reviewers might want to discuss:

1) `enzyme_ffi.rs`: I do have a fallback module to make sure that we don't link rustc against Enzyme when we build rustc without autodiff support.

2) `add_panic_msg_to_global` was a pain to write and I currently can't even use it. Enzyme writes gradients into shadow memory. Pass in one float scalar? We'll allocate and return an extra float telling you how this float affected the output. Pass in a slice of floats? We'll let you allocate the vector and pass in a mutable reference to a float slice, we'll then write the gradient into that slice. It should be at least as large as your original slice, so we check that and panic if not. Currently we panic silently, but I already generate a nicer panic message with this function. I just don't know how to print it to the user. yet. I discussed this with a few rustc devs and the best we could come up with (for now), was to look for mangled panic calls in the IR and pick one, which works surprisingly reliably. If someone knows a good way to clean this up and print the panic message I'm all in, otherwise I can remove the code that writes the nicer panic message and keep the silent panic, since it's enough for soundness. Especially since this PR is already a bit larger.

3) `SanitizeHWAddress`: When differentiating C++, Enzyme can use TBAA to "understand" enums/unions, but for Rust we don't have this information. LLVM might to speculative loads which (without TBAA) confuse Enzyme, so we disable those with this attribute. This attribute is only set during the first opt run before Enzyme differentiates code. We then remove it again once we are done with autodiff and run the opt pipeline a second time. Since enums are everywhere in Rust, support for them is crucial, but if this looks too cursed I can remove these ~100 lines and keep them in my fork for now, we can then discuss them separately to make this PR simpler?

4) Duplicated llvm-opt runs: Differentiating already optimized code (and being able to do additional optimizations on the fly, e.g. for GPU code) is _the_ reason why Enzyme is so fast, so the compile time is acceptable for autodiff users:  https://enzyme.mit.edu/talks/Publications/ (There are also algorithmic issues in Enzyme core which are more serious than running opt twice).

5) I assume that if we merge these minimal cg_ssa changes here already, I also need to fix the other backends (GCC and cliff) to have dummy implementations, correct?

6) *I'm happy to split this PR up further if reviewers have recommendations on how to.*

For the full implementation, see: rust-lang#129175

Tracking:

- rust-lang#124509
@rust-log-analyzer

This comment has been minimized.

@bors
Copy link
Contributor

bors commented Dec 13, 2024

💔 Test failed - checks-actions

@bors bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Dec 13, 2024
@ZuseZ4
Copy link
Contributor Author

ZuseZ4 commented Dec 13, 2024

SIGILL ?
The middle-end which would call my code here hasn't been merged yet, so it shouldn't even run. So spurious?

@ZuseZ4
Copy link
Contributor Author

ZuseZ4 commented Dec 13, 2024

Let's see if I can

@bors r=davidtwco

@bors

This comment was marked as resolved.

@jieyouxu
Copy link
Member

dist-x86_64-apple failing with SIGILL here is making me suspicious that this might be some manifestation or variation of #134220, which is concerning, but I suspect it's not from your PR.

@jieyouxu
Copy link
Member

I am suspecting an unrelated failure.
@bors r=@davidtwco

@bors

This comment was marked as resolved.

@bors
Copy link
Contributor

bors commented Dec 13, 2024

📌 Commit 3d2e36c has been approved by davidtwco

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Dec 13, 2024
@jieyouxu jieyouxu added the CI-spurious-x86_64-apple-SIGSEGV-SIGILL CI spurious failure: SIGSEGV/SIGILL while building rustc itself on x86_64-apple label Dec 13, 2024
@bors
Copy link
Contributor

bors commented Dec 13, 2024

☔ The latest upstream changes (presumably #133099) made this pull request unmergeable. Please resolve the merge conflicts.

@bors bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Dec 13, 2024
return unsafe { llvm_optimize(cgcx, dcx, module, config, opt_level, opt_stage) };

// If we know that we will later run AD, then we disable vectorization and loop unrolling
let skip_size_increasing_opts = cfg!(llvm_enzyme);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this just be true like at the other site, considering llvm_optimize internally checks the cfg again?

compiler/rustc_codegen_llvm/src/back/write.rs Outdated Show resolved Hide resolved
compiler/rustc_codegen_llvm/src/back/write.rs Outdated Show resolved Hide resolved
compiler/rustc_codegen_llvm/src/back/write.rs Outdated Show resolved Hide resolved
compiler/rustc_codegen_llvm/src/back/write.rs Outdated Show resolved Hide resolved
compiler/rustc_codegen_llvm/src/back/write.rs Outdated Show resolved Hide resolved
compiler/rustc_codegen_llvm/src/builder.rs Outdated Show resolved Hide resolved
@oli-obk oli-obk self-assigned this Dec 14, 2024
Comment on lines 130 to 140
let enzyme_ty = llvm::LLVMFunctionType(ret_ty, ptr::null(), 0, True);
let ad_fn = llvm::LLVMRustGetOrInsertFunction(
llmod,
ad_name.as_ptr() as *const c_char,
ad_name.len().try_into().unwrap(),
enzyme_ty,
);
// Otherwise LLVM might inline our temporary code before the enzyme pass has a chance to
// do it's work.
let attr = llvm::AttributeKind::NoInline.create_attr(llcx);
attributes::apply_to_llfn(ad_fn, Function, &[attr]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shares some code with declare_raw_fn, please deduplicate logic between this and declare_raw_fn with another "even rawer" 😆 helper function

let br = llvm::LLVMRustGetTerminator(entry);
llvm::LLVMRustEraseInstFromParent(br);

let builder = llvm::LLVMCreateBuilderInContext(llcx);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be using the Builder::with_cx function and use methods on the Builder instead of accessing the raw llvm builder directly for everything

Copy link
Contributor Author

@ZuseZ4 ZuseZ4 Dec 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been trying this for a bit, but right now I use a &CodegenContext when defining the autodiff function in compiler/rustc_codegen_ssa/src/traits/write.rs under the WriteBackendMethods trait, instantiated to &CodegenContext<LlvmCodegenBackend> later.

The autodiff implementation takes a LlvmCodegenBackend, so I have the llcx, but not the different CodegenCx needed to create the Builder, defined in compiler/rustc_codegen_llvm/src/context.rs. I might miss a way to get the CodegenCx, but right now I don't see how I can move the code over, while keeping autodiff under the WriteBackendMethods trait?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also for context, we must run autodiff on the single fat-lto module (for the foreseeable future, though there are efforts to remove it over the next months/years) and the CodegenCx seems more like something more low-level, per individual CGU, so it wouldn't be the right level, despite otherwise having nice helpers?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea this isn't possible without refactoring the builder/codegen_cx quite a bit. It's indeed for one cgu, though it doesn't seem too bad to generate a dummy cgu (it's just a rust struct with no LLVM information in it) for now and use

It gives autodiff strictly more powers than it needs to (e.g. all the caches are useless for it), but then we at least are using the existing helpers and can then analyze what to refactor.

@nikic what do you think?

Comment on lines 155 to 159
let enzyme_const = llvm::create_md_string(llcx, "enzyme_const");
let enzyme_out = llvm::create_md_string(llcx, "enzyme_out");
let enzyme_dup = llvm::create_md_string(llcx, "enzyme_dup");
let enzyme_dupnoneed = llvm::create_md_string(llcx, "enzyme_dupnoneed");
let enzyme_primal_ret = llvm::create_md_string(llcx, "enzyme_primal_return");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These could then all be a call to CodegenCx::typeid_metadata

Comment on lines 172 to 175
// We now handle the issue that Rust level arguments not always match the llvm-ir level
// arguments. A slice, `&[f32]`, for example, is represented as a pointer and a length on
// llvm-ir level. The number of activities matches the number of Rust level arguments, so we
// need to match those.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This index math is a bit unusual. I think I'd prefer using iterators (peekable in the input case), and just stepping them manually.

// int2 >= int1, which means the shadow vector is large enough to store the gradient.
assert!(llvm::LLVMRustGetTypeKind(next_outer_ty) == llvm::TypeKind::Integer);
let next_outer_arg2 = outer_args[outer_pos + 2];
let next_outer_ty2 = llvm::LLVMTypeOf(next_outer_arg2);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prefer val_ty

}
}

let call = llvm::LLVMBuildCall2(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a Builder method similar to callbr or call

llvm::LLVMRustEraseInstBefore(entry, last_inst);
llvm::LLVMRustEraseInstFromParent(last_inst);

let void_ty = llvm::LLVMVoidTypeInContext(llcx);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type_void

@bors
Copy link
Contributor

bors commented Dec 15, 2024

🔒 Merge conflict

This pull request and the master branch diverged in a way that cannot be automatically merged. Please rebase on top of the latest master branch, and let the reviewer approve again.

How do I rebase?

Assuming self is your fork and upstream is this repository, you can resolve the conflict following these steps:

  1. git checkout enzyme-cg-llvm (switch to your branch)
  2. git fetch upstream master (retrieve the latest master)
  3. git rebase upstream/master -p (rebase on top of it)
  4. Follow the on-screen instruction to resolve conflicts (check git status if you got lost).
  5. git push self enzyme-cg-llvm --force-with-lease (update this PR)

You may also read Git Rebasing to Resolve Conflicts by Drew Blessing for a short tutorial.

Please avoid the "Resolve conflicts" button on GitHub. It uses git merge instead of git rebase which makes the PR commit history more difficult to read.

Sometimes step 4 will complete without asking for resolution. This is usually due to difference between how Cargo.lock conflict is handled during merge and rebase. This is normal, and you should still perform step 5 to update this PR.

Error message
Auto-merging compiler/rustc_llvm/llvm-wrapper/RustWrapper.cpp
Auto-merging compiler/rustc_codegen_llvm/src/lib.rs
CONFLICT (content): Merge conflict in compiler/rustc_codegen_llvm/src/lib.rs
Auto-merging compiler/rustc_codegen_llvm/src/back/write.rs
Auto-merging compiler/rustc_codegen_llvm/src/back/lto.rs
Auto-merging compiler/rustc_codegen_gcc/src/lib.rs
Automatic merge failed; fix conflicts and then commit the result.

@davidtwco
Copy link
Member

r? @oli-obk

@rustbot
Copy link
Collaborator

rustbot commented Dec 20, 2024

Could not assign reviewer from: oli-obk.
User(s) oli-obk are either the PR author, already assigned, or on vacation, and there are no other candidates.
Use r? to specify someone else to assign.


let unroll_loops;
let vectorize_slp;
let vectorize_loop;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Orthogonal, can happen on master: these three could be fields of a struct, as we're passing them around together anyway

let br = llvm::LLVMRustGetTerminator(entry);
llvm::LLVMRustEraseInstFromParent(br);

let builder = llvm::LLVMCreateBuilderInContext(llcx);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea this isn't possible without refactoring the builder/codegen_cx quite a bit. It's indeed for one cgu, though it doesn't seem too bad to generate a dummy cgu (it's just a rust struct with no LLVM information in it) for now and use

It gives autodiff strictly more powers than it needs to (e.g. all the caches are useless for it), but then we at least are using the existing helpers and can then analyze what to refactor.

@nikic what do you think?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI-spurious-x86_64-apple-SIGSEGV-SIGILL CI spurious failure: SIGSEGV/SIGILL while building rustc itself on x86_64-apple F-autodiff `#![feature(autodiff)]` S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.