Autodiff Upstreaming - rustc_codegen_llvm changes #130060
r? @fee1-dead
rustbot has assigned @fee1-dead.
Some changes occurred in cfg and check-cfg configuration
cc @Urgau
@@ -176,6 +176,8 @@ pub(crate) fn default_configuration(sess: &Session) -> Cfg {
// NOTE: These insertions should be kept in sync with
// `CheckCfg::fill_well_known` below.

ins_none!(sym::autodiff_fallback);
This shouldn't be insta-stable; it should at least be gated behind the nightly compiler.
Suggested change:
- ins_none!(sym::autodiff_fallback);
+ if sess.is_nightly_build() {
+ ins_none!(sym::autodiff_fallback);
+ }
Please also follow all the steps for adding a new cfg, as defined at the top of this file (and update the test files as well):
rust/compiler/rustc_session/src/config/cfg.rs
Lines 10 to 21 in e26b02a
//! ## Adding a new cfg
//!
//! Adding a new feature requires two new symbols one for the cfg it-self
//! and the second one for the unstable feature gate, those are defined in
//! `rustc_span::symbol`.
//!
//! As well as the following points,
//! - Add the activation logic in [`default_configuration`]
//! - Add the cfg to [`CheckCfg::fill_well_known`] (and related files),
//! so that the compiler can know the cfg is expected
//! - Add the cfg in [`disallow_cfgs`] to disallow users from setting it via `--cfg`
//! - Add the feature gating in `compiler/rustc_feature/src/builtin_attrs.rs`
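For readers unfamiliar with the pattern, the nightly gating requested in this review can be illustrated with a toy model. This is only a sketch under stated assumptions: `Session`, `default_configuration`, and the cfg set below are simplified stand-ins written for this example, not the real rustc types, and `always_on_example` is a made-up cfg name.

```rust
use std::collections::HashSet;

/// Toy stand-in for rustc's `Session`; only the nightly flag is modeled.
struct Session {
    nightly: bool,
}

impl Session {
    /// Mirrors the name of the check used in the suggestion above.
    fn is_nightly_build(&self) -> bool {
        self.nightly
    }
}

/// Toy stand-in for `default_configuration`: returns the cfg names that are
/// set by default for the given session.
fn default_configuration(sess: &Session) -> HashSet<&'static str> {
    let mut cfg = HashSet::new();
    // A hypothetical stable cfg, inserted unconditionally.
    cfg.insert("always_on_example");
    // The unstable cfg is only inserted on nightly, so it is never
    // observable from a stable compiler.
    if sess.is_nightly_build() {
        cfg.insert("autodiff_fallback");
    }
    cfg
}
```

With this shape, a stable session never sees `autodiff_fallback`, which is the property the review comment is asking for. The remaining steps in the checklist (well-known cfgs, `disallow_cfgs`, feature gating) are not modeled here.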
r? compiler
☔ The latest upstream changes (presumably #131237) made this pull request unmergeable. Please resolve the merge conflicts.
r? compiler
There's very little chance of this being merged in one PR with one commit of this size. You'll need to split this up into well-commented/motivated PRs that can be landed one at a time. I haven't spent much time looking at this PR, so I don't have any suggestions on how to split this up. I'd recommend finding someone on the compiler team who is interested in these changes and who you can work with to do the reviews.
I don't have time to review this, so just one drive-by note: It looks like a decent part of the extra FFI APIs in enzyme_ffi.rs are essentially duplicates of things that we already have bindings for under slightly different names and signatures. Like we already have LLVMRustAddFunctionAttributes/LLVMRustAddCallSiteAttributes and this introduces LLVMRustAddEnumAttributeAtIndex. It also looks like the code doesn't make use of the Builder abstraction and instead calls FFI APIs directly everywhere, which is probably also where the duplication comes from.
Force-pushed from a170908 to 11c3bae.
Force-pushed from 11c3bae to 78297a9.
Ok, I reimplemented autodiff using a different approach and dropped safety checks as well as some perf optimizations. That brought the size down from 2.5k for the previous master to 1.1k here: EnzymeAD#186 This PR here still just contains changes to |
This might need another review pass but here are some initial comments
@@ -56,6 +56,10 @@ codegen_llvm_prepare_thin_lto_module_with_llvm_err = failed to prepare thin LTO
codegen_llvm_run_passes = failed to run LLVM passes
codegen_llvm_run_passes_with_llvm_err = failed to run LLVM passes: {$llvm_err}
codegen_llvm_prepare_autodiff = failed to prepare AutoDiff: src: {$src}, target: {$target}, {$error}
codegen_llvm_prepare_autodiff_with_llvm_err = failed to prepare AutoDiff: {$llvm_err}, src: {$src}, target: {$target}, {$error}
nit: some diagnostics say "AutoDiff" and some "autodiff", would be nice if this were consistent
@@ -604,7 +604,12 @@ pub(crate) fn run_pass_manager(
debug!("running the pass manager");
let opt_stage = if thin { llvm::OptStage::ThinLTO } else { llvm::OptStage::FatLTO };
let opt_level = config.opt_level.unwrap_or(config::OptLevel::No);
unsafe { write::llvm_optimize(cgcx, dcx, module, config, opt_level, opt_stage) }?;
// We will run this again with different values in the context of automatic differentiation.
let first_run = true;
Is this happening regardless of whether Enzyme is enabled?
No, only if Enzyme support is enabled when building rustc, and if the user additionally applied at least one autodiff macro.
If that is the case we check that fat-lto is enabled (in the other PR), and then we will run opt passes twice.
The fat-lto requirement is something that will be lifted in the future.
Optimizing the whole module twice is also more than we really want, so I'll make that more granular in the future, so that only the code being differentiated is optimized twice. I'll add this to the comments.
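To make the control flow described in this reply concrete, here is a minimal sketch. It is an assumption-laden toy, not the PR's code: `Module`, `optimize`, `differentiate`, and `run_pipeline` are hypothetical names invented for illustration, and the exact point at which the fat-LTO check happens in the real implementation may differ.

```rust
/// Toy module that only records which pipeline steps ran, in order.
#[derive(Default)]
struct Module {
    log: Vec<&'static str>,
}

/// Hypothetical stand-in for running the LLVM optimization pipeline.
fn optimize(module: &mut Module) {
    module.log.push("optimize");
}

/// Hypothetical stand-in for the Enzyme differentiation step.
fn differentiate(module: &mut Module) {
    module.log.push("differentiate");
}

/// Runs the optimizer once in the normal case. The second run only happens
/// when Enzyme support is built in AND at least one autodiff macro was used.
fn run_pipeline(
    module: &mut Module,
    enzyme_enabled: bool,
    autodiff_items: usize,
    fat_lto: bool,
) -> Result<(), String> {
    let needs_autodiff = enzyme_enabled && autodiff_items > 0;
    // Autodiff currently requires fat LTO; reject other configurations early.
    if needs_autodiff && !fat_lto {
        return Err("autodiff requires fat LTO".to_string());
    }
    optimize(module);
    if needs_autodiff {
        differentiate(module);
        optimize(module);
    }
    Ok(())
}
```

In this model, a crate that never uses an autodiff macro pays for exactly one optimization run, regardless of how rustc was built.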
}
};
let tgt_name = CString::new(item.target.clone()).unwrap();
dbg!("Target name: {:?}", &tgt_name);
Suggested change:
- dbg!("Target name: {:?}", &tgt_name);
+ debug!("target name: {:?}", &tgt_name);
}));
}
};
let tgt_name = CString::new(item.target.clone()).unwrap();
nit: I'm not sure we use the tgt abbreviation anywhere
}
}

pub(crate) fn add_opt_dbg_helper2<'ll>(
This could do with a comment describing what it does
_ => {}
}

trace!("Matching args");
nit: this is the biggest nitpick, but we tend to have lowercase log messages
) -> Result<(), FatalError> {
if cgcx.lto != Lto::Fat {
let dcx = cgcx.create_dcx();
return Err(dcx.handle().emit_almost_fatal(AutoDiffWithoutLTO {}));
Suggested change:
- return Err(dcx.handle().emit_almost_fatal(AutoDiffWithoutLTO {}));
+ return Err(dcx.handle().emit_almost_fatal(AutoDiffWithoutLTO));
LtoModuleCodegen::Fat(module) => {
B::autodiff(cgcx, &module, diff_fncs, config)?;
}
_ => panic!("Unreachable? Autodiff called with non-fat LTO module"),
Suggested change:
- _ => panic!("Unreachable? Autodiff called with non-fat LTO module"),
+ _ => panic!("autodiff called with non-fat LTO module"),
let md_todiff = llvm::LLVMMetadataAsValue(llcx, md);
let _md2 = llvm::LLVMSetMetadata(call, md_ty, md_todiff);
} else {
trace!("No dbg info");
Suggested change:
- trace!("No dbg info");
+ trace!("no dbg info");
// ret double %0
// }

unsafe {
Everything in this block needs more comments explaining what it is doing
Now that the autodiff/Enzyme backend is merged, this is an upstream PR for the rustc_codegen_llvm changes.
It also includes small changes to three files under compiler/rustc_ast, which overlap with my frontend PR (#129458). Here I only include minimal definitions of structs and enums to be able to build this backend code.
The same goes for the minimal changes to compiler/rustc_codegen_ssa; the majority of changes there will be in another PR, once either this or the frontend gets merged.
We currently have 68 files left to merge: 19 in the frontend PR, 21 (+3 from the frontend) in this PR, and then ~30 in the middle-end.
This PR is large because it includes two of my three large files (~800 loc each). I could also upstream only enzyme_ffi.rs first, but I think people might want to see some use of these bindings in the same PR?
To highlight up front the things which reviewers might want to discuss:
enzyme_ffi.rs: I do have a fallback module to make sure that we don't link rustc against Enzyme when we build rustc without autodiff support.
add_panic_msg_to_global was a pain to write and I currently can't even use it. Enzyme writes gradients into shadow memory. Pass in one float scalar? We'll allocate and return an extra float telling you how this float affected the output. Pass in a slice of floats? We'll let you allocate the vector and pass in a mutable reference to a float slice; we'll then write the gradient into that slice. It should be at least as large as your original slice, so we check that and panic if not. Currently we panic silently, but I already generate a nicer panic message with this function. I just don't know how to print it to the user yet. I discussed this with a few rustc devs and the best we could come up with (for now) was to look for mangled panic calls in the IR and pick one, which works surprisingly reliably. If someone knows a good way to clean this up and print the panic message I'm all in; otherwise I can remove the code that writes the nicer panic message and keep the silent panic, since it's enough for soundness, especially since this PR is already a bit larger.
SanitizeHWAddress: When differentiating C++, Enzyme can use TBAA to "understand" enums/unions, but for Rust we don't have this information. LLVM might do speculative loads which (without TBAA) confuse Enzyme, so we disable those with this attribute. This attribute is only set during the first opt run, before Enzyme differentiates code. We then remove it again once we are done with autodiff and run the opt pipeline a second time. Since enums are everywhere in Rust, support for them is crucial, but if this looks too cursed I can remove these ~100 lines and keep them in my fork for now. We can then discuss them separately to make this PR simpler?
Duplicated llvm-opt runs: Differentiating already optimized code (and being able to do additional optimizations on the fly, e.g. for GPU code) is the reason why Enzyme is so fast, so the compile time is acceptable for autodiff users: https://enzyme.mit.edu/talks/Publications/ (There are also algorithmic issues in Enzyme core which are more serious than running opt twice.)
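The shadow-memory convention described in the add_panic_msg_to_global item can be illustrated with a hand-written gradient. This is only a sketch: the derivative below is written by hand rather than generated by Enzyme, the function names are made up for this example, and accumulating into the shadow with `+=` is an assumption of the sketch. The length check corresponds to the "at least as large as your original slice" requirement, here with an explicit panic message.

```rust
/// Primal function: sum of squares of the inputs.
fn sum_of_squares(x: &[f32]) -> f32 {
    x.iter().map(|v| v * v).sum()
}

/// Hand-written gradient using a caller-allocated shadow slice `dx`.
/// The gradient of the output with respect to `x[i]` is added to `dx[i]`.
/// Returns the primal result.
fn d_sum_of_squares(x: &[f32], dx: &mut [f32]) -> f32 {
    // The shadow must be at least as large as the input; otherwise writing
    // the gradient would go out of bounds, so we panic with a message.
    assert!(
        dx.len() >= x.len(),
        "shadow slice has length {} but the input has length {}",
        dx.len(),
        x.len()
    );
    for (xi, dxi) in x.iter().zip(dx.iter_mut()) {
        // d/dx_i of x_i^2 is 2 * x_i.
        *dxi += 2.0 * xi;
    }
    sum_of_squares(x)
}
```

Calling `d_sum_of_squares(&[1.0, 2.0, 3.0], &mut dx)` with a zeroed three-element `dx` fills it with the gradient, while a two-element `dx` triggers the panic instead of a silent out-of-bounds write.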
I assume that if we merge these minimal cg_ssa changes here already, I also need to fix the other backends (GCC and Cranelift) to have dummy implementations, correct?
I'm happy to split this PR up further if reviewers have recommendations on how to.
For the full implementation, see: #129175
Tracking: