WIP/RFC: Split GC lowering pass into a platform-agnostic pass and a CPU-specific pass #31135
Conversation
What's the reasoning behind two passes as opposed to just doing codegen based on the target triple, or potentially making the pass a parent class with the codegen parts behind a virtual method? Seems like a lot of the complexity here stems from trying to preserve information across the pass boundary.
Being able to provide an implementation for the second pass in CUDAnative, written in Julia, using the LLVM C API. Kind of what we do now, providing our own implementation for all of […]. I was also imagining this "final" pass taking care of possible other platform-dependent intrinsics, either as the result of other LLVM-level lowering passes or maybe even direct […].
I get the motivation. On the other hand, this pass is very tightly coupled to the representation of the GC intrinsics, so I'm not really sure much is lost by just branching on the target architecture in the code generator and putting those details there. That seems far simpler than trying to encode the entire state data structure into intrinsics in the IR.
@Keno thanks for your feedback! :) I'll try and address your concerns.
EDIT: I forgot about an intrinsic ([…]).
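To make the split being debated here concrete: below is a minimal sketch of what the CPU-specific half could look like, using LLVM's legacy pass interface. The class name and the `julia.new_gc_frame` intrinsic come from elsewhere in this thread; the worklist walk and the lowering body are illustrative, not the PR's actual code.

```cpp
#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/Module.h"
#include "llvm/Pass.h"

using namespace llvm;

namespace {
// The platform-agnostic pass (LateLowerGCFrame) leaves behind calls to
// abstract intrinsics such as 'julia.new_gc_frame'; the final pass then
// rewrites those calls into IR that is only valid on a CPU.
struct FinalLowerGC : public FunctionPass {
    static char ID;
    FinalLowerGC() : FunctionPass(ID) {}

    bool runOnFunction(Function &F) override {
        // Collect the calls first so we can rewrite them without
        // invalidating the iterators we are walking.
        SmallVector<CallInst *, 8> Worklist;
        for (BasicBlock &BB : F)
            for (Instruction &I : BB)
                if (auto *CI = dyn_cast<CallInst>(&I))
                    if (Function *Callee = CI->getCalledFunction())
                        if (Callee->getName() == "julia.new_gc_frame")
                            Worklist.push_back(CI);

        for (CallInst *CI : Worklist) {
            // CPU-specific lowering would go here, e.g. materializing an
            // alloca for the frame and linking it into the thread-local
            // GC frame list; elided in this sketch.
            (void)CI;
        }
        return !Worklist.empty();
    }
};
} // namespace

char FinalLowerGC::ID = 0;
```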
@Keno Hello. It's been a while since I opened this PR and it's kind of been dormant since then. In the meantime, I've successfully implemented a garbage collector for CUDAnative on top of the changes I proposed here. Garbage collection underpins a wide range of Julia features, so I'd like to upstream my GC into CUDAnative's master branch. However, I'm fairly sure that's just not going to happen until the intrinsics I'm proposing here (or some alternative but equivalent mechanism) are included in Julia itself. I understand if you're not very excited about this admittedly rather large PR, but I would really like to salvage its core ideas. Code-wise, it consists of two main additions: the platform-agnostic GC intrinsics that `LateLowerGCFrame` now emits, and the `FinalLowerGC` pass that lowers them for CPU targets.

I'm not sure how to proceed here. Do you see an avenue for me to upstream some mechanism to support my GPU GC along with a "final" lowering pass? I mostly just want a roadmap I can stick to for adding support for my CUDAnative GC to Julia. I'd be happy to rework this PR into one or more PRs that are more to your liking, but for that I need you to tell me what those PRs should look like.

Also, I'd like to point out that my thesis is due by the end of May. I'm very motivated to work on this PR at the moment, but I also have a lot of other things I need to take care of before that end-of-May deadline, so I can't really afford to experiment with a variety of different approaches before settling on some "optimal" solution. Do you think that, given these constraints, we can make that deadline? Thanks!
CI failure is #30365
```cpp
char FinalLowerGC::ID = 0;
static RegisterPass<FinalLowerGC> X("FinalLowerGC", "Final GC intrinsic lowering pass", false, false);

Pass *createFinalLowerGCPass()
```
Please add a C-API function like this one (line 103 in 5741c7f):

```cpp
extern "C" JL_DLLEXPORT void LLVMExtraAddCombineMulAddPass(LLVMPassManagerRef PM)
```
Alright. I added one.
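The added shim presumably mirrors the pattern quoted above; the commit list further down defines `LLVMExtraAddFinalLowerGCPass`, so a minimal sketch would look roughly like this, assuming the usual `unwrap` conversion helpers from `LegacyPassManager.h` and Julia's `JL_DLLEXPORT` export macro (the body is illustrative):

```cpp
#include "llvm-c/Types.h"
#include "llvm/IR/LegacyPassManager.h"

using namespace llvm;

Pass *createFinalLowerGCPass(); // declared alongside the pass itself

// Expose the new pass through the C API so out-of-tree consumers (e.g.
// CUDAnative, which drives LLVM from Julia) can schedule it in their
// own pass pipelines.
extern "C" JL_DLLEXPORT void LLVMExtraAddFinalLowerGCPass(LLVMPassManagerRef PM)
{
    unwrap(PM)->add(createFinalLowerGCPass());
}
```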
@Keno so far we have been trying to separate the different backend implementations from the main Julia implementation, and I think that strategy has served us reasonably well. It allowed the GPU backend to experiment and iterate quicker than Julia release cycles, and branching on the target would strongly couple the GPU backend to specific versions of Julia. (In addition, as Jonathan says, there are several strategies that backends might want to experiment with.)
@vchuravy Thanks for chiming in! I rebased the PR again and added the C-API function you asked for.
If it's useful to the GPU backend I'm ok with this. I'll try to find the time to read through the details this week.
Codecov Report

```
@@           Coverage Diff           @@
##           master   #31135   +/-  ##
=======================================
  Coverage   81.21%   81.21%
=======================================
  Files         372      372
  Lines       62957    62957
=======================================
  Hits        51131    51131
  Misses      11826    11826
```

Continue to review the full report at Codecov.
Awesome, thanks!
Jonathan, it would be great if you could squash some of these commits (the goal would be to have self-contained changes that pass tests).
Sure thing, Valentin. I'll see what I can do tomorrow. Thanks for helping me navigate this!
- Factor a loop out of LateLowerGCFrame::runOnFunction and into GCLoweringRefs
- Move TBAA MDNode init from LateLowerGCFrame to GCLoweringRefs
- Rename 'llvm-late-gc-helpers.{cpp,h}' to 'llvm-pass-helpers.{cpp,h}'
- Fix a typo in the final GC lowering pass
- Split up 'julia.push_new_gc_frame' into two intrinsics
- Steal name from existing GC frame when lowering 'julia.new_gc_frame'
- Move GC frame pop lowering to final GC lowering pass
- Change how GC helper intrinsics are accessed
- Remove fields from GCLoweringRefs
- Capture module pointer in GCLoweringRefs
- Make the 'julia.new_gc_frame' intrinsic noalias
- Move GC frame access lowering to final GC lowering pass
- Update some outdated comments
- Add some sanity checks to final GC lowering pass
- Include 'llvm-version.h' in final GC lowering pass
- Define tests that test GC frame lowering passes in isolation
- Use 'i32' instead of 'size_t' for 'julia.{new,push}_gc_frame' size args
- Try to fix GC lowering tests: these tests work fine on most machines, but they break on x86 Travis CI builds. The breakage is almost certainly due to a mismatch between integer types in the tests and 'size_t', exposed by recent changes to the GC lowering pass (see the sketch below).
- Define 'LLVMExtraAddFinalLowerGCPass'
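As a side note on the "i32 instead of size_t" commit above, here is a small illustration of the width mismatch it fixes, assuming a pass that previously derived the size type from the target's pointer width (the helper name is made up for this example):

```cpp
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Type.h"

using namespace llvm;

// A size_t-derived integer type is i64 on 64-bit targets but i32 on
// 32-bit x86, so IR tests written against one width fail on the other.
// Pinning the intrinsics' size arguments to a literal i32 makes the
// tests target-independent.
Type *gcFrameSizeType(const DataLayout &DL, LLVMContext &C, bool PortableI32) {
    return PortableI32
        ? Type::getInt32Ty(C)    // fixed width: identical IR everywhere
        : DL.getIntPtrType(C);   // size_t width: varies with the target
}
```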
I rebased my branch to move bug-fix commits adjacent to the buggy commits (the bugs were usually related to ABIs different from my own machine's), then squashed those commits. I also squashed related commits to reduce the amount of noise.

My chronomancy seems to have confused GitHub, though: the commits in this PR are now laid out in a more or less random order. Sorry about that; I didn't see it coming. At least […]. Here's […]:
- Delay GC allocation lowering into the final GC lowering pass
- Fix a broken LLVM pass test
- Define GC alloc lowering pass tests
- Cast 'julia.gc_alloc_obj' argument
- Add missing annotation to test
- Relax final GC lowering pass test
- Delay 'jl_gc_queue_root' call generation until final GC lowering pass
- Fix two broken assertions
- Use helpers to grab well-known functions
- Use getOrDeclare to get 'julia.gc_alloc_obj' intrinsic (sketched below)
- Insert a newline
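Since a couple of those commits mention `GCLoweringRefs` and `getOrDeclare`, here is a rough sketch of the idiom: look a well-known function up in the module and declare it on first use. The descriptor type and field names are invented for illustration, not copied from the PR:

```cpp
#include "llvm/ADT/StringRef.h"
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/Module.h"

using namespace llvm;

// Describes a well-known function or intrinsic: its name plus a recipe
// for building its type on demand.
struct IntrinsicDescription {
    StringRef Name;
    FunctionType *(*GetType)(LLVMContext &);
};

// Look the function up in the module, declaring it on first use. Passes
// can then share one table of descriptions instead of each re-scanning
// the module by hand.
Function *getOrDeclare(Module &M, const IntrinsicDescription &Desc) {
    if (Function *F = M.getFunction(Desc.Name))
        return F; // already declared or defined in this module
    return Function::Create(Desc.GetType(M.getContext()),
                            Function::ExternalLinkage, Desc.Name, &M);
}

// Example description: 'julia.new_gc_frame' taking an i32 slot count.
// The i8* return type is a placeholder; the real intrinsic returns a
// pointer into the GC frame.
static FunctionType *newGCFrameType(LLVMContext &C) {
    return FunctionType::get(Type::getInt8PtrTy(C),
                             {Type::getInt32Ty(C)}, /*isVarArg=*/false);
}
```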
@Keno good to merge?
@Keno Any chance we can get this merged in the near future?
I reviewed this a while back and thought it was broadly fine, so I went ahead and merged it. There were a couple of minor things, but we can do those as part of a larger look at how this all works, which I don't think we need to do right this second.
Awesome! Thanks!
Might have been good to fix the commit message for that merge, so that it didn't still say work-in-progress, but it seems good to have this now! (With the proviso that we may break it whenever we find that we need to, though that is unlikely to be frequent.)
Hmmm. That's interesting. Your first error ("[…]") […]. However, your second error ("[…]") […].
Hello! This is my first PR here. I'm an MSc student working on making CUDAnative interact more organically with the Julia compiler. @maleadt is my supervisor. (Hi, Tim)
Some background: I'm proposing the changes in this PR in order to make GPU codegen more efficient for constructs that require GC support. The `LateLowerGCFrame` pass performs desirable platform-agnostic optimizations and we'd like to enable it in the CUDAnative pass pipeline. Sadly, we actually can't do that at the moment because `LateLowerGCFrame` also lowers GC intrinsics to LLVM IR that can only be expected to work on a CPU. See [1] for an issue where this is discussed in a little more depth.

This PR retains only the platform-agnostic aspects of `LateLowerGCFrame` and moves the CPU-specific parts into a new pass called `FinalLowerGC`. This doesn't change any of the resulting codegen for CPU platforms, but it should make `LateLowerGCFrame` eligible for reuse by CUDAnative.

I also added a helper file (`llvm-pass-helpers.{cpp,h}`) that contains functionality used by both the `LateLowerGCFrame` and `FinalLowerGC` passes; creating some helper functions and types seemed like a much more appropriate solution than duplicating code shared by `LateLowerGCFrame` and `FinalLowerGC`. The helper functions/types mostly deal with scanning an `llvm::Module&` for Julia types, intrinsics, and well-known functions. I did my best to design the helpers in such a way that they'll also be useful to other passes. For example, the initialization logic in `AllocOpt` could be simplified by using the helpers.

Anyway, I'd love to get some feedback on this PR and I'm happy to make changes accordingly. Thanks!