-
-
Notifications
You must be signed in to change notification settings - Fork 267
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix llvm not generating proper address space loads and stores in PTX #3428
Conversation
If the special dcompute pointer types get instantiated in host code for any reason, ldc holds on to the llvm type created during the host codegen phase during the device codegen phase and won't mark the proper address space.
@@ -1,15 +1,28 @@ | |||
// Check that we can generate code for both the host and device in one compiler invocation | |||
// REQUIRES: target_NVPTX | |||
// RUN: %ldc -mdcompute-targets=cuda-350 -mdcompute-file-prefix=host_and_device -Iinputs %s %S/inputs/kernel.d | |||
// RUN: %ldc -mdcompute-targets=cuda-350 -mdcompute-file-prefix=host_and_device -Iinputs %s %S/inputs/kernel.d && FileCheck %s --check-prefix=PTX < host_and_device_cuda350_64.ptx |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The check-prefix is not needed. Instead of PTX:
you can use CHECK:
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah wait, please also add checking of the host code LLVM IR. In that case it is nice indeed to check the ptx with PTX, and the LLVM IR with LLVMIR:
or something like that.
tlGlobal = 0; | ||
gGlobal = 0; | ||
string s = foo.mangleof; | ||
} | ||
|
||
// PTX: ld.global.f32 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add more checks here? (Also for host code)
- add test for loading from global_x and the others in host code aswell
- add CHECK-LABEL: to make sure the loads happen in the correct functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New additions look good.
I guess I'm still confused as to why the host code checking is so much more verbose than the PTX checking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In both cases I’m testing all the places where address space info shows up. Those same lines in the device IR have address space info so I figured this would be the most complete way to ensure no address space info gets leaked into the host code. By the time that the ptx gets generated there are far fewer places where the address space info shows up which is why it looks a little lop-sided.
If you think It seems prudent, I can add the device side IR checks as well as the ptx checks to ensure both the IR and PTX have the proper address spaces but I had (perhaps incorrectly) assumed that if the PTX had the address spaces, the IR would too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know anything about PTX, so I trust you on this one ;)
Makes sense, I’ll get on that. |
Additional checks to ensure host codegen doesn't erroneously get address space info in its IR. 2 additional PTX checks for loading from shared and local address spaces.
…ing on windows x86
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
I feel the test for host code generation is testing for too many things here, the test will break upon many unrelated changes. The only thing I think you should test here is that the addressspace is correctly set in both cases. (what would happen if we first generate host code, then PTX, then again host code? Or simply if we first generate PTX and then host code? That bug should be caught by this test.) |
That's fair. Is there a way to have wildcards in lit checks for the registers so I can have something like:
So i can decouple the address space check from the specific registers? Then I can have the host checks ensure their is no address space info in the IR. Or is there something else you're thinking? |
yes there is: http://llvm.org/docs/CommandGuide/FileCheck.html#filecheck-regex-matching-syntax Edit: no need to make it super fancy, human readability also matters. We use |
The host side IR tests should be less brittle and track the load/store register uses.
} | ||
|
||
void foo(GlobalPointer!float x_in) { | ||
// CHECK-LABEL: foo: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this check now applies to both the PTX and the LL checks.
You can also do selectively: LL-LABEL:
and PTX-LABEL:
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, I'm starting to understand lit a bit better now, thanks for putting up with my inexperience with this system. I've pushed a change to that.
…run on there respective files
Can you rename |
So the RecursiveWalker looks like it does 99% of what I want, but I noticed that for its FuncDeclaration visitor it doesn't recurse over the function parameters which I require. Is this an intentional omission? It seems like the cleanest option is to move my VarDeclaration visitor to a class derived from StoppableVisitor and use the RecursiveWalker as is (with the update to its FuncDeclaration visitor) but I don't want to introduce too many changes outside the scope of this PR. I've tried putting in the parameter recursion and all tests pass, but I don't want some unseen regression sneaking in. Barring that I can just inherit RecursiveWalker and overload the FuncDeclaration visitor to do what I need, but I thought I would bring this in for discussion first. |
Hmm, seems like keeping the Some care is of course necessary then, so as to not slow down the usual single-pass code generation too much. |
I was thinking along the same lines. Similar to Lines 50 to 51 in 1b52bf0
|
I apologize I've been delinquent on this issue, real life happened. So if I understand what you are saying correctly, you're proposing keeping a static list of |
Based on feedback from the ldc team, I've implemented a better way to reset the dcompute types that does not rely on walking the AST. It does seem more robust.
So I've implemented the suggestions and agree that this seems less fragile. Let me know what you think @kinke @dnadlinger @JohanEngelen. |
if(p = toDcomputePointer(sd)) { | ||
dcomputeTypes.push_back(sd->type); | ||
} | ||
|
||
if (gIR->dcomputetarget && (p = toDcomputePointer(sd))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You've already computed p
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, adding it to the vector should happen here, preventing the toDcomputePointer()
call for non-dcompute compiles.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup will fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In reply to your second comment (missed it the first time). We explicitly need to add it to the vector when dcompute is not the IR generator as normal passes set the ctype, but without the required address space data needed for dcompute.
So perhaps instead of calling toDComputePointer there and getting the pointer, it should call isFromLDC_DCompute to push to the vector, and then only get the pointer type when dcompute is the IR generator.
ir/irtype.cpp
Outdated
type->ctype = nullptr; | ||
} | ||
|
||
dcomputeTypes.clear(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the vector should be cleared here (i.e., at all).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did it this way so I don't have to check the vector to see if the Type*
is already in there when I push a new one onto the vector.
Would you recommend a checking and never clearing, or perhaps moving to a std::map and keying off the associated StructDeclaration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume there's a 1:0..1 relation between Type* and IrTypeStruct* (across all compiled modules), so no duplicates should end up there.
[Without having a broader overview of this (and the Ir*
hierarchy/workings in general), I'd think that a custom IrType
(not sure if it has to derive from IrTypeStruct) subclass for the dcompute pointers would be the elegant way to represent them and handle their specifics, incl. getting rid of that separate DcomputePointer
struct.]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The duplicates will show up when we compile the same module twice; once for host, once for device using @compute(CompileFor.hostAndDevice)
.
The host pass happens first generating standard struct code for the dcompute types. This sets the ctype pointer. When the device codegen happens after the host, we will enter IrTypeStruct::get
again (due to the deleting of the host ctype), and if dcomputeTypes hasn't been cleared it will still be holding valid Type*
left over from the host codegen pass.
Now I suppose since device codegen always happens second and nothing ever happens after it, having duplicates is not an issue, but if that ever changes, the resetDComputeTypes is going to delete an already deleted ctype.
My latest commit does a find on the vector before pushing a new Type*
to it if it doesn't already exist.
Perhaps having a special Ir* type is the best way, but I too am not familiar enough with ldc to really comment. What I do know however is that those special structs should behave just as there underlying type when generating host code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thx for clarifying, now (I think) I understand - the full associated IrTypeStructs are deleted whenever starting to emit a new dcompute module and recreated when needed; I thought it'd recycle and just reset the cached llvm::Type*
for those IrTypeStructs/DcomputePointer/whatever.
In this case, clearing the vector makes perfect sense indeed to avoid any duplicates, sorry for the noise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No problem at all. I've implemented your other change suggestions and all tests pass on my side.
Instead of checking directly against a returned DcomputePointer, now check if its a DCompute type based directly on the StructDeclaration so we don't generate the pointer type when dcompute isn't the IR generator. Never clear the vector, instead check the vector to see if the type has been previously associated. Removed unnecessary loop in target.cpp
gen/dcompute/target.cpp
Outdated
#include "gen/runtime.h" | ||
#include "ir/irtype.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please get rid of the other new includes (this will be squashed anyway, so a fixup commit is fine).
ir/irtype.cpp
Outdated
@@ -18,6 +18,8 @@ | |||
#include "llvm/IR/DerivedTypes.h" | |||
#include "llvm/IR/LLVMContext.h" | |||
|
|||
std::vector<Type*> IrType::dcomputeTypes = std::vector<Type*>(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Default construction is fine.
ir/irtype.cpp
Outdated
@@ -27,6 +29,15 @@ IrType::IrType(Type *dt, LLType *lt) : dtype(dt), type(lt) { | |||
assert(!dt->ctype && "already has IrType"); | |||
} | |||
|
|||
void IrType::resetDComputeTypes() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a little comment here, à la Deletes all IrTypeStructs currently associated with the special dcompute pointer structs, to enforce re-creation when needed (with proper address space)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just final nits.
ir/irtype.h
Outdated
protected: | ||
/// | ||
IrType(Type *dt, llvm::Type *lt); | ||
|
||
/// | ||
static std::vector<Type*> dcomputeTypes; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A std::vector<IrTypeStruct *>
would probably be clearer. (The associated Type
is IrType::dtype
).
Moved dcomputeTypes into IrTypeStruct and made it store the IrTypeStruct* instead of the Type* for clarity. Added description to resetDComputeTypes Use default constructor for dcomputeTypes vector. Removed un-used headers from target.cpp
Thanks for the feedback, I've implemented and pushed the suggested changes. |
@JohanEngelen any more thoughts? |
lgtm |
If the special dcompute pointer types get instantiated in host code
for any reason, ldc holds on to the llvm type created during the host
codegen phase during the device codegen phase and won't mark the
proper address space.