Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix llvm not generating proper address space loads and stores in PTX #3428

Merged
merged 11 commits into from
Jul 3, 2020

Conversation

Rob-Rau
Copy link
Contributor

@Rob-Rau Rob-Rau commented May 9, 2020

If the special dcompute pointer types get instantiated in host code
for any reason, ldc holds on to the llvm type created during the host
codegen phase during the device codegen phase and won't mark the
proper address space.

Rob-Rau added 2 commits May 8, 2020 21:39
If the special dcompute pointer types get instantiated in host code
for any reason, ldc holds on to the llvm type created during the host
codegen phase during the device codegen phase and won't mark the
proper address space.
@@ -1,15 +1,28 @@
// Check that we can generate code for both the host and device in one compiler invocation
// REQUIRES: target_NVPTX
// RUN: %ldc -mdcompute-targets=cuda-350 -mdcompute-file-prefix=host_and_device -Iinputs %s %S/inputs/kernel.d
// RUN: %ldc -mdcompute-targets=cuda-350 -mdcompute-file-prefix=host_and_device -Iinputs %s %S/inputs/kernel.d && FileCheck %s --check-prefix=PTX < host_and_device_cuda350_64.ptx
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check-prefix is not needed. Instead of PTX: you can use CHECK:.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah wait, please also add checking of the host code LLVM IR. In that case it is nice indeed to check the ptx with PTX, and the LLVM IR with LLVMIR: or something like that.

tlGlobal = 0;
gGlobal = 0;
string s = foo.mangleof;
}

// PTX: ld.global.f32
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add more checks here? (Also for host code)

  • add test for loading from global_x and the others in host code aswell
  • add CHECK-LABEL: to make sure the loads happen in the correct functions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New additions look good.
I guess I'm still confused as to why the host code checking is so much more verbose than the PTX checking.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In both cases I’m testing all the places where address space info shows up. Those same lines in the device IR have address space info so I figured this would be the most complete way to ensure no address space info gets leaked into the host code. By the time that the ptx gets generated there are far fewer places where the address space info shows up which is why it looks a little lop-sided.

If you think It seems prudent, I can add the device side IR checks as well as the ptx checks to ensure both the IR and PTX have the proper address spaces but I had (perhaps incorrectly) assumed that if the PTX had the address spaces, the IR would too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know anything about PTX, so I trust you on this one ;)

@Rob-Rau
Copy link
Contributor Author

Rob-Rau commented May 9, 2020

Makes sense, I’ll get on that.

Rob-Rau added 2 commits May 9, 2020 18:04
Additional checks to ensure host codegen doesn't
erroneously get address space info in its IR.

2 additional PTX checks for loading from shared and
local address spaces.
Copy link
Contributor

@thewilsonator thewilsonator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@JohanEngelen
Copy link
Member

I feel the test for host code generation is testing for too many things here, the test will break upon many unrelated changes. The only thing I think you should test here is that the addressspace is correctly set in both cases. (what would happen if we first generate host code, then PTX, then again host code? Or simply if we first generate PTX and then host code? That bug should be caught by this test.)

@Rob-Rau
Copy link
Contributor Author

Rob-Rau commented May 11, 2020

That's fair. Is there a way to have wildcards in lit checks for the registers so I can have something like:

// LL: store float <reg_wildcard>, float* <reg_wildcard>

So i can decouple the address space check from the specific registers? Then I can have the host checks ensure their is no address space info in the IR.

Or is there something else you're thinking?

@JohanEngelen
Copy link
Member

JohanEngelen commented May 11, 2020

That's fair. Is there a way to have wildcards in lit checks for the registers so I can have something like:

yes there is: http://llvm.org/docs/CommandGuide/FileCheck.html#filecheck-regex-matching-syntax

Edit: no need to make it super fancy, human readability also matters. We use {{.*}} a lot, maybe for a register you can use {{%[0-9]*}}. Also note the [[ ]] functionality. FileCheck is pretty nice ;-)

The host side IR tests should be less brittle and track the load/store
register uses.
}

void foo(GlobalPointer!float x_in) {
// CHECK-LABEL: foo:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this check now applies to both the PTX and the LL checks.
You can also do selectively: LL-LABEL: and PTX-LABEL:.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, I'm starting to understand lit a bit better now, thanks for putting up with my inexperience with this system. I've pushed a change to that.

@JohanEngelen
Copy link
Member

Can you rename Declaration_reset to something more descriptive and capitalization in-line with our codebase? Can you make sure that your Visitor runs over all types, even they are deeply nested? Maybe better to subclass from one of the recursivevisitors? or use a recursivewalkers?
LGTM otherwise.

@Rob-Rau
Copy link
Contributor Author

Rob-Rau commented May 13, 2020

So the RecursiveWalker looks like it does 99% of what I want, but I noticed that for its FuncDeclaration visitor it doesn't recurse over the function parameters which I require. Is this an intentional omission?

It seems like the cleanest option is to move my VarDeclaration visitor to a class derived from StoppableVisitor and use the RecursiveWalker as is (with the update to its FuncDeclaration visitor) but I don't want to introduce too many changes outside the scope of this PR. I've tried putting in the parameter recursion and all tests pass, but I don't want some unseen regression sneaking in.

Barring that I can just inherit RecursiveWalker and overload the FuncDeclaration visitor to do what I need, but I thought I would bring this in for discussion first.

@dnadlinger
Copy link
Member

Hmm, seems like keeping the ctype mapping in a hash table instead (which can then be trivially reset), or keeping a list of all allocated IrType instances to reset all of them might be less fragile than walking the AST.

Some care is of course necessary then, so as to not slow down the usual single-pass code generation too much.

@kinke
Copy link
Member

kinke commented May 14, 2020

I was thinking along the same lines. Similar to

ldc/ir/irdsymbol.h

Lines 50 to 51 in 1b52bf0

static std::vector<IrDsymbol *> list;
static void resetAll();
but for the special dcompute pointer structs only (so that non-dcompute code isn't affected at all).

@Rob-Rau
Copy link
Contributor Author

Rob-Rau commented Jun 20, 2020

I apologize I've been delinquent on this issue, real life happened.

So if I understand what you are saying correctly, you're proposing keeping a static list of Type's around and then just clearing the ctype pointer? Shouldn't be too hard to implement.

Rob-Rau added 2 commits June 27, 2020 12:52
Based on feedback from the ldc team, I've implemented a
better way to reset the dcompute types that does not
rely on walking the AST. It does seem more robust.
@Rob-Rau
Copy link
Contributor Author

Rob-Rau commented Jun 27, 2020

So I've implemented the suggestions and agree that this seems less fragile. Let me know what you think @kinke @dnadlinger @JohanEngelen.

if(p = toDcomputePointer(sd)) {
dcomputeTypes.push_back(sd->type);
}

if (gIR->dcomputetarget && (p = toDcomputePointer(sd))) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You've already computed p here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, adding it to the vector should happen here, preventing the toDcomputePointer() call for non-dcompute compiles.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup will fix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In reply to your second comment (missed it the first time). We explicitly need to add it to the vector when dcompute is not the IR generator as normal passes set the ctype, but without the required address space data needed for dcompute.

So perhaps instead of calling toDComputePointer there and getting the pointer, it should call isFromLDC_DCompute to push to the vector, and then only get the pointer type when dcompute is the IR generator.

ir/irtype.cpp Outdated
type->ctype = nullptr;
}

dcomputeTypes.clear();
Copy link
Member

@kinke kinke Jun 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the vector should be cleared here (i.e., at all).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did it this way so I don't have to check the vector to see if the Type* is already in there when I push a new one onto the vector.

Would you recommend a checking and never clearing, or perhaps moving to a std::map and keying off the associated StructDeclaration?

Copy link
Member

@kinke kinke Jun 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume there's a 1:0..1 relation between Type* and IrTypeStruct* (across all compiled modules), so no duplicates should end up there.

[Without having a broader overview of this (and the Ir* hierarchy/workings in general), I'd think that a custom IrType (not sure if it has to derive from IrTypeStruct) subclass for the dcompute pointers would be the elegant way to represent them and handle their specifics, incl. getting rid of that separate DcomputePointer struct.]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The duplicates will show up when we compile the same module twice; once for host, once for device using @compute(CompileFor.hostAndDevice).

The host pass happens first generating standard struct code for the dcompute types. This sets the ctype pointer. When the device codegen happens after the host, we will enter IrTypeStruct::get again (due to the deleting of the host ctype), and if dcomputeTypes hasn't been cleared it will still be holding valid Type* left over from the host codegen pass.

Now I suppose since device codegen always happens second and nothing ever happens after it, having duplicates is not an issue, but if that ever changes, the resetDComputeTypes is going to delete an already deleted ctype.

My latest commit does a find on the vector before pushing a new Type* to it if it doesn't already exist.

Perhaps having a special Ir* type is the best way, but I too am not familiar enough with ldc to really comment. What I do know however is that those special structs should behave just as there underlying type when generating host code.

Copy link
Member

@kinke kinke Jun 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thx for clarifying, now (I think) I understand - the full associated IrTypeStructs are deleted whenever starting to emit a new dcompute module and recreated when needed; I thought it'd recycle and just reset the cached llvm::Type* for those IrTypeStructs/DcomputePointer/whatever.

In this case, clearing the vector makes perfect sense indeed to avoid any duplicates, sorry for the noise.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem at all. I've implemented your other change suggestions and all tests pass on my side.

Rob-Rau added 2 commits June 27, 2020 14:43
Instead of checking directly against a returned
DcomputePointer, now check if its a DCompute type
based directly on the StructDeclaration so we don't
generate the pointer type when dcompute isn't the IR
generator.

Never clear the vector, instead check the vector to
see if the type has been previously associated.

Removed unnecessary loop in target.cpp
#include "gen/runtime.h"
#include "ir/irtype.h"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please get rid of the other new includes (this will be squashed anyway, so a fixup commit is fine).

ir/irtype.cpp Outdated
@@ -18,6 +18,8 @@
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/LLVMContext.h"

std::vector<Type*> IrType::dcomputeTypes = std::vector<Type*>(0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Default construction is fine.

ir/irtype.cpp Outdated
@@ -27,6 +29,15 @@ IrType::IrType(Type *dt, LLType *lt) : dtype(dt), type(lt) {
assert(!dt->ctype && "already has IrType");
}

void IrType::resetDComputeTypes() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a little comment here, à la Deletes all IrTypeStructs currently associated with the special dcompute pointer structs, to enforce re-creation when needed (with proper address space).

Copy link
Member

@kinke kinke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just final nits.

ir/irtype.h Outdated
protected:
///
IrType(Type *dt, llvm::Type *lt);

///
static std::vector<Type*> dcomputeTypes;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A std::vector<IrTypeStruct *> would probably be clearer. (The associated Type is IrType::dtype).

Moved dcomputeTypes into IrTypeStruct and made
it store the IrTypeStruct* instead of the Type*
for clarity.

Added description to resetDComputeTypes

Use default constructor for dcomputeTypes vector.

Removed un-used headers from target.cpp
@Rob-Rau
Copy link
Contributor Author

Rob-Rau commented Jun 29, 2020

Thanks for the feedback, I've implemented and pushed the suggested changes.

@Rob-Rau
Copy link
Contributor Author

Rob-Rau commented Jul 1, 2020

@JohanEngelen any more thoughts?

@JohanEngelen
Copy link
Member

lgtm

@kinke kinke merged commit 9782862 into ldc-developers:master Jul 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants