Fix llvm not generating proper address space loads and stores in PTX #3428

Rob-Rau · 2020-05-09T01:48:55Z

If the special dcompute pointer types get instantiated in host code
for any reason, ldc holds on to the llvm type created during the host
codegen phase during the device codegen phase and won't mark the
proper address space.

If the special dcompute pointer types get instantiated in host code for any reason, ldc holds on to the llvm type created during the host codegen phase during the device codegen phase and won't mark the proper address space.

JohanEngelen · 2020-05-09T11:29:26Z

tests/codegen/dcompute_host_and_device.d

@@ -1,15 +1,28 @@
 // Check that we can generate code for both the host and device in one compiler invocation
 // REQUIRES: target_NVPTX
-// RUN: %ldc -mdcompute-targets=cuda-350 -mdcompute-file-prefix=host_and_device -Iinputs %s %S/inputs/kernel.d
+// RUN: %ldc -mdcompute-targets=cuda-350 -mdcompute-file-prefix=host_and_device -Iinputs %s %S/inputs/kernel.d && FileCheck %s --check-prefix=PTX < host_and_device_cuda350_64.ptx


The check-prefix is not needed. Instead of PTX: you can use CHECK:.

Ah wait, please also add checking of the host code LLVM IR. In that case it is nice indeed to check the ptx with PTX, and the LLVM IR with LLVMIR: or something like that.

JohanEngelen · 2020-05-09T11:30:57Z

tests/codegen/dcompute_host_and_device.d

    tlGlobal = 0;
    gGlobal  = 0;
    string s = foo.mangleof;
 }
+
+// PTX: ld.global.f32


Can you add more checks here? (Also for host code)

add test for loading from global_x and the others in host code aswell

add CHECK-LABEL: to make sure the loads happen in the correct functions.

New additions look good.
I guess I'm still confused as to why the host code checking is so much more verbose than the PTX checking.

In both cases I’m testing all the places where address space info shows up. Those same lines in the device IR have address space info so I figured this would be the most complete way to ensure no address space info gets leaked into the host code. By the time that the ptx gets generated there are far fewer places where the address space info shows up which is why it looks a little lop-sided.

If you think It seems prudent, I can add the device side IR checks as well as the ptx checks to ensure both the IR and PTX have the proper address spaces but I had (perhaps incorrectly) assumed that if the PTX had the address spaces, the IR would too.

I don't know anything about PTX, so I trust you on this one ;)

Rob-Rau · 2020-05-09T15:27:54Z

Makes sense, I’ll get on that.

Additional checks to ensure host codegen doesn't erroneously get address space info in its IR. 2 additional PTX checks for loading from shared and local address spaces.

…ing on windows x86

thewilsonator

LGTM.

JohanEngelen · 2020-05-11T17:28:55Z

I feel the test for host code generation is testing for too many things here, the test will break upon many unrelated changes. The only thing I think you should test here is that the addressspace is correctly set in both cases. (what would happen if we first generate host code, then PTX, then again host code? Or simply if we first generate PTX and then host code? That bug should be caught by this test.)

Rob-Rau · 2020-05-11T18:19:25Z

That's fair. Is there a way to have wildcards in lit checks for the registers so I can have something like:

// LL: store float <reg_wildcard>, float* <reg_wildcard>

So i can decouple the address space check from the specific registers? Then I can have the host checks ensure their is no address space info in the IR.

Or is there something else you're thinking?

JohanEngelen · 2020-05-11T18:43:42Z

That's fair. Is there a way to have wildcards in lit checks for the registers so I can have something like:

yes there is: http://llvm.org/docs/CommandGuide/FileCheck.html#filecheck-regex-matching-syntax

Edit: no need to make it super fancy, human readability also matters. We use {{.*}} a lot, maybe for a register you can use {{%[0-9]*}}. Also note the [[ ]] functionality. FileCheck is pretty nice ;-)

The host side IR tests should be less brittle and track the load/store register uses.

JohanEngelen · 2020-05-12T16:55:21Z

tests/codegen/dcompute_host_and_device.d

+}
+
+void foo(GlobalPointer!float x_in) {
+    // CHECK-LABEL: foo:


Note that this check now applies to both the PTX and the LL checks.
You can also do selectively: LL-LABEL: and PTX-LABEL:.

I see, I'm starting to understand lit a bit better now, thanks for putting up with my inexperience with this system. I've pushed a change to that.

…run on there respective files

JohanEngelen · 2020-05-12T22:40:44Z

Can you rename Declaration_reset to something more descriptive and capitalization in-line with our codebase? Can you make sure that your Visitor runs over all types, even they are deeply nested? Maybe better to subclass from one of the recursivevisitors? or use a recursivewalkers?
LGTM otherwise.

Rob-Rau · 2020-05-13T23:03:59Z

So the RecursiveWalker looks like it does 99% of what I want, but I noticed that for its FuncDeclaration visitor it doesn't recurse over the function parameters which I require. Is this an intentional omission?

It seems like the cleanest option is to move my VarDeclaration visitor to a class derived from StoppableVisitor and use the RecursiveWalker as is (with the update to its FuncDeclaration visitor) but I don't want to introduce too many changes outside the scope of this PR. I've tried putting in the parameter recursion and all tests pass, but I don't want some unseen regression sneaking in.

Barring that I can just inherit RecursiveWalker and overload the FuncDeclaration visitor to do what I need, but I thought I would bring this in for discussion first.

dnadlinger · 2020-05-14T10:52:15Z

Hmm, seems like keeping the ctype mapping in a hash table instead (which can then be trivially reset), or keeping a list of all allocated IrType instances to reset all of them might be less fragile than walking the AST.

Some care is of course necessary then, so as to not slow down the usual single-pass code generation too much.

kinke · 2020-05-14T11:17:00Z

I was thinking along the same lines. Similar to

ldc/ir/irdsymbol.h

Lines 50 to 51 in 1b52bf0

    
           static std::vector<IrDsymbol *> list; 
        
           static void resetAll();

but for the special dcompute pointer structs only (so that non-dcompute code isn't affected at all).

Rob-Rau · 2020-06-20T21:02:36Z

I apologize I've been delinquent on this issue, real life happened.

So if I understand what you are saying correctly, you're proposing keeping a static list of Type's around and then just clearing the ctype pointer? Shouldn't be too hard to implement.

Based on feedback from the ldc team, I've implemented a better way to reset the dcompute types that does not rely on walking the AST. It does seem more robust.

Rob-Rau · 2020-06-27T17:27:30Z

So I've implemented the suggestions and agree that this seems less fragile. Let me know what you think @kinke @dnadlinger @JohanEngelen.

gen/dcompute/target.cpp

kinke · 2020-06-27T17:37:12Z

ir/irtypestruct.cpp

+  if(p = toDcomputePointer(sd)) {
+    dcomputeTypes.push_back(sd->type);
+  }
+
  if (gIR->dcomputetarget && (p = toDcomputePointer(sd))) {


You've already computed p here.

Actually, adding it to the vector should happen here, preventing the toDcomputePointer() call for non-dcompute compiles.

Yup will fix.

In reply to your second comment (missed it the first time). We explicitly need to add it to the vector when dcompute is not the IR generator as normal passes set the ctype, but without the required address space data needed for dcompute.

So perhaps instead of calling toDComputePointer there and getting the pointer, it should call isFromLDC_DCompute to push to the vector, and then only get the pointer type when dcompute is the IR generator.

kinke · 2020-06-27T17:40:01Z

ir/irtype.cpp

+    type->ctype = nullptr;
+  }
+
+  dcomputeTypes.clear();


I don't think the vector should be cleared here (i.e., at all).

I did it this way so I don't have to check the vector to see if the Type* is already in there when I push a new one onto the vector.

Would you recommend a checking and never clearing, or perhaps moving to a std::map and keying off the associated StructDeclaration?

I assume there's a 1:0..1 relation between Type* and IrTypeStruct* (across all compiled modules), so no duplicates should end up there.

[Without having a broader overview of this (and the Ir* hierarchy/workings in general), I'd think that a custom IrType (not sure if it has to derive from IrTypeStruct) subclass for the dcompute pointers would be the elegant way to represent them and handle their specifics, incl. getting rid of that separate DcomputePointer struct.]

The duplicates will show up when we compile the same module twice; once for host, once for device using @compute(CompileFor.hostAndDevice).

The host pass happens first generating standard struct code for the dcompute types. This sets the ctype pointer. When the device codegen happens after the host, we will enter IrTypeStruct::get again (due to the deleting of the host ctype), and if dcomputeTypes hasn't been cleared it will still be holding valid Type* left over from the host codegen pass.

Now I suppose since device codegen always happens second and nothing ever happens after it, having duplicates is not an issue, but if that ever changes, the resetDComputeTypes is going to delete an already deleted ctype.

My latest commit does a find on the vector before pushing a new Type* to it if it doesn't already exist.

Perhaps having a special Ir* type is the best way, but I too am not familiar enough with ldc to really comment. What I do know however is that those special structs should behave just as there underlying type when generating host code.

Thx for clarifying, now (I think) I understand - the full associated IrTypeStructs are deleted whenever starting to emit a new dcompute module and recreated when needed; I thought it'd recycle and just reset the cached llvm::Type* for those IrTypeStructs/DcomputePointer/whatever.

In this case, clearing the vector makes perfect sense indeed to avoid any duplicates, sorry for the noise.

No problem at all. I've implemented your other change suggestions and all tests pass on my side.

Instead of checking directly against a returned DcomputePointer, now check if its a DCompute type based directly on the StructDeclaration so we don't generate the pointer type when dcompute isn't the IR generator. Never clear the vector, instead check the vector to see if the type has been previously associated. Removed unnecessary loop in target.cpp

kinke · 2020-06-29T19:09:43Z

gen/dcompute/target.cpp

 #include "gen/runtime.h"
+#include "ir/irtype.h"


Please get rid of the other new includes (this will be squashed anyway, so a fixup commit is fine).

kinke · 2020-06-29T19:10:22Z

ir/irtype.cpp

@@ -18,6 +18,8 @@
 #include "llvm/IR/DerivedTypes.h"
 #include "llvm/IR/LLVMContext.h"

+std::vector<Type*> IrType::dcomputeTypes = std::vector<Type*>(0);


Default construction is fine.

kinke · 2020-06-29T19:13:24Z

ir/irtype.cpp

@@ -27,6 +29,15 @@ IrType::IrType(Type *dt, LLType *lt) : dtype(dt), type(lt) {
  assert(!dt->ctype && "already has IrType");
 }

+void IrType::resetDComputeTypes() {


Please add a little comment here, à la Deletes all IrTypeStructs currently associated with the special dcompute pointer structs, to enforce re-creation when needed (with proper address space).

kinke

LGTM, just final nits.

kinke · 2020-06-29T19:27:12Z

ir/irtype.h

 protected:
  ///
  IrType(Type *dt, llvm::Type *lt);

+  ///
+  static std::vector<Type*> dcomputeTypes;


A std::vector<IrTypeStruct *> would probably be clearer. (The associated Type is IrType::dtype).

Moved dcomputeTypes into IrTypeStruct and made it store the IrTypeStruct* instead of the Type* for clarity. Added description to resetDComputeTypes Use default constructor for dcomputeTypes vector. Removed un-used headers from target.cpp

Rob-Rau · 2020-06-29T21:41:31Z

Thanks for the feedback, I've implemented and pushed the suggested changes.

Rob-Rau · 2020-07-01T15:47:54Z

@JohanEngelen any more thoughts?

JohanEngelen · 2020-07-03T17:39:01Z

lgtm

Rob-Rau added 2 commits May 8, 2020 21:39

Need Statment handler in visitor

c35617e

JohanEngelen reviewed May 9, 2020

View reviewed changes

Rob-Rau added 2 commits May 9, 2020 18:04

Additional tests for host code generation

32bf860

Additional checks to ensure host codegen doesn't erroneously get address space info in its IR. 2 additional PTX checks for loading from shared and local address spaces.

Added '-c' to dcompute_host_and_device test so it will build w/o link…

e820b63

…ing on windows x86

thewilsonator approved these changes May 11, 2020

View reviewed changes

Updated host code dcompute_host_and_device host tests

f663d67

The host side IR tests should be less brittle and track the load/store register uses.

JohanEngelen reviewed May 12, 2020

View reviewed changes

Changed CHECK-LABELs to LL-LABEL and PTX-LABEL so the tests are only …

9bb4cd2

…run on there respective files

Rob-Rau added 2 commits June 27, 2020 12:52

Merge remote-tracking branch 'upstream/master'

d3edc2a

New method to reset dcompute types based on feedback

5e209b2

Based on feedback from the ldc team, I've implemented a better way to reset the dcompute types that does not rely on walking the AST. It does seem more robust.

kinke reviewed Jun 27, 2020

View reviewed changes

gen/dcompute/target.cpp Show resolved Hide resolved

kinke reviewed Jun 27, 2020

View reviewed changes

Rob-Rau added 2 commits June 27, 2020 14:43

Reverted removal of vector clear, removed find call

eceecf2

kinke reviewed Jun 29, 2020

View reviewed changes

kinke approved these changes Jun 29, 2020

View reviewed changes

kinke reviewed Jun 29, 2020

View reviewed changes

Cleanup/fixes from review

b2000d7

Moved dcomputeTypes into IrTypeStruct and made it store the IrTypeStruct* instead of the Type* for clarity. Added description to resetDComputeTypes Use default constructor for dcomputeTypes vector. Removed un-used headers from target.cpp

kinke merged commit 9782862 into ldc-developers:master Jul 3, 2020

Fix llvm not generating proper address space loads and stores in PTX #3428

Fix llvm not generating proper address space loads and stores in PTX #3428

Conversation

Rob-Rau commented May 9, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Rob-Rau commented May 9, 2020

thewilsonator left a comment

Choose a reason for hiding this comment

JohanEngelen commented May 11, 2020

Rob-Rau commented May 11, 2020

JohanEngelen commented May 11, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

JohanEngelen commented May 12, 2020

Rob-Rau commented May 13, 2020

dnadlinger commented May 14, 2020

kinke commented May 14, 2020

Rob-Rau commented Jun 20, 2020 • edited Loading

Rob-Rau commented Jun 27, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kinke Jun 27, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kinke Jun 27, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kinke Jun 27, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kinke left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Rob-Rau commented Jun 29, 2020

Rob-Rau commented Jul 1, 2020

JohanEngelen commented Jul 3, 2020

JohanEngelen commented May 11, 2020 •

edited

Loading

Rob-Rau commented Jun 20, 2020 •

edited

Loading

kinke Jun 27, 2020 •

edited

Loading

kinke Jun 27, 2020 •

edited

Loading

kinke Jun 27, 2020 •

edited

Loading