Dcomputextra #3683
Conversation
Thanks. Waiting on #3684 for the test suite to pass. You might have to force-push once that goes through to restart the CI.
Are the valid CUDA versions something we can get from LLVM?
In driver/dcomputecodegenerator.cpp:

```diff
@@ -43,8 +43,8 @@ DComputeCodeGenManager::createComputeTarget(const std::string &s) {
 #if LDC_LLVM_SUPPORTED_TARGET_NVPTX
 #define CUDA_VALID_VER_INIT 100, 110, 120, 130, 200, 210, 300, 350, 370,\
-                            500, 520, 600, 610, 620
-  const std::array<int, 14> valid_cuda_versions = {{CUDA_VALID_VER_INIT}};
+                            500, 520, 600, 610, 620, 700, 720, 750, 800
+  const std::array<int, 18> valid_cuda_versions = {{CUDA_VALID_VER_INIT}};
```
As Johan said, it'd be great to get rid of hardcoded stuff like this, to prevent history repeating itself in a few years. There are other missing versions too (530, 320); and with LLVM 11, it looks like no versions older than 200 are supported anymore. FWIW, in case we cannot derive them from LLVM, I'm in favor of making it the dev's responsibility to provide a valid version rather than pretending to have an up-to-date hardcoded list in the compiler.
I'll go spelunking in clang. They may have a principled way to deal with
their analogous "sm_xx" capability.
OK. My first descent into the clang cave located
include/clang/Basic/Cuda.h and clang/lib/Basic/Cuda.cpp wherein we find
various sm_xx and CUDA mappings. I saw no reliance on an LLVM authority in
this area. clang appears to go its own way.
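(For reference, a simplified sketch of the kind of table clang keeps there. This is an illustrative, self-contained re-creation, not clang's actual code; the real tables in clang/lib/Basic/Cuda.cpp are larger and also track CUDA toolkit versions.)

```cpp
#include <cstdio>
#include <cstring>

// A clang-style arch enum and name table (abridged; illustrative only).
enum class CudaArch { Unknown, SM_20, SM_30, SM_35, SM_50, SM_52,
                      SM_60, SM_61, SM_70, SM_75, SM_80 };

struct ArchName { CudaArch Arch; const char *Name; };

constexpr ArchName ArchNames[] = {
    {CudaArch::SM_20, "sm_20"}, {CudaArch::SM_30, "sm_30"},
    {CudaArch::SM_35, "sm_35"}, {CudaArch::SM_50, "sm_50"},
    {CudaArch::SM_52, "sm_52"}, {CudaArch::SM_60, "sm_60"},
    {CudaArch::SM_61, "sm_61"}, {CudaArch::SM_70, "sm_70"},
    {CudaArch::SM_75, "sm_75"}, {CudaArch::SM_80, "sm_80"},
};

// Mirrors the shape of clang's StringToCudaArch: linear scan of the table.
CudaArch stringToCudaArch(const char *S) {
  for (const auto &E : ArchNames)
    if (std::strcmp(E.Name, S) == 0)
      return E.Arch;
  return CudaArch::Unknown;
}

int main() {
  std::printf("sm_75 known: %s\n",
              stringToCudaArch("sm_75") != CudaArch::Unknown ? "yes" : "no");
}
```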
So it looks like we have three options:
1) Isolate the CUDA/sm version tracking to LLVM and feed it upstream.
2) Update the table from time to time (matching or exceeding the clang table's capability).
3) Remove the table and just flow the version through from the command line (no checking/validation).
Option 1) is "the right thing to do" but looks like a long-term project (write code, convince clangers, convince llvmers, ...).
Option 2) is what I prefer, at least in the near term while an effort towards 1) is "in process". We might use a sed script, or similar, to automate this if an LDC build-time dependency on clang is permissible. Not sure which is uglier: the requirement for manual attention or a back-door dependency on clang...
Option 3) is less appealing as I believe it would defer error reporting until run time (when the .ptx file is jitted). I guess we could try a trial JIT at compile time, but that would give us another dependency, and a pretty ugly one at that...
Another problem with 3) is that it cuts off a possible introspection dimension. If the compiler is not authoritative on which GPU variant is being targeted (it could be any one of a largish command-line list), then higher-level introspection of GPU capabilities becomes significantly harder to do correctly; at least, I'm not sure how you'd do it cleanly. OTOH, if we can reliably introspect the GPU variant, then we'd have a nice advantage over other languages targeting GPUs.
Thanks for the earlier input. Let me know how I should proceed.
> Option 3) is less appealing as I believe it would defer error reporting until run time (when the .ptx file is jitted)

I suggest confirming this first, because it's definitely not what I'd expect. I also expected an error for `ldc2 -mtriple=nvptx64 -mcpu=sm_81`, but it (AFAICT, LLVM) just prints a warning (6 times in my case...): `'sm_81' is not a recognized processor for this target (ignoring processor)`. Still, it's probably hard to overlook.

As already stated, I'm for option 3, simply because I care a lot more about power users trying to target the latest dialects (and don't want them to have to hack the compiler to get what they want) than about users unable to figure out a correct string, provided there are errors or at least warnings and the help string makes it rather obvious, something like: `Use '-mtriple=nvptx[64] -mcpu=help' for a list of targetable shader models; e.g., 'cuda-500' corresponds to 'sm_50'.`
I had thought that clang would use LLVM as the authority if possible but,
apparently, I was mistaken. I'll look into LLVM and report back.
OK, it looks like LLVM keeps the master table of sm_xx info in llvm/lib/Target/NVPTX/NVPTX.td. During a build this is processed into build/lib/Target/NVPTX/NVPTXGenSubtargetInfo.inc.
There is a lot of potentially valuable-at-compile-time GPU information available in this and other LLVM files, but I think we only really need a stable naming solution.
So, we can give up on any driver-level checking as kinke advises, do a manual table, do a .td scrape to get a table in sync with LLVM, process the .td file ourselves, or get in bed with LLVM at the C++ level.
One disconnect for me here is kinke's repeated use of command-line examples without -mdcompute-targets=... How can we generate code for both a CPU and GPU(s) without something like -mdcompute-targets or the clangers' --cuda-gpu-arch=...? I'm probably missing something.
Regardless, we can drop the driver-level validity check and hope/expect that the IRState->module.setTargetTriple call within targetCUDA.cpp gives us a non-cryptic error if/when passed a bogus triple.
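(For illustration: if we did want to keep a driver-level check without a hardcoded table, a minimal sketch of asking LLVM directly via MCSubtargetInfo::isCPUStringValid could look like the following. This is a hedged sketch, not LDC's actual driver code; it assumes an LLVM build with the NVPTX target enabled, and the helper name is made up.)

```cpp
#include "llvm/MC/MCSubtargetInfo.h"
#include "llvm/Support/TargetRegistry.h"
#include "llvm/Support/TargetSelect.h"
#include <memory>
#include <string>

// Ask LLVM, rather than a hardcoded list, whether e.g. "sm_75" names a
// processor known to the NVPTX backend.
bool isKnownNVPTXProcessor(const std::string &CPU) {
  llvm::InitializeAllTargetInfos();
  llvm::InitializeAllTargetMCs();
  const std::string Triple = "nvptx64-nvidia-cuda";
  std::string Err;
  const llvm::Target *T = llvm::TargetRegistry::lookupTarget(Triple, Err);
  if (!T)
    return false; // NVPTX target not built into this LLVM
  std::unique_ptr<llvm::MCSubtargetInfo> STI(
      T->createMCSubtargetInfo(Triple, CPU, /*Features=*/""));
  // isCPUStringValid checks the CPU against the tablegen'd processor list.
  return STI && STI->isCPUStringValid(CPU);
}
```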
At this point my preference is to do as the clangers did and maintain an independent table. Unless I'm missing something, we'll eventually need to provide stable, curated GPU target names at compile time if we want to enable introspection against each of the targets in turn. We could rely on LLVM .td scraping for those names, but having our own table seems better. From a stable/curated naming scheme, others can build extensive compile-time support outside the compiler.
As before, let me know what I should do here and I'll try to get git to let me do it.
> There is a lot of potentially valuable-at-compile-time GPU information available in this and other LLVM files, but I think we only really need a stable naming solution.

Those would be exposed via https://dlang.org/spec/traits.html#getTargetInfo, though I'm not sure how to go about doing that for multiple GPUs/OpenCL at the same time; but that problem can be solved later.

> use of command-line examples without -mdcompute-targets=

Short answer is you can't, because there is a bunch of metadata that needs to be attached to kernels in the IR for the backend to process it correctly. This is not added with LDC in "CPU mode" when targeting a GPU.

> clang's --cuda-gpu-arch=

clang does that because they process .cu files twice at the source level, once for the CPU and once for the GPU. LDC does a similar thing, except the processing is duplicated in the AST -> IR conversion stage.

My preference would be for an *.inc based off of https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/NVPTX/NVPTX.td. The error-messaging functionality would probably have to be reworked, but that's fine. It was what worked at the time I wrote it and it was not the focus of my attention; it is far from an optimal way of doing things.
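(As a rough illustration of that *.inc idea: a build step, e.g. a small script run by CMake, could scrape NVPTX.td and emit a comma-separated version list which the driver then includes. The file name ValidCUDAVersions.inc, the generation step, and the example contents are all hypothetical, not part of LDC's actual build.)

```cpp
#include <algorithm>
#include <iterator>

// ValidCUDAVersions.inc is a hypothetical generated file containing e.g.:
//   200, 210, 300, 320, 350, 370, 500, 520, 530, 600, 610, 620, 700, 720, 750, 800
constexpr int valid_cuda_versions[] = {
#include "ValidCUDAVersions.inc"
};

// Would replace the hardcoded std::array check in dcomputecodegenerator.cpp.
bool isValidCudaVersion(int v) {
  return std::find(std::begin(valid_cuda_versions),
                   std::end(valid_cuda_versions), v) !=
         std::end(valid_cuda_versions);
}
```

This would keep the list in sync with LLVM automatically while preserving the driver-level error message.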
I agree that we should work from the .td file in this case, so I'd appreciate someone closing this PR.
It will be some number of days before I take this up again, and some amount of time after that to get the implementation done.
I think that D has a great future on GPUs: NVPTX here, and later atop SPIR-V platforms where CPUs/GPUs share memory.
Thanks for the feedback.
Having said that, this is a very non-intrusive change, and if this quick fix enables you to test intrinsics that require more recent processors, then it should go in and a more principled approach can be done later.
OK. Whatever you decide here is fine with me. A merge would be useful, but I can work from my fork till I have time for the .td effort.
Note that the i1-as-dlang-bool commit updating the gen_gccbuiltins.cpp utility can stand alone from the driver cuda-version issue; with the recently added vec.empty() check, it should be good for the long haul.
I attempted to isolate that commit from the cuda-version issue, but my current git ineptitude resulted in a marooned PR. I plan on fixing that Monday or Tuesday (when my git tutor is available for oversight :-) ).
I'll watch what happens and fill in with additional PRs as needed in the future.
Ahh, I know that feeling all too well. I think what you want to do here is save the change as a patch and then apply the patch to the branch in question.
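(For example, a generic git sketch of that workflow; the commit, branch, and file names are placeholders:)

```sh
# On the branch holding the stray commit: export it as a patch.
git format-patch -1 <commit> --stdout > change.patch

# Switch to the branch where it belongs and apply it.
git checkout <target-branch>
git am change.patch
```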
Updating valid_cuda_versions gives access to additional builtins.
The programmer is responsible for adapting to evolving CUDA semantics, but this should not be a breaking change.