Dcomputextra #3683
Conversation
Thanks. Waiting on #3684 for the test suite to pass. You might have to force-push once that goes through to restart the CI.
Are the valid CUDA versions something we can get from LLVM?
In driver/dcomputecodegenerator.cpp:

```diff
@@ -43,8 +43,8 @@ DComputeCodeGenManager::createComputeTarget(const std::string &s) {
 #if LDC_LLVM_SUPPORTED_TARGET_NVPTX
 #define CUDA_VALID_VER_INIT 100, 110, 120, 130, 200, 210, 300, 350, 370,\
-                            500, 520, 600, 610, 620
-  const std::array<int, 14> valid_cuda_versions = {{CUDA_VALID_VER_INIT}};
+                            500, 520, 600, 610, 620, 700, 720, 750, 800
+  const std::array<int, 18> valid_cuda_versions = {{CUDA_VALID_VER_INIT}};
```
As Johan said, it'd be great to get rid of hardcoded stuff like this, to prevent history repeating itself in a few years. There are other missing versions too (530, 320); and with LLVM 11, it looks like no versions older than 200 are supported anymore. FWIW, in case we cannot derive them from LLVM, I'm in favor of making it the dev's responsibility to provide a valid version rather than pretending to have an up-to-date hardcoded list in the compiler.
I'll go spelunking in clang. They may have a principled way to deal with
their analogous "sm_xx" capability.
OK. My first descent into the clang cave located
include/clang/Basic/Cuda.h and clang/lib/Basic/Cuda.cpp wherein we find
various sm_xx and CUDA mappings. I saw no reliance on an LLVM authority in
this area. clang appears to go its own way.
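(For reference, a simplified sketch of the kind of table clang keeps there. This is an illustrative, self-contained re-creation, not clang's actual code; the real tables in clang/lib/Basic/Cuda.cpp are larger and also track CUDA toolkit versions.)

```cpp
#include <cstdio>
#include <cstring>

// A clang-style arch enum and name table (abridged; illustrative only).
enum class CudaArch { Unknown, SM_20, SM_30, SM_35, SM_50, SM_52,
                      SM_60, SM_61, SM_70, SM_75, SM_80 };

struct ArchName { CudaArch Arch; const char *Name; };

constexpr ArchName ArchNames[] = {
    {CudaArch::SM_20, "sm_20"}, {CudaArch::SM_30, "sm_30"},
    {CudaArch::SM_35, "sm_35"}, {CudaArch::SM_50, "sm_50"},
    {CudaArch::SM_52, "sm_52"}, {CudaArch::SM_60, "sm_60"},
    {CudaArch::SM_61, "sm_61"}, {CudaArch::SM_70, "sm_70"},
    {CudaArch::SM_75, "sm_75"}, {CudaArch::SM_80, "sm_80"},
};

// Mirrors the shape of clang's StringToCudaArch: linear scan of the table.
CudaArch stringToCudaArch(const char *S) {
  for (const auto &E : ArchNames)
    if (std::strcmp(E.Name, S) == 0)
      return E.Arch;
  return CudaArch::Unknown;
}

int main() {
  std::printf("sm_75 known: %s\n",
              stringToCudaArch("sm_75") != CudaArch::Unknown ? "yes" : "no");
}
```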
So it looks like we have three options:
1) Isolate the CUDA/sm version tracking to LLVM and feed it upstream.
2) Update the table from time to time (matching or exceeding the clang table's capability).
3) Remove the table and just flow the version through from the command line (no checking/validation).
Option 1) is "the right thing to do" but looks like a long-term project (write code, convince clangers, convince llvmers, ...).
Option 2) is what I prefer, at least in the near term while an effort towards 1) is "in process". We might use a sed script, or similar, to automate this if an LDC build-time dependency on clang is permissible. Not sure which is uglier: the requirement for manual attention or a back-door dependency on clang...
Option 3) is less appealing as I believe it would defer error reporting until run time (when the .ptx file is jitted). I guess we could try a trial JIT at compile time, but that would give us another dependency, and a pretty ugly one at that...
Another problem with 3) is that it cuts off a possible introspection dimension. If the compiler is not authoritative on which GPU variant is being targeted (it could be any one of a largish command-line list), then higher-level introspection of GPU capabilities becomes significantly harder to do correctly; at least, I'm not sure how you'd do it cleanly. OTOH, if we can reliably introspect the GPU variant, then we'd have a nice advantage over other languages targeting GPUs.
Thanks for the earlier input. Let me know how I should proceed.
> Option 3) is less appealing as I believe it would defer error reporting until run time (when the .ptx file is jitted)

I suggest confirming this first, because it's definitely not what I'd expect. I also expected an error for `ldc2 -mtriple=nvptx64 -mcpu=sm_81`, but it (AFAICT, LLVM) just prints a warning (6 times in my case...): `'sm_81' is not a recognized processor for this target (ignoring processor)`. Still, it's probably hard to overlook.

As already stated, I'm for option 3, simply because I care a lot more about power users trying to target the latest dialects (and don't want them to have to hack the compiler to get what they want) than about users unable to figure out a correct string, provided there are errors or at least warnings and the help string makes it rather obvious, something like: `Use '-mtriple=nvptx[64] -mcpu=help' for a list of targetable shader models; e.g., 'cuda-500' corresponds to 'sm_50'.`
I had thought that clang would use LLVM as the authority if possible but,
apparently, I was mistaken. I'll look into LLVM and report back.
OK, it looks like LLVM keeps the master table of sm_xx info in llvm/lib/Target/NVPTX/NVPTX.td. During a build this is processed into build/lib/Target/NVPTX/NVPTXGenSubtargetInfo.inc.
There is a lot of potentially valuable-at-compile-time GPU information available in this and other LLVM files, but I think we only really need a stable naming solution.
So, we can give up on any driver-level checking as kinke advises, do a manual table, do a .td scrape to get a table in sync with LLVM, process the .td file ourselves, or get in bed with LLVM at the C++ level.
One disconnect for me here is kinke's repeated use of command-line examples without -mdcompute-targets=... How can we generate code for both a CPU and GPU(s) without something like -mdcompute-targets or the clangers' --cuda-gpu-arch=...? I'm probably missing something.
Regardless, we can drop the driver-level validity check and hope/expect that the IRState->module.setTargetTriple call within targetCUDA.cpp gives us a non-cryptic error if/when passed a bogus triple.
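(For illustration: if we did want to keep a driver-level check without a hardcoded table, a minimal sketch of asking LLVM directly via MCSubtargetInfo::isCPUStringValid could look like the following. This is a hedged sketch, not LDC's actual driver code; it assumes an LLVM build with the NVPTX target enabled, and the helper name is made up.)

```cpp
#include "llvm/MC/MCSubtargetInfo.h"
#include "llvm/Support/TargetRegistry.h"
#include "llvm/Support/TargetSelect.h"
#include <memory>
#include <string>

// Ask LLVM, rather than a hardcoded list, whether e.g. "sm_75" names a
// processor known to the NVPTX backend.
bool isKnownNVPTXProcessor(const std::string &CPU) {
  llvm::InitializeAllTargetInfos();
  llvm::InitializeAllTargetMCs();
  const std::string Triple = "nvptx64-nvidia-cuda";
  std::string Err;
  const llvm::Target *T = llvm::TargetRegistry::lookupTarget(Triple, Err);
  if (!T)
    return false; // NVPTX target not built into this LLVM
  std::unique_ptr<llvm::MCSubtargetInfo> STI(
      T->createMCSubtargetInfo(Triple, CPU, /*Features=*/""));
  // isCPUStringValid checks the CPU against the tablegen'd processor list.
  return STI && STI->isCPUStringValid(CPU);
}
```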
At this point my preference is to do as the clangers did and maintain an independent table. Unless I'm missing something, we'll eventually need to provide stable, curated GPU target names at compile time if we want to enable introspection against each of the targets in turn. We could rely on LLVM .td scraping for those names, but having our own table seems better. From a stable/curated naming scheme, others can build extensive compile-time support outside the compiler.
As before, let me know what I should do here and I'll try to get git to let me do it.
> There is a lot of potentially valuable-at-compile-time GPU information available in this and other LLVM files, but I think we only really need a stable naming solution.

Those would be exposed via https://dlang.org/spec/traits.html#getTargetInfo, though I'm not sure how to go about doing that for multiple GPUs/OpenCL at the same time; but that problem can be solved later.

> use of command-line examples without -mdcompute-targets=

Short answer is you can't, because there is a bunch of metadata that needs to be attached to kernels in the IR for the backend to process it correctly. This is not added with LDC in "CPU mode" when targeting a GPU.

> clang's --cuda-gpu-arch=

clang does that because they process .cu files twice at the source level, once for the CPU and once for the GPU. LDC does a similar thing, except the processing is duplicated in the AST -> IR conversion stage.

My preference would be for an *.inc based off of https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/NVPTX/NVPTX.td. The error-messaging functionality would probably have to be reworked, but that's fine. It was what worked at the time I wrote it and it was not the focus of my attention; it is far from an optimal way of doing things.
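(As a rough illustration of that *.inc idea: a build step, e.g. a small script run by CMake, could scrape NVPTX.td and emit a comma-separated version list which the driver then includes. The file name ValidCUDAVersions.inc, the generation step, and the example contents are all hypothetical, not part of LDC's actual build.)

```cpp
#include <algorithm>
#include <iterator>

// ValidCUDAVersions.inc is a hypothetical generated file containing e.g.:
//   200, 210, 300, 320, 350, 370, 500, 520, 530, 600, 610, 620, 700, 720, 750, 800
constexpr int valid_cuda_versions[] = {
#include "ValidCUDAVersions.inc"
};

// Would replace the hardcoded std::array check in dcomputecodegenerator.cpp.
bool isValidCudaVersion(int v) {
  return std::find(std::begin(valid_cuda_versions),
                   std::end(valid_cuda_versions), v) !=
         std::end(valid_cuda_versions);
}
```

This would keep the list in sync with LLVM automatically while preserving the driver-level error message.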
I agree that we should work from the .td file in this case, so I'd appreciate someone closing this PR.
It will be some number of days before I take this up again, and some amount of time after that to get the implementation done.
I think that D has a great future on GPUs: NVPTX here, and later atop SPIR-V platforms where CPUs/GPUs share memory.
Thanks for the feedback.
Having said that, this is a very non-intrusive change, and if this quick fix enables you to test intrinsics that require more recent processors, then it should go in and a more principled approach can be done later.
OK. Whatever you decide here is fine with me. A merge would be useful, but I can work from my fork till I have time for the .td effort.
Note that the i1-as-dlang-bool commit updating the gen_gccbuiltins.cpp utility can stand alone from the driver cuda-version issue; with the recently added vec.empty() check, it should be good for the long haul.
I attempted to isolate that commit from the cuda-version issue, but my current git ineptitude resulted in a marooned PR. I plan on fixing that Monday or Tuesday (when my git tutor is available for oversight :-) ).
I'll watch what happens and fill in with additional PRs as needed in the future.
Ahh, I know that feeling all too well. I think what you want to do here is save the change as a patch and then apply the patch to the branch in question.
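(For example, a generic git sketch of that workflow; the commit, branch, and file names are placeholders:)

```sh
# On the branch holding the stray commit: export it as a patch.
git format-patch -1 <commit> --stdout > change.patch

# Switch to the branch where it belongs and apply it.
git checkout <target-branch>
git am change.patch
```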
Updating valid_cuda_versions gives access to additional builtins.
The programmer is responsible for adapting to evolving CUDA semantics, but this should not be a breaking change.