Use natural dispatch syntax #246
…patch mechanism to a more natural one partially based on the existing module API. The basic idea is that HCC will always correctly emit __global__ functions: as empty-bodied stubs, on host, and as kernels, on device. It then becomes trivial to obtain the mangled name on host, at dispatch, from the function's address, and then to use the mangled name to retrieve the kernel. This should address all problems stemming from serialisation, dubious mismatches due to the manufactured functor, macro-isms et al. It also immediately enables support for generalised globals as a consequence of that being available in the module API. Finally, it will make debugging much easier, since the actual names of the __global__ functions will automatically be used in traces etc. One detail is that due to how dispatch works now (hipLaunchKernel and hipLaunchKernelGGL are themselves variadic function templates which deduce the function type of the callee), in certain cases it may be necessary to insert explicit casts to ensure that the variadic argument list selects a viable overload - this can be observed in some unit tests. Eventually we may be able to remove this limitation, but for now it does not appear terribly onerous. The code is not extremely HIPpie, nor is it fully optimised, but rather is intended as a starting point for the HIP team to make its own.
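The lookup chain described above (host stub address, then mangled name, then kernel) can be sketched in plain host-side C++. This is only an illustration under stated assumptions: `DispatchTable`, `Kernel`, `register_stub`, `lookup` and `address_of` are hypothetical names that do not come from the PR, the two stubs stand in for `__global__` functions, and the mangled names are written by hand rather than read from a symbol table as a real runtime would do.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a device kernel handle retrieved by name.
struct Kernel {
    std::string mangled_name;
};

// Hypothetical two-step table: host stub address -> mangled name -> kernel.
class DispatchTable {
public:
    void register_stub(const void* host_address, std::string mangled_name) {
        kernels_[mangled_name] = Kernel{mangled_name};
        names_[reinterpret_cast<std::uintptr_t>(host_address)] = std::move(mangled_name);
    }

    // Resolve the kernel for a host stub; throws if the address is unknown.
    const Kernel& lookup(const void* host_address) const {
        const auto name = names_.find(reinterpret_cast<std::uintptr_t>(host_address));
        if (name == names_.end()) throw std::out_of_range{"no stub registered at this address"};
        return kernels_.at(name->second);
    }

private:
    std::unordered_map<std::uintptr_t, std::string> names_;
    std::unordered_map<std::string, Kernel> kernels_;
};

// Empty-bodied host stubs standing in for __global__ functions.
void vector_add(const float*, const float*, float*) {}
void scale(float*, float) {}

// Converting a function pointer to void* is conditionally supported in C++,
// but it is accepted on the POSIX platforms this sketch assumes.
template <typename F>
const void* address_of(F* fn) {
    return reinterpret_cast<const void*>(fn);
}
```

Because the key is the address of the function itself, no functor or macro needs to be manufactured at the call site, and the name that shows up in a trace is the function's own mangled name.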
src/grid_launch.cpp (outdated)
auto agent = target_agent(stream);
const auto it1 = find_if(
Shouldn't this be std::find_if, or is there a reason to leave this up to ADL?
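To illustrate the concern, here is a minimal sketch of a qualified call. The namespace `elf_like`, the `Section` type and the helper `find_section_if_type` are hypothetical and are not the code under review. With raw pointers as iterators, argument-dependent lookup only searches the namespaces associated with the pointee and the predicate, so an unqualified `find_if` would typically fail to compile here unless a using-declaration or using-directive brings `std::find_if` into scope. The qualified call does not depend on the iterator type at all.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

namespace elf_like {  // hypothetical namespace, for illustration only

struct Section {
    int type;
};

// Returns a pointer to the first section with the given type, or nullptr.
inline const Section* find_section_if_type(const Section* first, std::size_t count, int type) {
    const Section* last = first + count;
    // Qualified call: resolved by ordinary lookup, independent of ADL.
    const auto it = std::find_if(first, last,
                                 [type](const Section& s) { return s.type == type; });
    return it != last ? it : nullptr;
}

}  // namespace elf_like
```

An unqualified call can still compile when the iterators happen to be class types declared in namespace `std`, which is one reason such calls often go unnoticed.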
inline
section* find_section_if(elfio& reader, P p)
{
const auto it = find_if(
Also, here is another ADL-based find_if.
The PR seems to change a number of files for "other" reasons such as reformatting or refactoring. Some of the test changes (for example replacing hipFree of device memory with delete[] in some of the tests) seem unrelated and perhaps incorrect.
Would you focus this PR on just the new dispatch functionality?
…m visible VA == so_base_va + st_value(function_symbol). Remove quaint usage of pfe for hipMemset (which is actually fill_n).
ROCm/hcc-clang-upgrade#104 is needed in HCC's upstream clang to support this.
…based_dispatch_instead_of_pfe # Conflicts: # src/hip_module.cpp
…based_dispatch_instead_of_pfe
…based_dispatch_instead_of_pfe
@AlexVlx I think this PR deserves a high-level overview, and should possibly be broken into multiple PRs, as it seems to try to achieve several different targets at the same time.
@AlexVlx one extra question: would
@pfultz2: good eyes :)
@whchung @bensander #255 handles the noisy whitespace differences.
Clean up trailing whitespace so as to reduce noise in #246.
@AlexVlx i guess my question for
from the logic I think the answer is yes, but i'd like to double check with you. for since
invite @mangupta as directed tests / samples which depend on
…based_dispatch_instead_of_pfe # Conflicts: # tests/src/runtimeApi/stream/hipStreamSync2.cpp
@whchung @mangupta making hccgenco.sh obsolete is a two step process, which I think can follow going through separately, after we're done with this:
@bensander / @mangupta , ROCm/hcc-clang-upgrade#104 has been merged into
@AlexVlx i agree it's time to try get rid of for
Thanks Jack. We need to debug some of the failing tests so may take a bit to settle out. Does tip HCC still work with old HIP?
From: Wen-Heng (Jack) Chung [mailto:notifications@github.com]
Sent: Wednesday, November 8, 2017 1:17 PM
To: ROCm-Developer-Tools/HIP <HIP@noreply.github.com>
Cc: Sander, Ben <ben.sander@amd.com>; Mention <mention@noreply.github.com>
Subject: Re: [ROCm-Developer-Tools/HIP] Use natural dispatch syntax (#246)
@bensander / @mangupta, ROCm/hcc-clang-upgrade#104 has been merged into HCC mainline now. I think we may need your extra round of review for this PR.
@bensander: the changes in the tip of HCC are non-intrusive; it will keep working with existing HIP.
…based_dispatch_instead_of_pfe
…based_dispatch_instead_of_pfe
I'm particularly running into issues with `device_types.h` in real CUDA code...
…bitwise conversion functions, by using simple reinterpret_casts, as is idiomatic. These functions are supposed to be re-entrant, correct and efficient. Sadly, they were none of these: they hid a massive race condition against a value stored in global memory, which means that they were also unreasonably slow if they ever managed to be correct, and relied on union based type punning which is in a grey area of the standard. It is difficult to ascertain what may have been the reason for coming up with this quirky solution.
…cpy for bitcasting, and not rely on undefined behaviour of a different flavour as a substitute for the original undefined behaviour. Note that the compiler will (should) optimise down to the same emitted code, since this is a pattern it understands.
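A minimal host-side sketch of a copy-based bitcast is shown below. It assumes that `std::memcpy` is the copy routine meant above, and the names `bit_cast_sketch`, `float_as_uint` and `uint_as_float` are hypothetical rather than taken from the HIP headers. The helper touches no shared state, so it is re-entrant, and it avoids both union punning and pointer casts between unrelated types.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Copies the object representation of `from` into a value of type To.
template <typename To, typename From>
To bit_cast_sketch(const From& from) noexcept {
    static_assert(sizeof(To) == sizeof(From), "source and target sizes must match");
    static_assert(std::is_trivially_copyable<To>::value &&
                      std::is_trivially_copyable<From>::value,
                  "both types must be trivially copyable");
    To to;
    std::memcpy(&to, &from, sizeof(To));  // well-defined byte-wise copy
    return to;
}

// Hypothetical conversion helpers built on the copy-based bitcast.
inline std::uint32_t float_as_uint(float x) noexcept {
    return bit_cast_sketch<std::uint32_t>(x);
}

inline float uint_as_float(std::uint32_t x) noexcept {
    return bit_cast_sketch<float>(x);
}
```

On a platform with IEEE-754 single-precision floats, `float_as_uint(1.0f)` yields `0x3F800000`. Mainstream optimising compilers usually reduce the fixed-size copy to a register move, which matches the remark above about the emitted code.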
Change-Id: I67943859a6344c5eec0eaa23418c9b802ef72468
…based_dispatch_instead_of_pfe # Conflicts: # src/hip_module.cpp
…xture references.
…ion of HCC used to compile.
@bensander done and done - if you'd care to experiment with it and counter-verify that things pass on your end as well, including on old(er) HCC, it would be neat. Thank you.
…e compiled for the grid_launch_GGL component.
@AlexVlx - the 1.6 tests failed, see the Jenkins results. Here is a snippet of the error:
… later versions of the compiler, just like module based dispatch, and thus must be guarded against usage in earlier (e.g. 1.6) versions.
LGTM. @kknox - how do we get the CI results to run again?
@bensander I'd wait a bit, since I think I have a better solution than the one embodied in the latest commit (definitely less noisy). As for CI, I think it runs automatically when the PR is updated.
…ork with later versions of the compiler, just like module based dispatch, and thus must be guarded against usage in earlier (e.g. 1.6) versions." This reverts commit d2fd1f5
…tps://github.com/AlexVlx/HIP into feature_use_module_based_dispatch_instead_of_pfe
…A indexing to be used.
…ork with later versions of the compiler, just like module based dispatch, and thus must be guarded against usage in earlier (e.g. 1.6) versions." This reverts commit d2fd1f5
…le. In this mode, there exist two executables per code object, one created by HCC and one created by HIP. Since we dispatch through HCC in legacy mode, we should obtain the address for an agent allocated variable from the latter's executable. Also add two omitted validity checks, whose absence could lead to segfaults when the current process had no .kernel section and / or when an invalid or empty blob was extracted from the latter.
is emitted with full knowledge of its status as a kernel entry-point. More specifically, we need the function to have the AMDGPU_KERNEL calling convention. Unfortunately, only FunctionDecls with the OpenCLKernel attribute are emitted accordingly, but we do not want (and cannot handle) all of the overhead of the OpenCL specification (e.g. adding explicit address space qualifiers to the kernel signature). As such, we use this workaround which marks a __global__ function as an OpenCL kernel as late as possible, after OpenCL semantic checks, but before emitting the llvm::Function. This is rather unpleasant and unlikely to ever be upstreamed - the right solution is to have AMDGPU_KERNEL as its own calling convention, a la __stdcall, which can be used orthogonally to OpenCL. We need this change for correct code generation when using ROCm/HIP#246. (cherry picked from commit 7dd467d)
This switches HIP from its currently convoluted macro + pfe based dispatch mechanism to a more natural one partially based on the existing module API. The basic idea is that HCC will always correctly emit __global__ functions: as empty-bodied stubs, on host, and as kernels, on device. It then becomes trivial to obtain the mangled name on host, at dispatch, from the function's address, and then to use the mangled name to retrieve the kernel. This should address all problems stemming from serialisation, dubious mismatches due to the manufactured functor, macro-isms et al. It also immediately enables support for generalised globals as a consequence of that being available in the module API. Finally, it will make debugging much easier, since the actual names of the __global__ functions will automatically be used in traces etc. One detail is that due to how dispatch works now (hipLaunchKernel and hipLaunchKernelGGL are themselves variadic function templates which deduce the function type of the callee), in certain cases it may be necessary to insert explicit casts to ensure that the variadic argument list selects a viable overload - this can be observed in some unit tests. Eventually we may be able to remove this limitation, but for now it does not appear terribly onerous. The code is not extremely HIPpie, nor is it fully optimised, but rather is intended as a starting point for the HIP team to make its own.
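The explicit-cast caveat can be reproduced with a small host-only template. This is a sketch under assumptions: `launch_sketch` is a hypothetical stand-in and not the actual definition of hipLaunchKernel or hipLaunchKernelGGL, and `scale_kernel` is an ordinary function used in place of a `__global__` one. When one parameter pack is deduced both from the callee's function type and from the arguments, the two deductions must agree exactly, so implicit conversions such as `int` to `float` are not considered.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical launcher: Args is deduced from the kernel's signature
// and, independently, from the types of the arguments passed.
template <typename... Args>
void launch_sketch(void (*kernel)(Args...), Args... args) {
    kernel(args...);
}

// Records what the stand-in kernel received, so the call can be checked.
static float last_factor = 0.0f;
static std::size_t last_count = 0;

void scale_kernel(float factor, std::size_t count) {
    last_factor = factor;
    last_count = count;
}

inline void launch_examples() {
    // launch_sketch(scale_kernel, 2, 16);
    // The line above would not compile: Args is deduced as <float, std::size_t>
    // from the kernel and as <int, int> from the arguments, which conflicts.

    // Explicit casts make both deductions agree.
    launch_sketch(scale_kernel, static_cast<float>(2), static_cast<std::size_t>(16));
}
```

Arguments whose types already match the kernel's parameters exactly need no cast, which is consistent with the observation that only some unit tests required changes.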