-
Notifications
You must be signed in to change notification settings - Fork 110
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feature Request]: Support for Treating Device Functions As hsa_executable_symbol_t's #203
Comments
What calling convention do you expect these functions to conform to? |
@t-tye for now the default emitted by LLVM, which is the C Calling convention. Support for other calling conventions and querying them can also be considered for the long term, but for now I don't think is required. |
I do not think AMD GPU defines a fixed C Calling convention (see AMDGPUUsage). There are the complexities that the register allocation is done dynamically at kernel launch, so when a function is called its convention depends on the register budget allocated. |
@t-tye I understand the reg constraint concerns. For now adding support for just recognizing them and locating them should be enough for my use case. Do you think that's feasible? |
There are a lot of issues in a calling convention, and to me the AMD GPU calling convention is not likely at a level that you can do the full generality that you describe. That does not mean some lesser thing could not be achieved. We are wrestling with a lot of this in our work on the debugger and compiler, but my sense is that that work is not fully set yet. To give a more precise answer we probably need to have a meeting to discuss further. |
@t-tye let's meet to discuss this further. |
As of right now, the HSA standard only supports identifying the following symbols types:
The standard has listed indirect functions as a symbol type, but the AMD ROCr runtime does not implement it.
Device functions on the other hand, are absent from this list, even though they can be identified by inspecting the Loaded Code Object's storage ELF directly, and are emitted by the LLVM AMDGPU compiler.
Supporting device functions as
hsa_executable_symbol_t
's can have the following benefits:hsa_executable_symbol_t
s means the loader can resolve their relocations. A user can have a library of device functions in a separate code object and another one that uses said library. Instead of having to link both code objects together before loading, the user can simply add both code objects into a single executable before freezing the executable.a. A tool writer inserts callbacks in the kernel to device functions they have written in the tool; The instrumentation runtime should be able to identify where the device function is loaded, so that it can perform insert the requested callback into the target application.
b. When analyzing the target kernel, a list of possible device functions called from it needs to be identified and returned to the tool writer, in case they want to instrument them as well. Exposing these as
hsa_executable_symbol_t
seems like the logical option.cc @kzhuravl
The text was updated successfully, but these errors were encountered: