-
Notifications
You must be signed in to change notification settings - Fork 260
[ROCm] Adding pass to generate the HSACO binary blob from the GPU kernel function #181
base: master
Are you sure you want to change the base?
[ROCm] Adding pass to generate the HSACO binary blob from the GPU kernel function #181
Conversation
set(MLIR_ROCM_CONVERSIONS_ENABLED 1) | ||
else() | ||
set(MLIR_ROCM_CONVERSIONS_ENABLED 0) | ||
endif() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't correct I believe: at the moment you have a dependency on the environment (lld, ROCm runtime bitcode libraries, etc.)
As mentioned elsewhere: there should be a step detecting the environment (and we should probably put this behind an opt-in CMake flag by the way) and generating a rocm_config.h that can be included in the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have address this in the CMakeLists.txt file in the GPUToROCM dir
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! This is much better :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@joker-eph , I have pushed out a new rebased commit that addresses the feedback you provided.
Please re-review....thanks
deven
set(MLIR_ROCM_CONVERSIONS_ENABLED 1) | ||
else() | ||
set(MLIR_ROCM_CONVERSIONS_ENABLED 0) | ||
endif() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have address this in the CMakeLists.txt file in the GPUToROCM dir
@joker-eph , gentle ping |
Sorry, I missed the update (and LLVM Dev Meeting happened). I'm catching up on everything and will get to it this week. Ping on Monday if you don't see an update! |
if (EXISTS ${ROCM_INSTALL_DIR}) | ||
message("-- ROCm Install Dir - ${ROCM_INSTALL_DIR}") | ||
else() | ||
message(SEND_ERROR "-- NOT FOUND : ROCm Install Dir - ${ROCM_INSTALL_DIR}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't issue an error when this will be the normal behavior (a complete config on a machine without ROCM).
We should differentiate when the user is setting the value and when it is the default one. When the user provides the path to ROCM and we don't find it, we should just use FATAL_ERROR, otherwise a normal STATUS should be enough.
Can you also set the variables to empty in the else branch?
Finally, if ROCM is correctly detected, it seems like we should set a variable so that we can distinguish on it in the code and in lit tests as well: basically we have two levels that we should distinguish:
- AMDGPU backend is built-in: we can run the test pass.
- We have 1) and ROCM is installed/configured on the system. In this case we can run another test that exercise the linker invocation as well.
In lit you have config.run_rocm_tests
at the moment, which is only indicating 1) above. We should have another flag for 2), and a macro in the code as well ideally.
#define MLIR_CONVERSION_GPUTOROCM_ROCMCONFIG_H_ | ||
|
||
/// The code to generate the HSACO binary blobs (corresponding the GPU kernels) | ||
/// assumes the presense of ROCm libraries/utilities. The location of these |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// assumes the presense of ROCm libraries/utilities. The location of these | |
/// assumes the presence of ROCm libraries/utilities. The location of these |
static constexpr const char *kHSACOAnnotation = "amdgpu.hsaco"; | ||
static constexpr const char *kHSACOGetterAnnotation = "amdgpu.hsacogetter"; | ||
static constexpr const char *kHSACOGetterSuffix = "_hsaco"; | ||
static constexpr const char *kHSACOStorageSuffix = "_hsaco_cst"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't find a use for kHSACOGetterSuffix and kHSACOStorageSuffix?
Also are all these definition sintended to be defined in the public header of are they more internal detail of the pass (and could be in the implementation file).
set(MLIR_ROCM_CONVERSIONS_ENABLED 1) | ||
else() | ||
set(MLIR_ROCM_CONVERSIONS_ENABLED 0) | ||
endif() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! This is much better :)
llvm::TargetMachine::CGFT_ObjectFile); | ||
codegenPasses.run(llvmModule); | ||
isabinFileStream.flush(); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would try to keep maximum coverage for the test, since the pass is only available when the AMDGPU backend is built in, we should be able to get until there for the basic test. We could just reload the binary file isabinFilename
print it as hexadecimal and return.
Something like:
if (testMode) {
auto hsacoFileOrError = llvm::MemoryBuffer::getFileAsStream(isabinFilename);
if ((ec = hsacoFileOrError.getError()))
return {};
std::unique_ptr<llvm::MemoryBuffer> hsacoFile = std::move(hsacoFileOrError.get());
SmallVector<char, 1> buffer;
{
llvm::raw_svector_ostream os(buffer);
for (auto char : hascoFile.getBuffer())
os.write_hex(char);
}
return std::make_unique<std::vector<char>>(buffer.begin(), buffer.end());
}
The test would be able to check that we actually correctly generate an ELF binary (by checking the ELF magic in hexadecimal) and cover everything but the linker when ROCM is not available.
|
||
static PassRegistration<GpuKernelToHSACOPass> | ||
pass("test-kernel-to-hsaco", | ||
"Convert all kernel functions to HSA code object blobs"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With a macro that define if ROCM is installed or not, we can then conditionally register another version of the pass "rocm-hsaco-lowering" or something like which actually run the linker as well, and add a test for this.
48dcae0
to
3722f03
Compare
This is the first of four PRs to add support for the ROCm backend for AMD GPUs. This PR adds the pass to generate the HSA Code objects corresponding the GPU kernels. This PR builds on the work done by @whchung in PR #63
This PR is a follow-up to request in PR #169. It addresses all but two of the code-review comments made in that PR for the
ConvertKernelFuncToHSACO.cpp
file.Two reasons for this.
The binary blob owned by this variable could potentially be large, and hence pass-by-pointer instead of pass-by-value (Note that the "HSACO" string is only returned in the "test" scenario)
Consistency with the CUDA side (where we OwnedCubin, which is also unique_ptr wrapper on a vector of chars)
At least for the time being, it is easier to go through files. Dumping out the .ll file is helpful for debugging purposes, so that is something we would like to keep regardless. Also I am still trying to figure out how to invoke
lld
as an API call, within the mlir setup :(@joker-eph @whchung