use FNV-1a for kernel indexing instead of md5 #44
Conversation
[CMake] Autodetect HSA_AMDGPU_GPU_TARGET
Instead of hardcoding the HSA_AMDGPU_GPU_TARGET at compile time, autodetect it at runtime from the KFD topology. Change-Id: I00af68084869ab4d439e70cf8816c1c8868f224d
Cleanup wrapper IR functions
Use new workitem intrinsics + range metadata, correct some attributes on functions, and canonicalize. Correct range metadata to be maximum theoretical workgroup size. Change-Id: I9dedbe2dd62753858ccd0eb7841e228873a2c031
This will improve hcc runtime performance when multiple kernels are used in a program.
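The change under review can be sketched roughly as follows. This is a hypothetical reconstruction, not the actual hcc sources: the function name `fnv1a` is illustrative, while the `const char *str` cast and the 104-byte cutoff come from the diff and discussion in this thread. FNV-1a hashes each byte by XOR-ing it into the state and multiplying by the FNV prime, and the cutoff restricts hashing to the code object's leading header bytes:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical sketch of FNV-1a kernel indexing as proposed in this PR.
// Only the first FNV1A_CUTOFF_SIZE bytes of the code object are hashed;
// the trade-off (identical headers could collide) is debated below.
constexpr std::size_t FNV1A_CUTOFF_SIZE = 104;

std::uint64_t fnv1a(const void *source, std::size_t size) {
    // Standard 64-bit FNV-1a parameters.
    constexpr std::uint64_t FNV_OFFSET_BASIS = 14695981039346656037ULL;
    constexpr std::uint64_t FNV_PRIME = 1099511628211ULL;

    const char *str = static_cast<const char *>(source);
    if (size > FNV1A_CUTOFF_SIZE)
        size = FNV1A_CUTOFF_SIZE;

    std::uint64_t hash = FNV_OFFSET_BASIS;
    for (std::size_t i = 0; i < size; ++i) {
        hash ^= static_cast<std::uint64_t>(static_cast<unsigned char>(str[i]));
        hash *= FNV_PRIME;
    }
    return hash;
}
```

Compared with md5, this avoids a full cryptographic digest over the entire code object on every kernel lookup, which is where the speedup reported below comes from.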
Hey @whchung, could you please review this PR? From my experiments with this patch on the gpu_deconvolution work, the total execution time has been reduced from 3m10s to 1m59s, which is a big improvement.
@yan-ming thanks for this contribution. Could you alter this PR so it targets the "develop" branch? We are adopting a new branching strategy now.
const char *str = static_cast<const char *>(source);
// 104 is the proper size from Jack's research
It's 140, not 104
Oops, I just quoted that directly from your mail. Let me fix it.
Sorry, 104 should be the correct number.
On second thought, could you make it a macro? The rationale is that a new code object format is being developed, so this value may have to change soon.
Could you give me the desired name of the macro? Or should I just use FNV1A_CUTOFF_SIZE?
I'm still not very comfortable with merely passing the size of the header to the hash algorithm. Take BrigModuleHeader as an example: it should contain the hash of the BRIG module itself, but that field is all zero-filled in the current implementation. Also, in the LC backend we are moving toward a completely ELF-compatible format, so it's likely we'll run into identical ELF headers for similar kernels.
Could you help study the performance impact if we don't use this 104-byte heuristic?
For reference, here are the sources for this 104 magic number. It's the larger of these two header sizes:
- HSA BRIG module header: http://www.hsafoundation.com/html/Content/PRM/Topics/18_BRIG/BrigModuleHeader.htm
- ELF64 header: https://refspecs.linuxfoundation.org/elf/gabi4+/ch4.eheader.html
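The derivation of 104 can be checked at compile time. This is a hedged sketch: the BrigModuleHeader field layout below is assumed from the HSA PRM page linked above (not taken from hcc headers), and the constant names are illustrative:

```cpp
#include <algorithm>
#include <cstddef>

// BrigModuleHeader, field by field, per the linked HSA PRM page
// (layout assumed from that spec):
//   uint8_t  identification[8]  ->  8 bytes
//   uint32_t brigMajor          ->  4
//   uint32_t brigMinor          ->  4
//   uint64_t byteCount          ->  8
//   uint8_t  hash[64]           -> 64  (zero-filled in the current impl)
//   uint32_t reserved           ->  4
//   uint32_t sectionCount       ->  4
//   uint64_t sectionIndex       ->  8
constexpr std::size_t kBrigModuleHeaderSize = 8 + 4 + 4 + 8 + 64 + 4 + 4 + 8; // 104

// Elf64_Ehdr is fixed at 64 bytes (16-byte e_ident plus 48 bytes of fields).
constexpr std::size_t kElf64EhdrSize = 64;

// The cutoff is the larger of the two header sizes.
constexpr std::size_t kCutoff = std::max(kBrigModuleHeaderSize, kElf64EhdrSize);
static_assert(kCutoff == 104, "magic number matches the BrigModuleHeader size");
```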
So the gpu_deconvolution work actually uses the LC backend (due to hcFFT). If I use the original size in the FNV loop, the execution time is about 2m37s.
Here's the comparison of each condition:
- original: 3m10s
- FNV hash: 2m37s
- FNV hash + 104 cutoff: 1m59s
Personally, I would prefer the FNV hash with a fixed-size cutoff for the best performance.
Thanks for providing the numbers. To boost performance, let's make a macro to hold this magic number then. How about calling it "KERNEL_CODE_OBJECT_HEADER_SIZE"? And put in a comment so we know how it's derived (the larger of the HSA BrigModuleHeader and Elf64_Ehdr sizes).
@yan-ming what you proposed (FNV1A_CUTOFF_SIZE) may be a better name after all.
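The outcome of this naming discussion could look like the following. The name FNV1A_CUTOFF_SIZE and the derivation comment come from this thread; the exact definition is a sketch, not the merged code:

```cpp
// FNV1A_CUTOFF_SIZE: number of leading bytes of a kernel code object
// fed to the FNV-1a hash. Derived as the larger of the two known
// code-object header sizes:
//   - HSA BRIG module header (BrigModuleHeader): 104 bytes
//   - ELF64 header (Elf64_Ehdr):                  64 bytes
// NOTE: a new code object format is being developed, so this value
// may need to change when it lands.
#define FNV1A_CUTOFF_SIZE 104
```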
Hi @whchung, sure, let me change the PR destination after I make sure everything is okay.
This reverts commit cb0f883.
@yan-ming it's alright. The GitHub UI is less favorable than Bitbucket's in this regard.
make test