
use FNV-1a for kernel indexing instead of md5 #44

Closed
wants to merge 14 commits

Conversation

yan-ming (Contributor)

This will improve HCC runtime performance when multiple kernels are used in a program.

make test:

********************
Testing Time: 633.70s
********************
Failing Tests (7):
    CPPAMP :: Unit/AmpShortVectors/hc_short_vector_device.cpp
    CPPAMP :: Unit/HC/memcpy_symbol1.cpp
    CPPAMP :: Unit/HC/memcpy_symbol3.cpp
    CPPAMP :: Unit/HC/wg_size.cpp
    CPPAMP :: Unit/HSAIL/shfl_xor.cpp
    CPPAMP :: Unit/SharedLibrary/shared_library2.cpp
    CPPAMP :: Unit/SharedLibrary/shared_library3.cpp

  Expected Passes    : 661
  Expected Failures  : 25
  Unsupported Tests  : 10
  Unexpected Failures: 7
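
For context, FNV-1a is a simple XOR-and-multiply hash that is much cheaper to compute than MD5 on short in-memory buffers, which is why it helps when the runtime indexes many kernels. A minimal sketch of the 64-bit variant, using the standard offset basis and prime (illustrative, not the exact code in this patch):

#include <cstddef>
#include <cstdint>

// 64-bit FNV-1a over a byte buffer.
static inline uint64_t fnv1a(const void *source, size_t size) {
  const unsigned char *bytes = static_cast<const unsigned char *>(source);
  uint64_t hash = 0xcbf29ce484222325ULL;  // FNV offset basis
  for (size_t i = 0; i < size; ++i) {
    hash ^= bytes[i];                     // XOR the byte in first ("1a" order)
    hash *= 0x100000001b3ULL;             // then multiply by the FNV prime
  }
  return hash;
}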

Andres-design16 and others added 9 commits April 22, 2016 11:33
Instead of hardcoding the HSA_AMDGPU_GPU_TARGET at compile time,
autodetect it at runtime from the KFD topology.

Change-Id: I00af68084869ab4d439e70cf8816c1c8868f224d
[CMake] Autodetect HSA_AMDGPU_GPU_TARGET
Use new workitem intrinsics + range metadata, correct
some attributes on functions, and canonicalize.

Correct range metadata to be maximum theoretical workgroup size.

Change-Id: I9dedbe2dd62753858ccd0eb7841e228873a2c031
this will improve hcc runtime performance when multiple kernels are used
in a program
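
As a side note on the autodetection commit above: a rough sketch of reading the KFD topology at runtime. The sysfs path matches what ROCm exposes, but the property name parsed here (gfx_target_version) is an assumption; the actual patch may key off a different field or do the detection from CMake instead.

#include <fstream>
#include <string>

// Hypothetical sketch: scan GPU node 0's properties for a target field.
static std::string detectGpuTarget() {
  std::ifstream props("/sys/class/kfd/kfd/topology/nodes/0/properties");
  std::string key, value;
  while (props >> key >> value) {
    if (key == "gfx_target_version")  // assumed property name
      return value;
  }
  return "";  // not found; caller falls back to a build-time default
}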
yan-ming (Contributor Author)

Hey @whchung,

Could you please review this PR?

In my experiments with this patch on the gpu_deconvolution work, the total execution time was reduced from 3m10s to 1m59s, which is a significant improvement.

whchung (Collaborator) commented Apr 29, 2016

@yan-ming thanks for this contribution. Could you alter this PR so it targets the "develop" branch? We are adopting a new branching strategy.


Review thread on the following hunk:

const char *str = static_cast<const char *>(source);

// 104 is the proper size from Jack's research
whchung (Collaborator):

It's 140, not 104

yan-ming (Contributor Author):

Oops, I just quoted that directly from your mail. Let me fix it.

whchung (Collaborator):

Sorry, 104 should be the correct number.

whchung (Collaborator):

On second thought, could you make it a macro? The rationale is that a new code object format is being developed, so this value may have to change soon.

yan-ming (Contributor Author) commented Apr 29, 2016:

Could you give me the desired name for the macro? Or should I just use FNV1A_CUTOFF_SIZE?

whchung (Collaborator):

I'm still not very comfortable with merely passing the size of the header to the hash algorithm. Take BrigModuleHeader as an example: originally it should contain the hash of the BRIG module itself, but it's all filled with zeros in the current implementation. Also, in the LC backend we are moving toward a completely ELF-compatible format, so it's likely we'll run into identical ELF headers for similar kernels.

Could you help study the impact on performance if we don't use this 104-byte heuristic?

whchung (Collaborator):

For reference, here are the sources where I got this 104 magic number. It's the larger value of these two kinds of headers.

yan-ming (Contributor Author):

So the gpu_deconvolution work is actually using the LC backend (due to hcFFT). If I use the original size in the FNV loop, the execution time is about 2m37s.

Here's the comparison for each condition:

  • original: 3m10s
  • FNV hash: 2m37s
  • FNV hash + 104 cutoff: 1m59s

Personally, I would prefer the FNV hash with a fixed-size cutoff for the best performance.

whchung (Collaborator):

Thanks for providing the values. To boost performance, let's make a macro to hold this magic number. How about calling it "KERNEL_CODE_OBJECT_HEADER_SIZE"? And add a comment so we know how it's derived (the larger of HSA's BrigModuleHeader and Elf64_Ehdr).

whchung (Collaborator):

@yan-ming what you proposed (FNV1A_CUTOFF_SIZE) may be a better name after all.
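
Putting the thread's conclusion together, a hypothetical sketch of the macro and cutoff (not the actual patch; this assumes the "104 cutoff" means hashing at most the first 104 bytes of the code object, and reuses the fnv1a sketch from earlier):

#include <cstddef>
#include <cstdint>

// Assumed name per the thread. 104 is the larger of
// sizeof(BrigModuleHeader) and sizeof(Elf64_Ehdr); revisit when the
// code object format changes.
#define FNV1A_CUTOFF_SIZE ((size_t)104)

// Index a kernel by hashing a bounded prefix of its code object.
static inline uint64_t kernelIndexHash(const void *source, size_t size) {
  size_t n = size < FNV1A_CUTOFF_SIZE ? size : FNV1A_CUTOFF_SIZE;
  return fnv1a(source, n);  // fnv1a as sketched after the test results above
}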

yan-ming (Contributor Author)

Hi @whchung,

Sure, let me change the PR destination once I make sure everything is okay.

yan-ming closed this Apr 29, 2016
yan-ming (Contributor Author)

@whchung it seems GitHub doesn't allow me to change the PR destination branch directly; please see PR #45. Sorry for the inconvenience.

whchung (Collaborator) commented Apr 29, 2016

@yan-ming it's alright. GitHub's UI is less convenient than Bitbucket's in this regard.
