increase KERNARG_BUFFER_SIZE from 512 to 4k #1377

jeffdaily · 2020-01-25T00:34:12Z

Decrease HCC_ASYNCOPS_SIZE from 16k to 1k.
HCC_KERNARG_BUFFER_SIZE is now an environment variable.
HCC_KERNARG_POOL_SIZE is now an environment variable.
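The PR text does not show how HCC parses these variables. Below is a minimal sketch of how such a size override could work. It assumes each variable holds a positive decimal byte count, and that an unset or malformed value falls back to the built-in default. `read_size_env` is a hypothetical helper, not HCC's API. The 4096 default mirrors the new kernarg buffer size described above.

```cpp
#include <cctype>
#include <cerrno>
#include <cstddef>
#include <cstdlib>

// Hypothetical helper, not HCC's actual code: read a byte count such as
// HCC_KERNARG_BUFFER_SIZE from the environment, keeping the built-in
// default when the variable is unset, empty, or not a positive integer.
std::size_t read_size_env(const char* name, std::size_t default_value) {
    const char* text = std::getenv(name);
    if (text == nullptr || !std::isdigit(static_cast<unsigned char>(text[0]))) {
        return default_value;  // unset, empty, signed, or non-numeric
    }
    errno = 0;
    char* end = nullptr;
    unsigned long long parsed = std::strtoull(text, &end, 10);
    if (errno != 0 || *end != '\0' || parsed == 0) {
        return default_value;  // overflow, trailing characters, or zero
    }
    return static_cast<std::size_t>(parsed);
}
```

For example, `read_size_env("HCC_KERNARG_BUFFER_SIZE", 4096)` returns 4096 unless the variable is set to some other positive byte count. A value with a suffix such as `12k` is rejected rather than guessed at.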

jeffdaily · 2020-01-25T00:35:41Z

Since this PR also reduces the asyncops size, it could replace #1261.

emankov · 2020-01-25T13:19:21Z

Justification for all the numbers is needed.

jeffdaily · 2020-01-27T15:24:53Z

@emankov

CUDA's kernarg limit is 4 KB: __global__ function parameters are passed to the device via constant memory and are limited to 4 KB (see https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#function-parameters).
The PyTorch translate model uses a number of kernels whose kernargs exceed 512 bytes, the current default buffer size. Raising the default kernarg buffer size yields a 30% performance improvement, because those kernargs no longer have to be allocated on demand.
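The sketch below makes the 512-byte threshold concrete. It assumes a simplified model: a launch whose kernarg payload fits the preallocated buffer reuses that buffer, and anything larger triggers an on-demand allocation. `KernargStorage`, `choose_kernarg_storage`, and `ExampleKernargs` are hypothetical names for illustration, not taken from HCC.

```cpp
#include <cstddef>
#include <cstdint>

// Limit on __global__ parameters, as cited from the CUDA programming guide above.
constexpr std::size_t kCudaParamLimitBytes = 4096;

// Hypothetical kernel argument block: 200 floats plus a 64-bit count (808 bytes).
struct ExampleKernargs {
    float weights[200];
    std::int64_t count;
};
static_assert(sizeof(ExampleKernargs) <= kCudaParamLimitBytes,
              "argument block must respect the 4 KB CUDA limit");

enum class KernargStorage { Preallocated, OnDemand };

// Simplified model: reuse the preallocated buffer when the payload fits,
// otherwise fall back to a slower on-demand allocation.
KernargStorage choose_kernarg_storage(std::size_t arg_bytes, std::size_t buffer_bytes) {
    return arg_bytes <= buffer_bytes ? KernargStorage::Preallocated
                                     : KernargStorage::OnDemand;
}
```

Under this model, the 808-byte block needs an on-demand allocation with the old 512-byte default. Once the default is 4 KB, it reuses the preallocated buffer.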
Since the kernarg buffer size is increased 8x (512 bytes to 4 KB), HCC_ASYNCOPS_SIZE is reduced 16x (16k to 1k) to keep worst-case memory use roughly the same, assuming two streams fully queuing to the same device.
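The arithmetic below is one way to check this trade-off. It assumes worst-case kernarg memory is roughly (async-op slots per stream) × (kernarg buffer bytes) × (streams), and it reads 16k and 1k as 16384 and 1024. Both the model and the `worst_case_kernarg_bytes` helper are assumptions for illustration, not HCC's actual accounting.

```cpp
#include <cstddef>

// Assumed model: every queued async op in a fully loaded stream holds one kernarg buffer.
constexpr std::size_t worst_case_kernarg_bytes(std::size_t asyncops_per_stream,
                                               std::size_t kernarg_buffer_bytes,
                                               std::size_t streams) {
    return asyncops_per_stream * kernarg_buffer_bytes * streams;
}

constexpr std::size_t kMiB = 1024 * 1024;

// Old defaults: 16k async ops x 512-byte buffers.
constexpr std::size_t kOldPerStream = worst_case_kernarg_bytes(16384, 512, 1);
// New defaults: 1k async ops x 4 KB buffers.
constexpr std::size_t kNewPerStream = worst_case_kernarg_bytes(1024, 4096, 1);
constexpr std::size_t kNewTwoStreams = worst_case_kernarg_bytes(1024, 4096, 2);

static_assert(kOldPerStream == 8 * kMiB, "16384 * 512 bytes = 8 MiB");
static_assert(kNewPerStream == 4 * kMiB, "1024 * 4096 bytes = 4 MiB");
static_assert(kNewTwoStreams == kOldPerStream, "two new streams match one old stream");
```

Under this model, each stream's worst case halves (8 MiB to 4 MiB). Two fully queued streams with the new defaults therefore use as much kernarg memory as one stream did before. The PR does not spell out its exact accounting, so treat this as an interpretation.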

jeffdaily · 2020-01-27T15:45:09Z

@emankov The most important change in this PR is the increase in the default kernarg buffer size. If needed, would such a change be acceptable without the other changes?

emankov · 2020-01-27T15:55:03Z

@jeffdaily, thank you for the explanation. Could you please add a few words about it in code comments?

jeffdaily · 2020-01-27T16:52:01Z

@emankov comments added in commit f0e2b40.

lib/hsa/mcwamp_hsa.cpp

Decrease HCC_ASYNCOPS_SIZE from 16k to 1k.

jeffdaily requested a review from scchan January 25, 2020 00:34

jeffdaily changed the title ~~cincrease KERNARG_BUFFER_SIZE from 512 to 4k~~ increase KERNARG_BUFFER_SIZE from 512 to 4k Jan 25, 2020

scchan requested changes Jan 27, 2020

lib/hsa/mcwamp_hsa.cpp (outdated, resolved)

jeffdaily added 2 commits January 27, 2020 20:56

increase KERNARG_BUFFER_SIZE from 512 to 4k

570dc6a

Decrease HCC_ASYNCOPS_SIZE from 16k to 1k.

add comments to HCC_KERNARG_BUFFER_SIZE, HCC_ASYNCOPS_SIZE

d6b79a3

jeffdaily force-pushed the increase_kernarg_buffer_size branch from f0e2b40 to d6b79a3 Compare January 27, 2020 20:57

scchan merged commit 05d7af3 into ROCm:clang_tot_upgrade Jan 29, 2020
