Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

increase KERNARG_BUFFER_SIZE from 512 to 4k #1377

Merged
merged 2 commits into from
Jan 29, 2020

Conversation

jeffdaily
Copy link
Collaborator

Decrease HCC_ASYNCOPS_SIZE from 16k to 1k.
HCC_KERNARG_BUFFER_SIZE is now an environment variable.
HCC_KERNARG_POOL_SIZE is now an environment variable.

@jeffdaily jeffdaily requested a review from scchan January 25, 2020 00:34
@jeffdaily jeffdaily changed the title cincrease KERNARG_BUFFER_SIZE from 512 to 4k increase KERNARG_BUFFER_SIZE from 512 to 4k Jan 25, 2020
@jeffdaily
Copy link
Collaborator Author

Since this PR also reduces the asyncops size, it could replace #1261 .

@emankov
Copy link
Contributor

emankov commented Jan 25, 2020

Justification for all the numbers is needed.

@jeffdaily
Copy link
Collaborator Author

@emankov

  1. CUDA default kernarg size is 4k. __global__ function parameters are passed to the device via constant memory and are limited to 4 KB. From https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#function-parameters .
  2. PyTorch translate model uses a number of kernels with kernargs > 512 bytes, the current default. Changing the default kernarg buffer size results in a 30% performance improvement since kernargs are no longer allocated on demand.
  3. Since kernarg buffer size is increased by 8 times, HCC_ASYNCOPS_SIZE is reduced by 16 times to keep memory use roughly the same in the worst case, assuming two streams fully queuing to the same device.

@jeffdaily
Copy link
Collaborator Author

@emankov The most important change in this PR is the increase in the default kernarg buffer size. If needed, would such a change be acceptable without the other changes?

@emankov
Copy link
Contributor

emankov commented Jan 27, 2020

@jeffdaily, thank you for explanation. Could you please add just a few words in comments?

@jeffdaily
Copy link
Collaborator Author

@emankov comments added in commit f0e2b40.

lib/hsa/mcwamp_hsa.cpp Outdated Show resolved Hide resolved
@scchan scchan merged commit 05d7af3 into ROCm:clang_tot_upgrade Jan 29, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants