Fix threadblocks in Errf_CUDA #17

Open
miromarszal opened this issue Jun 28, 2017 · 0 comments

Currently, threads are divided into N blocks of N threads each, where N is the width of the array. This scales poorly: N cannot exceed the maximum number of threads per block (1024 on current CUDA devices).

Rewrite the thread layout so that the code scales efficiently, at least for arrays whose size is a power of 2. Dividing the array into square submatrices would probably still be quite readable, but much more scalable.
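
A rough sketch of what the square-submatrix layout could look like, assuming an N x N row-major array and a purely element-wise operation. The kernel `squareElements`, the constant `TILE`, and the test data are hypothetical placeholders for illustration, not the actual Errf_CUDA code. Each block covers a TILE x TILE submatrix, so the number of threads per block stays fixed at TILE * TILE while the grid grows with N:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical tile width: 16 x 16 = 256 threads per block,
// independent of the array width n.
constexpr int TILE = 16;

// Placeholder element-wise kernel (not the project's real computation):
// each thread handles one element of an n x n row-major array.
__global__ void squareElements(const float *in, float *out, int n)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    // Bounds guard, needed when n is not a multiple of TILE.
    if (row < n && col < n) {
        int idx = row * n + col;
        out[idx] = in[idx] * in[idx];
    }
}

int main()
{
    const int n = 2048;  // wider than a single block of n threads allows
    const size_t count = size_t(n) * n;
    const size_t bytes = count * sizeof(float);

    float *hostIn = (float *)malloc(bytes);
    float *hostOut = (float *)malloc(bytes);
    for (size_t i = 0; i < count; ++i) hostIn[i] = float(i % 7);

    float *devIn = nullptr, *devOut = nullptr;
    cudaMalloc(&devIn, bytes);
    cudaMalloc(&devOut, bytes);
    cudaMemcpy(devIn, hostIn, bytes, cudaMemcpyHostToDevice);

    // One block per TILE x TILE submatrix; round up to cover the edges.
    dim3 block(TILE, TILE);
    dim3 grid((n + TILE - 1) / TILE, (n + TILE - 1) / TILE);
    squareElements<<<grid, block>>>(devIn, devOut, n);

    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    cudaMemcpy(hostOut, devOut, bytes, cudaMemcpyDeviceToHost);

    // Verify on the host that every element was processed exactly as expected.
    size_t mismatches = 0;
    for (size_t i = 0; i < count; ++i)
        if (hostOut[i] != hostIn[i] * hostIn[i]) ++mismatches;
    printf("mismatches: %zu\n", mismatches);

    cudaFree(devIn);
    cudaFree(devOut);
    free(hostIn);
    free(hostOut);
    return mismatches == 0 ? 0 : 1;
}
```

With power-of-2 widths of at least 16, the tiles divide the array evenly and the bounds guard never triggers. A tile width of 16 is only one reasonable choice; 32 x 32 would use the full 1024 threads per block, and the best value likely depends on the kernel's register and shared-memory usage.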
