Improve threads and blocks tuning for 2D parallel_for (CUDA) #100

Closed

PhilipFackler
Collaborator

Similar to #76 but for 2D. This only applies to the CUDA version.
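For readers unfamiliar with the problem: a 2D launch must keep the product of the two per-block thread counts within the device limit (1024 threads per block on current CUDA GPUs) and then pick enough blocks to cover both extents. The sketch below only illustrates that arithmetic. It is not the code from this PR or from #76; `launch_config_2d` and its parameters are hypothetical names, and the choice of one warp (32 threads) along the first dimension is an assumption.

```python
def launch_config_2d(m, n, max_threads_per_block=1024, warp=32):
    """Return ((tx, ty), (bx, by)) for an m x n index space.

    Hypothetical helper: tx * ty never exceeds max_threads_per_block,
    and bx * tx >= m, by * ty >= n so every (i, j) is covered.
    """
    if m < 1 or n < 1:
        raise ValueError("both extents must be positive")
    # Assumption: give the first dimension up to one warp of threads.
    tx = min(m, warp)
    # Spend the remaining thread budget on the second dimension.
    ty = min(n, max_threads_per_block // tx)
    # Ceiling division gives the number of blocks per dimension.
    bx = (m + tx - 1) // tx
    by = (n + ty - 1) // ty
    return (tx, ty), (bx, by)
```

Because the block counts are rounded up, a kernel launched this way still needs a bounds check (`i < m and j < n`) before touching data.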

@PhilipFackler PhilipFackler marked this pull request as draft June 5, 2024 18:59
@williamfgc
Collaborator

@PhilipFackler does this work for your use case?

@williamfgc
Collaborator

Test this please

@PhilipFackler
Collaborator Author

@williamfgc It worked for the small test case I was using for development (the one that caused the failure in the first place). I still want to test it on iguazu with a bigger case.

@williamfgc
Collaborator

@PhilipFackler thanks, let me know when this is ready to merge.

@PhilipFackler PhilipFackler force-pushed the fix-cuda-thread-counts-2d branch 2 times, most recently from 0da2e97 to 66cd4b8 Compare June 6, 2024 16:29
@PhilipFackler PhilipFackler force-pushed the fix-cuda-thread-counts-2d branch from 66cd4b8 to 5b8f820 Compare June 6, 2024 16:30
@PhilipFackler PhilipFackler marked this pull request as ready for review June 6, 2024 16:30
@PhilipFackler
Collaborator Author

@williamfgc I added functors that change how i and j are obtained for the kernel function. This solves the problem for my case. However, it would be nice to implement a BlockIndexer that uses sub-ranges when the problem size would cause the number of blocks to exceed the maximum.
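One way to read the sub-range idea is sketched below: split a dimension into chunks that each fit within the block limit, launch once per chunk, and have the index functor add the chunk's offset. This is an illustration under stated assumptions, not the BlockIndexer proposed here, which this conversation never shows. `sub_ranges` and `global_index` are hypothetical names, and 65535 is the CUDA limit on the y and z grid dimensions.

```python
MAX_GRID_Y = 65535  # CUDA limit on gridDim.y (and gridDim.z)


def sub_ranges(n, threads, max_blocks=MAX_GRID_Y):
    """Yield (offset, blocks) pairs that together cover indices 0..n-1.

    Hypothetical helper: each pair describes one launch that uses at
    most max_blocks blocks of `threads` threads along this dimension.
    """
    per_launch = threads * max_blocks
    offset = 0
    while offset < n:
        count = min(per_launch, n - offset)
        # Ceiling division: blocks needed for this chunk.
        yield offset, (count + threads - 1) // threads
        offset += count


def global_index(offset, block_idx, block_dim, thread_idx):
    """Index functor for one dimension: local index shifted by the chunk offset."""
    return offset + block_idx * block_dim + thread_idx
```

With a single chunk the offset is 0 and `global_index` reduces to the usual `block_idx * block_dim + thread_idx`, so the ordinary case is unchanged. The last chunk may be rounded up to a whole block, so the kernel still needs an `index < n` guard.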

@PhilipFackler
Collaborator Author

@williamfgc I believe this is ready to merge. You'll want to request all the tests again.

@PhilipFackler
Collaborator Author

@williamfgc can you merge this?
