Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix large (>512 elements) ecntt issue #553

Merged
merged 1 commit into from
Jul 4, 2024

Conversation

yshekel
Copy link
Collaborator

@yshekel yshekel commented Jul 4, 2024

This PR solves an issue for large ecntt where cuda blocks are too large and cannot be assigned to SMs. The fix is to reduce thread count per block and increase block count in that case.

The issue is that cuda blocks are too large and cannot be placed on SMs.
Fix is to reduce thread count per block and increase block count.
Copy link
Contributor

@HadarIngonyama HadarIngonyama left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks fine but why not use 128 for all cases? I'm not aware of any benefit of using large blocks.

@yshekel
Copy link
Collaborator Author

yshekel commented Jul 4, 2024

looks fine but why not use 128 for all cases? I'm not aware of any benefit of using large blocks.

I also was wondering that, but I assumed maybe it performed better so that's why it as done like that. Also I would have to measure and see that I did not degrade anything so wanted to avoid that right now.

@jeremyfelder jeremyfelder merged commit 5516320 into main Jul 4, 2024
26 checks passed
@jeremyfelder jeremyfelder deleted the yshekel/fix/large_ecntt_inv branch July 4, 2024 12:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants