Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add thread throttling profile for SGEMM on NEOVERSEV1 #5071

Merged
merged 1 commit into from
Jan 23, 2025

Conversation

annop-w
Copy link
Contributor

@annop-w annop-w commented Jan 13, 2025

This PR introduces a new thread throttling profile for SGEMM on NEOVERSEV1 CPU core.
Benchmarking for M=N=K=[2, 1024] shows a geometric mean speedup of ~1.1x.

@annop-w
Copy link
Contributor Author

annop-w commented Jan 16, 2025

@martin-frbg Could I please get a review for this patch ?

@martin-frbg
Copy link
Collaborator

As soon as I can concentrate on it. Would you happen to have any benchmark numbers available ? The idea of using varying thread counts has been floated a few times in the past, but the added logic seemed to outweigh gains on x86_64 at least

@annop-w
Copy link
Contributor Author

annop-w commented Jan 16, 2025

@martin-frbg Thank you for your comment. As mentioned in the description, the geo. mean speedup is ~1.1 for M=N=K up to 1024. Here is a figure that shows the data on NEOVERSEV1 with 64 cores.
sgemm_v1

@martin-frbg
Copy link
Collaborator

Hmm, I don't really like the added complexity with yet another "hidden" table of parameters (I assume the V2 would need different thresholds), but what's not to like about the added performance...

@annop-w
Copy link
Contributor Author

annop-w commented Jan 20, 2025

Yes, V2 would need another set of parameters. I think the complexity is inevitable if we want optimal performance. We could try to somehow organize these tuning parameters, e.g. into files/folders. I am open to suggestions.

@martin-frbg martin-frbg added this to the 0.3.30 milestone Jan 23, 2025
@martin-frbg
Copy link
Collaborator

Merging to give it more exposure, but I think ideally the table part should be in (or closer to) param.h

@martin-frbg martin-frbg merged commit a54f9a9 into OpenMathLib:develop Jan 23, 2025
75 of 84 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants