kernel optimized for A100 #25
Comments
Hi @lisuying214, thanks for your question! The kernels in Atom are specifically optimized for Ada GPUs. Their performance on A100 will degrade considerably because of the A100's poor CUDA core throughput. I expect that optimizing the dequantization step in the Atom kernel will be crucial for A100 performance. Please refer to this recent work for A100 evaluations: https://arxiv.org/pdf/2405.04532
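To make the dequantization cost concrete, here is a rough sketch of the per-element work involved in unpacking 4-bit values and rescaling them. This is not Atom's actual kernel: the function name `dequantize_int4`, the low-nibble-first packing, and the per-group scale and zero-point layout are all assumptions made for illustration. The point is only that every element needs a few integer and floating-point operations before the matrix multiply, which is presumably the kind of work the reply above expects to be slow on A100.

```python
def dequantize_int4(packed, scales, zero_points, group_size):
    """Unpack two 4-bit values per byte and map them back to floats.

    Hypothetical layout (an assumption, not Atom's format): the low nibble
    comes first, and each group of `group_size` values shares one scale
    and one zero point.
    """
    values = []
    for byte in packed:
        values.append(byte & 0x0F)         # low nibble
        values.append((byte >> 4) & 0x0F)  # high nibble

    out = []
    for i, q in enumerate(values):
        g = i // group_size                # index of the quantization group
        out.append((q - zero_points[g]) * scales[g])
    return out


# Two bytes hold four 4-bit values: 1, 2, 3, 4.
packed = bytes([0x21, 0x43])
result = dequantize_int4(packed, scales=[0.5, 2.0], zero_points=[0, 2], group_size=2)
print(result)
```

In a real GPU kernel the same steps would run per thread on packed registers, so the shift, mask, subtract, and multiply are paid for every weight or activation that is loaded.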
@happierpig
As ref in
Thank you for the great work and experiments. We want to test the throughput on A100 with batch size = 16. Have you tried a kernel optimized for A100, or is there something I can refer to?