Making check for output match in original types. It saves some memory. #135

Merged
merged 1 commit into from
Aug 14, 2024

Conversation

maleksan85

For cards with little memory and models with large GEMMs, there might not be enough extra memory to convert outputs from fp16 to f32, for instance. Every conversion happens as a copy, so it triggers OOM on Navi. This change fixes the problem by checking the output match in the original types.
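To give a rough sense of the memory involved, here is a back-of-the-envelope sketch. It is not code from this PR: the GEMM output shape, the assumption that two outputs are compared, and the helper names are all hypothetical, chosen only to illustrate why an fp16 to f32 copy can be costly.

```python
# Bytes per element for the dtypes mentioned in the description.
DTYPE_BYTES = {"fp16": 2, "f32": 4}


def resident_bytes(rows, cols, num_outputs=2, dtype="fp16"):
    """Memory already held by the outputs in their original dtype."""
    return rows * cols * DTYPE_BYTES[dtype] * num_outputs


def upcast_copy_bytes(rows, cols, num_outputs=2, target="f32"):
    """Extra memory allocated if every output is copied into `target`."""
    return rows * cols * DTYPE_BYTES[target] * num_outputs


# Hypothetical output shape of a large GEMM (an assumption, not from the PR).
rows, cols = 8192, 28672

held = resident_bytes(rows, cols)
extra = upcast_copy_bytes(rows, cols)

mib = 1024 ** 2
print(f"outputs held in fp16: {held // mib} MiB")
print(f"extra for f32 copies: {extra // mib} MiB")
```

Under these assumptions the f32 copies need twice the memory of the fp16 outputs they are made from, on top of what is already allocated. Comparing in the original dtype avoids that extra allocation.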

@maleksan85 maleksan85 requested a review from gshtras August 14, 2024 18:06
@maleksan85 maleksan85 self-assigned this Aug 14, 2024
@maleksan85 maleksan85 merged commit 4132cbe into main Aug 14, 2024
13 checks passed
@maleksan85 maleksan85 deleted the gemm_tunner_memory_usage_umprovement branch August 16, 2024 00:10