
Fix for issue #876 (#1012)

Merged 1 commit into ggerganov:master on Jun 25, 2023
Conversation

burningion (Contributor)

For the following issue:

#876 (comment)

GPU support won't build without changing the architecture flag from `native` to `all`. Confirmed working with a 4090.
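For reference, a minimal sketch of the kind of nvcc invocation at stake (file names are illustrative; the real build goes through the whisper.cpp Makefile):

```sh
# -arch=native asks nvcc to detect the local GPU; it fails on some
# driver/toolkit combinations:
nvcc --forward-unknown-to-host-compiler -arch=native -c ggml-cuda.cu -o ggml-cuda.o

# -arch=all embeds code for every architecture the toolkit supports,
# so it always builds, at the cost of a larger binary:
nvcc --forward-unknown-to-host-compiler -arch=all -c ggml-cuda.cu -o ggml-cuda.o
```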

@ggerganov (Owner) left a comment

Do we know that `all` does not reduce the performance compared to `native`?

@ggerganov ggerganov merged commit 207a12f into ggerganov:master Jun 25, 2023
@byte-6174 (Contributor)
It most certainly would in some sense. For example, when I compile with `all` the binary is 1.5 MB, vs. 812 KB when I give the exact architecture I am compiling for.
Specifying the exact architecture is the best way to get the most optimized code for it, since it does not embed intermediate PTX code. With `all`, nvcc generates PTX and postpones final compilation to runtime. NVIDIA notes this leads to app startup delays. It can be mitigated by caching, however.
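The startup delay comes from JIT-compiling the embedded PTX for the local GPU on first launch. A sketch of the documented NVIDIA environment variables that control the JIT cache (defaults may vary by CUDA version):

```sh
# JIT-compiled PTX is cached on disk, so only the first run pays the
# startup penalty:
export CUDA_CACHE_PATH="$HOME/.nv/ComputeCache"  # default cache location on Linux
export CUDA_CACHE_MAXSIZE=1073741824             # allow the cache to grow to 1 GiB
# CUDA_CACHE_DISABLE=1 turns the cache off and forces a JIT on every run.
```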

@byte-6174 (Contributor)
For example, I just ran a binary; the load time for a model (models/ggml-base.en.bin) was:

-gpu-architecture=all:
whisper_print_timings: load time = 4452.92 ms

vs
-gpu-architecture=sm_72:
whisper_print_timings: load time = 224.77 ms
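As a side note for anyone reproducing this comparison: on a recent NVIDIA driver, the matching `sm_XX` value for the local GPU can be queried like so (a sketch; the `compute_cap` query field requires a newer nvidia-smi):

```sh
# Prints the local GPU's compute capability, e.g. "7.2" -> -arch=sm_72,
# or "8.9" (RTX 4090) -> -arch=sm_89:
nvidia-smi --query-gpu=compute_cap --format=csv,noheader
```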

@ggerganov (Owner)

So, is it worth switching to `all`?
Maybe keep `native` by default and only use `all` when a certain build option is set.

@byte-6174 (Contributor)

Yes, CUDA optimizations are highly specific to the hardware. `all` will allow compilation without errors, but produces a suboptimal binary,
whereas `native` only works for some architecture/CUDA version combinations, but when it works it produces the best binary 😄

In that regard, I like the above suggestion. However, I'm reading in other comments that for some arch/CUDA combos neither works :( and one has to specify the exact architecture, like sm_72 etc.
So the Makefile needs some fine-tuning when it comes to CUDA. A sketch of the default-with-override idea is below.
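A minimal sketch of that idea, relying on the standard make rule that command-line variable assignments override the Makefile's own definitions (`CUDA_ARCH` is a hypothetical name, not an existing whisper.cpp option; `WHISPER_CUBLAS` is the CUDA build flag assumed here):

```sh
#!/bin/sh
# Default to native for the best binary; override with CUDA_ARCH=all
# or CUDA_ARCH=sm_XX on toolchains where native fails.
CUDA_ARCH="${CUDA_ARCH:-native}"

# Command-line variables override the Makefile's own NVCCFLAGS:
make WHISPER_CUBLAS=1 \
     NVCCFLAGS="--forward-unknown-to-host-compiler -arch=${CUDA_ARCH}"
```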

@FerLuisxd commented Jul 8, 2023

It doesn't work on a 2080 with CUDA 12.1.
See #1082
