-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
whisper : calculate mel spectrogram directly into a ggml_tensor #2208
Conversation
whisper.cpp
Outdated
if (mel_only) { | ||
ggml_set_output(mel); | ||
ggml_build_forward_expand(gf, mel); | ||
ggml_free(ctx0); | ||
return gf; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This breaks the external encoder logic because the wstate.embd_enc
tensor is not initialized.
I think mel_only
is not needed at all:
diff --git a/whisper.cpp b/whisper.cpp
index b8fa77d..e0e6719 100644
--- a/whisper.cpp
+++ b/whisper.cpp
@@ -1815,8 +1815,7 @@ static bool whisper_encode_external(const whisper_state & wstate) {
static struct ggml_cgraph * whisper_build_graph_conv(
whisper_context & wctx,
whisper_state & wstate,
- const int mel_offset,
- bool mel_only) {
+ const int mel_offset) {
const auto & model = wctx.model;
const auto & hparams = model.hparams;
@@ -1861,13 +1860,6 @@ static struct ggml_cgraph * whisper_build_graph_conv(
mel = ggml_new_tensor_2d(ctx0, GGML_TYPE_F32, 2 * n_ctx, n_mels);
}
- if (mel_only) {
- ggml_set_output(mel);
- ggml_build_forward_expand(gf, mel);
- ggml_free(ctx0);
- return gf;
- }
-
struct ggml_tensor * cur = nullptr;
if (!whisper_encode_external(wstate)) {
@@ -2248,9 +2240,7 @@ static bool whisper_encode_internal(
{
auto & alloc = wstate.alloc_conv.alloc;
- bool encode_external = whisper_encode_external(wstate);
-
- ggml_cgraph * gf = whisper_build_graph_conv(wctx, wstate, mel_offset, encode_external);
+ ggml_cgraph * gf = whisper_build_graph_conv(wctx, wstate, mel_offset);
if (!ggml_gallocr_alloc_graph(alloc, gf)) {
// should never happen as we pre-allocate the memory
@@ -2261,7 +2251,7 @@ static bool whisper_encode_internal(
return false;
}
- if (encode_external) {
+ if (whisper_encode_external(wstate)) {
ggml_tensor * mel = gf->nodes[gf->n_nodes - 1];
assert(mel->ne[1] == wctx.model.hparams.n_mels);
GGML_UNUSED(mel);
@@ -3427,7 +3417,7 @@ struct whisper_state * whisper_init_state(whisper_context * ctx) {
{
bool ok = whisper_allocr_graph_init(state->alloc_conv, ctx->backend,
[&]() {
- return whisper_build_graph_conv(*ctx, *state, 0, false);
+ return whisper_build_graph_conv(*ctx, *state, 0);
});
if (!ok) {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it wasn't initialized anyway. In the previous code the conv graph was never executed. It was only used to allocate the "mel"
tensor which was manually filled from the std::vector data
then propagated to the external encoders:
Lines 2255 to 2264 in ffef323
if (!whisper_encode_external(wstate)) { | |
if (!ggml_graph_compute_helper(wstate.backend, gf, n_threads)) { | |
return false; | |
} | |
} else { | |
#if defined(WHISPER_USE_COREML) | |
whisper_coreml_encode(wstate.ctx_coreml, mel->ne[0], mel->ne[1], (float *) mel->data, (float *) wstate.embd_enc->data); | |
#elif defined(WHISPER_USE_OPENVINO) | |
whisper_openvino_encode(wstate.ctx_openvino, mel, wstate.embd_enc); | |
#endif |
With the PR there is no data
vector. Instead mel_only
is used to "break" the graph after the mel tensor correctly constructed (as a view of the input mels tensor which is either padded or made contiguous). Then the graph is executed to only produce this vector, which is then propagated to the external encoders.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, wait. No. I got it :)
Will fix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fix was pushed
but yes, the key part is that now the graph must be executed in order to produce mel
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will push a follow-up PR with a few style updates + some fixes of using the backend
instances in whisper_context
and whisper_state
, not related to the changes here
…ganov#2208) * whisper : calculate mel spectrogram directly into a ggml_tensor * whisper : remove unused temp buffer from state * whisper : fix not initializing wstate.embd_enc
…ganov#2208) * whisper : calculate mel spectrogram directly into a ggml_tensor * whisper : remove unused temp buffer from state * whisper : fix not initializing wstate.embd_enc
When calculating the mel spectrogram directly write the result into a
ggml_tensor
.On CUDA this saves a device-to-host
cudaMemcpy
for the entire spectrogram which further improves the CUDA mel spectrogram perf by a factor of about 2.In most cases with CUDA this computation now takes about 1ms (as opposed to 2ms)