whisper : calculate mel spectrogram directly into a ggml_tensor #2208

iboB · 2024-06-04T08:05:39Z

When calculating the mel spectrogram, write the result directly into a ggml_tensor.

On CUDA, this saves a device-to-host cudaMemcpy of the entire spectrogram, which further improves CUDA mel spectrogram performance by a factor of about 2.

In most cases with CUDA, this computation now takes about 1 ms (as opposed to 2 ms).
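To illustrate the idea in the description, here is a minimal host-only sketch of the two shapes of the code: filling a temporary buffer and copying it into the destination, versus writing straight into the destination. Everything in it is an assumption made for illustration: `toy_tensor`, `make_tensor`, `fill_mel`, `mel_via_temp` and `mel_direct` are hypothetical names, not part of ggml or whisper.cpp, and the plain `memcpy` only stands in for the device-to-host transfer that the PR removes on CUDA.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a tensor's backing buffer; this is NOT ggml_tensor.
struct toy_tensor {
    int ne0;                 // number of frames
    int ne1;                 // number of mel bins
    std::vector<float> data; // ne0 * ne1 floats
};

static toy_tensor make_tensor(int n_len, int n_mel) {
    return toy_tensor{n_len, n_mel,
                      std::vector<float>(static_cast<std::size_t>(n_len) * n_mel)};
}

// Placeholder for the real mel computation: writes a deterministic pattern.
static void fill_mel(float * out, int n_len, int n_mel) {
    for (int j = 0; j < n_mel; ++j) {
        for (int i = 0; i < n_len; ++i) {
            out[static_cast<std::size_t>(j) * n_len + i] = 0.5f * j + 0.25f * i;
        }
    }
}

// Old shape: compute into a temporary buffer, then copy into the tensor.
// Returns the number of bytes moved by the extra copy.
static std::size_t mel_via_temp(toy_tensor & dst) {
    std::vector<float> tmp(dst.data.size());
    fill_mel(tmp.data(), dst.ne0, dst.ne1);
    const std::size_t n_bytes = tmp.size() * sizeof(float);
    std::memcpy(dst.data.data(), tmp.data(), n_bytes);
    return n_bytes;
}

// New shape: compute directly into the tensor's storage, with no extra copy.
static std::size_t mel_direct(toy_tensor & dst) {
    fill_mel(dst.data.data(), dst.ne0, dst.ne1);
    return 0;
}
```

Both functions leave the same values in the destination; the only difference is the extra whole-spectrogram copy in the first one, which is the step the description reports as costing roughly half of the CUDA mel time.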

ggerganov · 2024-06-04T12:55:59Z

whisper.cpp

+    if (mel_only) {
+        ggml_set_output(mel);
+        ggml_build_forward_expand(gf, mel);
+        ggml_free(ctx0);
+        return gf;
+    }


This breaks the external encoder logic because the wstate.embd_enc tensor is not initialized.

I think mel_only is not needed at all:

diff --git a/whisper.cpp b/whisper.cpp
index b8fa77d..e0e6719 100644
--- a/whisper.cpp
+++ b/whisper.cpp
@@ -1815,8 +1815,7 @@ static bool whisper_encode_external(const whisper_state & wstate) {
 static struct ggml_cgraph * whisper_build_graph_conv(
         whisper_context & wctx,
         whisper_state & wstate,
-        const int mel_offset,
-        bool mel_only) {
+        const int mel_offset) {
     const auto & model = wctx.model;
     const auto & hparams = model.hparams;
 
@@ -1861,13 +1860,6 @@ static struct ggml_cgraph * whisper_build_graph_conv(
         mel = ggml_new_tensor_2d(ctx0, GGML_TYPE_F32, 2 * n_ctx, n_mels);
     }
 
-    if (mel_only) {
-        ggml_set_output(mel);
-        ggml_build_forward_expand(gf, mel);
-        ggml_free(ctx0);
-        return gf;
-    }
-
     struct ggml_tensor * cur = nullptr;
 
     if (!whisper_encode_external(wstate)) {
@@ -2248,9 +2240,7 @@ static bool whisper_encode_internal(
     {
         auto & alloc = wstate.alloc_conv.alloc;
 
-        bool encode_external = whisper_encode_external(wstate);
-
-        ggml_cgraph * gf = whisper_build_graph_conv(wctx, wstate, mel_offset, encode_external);
+        ggml_cgraph * gf = whisper_build_graph_conv(wctx, wstate, mel_offset);
 
         if (!ggml_gallocr_alloc_graph(alloc, gf)) {
             // should never happen as we pre-allocate the memory
@@ -2261,7 +2251,7 @@ static bool whisper_encode_internal(
             return false;
         }
 
-        if (encode_external) {
+        if (whisper_encode_external(wstate)) {
             ggml_tensor * mel = gf->nodes[gf->n_nodes - 1];
             assert(mel->ne[1] == wctx.model.hparams.n_mels);
             GGML_UNUSED(mel);
@@ -3427,7 +3417,7 @@ struct whisper_state * whisper_init_state(whisper_context * ctx) {
     {
         bool ok = whisper_allocr_graph_init(state->alloc_conv, ctx->backend,
                 [&]() {
-                    return whisper_build_graph_conv(*ctx, *state, 0, false);
+                    return whisper_build_graph_conv(*ctx, *state, 0);
                 });
 
         if (!ok) {

I think it wasn't initialized anyway. In the previous code, the conv graph was never executed. It was only used to allocate the "mel" tensor, which was manually filled from the std::vector data and then propagated to the external encoders:

whisper.cpp/whisper.cpp

Lines 2255 to 2264 in ffef323

if (!whisper_encode_external(wstate)) {
    if (!ggml_graph_compute_helper(wstate.backend, gf, n_threads)) {
        return false;
    }
} else {
#if defined(WHISPER_USE_COREML)
    whisper_coreml_encode(wstate.ctx_coreml, mel->ne[0], mel->ne[1], (float *) mel->data, (float *) wstate.embd_enc->data);
#elif defined(WHISPER_USE_OPENVINO)
    whisper_openvino_encode(wstate.ctx_openvino, mel, wstate.embd_enc);
#endif

With the PR, there is no data vector. Instead, mel_only is used to "break" the graph after the mel tensor is correctly constructed (as a view of the input mels tensor, which is either padded or made contiguous). The graph is then executed only to produce this tensor, which is then propagated to the external encoders.

Oh, wait. No. I got it :)

Will fix

fix was pushed

but yes, the key part is that now the graph must be executed in order to produce mel
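The point above can be sketched with a toy model of a deferred graph. This is only an illustration under stated assumptions: `toy_graph`, `toy_state` and `build_graph_conv` are hypothetical stand-ins, not the ggml or whisper.cpp API, and the "conv" step is a placeholder. What it shows is that building the graph does not produce `mel`; with an external encoder the graph ends right after the step that materialises `mel`, and that step still has to run before the encoder can read it.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Toy stand-in, NOT the ggml API: a graph is an ordered list of deferred steps.
struct toy_graph {
    std::vector<std::function<void()>> steps;
    void compute() {
        for (auto & step : steps) {
            step();
        }
    }
};

// Hypothetical state: the names are illustrative only.
struct toy_state {
    std::vector<float> mel_input; // output of the mel calculation
    std::vector<float> mel;       // padded tensor handed to the encoder
    std::vector<float> conv_out;  // only produced on the built-in encoder path
};

// Build (but do not run) the graph. With an external encoder the graph stops
// after the step that produces `mel`.
static toy_graph build_graph_conv(toy_state & st, std::size_t n_padded, bool external_encoder) {
    toy_graph gf;

    // Step 1: materialise `mel` as a zero-padded copy of the input.
    gf.steps.push_back([&st, n_padded]() {
        st.mel.assign(n_padded, 0.0f);
        for (std::size_t i = 0; i < st.mel_input.size() && i < n_padded; ++i) {
            st.mel[i] = st.mel_input[i];
        }
    });

    if (external_encoder) {
        return gf; // the external encoder consumes `mel` after compute()
    }

    // Step 2: placeholder for the convolution stage of the built-in encoder.
    gf.steps.push_back([&st]() {
        st.conv_out.resize(st.mel.size());
        for (std::size_t i = 0; i < st.mel.size(); ++i) {
            st.conv_out[i] = 2.0f * st.mel[i];
        }
    });

    return gf;
}
```

In this sketch, reading `st.mel` before calling `compute()` yields an empty buffer, which mirrors why the external-encoder path now has to execute the graph rather than only allocate it.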

ggerganov

Will push a follow-up PR with a few style updates and some fixes to the use of the backend instances in whisper_context and whisper_state, unrelated to the changes here.

…ganov#2208)
* whisper : calculate mel spectrogram directly into a ggml_tensor
* whisper : remove unused temp buffer from state
* whisper : fix not initializing wstate.embd_enc

iboB added 2 commits June 4, 2024 10:56

whisper : calculate mel spectrogram directly into a ggml_tensor

8081378

whisper : remove unused temp buffer from state

4eebca5

ggerganov reviewed Jun 4, 2024


whisper : fix not initializing wstate.embd_enc

07d5108

ggerganov approved these changes Jun 6, 2024


ggerganov merged commit f842d31 into ggerganov:master Jun 6, 2024
49 checks passed
