
whisper : fix usage of external encoders (e.g. CoreML) #1859

Closed
wants to merge 2 commits

Conversation

ggerganov
Owner

# download base.en
./models/download-ggml-model.sh base.en

# create CoreML model
./models/generate-coreml-model.sh base.en

# build with CoreML support and run on a sample
WHISPER_COREML=1 make samples
WHISPER_COREML=1 make -j && ./main -m ./models/ggml-base.en.bin -f samples/gb0.wav

whisper.cpp Outdated
Comment on lines 1672 to 1674
// TODO: without this op, the "embd_enc" tensor ends up being not allocated
// is there a better fix?
cur = ggml_scale(ctx0, cur, 1.0f);
Collaborator

Only the tensors used in the nodes of the graph are allocated, but I don't think this is new. Did it work before alloc v3?

Owner Author

I guess before it worked because of the explicit ggml_allocr_alloc calls

whisper.cpp Outdated
Comment on lines 1662 to 1665
// keep the "mel" tensor alive - we will use it to store the input data for the external encoders
// TODO: is there a better way to do this
mel = ggml_scale(ctx0, mel, 1.0f);
ggml_build_forward_expand(gf, mel);
Owner Author

@slaren Do you have suggestions how to improve this? Without using an explicit op (such as ggml_scale), the input mel and embd_enc tensors will not get allocated by the allocator

The alternative is to not use ggml tensors at all when using an external encoder, which makes sense overall. But before I reimplement it, I was wondering if there is a neat way to improve on this

Collaborator

As a workaround, you can add this directly to the graph nodes as a GGML_OP_NONE, i.e. gf->nodes[gf->n_nodes++] = mel. At least this will avoid wasting compute on a useless op.

The problem, as mentioned above, is that only the nodes of the graph are considered in ggml-alloc. I have thought of making all the leafs in the graph automatically inputs - this would ensure that they are allocated at the beginning of the graph, and since they are never used they will never be freed, so they will also be suitable as an output or static tensor. My reasoning is that the only reason to use ggml_new_tensor on a graph is to create an input tensor, since there are better ways to do everything else that in the past required creating new tensors (i.e. use ggml_cont and ggml_cast instead of ggml_cpy with a new tensor). But I am worried that it will waste memory in code that is not written very carefully. What do you think?

Collaborator

@slaren slaren Feb 12, 2024

Thinking about this more, I am inclined to favor ease of use over a minor memory usage optimization. It can be very surprising that a tensor added to the graph ends up not being allocated, and figuring out the reason requires knowledge of the internals of ggml-alloc. So I think it would be a good idea to simply allocate all the leafs at the beginning of the graphs as if they were inputs. If this causes increased memory usage, it is always possible to optimize by removing the calls to ggml_new_tensor from the graph, but it will avoid surprises in other cases.

Owner Author

If it is not a big change, we should probably do it. I'm trying to rework the implementation to not rely on leaf tensors, but I think it would be helpful to have these always allocated - mostly for ease of use

Owner Author

Will give this a try now

@ggerganov
Owner Author

superseded by #1860

@ggerganov ggerganov closed this Feb 12, 2024