
whisper : fix usage of external encoders (e.g. CoreML) #1859

Closed
wants to merge 2 commits

Conversation

ggerganov
Owner

# download base.en
./models/download-ggml-model.sh base.en

# create CoreML model
./models/generate-coreml-model.sh base.en

# build with CoreML support and run on a sample
WHISPER_COREML=1 make samples
WHISPER_COREML=1 make -j && ./main -m ./models/ggml-base.en.bin -f samples/gb0.wav

whisper.cpp Outdated
Comment on lines 1672 to 1674
// TODO: without this op, the "embd_enc" tensor ends up being not allocated
// is there a better fix?
cur = ggml_scale(ctx0, cur, 1.0f);
Collaborator

Only the tensors used in the nodes of the graph are allocated, but I don't think this is new. Did it work before alloc v3?

Owner Author

I guess before it worked because of the explicit ggml_allocr_alloc calls

whisper.cpp Outdated
Comment on lines 1662 to 1665
// keep the "mel" tensor alive - we will use it to store the input data for the external encoders
// TODO: is there a better way to do this
mel = ggml_scale(ctx0, mel, 1.0f);
ggml_build_forward_expand(gf, mel);
Owner Author

@slaren Do you have suggestions how to improve this? Without using an explicit op (such as ggml_scale), the input mel and embd_enc tensors will not get allocated by the allocator

The alternative is to not use ggml tensors at all when using an external encoder, which makes sense overall. But before I reimplement it, I was wondering if there is a neat way to improve on this

Collaborator

As a workaround, you can add this directly to the graph nodes as a GGML_OP_NONE, i.e. gf->nodes[gf->n_nodes++] = mel. At least this will avoid wasting compute on a useless op.

The problem, as mentioned above, is that only the nodes of the graph are considered in ggml-alloc. I have thought of making all the leafs in the graph automatically inputs - this would ensure that they are allocated at the beginning of the graph, and since they are never used they will never be freed, so they will also be suitable as an output or static tensor. My reasoning is that the only reason to use ggml_new_tensor on a graph is to create an input tensor, since there are better ways to do everything else that in the past required creating new tensors (i.e. use ggml_cont and ggml_cast instead of ggml_cpy with a new tensor). But I am worried that it will waste memory in code that is not written very carefully. What do you think?

Collaborator

@slaren slaren Feb 12, 2024

Thinking about this more, I am inclined to favor ease of use over a minor memory usage optimization. It can be very surprising that a tensor added to the graph ends up not being allocated, and figuring out the reason requires knowledge of the internals of ggml-alloc. So I think it would be a good idea to simply allocate all the leafs at the beginning of the graphs as if they were inputs. If this causes increased memory usage, it is always possible to optimize by removing the calls to ggml_new_tensor from the graph, but it will avoid surprises in other cases.

Owner Author

If it is not a big change, we should probably do it. I'm trying to rework the implementation to not rely on leaf tensors, but I think it would be helpful to have these always allocated - mostly for ease of use

Owner Author

Will give this a try now

@ggerganov
Owner Author

superseded by #1860

@ggerganov ggerganov closed this Feb 12, 2024