whisper : fix usage of external encoders (e.g. CoreML) #1859
Conversation
ggerganov
commented
Feb 12, 2024
whisper.cpp
Outdated
// TODO: without this op, the "embd_enc" tensor ends up being not allocated
// is there a better fix?
cur = ggml_scale(ctx0, cur, 1.0f);
Only the tensors used in the nodes of the graph are allocated, but I don't think this is new. Did it work before alloc v3?
I guess before it worked because of the explicit ggml_allocr_alloc calls
whisper.cpp
Outdated
// keep the "mel" tensor alive - we will use it to store the input data for the external encoders
// TODO: is there a better way to do this
mel = ggml_scale(ctx0, mel, 1.0f);
ggml_build_forward_expand(gf, mel);
@slaren Do you have suggestions how to improve this? Without using an explicit op (such as ggml_scale), the input mel and embd_enc tensors will not get allocated by the allocator.
The alternative is to not use ggml tensors at all when using an external encoder, which makes sense overall. But before I reimplement it, I was wondering if there is a neat way to improve on this
As a workaround, you can add this directly to the graph nodes as a GGML_OP_NONE, ie gf->nodes[gf->n_nodes++] = mel. At least, this will avoid wasting compute on a useless op.
The problem, as mentioned above, is that only the nodes of the graph are considered in ggml-alloc. I have thought of making all the leafs in the graph automatically inputs - this would ensure that they are allocated at the beginning of the graph, and if they are never used they will never be freed, so they will also be suitable as an output or static tensor. My reasoning is that the only reason to use ggml_new_tensor on a graph is to create an input tensor, since there are better ways to do everything else that in the past required creating new tensors (ie. use ggml_cont and ggml_cast instead of ggml_cpy with a new tensor). But I am worried that it will waste memory in code that is not written very carefully. What do you think?
Thinking about this more, I am inclined to favor ease of use over a minor memory usage optimization. It can be very surprising that a tensor added to the graph ends up not being allocated, and figuring out the reason requires knowledge of the internals of ggml-alloc. So I think it would be a good idea to simply allocate all the leafs at the beginning of the graphs as if they were inputs. If this causes increased memory usage, it is always possible to optimize this by removing the calls to ggml_new_tensor from the graph, but it will avoid surprises in other cases.
If it is not a big change, we should probably do it. I'm trying to rework the implementation to not rely on leaf tensors, but I think it would be helpful to have these always allocated - mostly for ease of use
Will give this a try now
superseded by #1860