Add support for --carry-initial-prompt #3395
base: master
Conversation
The Ruby patches are nice, though I'm not sure whether the essential changes and API will be accepted. Let me point out just one thing.
Changes applied - let me know what you think of this PR

@ggerganov could I ask you to review this PR?

Any update on this? Happy to resolve the conflict in
Hi, apologies for the long wait. I'm interested in adding this functionality, but I am having difficulty following the implemented logic for prepending the initial prompt. I would like to see this simplified in some way. I'll try to add some suggestions on how to improve it.
I think the main complexity comes from using a single `prompt_past` vector in the `whisper_state`, which results in some convoluted logic for deduplicating and slicing the tokens. I expect that the logic can become much simpler if you replace `prompt_past` with 2 vectors: `prompt_past0` and `prompt_past1`. The full prompt is a concatenation of `prompt_past0 + prompt_past1`. The `prompt_past0` can be utilized to store some static prefix - i.e. the original prompt that is being carried.
That's a good point. I tried taking it a bit further and simplifying it as much as I could - what do you think?
```cpp
std::vector<whisper_token> prompt_tokens;

// initial prompt
if (!params.prompt_tokens && params.initial_prompt) {
```
Keep the comment:

```cpp
// tokenize the initial prompt
if (!params.prompt_tokens && params.initial_prompt) {
```
src/whisper.cpp (outdated):

```cpp
prompt_past.push_back(params.prompt_tokens[i]);
if (params.carry_initial_prompt) {
    if (prompt_past0.empty()) {
        prompt_past0.insert(prompt_past0.end(), params.prompt_tokens, params.prompt_tokens + params.prompt_n_tokens);
```
Should this be simply:

```cpp
prompt_past0 = params.prompt_tokens;
```
src/whisper.cpp (outdated):

```cpp
        prompt_past0.insert(prompt_past0.end(), params.prompt_tokens, params.prompt_tokens + params.prompt_n_tokens);
    }
} else {
    prompt_past1.insert(prompt_past1.end(), params.prompt_tokens, params.prompt_tokens + params.prompt_n_tokens);
```
We still want to use the original `std::rotate` implementation for efficiency.
src/whisper.cpp (outdated):

```cpp
std::vector<beam_candidate> beam_candidates;

// main loop
bool first_history_iter = true; // track first decode iteration for carry_initial_prompt logic
```
No need to add this var - use `seek == seek_start` instead.
src/whisper.cpp (outdated):

```cpp
if (!params.carry_initial_prompt) {
    prompt_past1.clear();
    if (!prompt.empty() && prompt.front() == whisper_token_prev(ctx)) {
        auto start_it = prompt.begin() + 1;
```
No need for this intermediate var - keep the original `insert` call.
src/whisper.cpp (outdated):

```cpp
const bool have_dynamic = !prompt_past1.empty();
const bool can_carry_static = params.carry_initial_prompt && !prompt_past0.empty() && !first_history_iter;

if (have_dynamic || can_carry_static) {
```
This outer `if` can be removed.
Pushed PR fixes - let me know what you think
src/whisper.cpp (outdated):

```cpp
if (!params.carry_initial_prompt) {
    prompt_past1.clear();
    if (!prompt.empty() && prompt.front() == whisper_token_prev(ctx)) {
        prompt_past1.insert(prompt_past1.end(), prompt.begin() + 1, prompt.end() - prompt_init.size());
    }
}
```
Need to clarify why this update of `prompt_past1` is not needed when `carry_initial_prompt == true` - the reason is not obvious to me. My expectation was that `prompt_past1` would behave the same way as the old `prompt_past`, regardless of the `carry_initial_prompt` value.
Great catch - that was leftover from my earlier experimentation.
This PR brings over the `--carry-initial-prompt` flag from the Python library (openai/whisper#2343).

By default, the `--prompt` (initial prompt) is only used for the first decoding window; subsequent windows rely on the text generated so far for continuity. When you pass `--carry-initial-prompt`, the initial prompt tokens are explicitly prepended to every internal decode window. This mirrors the Python reference implementation's `carry_initial_prompt` behavior and can help enforce custom vocabulary or style throughout long transcriptions. Trade-off: it may slightly reduce the model's ability to adapt dynamically to newly generated context (and can increase the risk of repetitions if the prompt is long). If the combined size of the carried initial prompt and the rolling context exceeds half of the model's text context, the leftmost (oldest) part of the initial prompt is truncated to fit.