
Conversation

@alubbe (Contributor) commented Aug 28, 2025

This PR brings over the --carry-initial-prompt flag from the Python library (openai/whisper#2343).

By default, a --prompt (initial prompt) is only used for the first decoding window; subsequent windows rely on the text generated so far for continuity. When you pass --carry-initial-prompt, the initial prompt tokens are explicitly prepended to every internal decode window. This mirrors the Python reference implementation's carry_initial_prompt behavior and can help enforce custom vocabulary or style throughout long transcriptions. Trade-off: it may slightly reduce the model's ability to adapt dynamically to newly generated context (and can increase the risk of repetitions if the prompt is long). If the combined size of the carried initial prompt and the rolling context exceeds half the model's text context, the leftmost (oldest) part of the initial prompt is truncated to fit.
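The windowing rule above can be sketched roughly as follows; the function name, signature, and token type are illustrative, not whisper.cpp's actual API:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical sketch of the truncation rule: the combined prompt must fit
// within half the text context (n_ctx / 2). The rolling context is kept in
// full (up to the budget); if the carried initial prompt no longer fits,
// its leftmost (oldest) tokens are dropped.
std::vector<int> build_window_prompt(const std::vector<int> & initial_prompt,
                                     const std::vector<int> & rolling_context,
                                     std::size_t n_ctx) {
    const std::size_t n_max     = n_ctx / 2;
    const std::size_t n_rolling = std::min(rolling_context.size(), n_max);
    const std::size_t n_initial = std::min(initial_prompt.size(), n_max - n_rolling);

    std::vector<int> prompt;
    prompt.reserve(n_initial + n_rolling);
    // keep only the rightmost n_initial tokens of the initial prompt
    prompt.insert(prompt.end(), initial_prompt.end() - n_initial, initial_prompt.end());
    // then the newest n_rolling tokens of the rolling context
    prompt.insert(prompt.end(), rolling_context.end() - n_rolling, rolling_context.end());
    return prompt;
}
```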

@KitaitiMakoto (Collaborator) left a comment

The Ruby patches are nice, though I'm not sure whether the core changes and API will be accepted. Let me point out just one thing.

@alubbe (Contributor, Author) commented Sep 8, 2025

Changes applied - let me know what you think of this PR

@alubbe alubbe requested a review from KitaitiMakoto September 8, 2025 11:02
@alubbe (Contributor, Author) commented Sep 25, 2025

@ggerganov could I ask you to review this PR?

@alubbe (Contributor, Author) commented Oct 7, 2025

Any update on this? Happy to resolve the conflict in cli.cpp if I can get a general sense of whether you're interested in merging this functionality. We've been using this branch for weeks now and have observed much improved transcription from whisper for unusual names (people, places, companies, etc.) thanks to carrying the initial prompt.

@ggerganov (Member) commented:

Hi, apologies for the long wait. I'm interested in adding this functionality, but I am having difficulty following the implemented logic for prepending the initial prompt. I'd like to see it simplified in some way. I'll try to add some suggestions for how to improve it.

@ggerganov (Member) left a comment

I think the main complexity comes from using a single prompt_past vector in the whisper_state which results in some convoluted logic for deduplicating and slicing the tokens.

I expect the logic can become much simpler if you replace prompt_past with two vectors: prompt_past0 and prompt_past1. The full prompt is the concatenation prompt_past0 + prompt_past1. prompt_past0 can be used to store a static prefix, i.e. the original prompt being carried.
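Schematically, the proposed split might look like this (a sketch with illustrative names, not the actual patch):

```cpp
#include <vector>

// prompt_past0: static prefix (the carried initial prompt), written once;
// prompt_past1: rolling context, updated after every decode window.
// The full decode prompt is simply their concatenation.
std::vector<int> full_prompt(const std::vector<int> & prompt_past0,
                             const std::vector<int> & prompt_past1) {
    std::vector<int> prompt;
    prompt.reserve(prompt_past0.size() + prompt_past1.size());
    prompt.insert(prompt.end(), prompt_past0.begin(), prompt_past0.end());
    prompt.insert(prompt.end(), prompt_past1.begin(), prompt_past1.end());
    return prompt;
}
```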

@alubbe (Contributor, Author) commented Oct 8, 2025

That's a good point. I tried taking it a bit further and simplifying it as much as I could - what do you think?

std::vector<whisper_token> prompt_tokens;

// initial prompt
if (!params.prompt_tokens && params.initial_prompt) {
Member:

Keep the comment:

Suggested change
if (!params.prompt_tokens && params.initial_prompt) {
// tokenize the initial prompt
if (!params.prompt_tokens && params.initial_prompt) {

src/whisper.cpp Outdated
prompt_past.push_back(params.prompt_tokens[i]);
if (params.carry_initial_prompt) {
if (prompt_past0.empty()) {
prompt_past0.insert(prompt_past0.end(), params.prompt_tokens, params.prompt_tokens + params.prompt_n_tokens);
Member:

Should this be simply:

Suggested change
prompt_past0.insert(prompt_past0.end(), params.prompt_tokens, params.prompt_tokens + params.prompt_n_tokens);
prompt_past0 = params.prompt_tokens;

src/whisper.cpp Outdated
prompt_past0.insert(prompt_past0.end(), params.prompt_tokens, params.prompt_tokens + params.prompt_n_tokens);
}
} else {
prompt_past1.insert(prompt_past1.end(), params.prompt_tokens, params.prompt_tokens + params.prompt_n_tokens);
Member:

We still want to use the original std::rotate implementation for efficiency.
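For context, the idiom being referred to appends the new tokens and then rotates them to the front in place, avoiding a temporary vector; a minimal sketch (illustrative, not the actual whisper.cpp code):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Prepend `tokens` to `past` in place: push them onto the end, then
// std::rotate the appended run to the front. No temporary buffer needed.
void prepend_tokens(std::vector<int> & past, const std::vector<int> & tokens) {
    past.insert(past.end(), tokens.begin(), tokens.end());
    std::rotate(past.begin(),
                past.end() - static_cast<std::ptrdiff_t>(tokens.size()),
                past.end());
}
```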

src/whisper.cpp Outdated
std::vector<beam_candidate> beam_candidates;

// main loop
bool first_history_iter = true; // track first decode iteration for carry_initial_prompt logic
Member:

No need to add this var - use seek == seek_start instead.

src/whisper.cpp Outdated
if (!params.carry_initial_prompt) {
prompt_past1.clear();
if (!prompt.empty() && prompt.front() == whisper_token_prev(ctx)) {
auto start_it = prompt.begin() + 1;
Member:

No need for this intermediate var - keep the original insert call

src/whisper.cpp Outdated
const bool have_dynamic = !prompt_past1.empty();
const bool can_carry_static = params.carry_initial_prompt && !prompt_past0.empty() && !first_history_iter;

if (have_dynamic || can_carry_static) {
Member:

This outer if can be removed.

@alubbe (Contributor, Author) commented Oct 8, 2025

Pushed PR fixes - let me know what you think

src/whisper.cpp Outdated
Comment on lines 7585 to 7590
if (!params.carry_initial_prompt) {
prompt_past1.clear();
if (!prompt.empty() && prompt.front() == whisper_token_prev(ctx)) {
prompt_past1.insert(prompt_past1.end(), prompt.begin() + 1, prompt.end() - prompt_init.size());
}
}
Member:

Need to clarify why this update of prompt_past1 is not needed when carry_initial_prompt == true - the reason is not obvious to me.

My expectation was that prompt_past1 would behave the same way as the old prompt_past, regardless of the carry_initial_prompt value.

Contributor (Author):

Great catch, that was leftover from my earlier experimentation
