Make reverse prompt option act as a stop token in non-interactive sce… #1032

Merged 7 commits on May 19, 2023
4 changes: 2 additions & 2 deletions examples/common.cpp
@@ -368,8 +368,8 @@ void gpt_print_usage(int /*argc*/, char ** argv, const gpt_params & params) {
fprintf(stderr, " -ins, --instruct run in instruction mode (use with Alpaca models)\n");
fprintf(stderr, " --multiline-input allows you to write or paste multiple lines without ending each in '\\'\n");
fprintf(stderr, " -r PROMPT, --reverse-prompt PROMPT\n");
fprintf(stderr, " run in interactive mode and poll user input upon seeing PROMPT (can be\n");
fprintf(stderr, " specified more than once for multiple prompts).\n");
fprintf(stderr, " halt generation at PROMPT, return control in interactive mode\n");
fprintf(stderr, " (can be specified more than once for multiple prompts).\n");
fprintf(stderr, " --color colorise output to distinguish prompt and user input from generations\n");
fprintf(stderr, " -s SEED, --seed SEED RNG seed (default: -1, use random seed for < 0)\n");
fprintf(stderr, " -t N, --threads N number of threads to use during computation (default: %d)\n", params.n_threads);
26 changes: 18 additions & 8 deletions examples/main/main.cpp
@@ -209,8 +209,8 @@ int main(int argc, char ** argv) {
params.antiprompt.push_back("### Instruction:\n\n");
}

- // enable interactive mode if reverse prompt or interactive start is specified
- if (params.antiprompt.size() != 0 || params.interactive_first) {
+ // enable interactive mode if interactive start is specified
+ if (params.interactive_first) {
params.interactive = true;
}

@@ -306,7 +306,7 @@ int main(int argc, char ** argv) {

std::vector<llama_token> embd;

- while (n_remain != 0 || params.interactive) {
+ while ((n_remain != 0 && !is_antiprompt) || params.interactive) {
// predict
if (embd.size() > 0) {
// infinite text generation via context swapping
@@ -504,9 +504,8 @@ int main(int argc, char ** argv) {
console_set_color(con_st, CONSOLE_COLOR_DEFAULT);
}

- // in interactive mode, and not currently processing queued inputs;
- // check if we should prompt the user for more
- if (params.interactive && (int) embd_inp.size() <= n_consumed) {
+ // if not currently processing queued inputs;
+ if ((int) embd_inp.size() <= n_consumed) {

// check for reverse prompt
if (params.antiprompt.size()) {
@@ -517,10 +516,21 @@

is_antiprompt = false;
// Check if each of the reverse prompts appears at the end of the output.
+ // If we're not running interactively, the reverse prompt might be tokenized with some following characters
+ // so we'll compensate for that by widening the search window a bit.
for (std::string & antiprompt : params.antiprompt) {
- if (last_output.find(antiprompt.c_str(), last_output.length() - antiprompt.length(), antiprompt.length()) != std::string::npos) {
- is_interacting = true;
+ size_t extra_padding = params.interactive ? 0 : 2;
+ size_t search_start_pos = last_output.length() > static_cast<size_t>(antiprompt.length() + extra_padding)
+ ? last_output.length() - static_cast<size_t>(antiprompt.length() + extra_padding)
+ : 0;
+
+ if (last_output.find(antiprompt.c_str(), search_start_pos) != std::string::npos) {
+ if (params.interactive) {
+ is_interacting = true;
+ set_console_color(con_st, CONSOLE_COLOR_USER_INPUT);
+ }
@DannyDaemonic (Contributor) commented on May 11, 2023:
Nothing in this PR causes this issue. The problem this is trying to fix is the one that pops up when people end their reverse prompts with a space. If you remove the trailing spaces from your reverse prompt, it should work. There's been talk of adding a warning to the program when users include a trailing space. I'd suggest we just remove it.

Suggested change
size_t extra_padding = params.interactive ? 0 : 2;
size_t search_start_pos = last_output.length() > static_cast<size_t>(antiprompt.length() + extra_padding)
? last_output.length() - static_cast<size_t>(antiprompt.length() + extra_padding)
: 0;
if (last_output.find(antiprompt.c_str(), search_start_pos) != std::string::npos) {
if (params.interactive) {
is_interacting = true;
set_console_color(con_st, CONSOLE_COLOR_USER_INPUT);
}

@data-angel (Contributor, Author) commented on May 11, 2023:
Not done. I've found this code is necessary in several scenarios. For example, if your stop token is a ChatML marker like "<|IM_END|>", it is often tokenized with trailing characters like "\", so if you don't do a fuzzy search for the stop token you miss it in the generated stream. In the non-interactive case, it's not a big deal to have extra trailing characters in the response, as they can easily be handled by the code that's processing the response.
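
To make the tradeoff concrete, here is a minimal standalone sketch (not part of the PR) contrasting the strict end-anchored check with the widened search window from this diff. The sample output string, the ChatML marker, and the trailing backslash are assumed values chosen for illustration; the two-character padding mirrors the diff above.

```cpp
#include <iostream>
#include <string>

int main() {
    // Generated text where the stop marker came out fused with a trailing backslash.
    const std::string last_output = "Sure, here is the answer.<|IM_END|>\\";
    const std::string antiprompt  = "<|IM_END|>";
    const bool interactive = false; // the non-interactive case from the PR

    // Strict check: only matches if the output ends exactly with the antiprompt.
    const bool strict = last_output.size() >= antiprompt.size() &&
        last_output.compare(last_output.size() - antiprompt.size(),
                            antiprompt.size(), antiprompt) == 0;

    // Widened check (as in the diff): start searching a couple of characters
    // earlier so a marker followed by stray characters is still found.
    const size_t extra_padding = interactive ? 0 : 2;
    const size_t search_start_pos = last_output.length() > antiprompt.length() + extra_padding
        ? last_output.length() - (antiprompt.length() + extra_padding)
        : 0;
    const bool widened = last_output.find(antiprompt, search_start_pos) != std::string::npos;

    std::cout << "strict: " << strict << ", widened: " << widened << "\n"; // strict: 0, widened: 1
    return 0;
}
```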

@DannyDaemonic (Contributor) commented on May 11, 2023:
I think there's a proper way to do this that works in both interactive and non-interactive mode: look at the next token right when we first get it from llama_sample, before it's printed to the terminal or evaluated by llama_eval. It would be more efficient as well, because you're skipping an eval. Of course I don't expect you to do that; it's far from a trivial fix and should be its own pull request anyway.

I guess what's holding me back from approving this is that we've made everyone else adjust their reverse prompts not to include the token that gets split. So we tell people to remove the trailing space ( ) from their reverse prompt, or in your case the equivalent suggestion would be to drop the right angle bracket (>) from your reverse prompt (it may take dropping both |>).

The decision to rigidly enforce the reverse prompt is from before I became a collaborator, so as much as I agree that reverse prompts should not automatically enable interactive mode, this isn't part of that. I'm going to have to wait for someone more senior than me to approve this particular part of the PR.
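
For reference, a rough standalone sketch of the direction described above (not code from the PR or from llama.cpp): examine the freshly sampled piece before it is printed or evaluated, and stop as soon as a reverse prompt completes in the would-be output. The hard-coded vector of pieces is an assumption standing in for the sampler; in main.cpp the text would come from the sampled token, and breaking before the final piece is what lets the extra eval be skipped.

```cpp
#include <iostream>
#include <string>
#include <vector>

int main() {
    // Fake stream of token pieces standing in for the sampler's output; the
    // reverse prompt arrives split across pieces, with extra characters fused
    // onto the last one.
    const std::vector<std::string> pieces = {
        "Sure", ",", " done", ".", "\n", "###", " Instruction", ":\n\n"
    };
    const std::vector<std::string> antiprompts = { "### Instruction:" };

    std::string output;
    for (const std::string & piece : pieces) {
        // Build the would-be output *before* committing (printing/evaluating) the piece.
        const std::string candidate = output + piece;

        bool hit = false;
        for (const std::string & ap : antiprompts) {
            // Only the tail that the new piece could have completed needs checking.
            const size_t lookback = ap.size() + piece.size();
            const size_t start = candidate.size() > lookback ? candidate.size() - lookback : 0;
            if (candidate.find(ap, start) != std::string::npos) {
                hit = true;
                break;
            }
        }
        if (hit) {
            // Generation stops here; the piece that completed the reverse prompt
            // is never printed, and no further eval would be needed.
            break;
        }

        std::cout << piece;
        output = candidate;
    }
    std::cout << "\n[stopped]" << std::endl;
    return 0;
}
```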

@data-angel (Contributor, Author) commented:
I understand your perspective, but I think it would be a mistake to require arcane tokenization knowledge to have people "properly" set a stop token. Better to just set it to the string you want to look for. I agree that the more correct answer is to wire the stop token detection at a deeper level, but as you pointed out that also comes with more risk and complexity. I think this is a fine compromise and the potential for negative impact seems minimal.

A collaborator commented:
Can we split out this hunk into its own PR perhaps? It appears everything else here has consensus and could be landed. Even without this, non-interactive -r would be a net improvement and it would at worst be consistent with the current (buggy) behavior of -r.

@data-angel (Contributor, Author) commented on May 13, 2023:
Sure we could, but I feel like it's a pretty concise, contained, low-risk change as it sits. Allowing a more natural expression of the reverse prompt in the non-interactive case is, I think, far more accessible for users who aren't going to understand exactly how their prompt string is tokenized, and more maintainable for those folks down the road. Imagine a case in which your reverse prompt is expressed as "<|IM_END" in your code: you'll need a comment explaining that it's written that way as a quirk of the underlying platform, so that someone doesn't change it to the more correct-looking "<|IM_END|>" and get confused when that doesn't terminate the execution despite being clearly present in the output.

The way the code is currently written, it only changes behavior in the non-interactive case, which is pretty surgical. IMO the bigger issue with this PR is that it's a breaking change for people who are just using -r to trigger interactivity in their scripts, but I think that part is acceptable and folks seem to agree.

@DannyDaemonic - who should take a look at this to see if they're bothered by allowing reverse prompts that aren't precisely token-aligned?

is_antiprompt = true;
fflush(stdout);
break;
}
}