
fix: better "generate more" functionality #891

Open · wants to merge 6 commits into dev

Conversation

@codedealer (Contributor) commented Apr 7, 2024

Reasoning:
Empirically, continuation of already generated text is more consistent when the latest message is supplied as is. With that in mind, this PR removes {{post}} from the template when a request is passed with kind continue.

Update
Our goal here is not to manipulate {{post}} per se but to cut the prompt as close to the last response in history as possible to let the model continue the generation.

New algorithm:

  1. Manipulate the AST so that the history is the last node (if present)
  2. When rendering complex nodes (conditionals, iterables), cut strictly after the message has been rendered

That means that things like ujb and whatever was written in validated/unvalidated presets after the history (like modifiers) will be erased, and that can affect the generation results. However, I have tested with different prompt formats (albeit at sub-1 temperatures) and this approach beats the state of the art every time.
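
A rough sketch of step 1, assuming a simplified PNode shape and an invented isHistoryNode helper (not the real parser types, just an illustration):

type PNode = string | { kind: string; children?: PNode[] }

// Hypothetical helper: detects the {{history}} placeholder node.
const isHistoryNode = (node: PNode): boolean =>
  typeof node !== 'string' && node.kind === 'history'

// Drop everything after the history node so the prompt ends exactly where
// the last response in the chat history ends.
function truncateAfterHistory(ast: PNode[]): PNode[] {
  const idx = ast.findIndex(isHistoryNode)
  return idx === -1 ? ast : ast.slice(0, idx + 1)
}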

Known issues:
Currently, concatenating the existing message and the newly generated response inserts either a white space or a new line between the chunks (which extends the existing behavior of always inserting a space). In cases where the existing message chunk cuts off mid-word, this causes a white space to appear in the middle of a word. However, the chances of that are small enough compared to the potential workload of factoring this case in, so we ignore it for now.
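
For illustration, the mid-word edge case looks roughly like this (made-up values, not app code):

// The existing chunk happens to cut off in the middle of a word.
const existing = 'She opened the enve'
const generated = 'lope and started reading.'

// Joining the chunks with whitespace splits the word:
// "She opened the enve lope and started reading."
const joined = `${existing} ${generated}`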

Additional information
Closes: #869
Discord thread

common/prompt.ts Outdated
@@ -503,7 +499,7 @@ function createPostPrompt(
   >
 ) {
   const post = []
-  post.push(`${opts.replyAs.name}:`)
+  opts.kind !== 'continue' && post.push(`${opts.replyAs.name}:`)
@sceuick (Member):

Use if here instead of JSX-ish inlining.
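
Presumably something along these lines (just a sketch of the suggested style, reusing the names from the diff above):

if (opts.kind !== 'continue') {
  post.push(`${opts.replyAs.name}:`)
}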

@sceuick (Member) commented Apr 7, 2024

You probably just need to update the snapshots using pnpm snapshot.
I can't see anything obviously wrong with the snapshot failures. It'll be easier to tell whether they're okay once you push the updated snapshots.

@@ -139,6 +139,18 @@ export async function parseTemplate(
   }
 
   const ast = parser.parse(template, {}) as PNode[]
 
+  // hack: when we continue, remove the post along with the last newline from tree
+  if (opts.continue && ast.length > 1) {
@sceuick (Member):

In unvalidated prompts the {{post}} placeholder can technically be anywhere.

@codedealer (Contributor, Author) commented Apr 8, 2024

That's a good point! I have reexamined my whole approach to this problem. Our goal here is not to manipulate {{post}} per se but to cut the prompt as close to the last response in history as possible to let the model continue the generation. I have rolled back my change to createPostMessage.

New algorithm:

  1. Manipulate the AST so that the history is the last node (if present)
  2. When rendering complex nodes (conditionals, iterables), cut strictly after the message has been rendered

That means that things like ujb and whatever was written in validated/unvalidated presets after the history (like modifiers) will be erased, and that can affect the generation results. However, I have tested with different prompt formats (albeit at sub-1 temperatures) and this approach beats the state of the art every time.

@sceuick (Member):

We might be able to avoid modifying the template-parser if we send the correct history/lines from the frontend?

@sceuick (Member):

Yeah, realistically the parser shouldn't make any assumptions about the parts (including the history) that it is passed, so it shouldn't be modifying them.

@codedealer (Contributor, Author):

I don't think that's possible. It works without any modification to the template if {{history}} is used; then we can just cut the tree and that is enough. The complexity arises from the usage of {{each msg}}, because it allows users to modify how each individual line in the history is rendered.

So the line we pass from the frontend may be perfectly fine (e.g. Hello world), but upon render it will turn into Hello world\n\n. There could be any symbols at the end, not just new lines, and after it's rendered it is impossible to determine whether they came from the template or from the message. This is a major cause of interference when it comes to continuing the message.

The only way I found to ensure that the prompt ends on what was sent by the bot is to change how history props are edited in the template parser.
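
A toy illustration of that ambiguity (the template body and values here are invented):

// Suppose the {{each msg}} body renders each line as "{{.name}}: {{.msg}}"
// followed by two newlines. A clean message line...
const line = 'Hello world'
const rendered = `Bot: ${line}\n\n`
// ...now ends in "\n\n" contributed by the template. Once rendered, those
// trailing characters are indistinguishable from newlines the bot itself
// produced, so the prompt no longer ends where the bot's message ended.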

if (opts.continue && i === 0 && isHistory) break children_loop

@codedealer (Contributor, Author):

I guess an alternative would be to have a completely different "continue" template that is triggered in this particular case?

@codedealer (Contributor, Author) commented:

I tried another approach that doesn't require editing the template parser, although I think there was nothing wrong with the previous one, and it worked better because it was more granular.

The new method modifies the AST to always put {{history}} in the tree. Although this can interfere with the user's template in a substantial way, it ensures that generation always starts from where it left off.

Sentence concatenation also changed a bit, with more reliance on the LLM to compose it. Previously there was a discrepancy between the message on the client and the prompt, where the prompt was always trimmed; now the client message is also trimmed, so the LLM and the user see the same thing.
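
Extending the earlier sketch (same invented PNode shape and isHistoryNode helper), the new behavior might look roughly like this:

// For a "continue" request: guarantee a {{history}} node, appending one if the
// user's template omitted it, and make it the last node in the tree.
function ensureHistoryIsLast(ast: PNode[]): PNode[] {
  const idx = ast.findIndex(isHistoryNode)
  if (idx === -1) return [...ast, { kind: 'history' }]
  return ast.slice(0, idx + 1)
}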

@codedealer codedealer requested a review from sceuick May 2, 2024 02:25

Successfully merging this pull request may close these issues.

The "Generate more" button does not work properly