
Update main's interactive mode to use the chat handshake templates support already available in llama.cpp (and currently only used by server,...) #6795

Closed
wants to merge 7 commits

Conversation

hanishkvc (Contributor) commented Apr 20, 2024

Currently, by default, the interactive mode of main doesn't add any tags to identify system or user messages to the model.

One has to either

  • use the separate chatml mode to work specifically with chatml-capable models, or
  • pass --in-prefix, --in-suffix and --reverse-prompt arguments as required to try and match the desired chat template.

This PR adds a generic chat mode to main, which can make use of any chat template already added to llama_chat_apply_template_internal, which is currently used by the server logic but not by main.

To help with this, a new chaton.hpp file is added to common, which contains

  • llama_chat_apply_template_simple, a wrapper around llama_chat_apply_template (and in turn llama_chat_apply_template_internal) of llama.cpp (a rough sketch of this wrapper is shown after this list), and
  • llama_chat_reverse_prompt, which helps add any reverse prompts needed for the requested template standard.
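
The sketch below is only illustrative, assuming the public llama_chat_apply_template API from llama.h; the exact names, buffer sizing and error handling in the PR may differ.

// Sketch only: wrap one role+content pair with the tags of the named chat
// template, using the public llama_chat_apply_template API from llama.h.
#include <cstdio>
#include <string>
#include <vector>
#include "llama.h"

static std::string llama_chat_apply_template_simple(
        const std::string & tmpl,      // template id, e.g. "chatml" or "llama2"
        const std::string & role,      // "system", "user" or "assistant"
        const std::string & content,
        bool add_assistant_prefix) {
    llama_chat_message msg = { role.c_str(), content.c_str() };
    std::vector<char> buf(content.size() + 256);
    int32_t len = llama_chat_apply_template(nullptr, tmpl.c_str(), &msg, 1,
                                            add_assistant_prefix,
                                            buf.data(), (int32_t) buf.size());
    if (len < 0) {
        // unknown/unsupported template id: warn and return an empty string
        fprintf(stderr, "warning: unknown chat template '%s'\n", tmpl.c_str());
        return "";
    }
    if ((size_t) len > buf.size()) {
        // buffer was too small; resize to the reported length and apply again
        buf.resize(len);
        len = llama_chat_apply_template(nullptr, tmpl.c_str(), &msg, 1,
                                        add_assistant_prefix,
                                        buf.data(), (int32_t) buf.size());
    }
    return std::string(buf.data(), len);
}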

To add new chat handshake templates, remember to add the needed logic to

  • llama_chat_apply_template_internal (llama.cpp), and
  • llama_chat_reverse_prompt (common/chaton.hpp) (a rough sketch of this mapping is shown after this list).
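
The following is only a sketch of what such a reverse-prompt helper might look like, assuming it appends to a caller-supplied vector and keeps the returned data separate from the status; the strings shown are examples, not necessarily the ones the PR uses.

// Sketch only: append the reverse prompt(s) for a given chat template id to
// the caller's vector; report success/failure separately from the data.
#include <string>
#include <vector>

static bool llama_chat_reverse_prompt(const std::string & tmpl,
                                      std::vector<std::string> & rprompts) {
    if (tmpl == "chatml") {
        rprompts.push_back("<|im_start|>user\n");  // example reverse prompt
    } else if (tmpl == "llama2") {
        rprompts.push_back("[INST] ");             // example reverse prompt
    } else {
        // explicitly treat an unknown template id as a failure
        return false;
    }
    return true;
}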

To use this support, pass -i and --chaton TEMPLATE_ID to main.
The currently supported templates are chatml and llama2; for the other chat handshake template standards already supported by llama_chat_apply_template_internal, suitable reverse prompts need to be added to llama_chat_reverse_prompt.
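
For example, an invocation could look like this (the model path here is only a placeholder):

./main -m path/to/model.gguf -i --chaton chatml -p "You are a helpful assistant."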

The user needs to pass --chaton TEMPLATE_ID.

TEMPLATE_ID will be one of the predefined chat templates already in llama.cpp's llama_chat_apply_template_internal and related code, such as chatml, llama2, llama3, ...
Add a helper to return the reverse prompts needed for a given chat template.

Add a wrapper that wraps a given message with the tags of the specified chat template, based on the role given. Glanced through the existing interactive and chatml flows to incorporate this flow; need to look deeper later.

NOTE: Up to this point this is a reapplication of my initial go at chaton, while simplifying the amount of change to existing code a bit more.
This commit includes debug messages.

ChatApplyTemplateSimple

* Unknown template ids weren't being handled properly; this is now identified and a warning logged, rather than trying to work with a length of -1. Need to change this to quit later.

* Also avoid wrapping in a vector, as only a single message is tagged wrt the chat handshake template.

ReversePrompt

Add support for llama2.
Clean up the associated log messages.

Don't overload the return value with both status and data. Now any data returned is kept independent of the status of the operation.

On failure, log a message and exit.

Avoid using a separate vector which in turn is copied into the main vector on return. Now the main reverse-prompt vector is passed in directly and entries are added to it directly.

Also keep data and return status separate. Explicitly identify an unknown template_id situation and return a failure status.
@hanishkvc hanishkvc changed the title Update main's interactive mode to use the chat templates support already there for server Update main's interactive mode to use the chat handshake templates support already available in llama.cpp (and currently only used by server,...) Apr 20, 2024
hanishkvc (Contributor, Author) commented Apr 20, 2024

Adding the attached patch to this PR allows me to chat with llama3 as well, using main -i --chaton llama3

llamacpp-llama3-exp-v1.patch

DifferentialityDevelopment (Contributor) commented Apr 21, 2024

This sounds like an excellent and much-needed addition to main.
Did you add a flag for specifying the system role's message?

ngxson (Collaborator) commented Apr 21, 2024

I've done detailed research on the same subject, so I strongly recommend you refer to this issue: #6391

Also, a new function named llama_token_is_eog will be introduced with llama3 in the other PR, so it's better to wait.

hanishkvc (Contributor, Author) replied:

> This sounds like an excellent and much-needed addition to main. Did you add a flag for specifying the system role's message?

In interactive mode (i.e. -i), any prompt file (-f) or prompt (-p) passed on the command line is treated as a system prompt, and in turn this PR formats it to match the expected system prompt template.
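
For example, something along these lines (the paths are only placeholders):

./main -m path/to/model.gguf -i --chaton llama2 -f path/to/system-prompt.txt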

arch-btw (Contributor) commented Apr 21, 2024

Here's the patch running llama3 with --verbose-prompt. I think there might be too many newlines?

main: prompt: '<|start_header_id|>system<|end_header_id|>
You are an assistant
<|eot_id|>

'
main: number of tokens in prompt = 11
128006 -> ''
  9125 -> 'system'
128007 -> ''
   198 -> '
'
  2675 -> 'You'
   527 -> ' are'
   459 -> ' an'
 18328 -> ' assistant'
   198 -> '
'
128009 -> ''
   271 -> '

'
main: static prompt based on n_keep: 'system
You are an assistant


'

main: interactive mode on.
Reverse prompt: '<|eot_id|>'
128009 -> ''

Without --verbose-prompt:

system
You are an assistant



> 

hanishkvc (Contributor, Author)

There is a new PR, which is again an experiment; it tries to use a simple JSON file to drive the logic, so that many aspects can be controlled by editing the JSON file rather than needing to update the code:

#6834

@hanishkvc hanishkvc closed this Apr 22, 2024