
Update main's interactive mode to use the chat handshake templates support already available in llama.cpp (and currently only used by server,...) #6795

Closed
wants to merge 7 commits

Conversation

hanishkvc (Contributor) commented Apr 20, 2024

Currently, by default, the interactive mode of main doesn't add any tags to identify system or user messages to the model.

One has to either

  • use the separate chatml mode to work specifically with chatml-capable models, or
  • pass --in-prefix, --in-suffix and --reverse-prompt arguments as required to try and match the desired chat template.

This PR adds a generic chat mode to main, which can make use of any chat template already added to llama_chat_apply_template_internal, which is currently used by the server logic but not by main.

To help with this, a new chaton.hpp file is added to common, which contains

  • llama_chat_apply_template_simple, a wrapper around llama_chat_apply_template (and in turn llama_chat_apply_template_internal) of llama.cpp (a rough sketch of this wrapper is shown after this list), and
  • llama_chat_reverse_prompt, which helps add any reverse prompts needed for the requested template standard.
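
The sketch below is only illustrative, assuming the public llama_chat_apply_template API from llama.h; the exact names, buffer sizing and error handling in the PR may differ.

// Sketch only: wrap one role+content pair with the tags of the named chat
// template, using the public llama_chat_apply_template API from llama.h.
#include <cstdio>
#include <string>
#include <vector>
#include "llama.h"

static std::string llama_chat_apply_template_simple(
        const std::string & tmpl,      // template id, e.g. "chatml" or "llama2"
        const std::string & role,      // "system", "user" or "assistant"
        const std::string & content,
        bool add_assistant_prefix) {
    llama_chat_message msg = { role.c_str(), content.c_str() };
    std::vector<char> buf(content.size() + 256);
    int32_t len = llama_chat_apply_template(nullptr, tmpl.c_str(), &msg, 1,
                                            add_assistant_prefix,
                                            buf.data(), (int32_t) buf.size());
    if (len < 0) {
        // unknown/unsupported template id: warn and return an empty string
        fprintf(stderr, "warning: unknown chat template '%s'\n", tmpl.c_str());
        return "";
    }
    if ((size_t) len > buf.size()) {
        // buffer was too small; resize to the reported length and apply again
        buf.resize(len);
        len = llama_chat_apply_template(nullptr, tmpl.c_str(), &msg, 1,
                                        add_assistant_prefix,
                                        buf.data(), (int32_t) buf.size());
    }
    return std::string(buf.data(), len);
}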

To add new chat handshake templates, remember to add the needed logic to

  • llama_chat_apply_template_internal (llama.cpp), and
  • llama_chat_reverse_prompt (common/chaton.hpp) (a rough sketch of this mapping is shown after this list).
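
The following is only a sketch of what such a reverse-prompt helper might look like, assuming it appends to a caller-supplied vector and keeps the returned data separate from the status; the strings shown are examples, not necessarily the ones the PR uses.

// Sketch only: append the reverse prompt(s) for a given chat template id to
// the caller's vector; report success/failure separately from the data.
#include <string>
#include <vector>

static bool llama_chat_reverse_prompt(const std::string & tmpl,
                                      std::vector<std::string> & rprompts) {
    if (tmpl == "chatml") {
        rprompts.push_back("<|im_start|>user\n");  // example reverse prompt
    } else if (tmpl == "llama2") {
        rprompts.push_back("[INST] ");             // example reverse prompt
    } else {
        // explicitly treat an unknown template id as a failure
        return false;
    }
    return true;
}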

To use this support, pass -i and --chaton TEMPLATE_ID to main.
The currently supported templates are chatml and llama2; for the other chat handshake template standards already supported by llama_chat_apply_template_internal, suitable reverse prompts need to be added to llama_chat_reverse_prompt.
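
For example, an invocation could look like this (the model path here is only a placeholder):

./main -m path/to/model.gguf -i --chaton chatml -p "You are a helpful assistant."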

The user needs to pass --chaton TEMPLATE_ID.

TEMPLATE_ID will be one of the predefined chat templates already in llama.cpp's llama_chat_apply_template_internal and related code, such as chatml, llama2, llama3, ...
Add a helper to return the reverse prompts needed for a given chat template.

Add a wrapper that wraps a given message with the tags of the specified chat template, based on the role given. Glanced through the existing interactive and chatml flows to incorporate this flow; need to look deeper later.

NOTE: Up to this point this is a reapplication of my initial go at chaton, while simplifying the amount of change to existing code a bit more.
This commit includes debug messages.

ChatApplyTemplateSimple

* Unknown template ids weren't being handled properly; this is now identified and a warning logged, rather than trying to work with a length of -1. Need to change this to quit later.

* Also avoid wrapping in a vector, as only a single message is tagged wrt the chat handshake template.

ReversePrompt

Add support for llama2.
Clean up the associated log messages.

Don't overload the return value with both status and data. Now any data returned is kept independent of the status of the operation.

On failure, log a message and exit.

Avoid using a separate vector which in turn is copied into the main vector on return. Now the main reverse-prompt vector is passed in directly and entries are added to it directly.

Also keep data and return status separate. Explicitly identify an unknown template_id situation and return a failure status.
@hanishkvc hanishkvc changed the title Update main's interactive mode to use the chat templates support already there for server Update main's interactive mode to use the chat handshake templates support already available in llama.cpp (and currently only used by server,...) Apr 20, 2024
hanishkvc (Contributor, Author) commented Apr 20, 2024

Adding the attached patch to this PR allows me to chat with llama3 as well, using main -i --chaton llama3

llamacpp-llama3-exp-v1.patch

DifferentialityDevelopment (Contributor) commented Apr 21, 2024

This sounds like an excellent and much-needed addition to main.
Did you add a flag for specifying the system role's message?

ngxson (Collaborator) commented Apr 21, 2024

I've done detailed research on the same subject, so I strongly recommend you refer to this issue: #6391

Also, a new function named llama_token_is_eog will be introduced with llama3 in the other PR, so it's better to wait.

hanishkvc (Contributor, Author) replied:

> This sounds like an excellent and much-needed addition to main. Did you add a flag for specifying the system role's message?

In interactive mode (i.e. -i), any prompt file (-f) or prompt (-p) passed on the command line is treated as a system prompt, and in turn this PR formats it to match the expected system prompt template.
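
For example, something along these lines (the paths are only placeholders):

./main -m path/to/model.gguf -i --chaton llama2 -f path/to/system-prompt.txt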

arch-btw (Contributor) commented Apr 21, 2024

Here's the patch running llama3 with --verbose-prompt. I think there might be too many newlines?

main: prompt: '<|start_header_id|>system<|end_header_id|>
You are an assistant
<|eot_id|>

'
main: number of tokens in prompt = 11
128006 -> ''
  9125 -> 'system'
128007 -> ''
   198 -> '
'
  2675 -> 'You'
   527 -> ' are'
   459 -> ' an'
 18328 -> ' assistant'
   198 -> '
'
128009 -> ''
   271 -> '

'
main: static prompt based on n_keep: 'system
You are an assistant


'

main: interactive mode on.
Reverse prompt: '<|eot_id|>'
128009 -> ''

Without --verbose-prompt:

system
You are an assistant



> 

hanishkvc (Contributor, Author)

There is a new PR, which is again an experiment; it tries to use a simple JSON file to drive the logic, so that many aspects can be controlled by editing the JSON file rather than needing to update the code:

#6834

@hanishkvc hanishkvc closed this Apr 22, 2024