-
Notifications
You must be signed in to change notification settings - Fork 9.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Generic Chat templating code with text/json file based config; main chat updated to drive its in-prefix, in-suffix and reverse-prompt from same; chat-apply-template equivalent c-api to allow use by other codes also #6834
base: master
Are you sure you want to change the base?
Conversation
This is interesting. The only issue I see with this is that it doesn't account for FIM (Fill-in-the-Middle). Other than that, it seems alright. Something to note is that this, in practice, plays out a bit differently though and should be considered. For example, do we want to use only the file and/or the CLI options. I personally prefer simply using the file because it centralizes the template structure, exposes it to the API, and simplifies calling it. There are always going to be injection risks, so maybe handle those separately. I'm just thinking out loud at the moment. Take this input with a grain of salt. |
By fill in the middle, if you mean that if the user message has special-token related tags in it which inturn when being tokenised will treat them has special tokens, which can mess with things etal, then if you look at the flow wrt main, the user message is tokenized without parse_special flag. However my generic chat-apply-template currently, doesnt handle this, because it would require returning a vector of strings rather than a single string, as noted in the PR comment. Which if I am not wrong would be different from how others expect chat-apply-template to work, so I havent decided on the same, nor have I looked into other libraries chat-apply-template in detail, I am guessing a bit here. However if you mean something else, please do explain a bit, so I can see if I can do something about it. Do note that I am not a big user of current crop of LLMs for various reasons, while still do look at it once in a while to see where things are, so I am not that tuned in with the conventions / concept-names etal. I wanted a simple program with minimal inter dependencies to use on my limited resources based machine, and I had some issues with ollama and llama3, so I just hacked this in with mostly guess work and crude generalisation by looking at existing flow to some extent and what I was seeing when I experimented on what I needed. I am hacking xyz, without understanding abc in some sense. |
Or are you meaning coding related models and I dont know, if they have some fill-in-the-blank or is it fill-in-the-middle or some such phrase I may have previously seen wrt them, I dont remember now, I havent looked at them, if it is something like that you are talking about, I have to look at it. Be it general LLM or coding related LLM and you are talking about it filling some blanks in the middle of a statement the user has entered, then I assume, user will put some special tokens in the middle of their prompt, in which case the user message will have to be tokenized using parse_special, if that is what you are talking about, then maybe a cmdline argument can be added to inform the logic, whether to treat user message as a normal text or has potentially including special token related tags |
Yes, this is what I meant. One of the models (that I know of) that's capable of infill is the Refact model. Sorry if I caused confusion or made assumptions. |
Updated notes OverviewHelps chat with a model, by allowing role based special token tagging, based on the specified chat-handshake-template-standard.
Adding support for new model / chat-handshake-template-standard
NotesCurrently Main doesnt use chaton-tmpl-apply, but only
|
Sample chaton_meta.json includes template info for
I noticed some difference between deepseek's actual tokenizer config and what is there in llama.cpp's chat-apply-template, so for my logic, I have added two entries deepseek-alt (which matches existing llama.cpp tempalte) and deepseek (which matches role related tags and eos from tokenizer_config.json). However both will potentially work. Later need to cross check the tokenizer_config.json of the other models, with what I have put in chaton_meta.json, to see if they are in sync or not. However based on minimal testing of these models, the existing template in chaton_meta.json does seem to work. NOTE: Even if there is some difference in EoS specified using reverse-prompt, chances are the default logic in main already looks for the EoS specified in the model file loaded also, so things should still be fine, even if the json doesnt match the one in model. |
In middle of somethings, but later will try look into this, as well as add cmdline option to control whether user prompt is parsed wrt special tokens or not. |
Have added support for Begin and Prefix entries wrt User role and inturn one can configure both of them individually wrt whether either of them get added to the 1st user message following the system message from chat-template-apply perspective, in commons/chaton.hpp. Look at llama3, llama2 and monarch entries in the examples/chaton_meta.json wrt how things can differ wrt begin and prefix and inturn 1st user msg following system message. |
At first lance, I'm not sure if it's a good idea to move the implementation completely into a separated JSON. While the good point is that it allows users to edit the list of templates easily, it brings some problems:
Also, could you do a test implementation with the examples in |
25e1da3
to
9626779
Compare
@ngxson for now I purposefully kept the new flow outside llama.h and within common/chaton.hpp for these reasons
And inturn providing flexibility wrt (2) would either way require adding a new api wrt chat template applying, while potentially retaining the old api through potentially a wrapper over a more flexible newer api. As a possible step towards that more flexible flow, on experimenting towards same, I have added the initial skelton of ChatParts class in common/chaton.hpp (Note: I have been away from C++ for 1.5-2-decade++ now, and jumped through too many languages which were at much lower or similar or higher abstraction compared to c++ over the years, so my c++ memory is only so and so, and I have depended more on the compiler not warning/erroring out and not stricitly throught through from memory mgmt perspective of the new classes in c++ etal, so there could be some inefficiencies and or gotchas in there). Also if it makes sense to expose a more flexible api to differentiate between special-tokens++ parts and user provided parts in the formated/tagged string, then whether to expose it through the C only hop in llama.h by using
Also I remember earlier today somewhere reading about possible deprecating of antiprompt/reverse-prompt, but rather I feel the EoS/EoG tracking by main's logic should be built on top of antiprompt in that antiprompt vector should maintain a bunch of possible antiprompts which can be filled from the EoS info in the model file itself, any commandline argument passed by user as well as potentially set from a chat-template manager like chat-template-apply driven logic. The reason is because if I am not wrong, some of the models may allow more than 1 chat-handshake-template-standard, in which case the model file may not explicitly provide all of the possible EoS/EoG token(s) across all of their supported standards. So retaining the antiprompt vector provides flexibility for multiple levels of intervening like what I mentioned. This was originally a weekend project, to solve a immidiate issue I had at my end. And later to see if there can be a generic flow, which can be ideally modified and or extended in future for models/standards which follow a sensible convention, without needing to modify code. And the skeleton which I have added in chaton.hpp seems to provide that for the 5 to 6 models which I tested at a minimal level (ie few back and forth handshake using main interactive flow augumented with my logic/PR) and the corresponding entries added to chaton_meta.json in my PR. I glanced through test-chat-template.cpp, but I feel currently it uses a vector of chat templates from models or ... without identifying the individual templates explicitly, like through a map instead of a vector or so, thus requiring to manually map each template with the chat-apply-template code to see what model/standard it may be mapping to. I will see if I can create a duplicate file which uses this alternate chat-template-apply logic, after I have flushed out Chatparts a bit more. |
Also as json library seems to be co-opted into llama.cpp/common, so I used the same and built my concept on top of it actually. If the logic in this PR works out in handling most of the sensible model/standards out there using a generic flow, and inturn if there is interest in this PR, then may be we can avoid json and replace it with a simple 1-level heirarchy text file something like below and simple parser for it. Template-id1 Template-Id2 .... |
I've just have a look in detail for this PR. The idea seems ok (i.e. using input_prefix/input_suffix/antiprompt), but I still find the implementation is quite complicated IMO:
Also I don't really understand the differences between this PR and #6822 , as I'm also trying to implement a system of prefix/postfix for chat templates. Can you explain this a bit more?
The problem is that all the new chat templates are moved away from antiprompt. They're all using special token to stop generation. This will still be true for all future models, so I don't think antiprompt is something that is future-proof (but special tokens are) |
@ngxson hope below gives some more background and or info on the idea behind this PR
Based on further experimentation, if it is found that a good number of chat-handshake-template-standards can be driven using a config file (json/...), then as I had mentioned in previous comment, we could look at a simple text file based config file instead of json, so that the code can be portable, without depending on a seperate json library.
If you are talking about if there is a system+user message one kind of tagging is required and for user only message a different kind of tagging is required, using systemuser-user-1st-has-begin/prefix flag in the json file, I have tried to handle the difference in tagging across many models/standards. However I agree that there may be few more variations when looked across multiple models/chats. I am looking at a more detailed (in terms of fields) json to see if more combinations can be covered, and that too without adding more custom flags. Maybe tomorrow I will give it a shot. However do note that if we are looking at a pure main program based chatting, yesterdays simple json and corresponding logic already allows chatting with around 5 to 6 models which I have tested yesterday. However wrt server/web-service related flow, I need to cross check with the more detailed json, because some more variations come into picture.
If you read my previous comments, as I had mentioned, ChatParts is more to help keep the different parts that make up a tagged message/chat seperate so that additional data can be extracted to tokenize in a more fine grained manner. However at the same time to allow exposing the api interface over a standard c-extern gating, instead of ChatParts, its helpers can be used to expose the additional info using a array of chars and array of ints. My todays commit already has this mechanism implemented, do have a look, to see, what I mean. You will see that people who want to follow the old api related flow of working with a single tagged string as is, they can do that, at the same time additional info is exposed, if they want to tokenize user prompt parts different from the tags parts.
If I am not wrong, you are looking at implementing the prefix/postfix using hardcoded tags in the code, while this PR tries to see if the needed functionality can be achieved by using a combination of json/text based config file + code, with the idea being to try allow end users to manipulate tagging to some extent, without needing to recompile things. As well as try allow new modes/standards to be supported using a generic flow where possible. ALERT: This is still a experiment, I need to cross check this bit more, before I can categorically say that this can handle most common combinations or not.
Rather you seem to have looked at only a part of my comment, if you read the para fully, you will see the reason why I have suggested to retain the current flexible antiprompt mechanism and then to add the EoS from the model file into the antiprompt flow itself ie by inserting the EoS info in the model to the antiprompt vector. |
By more detailed json/text config file to try support more combinations parallley without too many flags, what I am thinking of (need to cross check)
This is still just a initial idea in my mind by looking at few jinga files, I need to think through and try out this detailed fields based flow still. However the existing simpler json and corresponding support added to drive main's in-prefix/suffix does work for main based chatting. Its the server/web-service kind of flow, where this more detailed fields based flow needs to be thought through bit more and cross checked. Also the idea is to try and see if a common generic logic can be used to drive templating for many models/standards, while still providing the flexiblity to hardcode in code if required for specific models/standards. |
aa66db1
to
30efa0b
Compare
@ngxson have a look at the latest commit here, using a simple generic logic (which you can checkout in chaton_tmpl_apply_ex function) and a json file containing the details of the chat-template in a simple and detailed way, this logic tries to allow tagging of the messages across different models/template standards. For around 9 models/chat-handshake-template-standards I have included sample json config in examples/chaton_meta.json
The c-api which follows a similar semantic as the previous llama_chat_apply_template, is available in the common/chaton.hpp. As the models for which I have added sample tempate config info and inturn checked using modified main, is bit different from those in test-chat-templates, so I have add a new test-chat-template-chaton.cpp to tests folder, which if you run will show what will be the tagged messages wrt the 9 models which I have mentioned above, so that you can check if what it generates is similar to what you may be expecting or not. I feel this mechanism of a generic flow driven by a json is vaible in general, based on the models which I have tested against. And either way, if a particular model requires a very different structure beyond what can be generated by the generic logic, one can always add custom code into chaton_tmpl_apply_ex. This should allow supporting new models in many cases, by just adding to the json config file. Also do go through the detailed Notes/Comments at the begining of the common/chaton.cpp to get a rough feel about this code. |
f47fe25
to
32d2752
Compare
I understand the high level idea but sorry I really don't have time to look at the detailed implementation. While it's a good idea, IMO the chat template infrastructure should be kept simple and support for customizable formats can be added later on. Maybe we can keep your PR as a demo and we will see if it can be merged in the future. Also for context, there's already a discussion on chat templates in the beginning of server development. You can have a look here: #4216 (comment) |
Hi @ngxson, @ggerganov generic code flow + config file based template handlingplease do have a look at the implementation, the generic flow is actually very simple yet flexible, and I feel this idea can accomodate many different models / handshake standards by just updating the config file without touching the code flow (there could be some small differences in terms of white spaces in the worst case, which potentially may not matter beyond a limit, even that may be handleable by adding some generic flags like trim content or so, but it may make it unnecessiraly too detailed a control). I have tried to add support for 8(+1) different models/standards in examples/chaton_meta.json, all by using the generic flow itself, without requiring any customization in code to accomodate that specific model/standard. At a initial glance the tagged messages seem ok to me, but it would be useful for someone else to also cross check once to be sure. To test wrt main one needs to use
To test the possible server flow related multi message tagging at once
This PR specific code is in common/chaton.hpp and inturn the generic logic which uses the config file to do the tagging is in the function chaton_tmpl_apply_ex. You will notice that the generic flow basically just builds on the basic pattern used by most models/standards in a simple and straight forward way, without much complexity. It also provides the basic plumbing for differentiating between the user provided parts and the handshake template provided parts in the tagged message, so that in future, if required the tokenisation can be controlled interms of using parse_special for the template provided parts and avoiding parse_special for end user entered parts ie their querys during chatting, if needed. This is currently not exposed in the c api. Because this config file + associated generic flow tries to expose all parts of the generic pattern followed wrt all the 3 roles, so anyone wanting to experiment with different templates for a given model, will also be potentially able to do that, by just updating the config file. Unless one is doing some fancy conditional inserting of tokens etal beyond the basic system+1st-user-message related one which I have seen in the 8(+1) models, that I have checked to some extent. In which case they will have to add custom code, like what they would have done even now in the existing flow. (this partly relates to a query/comment I noticed wrt PR #4216) simple text based config fileAs you had noted a concern about this potentially making the users of the core llama.cpp needing to bring in json library, if this config file based flow is used, I have added a simple text based config file logic, to try and avoid the dependence on json, while still giving sufficient flexibility for this use. The code for the same is in common/simpcfg.hpp and the sample simpcfg text based config file is in examples/chaton_meta.simpcfg NOTE: currently I have not updated the chaton.hpp to use this simpcfg based files instead of json files. If all of you find that the chaton generic flow is doing what is expected in a sufficiently proper enough way, and inturn that, it is better to avoid needing json dependency wrt 3rd party users of llama.cpp as a library, then I can look at replacing the json (picked from what was already in common dir) with this simpcfg based flow. NoteDo have a look at the note in the chaton.hpp for uptodate overall flow and reasoning. For now the 1st note in this PR conversion, is updated to match the note in chaton.hpp. Also I agree that lets not look from merging angle yet, only after both of you and any others with knowledge that you want to look at this flow, have gone through it and find that it seems to be ok and flexible enough, we can look at merging NOTE: I am a casual (non-regular) user of LLMs as well as llama.cpp, so dont have that much experience with it beyond basics, but I feel if this idea works out, as I feel it seems to currently, then in future for many new models/chat-handshake-template-standards if they follow a sane generic pattern as many seem to be, then the generic flow itself will be able to support those, by just updating the config file, without needing to modify code and recompile it. However I need eyes from experienced users and developers of llama.cpp like you to cross check, if what I am seeing with my limited testing actually makes sense. NOTE: If new models/standards follow a sane pattern, then other than updating the config file, the only change that may be required in code, is in tokenizer wrt any new specifial tokens that they may have added or different encoding for existing special token tag or so, ie if there is no generic way to pick this info across models from their model file. This is a logical guess based on my limited knowledge of llama.cpp and llms in general. |
Updates wrt SimpCfg
ChatOn
|
Make it similar to user-begin+prefix control. ie only wrt 1st msg of respective type.
Use same to bypass any msg count based tagging behaviour for the single message tagging through its helper wrapper.
However still retain the wrappers, which work with a predefined global instance of ChatTemplates.
GroupKV dump adds needed ":" seperator on its own, so calling functions can just pass the tag string they want in the log without worrying about any demarkation.
Also add simple note wrt itself and its helper.
The initial version was rooted around a json object, while the new version is rooted around a MapOfMapOfVariant (GroupKV), which could be preloaded with chat templates info at compile time itself and used as is. Or optionally one could allow the configurable template data to be extended/updated at runtime from a text(/SimpCfg)/json file.
Hi @khimaros, This patch auto sets the example/main's in-prefix/suffix as well as antiprompt/reverse-prompt from the equivalent configuration data in the specified chaton_meta.json file, that is the reason, why its no longer required to be explicitly specified. The extra "\n> " you are seeing is the only-visible-to-end-user prompt added by the existing main code, as I reuse/extend the existing main flow, you see the same.
|
Hi @ggerganov @ngxson @teleprint-me @khimaros @mofosyne The initial/previous version was rooted around a json object, while the new version is rooted around a MapOfMapOfVariant (GroupKV), which could be preloaded with chat templates info at compile time itself and used as is. Or optionally one could allow the configurable template data to be extended/updated at runtime from a text(/SimpCfg)/json file. Thus this new flow should allow for using the new chat templating logic without needing to load additional data at runtime, if one doesnt want to, thus also avoiding need to bring in common/json library. At the same time for a use case like examples/main where it is useful to allow the user to either change the existing (pre/compiled-in) template info and or try adding support for new models/finetunes/template-standards, the same can be achieved by loading it from json file. Optionally in some use-cases, if one wants the runtime augumenting capability but still doesnt want to bring in the common/json, then one could optionally switch ChatTemplates to use SimpCfg (which builds on GroupKV) and inturn use its load logic to load from a simple text file. The Notes in common/chaton.hpp has been updated to capture the new mechanism. Currently by default CHATON_JSON (which brings in json based loading) as well as GKV_DEBUGLOG_ON (which makes the logic more log verbose) is enabled, which needs to be disabled. Rather as I was writing this, come to think of it, I need to move the CHATON_JSON block into its own file, so that the library by default can be compiled without needing json and inturn only programs which use it like main can include the new file with this json based loading helper. NOTE: The compile time pre/compiled-in configurable template data is picked from chaton_meta.hpp. There is a simple and stupid minded python helper added to scripts to convert from chaton_meta.json to chaton_meta.hpp. NOTE: Currently I have not updated the code to follow some of the naming/coding convention mentioned. |
Any program which wants to use json file to update/extend the chaton's configurable template data, can include this new file chaton_json.hpp, to get the reqd functionality. Update chaton_meta_ok, _chaton_meta_validate_dump and chaton_meta_load_json to either work with a passed ChatTemplates instance, or fallback to the compiled-in global instance of same.
Merge upstream as of 20240515IST11XY
Hi @ggerganov @ngxson @mofosyne Just to give a rough context, for a code using the existing chat template logic like examples/server, a simple change like below will allow it to use the new chat template logic from this PR. Once the code+setup is updated to exist in llama(.cpp) library rather than common library. Along with specifying/passing the tempalte-id rather than passing the ninja template string/... wrt the template argument (I have hardcoded to llama3 below). I have done a crude transplanting of chaton into llama.cpp in the below repo, in case if anyone wants to test it. Note that this doesnt integrate chaton into llama library in a proper way, nor does it take cmdline argument wrt template-id etal. ` +#include <chaton.hpp>
` NOTE: Inserting code doesnt seem to work properly wrt the comment's include code mechanism or so. Or I dont understand its proper use. So the above may look bit odd. |
Rename chaton-meta hpp to cpp and include this cpp file which brings in the compile time built-in global chaton configurable template data into the common library, and avoid the nop hpp file references. Update chaton.hpp to not include the meta-cpp, instead just make a reference to the global ChatTemplates instance, so that the hpp can be used as a header file proper. Avoid pragma once in the chaton-meta.cpp, including the script, which helps create it.
C++17 provides a good enough variant as a standard feature, and chaton uses the same at its core, instead of rolling out its own struct of union based variant. And given that currently chaton is part of common library and not the base llama library, so limit the use of c++17 to common library. Initially while experimenting, had set the flag for full llama, limitting it for now. Also by now most embedded targets should be potentially having c++ compilers and libraries with support for c++17 features. So chances are it is a ok enough path to take.
Hi @ngxson, I was trying to test how server would work, if it is updated to use the current version of this PR, inturn with the minimalist of changes. So I basically changed llama_chat_appy_template in llama.cpp to call my chaton_tmpl_apply_ex rather than llama_chat_apply_template_internal. With that change, what I am noticing is that It appears like examples/server doesnt call llama_chat_apply_template beyond the initial generic test that it does before the actual user interaction. But once user puts some chat content, I see that HandleCompletion calls into ServerContextTokenize which inturn seems to directly tokenizes the user visible text (their own as well as model responses) directly without any of the special tokens wrt demarkating of system/user/model messages. Am I missing something fundamental, or is it what the server code is currently setup to do, is it that I need to pass any additional argument beyond -m THE_MODEL. The code I used with the above mentioned patch is at bin/server -m Meta-Llama-3-8B-Instruct-Q4_K_M.gguf --chaton-template-id llama3 |
Based on a further quick glance, I feel there is a bug in that server's web frontend is calling into /completions endpoint and not /chat/completions endpoint, even when chat option is selected. Background Given the strange behaviour I saw yesterday when trying to test examples/server with this PR's chaton-template-apply integrated, I noticed that the full chat transcript (without special tags) was being processed as a single prompt and inturn directly tokenized without chat-templating; instead of the expected behaviour of getting a array of chat-role+message objects and inturn running through chat templating and then tokenizing. Wanted to be sure I am not missing something basic, so had a look at http api reference from openai site, which I assume is the convention followed by most llm web services. This is a assumption given that I look at LLMs only once in a bluemoon, that too more as a end user to see where it has reached. However looking at what I am seeing this is what it appears to be.
|
Is this PR still active or any other PR to allow custom templates? It's the only thing I miss after ditching Ollama :/ Just discovered today that But to test that it requires writing a custom clause in C++ that will have to be updated each pull :( This is probably a weird case and likely won't work with the "chat completion" API anyway, but there are definitely lots of other cases where small tweaks to the "official" template are very useful. It doesn't look like 99% of the Jinga2 templates even use a fraction of the power of Jinga2, and even a tiny subset would work: If we could scrape most of the available templates off huggingface then it would just be a case of improving your parser to handle more and more of the corner cases, and wouldn't be that hard IMO, nor require loads to C++ templates or regex code... It maybe wouldn't be all that robust compared to the full parser but it would be better than nothing. I once had to parse millions of semi-broken "Portable Game Notation" and "Forsyth–Edwards Notation" chess data files and it really wasn't that bad to do - you just have to plod along getting the fraction of failures down until you get to the truly "WTF" files and call it a day. |
The other alternative is just to implement some super-simple subset of: https://pkg.go.dev/text/template (what Ollama uses) https://github.com/antlr/stringtemplate4/blob/master/doc/cheatsheet.md (most popular Java template library) If you accept it doesn't need to be as robust about error detection and reporting, it's really easy to implement something like this with nothing but recursion and a couple of string matching helper functions. There's literally 100s of open source projects that do this too, ranging from C++ template-heavy / regex-heavy: https://github.com/lexxmark/string_template to barebones C: https://github.com/cozis/tinytemplate and anything like this would likely be able to accommodate our use case for the subset of Jinga2 used to write the real templates, and as Ollama has shown; just a couple of of added boolean variables (eg: |
*** Updated to match latest commit ***
Overview
Helps chat with models, by tagging chat messages based on the specified
chat-handshake-template-standard. This uses a generic tagging code driven
by a json meta data file, which specifies the handshake template details.
This can be used by
main, to build on existing interactive flow and its in-prefix, in-suffix
and antiprompt/reverse-prompt
server, by replacing its existing llama_chat_apply_template with the
equivalent helper here.
The common pattern
As a convention, the tagging used by LLMs to differentiate between the
different parts when chatting with them normally follows a general pattern of
<BeginOfSentenceIfAny> <RolePrefixIfAny> <TheContent> <RoleSuffixIfAny> <EndOfSentenceIfAny>
The Roles could include System, User and Assistant (ie the Model)
A chat normally consists of
a System message/prompt followed by
multiple user message/query - model message/response pairs
The different models will normally have all or some subset of the tagging mentioned above.
You may also notice some common patterns like
Because a user message is normally followed by model/assistant response, in most models
user messages wont have EndOfSentenceTag and
the following model response wont have BeginOfSentenceTag
Because a system message will normally be immidiately followed by a user query,
in many models, there wont be a EndOfSentenceTag following the system message and
BeginOfSentenceTag wrt the 1st user message following the system message.
in some models there wont even be a RoleSuffixTag following system message
and RolePrefixTag wrt the 1st user message following the system message.
however in many of these models, the subsequent user messages will have the
BeginOfSentenceTag and or RolePrefixTag.
The Strategy
The template meta data json file allows the user to specify the above mentioned tags wrt
each of the Role. Depending on whether a given model uses a given tag or not you either
specify the required tag or else you specify a empty string.
A tag could be a single word or multiple words, and may include newline char specified
using \n and so on. The tag is always demarcated using double quotes and thus also allows
spaces at the begining or end of the tag, if needed.
In order to account for the conditionality of tags between the system message and the 1st
user message, flags are provided to explicitly control whether each of these possible tags
is used by a specific model or not, as part of its template info.
The Roles are identified in the json file using "system", "user" and "assistant". However
the model may use different words to identify these roles, in which case setup RolePrefix
and or RoleSuffix appropriately.
To identify that model is finished with generating response to user query, depending on
the model's handshake template standard, one will need to set the reverse-prompt to either
the assistant's suffix or end tag or to the user's begin or prefix tag, depending on what
is generated by the model at the end of its response.
The JSON File
Can contain the template info wrt multiple models/handshake-standards. And inturn each
unique template is identified by a unique template id string.
The fields that make up a given chat-handshake-template-standard include
global-> begin & end
system -> begin, prefix, suffix & end
user -> begin, prefix, suffix & end
assistant -> begin, prefix, suffix & end
reverse-prompt
systemuser-system-has-suffix, systemuser-system-has-end,
systemuser-1st-user-has-begin and systemuser-1st-user-has-prefix
Usage
One needs to load the json file containing the template meta data and inturn call the
other helper functions as needed.
Inturn one can use the helper functions to either extract a given tag or to apply all
tags specified wrt a given role to the passed message or to apply tags as needed for
a bunch of messages in one go.
The individual message tagging helper, will apply all tags specified wrt that role.
The multiple messages tagging helper chaton-tmpl-apply, will look at the boolean flags
when tagging the passed messages. In this the system suffix, system end, user begin and
user prefix get included only if corresponding flag is set.
Both the single and multi messages tagging helpers provide two versions.
which divides the returned string into parts.
part is a normal part which needs to be tokenized without parse_special
or is a special part which needs to be tokenized with parse-special.
example/main
The interactive commandline program under example/main, uses
Currently Main doesnt use chaton-tmpl-apply, but only
to in-prefix, in-suffix and antiprompt of main.
These always adds any role specific begin+prefix and suffix+end around
the passed message.
Adding support for new model / chat-handshake-template-standard
before trying to add a custom logic.
If you update the generic flow, cross check if existing json files will
need to be updated or not.
Notes
Look at the sample chaton_meta.json in examples folder for how the above may apply