Description
Is your feature request related to a problem? Please describe.
When generating a chat completion, the prompt is hard-coded to a non-standard template that looks something like:
```
### User: <blabla>
### Assistant: <blabla>
```
The system message is currently ignored.
See llama-cpp-python/llama_cpp/llama.py, line 1578, at commit 255d653.
This mostly works across models, but it is not correct.
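For concreteness, here is a rough sketch of what the hard-coded formatting described above amounts to; this is a simplified illustration of the behavior reported in this issue (including the dropped system message), not the actual llama.py code.

```python
# Simplified illustration of the current hard-coded chat formatting
# as described in this issue; not the actual llama.py implementation.
def format_chat_naive(messages: list[dict]) -> str:
    parts = []
    for msg in messages:
        if msg["role"] == "user":
            parts.append(f"### User: {msg['content']}")
        elif msg["role"] == "assistant":
            parts.append(f"### Assistant: {msg['content']}")
        # "system" messages fall through and are silently dropped.
    parts.append("### Assistant:")  # cue the model to answer
    return "\n".join(parts)

print(format_chat_naive([
    {"role": "system", "content": "You are a helpful assistant."},  # ignored
    {"role": "user", "content": "Hello!"},
]))
```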
Describe the solution you'd like
- Add a set of built-in prompt templates the user can specify at inference time, at minimum `["vicuna", "alpaca", "chatml", "llama2-chat", "oasst"]`.
- Recommend copying the design from ooba's instruction templates or FastChat's conversation templates.
- Add the ability to pass a custom template string for other nonstandard formats (such as the one currently implemented in llama-cpp-python); see the sketch after this list.
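As a rough sketch, one possible shape for this is a registry of built-in templates plus a raw template string as an escape hatch. Every name here (`PROMPT_TEMPLATES`, `format_prompt`) and the exact template strings are hypothetical illustrations, not an existing llama-cpp-python API.

```python
# Hypothetical sketch of the requested feature; names and template
# strings are illustrative only, not an existing llama-cpp-python API.
PROMPT_TEMPLATES = {
    "llama2-chat": "[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]",
    "chatml": (
        "<|im_start|>system\n{system}<|im_end|>\n"
        "<|im_start|>user\n{user}<|im_end|>\n"
        "<|im_start|>assistant\n"
    ),
    "vicuna": "{system}\n\nUSER: {user}\nASSISTANT:",
}

def format_prompt(template: str, system: str, user: str) -> str:
    """Resolve either a built-in template name or a raw template string."""
    fmt = PROMPT_TEMPLATES.get(template, template)
    return fmt.format(system=system, user=user)

# Built-in template selected at inference time:
print(format_prompt("chatml", "You are helpful.", "Hi!"))
# User-supplied template string for a nonstandard format:
print(format_prompt("### User: {user}\n### Assistant:", "", "Hi!"))
```

A real implementation would also need to handle multi-turn histories and per-template stop tokens; the point is only that a named built-in and an arbitrary template string can resolve through the same path.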
Describe alternatives you've considered
Modifying llama-cpp-python to hard-code the llama2-chat format instead; not a great solution.
Additional context