
Server: add function calling API #5588

Closed
ngxson opened this issue Feb 19, 2024 · 10 comments · May be fixed by #5695
Labels: demo (Demonstrate some concept or idea, not intended to be merged) · enhancement (New feature or request) · server/webui · stale

Comments

ngxson (Collaborator) commented Feb 19, 2024

Motivation

This subject was already brought up in #4216, but my initial research failed.

Recently, I discovered a new line of models designed specifically for this usage: https://github.com/MeetKai/functionary

This model can decide whether to call functions (and which function to call) in a given context. The chat template looks like this:

{#v2.2#}
{% for message in messages %}
  {% if message['role'] == 'user' or message['role'] == 'system' %}
    {{ '<|from|>' + message['role'] + '\n<|recipient|>all\n<|content|>' + message['content'] + '\n' }}
  {% elif message['role'] == 'tool' %}
    {{ '<|from|>' + message['name'] + '\n<|recipient|>all\n<|content|>' + message['content'] + '\n' }}
  {% else %}
    {% set contain_content='no'%}
    {% if message['content'] is not none %}
      {{ '<|from|>assistant\n<|recipient|>all\n<|content|>' + message['content'] }}
      {% set contain_content='yes'%}
    {% endif %}
    {% if 'tool_calls' in message and message['tool_calls'] is not none %}
      {% for tool_call in message['tool_calls'] %}
        {% set prompt='<|from|>assistant\n<|recipient|>' + tool_call['function']['name'] + '\n<|content|>' + tool_call['function']['arguments'] %}
        {% if loop.index == 1 and contain_content == "no" %}
          {{ prompt }}
        {% else %}
          {{ '\n' + prompt}}
        {% endif %}
      {% endfor %}
    {% endif %}
    {{ '<|stop|>\n' }}
  {% endif %}
{% endfor %}
{% if add_generation_prompt %}
  {{ '<|from|>assistant\n<|recipient|>' }}
{% endif %}

Example:

<|from|>system
<|recipient|>all
<|content|>// Supported function definitions that should be called when necessary.
namespace functions {
// Get the current weather
type get_current_weather = (_: {
// The city and state, e.g. San Francisco, CA
location: string,
}) => any;
} // namespace functions
<|from|>system
<|recipient|>all
<|content|>A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. The assistant calls functions with appropriate input when necessary
<|from|>user
<|recipient|>all
<|content|>What is the weather for Istanbul?
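For illustration, here is a minimal Python sketch of how an OpenAI-style tools array could be rendered into the TypeScript-like "namespace functions" block shown above. The helper name and the exact JSON-schema-to-TypeScript mapping are my assumptions, not MeetKai's actual generator:

# Sketch: render an OpenAI-style `tools` array into the TypeScript-like
# schema block used in functionary's system message. Simplified type
# mapping; optional parameters get a trailing "?".
def tools_to_typescript(tools: list[dict]) -> str:
    ts_types = {"string": "string", "number": "number",
                "integer": "number", "boolean": "boolean"}
    lines = ["// Supported function definitions that should be called when necessary.",
             "namespace functions {"]
    for tool in tools:
        fn = tool["function"]
        lines += [f"// {fn.get('description', '')}",
                  f"type {fn['name']} = (_: {{"]
        params = fn.get("parameters", {})
        required = set(params.get("required", []))
        for name, schema in params.get("properties", {}).items():
            if "description" in schema:
                lines.append(f"// {schema['description']}")
            opt = "" if name in required else "?"
            lines.append(f"{name}{opt}: {ts_types.get(schema.get('type'), 'any')},")
        lines.append("}) => any;")
    lines.append("} // namespace functions")
    return "\n".join(lines)

tools = [{"type": "function", "function": {
    "name": "get_current_weather",
    "description": "Get the current weather",
    "parameters": {"type": "object",
                   "properties": {"location": {
                       "type": "string",
                       "description": "The city and state, e.g. San Francisco, CA"}},
                   "required": ["location"]}}}]
print(tools_to_typescript(tools))  # reproduces the namespace block above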

Possible implementation

Since this is the only publicly available model that can do this, it's quite risky to modify llama_chat_apply_template to support it (we may end up polluting the code base).

The idea is to first keep the implementation in the server example; then, when the template becomes more mainstream, we can adopt it in llama_chat_apply_template.

Data passing in the direction from user ==> model (input direction)

  • Add a function in the server example to parse the input request and format the prompt (see the sketch below). Attention: with function calling, we will have two types of system messages: one for the actual prompt ("You are a helpful assistant") and one for the function definitions.
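A minimal sketch of that formatting step in Python, mirroring the v2.2 template above (the function name and message handling are illustrative, not the server's actual code):

def format_functionary_prompt(messages: list[dict],
                              add_generation_prompt: bool = True) -> str:
    # Mirrors the Jinja template: user/system/tool messages are one block
    # each; an assistant turn may mix text and tool calls and ends in <|stop|>.
    parts = []
    for msg in messages:
        role, content = msg["role"], msg.get("content")
        if role in ("user", "system"):
            parts.append(f"<|from|>{role}\n<|recipient|>all\n<|content|>{content}\n")
        elif role == "tool":
            # tool results are attributed to the function's own name
            parts.append(f"<|from|>{msg['name']}\n<|recipient|>all\n<|content|>{content}\n")
        else:  # assistant
            chunks = []
            if content is not None:
                chunks.append(f"<|from|>assistant\n<|recipient|>all\n<|content|>{content}")
            for call in msg.get("tool_calls") or []:
                fn = call["function"]
                chunks.append(f"<|from|>assistant\n<|recipient|>{fn['name']}"
                              f"\n<|content|>{fn['arguments']}")
            parts.append("\n".join(chunks) + "<|stop|>\n")
    if add_generation_prompt:
        parts.append("<|from|>assistant\n<|recipient|>")
    return "".join(parts)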

Data passing in the direction from model ==> user (output direction)

  • Add a grammar to force the model to output JSON when it is inside a function-argument message
  • Add a parser to extract function arguments and return them as JSON (a sketch of the parser follows)
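And a sketch of the reverse direction, parsing raw model output back into OpenAI-style content/tool_calls fields (an illustration of the parser bullet only; grammar enforcement is a separate step):

import re

def parse_functionary_output(text: str) -> dict:
    # Assumes generation started after "<|from|>assistant\n<|recipient|>",
    # so the completion begins with a recipient name ("all" or a function).
    content_parts, tool_calls = [], []
    for seg in re.split(r"<\|from\|>assistant\n<\|recipient\|>",
                        text.replace("<|stop|>", "")):
        if not seg.strip():
            continue
        recipient, _, payload = seg.partition("\n<|content|>")
        if recipient.strip() == "all":
            content_parts.append(payload.strip())
        else:  # a function call; arguments stay a JSON-encoded string
            tool_calls.append({"type": "function",
                               "function": {"name": recipient.strip(),
                                            "arguments": payload.strip()}})
    return {"role": "assistant",
            "content": "\n".join(content_parts) or None,
            "tool_calls": tool_calls or None}

print(parse_functionary_output(
    'get_current_weather\n<|content|>{"location": "Istanbul"}<|stop|>'))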
ngxson added the enhancement (New feature or request) label Feb 19, 2024
ngxson (Collaborator, Author) commented Feb 19, 2024

Research on MeetKai's implementation

My python snippet: https://gist.github.com/ngxson/c477fd9fc8e0a25c52ff4aa6129dc7a1

Key things to notice:


Link to OAI docs for tool_calls: https://platform.openai.com/docs/api-reference/chat/create#chat-create-tools
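For reference, the assistant-message shape those docs define, written as a Python literal (note that arguments is a JSON-encoded string, not a nested object):

# OpenAI-style assistant message carrying a tool call.
assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_abc123",  # identifier generated by the server
        "type": "function",
        "function": {"name": "get_current_weather",
                     "arguments": "{\"location\": \"Istanbul\"}"},
    }],
}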

ngxson changed the title from "Server: add function calling API" to "[need investigation] Server: add function calling API" Feb 23, 2024
github-actions bot added the stale label Mar 25, 2024
github-actions bot commented Apr 8, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions bot closed this as completed Apr 8, 2024
teleprint-me (Contributor) commented Apr 8, 2024

I'm actually ahead of schedule and waiting on some things to finish up. I thought I would be busy until mid-April, but I might have some free time sooner than I thought. I mention this because, while some models are trained to use tools, I've noticed some models are smart enough to do it on their own with the right amount of prompting.

I'm planning on implementing the proof of concept in more detail in a simplified and streamlined way.

There's also a fine-tuned mistral model trained to do this as well

I don't think it needs it, but it probably helps reduce the amount of context necessary to orient it.

@abetlen also has the functionary model.

I was "discussing" it with the Mistral 7B v0.2 model quantized to Q4_0 and it understood exactly what I wanted, but this was only after I provided it with the appropriate context. It did surprisingly well regardless.

The only reason I really care about this is because I want the models to have a "memory" via a SQLite database. It's something I've been working on for over a year because I genuinely do not like "RAG" which is just a Q & A with segmentation and language models. I never really liked it and always felt dissatisfied with it.

phymbert removed the stale label Apr 8, 2024
phymbert reopened this Apr 8, 2024
skoulik commented Apr 30, 2024

Let me cast my humble vote in favour of this issue. It seems that agent capability is going to be the next big thing in LLMs. I mean, seriously, chat and RAG are supported by literally every possible toolkit, with all their simplicity and limitations, but in order to keep up with big tech the open-source community must move on.
Ok, enough talk.

My goal is to be able to run (at the very least) this: https://docs.llamaindex.ai/en/stable/examples/agent/openai_agent/ or this: https://github.com/abetlen/llama-cpp-python/blob/main/examples/notebooks/Functions.ipynb
but with the llama.cpp server as a backend, directly or with a wrapper/adaptor. Currently it fails, obviously, with
openai.InternalServerError: Error code: 500 - {'error': {'code': 500, 'message': 'Unsupported param: tools', 'type': 'server_error'}}

I have yet to explore @ngxson's #5695 solution. It seems, though, that it is geared towards MeetKai (can anyone confirm this?), while we need a universal solution that can support the llama-3, OpenAI, etc. interfaces. To my intermediate understanding, the support boils down to a set of prompt templates appropriate for a particular model (can anyone confirm this, too?). I am particularly interested in llama-3-instruct model support.

I have found a similar solution that works with the llama.cpp server (more or less); see: https://github.com/Maximilian-Winter/llama-cpp-agent
Unfortunately, it is not compatible with llamaindex out of the box.

ngxson (Collaborator, Author) commented Apr 30, 2024

To my intermediate understanding, the support boils down to a set of prompt templates appropriate for a particular model

Yes, that's correct. Function calling is simply a more complicated chat template.

When I first started this PR, MeetKai's was the only open-source model to implement this idea. Of course we have many new models now, but the problem is still the same as with chat templates: there is no "standard" way; each model uses its own template.

Also, because we have more visibility now (i.e. more models from which to see the pattern), I'm planning to remake all of this, maybe as a dedicated side project: a wrapper for llama.cpp's server, because it will be quite messy. Then we will see if one day we can merge it back into llama.cpp.

ngxson added the demo (Demonstrate some concept or idea, not intended to be merged) label Apr 30, 2024
skoulik commented May 1, 2024

Hi @ngxson, thank you for getting back to me.
I've quickly skimmed through your commits and haven't found mentions of JSON-to-grammar conversion (https://github.com/ggerganov/llama.cpp/tree/master/grammars). (Or have I just missed it?) If this is the case, it is something worth exploring. Grammars that restrict a model's output have been shown to greatly increase the quality of function-calling output (a random but relevant fact that I've learned googling around).
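For what it's worth, a minimal sketch of that idea against the llama.cpp server: the /completion endpoint accepts a grammar field containing a GBNF grammar, which can constrain the function-argument output to valid JSON. The toy grammar below only admits a {"location": "<string>"} object; the URL, port, and prompt are assumptions:

import json, urllib.request

# Toy GBNF grammar: a single-key JSON object with a string value.
GRAMMAR = r'''
root   ::= "{" ws "\"location\"" ws ":" ws string ws "}"
string ::= "\"" [^"]* "\""
ws     ::= [ \t\n]*
'''

body = json.dumps({
    "prompt": "<|from|>assistant\n<|recipient|>get_current_weather\n<|content|>",
    "grammar": GRAMMAR,
    "n_predict": 64,
}).encode()

req = urllib.request.Request("http://localhost:8080/completion", data=body,
                             headers={"Content-Type": "application/json"})
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["content"])  # grammar-conforming JSON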

teleprint-me (Contributor) commented May 1, 2024

@ngxson @skoulik #6389

skoulik commented May 1, 2024

@ngxson @skoulik #6389

This seems to be it. Great!

github-actions bot added the stale label Jun 1, 2024
github-actions bot commented

This issue was closed because it has been inactive for 14 days since being marked as stale.

foldl (Contributor) commented Jun 17, 2024

These models support function calling (without fine-tuning):

  • ChatGLM3/GLM-4
  • Mistral v0.3
  • Qwen v1.5 & v2

For Qwen, function calling can be implemented outside the inference application; a sketch follows.
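A generic illustration of that approach in Python (the prompt wording and the ```json fence convention are assumptions, not Qwen's official tool format):

import json, re

# The client, not the inference engine, owns the tool protocol: tools are
# described in the system prompt, and a JSON "action" is parsed from the
# raw completion.
TOOLS = [{"name": "get_current_weather",
          "description": "Get the current weather",
          "parameters": {"location": "The city and state, e.g. San Francisco, CA"}}]

SYSTEM = ("You may call a tool by replying with a JSON object inside a "
          '```json fence, e.g. {"tool": "...", "arguments": {...}}.\n'
          "Available tools: " + json.dumps(TOOLS))

def extract_tool_call(completion: str):
    m = re.search(r"```json\s*(\{.*?\})\s*```", completion, re.DOTALL)
    return json.loads(m.group(1)) if m else None

print(extract_tool_call('Let me check.\n```json\n'
                        '{"tool": "get_current_weather", '
                        '"arguments": {"location": "Istanbul"}}\n```'))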

I have implemented these in chatllm.cpp.
