
Server: add function calling API #5588

Closed
ngxson opened this issue Feb 19, 2024 · 10 comments · May be fixed by #5695
Labels: demo (Demonstrate some concept or idea, not intended to be merged) · enhancement (New feature or request) · server/webui · stale

Comments

ngxson (Collaborator) commented Feb 19, 2024

Motivation

This subject was already brought up in #4216, but my initial research failed.

Recently, I discovered a new line of models designed specifically for this usage: https://github.com/MeetKai/functionary

This model can decide whether to call functions (and which function to call) in a given context. The chat template looks like this:

{#v2.2#}
{% for message in messages %}
  {% if message['role'] == 'user' or message['role'] == 'system' %}
    {{ '<|from|>' + message['role'] + '\n<|recipient|>all\n<|content|>' + message['content'] + '\n' }}
  {% elif message['role'] == 'tool' %}
    {{ '<|from|>' + message['name'] + '\n<|recipient|>all\n<|content|>' + message['content'] + '\n' }}
  {% else %}
    {% set contain_content='no'%}
    {% if message['content'] is not none %}
      {{ '<|from|>assistant\n<|recipient|>all\n<|content|>' + message['content'] }}
      {% set contain_content='yes'%}
    {% endif %}
    {% if 'tool_calls' in message and message['tool_calls'] is not none %}
      {% for tool_call in message['tool_calls'] %}
        {% set prompt='<|from|>assistant\n<|recipient|>' + tool_call['function']['name'] + '\n<|content|>' + tool_call['function']['arguments'] %}
        {% if loop.index == 1 and contain_content == "no" %}
          {{ prompt }}
        {% else %}
          {{ '\n' + prompt}}
        {% endif %}
      {% endfor %}
    {% endif %}
    {{ '<|stop|>\n' }}
  {% endif %}
{% endfor %}
{% if add_generation_prompt %}
  {{ '<|from|>assistant\n<|recipient|>' }}
{% endif %}

Example:

<|from|>system
<|recipient|>all
<|content|>// Supported function definitions that should be called when necessary.
namespace functions {
// Get the current weather
type get_current_weather = (_: {
// The city and state, e.g. San Francisco, CA
location: string,
}) => any;
} // namespace functions
<|from|>system
<|recipient|>all
<|content|>A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. The assistant calls functions with appropriate input when necessary
<|from|>user
<|recipient|>all
<|content|>What is the weather for Istanbul?
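For illustration, here is a minimal Python sketch of how an OpenAI-style tools array could be rendered into the TypeScript-like "namespace functions" block shown above. The helper name and the exact JSON-schema-to-TypeScript mapping are my assumptions, not MeetKai's actual generator:

# Sketch: render an OpenAI-style `tools` array into the TypeScript-like
# schema block used in functionary's system message. Simplified type
# mapping; optional parameters get a trailing "?".
def tools_to_typescript(tools: list[dict]) -> str:
    ts_types = {"string": "string", "number": "number",
                "integer": "number", "boolean": "boolean"}
    lines = ["// Supported function definitions that should be called when necessary.",
             "namespace functions {"]
    for tool in tools:
        fn = tool["function"]
        lines += [f"// {fn.get('description', '')}",
                  f"type {fn['name']} = (_: {{"]
        params = fn.get("parameters", {})
        required = set(params.get("required", []))
        for name, schema in params.get("properties", {}).items():
            if "description" in schema:
                lines.append(f"// {schema['description']}")
            opt = "" if name in required else "?"
            lines.append(f"{name}{opt}: {ts_types.get(schema.get('type'), 'any')},")
        lines.append("}) => any;")
    lines.append("} // namespace functions")
    return "\n".join(lines)

tools = [{"type": "function", "function": {
    "name": "get_current_weather",
    "description": "Get the current weather",
    "parameters": {"type": "object",
                   "properties": {"location": {
                       "type": "string",
                       "description": "The city and state, e.g. San Francisco, CA"}},
                   "required": ["location"]}}}]
print(tools_to_typescript(tools))  # reproduces the namespace block above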

Possible implementation

Since this is the only publicly available model that can do this, it's quite risky to modify llama_chat_apply_template to support it (we may end up polluting the code base).

The idea is to first keep the implementation in the server example; then, when the template becomes more mainstream, we can adopt it in llama_chat_apply_template.

Data passing in the direction from user ==> model (input direction)

  • Add a function in the server example to parse the input request and format the prompt (see the sketch below). Attention: with function calling, we will have two types of system messages: one for the actual prompt ("You are a helpful assistant") and one for the function definitions.
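A minimal sketch of that formatting step in Python, mirroring the v2.2 template above (the function name and message handling are illustrative, not the server's actual code):

def format_functionary_prompt(messages: list[dict],
                              add_generation_prompt: bool = True) -> str:
    # Mirrors the Jinja template: user/system/tool messages are one block
    # each; an assistant turn may mix text and tool calls and ends in <|stop|>.
    parts = []
    for msg in messages:
        role, content = msg["role"], msg.get("content")
        if role in ("user", "system"):
            parts.append(f"<|from|>{role}\n<|recipient|>all\n<|content|>{content}\n")
        elif role == "tool":
            # tool results are attributed to the function's own name
            parts.append(f"<|from|>{msg['name']}\n<|recipient|>all\n<|content|>{content}\n")
        else:  # assistant
            chunks = []
            if content is not None:
                chunks.append(f"<|from|>assistant\n<|recipient|>all\n<|content|>{content}")
            for call in msg.get("tool_calls") or []:
                fn = call["function"]
                chunks.append(f"<|from|>assistant\n<|recipient|>{fn['name']}"
                              f"\n<|content|>{fn['arguments']}")
            parts.append("\n".join(chunks) + "<|stop|>\n")
    if add_generation_prompt:
        parts.append("<|from|>assistant\n<|recipient|>")
    return "".join(parts)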

Data passing in the direction from model ==> user (output direction)

  • Add a grammar to force the model to output JSON when it is inside a function-argument message
  • Add a parser to extract function arguments and return them as JSON (a sketch of the parser follows)
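And a sketch of the reverse direction, parsing raw model output back into OpenAI-style content/tool_calls fields (an illustration of the parser bullet only; grammar enforcement is a separate step):

import re

def parse_functionary_output(text: str) -> dict:
    # Assumes generation started after "<|from|>assistant\n<|recipient|>",
    # so the completion begins with a recipient name ("all" or a function).
    content_parts, tool_calls = [], []
    for seg in re.split(r"<\|from\|>assistant\n<\|recipient\|>",
                        text.replace("<|stop|>", "")):
        if not seg.strip():
            continue
        recipient, _, payload = seg.partition("\n<|content|>")
        if recipient.strip() == "all":
            content_parts.append(payload.strip())
        else:  # a function call; arguments stay a JSON-encoded string
            tool_calls.append({"type": "function",
                               "function": {"name": recipient.strip(),
                                            "arguments": payload.strip()}})
    return {"role": "assistant",
            "content": "\n".join(content_parts) or None,
            "tool_calls": tool_calls or None}

print(parse_functionary_output(
    'get_current_weather\n<|content|>{"location": "Istanbul"}<|stop|>'))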
ngxson added the enhancement (New feature or request) label Feb 19, 2024
ngxson (Collaborator, Author) commented Feb 19, 2024

Research on MeetKai's implementation

My python snippet: https://gist.github.com/ngxson/c477fd9fc8e0a25c52ff4aa6129dc7a1

Key things to notice:


Link to OAI docs for tool_calls: https://platform.openai.com/docs/api-reference/chat/create#chat-create-tools
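For reference, the assistant-message shape those docs define, written as a Python literal (note that arguments is a JSON-encoded string, not a nested object):

# OpenAI-style assistant message carrying a tool call.
assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_abc123",  # identifier generated by the server
        "type": "function",
        "function": {"name": "get_current_weather",
                     "arguments": "{\"location\": \"Istanbul\"}"},
    }],
}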

ngxson changed the title from "Server: add function calling API" to "[need investigation] Server: add function calling API" Feb 23, 2024
github-actions bot added the stale label Mar 25, 2024
github-actions bot commented Apr 8, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions bot closed this as completed Apr 8, 2024
teleprint-me (Contributor) commented Apr 8, 2024

I'm actually ahead of schedule and waiting on some things to finish up. I thought I would be busy until mid-April, but I might have some free time sooner than I thought. I mention this because, while some models are trained to use tools, I've noticed some models are smart enough to do it on their own with the right amount of prompting.

I'm planning on implementing the proof of concept in more detail in a simplified and streamlined way.

There's also a fine-tuned mistral model trained to do this as well

I don't think it needs it, but it probably helps reduce the amount of context necessary to orient it.

@abetlen also has the functionary model.

I was "discussing" it with the Mistral 7B v0.2 model quantized to Q4_0 and it understood exactly what I wanted, but this was only after I provided it with the appropriate context. It did surprisingly well regardless.

The only reason I really care about this is because I want the models to have a "memory" via a SQLite database. It's something I've been working on for over a year because I genuinely do not like "RAG" which is just a Q & A with segmentation and language models. I never really liked it and always felt dissatisfied with it.

phymbert removed the stale label Apr 8, 2024
phymbert reopened this Apr 8, 2024
skoulik commented Apr 30, 2024

Let me cast my humble vote in favour of this issue. It seems that agent capability is going to be the next big thing in LLMs. I mean, seriously, chat and RAG are supported by literally every possible toolkit, with all their simplicity and limitations, but in order to keep up with big tech the open-source community must move on.
Ok, enough talk.

My goal is to be able to run (at the very least) this: https://docs.llamaindex.ai/en/stable/examples/agent/openai_agent/ or this: https://github.com/abetlen/llama-cpp-python/blob/main/examples/notebooks/Functions.ipynb
but with the llama.cpp server as a backend, directly or with a wrapper/adaptor. Currently it fails, obviously, with
openai.InternalServerError: Error code: 500 - {'error': {'code': 500, 'message': 'Unsupported param: tools', 'type': 'server_error'}}

I have yet to explore @ngxson's #5695 solution. It seems, though, that it is geared towards MeetKai (can anyone confirm this?), while we need a universal solution that can support the llama-3, OpenAI, etc. interfaces. To my intermediate understanding, the support boils down to a set of prompt templates appropriate for a particular model (can anyone confirm this, too?). I am particularly interested in llama-3-instruct model support.

I have found a similar solution that works with the llama.cpp server (more or less); see: https://github.com/Maximilian-Winter/llama-cpp-agent
Unfortunately, it is not compatible with llamaindex out of the box.

ngxson (Collaborator, Author) commented Apr 30, 2024

To my intermediate understanding, the support boils down to a set of prompt templates appropriate for a particular model

Yes, that's correct. Function calling is simply a more complicated chat template.

When I first started this PR, MeetKai's was the only open-source model to implement this idea. Of course we have many new models now, but the problem is still the same as with chat templates: there is no "standard" way; each model uses its own template.

Also, because we have more visibility now (i.e. more models from which to see the pattern), I'm planning to remake all of this, maybe as a dedicated side project: a wrapper for llama.cpp's server, because it will be quite messy. Then we will see if one day we can merge it back into llama.cpp.

ngxson added the demo (Demonstrate some concept or idea, not intended to be merged) label Apr 30, 2024
skoulik commented May 1, 2024

Hi @ngxson, thank you for getting back to me.
I've quickly skimmed through your commits and haven't found mentions of JSON-to-grammar conversion (https://github.com/ggerganov/llama.cpp/tree/master/grammars). (Or have I just missed it?) If this is the case, it is something worth exploring. Grammars that restrict a model's output have been shown to greatly increase the quality of function-calling output (a random but relevant fact that I've learned googling around).
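For what it's worth, a minimal sketch of that idea against the llama.cpp server: the /completion endpoint accepts a grammar field containing a GBNF grammar, which can constrain the function-argument output to valid JSON. The toy grammar below only admits a {"location": "<string>"} object; the URL, port, and prompt are assumptions:

import json, urllib.request

# Toy GBNF grammar: a single-key JSON object with a string value.
GRAMMAR = r'''
root   ::= "{" ws "\"location\"" ws ":" ws string ws "}"
string ::= "\"" [^"]* "\""
ws     ::= [ \t\n]*
'''

body = json.dumps({
    "prompt": "<|from|>assistant\n<|recipient|>get_current_weather\n<|content|>",
    "grammar": GRAMMAR,
    "n_predict": 64,
}).encode()

req = urllib.request.Request("http://localhost:8080/completion", data=body,
                             headers={"Content-Type": "application/json"})
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["content"])  # grammar-conforming JSON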

teleprint-me (Contributor) commented May 1, 2024

@ngxson @skoulik #6389

skoulik commented May 1, 2024

@ngxson @skoulik #6389

This seems to be it. Great!

github-actions bot added the stale label Jun 1, 2024
github-actions bot commented

This issue was closed because it has been inactive for 14 days since being marked as stale.

foldl (Contributor) commented Jun 17, 2024

These models support function calling (without fine-tuning):

  • ChatGLM3/GLM-4
  • Mistral v0.3
  • Qwen v1.5 & v2

For Qwen, function calling can be implemented outside the inference application; a sketch follows.
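A generic illustration of that approach in Python (the prompt wording and the ```json fence convention are assumptions, not Qwen's official tool format):

import json, re

# The client, not the inference engine, owns the tool protocol: tools are
# described in the system prompt, and a JSON "action" is parsed from the
# raw completion.
TOOLS = [{"name": "get_current_weather",
          "description": "Get the current weather",
          "parameters": {"location": "The city and state, e.g. San Francisco, CA"}}]

SYSTEM = ("You may call a tool by replying with a JSON object inside a "
          '```json fence, e.g. {"tool": "...", "arguments": {...}}.\n'
          "Available tools: " + json.dumps(TOOLS))

def extract_tool_call(completion: str):
    m = re.search(r"```json\s*(\{.*?\})\s*```", completion, re.DOTALL)
    return json.loads(m.group(1)) if m else None

print(extract_tool_call('Let me check.\n```json\n'
                        '{"tool": "get_current_weather", '
                        '"arguments": {"location": "Istanbul"}}\n```'))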

I have implemented these in chatllm.cpp.
