tool-call: add support for tool-calls using Model Context Protocol #11556

Open

bandoti wants to merge 86 commits into master from llamacli-tools

Commits (86)
183029d
Add tools option to llama-cli
bandoti Jan 31, 2025
4ad8258
tools_json_arr now properly passed to apply-template
bandoti Jan 31, 2025
352f79c
Merge branch 'master' into llamacli-tools
bandoti Feb 3, 2025
becf9b4
add tool-choice parameter
bandoti Feb 4, 2025
cd16957
Add variant include
bandoti Feb 4, 2025
4e8beb0
Reset tools when empty string provided
bandoti Feb 4, 2025
3437080
Pass template group to common_chat_apply_template
bandoti Feb 4, 2025
36c2f38
Merge branch 'master' into llamacli-tools
bandoti Feb 4, 2025
a30111b
Merge branch 'ggerganov:master' into llamacli-tools
bandoti Feb 5, 2025
a726ada
Copy sampler parameters from chat template
bandoti Feb 5, 2025
a024747
Merge branch 'master' into llamacli-tools
bandoti Feb 12, 2025
1dd2e3b
Add handler and MCP message types
bandoti Feb 13, 2025
6458c71
Merge branch 'master' into llamacli-tools
bandoti Feb 13, 2025
b41f57c
Comment out unused parameters
bandoti Feb 13, 2025
e7efd7c
Remove tabs
bandoti Feb 13, 2025
2c07ce7
Only use MCP handler with non-empty string
bandoti Feb 13, 2025
b67a04c
Switch to compile-time polymorphic message types
bandoti Feb 14, 2025
20a19f8
Add tools/list request
bandoti Feb 14, 2025
9dbe42f
Add tools/list response
bandoti Feb 14, 2025
93b54e4
Tokenize output from toolcall response
bandoti Feb 14, 2025
99f2fe3
Add MCP sse/stdio transport types
bandoti Feb 14, 2025
3309b58
Fix indent
bandoti Feb 14, 2025
376fbba
throw exceptions in stdio transport for now
bandoti Feb 14, 2025
80e6790
Only include SSE transport when LLAMA_CURL is set
bandoti Feb 14, 2025
ff44762
Split toolcall params into separate files
bandoti Feb 15, 2025
a9e3404
Separate tool-call from template application
bandoti Feb 15, 2025
608304f
Merge branch 'master' into llamacli-tools
bandoti Feb 15, 2025
a345aa9
Merge branch 'llamacli-tools-sse' into llamacli-tools
bandoti Feb 15, 2025
7b93c31
Add noreturn to stdio transport methods
bandoti Feb 15, 2025
60bca9c
Squashed commit of the following:
bandoti Feb 19, 2025
6ce23b6
Merge branch 'master' into llamacli-tools
bandoti Feb 19, 2025
f2af859
Post-Merge refactoring
bandoti Feb 19, 2025
90efb90
Rearrange the furniture!
bandoti Feb 19, 2025
78a8d90
Fix input processing
bandoti Feb 19, 2025
4d81086
Clean up some header inclusions
bandoti Feb 19, 2025
3b0dd4e
Split toolcall into separate library
bandoti Feb 20, 2025
8668d89
Convert chat_add_and_format to functor
bandoti Feb 21, 2025
a19ed47
Enable LLAMA_TOOLCALL by default (for now)
bandoti Feb 21, 2025
5c0b0cb
Use cxx_std_17
bandoti Feb 21, 2025
3e46978
Impl. initialize and tool_list routines
bandoti Feb 21, 2025
5d6a058
Store callbacks in map
bandoti Feb 21, 2025
b0d3162
No need to explicitly convert int to string
bandoti Feb 21, 2025
ba57885
Initialize tc_handler
bandoti Feb 21, 2025
88bace3
Impl. tools_list_to_oai_json
bandoti Feb 21, 2025
b2c340d
Remove mcp+ URI prefix
bandoti Feb 21, 2025
7d58e9b
WIP: fixing SSE issues
bandoti Feb 22, 2025
ea4cc2f
Add timeout to initialize routine
bandoti Feb 22, 2025
86f83f3
Allow send routine to lock
bandoti Feb 22, 2025
c07a452
Handle relative URI returned from SSE
bandoti Feb 23, 2025
3a11fa2
Fix CRLF case
bandoti Feb 23, 2025
3418d37
Strip trailing NL from endpoint event URI
bandoti Feb 23, 2025
7d6c29b
Explicitly create empty capability object
bandoti Feb 23, 2025
40156ff
Add tools_list_response::fromJson
bandoti Feb 23, 2025
8a3497b
Convert tool list string
bandoti Feb 23, 2025
1209b95
Only invoke toolcall with valid JSON
bandoti Feb 24, 2025
52b98d8
Add tool-call request/response types
bandoti Feb 24, 2025
606993d
Move transport message-dispatch to base type
bandoti Feb 24, 2025
67438a3
Add tools_call_response fromJson
bandoti Feb 24, 2025
7a23d06
Implement call routine
bandoti Feb 24, 2025
c2e531a
Fix whitespace
bandoti Feb 25, 2025
ce5c46c
Preserver argument value
bandoti Feb 25, 2025
850e043
Refactor tool/call response
bandoti Feb 25, 2025
0b52627
Add missing header
bandoti Feb 25, 2025
66eff76
Move tool-call invocation into main loop
bandoti Feb 25, 2025
7b076ee
Merge branch 'master' into llamacli-tools
bandoti Feb 25, 2025
a097b4f
Add tighter check before running toolcalls
bandoti Feb 25, 2025
9db9686
Remove toolcall dependency from common
bandoti Feb 25, 2025
e8dd857
Rename handler -> client to reflect MCP terminology
bandoti Feb 25, 2025
d1b12b8
Merge branch 'master' into llamacli-tools
bandoti Mar 2, 2025
f354ff9
Ensure toolcalls are registered when no -sys provided
bandoti Mar 2, 2025
34697cd
Merge branch 'master' into llamacli-tools
bandoti Mar 4, 2025
8871c8d
Add toolcall output after single-turn run
bandoti Mar 4, 2025
95ed663
Merge branch 'master' into llamacli-tools
bandoti Mar 5, 2025
46766c1
Update grammar_trigger processing
bandoti Mar 5, 2025
ac1fc31
WIP: use common_chat_parse for toolcall
bandoti Mar 5, 2025
ba098af
Extract toolcall format from model
bandoti Mar 5, 2025
787fa89
Oops
bandoti Mar 5, 2025
c36c7e6
Squashed commit of the following:
bandoti Mar 6, 2025
b25fc0d
Merge branch 'master' into llamacli-tools
bandoti Mar 10, 2025
f5c209f
Sync trigger-token fix ggml-org#12291
bandoti Mar 10, 2025
4e378fb
Clear assistant_ss before returning control to loop
bandoti Mar 10, 2025
ff18e24
Revert changes to common_chat_format_single
bandoti Mar 10, 2025
1e67578
Merge branch 'master' into llamacli-tools
bandoti Mar 17, 2025
ee2dad2
Merge branch 'master' into llamacli-tools
bandoti Mar 21, 2025
9a339c9
Merge branch 'master' into llamacli-tools
bandoti Apr 1, 2025
19dc0a2
Merge branch 'master' into llamacli-tools
bandoti Apr 3, 2025
7 changes: 7 additions & 0 deletions CMakeLists.txt
@@ -84,6 +84,9 @@ option(LLAMA_BUILD_SERVER "llama: build server example" ${LLAMA_STANDALONE})
option(LLAMA_CURL "llama: use libcurl to download model from an URL" OFF)
option(LLAMA_LLGUIDANCE "llama-common: include LLGuidance library for structured output in common utils" OFF)

# Toolcall support - needs LLAMA_CURL support to connect with SSE endpoints
option(LLAMA_TOOLCALL "llama: add toolcall support via Model Context Protocol" ON)

# Required for relocatable CMake package
include(${CMAKE_CURRENT_SOURCE_DIR}/cmake/build-info.cmake)
include(${CMAKE_CURRENT_SOURCE_DIR}/cmake/common.cmake)
@@ -168,6 +171,10 @@ add_subdirectory(src)
# utils, programs, examples and tests
#

if (LLAMA_TOOLCALL)
add_subdirectory(toolcall)
endif()

if (LLAMA_BUILD_COMMON)
add_subdirectory(common)
endif()
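A minimal configure-time sketch for the new option (a fresh build directory is assumed; the option names come from the diff above, and since LLAMA_TOOLCALL defaults to ON in this PR, only LLAMA_CURL needs to be enabled explicitly to reach SSE endpoints):

    cmake -B build -DLLAMA_CURL=ON -DLLAMA_TOOLCALL=ON
    cmake --build build --target llama-cli
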
18 changes: 18 additions & 0 deletions common/arg.cpp
@@ -2740,6 +2740,24 @@ common_params_context common_params_parser_init(common_params & params, llama_ex
std::back_inserter(params.chat_template));
}
).set_examples({LLAMA_EXAMPLE_MAIN, LLAMA_EXAMPLE_SERVER}).set_env("LLAMA_ARG_CHAT_TEMPLATE_FILE"));

add_opt(common_arg(
{"--tools"}, "JINJA_TOOLS",
"set to URI of a Model Context Protocol server, or "
"a JSON array containing tool definitions (requires --jinja)",
[](common_params &params, const std::string & value) {
params.toolcall.tools = value;

}).set_examples({LLAMA_EXAMPLE_MAIN}));

add_opt(common_arg(
{"--tool-choice"}, "JINJA_TOOL_CHOICE",
"set to \"auto\", \"required\", or \"none\" (default: \"auto\")",
[](common_params &params, const std::string & value) {
params.toolcall.choice = value;

}).set_examples({LLAMA_EXAMPLE_MAIN}));

add_opt(common_arg(
{"-sps", "--slot-prompt-similarity"}, "SIMILARITY",
string_format("how much the prompt of a request must match the prompt of a slot in order to use that slot (default: %.2f, 0.0 = disabled)\n", params.slot_prompt_similarity),
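A minimal usage sketch of the two new flags (the model path and MCP server URI are illustrative placeholders, not part of this PR; per the help text above, --tools accepts either an MCP server URI or an inline JSON array of tool definitions and requires --jinja):

    ./build/bin/llama-cli -m model.gguf --jinja \
        --tools http://localhost:8000/sse \
        --tool-choice auto
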
62 changes: 62 additions & 0 deletions common/common.cpp
@@ -8,6 +8,7 @@
#include "common.h"
#include "log.h"
#include "llama.h"
#include "chat.h"

#include <algorithm>
#include <cinttypes>
@@ -1291,6 +1292,67 @@ std::string common_detokenize(const struct llama_vocab * vocab, const std::vecto
return text;
}

void common_chat_grammar_to_sampler(const common_chat_params * src,
const llama_vocab * vocab,
common_params_sampling * sparams)
{
GGML_ASSERT(src && vocab && sparams);

auto & dst = *sparams;

dst.grammar = src->grammar;
dst.grammar_lazy = src->grammar_lazy;

for (const auto & preserved : src->preserved_tokens) {
auto ids = common_tokenize(vocab, preserved, false, true);
if (ids.size() == 1) {
LOG_DBG("Preserved token: %d\n", ids[0]);
dst.preserved_tokens.insert(ids[0]);

} else {
// This may happen when using a tool call style meant for a model
// with special tokens to preserve on a model without said tokens.
LOG_WRN("Not preserved because more than 1 token (wrong chat template override?): %s\n",
preserved.c_str());
}
}

for (const auto & trigger : src->grammar_triggers) {
if (trigger.type == COMMON_GRAMMAR_TRIGGER_TYPE_WORD) {
const auto & word = trigger.value;
auto ids = common_tokenize(vocab, word, /* add_special= */ false, /* parse_special= */ true);

if (ids.size() == 1) {
auto token = ids[0];
auto found = std::find(dst.preserved_tokens.begin(), dst.preserved_tokens.end(),
(llama_token) token);

if (found == dst.preserved_tokens.end()) {
throw std::runtime_error("Grammar trigger word should be marked as preserved token: " + word);
}

LOG_DBG("Grammar trigger token: %d (`%s`)\n", token, word.c_str());
common_grammar_trigger trigger;
trigger.type = COMMON_GRAMMAR_TRIGGER_TYPE_TOKEN;
trigger.value = word;
trigger.token = token;
dst.grammar_triggers.push_back(std::move(trigger));

} else {
LOG_DBG("Grammar trigger word: `%s`\n", word.c_str());
dst.grammar_triggers.push_back({COMMON_GRAMMAR_TRIGGER_TYPE_WORD, word});
}

} else {
dst.grammar_triggers.push_back(trigger);
}
}
if (dst.grammar_lazy && dst.grammar_triggers.empty()) {
throw std::runtime_error("Error: no triggers set for lazy grammar!");
}
}


//
// KV cache utils
//
14 changes: 14 additions & 0 deletions common/common.h
@@ -216,6 +216,11 @@ enum common_reasoning_format {
COMMON_REASONING_FORMAT_DEEPSEEK, // Extract thinking tag contents and return as `message.reasoning_content`
};

struct common_toolcall_params {
std::string tools = "";
std::string choice = "auto";
};

struct common_params {
int32_t n_predict = -1; // new tokens to predict
int32_t n_ctx = 4096; // context size
@@ -363,6 +368,9 @@ struct common_params {
std::string chat_template = ""; // NOLINT
bool use_jinja = false; // NOLINT
bool enable_chat_template = true;

struct common_toolcall_params toolcall;

common_reasoning_format reasoning_format = COMMON_REASONING_FORMAT_DEEPSEEK;

std::vector<std::string> api_keys;
@@ -609,6 +617,12 @@ std::string common_detokenize(
const std::vector<llama_token> & tokens,
bool special = true);

struct common_chat_params;
void common_chat_grammar_to_sampler(const common_chat_params * src,
const llama_vocab * vocab,
common_params_sampling * sparams);


//
// KV cache utils
//
5 changes: 5 additions & 0 deletions examples/main/CMakeLists.txt
@@ -2,4 +2,9 @@ set(TARGET llama-cli)
add_executable(${TARGET} main.cpp)
install(TARGETS ${TARGET} RUNTIME)
target_link_libraries(${TARGET} PRIVATE common llama ${CMAKE_THREAD_LIBS_INIT})

if (LLAMA_TOOLCALL)
target_link_libraries(${TARGET} PRIVATE toolcall)
endif()

target_compile_features(${TARGET} PRIVATE cxx_std_17)
143 changes: 130 additions & 13 deletions examples/main/main.cpp
@@ -5,6 +5,7 @@
#include "sampling.h"
#include "llama.h"
#include "chat.h"
#include <json.hpp>

#include <cstdio>
#include <cstring>
@@ -15,6 +16,10 @@
#include <string>
#include <vector>

#ifdef LLAMA_USE_TOOLCALL
# include "toolcall-client.h"
#endif

#if defined (__unix__) || (defined (__APPLE__) && defined (__MACH__))
#include <signal.h>
#include <unistd.h>
@@ -83,6 +88,99 @@ static void sigint_handler(int signo) {
}
#endif

class chat_formatter {
public:

struct result {
std::string formatted;
bool tool_was_called;
};

chat_formatter(common_params & params,
std::vector<common_chat_msg> & chat_msgs,
struct common_chat_templates * chat_templates)

: params_(params), chat_msgs_(chat_msgs), chat_templates_(chat_templates) {}

#ifdef LLAMA_USE_TOOLCALL
chat_formatter(common_params & params,
std::vector<common_chat_msg> & chat_msgs,
struct common_chat_templates * chat_templates,
const llama_vocab * vocab,
toolcall::client::ptr tc_client)

: params_(params), chat_msgs_(chat_msgs), chat_templates_(chat_templates),
vocab_(vocab), tc_client_(tc_client),
chat_format_(COMMON_CHAT_FORMAT_CONTENT_ONLY),
formatted_() {}
#endif

chat_formatter::result operator() (const std::string & role, const std::string & content) {

common_chat_msg new_msg = common_chat_parse(content, chat_format_);
new_msg.role = role;

common_chat_templates_inputs cinputs;
cinputs.use_jinja = params_.use_jinja;
cinputs.add_generation_prompt = (role == "user");
#ifdef LLAMA_USE_TOOLCALL
if (tc_client_ != nullptr) {
cinputs.tool_choice = common_chat_tool_choice_parse_oaicompat(tc_client_->tool_choice());
cinputs.tools = common_chat_tools_parse_oaicompat(tc_client_->tool_list());
}
#endif
cinputs.messages.assign(chat_msgs_.cbegin(), chat_msgs_.cend());
cinputs.messages.push_back(new_msg);
chat_msgs_.push_back(new_msg);

bool tool_was_called = false;
if (! new_msg.tool_calls.empty()) { // Call tool and re-prompt
nlohmann::json result_array = nlohmann::json::array();
for (const auto & tc : new_msg.tool_calls) {
toolcall::result_set res = tc_client_->call(tc.name, tc.arguments, tc.id);
if (! res.empty()) {
for (const auto & r : res) {
result_array.push_back(r.data);
}
}
}
common_chat_msg toolcall_msg;
toolcall_msg.role = "tool";
toolcall_msg.content = result_array.dump(-1);

cinputs.add_generation_prompt = true;
cinputs.messages.push_back(toolcall_msg);
chat_msgs_.push_back(toolcall_msg);

tool_was_called = true;
}

common_chat_params cparams = common_chat_templates_apply(chat_templates_, cinputs);
std::string formatted = cparams.prompt.substr(formatted_.size(), cparams.prompt.size());
formatted_ = cparams.prompt;

LOG_DBG("formatted: '%s'\n", formatted.c_str());

#ifdef LLAMA_USE_TOOLCALL
chat_format_ = cparams.format;
common_chat_grammar_to_sampler(&cparams, vocab_, &params_.sampling);
#endif
return chat_formatter::result{std::move(formatted), tool_was_called};
}

private:
common_params & params_;
std::vector<common_chat_msg> & chat_msgs_;
struct common_chat_templates * chat_templates_;

#ifdef LLAMA_USE_TOOLCALL
const llama_vocab * vocab_;
toolcall::client::ptr tc_client_;
common_chat_format chat_format_;
std::string formatted_;
#endif
};

int main(int argc, char ** argv) {
common_params params;
g_params = &params;
@@ -94,6 +192,11 @@ int main(int argc, char ** argv) {

auto & sparams = params.sampling;

#ifdef LLAMA_USE_TOOLCALL
// Ensure parameters are validated before the model loads
toolcall::params tc_params(params.toolcall.tools, params.toolcall.choice);
#endif

// save choice to use color for later
// (note for later: this is a slightly awkward choice)
console::init(params.simple_io, params.use_color);
@@ -266,15 +369,16 @@ int main(int argc, char ** argv) {
std::vector<llama_token> embd_inp;

bool waiting_for_first_input = false;
auto chat_add_and_format = [&chat_msgs, &chat_templates](const std::string & role, const std::string & content) {
common_chat_msg new_msg;
new_msg.role = role;
new_msg.content = content;
auto formatted = common_chat_format_single(chat_templates.get(), chat_msgs, new_msg, role == "user", g_params->use_jinja);
chat_msgs.push_back(new_msg);
LOG_DBG("formatted: '%s'\n", formatted.c_str());
return formatted;
};

#ifdef LLAMA_USE_TOOLCALL
auto tc_client = toolcall::create_client(tc_params);
if (tc_client) {
tc_client->initialize();
}
chat_formatter chat_add_and_format(params, chat_msgs, chat_templates.get(), vocab, tc_client);
#else
chat_formatter chat_add_and_format(params, chat_msgs, chat_templates.get());
#endif

std::string prompt;
{
@@ -296,6 +400,12 @@ int main(int argc, char ** argv) {
inputs.messages = chat_msgs;
inputs.add_generation_prompt = !params.prompt.empty();

#ifdef LLAMA_USE_TOOLCALL
if (tc_client != nullptr) {
inputs.tool_choice = common_chat_tool_choice_parse_oaicompat(tc_client->tool_choice());
inputs.tools = common_chat_tools_parse_oaicompat(tc_client->tool_list());
}
#endif
prompt = common_chat_templates_apply(chat_templates.get(), inputs).prompt;
}
} else {
@@ -814,10 +924,17 @@ int main(int argc, char ** argv) {
}

if (params.enable_chat_template) {
chat_add_and_format("assistant", assistant_ss.str());
auto format_res = chat_add_and_format("assistant", assistant_ss.str());
if (format_res.tool_was_called) {
auto format_res_tok = common_tokenize(ctx, format_res.formatted, false, true);
embd_inp.insert(embd_inp.end(), format_res_tok.begin(), format_res_tok.end());
assistant_ss.str("");

} else {
is_interacting = true;
LOG("\n");
}
}
is_interacting = true;
LOG("\n");
}
}

Expand Down Expand Up @@ -884,7 +1001,7 @@ int main(int argc, char ** argv) {

bool format_chat = params.conversation_mode && params.enable_chat_template;
std::string user_inp = format_chat
? chat_add_and_format("user", std::move(buffer))
? chat_add_and_format("user", std::move(buffer)).formatted
: std::move(buffer);
// TODO: one inconvenient of current chat template implementation is that we can't distinguish between user input and special tokens (prefix/postfix)
const auto line_pfx = common_tokenize(ctx, params.input_prefix, false, true);