Add Proxy and SSL Config Options to Python SDK #3180

Open · wants to merge 112 commits into base: main
Changes from all commits (112 commits)
e8003c8
Add proxy and SSL config options
cgivre Nov 11, 2024
a00b012
Added args to additional method
cgivre Nov 11, 2024
88ef003
Remove binary state from high-level API and use Jinja templates (#3147)
cebtenzzre Nov 25, 2024
c7c99a1
Fixups for Jinja PR (#3215)
cebtenzzre Dec 4, 2024
0ae1ae3
ci: do not run online installer or publish jobs on PR branches (#3217)
cebtenzzre Dec 4, 2024
1ed30da
llamamodel: add missing softmax to fix temperature (#3202)
cebtenzzre Dec 4, 2024
2cad0d7
chat: cut v3.5.0-rc1 release candidate (#3218)
cebtenzzre Dec 4, 2024
87b5127
add changelog entries for Jinja PR (#3223)
cebtenzzre Dec 6, 2024
49363ed
changelog: add more changes from #3147 (#3226)
cebtenzzre Dec 6, 2024
db4d975
Animate the removal of chat items when editing prompts. (#3227)
manyoso Dec 6, 2024
4807e6a
qml: tweaks to new edit/redo buttons (#3228)
cebtenzzre Dec 6, 2024
3b26a65
chat: cut v3.5.0-rc2 release candidate (#3229)
cebtenzzre Dec 6, 2024
a1e38da
chat: run update_translations for v3.5.0 (#3230)
cebtenzzre Dec 6, 2024
7a71600
changelog: fix parenthesis
cebtenzzre Dec 9, 2024
f325cea
Italian localization update (#3236)
Harvester62 Dec 9, 2024
38c1ab2
fixups for GPT4All v3.5.0-rc2 (#3239)
cebtenzzre Dec 9, 2024
3912990
update Romanian translation for v3.5.0 (#3232)
SINAPSA-IC Dec 9, 2024
9a64f52
chat: cut v3.5.0 release (#3240)
cebtenzzre Dec 9, 2024
d11e18c
chat: release v3.5.0 (#3241)
cebtenzzre Dec 9, 2024
f5cee70
Bump version to v3.5.1-dev0 (#3242)
manyoso Dec 9, 2024
ee9dd88
chatmodel: fix incorrect currentResponse argument (#3245)
cebtenzzre Dec 9, 2024
d8f141a
Fix the z-ordering of the home button. (#3246)
manyoso Dec 9, 2024
0107a8c
metadata: fix typos in release notes
cebtenzzre Dec 10, 2024
6077c39
fix several bad chat templates (#3250)
cebtenzzre Dec 10, 2024
e647581
models3: fix Llama 3.2 chat template (#3251)
cebtenzzre Dec 10, 2024
66c9ffe
changelog: add PR #3251
cebtenzzre Dec 10, 2024
b5d67d1
Update changlog and version to make 3.5.1 hotfix release. (#3252)
manyoso Dec 10, 2024
337afa0
Release notes and latestnews for v3.5.1. (#3253)
manyoso Dec 10, 2024
52e8ea4
Bump the version to 3.5.2-dev0. (#3254)
manyoso Dec 10, 2024
167e0de
latestnews: make it more compact
cebtenzzre Dec 12, 2024
cc30175
Fix local server regressions caused by Jinja PR (#3256)
cebtenzzre Dec 13, 2024
e4b0a8d
modellist: fix cloning of chat template and system message (#3262)
cebtenzzre Dec 13, 2024
816158b
StartupDialog: fix two untranslated strings (#3293)
cebtenzzre Dec 13, 2024
b988e82
Break the explore models view into two. (#3269)
manyoso Dec 13, 2024
9eea8b7
chat: cut v3.5.2 release (#3292)
cebtenzzre Dec 13, 2024
383a99b
fix chatmodel.h #includes
cebtenzzre Dec 13, 2024
3218466
ci: attempt to fix Ubuntu build
cebtenzzre Dec 13, 2024
c6f01e0
chat: release version 3.5.2 (#3296)
cebtenzzre Dec 14, 2024
200c5a9
chat: fix localdocs breakage in v3.5.2 (#3302)
cebtenzzre Dec 16, 2024
6fedb79
New v3.5.3 hotfix release. (#3304)
manyoso Dec 16, 2024
e1a9048
ci: downgrade Windows image to fix build (#3306)
cebtenzzre Dec 16, 2024
0b029fa
chat: release version 3.5.3 (#3307)
cebtenzzre Dec 16, 2024
cf342e1
chat: bump version to 3.5.4-dev0
cebtenzzre Dec 16, 2024
cb00613
Update maintainers. (#3322)
manyoso Dec 18, 2024
11c285f
Fix for remote model templates when messages contain xml. (#3318)
manyoso Dec 18, 2024
6b1a140
Fix Jinja2Cpp bug that broke system msg detection in templates (#3325)
cebtenzzre Dec 19, 2024
1ba56b2
chatmodel: fix sources showing as unconsolidated in UI (#3328)
cebtenzzre Dec 19, 2024
33d7166
Code interpreter (#3173)
manyoso Dec 19, 2024
0f95f7c
modellist: automatically replace known chat templates with our versio…
cebtenzzre Dec 19, 2024
0c2c15e
undo unintentional partial revert of #3173
cebtenzzre Dec 19, 2024
b2b9be4
Release of 3.6.0. (#3329)
manyoso Dec 19, 2024
ac08448
qml: fix missing localdocs and prefill progress (#3330)
cebtenzzre Dec 19, 2024
0e5e4ce
Release notes and latestnews for v3.6.0, and bump version. (#3331)
manyoso Dec 19, 2024
d0d857d
ChatView: make "stop" and "copy conversation" work again (#3336)
manyoso Dec 20, 2024
a49c2cf
Release notes for v3.6.1 and bump version (#3339)
manyoso Dec 20, 2024
6499f27
updated settings page (#3368)
mcembalest Jan 7, 2025
5b8bd22
fix: format of language and locale setting (#3370)
mcembalest Jan 7, 2025
68d4ed7
Properly report that the computation was timedout to the model (#3369)
manyoso Jan 7, 2025
1cb34d1
code interpreter: support variadic console.log (#3371)
cebtenzzre Jan 7, 2025
5cd0bd1
chat templates: work around Jinja2Cpp issue with 'not X is defined' (…
cebtenzzre Jan 7, 2025
8ffac8f
jinja2cpp: update submodule for else/endif crash fix (#3373)
cebtenzzre Jan 8, 2025
dd0cef2
Update README.md - brokenlink (#3380)
AndriyMulyar Jan 10, 2025
4c1c026
Save chats on quit, even if window isn't closed first (#3387)
cebtenzzre Jan 16, 2025
ced514c
ci: use the shared 'gpt4all' context for environment variables (#3392)
cebtenzzre Jan 17, 2025
e629721
Add more chat template substitutions (#3393)
cebtenzzre Jan 21, 2025
7336de3
jinja2cpp: update submodule for partial subscript crash fix (#3394)
cebtenzzre Jan 21, 2025
0acdeee
Sign maintenancetool.app on macOS (#3391)
cebtenzzre Jan 21, 2025
e509457
add Windows ARM build (#3385)
cebtenzzre Jan 21, 2025
2ad4aea
ci: add missing context to Windows ARM builds (#3400)
cebtenzzre Jan 21, 2025
db36e30
Italian localization update (#3389)
Harvester62 Jan 21, 2025
f6399a3
jinja2cpp: update submodule for 'not X is defined' fix (#3402)
cebtenzzre Jan 21, 2025
f0ebabd
jinja2cpp: update submodule to fix unused var (#3403)
cebtenzzre Jan 22, 2025
5ee9b97
Bump version for 3.7.0 release. (#3401)
manyoso Jan 21, 2025
58515cf
changelog: add missing link
cebtenzzre Jan 22, 2025
35d9936
changelog: fix reference to wrong macOS version
cebtenzzre Jan 22, 2025
c13b71f
ci: fix macOS codesigning (#3408)
cebtenzzre Jan 23, 2025
8023ba2
chat: release version 3.7.0 (#3407)
cebtenzzre Jan 23, 2025
9809a2a
metadata: fix typo
cebtenzzre Jan 23, 2025
9641b47
chat: bump version to v3.7.1-dev0
cebtenzzre Jan 23, 2025
881ac19
Fix regression while using localdocs with server API. (#3410)
manyoso Jan 24, 2025
2230628
Server view fix (#3411)
manyoso Jan 24, 2025
45a171c
Update to Qt 6.8.1 (#3386)
cebtenzzre Jan 24, 2025
62a5623
cmake: do not modify gpt4all.app after signing it (#3413)
cebtenzzre Jan 24, 2025
722dcb0
Revert "cmake: do not modify gpt4all.app after signing it (#3413)"
cebtenzzre Jan 24, 2025
919b415
cmake: do not modify gpt4all.app after signing it (#3417)
cebtenzzre Jan 24, 2025
a71fed7
codeinterpreter: permit console.log with single string arg (#3426)
cebtenzzre Jan 27, 2025
b5670ae
[Jinja] Fix typo in Phi-3.1-mini-128k-instruct replacement template (…
ThiloteE Jan 28, 2025
e607840
ci: selective signing and automatic release builds (#3430)
cebtenzzre Jan 28, 2025
32badd2
Support DeepSeek-R1 Qwen (#3431)
cebtenzzre Jan 29, 2025
4207680
ci: verify that installers we build function and are signed (#3432)
cebtenzzre Jan 29, 2025
0db1651
Don't block the gui thread for tool calls (#3435)
manyoso Jan 29, 2025
92ada07
ci: build offline installers when pipeline is scheduled (#3436)
cebtenzzre Jan 30, 2025
44b059b
chat: bump version to 3.8.0-dev0
cebtenzzre Jan 30, 2025
0ea95f3
ci: add missing signing holds to Windows ARM builds
cebtenzzre Jan 30, 2025
4e0fda7
chat: replace Jinja2Cpp with minja (#3433)
cebtenzzre Jan 30, 2025
f8b65c5
Display DeepSeek-R1 thinking like Reasoner (#3440)
manyoso Jan 30, 2025
8e055e9
models: add DeepSeek-R1 distillations to official models list (#3437)
cebtenzzre Jan 30, 2025
f8d5224
chat: cut v3.8.0 release (#3441)
cebtenzzre Jan 30, 2025
6b76d6d
ci: remove conflicting pipeline.git.branch requirement
cebtenzzre Jan 30, 2025
436e6de
ci: fix missing job_allow_tags
cebtenzzre Jan 30, 2025
1a11860
ci: allow generate-config to run on tags
cebtenzzre Jan 30, 2025
8c65950
chat: fix emoji corruption (#3443)
cebtenzzre Jan 30, 2025
81f9624
remove ancient README
cebtenzzre Jan 31, 2025
f348050
chat: release version 3.8.0 (#3439)
cebtenzzre Jan 31, 2025
02ee873
ci: update to Qt 6.8.2 (#3442)
cebtenzzre Jan 31, 2025
568c8a1
cmake: remove reference to deleted README
cebtenzzre Jan 31, 2025
3d01855
Fix index used by LocalDocs when tool calling/thinking is active (#3451)
cebtenzzre Feb 3, 2025
7cdc0bc
minja: update submodule to fix `{#` hang (#3446)
cebtenzzre Feb 3, 2025
31753d4
chat: work around Direct3D 11 rendering artifacts on win11 arm (#3450)
cebtenzzre Feb 3, 2025
498e651
Revert "minja: update submodule to fix `{#` hang (#3446)"
cebtenzzre Feb 3, 2025
5bfc071
Update README.md
AndriyMulyar Feb 3, 2025
668214a
Fix rebase conflicts
cgivre Feb 4, 2025
4 changes: 4 additions & 0 deletions .circleci/config.yml
@@ -8,6 +8,10 @@ workflows:
  generate-config:
    jobs:
      - path-filtering/filter:
          filters:
            tags:
              only:
                - /.*/
          base-revision: main
          config-path: .circleci/continue_config.yml
          mapping: |
825 changes: 681 additions & 144 deletions .circleci/continue_config.yml

Large diffs are not rendered by default.

6 changes: 6 additions & 0 deletions .gitmodules
@@ -17,3 +17,9 @@
[submodule "gpt4all-chat/deps/QXlsx"]
path = gpt4all-chat/deps/QXlsx
url = https://github.com/nomic-ai/QXlsx.git
[submodule "gpt4all-chat/deps/minja"]
path = gpt4all-chat/deps/minja
url = https://github.com/nomic-ai/minja.git
[submodule "gpt4all-chat/deps/json"]
path = gpt4all-chat/deps/json
url = https://github.com/nlohmann/json.git
5 changes: 0 additions & 5 deletions MAINTAINERS.md
@@ -51,11 +51,6 @@ Thiago Ramos ([@thiagojramos](https://github.com/thiagojramos))<br/>
E-mail: thiagojramos@outlook.com<br/>
- pt\_BR translation

Victor Emanuel ([@SINAPSA-IC](https://github.com/SINAPSA-IC))<br/>
E-mail: contact@sinapsaro.ro<br/>
Discord: `@sinapsa_ic_56124_99632`
- ro\_RO translation

不知火 Shiranui ([@supersonictw](https://github.com/supersonictw))<br/>
E-mail: supersonic@livemail.tw<br/>
Discord: `@supersonictw`
7 changes: 4 additions & 3 deletions README.md
@@ -1,5 +1,9 @@
<h1 align="center">GPT4All</h1>

<p align="center">
Now with support for DeepSeek R1 Distillations
</p>

<p align="center">
<a href="https://www.nomic.ai/gpt4all">Website</a> &bull; <a href="https://docs.gpt4all.io">Documentation</a> &bull; <a href="https://discord.gg/mGZE39AS3e">Discord</a> &bull; <a href="https://www.youtube.com/watch?v=gQcZDXRVJok">YouTube Tutorial</a>
</p>
@@ -23,9 +27,6 @@ https://github.com/nomic-ai/gpt4all/assets/70534565/513a0f15-4964-4109-89e4-4f9a
<p align="center">
GPT4All is made possible by our compute partner <a href="https://www.paperspace.com/">Paperspace</a>.
</p>
<p align="center">
<a href="https://www.phorm.ai/query?projectId=755eecd3-24ad-49cc-abf4-0ab84caacf63"><img src="https://img.shields.io/badge/Phorm-Ask_AI-%23F2777A.svg" alt="phorm.ai"></a>
</p>

## Download Links

1 change: 0 additions & 1 deletion common/common.cmake
@@ -11,7 +11,6 @@ function(gpt4all_add_warning_options target)
-Wextra-semi
-Wformat=2
-Wmissing-include-dirs
-Wstrict-overflow=2
-Wsuggest-override
-Wvla
# errors
2 changes: 1 addition & 1 deletion gpt4all-backend/deps/llama.cpp-mainline
Submodule llama.cpp-mainline updated 64 files
+3 −3 .devops/full-rocm.Dockerfile
+3 −3 .devops/llama-cli-rocm.Dockerfile
+3 −3 .devops/llama-server-rocm.Dockerfile
+1 −1 .github/workflows/build.yml
+3 −1 .github/workflows/python-type-check.yml
+8 −3 CMakeLists.txt
+5 −0 CONTRIBUTING.md
+1 −12 Makefile
+5 −2 README.md
+76 −9 ci/run.sh
+15 −3 common/arg.cpp
+7 −0 common/common.cpp
+1 −0 common/common.h
+3 −0 common/console.cpp
+102 −31 convert_hf_to_gguf.py
+2 −0 convert_hf_to_gguf_update.py
+4 −1 convert_lora_to_gguf.py
+73 −19 docs/backend/SYCL.md
+0 −1 examples/CMakeLists.txt
+0 −6 examples/benchmark/CMakeLists.txt
+0 −275 examples/benchmark/benchmark-matmult.cpp
+7 −7 examples/convert-llama2c-to-ggml/convert-llama2c-to-ggml.cpp
+0 −7 examples/cvector-generator/pca.hpp
+6 −1 examples/embedding/embedding.cpp
+49 −37 examples/gguf-split/gguf-split.cpp
+0 −6 examples/llava/clip.cpp
+2 −2 examples/llava/convert_image_encoder_to_gguf.py
+38 −1 examples/server/README.md
+199 −16 examples/server/server.cpp
+1 −1 examples/server/tests/features/embeddings.feature
+42 −0 examples/server/tests/features/rerank.feature
+53 −1 examples/server/tests/features/steps/steps.py
+1 −1 examples/server/tests/requirements.txt
+24 −1 examples/server/utils.hpp
+3 −3 flake.lock
+0 −5 ggml/include/ggml-metal.h
+36 −30 ggml/include/ggml.h
+17 −2 ggml/src/CMakeLists.txt
+2 −11 ggml/src/ggml-aarch64.c
+0 −1 ggml/src/ggml-cuda/im2col.cu
+1,899 −1,787 ggml/src/ggml-metal.m
+2 −2 ggml/src/ggml-quants.c
+0 −4 ggml/src/ggml-quants.h
+207 −209 ggml/src/ggml-vulkan.cpp
+533 −1,037 ggml/src/ggml.c
+4 −6 ggml/src/vulkan-shaders/argsort.comp
+30 −0 gguf-py/gguf/constants.py
+3 −0 gguf-py/gguf/gguf_writer.py
+14 −2 gguf-py/gguf/tensor_mapping.py
+9 −4 include/llama.h
+112 −0 models/ggml-vocab-chameleon.gguf.inp
+46 −0 models/ggml-vocab-chameleon.gguf.out
+2 −1 pyrightconfig.json
+1 −1 requirements/requirements-convert_legacy_llama.txt
+1 −1 scripts/sync-ggml.last
+191 −108 src/llama-vocab.cpp
+9 −0 src/llama-vocab.h
+370 −29 src/llama.cpp
+6 −4 src/unicode-data.cpp
+4 −4 src/unicode-data.h
+14 −7 src/unicode.cpp
+240 −217 tests/test-backend-ops.cpp
+9 −5 tests/test-grad0.cpp
+55 −35 tests/test-tokenizer-0.cpp
68 changes: 34 additions & 34 deletions gpt4all-backend/include/gpt4all-backend/llmodel.h
@@ -5,6 +5,7 @@
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <expected>
#include <functional>
#include <optional>
#include <span>
@@ -24,6 +25,10 @@ using namespace std::string_literals;
class LLModel {
public:
using Token = int32_t;
using PromptCallback = std::function<bool(std::span<const Token> batch, bool cached)>;
using ResponseCallback = std::function<bool(Token token, std::string_view piece)>;
using EmbedCancelCallback = bool(unsigned *batchSizes, unsigned nBatch, const char *backend);
using ProgressCallback = std::function<bool(float progress)>;

class BadArchError: public std::runtime_error {
public:
@@ -101,6 +106,7 @@ class LLModel {
static int32_t maxContextLength(const std::string &modelPath);
static int32_t layerCount(const std::string &modelPath);
static bool isEmbeddingModel(const std::string &modelPath);
static auto chatTemplate(const char *modelPath) -> std::expected<std::string, std::string>;
static void setImplementationsSearchPath(const std::string &path);
static const std::string &implementationsSearchPath();
static bool hasSupportedCPU();
@@ -124,7 +130,6 @@ class LLModel {
};

struct PromptContext {
int32_t n_past = 0; // number of tokens in past conversation
int32_t n_predict = 200;
int32_t top_k = 40;
float top_p = 0.9f;
@@ -136,8 +141,6 @@ class LLModel {
float contextErase = 0.5f; // percent of context to erase if we exceed the context window
};

using ProgressCallback = std::function<bool(float progress)>;

explicit LLModel() {}
virtual ~LLModel() {}

Expand All @@ -154,16 +157,12 @@ class LLModel {

// This method requires the model to return true from supportsCompletion otherwise it will throw
// an error
virtual void prompt(const std::string &prompt,
const std::string &promptTemplate,
std::function<bool(int32_t)> promptCallback,
std::function<bool(int32_t, const std::string&)> responseCallback,
bool allowContextShift,
PromptContext &ctx,
bool special = false,
std::optional<std::string_view> fakeReply = {});
virtual void prompt(std::string_view prompt,
const PromptCallback &promptCallback,
const ResponseCallback &responseCallback,
const PromptContext &ctx);

using EmbedCancelCallback = bool(unsigned *batchSizes, unsigned nBatch, const char *backend);
virtual int32_t countPromptTokens(std::string_view prompt) const;

virtual size_t embeddingSize() const {
throw std::logic_error(std::string(implementation().modelType()) + " does not support embeddings");
@@ -209,23 +208,22 @@ class LLModel {
void setProgressCallback(ProgressCallback callback) { m_progressCallback = callback; }

virtual int32_t contextLength() const = 0;
virtual auto specialTokens() -> std::unordered_map<std::string, std::string> const = 0;

protected:
// These are pure virtual because subclasses need to implement as the default implementation of
// 'prompt' above calls these functions
virtual std::vector<Token> tokenize(std::string_view str, bool special = false) = 0;
virtual std::vector<Token> tokenize(std::string_view str) const = 0;
virtual bool isSpecialToken(Token id) const = 0;
virtual std::string tokenToString(Token id) const = 0;
virtual void initSampler(PromptContext &ctx) = 0;
virtual void initSampler(const PromptContext &ctx) = 0;
virtual Token sampleToken() const = 0;
virtual bool evalTokens(PromptContext &ctx, std::span<const Token> tokens) const = 0;
virtual void shiftContext(PromptContext &promptCtx) = 0;
virtual bool evalTokens(int32_t nPast, std::span<const Token> tokens) const = 0;
virtual void shiftContext(const PromptContext &promptCtx, int32_t *nPast) = 0;
virtual int32_t inputLength() const = 0;
virtual void setTokenizeInputPosition(int32_t pos) = 0;
virtual auto computeModelInputPosition(PromptContext &ctx, const std::vector<Token> &input)
-> std::vector<Token>::const_iterator = 0;
virtual void setModelInputPosition(PromptContext &ctx, int32_t pos) = 0;
virtual void appendInputToken(PromptContext &ctx, Token tok) = 0;
virtual int32_t computeModelInputPosition(std::span<const Token> input) const = 0;
virtual void setModelInputPosition(int32_t pos) = 0;
virtual void appendInputToken(Token tok) = 0;
virtual std::span<const Token> inputTokens() const = 0;
virtual const std::vector<Token> &endTokens() const = 0;
virtual bool shouldAddBOS() const = 0;
@@ -242,6 +240,12 @@ class LLModel {
return -1;
}

virtual auto chatTemplate(const char *modelPath) const -> std::expected<std::string, std::string>
{
(void)modelPath;
return std::unexpected("not implemented");
}

const Implementation *m_implementation = nullptr;

ProgressCallback m_progressCallback;
@@ -253,19 +257,15 @@ class LLModel {
return true;
}

bool decodePrompt(std::function<bool(int32_t)> promptCallback,
std::function<bool(int32_t, const std::string&)> responseCallback,
bool allowContextShift,
PromptContext &promptCtx,
std::vector<Token> embd_inp,
bool isResponse = false,
bool alwaysDecode = false);
void generateResponse(std::function<bool(int32_t, const std::string&)> responseCallback,
bool allowContextShift,
PromptContext &promptCtx);

protected:
Token m_tokenize_last_token = -1; // not serialized
// prefill context with prompt
auto decodePrompt(const PromptCallback &promptCallback,
const PromptContext &promptCtx,
std::vector<Token> embd_inp)
-> std::optional<int32_t>;
// generate a response
void generateResponse(const ResponseCallback &responseCallback,
const PromptContext &promptCtx,
int32_t nPast);

friend class LLMImplementation;
};
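To make the reworked C++ prompting interface easier to follow, here is a minimal usage sketch. It is not part of the diff: the include path, the already-loaded `model` reference, and the assumption that the prompt string has already been rendered through a chat template are all illustrative, but the callback types and the `prompt()` signature follow the header above.

```cpp
// Sketch only: assumes the header is reachable as <gpt4all-backend/llmodel.h>
// and that `model` was constructed and loaded elsewhere.
#include <gpt4all-backend/llmodel.h>

#include <iostream>
#include <span>
#include <string_view>

void streamCompletion(LLModel &model, std::string_view renderedPrompt)
{
    // Sampling settings; n_past is no longer a PromptContext field, the model
    // now tracks the input position internally.
    LLModel::PromptContext ctx;
    ctx.n_predict = 128;

    // Called as prompt batches are ingested; return false to stop early.
    LLModel::PromptCallback onPrompt =
        [](std::span<const LLModel::Token> batch, bool cached) {
            (void)batch; (void)cached;
            return true;
        };

    // Called for each generated token; return false to stop generation.
    LLModel::ResponseCallback onResponse =
        [](LLModel::Token token, std::string_view piece) {
            (void)token;
            std::cout << piece << std::flush;
            return true;
        };

    // The prompt template, allowContextShift, special, and fakeReply parameters
    // of the old overload are gone; templating now happens before this call.
    model.prompt(renderedPrompt, onPrompt, onResponse, ctx);
}
```

The new `countPromptTokens()` declaration in the same hunk can be used beforehand to check how much of the context window the rendered prompt will consume.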
44 changes: 23 additions & 21 deletions gpt4all-backend/include/gpt4all-backend/llmodel_c.h
@@ -35,16 +35,15 @@ typedef int32_t token_t;
* behavior.
*/
struct llmodel_prompt_context {
int32_t n_past; // number of tokens in past conversation
int32_t n_predict; // number of tokens to predict
int32_t top_k; // top k logits to sample from
float top_p; // nucleus sampling probability threshold
float min_p; // Min P sampling
float temp; // temperature to adjust model's output distribution
int32_t n_batch; // number of predictions to generate in parallel
float repeat_penalty; // penalty factor for repeated tokens
int32_t repeat_last_n; // last n tokens to penalize
float context_erase; // percent of context to erase if we exceed the context window
};

struct llmodel_gpu_device {
@@ -63,18 +62,20 @@ typedef struct llmodel_gpu_device llmodel_gpu_device;

/**
* Callback type for prompt processing.
* @param token_id The token id of the prompt.
* @param token_ids An array of token ids of the prompt.
* @param n_token_ids The number of tokens in the array.
* @param cached Whether the tokens were already in cache.
* @return a bool indicating whether the model should keep processing.
*/
typedef bool (*llmodel_prompt_callback)(int32_t token_id);
typedef bool (*llmodel_prompt_callback)(const token_t *token_ids, size_t n_token_ids, bool cached);

/**
* Callback type for response.
* @param token_id The token id of the response.
* @param response The response string. NOTE: a token_id of -1 indicates the string is an error string.
* @return a bool indicating whether the model should keep generating.
*/
typedef bool (*llmodel_response_callback)(int32_t token_id, const char *response);
typedef bool (*llmodel_response_callback)(token_t token_id, const char *response);

/**
* Embedding cancellation callback for use with llmodel_embed.
@@ -85,6 +86,8 @@ typedef bool (*llmodel_response_callback)(int32_t token_id, const char *response
*/
typedef bool (*llmodel_emb_cancel_callback)(unsigned *batch_sizes, unsigned n_batch, const char *backend);

typedef void (*llmodel_special_token_callback)(const char *name, const char *token);

/**
* Create a llmodel instance.
* Recognises correct model type from file at model_path
@@ -183,22 +186,17 @@ uint64_t llmodel_state_set_data(llmodel_model model, const uint8_t *state, uint6
* Generate a response using the model.
* @param model A pointer to the llmodel_model instance.
* @param prompt A string representing the input prompt.
* @param prompt_template A string representing the input prompt template.
* @param prompt_callback A callback function for handling the processing of prompt.
* @param response_callback A callback function for handling the generated response.
* @param allow_context_shift Whether to allow shifting of context to make room for more input.
* @param special True if special tokens in the prompt should be processed, false otherwise.
* @param fake_reply A string to insert into context as the model's reply, or NULL to generate one.
* @param ctx A pointer to the llmodel_prompt_context structure.
* @param error A pointer to a string; will only be set on error.
*/
void llmodel_prompt(llmodel_model model, const char *prompt,
const char *prompt_template,
llmodel_prompt_callback prompt_callback,
llmodel_response_callback response_callback,
bool allow_context_shift,
llmodel_prompt_context *ctx,
bool special,
const char *fake_reply);
bool llmodel_prompt(llmodel_model model,
const char *prompt,
llmodel_prompt_callback prompt_callback,
llmodel_response_callback response_callback,
llmodel_prompt_context *ctx,
const char **error);

/**
* Generate an embedding using the model.
Expand Down Expand Up @@ -310,6 +308,10 @@ const char *llmodel_model_backend_name(llmodel_model model);
*/
const char *llmodel_model_gpu_device_name(llmodel_model model);

int32_t llmodel_count_prompt_tokens(llmodel_model model, const char *prompt, const char **error);

void llmodel_model_foreach_special_token(llmodel_model model, llmodel_special_token_callback callback);

#ifdef __cplusplus
}
#endif
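The C API in this header picks up the same reshaping. Below is a rough sketch of driving the updated `llmodel_prompt` entry point with the new callback signatures; it is not part of the diff, the include path and the already-created `model` handle are assumptions, and the sampling values are only illustrative.

```cpp
// Sketch only: assumes the header is reachable as <gpt4all-backend/llmodel_c.h>
// and that `model` was created and loaded through the existing C API elsewhere.
#include <gpt4all-backend/llmodel_c.h>

#include <cstdio>

// Prompt callback now receives the whole token batch plus a cache flag.
static bool on_prompt(const token_t *token_ids, size_t n_token_ids, bool cached)
{
    (void)token_ids;
    std::printf("[ingested %zu prompt tokens%s]\n", n_token_ids, cached ? ", cached" : "");
    return true;
}

// Response callback: per the header, a token_id of -1 marks an error string.
static bool on_response(token_t token_id, const char *response)
{
    (void)token_id;
    std::fputs(response, stdout);
    return true;
}

bool generate(llmodel_model model, const char *prompt)
{
    llmodel_prompt_context ctx = {};  // n_past is no longer a field
    ctx.n_predict      = 128;         // illustrative sampling values
    ctx.top_k          = 40;
    ctx.top_p          = 0.9f;
    ctx.min_p          = 0.0f;
    ctx.temp           = 0.7f;
    ctx.n_batch        = 8;
    ctx.repeat_penalty = 1.18f;
    ctx.repeat_last_n  = 64;
    ctx.context_erase  = 0.5f;

    // llmodel_prompt now returns a bool and reports errors through `error`
    // instead of taking template/special/fake-reply arguments.
    const char *error = nullptr;
    if (!llmodel_prompt(model, prompt, on_prompt, on_response, &ctx, &error)) {
        std::fprintf(stderr, "llmodel_prompt failed: %s\n", error ? error : "(unknown error)");
        return false;
    }
    return true;
}
```

The other additions near the end of the hunk follow the same conventions: `llmodel_count_prompt_tokens` reports failures through the same `const char **error` out-parameter, and `llmodel_model_foreach_special_token` invokes the supplied callback with each special token's name and text.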