feat: add llamacpp params #221

Merged
merged 2 commits into main on Sep 12, 2024

Conversation

@nguyenhoangthuan99 (Contributor, Author) commented Sep 10, 2024

In this PR the following params will be added:

  • Chat completion:

    • seed = -1; RNG seed (default: -1, use random seed for < 0)
    • dynatemp_range = 0.0f; dynamic temperature range (default: 0.0, 0.0 = disabled)
    • dynatemp_exponent = 1.0f; dynamic temperature exponent (default: 1.0)
    • top_k = 40; top-k sampling (default: 40, 0 = disabled)
    • min_p = 0.05f; min-p sampling (default: 0.05, 0.0 = disabled)
    • tfs_z = 1.0f; tail free sampling, parameter z (default: 1.0, 1.0 = disabled)
    • typ_p = 1.0f; locally typical sampling, parameter p (default: 1.0, 1.0 = disabled)
    • repeat_last_n = 64; last n tokens to consider for penalize (default: 64, 0 = disabled, -1 = ctx_size)
    • penalty_repeat = 1.0f; penalize repeat sequence of tokens (default: 1.0, 1.0 = disabled)
    • mirostat = false; use Mirostat sampling.
    • mirostat_tau = 5.0f; Mirostat target entropy, parameter tau (default: 5.0)
    • mirostat_eta = 0.1f; Mirostat learning rate, parameter eta (default: 0.1)
    • penalize_nl = false; penalize newline tokens (default: false)
    • ignore_eos = false; ignore end of stream token and continue generating (implies --logit-bias EOS-inf)
    • n_probs = 0; number of log probs per token to return
    • min_keep = 0;
    • grammar;
  • Load Model:

    • n_predict : number of tokens to predict (default: -1, -1 = infinity, -2 = until context filled)
    • prompt : prompt to start generation with
      in conversation mode, this will be used as system prompt
      (default: '')
    • conversation: run in conversation mode, does not print special tokens and suffix/prefix
      if suffix/prefix are not specified, default chat template will be used
      (default: false)
    • special : special tokens output enabled (default: false)

This PR also supports returning log probs when n_probs > 0.
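
For context, below is a minimal sketch of the parsing pattern these additions follow, assuming jsoncpp's Json::Value and a trimmed-down ChatCompletionRequest; the field names and defaults come from the list above, everything else is illustrative rather than the PR's actual code.

#include <json/json.h>

#include <memory>
#include <string>

// Sketch only: a few of the new fields from the list above, with the same
// defaults as described in the PR.
struct ChatCompletionRequest {
  int seed = -1;
  int top_k = 40;
  float min_p = 0.05f;
  float dynatemp_range = 0.0f;
  float dynatemp_exponent = 1.0f;
  int n_probs = 0;
  std::string grammar;
  // ... remaining fields from the list above
};

inline ChatCompletionRequest fromJson(std::shared_ptr<Json::Value> jsonBody) {
  ChatCompletionRequest completion;
  if (jsonBody) {
    // Each param is read from the request body, falling back to its default.
    completion.seed = (*jsonBody).get("seed", -1).asInt();
    completion.top_k = (*jsonBody).get("top_k", 40).asInt();
    completion.min_p = (*jsonBody).get("min_p", 0.05f).asFloat();
    completion.dynatemp_range =
        (*jsonBody).get("dynatemp_range", 0.0f).asFloat();
    completion.dynatemp_exponent =
        (*jsonBody).get("dynatemp_exponent", 1.0f).asFloat();
    completion.n_probs = (*jsonBody).get("n_probs", 0).asInt();
    completion.grammar = (*jsonBody).get("grammar", "").asString();
    // ... remaining params follow the same pattern
  }
  return completion;
}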

@vansangpfiev vansangpfiev marked this pull request as ready for review September 11, 2024 01:20
@vansangpfiev vansangpfiev changed the title feat-add-llamacpp-params feat: add llamacpp params Sep 11, 2024
@@ -28,6 +48,24 @@ inline ChatCompletionRequest fromJson(std::shared_ptr<Json::Value> jsonBody) {
completion.messages = (*jsonBody)["messages"];
completion.stop = (*jsonBody)["stop"];
completion.model_id = (*jsonBody).get("model", {}).asString();

completion.seed = (*jsonBody).get("seed", -1).asInt();


I notice this PR defines default values twice:

  • struct definition (above)
  • JSON parsing default value

Are we able to define them once?

DRY principle: https://en.wikipedia.org/wiki/Don%27t_repeat_yourself
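
One way to define them once (a sketch of the DRY idea, not this PR's code) would be to keep the defaults only as in-class member initializers and reuse the already-initialized member as the JSON fallback:

// Sketch: the default value lives only in the struct definition; the JSON
// parse falls back to whatever the member was default-initialized to.
ChatCompletionRequest completion;
completion.seed = (*jsonBody).get("seed", completion.seed).asInt();
completion.top_k = (*jsonBody).get("top_k", completion.top_k).asInt();
completion.min_p = (*jsonBody).get("min_p", completion.min_p).asFloat();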

@nguyenhoangthuan99 (Contributor, Author) commented Sep 11, 2024


I followed the previous implementation, e.g. https://github.com/janhq/cortex.llamacpp/blob/main/src/chat_completion_request.h#L8; maybe some weird bug in the past forced us to do it. For example, in the PR where I fixed the race condition, even though we had checked everything in the code (mutex, only return a slot if available, ...), the error still popped up, so I had to add another check for if (slot == null) before the issue was resolved.

We are using a third-party lib for JSON, so I think there's no harm in double-checking that it works well. If it's necessary I'll change it, but then we need to test more to make sure it won't break anything.

data["tfs_z"] = completion.tfs_z;
data["typical_p"] = completion.typ_p;
data["repeat_last_n"] = completion.repeat_last_n;
data["repeat_penalty"] = completion.penalty_repeat;


Woah, is there a way for us to align our penalty_repeat param with the original llama.cpp repeat_penalty?

  • This is the sort of thing that trips an intern up a year from now
  • If we align all params, is there a more elegant way to copy aligned k-v pairs from one struct to another? (llama3.1 tells me std::copy)

@nguyenhoangthuan99 (Contributor, Author) replied:

I think it's impossible, because data is a JSON value while completion is our custom struct; the JSON library doesn't provide an overloaded operator= between its value type and our completion struct.

@nguyenhoangthuan99 (Contributor, Author) replied:

About penalty_repeat vs. repeat_penalty: it is the same as the previous implementation with frequency_penalty (https://github.com/janhq/cortex.llamacpp/blob/main/src/llama_server_context.cc#L445). I think it is a way to provide a consistent parameter interface for our API.
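
For reference, a sketch of the reviewer's "copy aligned k-v pairs" idea: if the API keys matched llama.cpp's names exactly, the request body could be forwarded through a key whitelist instead of member-by-member assignments. This is a hypothetical helper, not what the PR implements:

#include <json/json.h>

// Hypothetical helper: forward request keys whose names already match
// llama.cpp's parameters, so each name is written only once.
static const char* kAlignedKeys[] = {
    "seed", "top_k", "min_p", "tfs_z", "typical_p",
    "repeat_last_n", "repeat_penalty", "mirostat_tau", "mirostat_eta"};

inline void CopyAlignedParams(const Json::Value& body, Json::Value& data) {
  for (const char* key : kAlignedKeys) {
    if (body.isMember(key)) {
      data[key] = body[key];
    }
  }
}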

state->task_id = state->llama.RequestCompletion(data, false, false, -1);
while (state->llama.model_loaded_external) {
TaskResult result = state->llama.NextResult(state->task_id);
if (!result.error) {
std::string to_send = result.result_json["content"];
std::string to_send;
if (n_probs > 0){


Can I verify my understanding about n_probs:

  • From llama.cpp server docs if n_probs > 0, resp contains probabilities of N tokens
  • However, we don't seem to send the content back to the user in this case?
  • Or: do we send both content, and also completion_probabilities?

We should align with the conventions in llama.cpp's server, as much as possible


@nguyenhoangthuan99 (Contributor, Author) replied:

Our implementation can return the form shown in ggerganov/llama.cpp#4088 (comment): both the content and a list of (token, prob) pairs for each generated token.
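
For illustration, a sketch of that response shape built with jsoncpp; the content / completion_probabilities / probs / tok_str keys follow the llama.cpp server format discussed in the linked issue, and the values are made up:

// Sketch of a response carrying both the generated text and per-token
// candidate probabilities (shape only, illustrative values).
Json::Value resp;
resp["content"] = "Hello world";

Json::Value candidate;
candidate["tok_str"] = "Hello";
candidate["prob"] = 0.91;

Json::Value token_entry;
token_entry["content"] = "Hello";
token_entry["probs"] = Json::Value(Json::arrayValue);
token_entry["probs"].append(candidate);

resp["completion_probabilities"] = Json::Value(Json::arrayValue);
resp["completion_probabilities"].append(token_entry);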

slot->sparams.dynatemp_exponent =
json_value(data, "dynatemp_exponent", default_sparams.dynatemp_exponent);
slot->sparams.ignore_eos =
json_value(data, "ignore_eos", default_sparams.ignore_eos);


Can I check my understanding:

  • This code filters for top N tokens given sampling settings
  • Fills out completion_probabilities k-v

@nguyenhoangthuan99 (Contributor, Author) commented Sep 11, 2024

Actually, the returned n probs are handled in several places in the codebase.
In this part, n probs are added to the result at every inference step: https://github.com/janhq/cortex.llamacpp/blob/main/src/llama_server_context.cc#L1675.

In stream mode, this line forms the returned JSON: https://github.com/janhq/cortex.llamacpp/blob/main/src/llama_server_context.cc#L919
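
In other words, each decoding step records the sampled token together with its top n_probs candidates, and the streaming code later serializes that list into completion_probabilities. A rough sketch with hypothetical types (the real definitions live in llama_server_context.cc at the links above):

#include <string>
#include <utility>
#include <vector>

// Hypothetical types for illustration; not the codebase's own definitions.
struct TokenProb {
  std::string tok_str;
  float prob;
};

struct CompletionTokenOutput {
  std::string text;              // the token actually sampled at this step
  std::vector<TokenProb> probs;  // top n_probs candidates at this step
};

// Called once per decoding step when n_probs > 0; the accumulated vector is
// what later gets serialized into completion_probabilities.
inline void RecordStepProbs(std::vector<CompletionTokenOutput>& generated,
                            std::string sampled,
                            std::vector<TokenProb> top_candidates) {
  generated.push_back({std::move(sampled), std::move(top_candidates)});
}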

@nguyenhoangthuan99 nguyenhoangthuan99 merged commit b6372bb into main Sep 12, 2024
29 checks passed
Merging this pull request may close the following issue:

epic: llama.cpp params are settable via API call or model.yaml