|`--props`| enable changing global properties via POST /props (default: disabled)<br/>(env: LLAMA_ARG_ENDPOINT_PROPS) |
@@ -320,7 +317,6 @@ node index.js
 - The prompt is a string or an array with the first element given as a string
 - The model's `tokenizer.ggml.add_bos_token` metadata is `true`
-- The system prompt is empty
 
 `temperature`: Adjust the randomness of the generated text. Default: `0.8`
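As an illustration of the sampling fields above, here is a hedged sketch of assembling a `/completion` request body. Only `prompt` and `temperature` come from this README; the values and the fact that the payload is only built (not sent) are assumptions of this sketch.

```python
import json

# Sketch of a /completion request body. "prompt" and "temperature"
# are fields documented in this README; the values are illustrative.
payload = {
    "prompt": "Building a website can be done in 10 simple steps:",
    "temperature": 0.8,  # documented default; higher values = more random output
}

body = json.dumps(payload)
print(body)
```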
@@ -378,6 +374,8 @@ node index.js
 `min_keep`: If greater than 0, force samplers to return N possible tokens at minimum. Default: `0`
+
+`t_max_predict_ms`: Set a time limit in milliseconds for the prediction (a.k.a. text-generation) phase. The timeout will trigger if the generation takes more than the specified time (measured since the first token was generated) and if a new-line character has already been generated. Useful for FIM applications. Default: `0`, which is disabled.
 
 `image_data`: An array of objects holding base64-encoded image `data` and its `id` to be referenced in `prompt`. You can determine the placement of the image in the prompt as in the following: `USER:[img-12]Describe the image in detail.\nASSISTANT:`. In this case, `[img-12]` will be replaced by the embeddings of the image with id `12` in the following `image_data` array: `{..., "image_data": [{"data": "<BASE64_STRING>", "id": 12}]}`. Use `image_data` only with multimodal models, e.g., LLaVA.
 
 `id_slot`: Assign the completion task to a specific slot. If `-1`, the task will be assigned to an idle slot. Default: `-1`
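To illustrate the `[img-12]` placeholder convention described above, here is a hedged sketch of building such a request body. The base64 bytes are a placeholder, not a real image, and the payload is only constructed, not sent to a server.

```python
import base64
import json

# Placeholder bytes stand in for a real image file.
img_b64 = base64.b64encode(b"not a real image").decode("ascii")

payload = {
    # "[img-12]" is replaced by the embeddings of the image whose "id"
    # is 12 in the image_data array, per the description above.
    "prompt": "USER:[img-12]Describe the image in detail.\nASSISTANT:",
    "image_data": [{"data": img_b64, "id": 12}],
    "id_slot": -1,  # -1 lets the server pick an idle slot
}

print(json.dumps(payload))
```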
@@ -536,14 +534,12 @@ This endpoint is public (no API key check). By default, it is read-only. To make
 ```json
 {
-  "system_prompt": "",
   "default_generation_settings": { ... },
   "total_slots": 1,
   "chat_template": ""
 }
 ```
 
-`system_prompt` - the system prompt (initial prompt of all slots). Note that this does not take the chat template into account; it prepends the prompt to the beginning of the formatted prompt.
 `default_generation_settings` - the default generation settings for the `/completion` endpoint, which has the same fields as the `generation_settings` response object from the `/completion` endpoint.
 `total_slots` - the total number of slots for processing requests (defined by the `--parallel` option)
 `chat_template` - the model's original Jinja2 prompt template
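For illustration, a GET `/props` response of the shape shown above can be consumed like this. The JSON here is a hand-written sample mirroring that shape (the `default_generation_settings` contents are invented for the example), not real server output.

```python
import json

# Hand-written sample shaped like the /props response documented above;
# field values are illustrative only.
raw = """
{
  "default_generation_settings": { "temperature": 0.8 },
  "total_slots": 1,
  "chat_template": ""
}
"""

props = json.loads(raw)
# total_slots reflects the server's --parallel setting.
print(props["total_slots"])
```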
@@ -554,7 +550,7 @@ To use this endpoint with the POST method, you need to start the server with `--props`
 
 *Options:*
 
-`system_prompt`: Change the system prompt (initial prompt of all slots). Note that this does not take the chat template into account; it prepends the prompt to the beginning of the formatted prompt.
+None yet
 
### POST `/v1/chat/completions`: OpenAI-compatible Chat Completions API