README.md (+6 −2)
@@ -11,7 +11,8 @@ Inference of [LLaMA](https://arxiv.org/abs/2302.13971) model in pure C/C++

### Hot topics

-- Parallel decoding + continuous batching support incoming: [#3228](https://github.com/ggerganov/llama.cpp/pull/3228)\
+- ‼️ Breaking change: `rope_freq_base` and `rope_freq_scale` must be set to zero to use the model default values: [#3401](https://github.com/ggerganov/llama.cpp/pull/3401)
+- Parallel decoding + continuous batching support added: [#3228](https://github.com/ggerganov/llama.cpp/pull/3228)\
  **Devs should become familiar with the new API**
- Local Falcon 180B inference on Mac Studio
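As a quick illustration of the breaking change above, a minimal sketch of the new behavior, assuming the `main` example's `--rope-freq-base` / `--rope-freq-scale` flags and an illustrative model path:

```bash
# After #3401, zero means "use the RoPE values stored in the model file" (model path is illustrative)
./main -m models/7B/ggml-model.gguf --rope-freq-base 0 --rope-freq-scale 0 -p "Hello"
```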
@@ -92,7 +93,8 @@ as the main playground for developing new features for the [ggml](https://github
- [X] [Baichuan-7B](https://huggingface.co/baichuan-inc/baichuan-7B) and its derivations (such as [baichuan-7b-sft](https://huggingface.co/hiyouga/baichuan-7b-sft))

The `grammars/` folder contains a handful of sample grammars. To write your own, check out the [GBNF Guide](./grammars/README.md).

+For authoring more complex JSON grammars, you can also check out https://grammar.intrinsiclabs.ai/, a browser app that lets you write TypeScript interfaces which it compiles to GBNF grammars that you can save for local use. Note that the app is built and maintained by members of the community, please file any issues or FRs on [its repo](http://github.com/intrinsiclabsai/gbnfgen) and not this one.
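For a concrete starting point, here is a minimal sketch of constraining `main` with one of the bundled sample grammars, assuming the `--grammar-file` flag and the `grammars/json.gbnf` sample; the model path is illustrative:

```bash
# Constrain generation to valid JSON using the bundled sample grammar (model path is illustrative)
./main -m models/7B/ggml-model.gguf --grammar-file grammars/json.gbnf \
    -p 'Describe a cat as a JSON object:'
```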
### Instruction mode with Alpaca

1. First, download the `ggml` Alpaca model into the `./models` folder
examples/infill/README.md

This example shows how to use the infill mode with Code Llama models supporting infill mode.
Currently, the 7B and 13B models support infill mode.

Infill supports most of the options available in the main example.

For further information, have a look at the main README.md in llama.cpp/example/main/README.md
## Common Options

In this section, we cover the most commonly used options for running the `infill` program with the LLaMA models:

- `-m FNAME, --model FNAME`: Specify the path to the LLaMA model file (e.g., `models/7B/ggml-model.bin`).
- `-i, --interactive`: Run the program in interactive mode, allowing you to provide input directly and receive real-time responses.
- `-n N, --n-predict N`: Set the number of tokens to predict when generating text. Adjusting this value can influence the length of the generated text.
- `-c N, --ctx-size N`: Set the size of the prompt context. The default is 512, but LLaMA models were built with a context of 2048, which will provide better results for longer input/inference.
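A minimal sketch combining the options above; the binary location and model path are illustrative and depend on how you built the project and where you placed the model:

```bash
# Load a model, enlarge the context window, cap generation at 128 tokens, and run interactively
./infill -m models/7B/ggml-model.bin -c 2048 -n 128 -i
```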
## Input Prompts

The `infill` program provides several ways to interact with the LLaMA models using input prompts:

- `--in-prefix PROMPT_BEFORE_CURSOR`: Provide the prefix directly as a command-line option.
- `--in-suffix PROMPT_AFTER_CURSOR`: Provide the suffix directly as a command-line option.
- `--interactive-first`: Run the program in interactive mode and wait for input right away. (More on this below.)
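A minimal sketch of a one-shot fill-in-the-middle call using the prefix/suffix options above; the model file name is illustrative:

```bash
# Ask the model to fill in the code between the prefix and the suffix
./infill -m models/codellama-7b.Q4_K_M.gguf \
    --in-prefix "int add(int a, int b) { return " \
    --in-suffix "; }"
```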
## Interaction

The `infill` program offers a seamless way to interact with LLaMA models, allowing users to receive real-time infill suggestions. The interactive mode can be triggered using `--interactive` or `--interactive-first`.
### Interaction Options

- `-i, --interactive`: Run the program in interactive mode, allowing users to get real-time code suggestions from the model.
- `--interactive-first`: Run the program in interactive mode and immediately wait for user input before starting the text generation.
- `--color`: Enable colorized output to visually distinguish between prompts, user input, and generated text.
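A minimal sketch of starting an interactive session with the options above; the model file name is illustrative:

```bash
# Wait for user input before generating, with colorized output separating prompt, input, and completion
./infill -m models/codellama-7b.Q4_K_M.gguf --interactive-first --color
```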