instruct.cpp, continue on empty line, endless instruct mode, refactors #555
Conversation
- move instruct mode from main.cpp to instruct.cpp
- entering an empty line passes back control without new input in interactive/instruct modes
- endless instruct mode with --n_predict -1
- small refactorings
It didn't require many changes tbh, just moving stuff around. The only major thing was checking whether the `### Instruction:` prefix was already found as an antiprompt and not injecting it twice. This should greatly reduce instruct mode getting stuck in a loop.
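A minimal sketch of that check (the helper name is hypothetical and it works on plain strings for brevity; the actual code operates on token vectors):

    #include <string>

    // Build the text inserted around a user turn in instruct mode.
    // If the model has just emitted the antiprompt ("### Instruction:"),
    // the prefix is skipped so the user's input is not wrapped twice.
    std::string build_instruct_turn(const std::string & last_output,
                                    const std::string & user_input) {
        const std::string inp_pfx    = "\n\n### Instruction:\n\n";
        const std::string inp_sfx    = "\n\n### Response:\n\n";
        const std::string antiprompt = "### Instruction:";

        const bool already_prefixed =
            last_output.size() >= antiprompt.size() &&
            last_output.compare(last_output.size() - antiprompt.size(),
                                antiprompt.size(), antiprompt) == 0;

        std::string turn;
        if (!already_prefixed) {
            turn += inp_pfx;   // only inject the prefix when it is not already there
        }
        turn += user_input;
        turn += inp_sfx;
        return turn;
    }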
Can't test right now but glancing over, things look good! 👍 Will test instruct with a low ctx_size to make sure context swapping and
not removing it completely though, just the help message for it
@rabidcopy Yeah, you are right, the alpaca models are designed to use exactly that prompt when running inference. It has caused confusion in the past since there are other prompts too, but those are used for training (with an input section too), while only that single one is supposed to be used when running inference afterwards.

I do however think that the original design of having the prompt as a file of its own is better than hardcoding it. If hardcoded, 'instruct.cpp' should probably be named 'alpaca.cpp', as it would then definitely be only for the alpaca model. Without hardcoding the prompt, you can use the instruct-response mode with other models too, or test whether you get different results with different prompts. And even for the alpacas themselves, I know the Stanford people said 'this is the thing you're supposed to use it with', but it is still really just a guideline. Who knows what results could be had by tweaking the prompt.

I think it makes no sense to remove such flexibility just to save a command line option. Writing the readmes and pointing towards the right scripts is the right option imo, not decreasing choice.
Yeah, I was just making sure that it makes sense to use alpaca.txt manually if running Alpaca. Don't want to hardcode a prompt or anything; much prefer the flexibility. As an example, you could tweak alpaca.txt to specify that you want it to write regexes or code specifically.
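For illustration only (this variant is hypothetical, not a file in the repo), a tweaked alpaca.txt specialized towards code could read:

    Below is an instruction that describes a programming task. Write a response that contains only the requested code or regular expression, with no additional commentary.

Instruct mode then wraps each of your inputs in the usual "### Instruction:" / "### Response:" markers after this preamble.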
Oh yeah, sure 👍. We should get the readmes up to snuff so everything is clear, which it surely isn't now 😄. I pushed an update to the alpaca.sh script too, so the args are now correct. Idk why, but I seem to get better results with the default values than with the high batch number and top_k values. Feel free to change them if you find better default values for the alpacas; I have no idea whether to tweak them or not.
Does using --instruct instead of --interactive give better results with alpaca models? I had good results in alpaca with --interactive for chatbot stuff by pretty much using a modified version of this prompt (https://github.com/ggerganov/llama.cpp/blob/master/prompts/chat-with-bob.txt), since I wasn't sure how to do chatbot stuff with the --instruct thing, and the results were better than with the llama model, so I went along with it.
Yes and no. Instruct prepends "### Instruction:" to all your inputs and "### Response:" to all the outputs. This is how the Alpaca models are meant to be used when doing instruction-based tasks. But for chatbots you'll want to stick to interactive mode and set a reverse prompt like
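For reference, hedged example invocations (model paths and the "User:" reverse prompt are illustrative, not from this PR; check ./main --help for the exact flags in your build):

    # instruction-response use with an Alpaca model; inputs get wrapped automatically
    ./main -m ./models/alpaca-7B-ggml/ggml-model-q4_0.bin --instruct -f prompts/alpaca.txt

    # chatbot use: interactive mode plus a reverse prompt, no instruct wrapping
    ./main -m ./models/llama-13B/ggml-model-q4_0.bin -i -r "User:" -f prompts/chat-with-bob.txt --color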
First of all, this update doesn't really change much of anything; it just moves the instruct portion into its own file and does some fixes. The general functionality is still the same.

And the answer to your question is, well, it depends. Like said, the alpaca models are supposed to be used with the instruction-response format. However, that doesn't mean you have to use it. There are a lot of things to tweak, different models to test and all sorts of combinations of everything which produce different results, and there is really no right or wrong.

Okay, here are my findings, purely anecdotal: I actually found out exactly the same thing as you did, that the alpaca models for 7B were better for the "chat with bob" usage than their llama counterparts. That isn't using them how they are "supposed to" be used, but it works well. However, upping to 13B, the llama versions work better with "chat with bob". The alpacas are always better in instruct mode. In general I've found that instruct mode, when used with alpacas, works very well for question-answer (well, it's instruction-response after all) types of communication, but not so much for long stories, chatbot style, or any other type of text than the question-answer style. Instruct mode generally produces better answers than asking the same questions in "chat with bob", but it also isn't so much of a conversation and it pretty much lacks any personality.

Having only 32GB of RAM, I can't run the 65B models, so I have no idea of their quality. I've tested pretty much everything, and my S-tier list is currently as follows, depending on use case: for instruct mode, palpaca-7B-ggml (q4_1). And if I couldn't run the 13B llama, the palpaca-7b is a good allrounder which can be used for everything really. The llama-13b is slightly better for chatbot/story usage, but the 7b palpaca is fine too.

The general wisdom and consensus is that the GPTQ quantization is the best one and the perplexity tests give it a slight edge over the RTN ones. However, I'm not certain that's actually the case. The new q4_1 algorithm looks pretty hot, and I've found it to be the best one for the palpaca 7B. However, for the larger 13B llama, GPTQ is better. Idk what the perplexity scores are going to say when they are tested, but I've come to this conclusion based on subjective analysis of 'quality' only. All that being said, I've mostly stuck with the default settings and sha256:
TL;DR:
@anzz1 the torrent for ggml-model-f16.bin seems to be unseeded. Is there some other way to get this alpaca? I think it's the one model I haven't downloaded. I'm going to see if I can quantize it from chavinlo/alpaca-native on HF.
Should we include all the examples in the
You are correct. edit: Added, good catch 👍
Correct, that's exactly what it is. While I do mostly agree with you, this needs to be done to further the goal of moving towards multiple small example binaries rather than one large one. That was the decision made, but you're right, it's not an inherently good or bad thing, just a choice of direction. What you're arguing against is the direction which has already been decided to be the one to take. I myself agree with that direction, but that's irrelevant.

I just prefer doing PRs with smaller steps at a time rather than big ones. That's why it isn't refactored here yet.
int main(int argc, char ** argv) {
    gpt_params params;
    params.model = "models/llama-7B/ggml-model.bin";
If you set ./models/alpaca-7B-ggml/ggml-model-q4_0.bin here as the default model, the command line might get simple enough to not need the shell script, which would be better for Windows users. Would require updating the top-level README.
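A sketch of what that suggestion would look like in code (the alpaca path is just the example from this comment, not an existing default in the repo):

    gpt_params params;
    // default to the quantized alpaca model so a plain ./main run needs no shell script
    params.model = "./models/alpaca-7B-ggml/ggml-model-q4_0.bin";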
I had the impression that Alpaca models require more involved prompting via the "instruction" / "response" stuff, and was thinking that the

Btw, the color / windows stuff can easily be moved to common. But again, if you think it is not worth a separate example - let's just drop it for now.
It is indeed not too different, I could merge the fixes back to main and move the stuff to common. Tbh it doesn't really make much sense to bifurcate it right now, but it could make sense in the future.
Merged the fixes in here to main.cpp.
Closes