instruct.cpp, continue on empty line, endless instruct mode, refactors #555
Conversation
- move instruct mode from main.cpp to instruct.cpp
- entering an empty line passes back control without new input in interactive/instruct modes
- endless instruct mode with --n_predict -1
- small refactorings
It didn't require many changes tbh, just moving stuff around. The only major thing was checking whether the `### Instruction:` prefix was already found as an antiprompt and not injecting it twice. This should greatly reduce instruct mode getting stuck in a loop.
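A minimal sketch of that check (the helper name is hypothetical and it works on plain strings for brevity; the actual code operates on token vectors):

    #include <string>

    // Build the text inserted around a user turn in instruct mode.
    // If the model has just emitted the antiprompt ("### Instruction:"),
    // the prefix is skipped so the user's input is not wrapped twice.
    std::string build_instruct_turn(const std::string & last_output,
                                    const std::string & user_input) {
        const std::string inp_pfx    = "\n\n### Instruction:\n\n";
        const std::string inp_sfx    = "\n\n### Response:\n\n";
        const std::string antiprompt = "### Instruction:";

        const bool already_prefixed =
            last_output.size() >= antiprompt.size() &&
            last_output.compare(last_output.size() - antiprompt.size(),
                                antiprompt.size(), antiprompt) == 0;

        std::string turn;
        if (!already_prefixed) {
            turn += inp_pfx;   // only inject the prefix when it is not already there
        }
        turn += user_input;
        turn += inp_sfx;
        return turn;
    }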
Can't test right now but glancing over, things look good! 👍 Will test instruct with a low ctx_size to make sure context swapping and
not removing it completely though, just the help message for it
@rabidcopy Yeah, you are right, the alpaca models are designed to use exactly that prompt when running inference. It has caused confusion in the past since there are other prompts too, but those are used for training (with an input section too), while only that single one is supposed to be used when running inference afterwards.

I do however think that the original design of having the prompt as a file of its own is better than hardcoding it. If hardcoded, 'instruct.cpp' should probably be named 'alpaca.cpp', as it would then definitely be only for the alpaca model. Without hardcoding the prompt, you can use the instruct-response mode with other models too, or test whether you get different results with different prompts. And even for the alpacas themselves, I know the Stanford people said 'this is the thing you're supposed to use it with', but it is still really just a guideline. Who knows what results could be had by tweaking the prompt.

I think it makes no sense to remove such flexibility just to save a command line option. Writing the readmes and pointing towards the right scripts is the right option imo, not decreasing choice.
Yeah, I was just making sure that it makes sense to use alpaca.txt manually if running Alpaca. Don't want to hardcode a prompt or anything; much prefer the flexibility. As an example, you could tweak alpaca.txt to specify that you want it to write regexes or code specifically.
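For illustration only (this variant is hypothetical, not a file in the repo), a tweaked alpaca.txt specialized towards code could read:

    Below is an instruction that describes a programming task. Write a response that contains only the requested code or regular expression, with no additional commentary.

Instruct mode then wraps each of your inputs in the usual "### Instruction:" / "### Response:" markers after this preamble.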
Oh yeah, sure 👍. We should get the readmes up to snuff so everything is clear, which it surely isn't now 😄. I pushed an update to the alpaca.sh script too, so the args are now correct. Idk why, but I seem to get better results with the default values than with the high batch number and top_k values. Feel free to change them if you find better default values for the alpacas; I have no idea whether to tweak them or not.
Does using --instruct instead of --interactive give better results with alpaca models? I had good results in alpaca with --interactive for chatbot stuff by pretty much using a modified version of this prompt (https://github.com/ggerganov/llama.cpp/blob/master/prompts/chat-with-bob.txt), since I wasn't sure how to do chatbot stuff with the --instruct thing, and the results were better than with the llama model, so I went along with it.
Yes and no. Instruct prepends "### Instruction:" to all your inputs and "### Response:" to all the outputs. This is how the Alpaca models are meant to be used when doing instruction-based tasks. But for chatbots you'll want to stick to interactive mode and set a reverse prompt like
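For reference, hedged example invocations (model paths and the "User:" reverse prompt are illustrative, not from this PR; check ./main --help for the exact flags in your build):

    # instruction-response use with an Alpaca model; inputs get wrapped automatically
    ./main -m ./models/alpaca-7B-ggml/ggml-model-q4_0.bin --instruct -f prompts/alpaca.txt

    # chatbot use: interactive mode plus a reverse prompt, no instruct wrapping
    ./main -m ./models/llama-13B/ggml-model-q4_0.bin -i -r "User:" -f prompts/chat-with-bob.txt --color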
First of all, this update doesn't really change much of anything; it just moves the instruct portion into its own file and does some fixes. The general functionality is still the same.

And the answer to your question is, well, it depends. Like said, the alpaca models are supposed to be used with the instruction-response format. However, that doesn't mean you have to use it. There are a lot of things to tweak, different models to test and all sorts of combinations of everything which produce different results, and there is really no right or wrong.

Okay, here are my findings, purely anecdotal: I actually found out exactly the same thing as you did, that the alpaca models for 7B were better for the "chat with bob" usage than their llama counterparts. That isn't using them how they are "supposed to" be used, but it works well. However, upping to 13B, the llama versions work better with "chat with bob". The alpacas are always better in instruct mode. In general I've found that instruct mode, when used with alpacas, works very well for question-answer (well, it's instruction-response after all) types of communication, but not so much for long stories, chatbot style, or any other type of text than the question-answer style. Instruct mode generally produces better answers than asking the same questions in "chat with bob", but it also isn't so much of a conversation and it pretty much lacks any personality.

Having only 32GB of RAM, I can't run the 65B models, so I have no idea of their quality. I've tested pretty much everything, and my S-tier list is currently as follows, depending on use case: for instruct mode, palpaca-7B-ggml (q4_1). And if I couldn't run the 13B llama, the palpaca-7b is a good allrounder which can be used for everything really. The llama-13b is slightly better for chatbot/story usage, but the 7b palpaca is fine too.

The general wisdom and consensus is that the GPTQ quantization is the best one and the perplexity tests give it a slight edge over the RTN ones. However, I'm not certain that's actually the case. The new q4_1 algorithm looks pretty hot, and I've found it to be the best one for the palpaca 7B. However, for the larger 13B llama, GPTQ is better. Idk what the perplexity scores are going to say when they are tested, but I've come to this conclusion based on subjective analysis of 'quality' only. All that being said, I've mostly stuck with the default settings and sha256:
TL;DR:
@anzz1 the torrent for ggml-model-f16.bin seems to be unseeded. Is there some other way to get this alpaca? I think it's the one model I haven't downloaded. I'm going to see if I can quantize it from chavinlo/alpaca-native on HF.
Should we include all the examples in the
You are correct. edit: Added, good catch 👍
Correct, that's exactly what it is. While I do mostly agree with you, this needs to be done to further the goal of moving towards multiple small example binaries rather than one large one. That was the decision made, but you're right, it's not an inherently good or bad thing, just a choice of direction. What you're arguing against is the direction which has already been decided to be the one to take. I myself agree with that direction, but that's irrelevant.

I just prefer doing PRs with smaller steps at a time rather than big ones. That's why it isn't refactored here yet.
int main(int argc, char ** argv) {
    gpt_params params;
    params.model = "models/llama-7B/ggml-model.bin";
If you set ./models/alpaca-7B-ggml/ggml-model-q4_0.bin here as the default model, the command line might get simple enough to not need the shell script, which would be better for Windows users. Would require updating the top-level README.
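A sketch of what that suggestion would look like in code (the alpaca path is just the example from this comment, not an existing default in the repo):

    gpt_params params;
    // default to the quantized alpaca model so a plain ./main run needs no shell script
    params.model = "./models/alpaca-7B-ggml/ggml-model-q4_0.bin";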
I had the impression that Alpaca models require more involved prompting via the "instruction" / "response" stuff, and was thinking that the

Btw, the color / windows stuff can easily be moved to common. But again, if you think it is not worth a separate example - let's just drop it for now.
It is indeed not too different, I could merge the fixes back to main and move the stuff to common. Tbh it doesn't really make much sense to bifurcate it right now, but it could make sense in the future.
Merged the fixes in here to main.cpp.
Closes