Improve Alpaca integration to match its trained prompt syntax #302
Comments
Aha, it seems there is something wrong there. Thanks for this clarification! One more thing: we will very soon change the format of the model files.
https://github.com/tloen/alpaca-lora#checkpoint-export-export__checkpointpy — the project also auto-downloads the models from Hugging Face.
@ggerganov Generating ggml models is explained here: antimatter15#13. The 13B model took me about 130 GB of disk space and roughly an hour to run the convert and quantize scripts. I can write a full tutorial on that tomorrow, my time.
Alpaca RLHF is for instructions only; making it interactive doesn't really make sense. It would be simpler to just have two different command-line arguments, tailored for instructions with or without an input field respectively.
Tried https://github.com/tloen/alpaca-lora on a 13B model from Hugging Face. This is the diff for alpaca-lora:

```diff
diff --git a/export_state_dict_checkpoint.py b/export_state_dict_checkpoint.py
index 78e9d1f..3b88cb9 100644
--- a/export_state_dict_checkpoint.py
+++ b/export_state_dict_checkpoint.py
@@ -11,10 +11,10 @@ assert (
), "LLaMA is now in HuggingFace's main branch.\nPlease reinstall it: pip uninstall transformers && pip install git+https://github.com/huggingface/transformers.git"
from transformers import LlamaTokenizer, LlamaForCausalLM
-tokenizer = LlamaTokenizer.from_pretrained("decapoda-research/llama-7b-hf")
+tokenizer = LlamaTokenizer.from_pretrained("decapoda-research/llama-13b-hf")
base_model = LlamaForCausalLM.from_pretrained(
- "decapoda-research/llama-7b-hf",
+ "decapoda-research/llama-13b-hf",
load_in_8bit=False,
torch_dtype=torch.float16,
device_map={"": "cpu"},
@@ -22,7 +22,7 @@ base_model = LlamaForCausalLM.from_pretrained(
lora_model = PeftModel.from_pretrained(
base_model,
- "tloen/alpaca-lora-7b",
+ "mattreid/alpaca-lora-13b",
device_map={"": "cpu"},
torch_dtype=torch.float16,
)
@@ -37,10 +37,10 @@ lora_model.train(False)
lora_model_sd = lora_model.state_dict()
params = {
- "dim": 4096,
+ "dim": 5120,
"multiple_of": 256,
- "n_heads": 32,
- "n_layers": 32,
+ "n_heads": 40,
+ "n_layers": 40,
"norm_eps": 1e-06,
"vocab_size": -1,
}
```

With the above patch, running the export script works for the 13B model. I also made the following changes to llama.cpp:

```diff
diff --git a/main.cpp b/main.cpp
index 3321818..e26a26d 100644
--- a/main.cpp
+++ b/main.cpp
@@ -90,7 +90,7 @@ struct llama_model {
};
// load the model's weights from a file
-bool llama_model_load(const std::string & fname, llama_model & model, gpt_vocab & vocab, int n_ctx, ggml_type memory_type = GGML_TYPE_F32) {
+bool llama_model_load(const std::string & fname, llama_model & model, gpt_vocab & vocab, int n_ctx, int n_parts = 0, ggml_type memory_type = GGML_TYPE_F32) {
fprintf(stderr, "%s: loading model from '%s' - please wait ...\n", __func__, fname.c_str());
std::vector<char> f_buf(1024*1024);
@@ -127,7 +127,6 @@ bool llama_model_load(const std::string & fname, llama_model & model, gpt_vocab
}
int n_ff = 0;
- int n_parts = 0;
// load hparams
{
@@ -145,7 +144,8 @@ bool llama_model_load(const std::string & fname, llama_model & model, gpt_vocab
hparams.n_ctx = n_ctx;
n_ff = ((2*(4*hparams.n_embd)/3 + hparams.n_mult - 1)/hparams.n_mult)*hparams.n_mult;
- n_parts = LLAMA_N_PARTS.at(hparams.n_embd);
+ if (n_parts < 1)
+ n_parts = LLAMA_N_PARTS.at(hparams.n_embd);
fprintf(stderr, "%s: n_vocab = %d\n", __func__, hparams.n_vocab);
fprintf(stderr, "%s: n_ctx = %d\n", __func__, hparams.n_ctx);
@@ -839,7 +839,7 @@ int main(int argc, char ** argv) {
{
const ggml_type memory_type = params.memory_f16 ? GGML_TYPE_F16 : GGML_TYPE_F32;
const int64_t t_start_us = ggml_time_us();
- if (!llama_model_load(params.model, model, vocab, params.n_ctx, memory_type)) {
+ if (!llama_model_load(params.model, model, vocab, params.n_ctx, params.n_parts, memory_type)) {
fprintf(stderr, "%s: failed to load model from '%s'\n", __func__, params.model.c_str());
return 1;
}
diff --git a/utils.cpp b/utils.cpp
index 188f114..163441d 100644
--- a/utils.cpp
+++ b/utils.cpp
@@ -64,6 +64,8 @@ bool gpt_params_parse(int argc, char ** argv, gpt_params & params) {
params.n_batch = std::stoi(argv[++i]);
} else if (arg == "-m" || arg == "--model") {
params.model = argv[++i];
+ } else if (arg == "--n_parts") {
+ params.n_parts = std::stoi(argv[++i]);
} else if (arg == "-i" || arg == "--interactive") {
params.interactive = true;
} else if (arg == "-ins" || arg == "--instruct") {
@@ -119,6 +121,7 @@ void gpt_print_usage(int /*argc*/, char ** argv, const gpt_params & params) {
fprintf(stderr, " -b N, --batch_size N batch size for prompt processing (default: %d)\n", params.n_batch);
fprintf(stderr, " -m FNAME, --model FNAME\n");
fprintf(stderr, " model path (default: %s)\n", params.model.c_str());
+ fprintf(stderr, " --n_parts N number of model files, 0 automatic based on model size (default: %d)\n", params.n_parts);
fprintf(stderr, "\n");
}
diff --git a/utils.h b/utils.h
index 65fe02b..0939117 100644
--- a/utils.h
+++ b/utils.h
@@ -30,6 +30,7 @@ struct gpt_params {
std::string model = "models/lamma-7B/ggml-model.bin"; // model path
std::string prompt = "";
+ int32_t n_parts = 0; // default based on the model size
    bool random_prompt = false;
```
I'm not sure if people would prefer sharded model weights or not. If needed, I can make a pull request for the patch above.
Sharded would be preferred. And also sha256 sums for the Alpaca 7B, 13B and 30B models converted with the latest file format. I'm trying to compile a complete list of sha256 checksums for all the models.
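For anyone who wants to help collect them, a small Python sketch of how the checksums could be computed; the example path in the comment is a placeholder, not an actual model file name:

```python
import hashlib
import sys

def sha256sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

if __name__ == "__main__":
    # e.g. python sha256sum.py models/alpaca-13B/ggml-model-q4_0.bin
    for path in sys.argv[1:]:
        print(f"{sha256sum(path)}  {path}")
```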
Does the Alpaca model require quantization, or is it already quantized?
Hi, I don't think changing the model format is very smart, as people will end up with a bunch of HUGE incompatible files on their laptop disks. There will be a real issue figuring out which format works with which forked version downstream. It might be nicer to have a base format, e.g. the current one, and with each version of llama.cpp or alpaca.cpp a script which converts to the needed format and writes the right magic numbers. I would expect a 100B and then a 200B or so model to arrive within a few months, which will make the space problem worse. I am using alpaca.cpp and not llama.cpp, but I believe my comment is relevant to both.
Reading the data release closely,
How it's currently implemented is how it's "supposed" to be used: it was trained with the instruction-input-response format, but inference is done with the instruction-response format. That being said, it is not a rule that it must be used like that. Whether the output improves when using the instruction-input-response (two-field) approach should be researched. These models are ongoing research, there is no right and wrong, and testing different things is fun and provides valuable feedback. A basic example: does the first prompt below produce better output than the second?
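For concreteness, here are the two prompt styles written out side by side as Python string constants; the summarization task is just an illustrative example, and the surrounding wording follows the published Stanford Alpaca training templates:

```python
# Instruction-with-input style: the instruction describes the task,
# the input carries the text it operates on.
PROMPT_WITH_INPUT = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Summarize the following text.

### Input:
Text to be summarized

### Response:
"""

# Instruction-only style: everything is packed into the instruction itself.
PROMPT_WITHOUT_INPUT = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Summarize the following text: Text to be summarized

### Response:
"""
```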
You can already try these out as one-shots as-is when not using the instruct mode. With the interactive mode, it could be almost done already with a suitable command line. If this is something that works well, it's not a problem at all to implement. The only hard part I can think of is communicating and describing the command-line options to the user so it's understandable how it works. 😄
How do I delete the "### Human:" part from Vicuna's responses?
Regarding Alpaca format compatibility with OpenAI's lowercase 'system', 'assistant' and 'user' roles: the Alpaca prompt syntax is as follows:

```
Task description...

### Instruction:
Summarize following text.

### Input:
Text to be summarized

### Response:
```

https://github.com/tatsu-lab/stanford_alpaca/blob/65512697dc67779a6e53c267488aba0ec4d7c02a/train.py#L31
ggml-org/llama.cpp#302

Notice how the "roles" (Instruction, Input, Response) have a capital letter. The same goes when using the following format:

```
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Write a poem about the transformers Python library.

### Response:
```

Or for OpenAI compatibility:

```
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### User:
Write a poem about the transformers Python library.

### Assistant:
```
Hi all, I have a question: if I want to combine the two instruction types, with input and without input, how do I do this? I have followed the "instruction with input" format, but I'm not sure how to handle the case where there is no input. I would be very happy if you could suggest a better method. Or can it only be handled separately for each case, rather than by combining the two formats above?
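One way to combine the two is to pick the template based on whether an input is present, as in this minimal sketch; the constant and function names are illustrative and not taken from any particular codebase, while the template wording follows the Stanford Alpaca training code:

```python
PROMPT_WITH_INPUT = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Input:
{input}

### Response:
"""

PROMPT_NO_INPUT = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Response:
"""

def build_prompt(instruction: str, input_text: str = "") -> str:
    """Pick the Alpaca template based on whether an input is provided."""
    if input_text.strip():
        return PROMPT_WITH_INPUT.format(instruction=instruction, input=input_text)
    return PROMPT_NO_INPUT.format(instruction=instruction)

# With input:    build_prompt("Summarize the following text.", "Text to be summarized")
# Without input: build_prompt("Write a poem about the transformers Python library.")
```

As far as I understand, this mirrors how the training code itself distinguishes the two cases: examples with an empty input field are formatted with the instruction-only template.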
The Alpaca LoRA model was trained on the same dataset as the original Stanford Alpaca.
However, this dataset contains two types of instructions, namely:

- instructions with an additional input field
- instructions without an input field

For more details about the instruction format, see the details here.

In the case of instructions such as text summarization, the instruction alone only "explains" the task, while the text to be summarized is inserted into the "input" part of the prompt.
The current integration of Alpaca in llama.cpp mimics the integration in alpaca.cpp, which completely omits the "instruction with input" type of instruction. This may have a significant impact on model performance when tasks that were trained to use the "instruction with input" prompt syntax are given the ordinary "instruction without input" prompt syntax instead.

I suggest building a small tutorial with example usage, so that users know which type of instruction should be used in input mode and which should not.
Then I suggest integrating this "input" mode somehow into the current implementation. The easiest way would be to let the user type a text prompt that marks the input part with an ***input*** tag, which would then be transformed into the "instruction with input" prompt format. When the user doesn't specify the ***input*** tag, the instruction would be transformed into the "standard" (currently implemented) format. A rough sketch of such a transformation follows below.
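A minimal Python sketch of that idea, assuming the user separates the two parts with a literal ***input*** tag; the function name and the exact tag handling are illustrative assumptions, and the leading task-description line from the Alpaca templates quoted earlier in the thread would be prepended in the same way:

```python
def transform_user_prompt(raw: str, tag: str = "***input***") -> str:
    """Expand 'instruction ***input*** text' into the Alpaca section layout."""
    if tag in raw:
        instruction, input_text = raw.split(tag, 1)
        return (f"### Instruction:\n{instruction.strip()}\n\n"
                f"### Input:\n{input_text.strip()}\n\n"
                f"### Response:\n")
    # No tag given: fall back to the currently implemented instruction-only layout.
    return f"### Instruction:\n{raw.strip()}\n\n### Response:\n"

# Example:
# transform_user_prompt("Summarize the following text. ***input*** Text to be summarized")
```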