Different answers for the same question between llama.cpp's main.exe and this project #384
Comments
Try setting the exact same parameters (temp, rand_seed, n_threads, etc.) in Python that you're setting for main.exe.
Still the same output. The main code is shown below:

llm = Llama(model_path=model_path, n_threads=self._n_thread, n_ctx=2048)
user_ctx = "Q:" + prompt + " A: "
output = llm(user_ctx, max_tokens=256, stop=["Q:"], echo=True, temperature=0.2)

but I can't find the other two parameters you named (a sketch of passing them follows the help output below). Here is the output of main.exe -h:

usage: ./main [options]
options:
-h, --help show this help message and exit
-i, --interactive run in interactive mode
--interactive-first run in interactive mode and wait for input right away
-ins, --instruct run in instruction mode (use with Alpaca models)
--multiline-input allows you to write or paste multiple lines without ending each in '\'
-r PROMPT, --reverse-prompt PROMPT
halt generation at PROMPT, return control in interactive mode
(can be specified more than once for multiple prompts).
--color colorise output to distinguish prompt and user input from generations
-s SEED, --seed SEED RNG seed (default: -1, use random seed for < 0)
-t N, --threads N number of threads to use during computation (default: 8)
-p PROMPT, --prompt PROMPT
prompt to start generation with (default: empty)
-e process prompt escapes sequences (\n, \r, \t, \', \", \\)
--prompt-cache FNAME file to cache prompt state for faster startup (default: none)
--prompt-cache-all if specified, saves user input and generations to cache as well.
not supported with --interactive or other interactive options
--prompt-cache-ro if specified, uses the prompt cache but does not update it.
--random-prompt start with a randomized prompt.
--in-prefix STRING string to prefix user inputs with (default: empty)
--in-suffix STRING string to suffix after user inputs with (default: empty)
-f FNAME, --file FNAME
prompt file to start generation.
-n N, --n-predict N number of tokens to predict (default: -1, -1 = infinity)
--top-k N top-k sampling (default: 40, 0 = disabled)
--top-p N top-p sampling (default: 0.9, 1.0 = disabled)
--tfs N tail free sampling, parameter z (default: 1.0, 1.0 = disabled)
--typical N locally typical sampling, parameter p (default: 1.0, 1.0 = disabled)
--repeat-last-n N last n tokens to consider for penalize (default: 64, 0 = disabled, -1 = ctx_size)
--repeat-penalty N penalize repeat sequence of tokens (default: 1.1, 1.0 = disabled)
--presence-penalty N repeat alpha presence penalty (default: 0.0, 0.0 = disabled)
--frequency-penalty N repeat alpha frequency penalty (default: 0.0, 0.0 = disabled)
--mirostat N use Mirostat sampling.
Top K, Nucleus, Tail Free and Locally Typical samplers are ignored if used.
(default: 0, 0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0)
--mirostat-lr N Mirostat learning rate, parameter eta (default: 0.1)
--mirostat-ent N Mirostat target entropy, parameter tau (default: 5.0)
-l TOKEN_ID(+/-)BIAS, --logit-bias TOKEN_ID(+/-)BIAS
modifies the likelihood of token appearing in the completion,
i.e. `--logit-bias 15043+1` to increase likelihood of token ' Hello',
or `--logit-bias 15043-1` to decrease likelihood of token ' Hello'
-c N, --ctx-size N size of the prompt context (default: 512)
--ignore-eos ignore end of stream token and continue generating (implies --logit-bias 2-inf)
--no-penalize-nl do not penalize newline token
--memory-f32 use f32 instead of f16 for memory key+value (default: disabled)
not recommended: doubles context memory required and no measurable increase in quality
--temp N temperature (default: 0.8)
-b N, --batch-size N batch size for prompt processing (default: 512)
--perplexity compute perplexity over the prompt
--keep number of tokens to keep from the initial prompt (default: 0, -1 = all)
--mlock force system to keep model in RAM rather than swapping or compressing
--no-mmap do not memory-map model (slower load but may reduce pageouts if not using mlock)
-ngl N, --n-gpu-layers N
number of layers to store in VRAM
-ts SPLIT --tensor-split SPLIT
how to split tensors across multiple GPUs, comma-separated list of proportions, e.g. 3,1
-mg i, --main-gpu i the GPU to use for scratch and small tensors
--mtest compute maximum memory usage
--export export the computation graph to 'llama.ggml'
--verbose-prompt print prompt before generation
--lora FNAME apply LoRA adapter (implies --no-mmap)
--lora-base FNAME optional model to use as a base for the layers modified by the LoRA adapter
-m FNAME, --model FNAME
model path (default: models/7B/ggml-model.bin)
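A minimal sketch of passing those missing settings, assuming the `Llama` constructor accepts `seed`, `n_threads`, and `n_ctx` keyword arguments (names and defaults may differ between versions of llama-cpp-python, so treat this as an assumption to verify against your installed version):

```python
from llama_cpp import Llama

# Assumed mapping from the main.exe flags above to constructor kwargs
# (check your llama-cpp-python version):
#   -s SEED -> seed
#   -t N    -> n_threads
#   -c N    -> n_ctx
llm = Llama(
    model_path="models/7B/ggml-model.bin",  # same model file as used with main.exe
    seed=42,        # fix the RNG seed so runs are repeatable
    n_threads=8,    # match the -t value used with main.exe
    n_ctx=2048,     # match -c / --ctx-size
)
```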
I don't think it makes sense to compare anything when using "temperature=0.2". Try temperature 0. The "c" parameter is "n_ctx". There are also many other parameters that are in play, even if you don't specify them either in llama.cpp or llama-cpp-python.
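To make the comparison as tight as possible, here is a sketch of a completion call that pins the sampling settings to the defaults listed in the help output above, with temperature 0 for near-greedy decoding. The keyword names are the documented `Llama.__call__` parameters, but the binding's own defaults may not match the CLI's, so the values are passed explicitly here as an assumption:

```python
output = llm(
    "Q: <your question here> A: ",  # placeholder prompt
    max_tokens=256,
    temperature=0.0,     # near-greedy decoding, as suggested above
    top_k=40,            # --top-k default shown in the help output
    top_p=0.9,           # --top-p default shown in the help output
    repeat_penalty=1.1,  # --repeat-penalty default shown in the help output
    stop=["Q:"],
    echo=True,
)
print(output["choices"][0]["text"])
```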
I have the same question with my fine-tuned model from LLaMA. And may the param
I solved this problem using the example in
I'm having the same problem; I don't know if it's possible to change the mode from interactive to instruction in llama-cpp-python. @gpxin Could you specify where you found this folder? I only have llama-cpp and llama_cpp_python-0.2.11.dist-info, and neither of them has an "examples" folder. Thanks!
@AndreCarasas It's not in the Python package but in this project; its folder path:
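As far as I can tell, the Python binding has no interactive/instruct switch, so the usual approach is to build the instruction-style prompt yourself. A minimal sketch, assuming the standard Alpaca wording that llama.cpp's prompts/alpaca.txt and -ins mode use (the exact template text is an assumption; adjust it to your model):

```python
ALPACA_PREFIX = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
)

def instruct(llm, instruction, max_tokens=256):
    # Wrap the request the way main.exe's -ins mode does (assumed template wording).
    prompt = f"{ALPACA_PREFIX}### Instruction:\n{instruction}\n\n### Response:\n"
    out = llm(prompt, max_tokens=max_tokens, temperature=0.2,
              stop=["### Instruction:"])
    return out["choices"][0]["text"]
```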
Prerequisites
Please answer the following questions for yourself before submitting an issue.
Expected Behavior
Please provide a detailed written description of what you were trying to do, and what you expected llama-cpp-python to do.

What I am trying to do: I want the model to translate a sentence from Chinese to English for me.
When I call the model with the original llama.cpp from the command line, the model works fine and gives the right output, like:

Notice that the yellow line

Below is an ......

is the content of a prompt file; the file was passed to the model with -f prompts/alpaca.txt, and I can't find this parameter in this project, so I can't tell whether it is the reason for this issue.

Current Behavior
When I run the same thing with llama-cpp-python like this:

the output was:

You can see that, in this way, the model just returns the content to me instead of translating it.
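Since there is no -f option on the Python side, one way to reproduce the CLI run is to read the same prompt file and prepend its contents to the request manually, using the llm object from above. A minimal sketch (the file path and the "### Instruction"/"### Response" framing are assumptions carried over from the llama.cpp setup described here):

```python
from pathlib import Path

# Equivalent of main.exe's "-f prompts/alpaca.txt": prepend the file contents by hand.
prefix = Path("prompts/alpaca.txt").read_text(encoding="utf-8")
request = "Translate the following sentence from Chinese to English: <sentence>"
prompt = f"{prefix}### Instruction:\n{request}\n\n### Response:\n"
output = llm(prompt, max_tokens=256, temperature=0.2, stop=["### Instruction:"])
print(output["choices"][0]["text"])
```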
Environment and Context
Please provide detailed information about your computer setup. This is important in case the issue is not reproducible except for under certain specific conditions.
Linux xxxxx 5.15.0-73-generic #80~20.04.1-Ubuntu SMP Wed May 17 14:58:14 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Steps to Reproduce
Please provide detailed steps for reproducing the issue. We are not sitting in front of your screen, so the more detail the better.
It worked, just not the way I want, so I don't think the questions below will help; I have removed them.

I can totally understand that models are built on probabilities, so they may give slightly different answers, but I still want to get some help here.

Thanks in advance.