I am conducting a series of performance analyses on PowerInfer. For the sake of stability, I need to obtain the same output on every execution. I have referred to #109, but it did not help.
Command
./main -m ../../models/llama-re-lu-7b-sparse/llama-7b-re-lu.powerinfer.gguf --temp 0 -n 256 --seed 0 -t 8 --top-k 1 -p "Here is a code to calculate the first 20 primes"
llama_print_timings: load time = 1080.21 ms
llama_print_timings: sample time = 6.95 ms / 68 runs ( 0.10 ms per token, 9785.58 tokens per second)
llama_print_timings: prompt eval time = 253.00 ms / 14 tokens ( 18.07 ms per token, 55.34 tokens per second)
llama_print_timings: eval time = 5391.85 ms / 67 runs ( 80.48 ms per token, 12.43 tokens per second)
llama_print_timings: total time = 5668.73 ms
Log end
On a second execution with the same command, the output started identically but then diverged:
Here is a code to calculate the first 20 primes.
def prime_sieve(n):
    """
    Generate a list of primes up to n, using the sieve of Eratosthenes.

    Args:
        n (int): The upper limit for the primes.

    Returns:
        A list of primes up to n.
    """
    primes = [True] * (n//2) + [False] * (n//2)
    # Mark all multiples of each prime as false.
    for i in range(1, n//2):
        if primes[i//2]:
            primes[i//2] = False
    # Mark the first prime as true.
    primes[0] = True
    return [primes[i//2]] * (n//2) + [False] * (n//2)
[end of text]
I wonder if the predictors have an effect on sampling.
Actually, this is because of our sparse down operator in the FFN. We use axpy to implement the matmul operator, so the output is accumulated from many concurrent add operations, which introduces slight fluctuations. For stable output, it's advised to run PowerInfer with pure CPU inference on a single thread.
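This is not PowerInfer code, just a minimal Python sketch of why concurrent accumulation can change the result: floating-point addition is not associative, so reducing the same partial products in a different order can shift the low bits of a logit, and with greedy decoding (`--top-k 1`) that is enough to flip a token.

```python
import random

# Deterministic one-liner: IEEE 754 doubles are not associative.
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))  # -> False

# Same values, two reduction orders (standing in for two thread
# interleavings of concurrent axpy adds): the sums can differ slightly.
random.seed(0)
values = [random.uniform(-1.0, 1.0) for _ in range(100_000)]

forward = sum(values)             # one accumulation order
backward = sum(reversed(values))  # another accumulation order

print(abs(forward - backward))    # tiny, but potentially token-flipping
```

A single CPU thread fixes the accumulation order, which is why the single-thread run is deterministic.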
Environment
This inconsistency does NOT appear on another device with: