LLAMA_METAL=1 make -j && ./main -m ./models/guanaco-7B.ggmlv3.q4_0.bin -p "I love fish" --ignore-eos -n 1024 -ngl 1
llama_print_timings: load time = 7918.69 ms
llama_print_timings: sample time = 1013.54 ms / 1024 runs ( 0.99 ms per token)
llama_print_timings: prompt eval time = 14705.49 ms / 775 tokens ( 18.97 ms per token)
llama_print_timings: eval time = 46435.82 ms / 1020 runs ( 45.53 ms per token)
llama_print_timings: total time = 69981.58 ms
My question: the eval time seems to be the same as on the CPU. Is that normal?
MacBook Pro M1, 32 GB
Yup, on M1 Pro I also get a similar time with 8 CPU threads compared to the GPU: ~45 ms/token.
My explanation is that the CPU and GPU each get about 100 GB/s of the M1 Pro's 200 GB/s total memory bandwidth, so parity is expected on this machine.
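To make the bandwidth argument concrete, here is a rough back-of-the-envelope sketch. It assumes token generation is memory-bandwidth bound, meaning every generated token has to read all the weights once. The 3.8 GB model size is my assumption for a 7B q4_0 file (check the size of your own file), and `min_ms_per_token` is just a helper name made up for this example. The 100 GB/s figure is the per-side estimate from the explanation above, not a measured value.

```python
# Rough floor on per-token latency if decoding is memory-bandwidth bound.
# Assumption: each generated token streams all model weights from memory once.

def min_ms_per_token(model_gb: float, bandwidth_gb_s: float) -> float:
    """Milliseconds needed to read the whole model once at the given bandwidth."""
    return model_gb / bandwidth_gb_s * 1000.0

MODEL_GB = 3.8          # assumed size of a 7B q4_0 file, not taken from the thread
BANDWIDTH_GB_S = 100.0  # per-side bandwidth estimate from the comment above
MEASURED_MS = 45.53     # eval time per token reported in the issue log

bound = min_ms_per_token(MODEL_GB, BANDWIDTH_GB_S)
tokens_per_s = 1000.0 / MEASURED_MS

print(f"bandwidth-bound floor: {bound:.1f} ms/token")
print(f"measured:              {MEASURED_MS:.2f} ms/token ({tokens_per_s:.1f} tokens/s)")
```

Under those assumptions the floor comes out at about 38 ms/token, which is close to the ~45 ms/token seen on both CPU and GPU. That would fit with both being limited by memory reads rather than compute, though it is only an estimate.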