Expected Behavior

I use the `--mtest` option and get a report on how much memory is used.

Current Behavior

The program crashes because the first token is not `BOS`.

Environment and Context

As expected, the bug was introduced with f9a6364.
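For context, f9a6364 added a check in `llama_eval_internal` that rejects an evaluation starting at position 0 whose first token is not BOS. Below is a minimal sketch of that kind of guard, reconstructed from the error message in the failure log rather than copied from the actual diff (the function name `first_token_is_bos` is made up for illustration):

```cpp
#include <cstdio>
#include "llama.h"  // llama_token, llama_token_bos() (old C API, no arguments)

// Reconstruction (not the verbatim diff) of the guard introduced by f9a6364:
// when evaluation starts at the beginning of the context (n_past == 0),
// the batch must begin with BOS.
static bool first_token_is_bos(const llama_token * tokens, int n_past) {
    if (n_past == 0 && tokens[0] != llama_token_bos()) {
        fprintf(stderr, "llama_eval_internal: first token must be BOS\n");
        return false;  // surfaces as "llama_eval: failed to eval" in the log
    }
    return true;
}
```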
$ lscpu

```
CPU op-mode(s):          32-bit, 64-bit
Address sizes:           43 bits physical, 48 bits virtual
Byte Order:              Little Endian
CPU(s):                  16
On-line CPU(s) list:     0-15
Vendor ID:               AuthenticAMD
Model name:              AMD Ryzen 7 3700X 8-Core Processor
CPU family:              23
Model:                   113
Thread(s) per core:      2
Core(s) per socket:      8
Socket(s):               1
Stepping:                0
Frequency boost:         enabled
CPU(s) scaling MHz:      77%
CPU max MHz:             4935.9370
CPU min MHz:             2200.0000
BogoMIPS:                7202.09
Flags:                   fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sev sev_es
Virtualization features:
  Virtualization:        AMD-V
Caches (sum of all):
  L1d:                   256 KiB (8 instances)
  L1i:                   256 KiB (8 instances)
  L2:                    4 MiB (8 instances)
  L3:                    32 MiB (2 instances)
NUMA:
  NUMA node(s):          1
  NUMA node0 CPU(s):     0-15
Vulnerabilities:
  Itlb multihit:         Not affected
  L1tf:                  Not affected
  Mds:                   Not affected
  Meltdown:              Not affected
  Mmio stale data:       Not affected
  Retbleed:              Mitigation; untrained return thunk; SMT enabled with STIBP protection
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; Retpolines, IBPB conditional, STIBP always-on, RSB filling, PBRSB-eIBRS Not affected
  Srbds:                 Not affected
  Tsx async abort:       Not affected
```
$ uname -a

```
Linux johannes-pc 6.3.0-1-MANJARO #1 SMP PREEMPT_DYNAMIC Mon Apr 3 10:46:56 UTC 2023 x86_64 GNU/Linux
```

```
Python 3.10.10
GNU Make 4.4.1
g++ (GCC) 12.2.1 20230201
```
Steps to Reproduce

```sh
./main --model ./models/llama-7b-ggml-q4_0.bin --ignore-eos --n_predict 16 --ctx_size 2048 --batch_size 512 --threads 6 --seed 1337 --mtest | tee chat.txt
```
Failure Logs

```
/home/johannesg/Projects/llama.cpp [git::f9a6364 *] [johannesg@johannes-pc] [14:25]
> ./main --model ./models/llama-7b-ggml-q4_0.bin --ignore-eos --n_predict 16 --ctx_size 2048 --batch_size 512 --threads 6 --seed 1337 --mtest | tee chat.txt
WARNING: when using cuBLAS generation results are NOT guaranteed to be reproducible.
main: build = 520 (f9a6364)
main: seed  = 1337
llama.cpp: loading model from ./models/llama-7b-ggml-q4_0.bin
llama_model_load_internal: format     = ggjt v1 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =   68.20 KB
llama_model_load_internal: mem required  = 5809.33 MB (+ 1026.00 MB per state)
llama_init_from_file: kv self size  = 1024.00 MB

system_info: n_threads = 6 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
llama_eval_internal: first token must be BOS
llama_eval: failed to eval

llama_print_timings:        load time =  1193.45 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time =     0.00 ms /     1 tokens (    0.00 ms per token)
llama_print_timings:        eval time =   112.75 ms /     1 runs   (  112.75 ms per run)
llama_print_timings:       total time =  1193.45 ms
```
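The failure mode is consistent with the `--mtest` warm-up path in the `main` example, which evaluates a dummy batch purely to measure memory use. Below is a minimal sketch of that pattern, not the verbatim `main.cpp` code; it assumes the C API of build 520 (`llama_eval`, `llama_token_bos`), and the helper name `run_mem_test` is made up for illustration:

```cpp
#include <vector>
#include "llama.h"

// Sketch (hypothetical helper): evaluate one full dummy batch so that a
// memory-usage report can be produced without a real prompt. The batch is
// filled with token 0, which is not BOS, so the guard added by f9a6364
// rejects it ("first token must be BOS") and the eval fails.
static void run_mem_test(llama_context * ctx, int n_batch, int n_threads) {
    std::vector<llama_token> tmp(n_batch, 0);  // token 0 != BOS -> fails after f9a6364
    llama_eval(ctx, tmp.data(), (int) tmp.size(), 0, n_threads);
}
```

Filling the dummy batch with `llama_token_bos()` instead of 0 would satisfy the new check, which appears to be what the fix referenced below does.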
Fixed in commit fb62f92.