
Quick ask window fails to close #388

Open
gnuntoo opened this issue Nov 30, 2024 · 1 comment
Labels
bug Something isn't working
working on Features that are actively being worked on

Comments

gnuntoo commented Nov 30, 2024

Describe the bug
If you open the quick ask window from GNOME search, you can close it once; after reopening it, trying to close it again does nothing.

Expected behavior
The window should close every time like a good boy.
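For context, here is a purely hypothetical GTK4/PyGObject sketch (not Alpaca's actual code; the app id, window, and `closed_once` flag are made up for illustration) of how a quick-ask window that is reused across activations can end up swallowing every close request after the first one:

```python
# Hypothetical sketch, NOT Alpaca's code: a single reused "quick ask" window
# whose close-request handler only lets the first close through.
import gi
gi.require_version("Gtk", "4.0")
from gi.repository import Gtk


class QuickAskApp(Gtk.Application):
    def __init__(self):
        super().__init__(application_id="com.example.QuickAskRepro")
        self.quick_ask = None  # window instance reused across activations

    def do_activate(self):
        if self.quick_ask is None:
            self.quick_ask = Gtk.ApplicationWindow(application=self,
                                                   title="Quick Ask")
            # Hide instead of destroying so the window can be re-presented.
            self.quick_ask.set_hide_on_close(True)
            self.quick_ask.closed_once = False
            # Returning True from "close-request" prevents the close, so a
            # handler that stops returning False after the first close makes
            # later close attempts do nothing.
            self.quick_ask.connect("close-request", self.on_close_request)
        self.quick_ask.present()

    def on_close_request(self, window):
        if not window.closed_once:
            window.closed_once = True
            return False  # first close is allowed through
        return True       # bug: every later close request is swallowed


if __name__ == "__main__":
    QuickAskApp().run(None)
```

The point of the sketch is only that each activation from GNOME search re-presents the same window, so any stale state left behind by the first close is still there the second time around.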

Debugging information

INFO	[main.py | main] Alpaca version: 2.9.0
INFO	[connection_handler.py | start] Starting Alpaca's Ollama instance...
INFO	[connection_handler.py | start] Started Alpaca's Ollama instance
2024/11/29 16:13:20 routes.go:1189: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_DEBUG:true OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11435 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/home/brandonludwig/.var/app/com.jeffser.Alpaca/data/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://*] OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
INFO	[connection_handler.py | start] client version is 0.4.2
time=2024-11-29T16:13:20.496-09:00 level=INFO source=images.go:755 msg="total blobs: 11"
time=2024-11-29T16:13:20.497-09:00 level=INFO source=images.go:762 msg="total unused blobs removed: 0"
time=2024-11-29T16:13:20.497-09:00 level=INFO source=routes.go:1240 msg="Listening on 127.0.0.1:11435 (version 0.4.2)"
time=2024-11-29T16:13:20.497-09:00 level=INFO source=common.go:135 msg="extracting embedded files" dir=/home/brandonludwig/.var/app/com.jeffser.Alpaca/cache/tmp/ollama/ollama2889046696/runners
time=2024-11-29T16:13:20.498-09:00 level=DEBUG source=common.go:168 msg=extracting runner=cpu payload=linux/amd64/cpu/ollama_llama_server.gz
time=2024-11-29T16:13:20.498-09:00 level=DEBUG source=common.go:168 msg=extracting runner=cpu_avx payload=linux/amd64/cpu_avx/ollama_llama_server.gz
time=2024-11-29T16:13:20.498-09:00 level=DEBUG source=common.go:168 msg=extracting runner=cpu_avx2 payload=linux/amd64/cpu_avx2/ollama_llama_server.gz
time=2024-11-29T16:13:20.498-09:00 level=DEBUG source=common.go:168 msg=extracting runner=cuda_v11 payload=linux/amd64/cuda_v11/ollama_llama_server.gz
time=2024-11-29T16:13:20.498-09:00 level=DEBUG source=common.go:168 msg=extracting runner=cuda_v12 payload=linux/amd64/cuda_v12/ollama_llama_server.gz
time=2024-11-29T16:13:20.498-09:00 level=DEBUG source=common.go:168 msg=extracting runner=rocm payload=linux/amd64/rocm/ollama_llama_server.gz
INFO	[connection_handler.py | request] GET : http://127.0.0.1:11435/api/tags
time=2024-11-29T16:13:20.612-09:00 level=DEBUG source=common.go:294 msg="availableServers : found" file=/home/brandonludwig/.var/app/com.jeffser.Alpaca/cache/tmp/ollama/ollama2889046696/runners/cpu/ollama_llama_server
time=2024-11-29T16:13:20.612-09:00 level=DEBUG source=common.go:294 msg="availableServers : found" file=/home/brandonludwig/.var/app/com.jeffser.Alpaca/cache/tmp/ollama/ollama2889046696/runners/cpu_avx/ollama_llama_server
time=2024-11-29T16:13:20.612-09:00 level=DEBUG source=common.go:294 msg="availableServers : found" file=/home/brandonludwig/.var/app/com.jeffser.Alpaca/cache/tmp/ollama/ollama2889046696/runners/cpu_avx2/ollama_llama_server
time=2024-11-29T16:13:20.612-09:00 level=DEBUG source=common.go:294 msg="availableServers : found" file=/home/brandonludwig/.var/app/com.jeffser.Alpaca/cache/tmp/ollama/ollama2889046696/runners/cuda_v11/ollama_llama_server
time=2024-11-29T16:13:20.612-09:00 level=DEBUG source=common.go:294 msg="availableServers : found" file=/home/brandonludwig/.var/app/com.jeffser.Alpaca/cache/tmp/ollama/ollama2889046696/runners/cuda_v12/ollama_llama_server
time=2024-11-29T16:13:20.612-09:00 level=DEBUG source=common.go:294 msg="availableServers : found" file=/home/brandonludwig/.var/app/com.jeffser.Alpaca/cache/tmp/ollama/ollama2889046696/runners/rocm/ollama_llama_server
time=2024-11-29T16:13:20.612-09:00 level=INFO source=common.go:49 msg="Dynamic LLM libraries" runners="[cuda_v12 rocm cpu cpu_avx cpu_avx2 cuda_v11]"
time=2024-11-29T16:13:20.612-09:00 level=DEBUG source=common.go:50 msg="Override detection logic by setting OLLAMA_LLM_LIBRARY"
time=2024-11-29T16:13:20.612-09:00 level=DEBUG source=sched.go:105 msg="starting llm scheduler"
time=2024-11-29T16:13:20.612-09:00 level=INFO source=gpu.go:221 msg="looking for compatible GPUs"
time=2024-11-29T16:13:20.613-09:00 level=DEBUG source=gpu.go:94 msg="searching for GPU discovery libraries for NVIDIA"
time=2024-11-29T16:13:20.613-09:00 level=DEBUG source=gpu.go:509 msg="Searching for GPU library" name=libcuda.so*
time=2024-11-29T16:13:20.613-09:00 level=DEBUG source=gpu.go:532 msg="gpu library search" globs="[/app/lib/ollama/libcuda.so* /app/lib/libcuda.so* /usr/lib/x86_64-linux-gnu/GL/default/lib/libcuda.so* /usr/lib/x86_64-linux-gnu/openh264/extra/libcuda.so* /usr/lib/x86_64-linux-gnu/openh264/extra/libcuda.so* /usr/lib/sdk/llvm15/lib/libcuda.so* /usr/lib/x86_64-linux-gnu/GL/default/lib/libcuda.so* /usr/lib/ollama/libcuda.so* /app/plugins/AMD/lib/ollama/libcuda.so* /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
time=2024-11-29T16:13:20.617-09:00 level=DEBUG source=gpu.go:566 msg="discovered GPU libraries" paths=[]
time=2024-11-29T16:13:20.617-09:00 level=DEBUG source=gpu.go:509 msg="Searching for GPU library" name=libcudart.so*
time=2024-11-29T16:13:20.617-09:00 level=DEBUG source=gpu.go:532 msg="gpu library search" globs="[/app/lib/ollama/libcudart.so* /app/lib/libcudart.so* /usr/lib/x86_64-linux-gnu/GL/default/lib/libcudart.so* /usr/lib/x86_64-linux-gnu/openh264/extra/libcudart.so* /usr/lib/x86_64-linux-gnu/openh264/extra/libcudart.so* /usr/lib/sdk/llvm15/lib/libcudart.so* /usr/lib/x86_64-linux-gnu/GL/default/lib/libcudart.so* /usr/lib/ollama/libcudart.so* /app/plugins/AMD/lib/ollama/libcudart.so* /app/lib/ollama/libcudart.so* /usr/local/cuda/lib64/libcudart.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/x86_64-linux-gnu/libcudart.so* /usr/lib/wsl/lib/libcudart.so* /usr/lib/wsl/drivers/*/libcudart.so* /opt/cuda/lib64/libcudart.so* /usr/local/cuda*/targets/aarch64-linux/lib/libcudart.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/aarch64-linux-gnu/libcudart.so* /usr/local/cuda/lib*/libcudart.so* /usr/lib*/libcudart.so* /usr/local/lib*/libcudart.so*]"
time=2024-11-29T16:13:20.619-09:00 level=DEBUG source=gpu.go:566 msg="discovered GPU libraries" paths="[/app/lib/ollama/libcudart.so.11.3.109 /app/lib/ollama/libcudart.so.12.4.127]"
cudaSetDevice err: 35
time=2024-11-29T16:13:20.621-09:00 level=DEBUG source=gpu.go:582 msg="Unable to load cudart library /app/lib/ollama/libcudart.so.11.3.109: your nvidia driver is too old or missing.  If you have a CUDA GPU please upgrade to run ollama"
cudaSetDevice err: 35
time=2024-11-29T16:13:20.622-09:00 level=DEBUG source=gpu.go:582 msg="Unable to load cudart library /app/lib/ollama/libcudart.so.12.4.127: your nvidia driver is too old or missing.  If you have a CUDA GPU please upgrade to run ollama"
time=2024-11-29T16:13:20.622-09:00 level=WARN source=amd_linux.go:61 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2024-11-29T16:13:20.623-09:00 level=DEBUG source=amd_linux.go:101 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/0/properties"
time=2024-11-29T16:13:20.623-09:00 level=DEBUG source=amd_linux.go:121 msg="detected CPU /sys/class/kfd/kfd/topology/nodes/0/properties"
time=2024-11-29T16:13:20.623-09:00 level=DEBUG source=amd_linux.go:101 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/1/properties"
time=2024-11-29T16:13:20.623-09:00 level=DEBUG source=amd_linux.go:206 msg="mapping amdgpu to drm sysfs nodes" amdgpu=/sys/class/kfd/kfd/topology/nodes/1/properties vendor=4098 device=26591 unique_id=0
time=2024-11-29T16:13:20.623-09:00 level=DEBUG source=amd_linux.go:240 msg=matched amdgpu=/sys/class/kfd/kfd/topology/nodes/1/properties drm=/sys/class/drm/card1/device
time=2024-11-29T16:13:20.623-09:00 level=WARN source=amd_linux.go:306 msg="amdgpu too old gfx803" gpu=0
time=2024-11-29T16:13:20.623-09:00 level=INFO source=amd_linux.go:399 msg="no compatible amdgpu devices detected"
time=2024-11-29T16:13:20.623-09:00 level=INFO source=gpu.go:386 msg="no compatible GPUs were discovered"
time=2024-11-29T16:13:20.623-09:00 level=INFO source=types.go:123 msg="inference compute" id=0 library=cpu variant=avx2 compute="" driver=0.0 name="" total="31.2 GiB" available="28.0 GiB"
[GIN] 2024/11/29 - 16:13:20 | 200 |     711.771µs |       127.0.0.1 | GET      "/api/tags"
INFO	[connection_handler.py | request] POST : http://127.0.0.1:11435/api/show
INFO	[connection_handler.py | request] POST : http://127.0.0.1:11435/api/show
[GIN] 2024/11/29 - 16:13:20 | 200 |   44.575011ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2024/11/29 - 16:13:20 | 200 |    51.19467ms |       127.0.0.1 | POST     "/api/show"
INFO	[connection_handler.py | request] POST : http://127.0.0.1:11435/api/chat
time=2024-11-29T16:13:56.601-09:00 level=DEBUG source=gpu.go:398 msg="updating system memory data" before.total="31.2 GiB" before.free="28.0 GiB" before.free_swap="4.0 GiB" now.total="31.2 GiB" now.free="27.4 GiB" now.free_swap="4.0 GiB"
time=2024-11-29T16:13:56.601-09:00 level=DEBUG source=sched.go:181 msg="updating default concurrency" OLLAMA_MAX_LOADED_MODELS=0x811140 gpu_count=1
time=2024-11-29T16:13:56.656-09:00 level=DEBUG source=sched.go:211 msg="cpu mode with first model, loading"
time=2024-11-29T16:13:56.656-09:00 level=DEBUG source=gpu.go:398 msg="updating system memory data" before.total="31.2 GiB" before.free="27.4 GiB" before.free_swap="4.0 GiB" now.total="31.2 GiB" now.free="27.4 GiB" now.free_swap="4.0 GiB"
time=2024-11-29T16:13:56.656-09:00 level=INFO source=server.go:105 msg="system memory" total="31.2 GiB" free="27.4 GiB" free_swap="4.0 GiB"
time=2024-11-29T16:13:56.656-09:00 level=DEBUG source=common.go:294 msg="availableServers : found" file=/home/brandonludwig/.var/app/com.jeffser.Alpaca/cache/tmp/ollama/ollama2889046696/runners/cpu/ollama_llama_server
time=2024-11-29T16:13:56.656-09:00 level=DEBUG source=common.go:294 msg="availableServers : found" file=/home/brandonludwig/.var/app/com.jeffser.Alpaca/cache/tmp/ollama/ollama2889046696/runners/cpu_avx/ollama_llama_server
time=2024-11-29T16:13:56.656-09:00 level=DEBUG source=common.go:294 msg="availableServers : found" file=/home/brandonludwig/.var/app/com.jeffser.Alpaca/cache/tmp/ollama/ollama2889046696/runners/cpu_avx2/ollama_llama_server
time=2024-11-29T16:13:56.656-09:00 level=DEBUG source=common.go:294 msg="availableServers : found" file=/home/brandonludwig/.var/app/com.jeffser.Alpaca/cache/tmp/ollama/ollama2889046696/runners/cuda_v11/ollama_llama_server
time=2024-11-29T16:13:56.656-09:00 level=DEBUG source=common.go:294 msg="availableServers : found" file=/home/brandonludwig/.var/app/com.jeffser.Alpaca/cache/tmp/ollama/ollama2889046696/runners/cuda_v12/ollama_llama_server
time=2024-11-29T16:13:56.656-09:00 level=DEBUG source=common.go:294 msg="availableServers : found" file=/home/brandonludwig/.var/app/com.jeffser.Alpaca/cache/tmp/ollama/ollama2889046696/runners/rocm/ollama_llama_server
time=2024-11-29T16:13:56.656-09:00 level=DEBUG source=memory.go:107 msg=evaluating library=cpu gpu_count=1 available="[27.4 GiB]"
time=2024-11-29T16:13:56.656-09:00 level=INFO source=memory.go:343 msg="offload to cpu" layers.requested=-1 layers.model=17 layers.offload=0 layers.split="" memory.available="[27.4 GiB]" memory.gpu_overhead="0 B" memory.required.full="2.1 GiB" memory.required.partial="0 B" memory.required.kv="256.0 MiB" memory.required.allocations="[2.1 GiB]" memory.weights.total="1.2 GiB" memory.weights.repeating="976.1 MiB" memory.weights.nonrepeating="266.2 MiB" memory.graph.full="544.0 MiB" memory.graph.partial="554.3 MiB"
time=2024-11-29T16:13:56.657-09:00 level=DEBUG source=common.go:168 msg=extracting runner=cpu payload=linux/amd64/cpu/ollama_llama_server.gz
time=2024-11-29T16:13:56.657-09:00 level=DEBUG source=common.go:168 msg=extracting runner=cpu_avx payload=linux/amd64/cpu_avx/ollama_llama_server.gz
time=2024-11-29T16:13:56.657-09:00 level=DEBUG source=common.go:168 msg=extracting runner=cpu_avx2 payload=linux/amd64/cpu_avx2/ollama_llama_server.gz
time=2024-11-29T16:13:56.657-09:00 level=DEBUG source=common.go:168 msg=extracting runner=cuda_v11 payload=linux/amd64/cuda_v11/ollama_llama_server.gz
time=2024-11-29T16:13:56.657-09:00 level=DEBUG source=common.go:168 msg=extracting runner=cuda_v12 payload=linux/amd64/cuda_v12/ollama_llama_server.gz
time=2024-11-29T16:13:56.657-09:00 level=DEBUG source=common.go:168 msg=extracting runner=rocm payload=linux/amd64/rocm/ollama_llama_server.gz
time=2024-11-29T16:13:56.657-09:00 level=DEBUG source=common.go:294 msg="availableServers : found" file=/home/brandonludwig/.var/app/com.jeffser.Alpaca/cache/tmp/ollama/ollama2889046696/runners/cpu/ollama_llama_server
time=2024-11-29T16:13:56.657-09:00 level=DEBUG source=common.go:294 msg="availableServers : found" file=/home/brandonludwig/.var/app/com.jeffser.Alpaca/cache/tmp/ollama/ollama2889046696/runners/cpu_avx/ollama_llama_server
time=2024-11-29T16:13:56.657-09:00 level=DEBUG source=common.go:294 msg="availableServers : found" file=/home/brandonludwig/.var/app/com.jeffser.Alpaca/cache/tmp/ollama/ollama2889046696/runners/cpu_avx2/ollama_llama_server
time=2024-11-29T16:13:56.657-09:00 level=DEBUG source=common.go:294 msg="availableServers : found" file=/home/brandonludwig/.var/app/com.jeffser.Alpaca/cache/tmp/ollama/ollama2889046696/runners/cuda_v11/ollama_llama_server
time=2024-11-29T16:13:56.657-09:00 level=DEBUG source=common.go:294 msg="availableServers : found" file=/home/brandonludwig/.var/app/com.jeffser.Alpaca/cache/tmp/ollama/ollama2889046696/runners/cuda_v12/ollama_llama_server
time=2024-11-29T16:13:56.657-09:00 level=DEBUG source=common.go:294 msg="availableServers : found" file=/home/brandonludwig/.var/app/com.jeffser.Alpaca/cache/tmp/ollama/ollama2889046696/runners/rocm/ollama_llama_server
time=2024-11-29T16:13:56.660-09:00 level=DEBUG source=gpu.go:703 msg="no filter required for library cpu"
time=2024-11-29T16:13:56.660-09:00 level=INFO source=server.go:383 msg="starting llama server" cmd="/home/brandonludwig/.var/app/com.jeffser.Alpaca/cache/tmp/ollama/ollama2889046696/runners/cpu_avx2/ollama_llama_server --model /home/brandonludwig/.var/app/com.jeffser.Alpaca/data/.ollama/models/blobs/sha256-74701a8c35f6c8d9a4b91f3f3497643001d63e0c7a84e085bed452548fa88d45 --ctx-size 8192 --batch-size 512 --verbose --threads 6 --no-mmap --parallel 4 --port 43981"
time=2024-11-29T16:13:56.660-09:00 level=DEBUG source=server.go:400 msg=subprocess environment="[LD_LIBRARY_PATH=/app/lib/ollama:/home/brandonludwig/.var/app/com.jeffser.Alpaca/cache/tmp/ollama/ollama2889046696/runners/cpu_avx2:/app/lib:/usr/lib/x86_64-linux-gnu/GL/default/lib:/usr/lib/x86_64-linux-gnu/openh264/extra:/usr/lib/x86_64-linux-gnu/openh264/extra:/usr/lib/sdk/llvm15/lib:/usr/lib/x86_64-linux-gnu/GL/default/lib:/usr/lib/ollama:/app/plugins/AMD/lib/ollama PATH=/app/bin:/usr/bin]"
time=2024-11-29T16:13:56.661-09:00 level=INFO source=sched.go:449 msg="loaded runners" count=1
time=2024-11-29T16:13:56.661-09:00 level=INFO source=server.go:562 msg="waiting for llama runner to start responding"
time=2024-11-29T16:13:56.661-09:00 level=INFO source=server.go:596 msg="waiting for server to become available" status="llm server error"
time=2024-11-29T16:13:56.665-09:00 level=INFO source=runner.go:883 msg="starting go runner"
time=2024-11-29T16:13:56.665-09:00 level=INFO source=runner.go:884 msg=system info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | RISCV_VECT = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | cgo(gcc)" threads=6
time=2024-11-29T16:13:56.665-09:00 level=INFO source=.:0 msg="Server listening on 127.0.0.1:43981"
llama_model_loader: loaded meta data with 30 key-value pairs and 147 tensors from /home/brandonludwig/.var/app/com.jeffser.Alpaca/data/.ollama/models/blobs/sha256-74701a8c35f6c8d9a4b91f3f3497643001d63e0c7a84e085bed452548fa88d45 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Llama 3.2 1B Instruct
llama_model_loader: - kv   3:                           general.finetune str              = Instruct
llama_model_loader: - kv   4:                           general.basename str              = Llama-3.2
llama_model_loader: - kv   5:                         general.size_label str              = 1B
llama_model_loader: - kv   6:                               general.tags arr[str,6]       = ["facebook", "meta", "pytorch", "llam...
llama_model_loader: - kv   7:                          general.languages arr[str,8]       = ["en", "de", "fr", "it", "pt", "hi", ...
llama_model_loader: - kv   8:                          llama.block_count u32              = 16
llama_model_loader: - kv   9:                       llama.context_length u32              = 131072
llama_model_loader: - kv  10:                     llama.embedding_length u32              = 2048
llama_model_loader: - kv  11:                  llama.feed_forward_length u32              = 8192
llama_model_loader: - kv  12:                 llama.attention.head_count u32              = 32
llama_model_loader: - kv  13:              llama.attention.head_count_kv u32              = 8
llama_model_loader: - kv  14:                       llama.rope.freq_base f32              = 500000.000000
llama_model_loader: - kv  15:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  16:                 llama.attention.key_length u32              = 64
llama_model_loader: - kv  17:               llama.attention.value_length u32              = 64
llama_model_loader: - kv  18:                          general.file_type u32              = 7
llama_model_loader: - kv  19:                           llama.vocab_size u32              = 128256
llama_model_loader: - kv  20:                 llama.rope.dimension_count u32              = 64
llama_model_loader: - kv  21:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  22:                         tokenizer.ggml.pre str              = llama-bpe
llama_model_loader: - kv  23:                      tokenizer.ggml.tokens arr[str,128256]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  24:                  tokenizer.ggml.token_type arr[i32,128256]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  25:                      tokenizer.ggml.merges arr[str,280147]  = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv  26:                tokenizer.ggml.bos_token_id u32              = 128000
llama_model_loader: - kv  27:                tokenizer.ggml.eos_token_id u32              = 128009
llama_model_loader: - kv  28:                    tokenizer.chat_template str              = {{- bos_token }}\n{%- if custom_tools ...
llama_model_loader: - kv  29:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:   34 tensors
llama_model_loader: - type q8_0:  113 tensors
time=2024-11-29T16:13:56.913-09:00 level=INFO source=server.go:596 msg="waiting for server to become available" status="llm server loading model"
llm_load_vocab: special tokens cache size = 256
llm_load_vocab: token to piece cache size = 0.7999 MB
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = BPE
llm_load_print_meta: n_vocab          = 128256
llm_load_print_meta: n_merges         = 280147
llm_load_print_meta: vocab_only       = 0
llm_load_print_meta: n_ctx_train      = 131072
llm_load_print_meta: n_embd           = 2048
llm_load_print_meta: n_layer          = 16
llm_load_print_meta: n_head           = 32
llm_load_print_meta: n_head_kv        = 8
llm_load_print_meta: n_rot            = 64
llm_load_print_meta: n_swa            = 0
llm_load_print_meta: n_embd_head_k    = 64
llm_load_print_meta: n_embd_head_v    = 64
llm_load_print_meta: n_gqa            = 4
llm_load_print_meta: n_embd_k_gqa     = 512
llm_load_print_meta: n_embd_v_gqa     = 512
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: n_ff             = 8192
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 500000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn  = 131072
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: ssm_dt_b_c_rms   = 0
llm_load_print_meta: model type       = 1B
llm_load_print_meta: model ftype      = Q8_0
llm_load_print_meta: model params     = 1.24 B
llm_load_print_meta: model size       = 1.22 GiB (8.50 BPW) 
llm_load_print_meta: general.name     = Llama 3.2 1B Instruct
llm_load_print_meta: BOS token        = 128000 '<|begin_of_text|>'
llm_load_print_meta: EOS token        = 128009 '<|eot_id|>'
llm_load_print_meta: LF token         = 128 'Ä'
llm_load_print_meta: EOT token        = 128009 '<|eot_id|>'
llm_load_print_meta: EOM token        = 128008 '<|eom_id|>'
llm_load_print_meta: EOG token        = 128008 '<|eom_id|>'
llm_load_print_meta: EOG token        = 128009 '<|eot_id|>'
llm_load_print_meta: max token length = 256
llm_load_tensors: ggml ctx size =    0.07 MiB
llm_load_tensors:        CPU buffer size =  1518.57 MiB
time=2024-11-29T16:13:57.918-09:00 level=DEBUG source=server.go:607 msg="model load progress 0.18"
time=2024-11-29T16:13:58.170-09:00 level=DEBUG source=server.go:607 msg="model load progress 0.37"
time=2024-11-29T16:13:58.420-09:00 level=DEBUG source=server.go:607 msg="model load progress 0.45"
time=2024-11-29T16:13:58.671-09:00 level=DEBUG source=server.go:607 msg="model load progress 0.55"
time=2024-11-29T16:13:58.923-09:00 level=DEBUG source=server.go:607 msg="model load progress 0.64"
time=2024-11-29T16:13:59.174-09:00 level=DEBUG source=server.go:607 msg="model load progress 0.76"
time=2024-11-29T16:13:59.425-09:00 level=DEBUG source=server.go:607 msg="model load progress 0.84"
time=2024-11-29T16:13:59.676-09:00 level=DEBUG source=server.go:607 msg="model load progress 0.93"
llama_new_context_with_model: n_ctx      = 8192
llama_new_context_with_model: n_batch    = 2048
llama_new_context_with_model: n_ubatch   = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base  = 500000.0
llama_new_context_with_model: freq_scale = 1
time=2024-11-29T16:13:59.928-09:00 level=DEBUG source=server.go:607 msg="model load progress 1.00"
llama_kv_cache_init:        CPU KV buffer size =   256.00 MiB
llama_new_context_with_model: KV self size  =  256.00 MiB, K (f16):  128.00 MiB, V (f16):  128.00 MiB
llama_new_context_with_model:        CPU  output buffer size =     1.99 MiB
llama_new_context_with_model:        CPU compute buffer size =   544.01 MiB
llama_new_context_with_model: graph nodes  = 518
llama_new_context_with_model: graph splits = 1
time=2024-11-29T16:14:00.179-09:00 level=INFO source=server.go:601 msg="llama runner started in 3.52 seconds"
time=2024-11-29T16:14:00.179-09:00 level=DEBUG source=sched.go:462 msg="finished setting up runner" model=/home/brandonludwig/.var/app/com.jeffser.Alpaca/data/.ollama/models/blobs/sha256-74701a8c35f6c8d9a4b91f3f3497643001d63e0c7a84e085bed452548fa88d45
time=2024-11-29T16:14:00.180-09:00 level=DEBUG source=routes.go:1457 msg="chat request" images=0 prompt="<|start_header_id|>system<|end_header_id|>\n\nCutting Knowledge Date: December 2023\n\n<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nwrite a short poem<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
time=2024-11-29T16:14:00.181-09:00 level=DEBUG source=cache.go:104 msg="loading cache slot" id=0 cache=0 prompt=29 used=0 remaining=29
[GIN] 2024/11/29 - 16:14:02 | 200 |   5.73835668s |       127.0.0.1 | POST     "/api/chat"
time=2024-11-29T16:14:02.282-09:00 level=DEBUG source=sched.go:466 msg="context for request finished"
time=2024-11-29T16:14:02.282-09:00 level=DEBUG source=sched.go:339 msg="runner with non-zero duration has gone idle, adding timer" modelPath=/home/brandonludwig/.var/app/com.jeffser.Alpaca/data/.ollama/models/blobs/sha256-74701a8c35f6c8d9a4b91f3f3497643001d63e0c7a84e085bed452548fa88d45 duration=5m0s
time=2024-11-29T16:14:02.282-09:00 level=DEBUG source=sched.go:357 msg="after processing request finished event" modelPath=/home/brandonludwig/.var/app/com.jeffser.Alpaca/data/.ollama/models/blobs/sha256-74701a8c35f6c8d9a4b91f3f3497643001d63e0c7a84e085bed452548fa88d45 refCount=0
INFO	[connection_handler.py | request] POST : http://127.0.0.1:11435/api/chat
time=2024-11-29T16:14:20.292-09:00 level=DEBUG source=sched.go:575 msg="evaluating already loaded" model=/home/brandonludwig/.var/app/com.jeffser.Alpaca/data/.ollama/models/blobs/sha256-74701a8c35f6c8d9a4b91f3f3497643001d63e0c7a84e085bed452548fa88d45
time=2024-11-29T16:14:20.293-09:00 level=DEBUG source=routes.go:1457 msg="chat request" images=0 prompt="<|start_header_id|>system<|end_header_id|>\n\nCutting Knowledge Date: December 2023\n\n<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nwrite a short poem<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
time=2024-11-29T16:14:20.295-09:00 level=DEBUG source=cache.go:104 msg="loading cache slot" id=0 cache=69 prompt=29 used=28 remaining=1
time=2024-11-29T16:14:21.858-09:00 level=DEBUG source=sched.go:407 msg="context for request finished"
[GIN] 2024/11/29 - 16:14:21 | 200 |  1.602350112s |       127.0.0.1 | POST     "/api/chat"
time=2024-11-29T16:14:21.859-09:00 level=DEBUG source=sched.go:339 msg="runner with non-zero duration has gone idle, adding timer" modelPath=/home/brandonludwig/.var/app/com.jeffser.Alpaca/data/.ollama/models/blobs/sha256-74701a8c35f6c8d9a4b91f3f3497643001d63e0c7a84e085bed452548fa88d45 duration=5m0s
time=2024-11-29T16:14:21.859-09:00 level=DEBUG source=sched.go:357 msg="after processing request finished event" modelPath=/home/brandonludwig/.var/app/com.jeffser.Alpaca/data/.ollama/models/blobs/sha256-74701a8c35f6c8d9a4b91f3f3497643001d63e0c7a84e085bed452548fa88d45 refCount=0
gnuntoo added the bug (Something isn't working) label Nov 30, 2024
Jeffser (Owner) commented Dec 4, 2024

Hi, thanks for the report, I'll look into it!

Jeffser added the working on (Features that are actively being worked on) label Dec 4, 2024