ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 4070 Laptop GPU, compute capability 8.9, VMM: yes
version: 4783 (a800ae4)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
curl http://localhost:8080/v1/chat/completions -d '{
"model": "gpt-3.5-turbo",
"messages": [
{
"role": "user",
"content": "Hi."
}
]
}' | jq '.choices[0]'
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 2352 100 2254 100 98 2762 120 --:--:-- --:--:-- --:--:-- 2882
{
"finish_reason": "stop",
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! How can I help you today?"
}
}
curl http://localhost:8080/v1/chat/completions -d '{
"model": "gpt-3.5-turbo",
"tools": [
{
"type":"function",
"function":{
"name":"python",
"description":"Runs code in an ipython interpreter and returns the result of the execution after 60 seconds.",
"parameters":{
"type":"object",
"properties":{
"code":{
"type":"string",
"description":"The code to run in the ipython interpreter."
}
},
"required":["code"]
}
}
}
],
"messages": [
{
"role": "user",
"content": "Hi"
}
]
}' | jq '.choices[0]'
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 4096 100 3527 100 569 4240 684 --:--:-- --:--:-- --:--:-- 4923
{
"finish_reason": "stop",
"index": 0,
"message": {
"role": "assistant",
"content": "assistant\nHello! How can I assist you today?"
}
}
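The only difference between the two responses above is the stray `assistant\n` at the start of `content` when `tools` is present. A minimal sketch for flagging that symptom in a script follows; the helper name `leaked_role_prefix` and the list of role names are illustrative assumptions, not part of llama.cpp or the OpenAI API.

```python
# Hypothetical helper: detect a role header leaking into message content.
# The two dicts mirror the `.choices[0]` objects printed by jq above.

def leaked_role_prefix(choice, roles=("assistant", "user", "system", "tool")):
    """Return the role name if content starts with '<role>\\n', else None."""
    content = choice.get("message", {}).get("content") or ""
    for role in roles:
        if content.startswith(role + "\n"):
            return role
    return None


plain = {
    "finish_reason": "stop",
    "index": 0,
    "message": {"role": "assistant", "content": "Hello! How can I help you today?"},
}
with_tools = {
    "finish_reason": "stop",
    "index": 0,
    "message": {
        "role": "assistant",
        "content": "assistant\nHello! How can I assist you today?",
    },
}

print(leaked_role_prefix(plain))
print(leaked_role_prefix(with_tools))
```

This only detects the symptom on the client side; it says nothing about where in the server the prefix comes from.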
First Bad Commit
No response
Relevant log output
+ docker run --gpus all --rm --name llama.cpp -p 8080:8080 -v /etc/ssl/certs:/etc/ssl/certs:ro -v /home/ed/.llama.cpp/models:/root/.cache ghcr.io/ggml-org/llama.cpp:full-cuda -s --ctx-size 0 --jinja -fa -hf bartowski/functionary-small-v3.2-GGUF:Q4_K_M --host 0.0.0.0 -ngl 10 --verbose
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 4070 Laptop GPU, compute capability 8.9, VMM: yes
build: 4783 (a800ae46) with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
system info: n_threads = 8, n_threads_batch = 8, total_threads = 32
system_info: n_threads = 8 (n_threads_batch = 8) / 32 | CUDA : ARCHS = 500,610,700,750,800 | USE_GRAPHS = 1 | PEER_MAX_BATCH_SIZE = 128 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | LLAMAFILE = 1 | OPENMP = 1 | AARCH64_REPACK = 1 |
main: HTTP server is listening, hostname: 0.0.0.0, port: 8080, http threads: 31
main: loading model
srv load_model: loading model '/root/.cache/llama.cpp/bartowski_functionary-small-v3.2-GGUF_functionary-small-v3.2-Q4_K_M.gguf'
common_download_file: previous metadata file found /root/.cache/llama.cpp/bartowski_functionary-small-v3.2-GGUF_functionary-small-v3.2-Q4_K_M.gguf.json: {"etag":"\"e0ce54ab24981f28174430665c1ed516-308\"","lastModified":"Thu, 08 Aug 2024 09:29:55 GMT","url":"https://huggingface.co/bartowski/functionary-small-v3.2-GGUF/resolve/main/functionary-small-v3.2-Q4_K_M.gguf"}
curl_perform_with_retry: Trying to download from https://huggingface.co/bartowski/functionary-small-v3.2-GGUF/resolve/main/functionary-small-v3.2-Q4_K_M.gguf (attempt 1 of 3)...
llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 4070 Laptop GPU) - 7793 MiB free
llama_model_loader: loaded meta data with 34 key-value pairs and 292 tensors from /root/.cache/llama.cpp/bartowski_functionary-small-v3.2-GGUF_functionary-small-v3.2-Q4_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = Meta Llama 3.1 8B Instruct
llama_model_loader: - kv 3: general.version str = v3.2
llama_model_loader: - kv 4: general.organization str = Meta Llama
llama_model_loader: - kv 5: general.finetune str = Instruct
llama_model_loader: - kv 6: general.basename str = Meta-Llama-3.1
llama_model_loader: - kv 7: general.size_label str = 8B
llama_model_loader: - kv 8: general.license str = mit
llama_model_loader: - kv 9: llama.block_count u32 = 32
llama_model_loader: - kv 10: llama.context_length u32 = 131072
llama_model_loader: - kv 11: llama.embedding_length u32 = 4096
llama_model_loader: - kv 12: llama.feed_forward_length u32 = 14336
llama_model_loader: - kv 13: llama.attention.head_count u32 = 32
llama_model_loader: - kv 14: llama.attention.head_count_kv u32 = 8
llama_model_loader: - kv 15: llama.rope.freq_base f32 = 500000.000000
llama_model_loader: - kv 16: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 17: general.file_type u32 = 15
llama_model_loader: - kv 18: llama.vocab_size u32 = 128256
llama_model_loader: - kv 19: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 20: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 21: tokenizer.ggml.pre str = llama-bpe
llama_model_loader: - kv 22: tokenizer.ggml.tokens arr[str,128256] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 23: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 24: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv 25: tokenizer.ggml.bos_token_id u32 = 128000
llama_model_loader: - kv 26: tokenizer.ggml.eos_token_id u32 = 128009
llama_model_loader: - kv 27: tokenizer.ggml.padding_token_id u32 = 128009
llama_model_loader: - kv 28: tokenizer.chat_template str = {% for message in messages %}\n{% if m...
llama_model_loader: - kv 29: general.quantization_version u32 = 2
llama_model_loader: - kv 30: quantize.imatrix.file str = /models_out/functionary-small-v3.2-GG...
llama_model_loader: - kv 31: quantize.imatrix.dataset str = /training_dir/calibration_datav3.txt
llama_model_loader: - kv 32: quantize.imatrix.entries_count i32 = 224
llama_model_loader: - kv 33: quantize.imatrix.chunks_count i32 = 125
llama_model_loader: - type f32: 66 tensors
llama_model_loader: - type q4_K: 193 tensors
llama_model_loader: - type q6_K: 33 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = Q4_K - Medium
print_info: file size = 4.58 GiB (4.89 BPW)
init_tokenizer: initializing tokenizer for type 2
load: control token: 128254 '<|reserved_special_token_246|>' is not marked as EOG
load: control token: 128249 '<|reserved_special_token_241|>' is not marked as EOG
load: control token: 128246 '<|reserved_special_token_238|>' is not marked as EOG
load: control token: 128243 '<|reserved_special_token_235|>' is not marked as EOG
load: control token: 128242 '<|reserved_special_token_234|>' is not marked as EOG
load: control token: 128241 '<|reserved_special_token_233|>' is not marked as EOG
load: control token: 128240 '<|reserved_special_token_232|>' is not marked as EOG
load: control token: 128235 '<|reserved_special_token_227|>' is not marked as EOG
load: control token: 128231 '<|reserved_special_token_223|>' is not marked as EOG
load: control token: 128230 '<|reserved_special_token_222|>' is not marked as EOG
load: control token: 128228 '<|reserved_special_token_220|>' is not marked as EOG
load: control token: 128225 '<|reserved_special_token_217|>' is not marked as EOG
load: control token: 128218 '<|reserved_special_token_210|>' is not marked as EOG
load: control token: 128214 '<|reserved_special_token_206|>' is not marked as EOG
load: control token: 128213 '<|reserved_special_token_205|>' is not marked as EOG
load: control token: 128207 '<|reserved_special_token_199|>' is not marked as EOG
load: control token: 128206 '<|reserved_special_token_198|>' is not marked as EOG
load: control token: 128204 '<|reserved_special_token_196|>' is not marked as EOG
load: control token: 128200 '<|reserved_special_token_192|>' is not marked as EOG
load: control token: 128199 '<|reserved_special_token_191|>' is not marked as EOG
load: control token: 128198 '<|reserved_special_token_190|>' is not marked as EOG
load: control token: 128196 '<|reserved_special_token_188|>' is not marked as EOG
load: control token: 128194 '<|reserved_special_token_186|>' is not marked as EOG
load: control token: 128193 '<|reserved_special_token_185|>' is not marked as EOG
load: control token: 128188 '<|reserved_special_token_180|>' is not marked as EOG
load: control token: 128187 '<|reserved_special_token_179|>' is not marked as EOG
load: control token: 128185 '<|reserved_special_token_177|>' is not marked as EOG
load: control token: 128184 '<|reserved_special_token_176|>' is not marked as EOG
load: control token: 128180 '<|reserved_special_token_172|>' is not marked as EOG
load: control token: 128179 '<|reserved_special_token_171|>' is not marked as EOG
load: control token: 128178 '<|reserved_special_token_170|>' is not marked as EOG
load: control token: 128177 '<|reserved_special_token_169|>' is not marked as EOG
load: control token: 128176 '<|reserved_special_token_168|>' is not marked as EOG
load: control token: 128175 '<|reserved_special_token_167|>' is not marked as EOG
load: control token: 128171 '<|reserved_special_token_163|>' is not marked as EOG
load: control token: 128170 '<|reserved_special_token_162|>' is not marked as EOG
load: control token: 128169 '<|reserved_special_token_161|>' is not marked as EOG
load: control token: 128168 '<|reserved_special_token_160|>' is not marked as EOG
load: control token: 128165 '<|reserved_special_token_157|>' is not marked as EOG
load: control token: 128162 '<|reserved_special_token_154|>' is not marked as EOG
load: control token: 128158 '<|reserved_special_token_150|>' is not marked as EOG
load: control token: 128156 '<|reserved_special_token_148|>' is not marked as EOG
load: control token: 128155 '<|reserved_special_token_147|>' is not marked as EOG
load: control token: 128154 '<|reserved_special_token_146|>' is not marked as EOG
load: control token: 128151 '<|reserved_special_token_143|>' is not marked as EOG
load: control token: 128149 '<|reserved_special_token_141|>' is not marked as EOG
load: control token: 128147 '<|reserved_special_token_139|>' is not marked as EOG
load: control token: 128146 '<|reserved_special_token_138|>' is not marked as EOG
load: control token: 128144 '<|reserved_special_token_136|>' is not marked as EOG
load: control token: 128142 '<|reserved_special_token_134|>' is not marked as EOG
load: control token: 128141 '<|reserved_special_token_133|>' is not marked as EOG
load: control token: 128138 '<|reserved_special_token_130|>' is not marked as EOG
load: control token: 128136 '<|reserved_special_token_128|>' is not marked as EOG
load: control token: 128135 '<|reserved_special_token_127|>' is not marked as EOG
load: control token: 128134 '<|reserved_special_token_126|>' is not marked as EOG
load: control token: 128133 '<|reserved_special_token_125|>' is not marked as EOG
load: control token: 128131 '<|reserved_special_token_123|>' is not marked as EOG
load: control token: 128128 '<|reserved_special_token_120|>' is not marked as EOG
load: control token: 128124 '<|reserved_special_token_116|>' is not marked as EOG
load: control token: 128123 '<|reserved_special_token_115|>' is not marked as EOG
load: control token: 128122 '<|reserved_special_token_114|>' is not marked as EOG
load: control token: 128119 '<|reserved_special_token_111|>' is not marked as EOG
load: control token: 128115 '<|reserved_special_token_107|>' is not marked as EOG
load: control token: 128112 '<|reserved_special_token_104|>' is not marked as EOG
load: control token: 128110 '<|reserved_special_token_102|>' is not marked as EOG
load: control token: 128109 '<|reserved_special_token_101|>' is not marked as EOG
load: control token: 128108 '<|reserved_special_token_100|>' is not marked as EOG
load: control token: 128106 '<|reserved_special_token_98|>' is not marked as EOG
load: control token: 128103 '<|reserved_special_token_95|>' is not marked as EOG
load: control token: 128102 '<|reserved_special_token_94|>' is not marked as EOG
load: control token: 128101 '<|reserved_special_token_93|>' is not marked as EOG
load: control token: 128097 '<|reserved_special_token_89|>' is not marked as EOG
load: control token: 128091 '<|reserved_special_token_83|>' is not marked as EOG
load: control token: 128090 '<|reserved_special_token_82|>' is not marked as EOG
load: control token: 128089 '<|reserved_special_token_81|>' is not marked as EOG
load: control token: 128087 '<|reserved_special_token_79|>' is not marked as EOG
load: control token: 128085 '<|reserved_special_token_77|>' is not marked as EOG
load: control token: 128081 '<|reserved_special_token_73|>' is not marked as EOG
load: control token: 128078 '<|reserved_special_token_70|>' is not marked as EOG
load: control token: 128076 '<|reserved_special_token_68|>' is not marked as EOG
load: control token: 128075 '<|reserved_special_token_67|>' is not marked as EOG
load: control token: 128073 '<|reserved_special_token_65|>' is not marked as EOG
load: control token: 128068 '<|reserved_special_token_60|>' is not marked as EOG
load: control token: 128067 '<|reserved_special_token_59|>' is not marked as EOG
load: control token: 128065 '<|reserved_special_token_57|>' is not marked as EOG
load: control token: 128063 '<|reserved_special_token_55|>' is not marked as EOG
load: control token: 128062 '<|reserved_special_token_54|>' is not marked as EOG
load: control token: 128060 '<|reserved_special_token_52|>' is not marked as EOG
load: control token: 128059 '<|reserved_special_token_51|>' is not marked as EOG
load: control token: 128057 '<|reserved_special_token_49|>' is not marked as EOG
load: control token: 128054 '<|reserved_special_token_46|>' is not marked as EOG
load: control token: 128046 '<|reserved_special_token_38|>' is not marked as EOG
load: control token: 128045 '<|reserved_special_token_37|>' is not marked as EOG
load: control token: 128044 '<|reserved_special_token_36|>' is not marked as EOG
load: control token: 128043 '<|reserved_special_token_35|>' is not marked as EOG
load: control token: 128038 '<|reserved_special_token_30|>' is not marked as EOG
load: control token: 128036 '<|reserved_special_token_28|>' is not marked as EOG
load: control token: 128035 '<|reserved_special_token_27|>' is not marked as EOG
load: control token: 128032 '<|reserved_special_token_24|>' is not marked as EOG
load: control token: 128028 '<|reserved_special_token_20|>' is not marked as EOG
load: control token: 128027 '<|reserved_special_token_19|>' is not marked as EOG
load: control token: 128024 '<|reserved_special_token_16|>' is not marked as EOG
load: control token: 128023 '<|reserved_special_token_15|>' is not marked as EOG
load: control token: 128022 '<|reserved_special_token_14|>' is not marked as EOG
load: control token: 128021 '<|reserved_special_token_13|>' is not marked as EOG
load: control token: 128018 '<|reserved_special_token_10|>' is not marked as EOG
load: control token: 128016 '<|reserved_special_token_8|>' is not marked as EOG
load: control token: 128015 '<|reserved_special_token_7|>' is not marked as EOG
load: control token: 128013 '<|reserved_special_token_5|>' is not marked as EOG
load: control token: 128011 '<|reserved_special_token_3|>' is not marked as EOG
load: control token: 128005 '<|reserved_special_token_2|>' is not marked as EOG
load: control token: 128004 '<|finetune_right_pad_id|>' is not marked as EOG
load: control token: 128002 '<|reserved_special_token_0|>' is not marked as EOG
load: control token: 128252 '<|reserved_special_token_244|>' is not marked as EOG
load: control token: 128190 '<|reserved_special_token_182|>' is not marked as EOG
load: control token: 128183 '<|reserved_special_token_175|>' is not marked as EOG
load: control token: 128137 '<|reserved_special_token_129|>' is not marked as EOG
load: control token: 128182 '<|reserved_special_token_174|>' is not marked as EOG
load: control token: 128040 '<|reserved_special_token_32|>' is not marked as EOG
load: control token: 128048 '<|reserved_special_token_40|>' is not marked as EOG
load: control token: 128092 '<|reserved_special_token_84|>' is not marked as EOG
load: control token: 128215 '<|reserved_special_token_207|>' is not marked as EOG
load: control token: 128107 '<|reserved_special_token_99|>' is not marked as EOG
load: control token: 128208 '<|reserved_special_token_200|>' is not marked as EOG
load: control token: 128145 '<|reserved_special_token_137|>' is not marked as EOG
load: control token: 128031 '<|reserved_special_token_23|>' is not marked as EOG
load: control token: 128129 '<|reserved_special_token_121|>' is not marked as EOG
load: control token: 128201 '<|reserved_special_token_193|>' is not marked as EOG
load: control token: 128074 '<|reserved_special_token_66|>' is not marked as EOG
load: control token: 128095 '<|reserved_special_token_87|>' is not marked as EOG
load: control token: 128186 '<|reserved_special_token_178|>' is not marked as EOG
load: control token: 128143 '<|reserved_special_token_135|>' is not marked as EOG
load: control token: 128229 '<|reserved_special_token_221|>' is not marked as EOG
load: control token: 128007 '<|end_header_id|>' is not marked as EOG
load: control token: 128055 '<|reserved_special_token_47|>' is not marked as EOG
load: control token: 128056 '<|reserved_special_token_48|>' is not marked as EOG
load: control token: 128061 '<|reserved_special_token_53|>' is not marked as EOG
load: control token: 128153 '<|reserved_special_token_145|>' is not marked as EOG
load: control token: 128152 '<|reserved_special_token_144|>' is not marked as EOG
load: control token: 128212 '<|reserved_special_token_204|>' is not marked as EOG
load: control token: 128172 '<|reserved_special_token_164|>' is not marked as EOG
load: control token: 128160 '<|reserved_special_token_152|>' is not marked as EOG
load: control token: 128041 '<|reserved_special_token_33|>' is not marked as EOG
load: control token: 128181 '<|reserved_special_token_173|>' is not marked as EOG
load: control token: 128094 '<|reserved_special_token_86|>' is not marked as EOG
load: control token: 128118 '<|reserved_special_token_110|>' is not marked as EOG
load: control token: 128236 '<|reserved_special_token_228|>' is not marked as EOG
load: control token: 128148 '<|reserved_special_token_140|>' is not marked as EOG
load: control token: 128042 '<|reserved_special_token_34|>' is not marked as EOG
load: control token: 128139 '<|reserved_special_token_131|>' is not marked as EOG
load: control token: 128173 '<|reserved_special_token_165|>' is not marked as EOG
load: control token: 128239 '<|reserved_special_token_231|>' is not marked as EOG
load: control token: 128157 '<|reserved_special_token_149|>' is not marked as EOG
load: control token: 128052 '<|reserved_special_token_44|>' is not marked as EOG
load: control token: 128026 '<|reserved_special_token_18|>' is not marked as EOG
load: control token: 128003 '<|reserved_special_token_1|>' is not marked as EOG
load: control token: 128019 '<|reserved_special_token_11|>' is not marked as EOG
load: control token: 128116 '<|reserved_special_token_108|>' is not marked as EOG
load: control token: 128161 '<|reserved_special_token_153|>' is not marked as EOG
load: control token: 128226 '<|reserved_special_token_218|>' is not marked as EOG
load: control token: 128159 '<|reserved_special_token_151|>' is not marked as EOG
load: control token: 128012 '<|reserved_special_token_4|>' is not marked as EOG
load: control token: 128088 '<|reserved_special_token_80|>' is not marked as EOG
load: control token: 128163 '<|reserved_special_token_155|>' is not marked as EOG
load: control token: 128001 '<|end_of_text|>' is not marked as EOG
load: control token: 128113 '<|reserved_special_token_105|>' is not marked as EOG
load: control token: 128250 '<|reserved_special_token_242|>' is not marked as EOG
load: control token: 128125 '<|reserved_special_token_117|>' is not marked as EOG
load: control token: 128053 '<|reserved_special_token_45|>' is not marked as EOG
load: control token: 128224 '<|reserved_special_token_216|>' is not marked as EOG
load: control token: 128247 '<|reserved_special_token_239|>' is not marked as EOG
load: control token: 128251 '<|reserved_special_token_243|>' is not marked as EOG
load: control token: 128216 '<|reserved_special_token_208|>' is not marked as EOG
load: control token: 128006 '<|start_header_id|>' is not marked as EOG
load: control token: 128211 '<|reserved_special_token_203|>' is not marked as EOG
load: control token: 128077 '<|reserved_special_token_69|>' is not marked as EOG
load: control token: 128237 '<|reserved_special_token_229|>' is not marked as EOG
load: control token: 128086 '<|reserved_special_token_78|>' is not marked as EOG
load: control token: 128227 '<|reserved_special_token_219|>' is not marked as EOG
load: control token: 128058 '<|reserved_special_token_50|>' is not marked as EOG
load: control token: 128100 '<|reserved_special_token_92|>' is not marked as EOG
load: control token: 128209 '<|reserved_special_token_201|>' is not marked as EOG
load: control token: 128084 '<|reserved_special_token_76|>' is not marked as EOG
load: control token: 128071 '<|reserved_special_token_63|>' is not marked as EOG
load: control token: 128070 '<|reserved_special_token_62|>' is not marked as EOG
load: control token: 128049 '<|reserved_special_token_41|>' is not marked as EOG
load: control token: 128197 '<|reserved_special_token_189|>' is not marked as EOG
load: control token: 128072 '<|reserved_special_token_64|>' is not marked as EOG
load: control token: 128000 '<|begin_of_text|>' is not marked as EOG
load: control token: 128223 '<|reserved_special_token_215|>' is not marked as EOG
load: control token: 128217 '<|reserved_special_token_209|>' is not marked as EOG
load: control token: 128111 '<|reserved_special_token_103|>' is not marked as EOG
load: control token: 128203 '<|reserved_special_token_195|>' is not marked as EOG
load: control token: 128051 '<|reserved_special_token_43|>' is not marked as EOG
load: control token: 128030 '<|reserved_special_token_22|>' is not marked as EOG
load: control token: 128117 '<|reserved_special_token_109|>' is not marked as EOG
load: control token: 128010 '<|python_tag|>' is not marked as EOG
load: control token: 128238 '<|reserved_special_token_230|>' is not marked as EOG
load: control token: 128255 '<|reserved_special_token_247|>' is not marked as EOG
load: control token: 128202 '<|reserved_special_token_194|>' is not marked as EOG
load: control token: 128132 '<|reserved_special_token_124|>' is not marked as EOG
load: control token: 128248 '<|reserved_special_token_240|>' is not marked as EOG
load: control token: 128167 '<|reserved_special_token_159|>' is not marked as EOG
load: control token: 128127 '<|reserved_special_token_119|>' is not marked as EOG
load: control token: 128105 '<|reserved_special_token_97|>' is not marked as EOG
load: control token: 128039 '<|reserved_special_token_31|>' is not marked as EOG
load: control token: 128232 '<|reserved_special_token_224|>' is not marked as EOG
load: control token: 128166 '<|reserved_special_token_158|>' is not marked as EOG
load: control token: 128130 '<|reserved_special_token_122|>' is not marked as EOG
load: control token: 128114 '<|reserved_special_token_106|>' is not marked as EOG
load: control token: 128234 '<|reserved_special_token_226|>' is not marked as EOG
load: control token: 128191 '<|reserved_special_token_183|>' is not marked as EOG
load: control token: 128064 '<|reserved_special_token_56|>' is not marked as EOG
load: control token: 128140 '<|reserved_special_token_132|>' is not marked as EOG
load: control token: 128096 '<|reserved_special_token_88|>' is not marked as EOG
load: control token: 128098 '<|reserved_special_token_90|>' is not marked as EOG
load: control token: 128192 '<|reserved_special_token_184|>' is not marked as EOG
load: control token: 128093 '<|reserved_special_token_85|>' is not marked as EOG
load: control token: 128150 '<|reserved_special_token_142|>' is not marked as EOG
load: control token: 128222 '<|reserved_special_token_214|>' is not marked as EOG
load: control token: 128233 '<|reserved_special_token_225|>' is not marked as EOG
load: control token: 128220 '<|reserved_special_token_212|>' is not marked as EOG
load: control token: 128034 '<|reserved_special_token_26|>' is not marked as EOG
load: control token: 128033 '<|reserved_special_token_25|>' is not marked as EOG
load: control token: 128253 '<|reserved_special_token_245|>' is not marked as EOG
load: control token: 128195 '<|reserved_special_token_187|>' is not marked as EOG
load: control token: 128099 '<|reserved_special_token_91|>' is not marked as EOG
load: control token: 128189 '<|reserved_special_token_181|>' is not marked as EOG
load: control token: 128210 '<|reserved_special_token_202|>' is not marked as EOG
load: control token: 128174 '<|reserved_special_token_166|>' is not marked as EOG
load: control token: 128083 '<|reserved_special_token_75|>' is not marked as EOG
load: control token: 128080 '<|reserved_special_token_72|>' is not marked as EOG
load: control token: 128104 '<|reserved_special_token_96|>' is not marked as EOG
load: control token: 128082 '<|reserved_special_token_74|>' is not marked as EOG
load: control token: 128219 '<|reserved_special_token_211|>' is not marked as EOG
load: control token: 128017 '<|reserved_special_token_9|>' is not marked as EOG
load: control token: 128050 '<|reserved_special_token_42|>' is not marked as EOG
load: control token: 128205 '<|reserved_special_token_197|>' is not marked as EOG
load: control token: 128047 '<|reserved_special_token_39|>' is not marked as EOG
load: control token: 128164 '<|reserved_special_token_156|>' is not marked as EOG
load: control token: 128020 '<|reserved_special_token_12|>' is not marked as EOG
load: control token: 128069 '<|reserved_special_token_61|>' is not marked as EOG
load: control token: 128245 '<|reserved_special_token_237|>' is not marked as EOG
load: control token: 128121 '<|reserved_special_token_113|>' is not marked as EOG
load: control token: 128079 '<|reserved_special_token_71|>' is not marked as EOG
load: control token: 128037 '<|reserved_special_token_29|>' is not marked as EOG
load: control token: 128244 '<|reserved_special_token_236|>' is not marked as EOG
load: control token: 128029 '<|reserved_special_token_21|>' is not marked as EOG
load: control token: 128221 '<|reserved_special_token_213|>' is not marked as EOG
load: control token: 128066 '<|reserved_special_token_58|>' is not marked as EOG
load: control token: 128120 '<|reserved_special_token_112|>' is not marked as EOG
load: control token: 128014 '<|reserved_special_token_6|>' is not marked as EOG
load: control token: 128025 '<|reserved_special_token_17|>' is not marked as EOG
load: control token: 128126 '<|reserved_special_token_118|>' is not marked as EOG
load: special tokens cache size = 256
load: token to piece cache size = 0.7999 MB
print_info: arch = llama
print_info: vocab_only = 0
print_info: n_ctx_train = 131072
print_info: n_embd = 4096
print_info: n_layer = 32
print_info: n_head = 32
print_info: n_head_kv = 8
print_info: n_rot = 128
print_info: n_swa = 0
print_info: n_embd_head_k = 128
print_info: n_embd_head_v = 128
print_info: n_gqa = 4
print_info: n_embd_k_gqa = 1024
print_info: n_embd_v_gqa = 1024
print_info: f_norm_eps = 0.0e+00
print_info: f_norm_rms_eps = 1.0e-05
print_info: f_clamp_kqv = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale = 0.0e+00
print_info: n_ff = 14336
print_info: n_expert = 0
print_info: n_expert_used = 0
print_info: causal attn = 1
print_info: pooling type = 0
print_info: rope type = 0
print_info: rope scaling = linear
print_info: freq_base_train = 500000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn = 131072
print_info: rope_finetuned = unknown
print_info: ssm_d_conv = 0
print_info: ssm_d_inner = 0
print_info: ssm_d_state = 0
print_info: ssm_dt_rank = 0
print_info: ssm_dt_b_c_rms = 0
print_info: model type = 8B
print_info: model params = 8.03 B
print_info: general.name = Meta Llama 3.1 8B Instruct
print_info: vocab type = BPE
print_info: n_vocab = 128256
print_info: n_merges = 280147
print_info: BOS token = 128000 '<|begin_of_text|>'
print_info: EOS token = 128009 '<|eot_id|>'
print_info: EOT token = 128009 '<|eot_id|>'
print_info: EOM token = 128008 '<|eom_id|>'
print_info: PAD token = 128009 '<|eot_id|>'
print_info: LF token = 198 'Ċ'
print_info: EOG token = 128008 '<|eom_id|>'
print_info: EOG token = 128009 '<|eot_id|>'
print_info: max token length = 256
load_tensors: loading model tensors, this can take a while... (mmap = true)
load_tensors: layer 0 assigned to device CPU
load_tensors: layer 1 assigned to device CPU
load_tensors: layer 2 assigned to device CPU
load_tensors: layer 3 assigned to device CPU
load_tensors: layer 4 assigned to device CPU
load_tensors: layer 5 assigned to device CPU
load_tensors: layer 6 assigned to device CPU
load_tensors: layer 7 assigned to device CPU
load_tensors: layer 8 assigned to device CPU
load_tensors: layer 9 assigned to device CPU
load_tensors: layer 10 assigned to device CPU
load_tensors: layer 11 assigned to device CPU
load_tensors: layer 12 assigned to device CPU
load_tensors: layer 13 assigned to device CPU
load_tensors: layer 14 assigned to device CPU
load_tensors: layer 15 assigned to device CPU
load_tensors: layer 16 assigned to device CPU
load_tensors: layer 17 assigned to device CPU
load_tensors: layer 18 assigned to device CPU
load_tensors: layer 19 assigned to device CPU
load_tensors: layer 20 assigned to device CPU
load_tensors: layer 21 assigned to device CPU
load_tensors: layer 22 assigned to device CUDA0
load_tensors: layer 23 assigned to device CUDA0
load_tensors: layer 24 assigned to device CUDA0
load_tensors: layer 25 assigned to device CUDA0
load_tensors: layer 26 assigned to device CUDA0
load_tensors: layer 27 assigned to device CUDA0
load_tensors: layer 28 assigned to device CUDA0
load_tensors: layer 29 assigned to device CUDA0
load_tensors: layer 30 assigned to device CUDA0
load_tensors: layer 31 assigned to device CUDA0
load_tensors: layer 32 assigned to device CPU
load_tensors: tensor 'token_embd.weight' (q4_K) (and 222 others) cannot be used with preferred buffer type CPU_AARCH64, using CPU instead
load_tensors: offloading 10 repeating layers to GPU
load_tensors: offloaded 10/33 layers to GPU
load_tensors: CUDA0 model buffer size = 1263.13 MiB
load_tensors: CPU_Mapped model buffer size = 4685.30 MiB
.......................................................................................
llama_init_from_model: n_seq_max = 1
llama_init_from_model: n_ctx = 131072
llama_init_from_model: n_ctx_per_seq = 131072
llama_init_from_model: n_batch = 2048
llama_init_from_model: n_ubatch = 512
llama_init_from_model: flash_attn = 1
llama_init_from_model: freq_base = 500000.0
llama_init_from_model: freq_scale = 1
llama_kv_cache_init: kv_size = 131072, offload = 1, type_k = 'f16', type_v = 'f16', n_layer = 32, can_shift = 1
llama_kv_cache_init: layer 0: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 1: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 2: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 3: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 4: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 5: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 6: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 7: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 8: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 9: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 10: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 11: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 12: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 13: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 14: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 15: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 16: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 17: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 18: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 19: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 20: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 21: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 22: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 23: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 24: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 25: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 26: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 27: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 28: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 29: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 30: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 31: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: CUDA0 KV buffer size = 5120.00 MiB
llama_kv_cache_init: CPU KV buffer size = 11264.00 MiB
llama_init_from_model: KV self size = 16384.00 MiB, K (f16): 8192.00 MiB, V (f16): 8192.00 MiB
llama_init_from_model: CPU output buffer size = 0.49 MiB
llama_init_from_model: CUDA0 compute buffer size = 920.00 MiB
llama_init_from_model: CUDA_Host compute buffer size = 264.01 MiB
llama_init_from_model: graph nodes = 903
llama_init_from_model: graph splits = 247 (with bs=512), 3 (with bs=1)
common_init_from_params: setting dry_penalty_last_n to ctx_size = 131072
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
srv init: initializing slots, n_slots = 1
slot init: id 0 | task -1 | new slot n_ctx_slot = 131072
slot reset: id 0 | task -1 |
main: model loaded
main: chat template, chat_template: {% for message in messages %}{% if message['role'] == 'user' or message['role'] == 'system' %}{{ '<|start_header_id|>' + message['role'] + '<|end_header_id|>' + message['content'] + '<|eot_id|>' }}{% elif message['role'] == 'tool' %}{{ '<|start_header_id|>' + message['role'] + '<|end_header_id|>' + message['content'] + '<|eot_id|>' }}{% else %}{{ '<|start_header_id|>' + message['role'] + '<|end_header_id|>'}}{% if message['content'] is not none %}{{ '>>>all' + message['content'] }}{% endif %}{% if 'tool_calls' in message and message['tool_calls'] is not none %}{% for tool_call in message['tool_calls'] %}{{ '>>>' + tool_call['function']['name'] + '' + tool_call['function']['arguments'] }}{% endfor %}{% endif %}{{ '<|eot_id|>' }}{% endif %}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>{role}<|end_header_id|>' }}{% endif %}, example_format: '<|start_header_id|>system<|end_header_id|>You are a helpful assistant<|eot_id|><|start_header_id|>user<|end_header_id|>Hello<|eot_id|><|start_header_id|>assistant<|end_header_id|>>>>allHi there<|eot_id|><|start_header_id|>user<|end_header_id|>How are you?<|eot_id|><|start_header_id|>{role}<|end_header_id|>'
main: server is listening on http://0.0.0.0:8080 - starting the main loop
que start_loop: processing new tasks
que start_loop: update slots
srv update_slots: all slots are idle
srv kv_cache_cle: clearing KV cache
que start_loop: waiting for new tasks
request: {"model": "gpt-3.5-turbo","tools": [ {"type":"function","function":{"name":"python","description":"Runs code in an ipython interpreter and returns the result of the execution after 60 seconds.","parameters":{"type":"object","properties":{"code":{"type":"string","description":"The code to run in the ipython interpreter."
} },"required":["code"] } } }],"messages": [ {"role": "user","content": "Hi" }]}Template supports tool calls but does not natively describe tools. The fallback behaviour used may produce bad results, inspect prompt w/ --verbose & consider overriding the template.srv params_from_: Grammar: char ::= [^"\\\x7F\x00-\x1F] | [\\] (["\\bfnrt] | "u" [0-9a-fA-F]{4})first-tool-call ::= python-callpython-args ::= "{" space python-args-code-kv "}" spacepython-args-code-kv ::= "\"code\"" space ":" space stringpython-call ::= "python\n" python-argspython-call2 ::= ">>>python\n" python-argsroot ::= first-tool-call spacespace ::= | "" | "\n" [ \t]{0,20}string ::= "\"" char* "\"" spacesrv params_from_: Grammar lazy: truesrv params_from_: Chat format: Functionary v3.2srv params_from_: Grammar trigger token: 12958 (`python`)srv params_from_: Grammar trigger word: `>>>python`srv add_waiting_: add task 0 to waiting list. current waiting = 0 (before add)que post: new task, id = 0/1, front = 0que start_loop: processing new tasksque start_loop: processing task, id = 0slot get_availabl: id 0 | task -1 | selected slot by lru, t_last = -1slot reset: id 0 | task -1 | slot launch_slot_: id 0 | task 0 | launching slot : {"id":0,"id_task":0,"n_ctx":131072,"speculative":false,"is_processing":false,"non_causal":false,"params":{"n_predict":-1,"seed":4294967295,"temperature":0.800000011920929,"dynatemp_range":0.0,"dynatemp_exponent":1.0,"top_k":40,"top_p":0.949999988079071,"min_p":0.05000000074505806,"xtc_probability":0.0,"xtc_threshold":0.10000000149011612,"typical_p":1.0,"repeat_last_n":64,"repeat_penalty":1.0,"presence_penalty":0.0,"frequency_penalty":0.0,"dry_multiplier":0.0,"dry_base":1.75,"dry_allowed_length":2,"dry_penalty_last_n":131072,"dry_sequence_breakers":["\n",":","\"","*"],"mirostat":0,"mirostat_tau":5.0,"mirostat_eta":0.10000000149011612,"stop":[],"max_tokens":-1,"n_keep":0,"n_discard":0,"ignore_eos":false,"stream":false,"logit_bias":[],"n_probs":0,"min_keep":0,"grammar":"char ::= 
[^\"\\\\\\x7F\\x00-\\x1F] | [\\\\] ([\"\\\\bfnrt] | \"u\" [0-9a-fA-F]{4})\nfirst-tool-call ::= python-call\npython-args ::= \"{\" space python-args-code-kv \"}\" space\npython-args-code-kv ::= \"\\\"code\\\"\" space \":\" space string\npython-call ::= \"python\\n\" python-args\npython-call2 ::= \">>>python\\n\" python-args\nroot ::= first-tool-call space\nspace ::= | \" \" | \"\\n\" [ \\t]{0,20}\nstring ::= \"\\\"\" char* \"\\\"\" space\n","grammar_trigger_words":[">>>python"],"grammar_trigger_tokens":[12958],"preserved_tokens":[12958],"chat_format":"Functionary v3.2","samplers":["penalties","dry","top_k","typ_p","top_p","min_p","xtc","temperature"],"speculative.n_max":16,"speculative.n_min":0,"speculative.p_min":0.75,"timings_per_token":false,"post_sampling_probs":false,"lora":[]},"prompt":"<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou can call any of the following tools to satisfy the user's requests: [\n {\n \"type\": \"function\",\n \"function\": {\n \"name\": \"python\",\n \"description\": \"Runs code in an ipython interpreter and returns the result of the execution after 60 seconds.\",\n \"parameters\": {\n \"type\": \"object\",\n \"properties\": {\n \"code\": {\n \"type\": \"string\",\n \"description\": \"The code to run in the ipython interpreter.\"\n }\n },\n \"required\": [\n \"code\"\n ]\n }\n }\n }\n]\n\nExample tool call syntax:\n\nassistant<|end_header_id|>\n\n>>>tool_name\n{\"arg1\": \"some_value\"}<|eot_id|>\n\n<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nHi<|eot_id|><|start_header_id|>{role}<|end_header_id|>\n\n","next_token":{"has_next_token":true,"has_new_line":false,"n_remain":-1,"n_decoded":0,"stopping_word":""}}slot launch_slot_: id 0 | task 0 | processing taskque start_loop: update slotssrv update_slots: posting NEXT_RESPONSEque post: new task, id = 1, front = 0slot update_slots: id 0 | task 0 | new prompt, n_ctx_slot = 131072, n_keep = 0, n_prompt_tokens = 167slot update_slots: id 0 | task 0 | prompt token 0: 
128000 '<|begin_of_text|>'slot update_slots: id 0 | task 0 | prompt token 1: 128006 '<|start_header_id|>'slot update_slots: id 0 | task 0 | prompt token 2: 9125 'system'slot update_slots: id 0 | task 0 | prompt token 3: 128007 '<|end_header_id|>'slot update_slots: id 0 | task 0 | prompt token 4: 271 ''slot update_slots: id 0 | task 0 | prompt token 5: 2675 'You'slot update_slots: id 0 | task 0 | prompt token 6: 649 ' can'slot update_slots: id 0 | task 0 | prompt token 7: 1650 ' call'slot update_slots: id 0 | task 0 | prompt token 8: 904 ' any'slot update_slots: id 0 | task 0 | prompt token 9: 315 ' of'slot update_slots: id 0 | task 0 | prompt token 10: 279 ' the'slot update_slots: id 0 | task 0 | prompt token 11: 2768 ' following'slot update_slots: id 0 | task 0 | prompt token 12: 7526 ' tools'slot update_slots: id 0 | task 0 | prompt token 13: 311 ' to'slot update_slots: id 0 | task 0 | prompt token 14: 27651 ' satisfy'slot update_slots: id 0 | task 0 | prompt token 15: 279 ' the'slot update_slots: id 0 | task 0 | kv cache rm [0, end)slot update_slots: id 0 | task 0 | prompt processing progress, n_past = 167, n_tokens = 167, progress = 1.000000slot update_slots: id 0 | task 0 | prompt done, n_past = 167, n_tokens = 167srv update_slots: decoding batch, n_tokens = 167Grammar still awaiting trigger after token 78191 (`assistant`)slot process_toke: id 0 | task 0 | n_decoded = 1, n_remaining = -1, next token: 78191 'assistant'srv update_slots: run slots completedque start_loop: waiting for new tasksque start_loop: processing new tasksque start_loop: processing task, id = 1que start_loop: update slotssrv update_slots: posting NEXT_RESPONSEque post: new task, id = 2, front = 0slot update_slots: id 0 | task 0 | slot decode token, n_ctx = 131072, n_past = 168, n_cache_tokens = 168, truncated = 0srv update_slots: decoding batch, n_tokens = 1Grammar still awaiting trigger after token 198 (``)slot process_toke: id 0 | task 0 | n_decoded = 2, n_remaining = -1, next token: 198 
''srv update_slots: run slots completedque start_loop: waiting for new tasksque start_loop: processing new tasksque start_loop: processing task, id = 2que start_loop: update slotssrv update_slots: posting NEXT_RESPONSEque post: new task, id = 3, front = 0slot update_slots: id 0 | task 0 | slot decode token, n_ctx = 131072, n_past = 169, n_cache_tokens = 169, truncated = 0srv update_slots: decoding batch, n_tokens = 1Grammar still awaiting trigger after token 9906 (`Hello`)slot process_toke: id 0 | task 0 | n_decoded = 3, n_remaining = -1, next token: 9906 'Hello'srv update_slots: run slots completedque start_loop: waiting for new tasksque start_loop: processing new tasksque start_loop: processing task, id = 3que start_loop: update slotssrv update_slots: posting NEXT_RESPONSEque post: new task, id = 4, front = 0slot update_slots: id 0 | task 0 | slot decode token, n_ctx = 131072, n_past = 170, n_cache_tokens = 170, truncated = 0srv update_slots: decoding batch, n_tokens = 1Grammar still awaiting trigger after token 0 (`!`)slot process_toke: id 0 | task 0 | n_decoded = 4, n_remaining = -1, next token: 0 '!'srv update_slots: run slots completedque start_loop: waiting for new tasksque start_loop: processing new tasksque start_loop: processing task, id = 4que start_loop: update slotssrv update_slots: posting NEXT_RESPONSEque post: new task, id = 5, front = 0slot update_slots: id 0 | task 0 | slot decode token, n_ctx = 131072, n_past = 171, n_cache_tokens = 171, truncated = 0srv update_slots: decoding batch, n_tokens = 1Grammar still awaiting trigger after token 2650 (` How`)slot process_toke: id 0 | task 0 | n_decoded = 5, n_remaining = -1, next token: 2650 ' How'srv update_slots: run slots completedque start_loop: waiting for new tasksque start_loop: processing new tasksque start_loop: processing task, id = 5que start_loop: update slotssrv update_slots: posting NEXT_RESPONSEque post: new task, id = 6, front = 0slot update_slots: id 0 | task 0 | slot decode token, n_ctx 
= 131072, n_past = 172, n_cache_tokens = 172, truncated = 0srv update_slots: decoding batch, n_tokens = 1Grammar still awaiting trigger after token 649 (` can`)slot process_toke: id 0 | task 0 | n_decoded = 6, n_remaining = -1, next token: 649 ' can'srv update_slots: run slots completedque start_loop: waiting for new tasksque start_loop: processing new tasksque start_loop: processing task, id = 6que start_loop: update slotssrv update_slots: posting NEXT_RESPONSEque post: new task, id = 7, front = 0slot update_slots: id 0 | task 0 | slot decode token, n_ctx = 131072, n_past = 173, n_cache_tokens = 173, truncated = 0srv update_slots: decoding batch, n_tokens = 1Grammar still awaiting trigger after token 358 (` I`)slot process_toke: id 0 | task 0 | n_decoded = 7, n_remaining = -1, next token: 358 ' I'srv update_slots: run slots completedque start_loop: waiting for new tasksque start_loop: processing new tasksque start_loop: processing task, id = 7que start_loop: update slotssrv update_slots: posting NEXT_RESPONSEque post: new task, id = 8, front = 0slot update_slots: id 0 | task 0 | slot decode token, n_ctx = 131072, n_past = 174, n_cache_tokens = 174, truncated = 0srv update_slots: decoding batch, n_tokens = 1Grammar still awaiting trigger after token 7945 (` assist`)slot process_toke: id 0 | task 0 | n_decoded = 8, n_remaining = -1, next token: 7945 ' assist'srv update_slots: run slots completedque start_loop: waiting for new tasksque start_loop: processing new tasksque start_loop: processing task, id = 8que start_loop: update slotssrv update_slots: posting NEXT_RESPONSEque post: new task, id = 9, front = 0slot update_slots: id 0 | task 0 | slot decode token, n_ctx = 131072, n_past = 175, n_cache_tokens = 175, truncated = 0srv update_slots: decoding batch, n_tokens = 1Grammar still awaiting trigger after token 499 (` you`)slot process_toke: id 0 | task 0 | n_decoded = 9, n_remaining = -1, next token: 499 ' you'srv update_slots: run slots completedque start_loop: 
waiting for new tasksque start_loop: processing new tasksque start_loop: processing task, id = 9que start_loop: update slotssrv update_slots: posting NEXT_RESPONSEque post: new task, id = 10, front = 0slot update_slots: id 0 | task 0 | slot decode token, n_ctx = 131072, n_past = 176, n_cache_tokens = 176, truncated = 0srv update_slots: decoding batch, n_tokens = 1Grammar still awaiting trigger after token 3432 (` today`)slot process_toke: id 0 | task 0 | n_decoded = 10, n_remaining = -1, next token: 3432 ' today'srv update_slots: run slots completedque start_loop: waiting for new tasksque start_loop: processing new tasksque start_loop: processing task, id = 10que start_loop: update slotssrv update_slots: posting NEXT_RESPONSEque post: new task, id = 11, front = 0slot update_slots: id 0 | task 0 | slot decode token, n_ctx = 131072, n_past = 177, n_cache_tokens = 177, truncated = 0srv update_slots: decoding batch, n_tokens = 1Grammar still awaiting trigger after token 30 (`?`)slot process_toke: id 0 | task 0 | n_decoded = 11, n_remaining = -1, next token: 30 '?'srv update_slots: run slots completedque start_loop: waiting for new tasksque start_loop: processing new tasksque start_loop: processing task, id = 11que start_loop: update slotssrv update_slots: posting NEXT_RESPONSEque post: new task, id = 12, front = 0slot update_slots: id 0 | task 0 | slot decode token, n_ctx = 131072, n_past = 178, n_cache_tokens = 178, truncated = 0srv update_slots: decoding batch, n_tokens = 1Grammar still awaiting trigger after token 128009 (`<|eot_id|>`)slot process_toke: id 0 | task 0 | stopped by EOSslot process_toke: id 0 | task 0 | n_decoded = 12, n_remaining = -1, next token: 128009 ''slot release: id 0 | task 0 | stop processing: n_past = 178, truncated = 0slot print_timing: id 0 | task 0 | prompt eval time = 352.32 ms / 167 tokens ( 2.11 ms per token, 474.00 tokens per second) eval time = 656.33 ms / 12 tokens ( 54.69 ms per token, 18.28 tokens per second) total time = 1008.65 
ms / 179 tokenssrv send: sending result for task id = 0srv send: task id = 0 pushed to result queuesrv update_slots: run slots completedque start_loop: waiting for new tasksque start_loop: processing new tasksque start_loop: processing task, id = 12que start_loop: update slotssrv update_slots: all slots are idleque start_loop: waiting for new taskssrv to_json_oaic: Parsing chat message: assistantHello! How can I assist you today?Failed to parse functionary v3.2 input: Failed to parse json tool call arguments: assistantHello! How can I assist you today?srv remove_waiti: remove task 0 from waiting list. current waiting = 1 (before remove)srv log_server_r: request: POST /v1/chat/completions 172.17.0.1 200srv log_server_r: request: {"model": "gpt-3.5-turbo","tools": [ {"type":"function","function":{"name":"python","description":"Runs code in an ipython interpreter and returns the result of the execution after 60 seconds.","parameters":{"type":"object","properties":{"code":{"type":"string","description":"The code to run in the ipython interpreter." } },"required":["code"] } } }],"messages": [ {"role": "user","content": "Hi" }]}srv log_server_r: response: {"choices":[{"finish_reason":"stop","index":0,"message":{"role":"assistant","content":"assistant\nHello! How can I assist you today?"}}],"created":1741214074,"model":"gpt-3.5-turbo","system_fingerprint":"b4783-a800ae46","object":"chat.completion","usage":{"completion_tokens":12,"prompt_tokens":167,"total_tokens":179},"id":"chatcmpl-mjWGBGon48tiZPymyQkXF1obPF9DKskh","__verbose":{"index":0,"content":"assistant\nHello! 
How can I assist you today?","tokens":[],"id_slot":0,"stop":true,"model":"gpt-3.5-turbo","tokens_predicted":12,"tokens_evaluated":167,"generation_settings":{"n_predict":-1,"seed":4294967295,"temperature":0.800000011920929,"dynatemp_range":0.0,"dynatemp_exponent":1.0,"top_k":40,"top_p":0.949999988079071,"min_p":0.05000000074505806,"xtc_probability":0.0,"xtc_threshold":0.10000000149011612,"typical_p":1.0,"repeat_last_n":64,"repeat_penalty":1.0,"presence_penalty":0.0,"frequency_penalty":0.0,"dry_multiplier":0.0,"dry_base":1.75,"dry_allowed_length":2,"dry_penalty_last_n":131072,"dry_sequence_breakers":["\n",":","\"","*"],"mirostat":0,"mirostat_tau":5.0,"mirostat_eta":0.10000000149011612,"stop":[],"max_tokens":-1,"n_keep":0,"n_discard":0,"ignore_eos":false,"stream":false,"logit_bias":[],"n_probs":0,"min_keep":0,"grammar":"char ::= [^\"\\\\\\x7F\\x00-\\x1F] | [\\\\] ([\"\\\\bfnrt] | \"u\" [0-9a-fA-F]{4})\nfirst-tool-call ::= python-call\npython-args ::= \"{\" space python-args-code-kv \"}\" space\npython-args-code-kv ::= \"\\\"code\\\"\" space \":\" space string\npython-call ::= \"python\\n\" python-args\npython-call2 ::= \">>>python\\n\" python-args\nroot ::= first-tool-call space\nspace ::= | \" \" | \"\\n\" [ \\t]{0,20}\nstring ::= \"\\\"\" char* \"\\\"\" space\n","grammar_trigger_words":[">>>python"],"grammar_trigger_tokens":[12958],"preserved_tokens":[12958],"chat_format":"Functionary v3.2","samplers":["penalties","dry","top_k","typ_p","top_p","min_p","xtc","temperature"],"speculative.n_max":16,"speculative.n_min":0,"speculative.p_min":0.75,"timings_per_token":false,"post_sampling_probs":false,"lora":[]},"prompt":"<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou can call any of the following tools to satisfy the user's requests: [\n {\n \"type\": \"function\",\n \"function\": {\n \"name\": \"python\",\n \"description\": \"Runs code in an ipython interpreter and returns the result of the execution after 60 seconds.\",\n \"parameters\": {\n \"type\": 
\"object\",\n \"properties\": {\n \"code\": {\n \"type\": \"string\",\n \"description\": \"The code to run in the ipython interpreter.\"\n }\n },\n \"required\": [\n \"code\"\n ]\n }\n }\n }\n]\n\nExample tool call syntax:\n\nassistant<|end_header_id|>\n\n>>>tool_name\n{\"arg1\": \"some_value\"}<|eot_id|>\n\n<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nHi<|eot_id|><|start_header_id|>{role}<|end_header_id|>\n\n","has_new_line":true,"truncated":false,"stop_type":"eos","stopping_word":"","tokens_cached":178,"timings":{"prompt_n":167,"prompt_ms":352.317,"prompt_per_token_ms":2.109682634730539,"prompt_per_second":474.00494441085726,"predicted_n":12,"predicted_ms":656.331,"predicted_per_token_ms":54.694250000000004,"predicted_per_second":18.283457584663836}},"timings":{"prompt_n":167,"prompt_ms":352.317,"prompt_per_token_ms":2.109682634730539,"prompt_per_second":474.00494441085726,"predicted_n":12,"predicted_ms":656.331,"predicted_per_token_ms":54.694250000000004,"predicted_per_second":18.283457584663836}}
You can see (if you don't filter your jq output) that the prompt ends with "<|start_header_id|>{role}<|end_header_id|>\n\n", which just confuses the model. It adds an assistant\n prefix to try and set the record straight, which is... sweet I guess?
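For anyone who wants to check their own responses for the same symptom, here is a minimal Python sketch. It assumes the server was started with --verbose so that the response carries the `__verbose.prompt` field seen in the log; the helper name `unrendered_roles` and the trimmed-down `sample` dict are made up for illustration and are not part of llama.cpp.

```python
import re

# Matches a header whose role is still a literal "{...}" placeholder,
# e.g. "<|start_header_id|>{role}<|end_header_id|>".
UNRENDERED_HEADER = re.compile(
    r"<\|start_header_id\|>\{(\w+)\}<\|end_header_id\|>"
)


def unrendered_roles(response: dict) -> list:
    """Return placeholder names left unsubstituted in the rendered prompt.

    Hypothetical helper: it only inspects the "__verbose"."prompt" string
    and returns an empty list when that field is absent.
    """
    prompt = response.get("__verbose", {}).get("prompt", "")
    return UNRENDERED_HEADER.findall(prompt)


# Trimmed-down stand-in for the server response shown in the log.
sample = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": "assistant\nHello! How can I assist you today?",
            }
        }
    ],
    "__verbose": {
        "prompt": "<|start_header_id|>user<|end_header_id|>\n\nHi<|eot_id|>"
        "<|start_header_id|>{role}<|end_header_id|>\n\n"
    },
}

print(unrendered_roles(sample))
```

A non-empty result means the generation prompt was sent to the model with the placeholder still in it, which matches the `{role}` header at the end of the logged prompt.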
Name and Version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 4070 Laptop GPU, compute capability 8.9, VMM: yes
version: 4783 (a800ae4)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
Operating systems
Linux
GGML backends
CUDA
Hardware
i9-13900HX + NVIDIA GeForce RTX 4070
Models
https://huggingface.co/bartowski/functionary-small-v3.2-GGUF/blob/main/functionary-small-v3.2-Q4_K_M.gguf
Problem description & steps to reproduce
docker run --gpus all --rm --name llama.cpp -p 8080:8080 -v /etc/ssl/certs:/etc/ssl/certs:ro -v /home/ed/.llama.cpp/models:/root/.cache ghcr.io/ggml-org/llama.cpp:full-cuda -s --ctx-size 0 --jinja -fa -hf bartowski/functionary-small-v3.2-GGUF:Q4_K_M --host 0.0.0.0 -ngl 10 --verbose
First Bad Commit
No response
Relevant log output