
bug: Wrong maximum context length for qwen2.5-coder #3714

Closed
1 of 3 tasks
alexbfr opened this issue Sep 21, 2024 · 2 comments · Fixed by #3725
Labels: type: bug (Something isn't working)

Comments


alexbfr commented Sep 21, 2024

Jan version

v0.5.4

Describe the Bug

Using qwen-2.5-coder-7b-instruct, Jan allows a maximum context length of only 2048 tokens. However, according to both Qwen's website and llama.cpp's output (most recent version from GitHub at the time of writing), the maximum context length is 131072 tokens.

build: 3787 (6026da52) with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
system info: n_threads = 47, n_threads_batch = 47, total_threads = 48

system_info: n_threads = 47 (n_threads_batch = 47) / 48 | AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | RISCV_VECT = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | 

main: HTTP server is listening, hostname: 127.0.0.1, port: 8080, http threads: 47
main: loading model
llama_model_loader: additional 2 GGUFs metadata loaded.
llama_model_loader: loaded meta data with 29 key-value pairs and 339 tensors from [...]/qwen-2.5/qwen2.5-coder-7b-instruct-q8_0-00001-of-00003.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = qwen2
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Qwen2.5 Coder 7B Instruct GGUF
llama_model_loader: - kv   3:                           general.finetune str              = Instruct-GGUF
llama_model_loader: - kv   4:                           general.basename str              = Qwen2.5-Coder
llama_model_loader: - kv   5:                         general.size_label str              = 7B
llama_model_loader: - kv   6:                          qwen2.block_count u32              = 28
llama_model_loader: - kv   7:                       qwen2.context_length u32              = 131072
llama_model_loader: - kv   8:                     qwen2.embedding_length u32              = 3584
llama_model_loader: - kv   9:                  qwen2.feed_forward_length u32              = 18944
llama_model_loader: - kv  10:                 qwen2.attention.head_count u32              = 28
llama_model_loader: - kv  11:              qwen2.attention.head_count_kv u32              = 4
llama_model_loader: - kv  12:                       qwen2.rope.freq_base f32              = 1000000.000000
llama_model_loader: - kv  13:     qwen2.attention.layer_norm_rms_epsilon f32              = 0.000001
llama_model_loader: - kv  14:                          general.file_type u32              = 7
llama_model_loader: - kv  15:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  16:                         tokenizer.ggml.pre str              = qwen2
llama_model_loader: - kv  17:                      tokenizer.ggml.tokens arr[str,152064]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  18:                  tokenizer.ggml.token_type arr[i32,152064]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  19:                      tokenizer.ggml.merges arr[str,151387]  = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv  20:                tokenizer.ggml.eos_token_id u32              = 151645
llama_model_loader: - kv  21:            tokenizer.ggml.padding_token_id u32              = 151643
llama_model_loader: - kv  22:                tokenizer.ggml.bos_token_id u32              = 151643
llama_model_loader: - kv  23:               tokenizer.ggml.add_bos_token bool             = false
llama_model_loader: - kv  24:                    tokenizer.chat_template str              = {%- if tools %}\n    {{- '<|im_start|>...
llama_model_loader: - kv  25:               general.quantization_version u32              = 2
llama_model_loader: - kv  26:                                   split.no u16              = 0
llama_model_loader: - kv  27:                                split.count u16              = 3
llama_model_loader: - kv  28:                        split.tensors.count i32              = 339
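
To cross-check outside of Jan, the advertised context length can be read straight from the GGUF header. A minimal sketch, assuming the gguf Python package that ships with llama.cpp's gguf-py (pip install gguf); the file name matches the first split loaded above:

from gguf import GGUFReader  # assumption: llama.cpp's gguf-py package

reader = GGUFReader("qwen2.5-coder-7b-instruct-q8_0-00001-of-00003.gguf")
field = reader.fields["qwen2.context_length"]
# For a scalar field, field.data holds the index of the part containing the value
print(int(field.parts[field.data[0]][0]))  # expected: 131072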

Unfortunately, this makes the model all but unusable within Jan for coding-related tasks.

Steps to Reproduce

  1. Install the current Jan release (0.5.4) as a Debian package
  2. Download qwen-2.5-coder-7b-instruct q8_0 (https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct-GGUF/tree/main)
  3. Import the model into Jan
  4. Select the model in the "Model" tab
  5. Scroll down to "Context Length" in the "Model" tab
  6. Observe that 2048 is the maximum allowed value

To my understanding, even without RoPE scaling, 2048 is far below the trained context length of qwen-2.5-coder.
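
As an additional sanity check that the backend itself accepts a larger window (a hypothetical sketch, independent of Jan's UI: it assumes a standalone llama-server started with "-c 32768" on port 8080, and queries its /props endpoint, which reports the effective settings):

import json
import urllib.request

# assumption: llama-server -m qwen2.5-coder-7b-instruct-q8_0-00001-of-00003.gguf -c 32768
with urllib.request.urlopen("http://127.0.0.1:8080/props") as resp:
    props = json.load(resp)

# n_ctx is the context size llama.cpp actually allocated
print(props["default_generation_settings"]["n_ctx"])  # expect 32768, not 2048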

Screenshots / Logs

(screenshot: Jan's "Context Length" slider capped at 2048)

Unfortunately, there are no logs under ~/jan/logs (I searched my entire home folder for an app.log file, but found none).

Maybe I'll check out the repo and debug this myself later on.

What is your OS?

  • MacOS
  • Windows
  • Linux (selected)
@alexbfr alexbfr added the type: bug Something isn't working label Sep 21, 2024
@imtuyethan imtuyethan self-assigned this Sep 23, 2024
@imtuyethan imtuyethan moved this to Need Investigation in Jan & Cortex Sep 23, 2024
@imtuyethan imtuyethan assigned louis-jan and unassigned imtuyethan Sep 23, 2024
imtuyethan (Contributor) commented Sep 23, 2024

Model card says it supports ~32,768 tokens:

(screenshot of the model card)

Seems like it's related to #2320
Or possibly related to #3558

@imtuyethan imtuyethan moved this from Need Investigation to Scheduled in Jan & Cortex Sep 23, 2024
@imtuyethan imtuyethan added this to the v0.5.5 milestone Sep 23, 2024
@imtuyethan imtuyethan removed the os: linux Linux issues label Sep 23, 2024
@github-project-automation github-project-automation bot moved this from In Review to Completed in Jan & Cortex Sep 24, 2024
imtuyethan (Contributor) commented Oct 1, 2024

LGTM on v0.5.4-650

(screenshot verifying the fix on v0.5.4-650)

@imtuyethan imtuyethan moved this from Review + QA to Completed in Jan & Cortex Oct 1, 2024
cnm13ryan pushed a commit to cnm13ryan/RepoAgent that referenced this issue Nov 12, 2024
From janhq/jan#3714 (comment), we know that the context length for the GGUF models is 32768. For the full context length of 131072, one has to refer to the non-GGUF models.