Detect GigaChat3-10-A1.8B as deepseek lite #17420
Conversation
I added a test Q8_0 gguf here: https://huggingface.co/ubergarm/GigaChat3-10B-A1.8B-GGUF/tree/main
Oops, I need to get rid of an accidentally committed file; will force push to fix.
Hardcodes checking number of layers to detect if lite version of deepseek.
Force-pushed from 3cddaab to e1bfe51
Perplexity seems reasonable:
Always a bit funky when quants have lower perplexity than the original bf16... though it happens sometimes, and in this case the values are very similar, within the noise.
CISC left a comment:
Can you add a comment describing which models we are detecting in case this needs to be fine-tuned in the future?
deepseek lite variants include DeepSeek-V2-Lite, GigaChat3-10B-A1.8B
Thank you!
* Detect GigaChat3-10-A1.8B as deepseek lite: hardcodes checking number of layers to detect if lite version of deepseek.
* Add comment identifying deepseek lite variants: deepseek lite variants include DeepSeek-V2-Lite, GigaChat3-10B-A1.8B
Hardcodes checking the number of layers to detect if a model is the lite version of deepseek.

Tested with bf16 and q8_0 versions of GigaChat3-10B-A1.8B after realizing it is a lite version similar to DeepSeek-V2-Lite. That model has 27 layers, but GigaChat3 has 26, and that is used to detect the lite variant as discussed here: https://huggingface.co/ai-sage/GigaChat3-10B-A1.8B/discussions/1#691fb161ac024c8eb626ab36

I'd like it if anyone else could test. I'll update after testing perplexity to make sure the value looks sane. I haven't uploaded a gguf yet, as the template has a parse error and I wanted to get it updated before baking it in. That is discussed here: https://huggingface.co/ai-sage/GigaChat3-702B-A36B-preview-bf16/discussions/1