
Conversation

@SmartestWashingMachine (Contributor)

I noticed that a Q4_1 Qwen3-VL 2B mmproj reserved a surprisingly large amount of memory for one of its compute buffers during the image warmup step, larger than the model itself!

Looking at the code, it seems that image warmup sizes are hard-coded per model. For Qwen3-VL it's 2116 tokens, corresponding to a 1472 x 1472 image. If I understand correctly, llama.cpp initially reserves compute-buffer memory proportional to the size of that warmup image.
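
For a rough sense of where 2116 comes from, here is a back-of-the-envelope sketch, assuming a 16-pixel vision patch and a 2x2 spatial merge (these values are inferred from the numbers quoted above, not read from the code):

```cpp
#include <cstdio>

// Rough token count for a square image, assuming a 16-pixel patch
// and a 2x2 spatial merge (assumed values, not read from llama.cpp).
static int image_tokens(int side_px, int patch = 16, int merge = 2) {
    const int patches_per_side = side_px / patch;          // 1472 / 16 = 92
    const int tokens_per_side  = patches_per_side / merge; // 92 / 2 = 46
    return tokens_per_side * tokens_per_side;              // 46^2 = 2116
}

int main() {
    printf("1472 x 1472 -> %d tokens\n", image_tokens(1472)); // 2116
    printf(" 512 x  512 -> %d tokens\n", image_tokens(512));  //  256
    return 0;
}
```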

But some users may know for certain that their images will never exceed certain dimensions (e.g. OCR'ing single lines of text, or a preprocessing pipeline that caps images at 512 x 512, which works out to roughly 256 tokens under the assumptions in the sketch above), so they may want a smaller maximum warmup image size to reduce memory consumption.

Cutting a few hundred MB doesn't sound like much on its own, but it can help when developing on edge devices.

Before (Initial behavior)

[screenshot "highmem": large compute-buffer reservation]

With --image-warmup-tokens 256

[screenshot "lowmem": reduced compute-buffer reservation]

@ngxson (Collaborator) commented Dec 1, 2025

I'd prefer having a warmup option like in libllama. image-warmup-tokens is quite low-level I think; it's probably not very future-proof, as we may have other warmup strategies in the future.

The warmup option should match common_params::warmup.
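
A minimal sketch of what that could look like, assuming a hypothetical boolean field on the mtmd context params that mirrors common_params::warmup (the struct and function names below are illustrative, not the actual mtmd API):

```cpp
// Hypothetical sketch only: the real mtmd API may differ.
struct mtmd_context_params_sketch {
    bool warmup = true; // mirrors common_params::warmup in common/common.h
};

static void mtmd_warmup_sketch(const mtmd_context_params_sketch & params) {
    if (!params.warmup) {
        return; // no dummy encode; compute buffers grow on the first real image
    }
    // encode a dummy image at the model's hard-coded maximum size
    // (e.g. 1472 x 1472 for Qwen3-VL) so buffers are reserved up front
}
```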

@ngxson (Collaborator) commented Dec 1, 2025

Superseded by #17652

It will be more suitable for your use case, as you know the image size in advance, not the number of tokens.

@ngxson closed this Dec 1, 2025
@SmartestWashingMachine (Contributor, Author)

Oh, that's even better. Thanks for taking the time to look into this!
