mtmd: add --image-warmup-tokens #17638
Closed
I noticed that a Q4_1 Qwen3-VL 2B mmproj reserved a surprisingly large amount of memory for one of its compute buffers during the image-warmup step - larger than the model itself!
Looking at the code, it seems that image-warmup sizes are hard-coded per model. For Qwen3-VL it's 2116 tokens, i.e. an image of 1472 x 1472 pixels. If I understand correctly, llama.cpp initially reserves memory proportional to the size of that warmup image.

But some users may be certain that their images will never exceed certain dimensions (e.g. they are OCR'ing single lines of text, or their preprocessing pipeline caps images at 512 x 512), so they may want a smaller maximum warmup image size, reducing memory consumption. Cutting a few hundred MB doesn't sound like much on its own, but it might help for development on the edge.
Before (initial behavior)
With --image-warmup-tokens 256