
add python/pytorch version compat notes #44

Merged
ggerganov merged 1 commit into ggml-org:master on Mar 12, 2023

Conversation

wizzard0 (Contributor)

see #32 (comments)

@ggerganov ggerganov merged commit b9bd1d0 into ggml-org:master Mar 12, 2023
44670 pushed a commit to 44670/llama.cpp that referenced this pull request on Aug 2, 2023
* RAM usage reduction and calculations
  - Removed the -b batch limit (1024); tested up to -b 8192
  - Fixed an integer overflow in the ggml matmul (occurred at around nbatch 3000)
  - Added a dynamic calculation for batched scratch memory consumption
  - Overall, reduced RAM buffer sizes by orders of magnitude for normal settings
  - RAM usage scales quadratically with context size * batch size (see the sketch after this commit message)
  - Using a small batch size (or the default of 1) results in a very small memory footprint even with thousands of tokens processed
  - Tested with prompts up to 13,000 tokens and an 8k batch
  - Needs more tests on various platforms

* removed debug

* minor

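To make the quadratic-scaling claim concrete, here is a minimal back-of-envelope sketch of an attention-score scratch buffer whose size grows with the product of batch size and context length. The constants (head count, float width) and the helper name `estimate_kq_scratch_bytes` are illustrative assumptions, not the formula or code used in the referenced commit.

```python
# Hypothetical back-of-envelope estimate of the attention-score (KQ) scratch
# buffer, illustrating why RAM grows with n_ctx * n_batch. The head count,
# float width, and function name are assumptions for illustration only;
# they are not taken from the referenced commit.

def estimate_kq_scratch_bytes(n_ctx: int, n_batch: int, n_head: int = 32,
                              bytes_per_float: int = 4) -> int:
    # One n_batch x n_ctx score matrix per attention head, in 32-bit floats.
    return n_head * n_batch * n_ctx * bytes_per_float


if __name__ == "__main__":
    for n_batch in (1, 512, 8192):
        mib = estimate_kq_scratch_bytes(n_ctx=13000, n_batch=n_batch) / (1024 ** 2)
        print(f"n_batch={n_batch:5d}: ~{mib:,.0f} MiB")
    # n_batch=1 keeps the footprint tiny even at a 13k-token context,
    # while large batches dominate RAM use, which is why sizing the
    # scratch buffer dynamically (instead of for the worst case) helps.
```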