
Parallel Quantize.sh, add & #106

Closed
tljstewart opened this issue Mar 13, 2023 · 6 comments
Labels
enhancement New feature or request

Comments

@tljstewart

tljstewart commented Mar 13, 2023

@prusnak

./quantize "$i" "${i/f16/q4_0}" 2 &
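The suggestion is to background each ./quantize invocation with `&`. A minimal sketch of how the loop would look, where `quantize_stub` is a stand-in for the real ./quantize binary and the file names are illustrative, not the actual names used by quantize.sh:

```shell
#!/bin/bash
# Sketch only: quantize_stub stands in for ./quantize, and the
# model part names are illustrative.
quantize_stub() { echo "quantized $1 -> $2 (type $3)"; }

for i in ggml-model-f16.bin.0 ggml-model-f16.bin.1; do
  quantize_stub "$i" "${i/f16/q4_0}" 2 &   # run conversions concurrently
done
wait  # block until every background job has exited
```

The trailing `wait` matters: without it the script can exit (or run later commands) while conversions are still in flight.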

@prusnak
Collaborator

prusnak commented Mar 14, 2023

The fix needs to be more elaborate, because if you pass --remove-f16 then the rm command is called before ./quantize has finished.

Can you come up with a solution that does not have this issue?
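One hedged way to address this (a sketch, not necessarily the solution proposed later in the thread): group each conversion with its rm in a backgrounded subshell, so the delete for a given file runs only after that file's quantize has exited successfully, while the parts still run in parallel. `quantize_stub` and the file names are illustrative stand-ins:

```shell
#!/bin/bash
# Sketch: each ( quantize && rm ) pair runs in its own backgrounded
# subshell, so the rm waits for its own quantize, but the parts still
# run in parallel. quantize_stub and file names are stand-ins.
quantize_stub() { echo "wrote $2"; }

touch ggml-model-f16.bin.0 ggml-model-f16.bin.1   # fake input parts

for i in ggml-model-f16.bin.0 ggml-model-f16.bin.1; do
  ( quantize_stub "$i" "${i/f16/q4_0}" 2 && rm -f "$i" ) &
done
wait
```

Note the trade-off: peak disk usage is still higher than serial processing, because every part's f16 and q4_0 copies coexist while the jobs run.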

@prusnak
Collaborator

prusnak commented Mar 14, 2023

This should work:

Yes, this works. But now I realise this completely defeats the purpose of the remove flag: it is there to save disk space after each conversion has finished, so the flag only makes sense when processing the files one after another.

@ggerganov Do you think it makes sense to run the script in parallel by default and switch to serial processing when --remove-f16 is provided or do we want to have a separate orthogonal flag for parallel/serial processing?
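The first option could be sketched as a dispatch inside the loop, serial when --remove-f16 is set and parallel otherwise. The flag parsing, `quantize_stub`, and file names here are hypothetical, not the actual quantize.sh interface:

```shell
#!/bin/bash
# Hypothetical sketch of "parallel by default, serial when --remove-f16":
# quantize_stub stands in for ./quantize; flag parsing is illustrative.
REMOVE_F16=false
[ "${1:-}" = "--remove-f16" ] && REMOVE_F16=true

quantize_stub() { echo "converted $1"; }

for i in ggml-model-f16.bin.0 ggml-model-f16.bin.1; do
  if $REMOVE_F16; then
    quantize_stub "$i" "${i/f16/q4_0}" 2 && rm -f "$i"  # serial: delete as we go
  else
    quantize_stub "$i" "${i/f16/q4_0}" 2 &              # parallel: keep f16 parts
  fi
done
wait
```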

@tljstewart
Author

ah I see what you mean, it trades disk space for parallelism

@ggerganov
Member

I think it is better to multi-thread the quantize.cpp program.
Each tensor is divided into n parts and each of the n threads quantizes the corresponding part.
This way, even when quantizing the 7B model which has only 1 part, we will utilize all available CPU resources and still gain performance.

If you agree, either reformulate this issue and add "good first issue" tag or create a new one and close this.

@prusnak
Collaborator

prusnak commented Mar 14, 2023

I think it is better to multi-thread the quantize.cpp program.

I agree. This makes sense especially for this reason:

This way, even when quantizing the 7B model which has only 1 part, we will utilize all available CPU resources

If you agree, ...

ACK

FWIW, I really respect your shell skills @tljstewart 👍

@gjmulder added the enhancement (New feature or request) label Mar 15, 2023
@prusnak
Collaborator

prusnak commented Mar 19, 2023

Done another way (rewritten in Python) in #222
