
Parallel Quantize.sh, add & #106

Closed
tljstewart opened this issue Mar 13, 2023 · 6 comments
Labels
enhancement New feature or request

Comments

@tljstewart

tljstewart commented Mar 13, 2023

@prusnak

./quantize "$i" "${i/f16/q4_0}" 2 &
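The suggestion is to background each ./quantize invocation with `&`. A minimal sketch of how the loop would look, where `quantize_stub` is a stand-in for the real ./quantize binary and the file names are illustrative, not the actual names used by quantize.sh:

```shell
#!/bin/bash
# Sketch only: quantize_stub stands in for ./quantize, and the
# model part names are illustrative.
quantize_stub() { echo "quantized $1 -> $2 (type $3)"; }

for i in ggml-model-f16.bin.0 ggml-model-f16.bin.1; do
  quantize_stub "$i" "${i/f16/q4_0}" 2 &   # run conversions concurrently
done
wait  # block until every background job has exited
```

The trailing `wait` matters: without it the script can exit (or run later commands) while conversions are still in flight.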

@prusnak
Collaborator

prusnak commented Mar 14, 2023

The fix needs to be more elaborate, because if you pass --remove-f16 then the rm command is called before ./quantize has finished.

Can you come up with a solution that does not have this issue?
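One hedged way to address this (a sketch, not necessarily the solution proposed later in the thread): group each conversion with its rm in a backgrounded subshell, so the delete for a given file runs only after that file's quantize has exited successfully, while the parts still run in parallel. `quantize_stub` and the file names are illustrative stand-ins:

```shell
#!/bin/bash
# Sketch: each ( quantize && rm ) pair runs in its own backgrounded
# subshell, so the rm waits for its own quantize, but the parts still
# run in parallel. quantize_stub and file names are stand-ins.
quantize_stub() { echo "wrote $2"; }

touch ggml-model-f16.bin.0 ggml-model-f16.bin.1   # fake input parts

for i in ggml-model-f16.bin.0 ggml-model-f16.bin.1; do
  ( quantize_stub "$i" "${i/f16/q4_0}" 2 && rm -f "$i" ) &
done
wait
```

Note the trade-off: peak disk usage is still higher than serial processing, because every part's f16 and q4_0 copies coexist while the jobs run.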

@prusnak
Collaborator

prusnak commented Mar 14, 2023

This should work:

Yes, this works. But now I realise this completely defeats the purpose of the remove flag: it is there to save disk space after each conversion has finished, so the flag only makes sense when processing the files one after another.

@ggerganov Do you think it makes sense to run the script in parallel by default and switch to serial processing when --remove-f16 is provided or do we want to have a separate orthogonal flag for parallel/serial processing?
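The first option could be sketched as a dispatch inside the loop, serial when --remove-f16 is set and parallel otherwise. The flag parsing, `quantize_stub`, and file names here are hypothetical, not the actual quantize.sh interface:

```shell
#!/bin/bash
# Hypothetical sketch of "parallel by default, serial when --remove-f16":
# quantize_stub stands in for ./quantize; flag parsing is illustrative.
REMOVE_F16=false
[ "${1:-}" = "--remove-f16" ] && REMOVE_F16=true

quantize_stub() { echo "converted $1"; }

for i in ggml-model-f16.bin.0 ggml-model-f16.bin.1; do
  if $REMOVE_F16; then
    quantize_stub "$i" "${i/f16/q4_0}" 2 && rm -f "$i"  # serial: delete as we go
  else
    quantize_stub "$i" "${i/f16/q4_0}" 2 &              # parallel: keep f16 parts
  fi
done
wait
```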

@tljstewart
Author

ah I see what you mean, it trades disk space for parallelism

@ggerganov
Member

I think it is better to multi-thread the quantize.cpp program.
Each tensor is divided into n parts and each of the n threads quantizes the corresponding part.
This way, even when quantizing the 7B model which has only 1 part, we will utilize all available CPU resources and still gain performance.

If you agree, either reformulate this issue and add "good first issue" tag or create a new one and close this.

@prusnak
Collaborator

prusnak commented Mar 14, 2023

I think it is better to multi-thread the quantize.cpp program.

I agree. This makes sense especially for this reason:

This way, even when quantizing the 7B model which has only 1 part, we will utilize all available CPU resources

If you agree, ...

ACK

FWIW, I really respect your shell skills @tljstewart 👍

@gjmulder added the enhancement (New feature or request) label Mar 15, 2023
@prusnak
Collaborator

prusnak commented Mar 19, 2023

Done another way (rewritten in Python) in #222
