Improved quantize script #222
Conversation
I improved the quantize script by adding error handling and by allowing multiple models to be selected for quantization at once on the command line. I also converted it to Python for generality and extensibility.
quantize.py (Outdated):

    model_path = os.path.join("models", model, "ggml-model-f16.bin")
    ...
    for i in os.listdir(model_path):
There's another PR about parallelizing the quantizations. Here it would be easy to just wrap this in a multiprocessing.Pool() (with the subprocess calls extracted out to a top-level function): https://docs.python.org/3/library/multiprocessing.html
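A minimal sketch of the reviewer's Pool() suggestion. The paths, the binary name (`./quantize`), and the helper names (`build_quantize_command`, `quantize_model`) are illustrative assumptions, not the PR's actual code; the key point is that the worker lives at module top level so the pool can pickle it.

```python
import os
import subprocess
from multiprocessing import Pool


def build_quantize_command(model: str) -> list[str]:
    """Build the command line for one model (pure, easy to test)."""
    f16_path = os.path.join("models", model, "ggml-model-f16.bin")
    q4_path = os.path.join("models", model, "ggml-model-q4_0.bin")
    return ["./quantize", f16_path, q4_path, "2"]


def quantize_model(model: str) -> int:
    """Top-level worker: run the quantize binary for one model."""
    return subprocess.run(build_quantize_command(model)).returncode


if __name__ == "__main__":
    models = ["7B", "13B"]
    with Pool() as pool:
        # Dry run for illustration: build the commands in parallel.
        # Replacing build_quantize_command with quantize_model here
        # would actually execute the quantizations concurrently.
        commands = pool.map(build_quantize_command, models)
    for cmd in commands:
        print(" ".join(cmd))
```

Extracting the subprocess call into `quantize_model` is what makes `pool.map` viable: closures and lambdas cannot be pickled across worker processes.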
I don't understand exactly what can be parallelized here: is it quantizing many models at the same time? By the way, the code from the latest commit removed the loop that acted on the incorrect listing of the file (for i in os.listdir(model_path)).
Fixed and improved many things in the script based on the review by @mattsta. The parallelization suggestion still needs to be revisited, but code for it was added (commented out).
The original Bash script uses a glob pattern to match files that have endings such as ...bin.0, ...bin.1, etc. That has been translated correctly to Python now.
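A small sketch of what that translation might look like, assuming the shard-naming scheme described above; the function name and directory layout are illustrative, not the PR's code. A trailing `*` after `.bin` matches both the single-file case and the numbered shards.

```python
import glob
import os


def find_f16_files(model_path: str) -> list[str]:
    """Return all f16 model files for a model directory, including
    numbered shards such as ggml-model-f16.bin.0, .bin.1, etc."""
    pattern = os.path.join(model_path, "ggml-model-f16.bin*")
    return sorted(glob.glob(pattern))
```

Sorting keeps the shards in a stable order, mirroring how a shell glob expands alphabetically.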
I think it's a good idea to remove the requirement for a Unix shell. I suggest removing quantize.sh and updating the README to use the Python script.
New code has been added to set the name of the quantize binary depending on the platform (quantize.exe when running on Windows), and the README.md file has been updated to use this script instead of the Bash one.
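The platform check described above can be sketched as follows; the binary name and its location are assumptions about the build output, not taken from the PR.

```python
import os
import sys

# Windows builds produce quantize.exe; other platforms a plain "quantize".
quantize_binary = "quantize.exe" if sys.platform == "win32" else "quantize"
quantize_path = os.path.join(".", quantize_binary)
print(quantize_path)
```

Checking `sys.platform` keeps the script itself shell-agnostic, which is the point of dropping quantize.sh.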
Thank you for your comment @sw, I just pushed a commit that applies the changes you suggested. Let me know if there's anything else that should be done to achieve full compatibility with Windows.
Fixed a typo in the new filenames of the quantized models and removed the shell=True parameter from the subprocess.run call, as it was conflicting with the list of arguments.
This was causing the automatic help message to present the program's usage as literally "$ Quantization Script [arguments]". It should now read something like "$ python3 quantize.py [arguments]".
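The two fixes described here can be illustrated together; the command and parser description are placeholders, not the PR's exact code. When the command is a list, shell=True must be omitted (with a shell, only the first list element would be treated as the command), and omitting argparse's prog= lets the usage line derive from sys.argv[0] instead of a hard-coded title.

```python
import argparse
import subprocess
import sys

# List-of-arguments form: no shell involved, no quoting pitfalls.
result = subprocess.run([sys.executable, "-c", "print('quantizing...')"])
print(result.returncode)

# No prog= argument: usage shows the actual script name, not a literal
# "Quantization Script" string.
parser = argparse.ArgumentParser(description="Quantize LLaMA models.")
print(parser.format_usage())
```

With a hard-coded `prog="Quantization Script"`, argparse would happily print that string verbatim in the usage line, which is exactly the bug described above.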
Should we merge now or wait for someone to test on Windows? @SuajCarrot maybe keep the
Thank you for merging! Should I create another pull request adding the Bash script back, along with a deprecation notice for it in the README?
It was confirmed in #285 that it works on Windows, so there's no need to do that.
That's great, thank you.