Improved quantize script #222
Conversation
I improved the quantize script by adding error handling and by allowing multiple models to be selected for quantization at once on the command line. I also converted it to Python for generality and extensibility.
quantize.py (Outdated):

    model_path = os.path.join("models", model, "ggml-model-f16.bin")
    ...
    for i in os.listdir(model_path):
There's another PR about parallelizing the quantizations. Here it would be easy to just wrap this in a multiprocessing.Pool() (with the subprocess calls extracted out to a top-level function): https://docs.python.org/3/library/multiprocessing.html
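A minimal sketch of the reviewer's Pool() suggestion. The paths, the binary name (`./quantize`), and the helper names (`build_quantize_command`, `quantize_model`) are illustrative assumptions, not the PR's actual code; the key point is that the worker lives at module top level so the pool can pickle it.

```python
import os
import subprocess
from multiprocessing import Pool


def build_quantize_command(model: str) -> list[str]:
    """Build the command line for one model (pure, easy to test)."""
    f16_path = os.path.join("models", model, "ggml-model-f16.bin")
    q4_path = os.path.join("models", model, "ggml-model-q4_0.bin")
    return ["./quantize", f16_path, q4_path, "2"]


def quantize_model(model: str) -> int:
    """Top-level worker: run the quantize binary for one model."""
    return subprocess.run(build_quantize_command(model)).returncode


if __name__ == "__main__":
    models = ["7B", "13B"]
    with Pool() as pool:
        # Dry run for illustration: build the commands in parallel.
        # Replacing build_quantize_command with quantize_model here
        # would actually execute the quantizations concurrently.
        commands = pool.map(build_quantize_command, models)
    for cmd in commands:
        print(" ".join(cmd))
```

Extracting the subprocess call into `quantize_model` is what makes `pool.map` viable: closures and lambdas cannot be pickled across worker processes.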
I don't understand exactly what can be parallelized here: is it quantizing many models at the same time? By the way, the code from the latest commit removed the loop that acted on the incorrect listing of the file (for i in os.listdir(model_path)).
Fixed and improved many things in the script based on the review by @mattsta. The parallelization suggestion still needs to be revisited, but code for it was added (commented out).
The original Bash script uses a glob pattern to match files that have endings such as ...bin.0, ...bin.1, etc. That has been translated correctly to Python now.
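A small sketch of what that translation might look like, assuming the shard-naming scheme described above; the function name and directory layout are illustrative, not the PR's code. A trailing `*` after `.bin` matches both the single-file case and the numbered shards.

```python
import glob
import os


def find_f16_files(model_path: str) -> list[str]:
    """Return all f16 model files for a model directory, including
    numbered shards such as ggml-model-f16.bin.0, .bin.1, etc."""
    pattern = os.path.join(model_path, "ggml-model-f16.bin*")
    return sorted(glob.glob(pattern))
```

Sorting keeps the shards in a stable order, mirroring how a shell glob expands alphabetically.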
I think it's a good idea to remove the requirement for a Unix shell. I suggest removing quantize.sh and updating the README to use the Python script.
New code has been added to set the name of the quantize binary depending on the platform (quantize.exe when running on Windows), and the README.md file has been updated to use this script instead of the Bash one.
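The platform check described above can be sketched as follows; the binary name and its location are assumptions about the build output, not taken from the PR.

```python
import os
import sys

# Windows builds produce quantize.exe; other platforms a plain "quantize".
quantize_binary = "quantize.exe" if sys.platform == "win32" else "quantize"
quantize_path = os.path.join(".", quantize_binary)
print(quantize_path)
```

Checking `sys.platform` keeps the script itself shell-agnostic, which is the point of dropping quantize.sh.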
Thank you for your comment @sw, I just pushed a commit that applies the changes you suggested. Let me know if there's anything else that should be done to achieve full compatibility with Windows.
Fixed a typo in the new filenames of the quantized models and removed the shell=True parameter from the subprocess.run call, as it was conflicting with the list of arguments.
This was causing the automatic help message to present the program's usage as literally "$ Quantization Script [arguments]". It should now read something like "$ python3 quantize.py [arguments]".
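The two fixes described here can be illustrated together; the command and parser description are placeholders, not the PR's exact code. When the command is a list, shell=True must be omitted (with a shell, only the first list element would be treated as the command), and omitting argparse's prog= lets the usage line derive from sys.argv[0] instead of a hard-coded title.

```python
import argparse
import subprocess
import sys

# List-of-arguments form: no shell involved, no quoting pitfalls.
result = subprocess.run([sys.executable, "-c", "print('quantizing...')"])
print(result.returncode)

# No prog= argument: usage shows the actual script name, not a literal
# "Quantization Script" string.
parser = argparse.ArgumentParser(description="Quantize LLaMA models.")
print(parser.format_usage())
```

With a hard-coded `prog="Quantization Script"`, argparse would happily print that string verbatim in the usage line, which is exactly the bug described above.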
Should we merge now or wait for someone to test on Windows? @SuajCarrot maybe keep the
Thank you for merging! Should I create another pull request adding the Bash script back, along with a deprecation notice for it in the README?
It was confirmed in #285 that it works on Windows, so there's no need to do that.
That's great, thank you.