Description
Motivation
Thanks to @julien-c, I came across this reddit post from @he29-net 👍:
> You can't easily tell whether a model was quantized with the help of an importance matrix just from the name. I first found this annoying, because it was not clear if and how the calibration dataset affects performance of the model in other than just positive ways. But recent tests in llama.cpp discussion #5263 show that while the data used to prepare the imatrix slightly affects how it performs in (un)related languages or specializations, any dataset will perform better than a "vanilla" quantization with no imatrix. So now, instead, I find it annoying because sometimes the only way to be sure I'm using the better imatrix version is to re-quantize the model myself.
Proposal
- Add at the end of the `imatrix` binary file the name of the dataset on which the imatrix was computed
- Add the following KV pairs in `quantize`:
  - `quantize.imatrix.file`: filename of the imatrix provided during quantization
  - `quantize.imatrix.entries_count`: number of entries in the imatrix
  - `quantize.imatrix.dataset`: dataset from the imatrix
  - `quantize.imatrix.chunks_count`: number of chunks the imatrix was computed with
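The first point can be sketched as a trailer appended after the existing imatrix entries. The exact field layout here (a little-endian int32 chunk count followed by a length-prefixed UTF-8 dataset name) and the function names are assumptions for illustration, not the actual imatrix file format:

```python
import struct

def append_imatrix_trailer(path: str, dataset: str, chunks: int) -> None:
    """Append a hypothetical trailer to an imatrix file: the number of
    chunks (int32) followed by a length-prefixed UTF-8 dataset name.
    This is a sketch of the proposal, not the real llama.cpp layout."""
    raw = dataset.encode("utf-8")
    with open(path, "ab") as f:
        f.write(struct.pack("<i", chunks))
        f.write(struct.pack("<i", len(raw)))
        f.write(raw)

def read_imatrix_trailer(path: str, body_size: int) -> tuple:
    """Read the trailer back. Assumes the caller knows how many bytes the
    imatrix entries occupy (body_size), as a sequential reader would after
    consuming all entries."""
    with open(path, "rb") as f:
        f.seek(body_size)
        chunks = struct.unpack("<i", f.read(4))[0]
        n = struct.unpack("<i", f.read(4))[0]
        dataset = f.read(n).decode("utf-8")
    return chunks, dataset
```

Because the trailer sits after the last entry, older readers that stop once they have consumed all entries would simply ignore it, which keeps the change backward compatible.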
Ideally I would also add hashes of both the imatrix and dataset files to the metadata, but I am not sure this is supported or appropriate.