quantize: add imatrix and dataset metadata in GGUF #6658
Conversation
quantize: factorize KV Overrides parsing between common #6656
…pile on some toolchain
We might also add the number of chunks the imatrix was computed with.

@ggerganov, is this general approach relevant?
common: free kv override if used after model loading
…ntize/imatrix-metadata
@slaren, can you please take a second look and merge it if approved?
slaren
left a comment
I also realized that llama_model_quantize_params::kv_overrides is a pointer to a std::vector for no reason whatsoever. It would be great if that could be fixed as well.
…ed from a pair of iterators. Co-authored-by: slaren <slarengh@gmail.com>
slaren
left a comment
We still need to change llama_model_quantize_params::kv_overrides to be a pointer to llama_model_kv_override rather than a std::vector, but it can be done in another PR.
While I appreciate adding this metadata, I think there is a privacy concern here: how about only storing the filename and not the complete path (which might leak sensitive data such as the username)?

Good point. Meanwhile, you can use KV overrides.
Context
In the context of:
quantize: factorize KV Overrides parsing between common #6656

Add imatrix related metadata in quantum models.
Changes
Tests
```
./gguf-py/scripts/gguf-dump.py models/phi-2-q4_k_m.gguf
23: UINT32 |  1 | general.quantization_version = 2
24: STRING |  1 | my_metadata = 'best-quantum-model-ever'
25: STRING |  1 | quantize.imatrix.file = 'imatrix-f16.imatrix'
26: STRING |  1 | quantize.imatrix.dataset = 'wikitext-2-raw/wiki.train.raw'
27: INT32  |  1 | quantize.imatrix.entries_count = 192
28: INT32  |  1 | quantize.imatrix.chunks_count = 20
```

```
./build/bin/main --model .models/phi-2-q4_k_m.gguf \
    -ngl 33 \
    --random-prompt \
    --override-kv my_metadata_2=str:best-quantum-model-ever-2

llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
...
llama_model_loader: - kv  19: general.quantization_version   u32 = 2
llama_model_loader: - kv  20: my_metadata                    str = best-quantum-model-ever
llama_model_loader: - kv  21: quantize.imatrix.file          str = imatrix-f16.imatrix
llama_model_loader: - kv  22: quantize.imatrix.dataset       str = wikitext-2-raw/wiki.train.raw
llama_model_loader: - kv  23: quantize.imatrix.entries_count i32 = 192
llama_model_loader: - kv  24: quantize.imatrix.chunks_count  i32 = 20
```

Closes #6656