-
Notifications
You must be signed in to change notification settings - Fork 10.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gguf-py: Refactor and allow reading/modifying existing GGUF files #3981
gguf-py: Refactor and allow reading/modifying existing GGUF files #3981
Conversation
@cebtenzzre I'm not really sure how to resolve these conflicts without basically reverting then manually editing in your changes since I moved the file content and then changed it. Would you be okay with that? Or is there a better way? |
Yeah, that's basically the only way to do it. |
Credit to @cebtenzzre for that pull
0475e44
to
8047aa1
Compare
4ab9105
to
d7688dc
Compare
Move examples to an examples/ directory Clean up examples Add an example of modifying keys in a GGUF file Update documentation with info on examples Try to support people importing gguf/gguf.py directly
Hey, we are going to have a conflict once #3838 gets merged. You'll need to revert your changes for Thanks for GGUFReader! It was definitely needed #3838 (comment) |
Not a problem. Thanks for the heads up! If you wanted to, you could just make that change in #3838, it isn't specific to the other GGUF changes. Just something weird I noticed when I was looking at those scripts. Whichever way you prefer is fine. |
@chenqiny Hi, not sure if you're interested but this pull is supposed to add the ability to read/modify GGUF files and transparently support working with GGUF files created on a different endian than the machine where the script is running. So you should be able to use this to open and change a little endian GGUF file on your big endian machine and vice versa. I don't really have a good way to test that though since all my machines are LE and I'm not sure where to even find a BE GGUF model. |
gguf-py/examples/dump_gguf.py
Outdated
from gguf import GGUFReader, GGUFValueType # noqa: E402 | ||
|
||
def dump_gguf(filename: str) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should be two blank lines before and after a top-level function. Same with the other two examples.
Also, the examples should be marked executable - otherwise, the shebang lines don't do anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all the suggestions (and especially the actual bugs you caught). I really appreciate the time you've spent helping improve this pull!
What are you using for formatting and would you be able to share your configuration? I'd be perfectly happy to turn on Python auto formatting if there's a standard for the Python code in this repo to follow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The linters I'm using are:
- mypy for type checking (there is a mypy.ini in the repo and all new python scripts should pass mypy)
- isort for import sorting (for this repo, basically
isort **/*.py -l 120 --tc -m VERTICAL_HANGING_INDENT
) - and flake8 for PEP 8 style checking.
My flake8 configuration is messy, but I've done pip install wemake-python-styleguide
and then turned off everything I don't care about. This ridiculous command should reproduce the way I'm using flake8 for llama.cpp (most of this is hidden behind a shell alias):
flake8 **/*.py --max-line-length=120 --ignore=D,DAR,I,S,A003,E121,E123,E126,E127,E201,E202,E203,E211,E221,E222,E226,E241,E251,E261,E266,E272,E306,E402,E704,E731,E741,E800,F403,F811,N400,N801,N803,N806,N812,N813,P101,P103,P205,Q000,T001,U101,W503,W504,WPS102,WPS110,WPS111,WPS113,WPS114,WPS115,WPS117,WPS120,WPS122,WPS125,WPS201,WPS202,WPS203,WPS204,WPS210,WPS211,WPS212,WPS213,WPS214,WPS218,WPS220,WPS221,WPS222,WPS223,WPS224,WPS225,WPS226,WPS229,WPS230,WPS231,WPS232,WPS234,WPS235,WPS236,WPS237,WPS238,WPS300,WPS301,WPS302,WPS304,WPS305,WPS306,WPS316,WPS317,WPS318,WPS319,WPS320,WPS322,WPS323,WPS326,WPS331,WPS332,WPS336,WPS337,WPS347,WPS348,WPS352,WPS360,WPS361,WPS362,WPS400,WPS405,WPS407,WPS412,WPS414,WPS420,WPS421,WPS422,WPS427,WPS428,WPS429,WPS430,WPS431,WPS432,WPS433,WPS434,WPS435,WPS436,WPS437,WPS440,WPS441,WPS442,WPS450,WPS457,WPS458,WPS459,WPS460,WPS463,WPS464,WPS501,WPS504,WPS508,WPS509,WPS510,WPS513,WPS518,WPS526,WPS602,WPS604,WPS605,WPS606,WPS608,WPS609,WPS611,WPS613
There is a lot of subjectivity with flake8, even that command leaves some checks enabled that don't really matter IMO. And normally I leave E251 enabled, but the style in this repo seems to use spaces around '=' in keyword arguments.
Sure. I will test it in weekend. |
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm --thanks for taking this! I think it's good to merge.
Thanks for taking the time to review it, and also many thanks to cebtenzzre who found a bunch of bugs and cleaned up a lot of stuff! I'll go ahead and merge it later today unless anyone finds problems in the mean time. |
Actually, I found one more thing I'm going to change (after testing that it doesn't break anything).
Forcing the value to little endian. I'm pretty sure that means using |
… ids gguf-py: SpecialVocab: Try to load merges from merges.txt if not in tokenizer.json gguf-py: SpecialVocab: Add 'add_bos_token' type bools to GGUF metadata u
I made a few extra changes since you approved so I want to double check. I was looking at the code for big endian and Q8_0 and realized it can't really work unless the byteswapping happens before the quantization part. However, currently that's in There are also some changes to I also made it possible to load merges from The last thing I changed is possibly more controversial. I changed I also didn't add it to the official GGUF keys constants. (But I can, if people think it's a good idea.) I think we should support at least including that information in the metadata. Models that don't want something like a BOS token prepended to their prompt can have severely degraded quality when you do it anyway. |
@monatis This is just waiting for your review, so if you don't have any issues with the latest changes once you see this you can go ahead and just merge it. |
Thanks again @KerfuffleV2 for the good job in this PR and taking into account every review patiently. |
@KerfuffleV2 I got following error. chenqiny@datalake:/database/llama.cpp/gguf-py$ /home/chenqiny/.local/bin/gguf-convert-endian /database/models--meta-llama--Llama-2-7b/snapshots/365ffa8f1a6c455d3e2028ae658236b4b85ba824/ggml-model-f16-little.gguf big
|
…erganov#3981) * gguf-py: Refactor and add file reading support * Replay changes from ggerganov#3871 Credit to @cebtenzzre for that pull * Various type annotation fixes. * sort imports with isort (again) * Fix missing return statement in add_tensor * style cleanup with flake8 * fix NamedTuple and Enum usage * Fix an issue with state init in GGUFReader Move examples to an examples/ directory Clean up examples Add an example of modifying keys in a GGUF file Update documentation with info on examples Try to support people importing gguf/gguf.py directly * Damagage is not a word. * Clean up gguf-py/examples/modify_gguf.py whitespace Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com> * Update gguf-py/examples/modify_gguf.py formatting Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com> * Update gguf-py/gguf/gguf_reader.py type hint Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com> * Make examples executable, formatting changes * Add more information to GGUFReader and examples comments * Include a gguf Python package version bump * Add convert-gguf-endian.py script * cleanup * gguf-py : bump minor version * Reorganize scripts * Make GGUFReader endian detection less arbitrary * Add JSON dumping support to gguf-dump.py Which I kind of regret now * A few for gguf-dump.py cleanups * Murder accidental tuple in gguf-py/scripts/gguf-dump.py Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com> * cleanup * constants : remove unneeded type annotations * fix python 3.8 compat * Set up gguf- scripts in pyproject.toml * And include scripts/__init__.py, derp * convert.py: We can't currently support Q8_0 on big endian. * gguf-py: SpecialVocab: Always try available sources for special token ids gguf-py: SpecialVocab: Try to load merges from merges.txt if not in tokenizer.json gguf-py: SpecialVocab: Add 'add_bos_token' type bools to GGUF metadata u * cleanup * Promote add_X_token to GGUF metadata for BOS and EOS --------- Co-authored-by: Jared Van Bortel <jared@nomic.ai> Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
This is a big one.
gguf
Python package into logical modules.gguf-py/gguf
was silly because that makes it loadgguf.py
specifically, rather thangguf
as a package. Fixed that.The GGUF file reading is done using
numpy
'smemmap
feature. It also supports endian swapped views. In addition, you can map the file read/write and it's possible to useGGUFReader
to get a writeable view of the GGUF data (obviously it's necessary to be very careful when writing it, and currently doing anything that changes the size of a field isn't really possible). However, doing something like being able to change a scalar value without rewriting the whole file is nice.There is an example of dumping a GGUF file at the end of
gguf_reader.py
. Here is example output:edit: This is now a separate script in
gguf-py/scripts/gguf-dump.py
edit: I wanted to change the BOS token in my Yi model to
2
instead of1
(their tokenizer config actually says don't add BOS but we don't respect it). Making that change is pretty easy now:edit: I added some examples demonstrating the reader stuff. With this pull checked out, you can run:
to dump the metadata in a GGUF file.
There is also an example to allow changing simple metadata values in a GGUF file (I.E. integers, floats). Here's an example of changing the BOS token id to
1
:Note: It will prompt you if you're sure before actually making any changes. That example only supports simple values, however the
GGUFReader
API will let you change anything (although the GGUF format it self currently prevents changes where the length of a field would be different).