gguf-py: Refactor and allow reading/modifying existing GGUF files #3981

Merged: 33 commits, Nov 11, 2023

Conversation

KerfuffleV2
Collaborator

@KerfuffleV2 KerfuffleV2 commented Nov 7, 2023

This is a big one.

  1. Splits gguf Python package into logical modules.
  2. Makes the constant keys hierarchical (old constants left for compatibility)
  3. Some cleanups in the writer, especially when dealing with the endian struct format stuff.
  4. I realized that having the scripts add gguf-py/gguf to the import path (to pick up the local copy) was silly, because that makes them load gguf.py specifically rather than gguf as a package. Fixed that.
  5. Finally, GGUF file reading support!
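
To make item 2 concrete, here is a minimal sketch of what hierarchical key constants with flat compatibility aliases could look like. The class and constant names are assumptions for illustration, not necessarily the names used in the package; the key strings are taken from the dump output further down.

```python
class Keys:
    """Hypothetical, trimmed-down key hierarchy for illustration only."""

    class General:
        ARCHITECTURE = "general.architecture"
        NAME = "general.name"

    class LLM:
        # Per-architecture keys are templates filled in with the architecture name.
        CONTEXT_LENGTH = "{arch}.context_length"
        BLOCK_COUNT = "{arch}.block_count"

    class Tokenizer:
        BOS_ID = "tokenizer.ggml.bos_token_id"
        EOS_ID = "tokenizer.ggml.eos_token_id"


# Old-style flat names can simply alias the new hierarchy, so existing code keeps working.
KEY_GENERAL_ARCHITECTURE = Keys.General.ARCHITECTURE
KEY_TOKENIZER_BOS_ID = Keys.Tokenizer.BOS_ID

context_key = Keys.LLM.CONTEXT_LENGTH.format(arch="llama")
```

Grouping keys this way keeps related constants together while the aliases avoid breaking callers that still import the flat names.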

The GGUF file reading is done using numpy's memmap feature. It also supports endian-swapped views. In addition, you can map the file read/write, so GGUFReader can give you a writable view of the GGUF data. Obviously you need to be very careful when writing to it, and currently anything that changes the size of a field isn't really possible. Still, being able to change a scalar value without rewriting the whole file is nice.
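
As a rough illustration of the idea (not the GGUFReader implementation itself), the sketch below maps a small throwaway file with numpy's memmap, takes typed views over byte ranges, and writes through one of them. The file layout and variable names are made up for the example.

```python
import os
import tempfile

import numpy as np

# Build a tiny little-endian file: a uint32 followed by a uint64 (hypothetical layout).
path = os.path.join(tempfile.mkdtemp(), "demo.bin")
with open(path, "wb") as f:
    f.write(np.array([3], dtype="<u4").tobytes())
    f.write(np.array([543], dtype="<u8").tobytes())

# Map the whole file read/write as raw bytes (np.memmap defaults to uint8).
data = np.memmap(path, mode="r+")

# Typed views over byte ranges share memory with the mapping; nothing is copied.
version = data[0:4].view("<u4")
count = data[4:12].view("<u8")

# The same four bytes seen through a byte-swapped (big-endian) view.
swapped = data[0:4].view(">u4")

# Writing through a view changes the mapped file; flush() pushes it to disk.
version[0] = 4
data.flush()

with open(path, "rb") as f:
    on_disk = f.read(4)
```

Because every view aliases the same mapping, a same-size overwrite like this touches only the bytes involved, which is why scalar edits do not require rewriting the file.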

There is an example of dumping a GGUF file at the end of gguf_reader.py. Here is example output:

edit: This is now a separate script in gguf-py/scripts/gguf-dump.py

$ python gguf-py/gguf/gguf_reader.py /blah/yi-q4_k_m.gguf
* Loading: /blah/yi-q4_k_m.gguf

* Dumping 23 key/value pair(s)
      1: UINT32     |        1 | GGUF.version = 3
      2: UINT64     |        1 | GGUF.tensor_count = 543
      3: UINT64     |        1 | GGUF.kv_count = 20
      4: STRING     |        1 | general.architecture = 'llama'
      5: STRING     |        1 | general.name = 'LLaMA v2'
      6: UINT32     |        1 | llama.context_length = 4096
      7: UINT32     |        1 | llama.embedding_length = 7168
      8: UINT32     |        1 | llama.block_count = 60
      9: UINT32     |        1 | llama.feed_forward_length = 20480
     10: UINT32     |        1 | llama.rope.dimension_count = 128
     11: UINT32     |        1 | llama.attention.head_count = 56
     12: UINT32     |        1 | llama.attention.head_count_kv = 8
     13: FLOAT32    |        1 | llama.attention.layer_norm_rms_epsilon = 9.999999747378752e-06
     14: FLOAT32    |        1 | llama.rope.freq_base = 5000000.0
     15: UINT32     |        1 | general.file_type = 15
     16: STRING     |        1 | tokenizer.ggml.model = 'llama'
     17: [STRING]   |    64000 | tokenizer.ggml.tokens
     18: [FLOAT32]  |    64000 | tokenizer.ggml.scores
     19: [INT32]    |    64000 | tokenizer.ggml.token_type
     20: UINT32     |        1 | tokenizer.ggml.bos_token_id = 1
     21: UINT32     |        1 | tokenizer.ggml.eos_token_id = 2
     22: UINT32     |        1 | tokenizer.ggml.padding_token_id = 0
     23: UINT32     |        1 | general.quantization_version = 2

* Dumping 543 tensor(s)
      1:  458752000 |  7168, 64000,     1,     1 | Q4_K    | token_embd.weight
      2:   51380224 |  7168,  7168,     1,     1 | Q4_K    | blk.0.attn_q.weight
      3:    7340032 |  7168,  1024,     1,     1 | Q4_K    | blk.0.attn_k.weight
      4:    7340032 |  7168,  1024,     1,     1 | Q6_K    | blk.0.attn_v.weight
      5:   51380224 |  7168,  7168,     1,     1 | Q4_K    | blk.0.attn_output.weight
      6:  146800640 |  7168, 20480,     1,     1 | Q4_K    | blk.0.ffn_gate.weight
      7:  146800640 | 20480,  7168,     1,     1 | Q6_K    | blk.0.ffn_down.weight
      8:  146800640 |  7168, 20480,     1,     1 | Q4_K    | blk.0.ffn_up.weight
      9:       7168 |  7168,     1,     1,     1 | F32     | blk.0.attn_norm.weight
     10:       7168 |  7168,     1,     1,     1 | F32     | blk.0.ffn_norm.weight

edit: I wanted to change the BOS token in my Yi model to 2 instead of 1 (their tokenizer config actually says don't add BOS but we don't respect it). Making that change is pretty easy now:

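import gguf
# Open the file read/write ('r+') so changes to the mapped data go back to the file.
reader = gguf.GGUFReader('/path/models/yi-q4_k_m.gguf', 'r+')
# parts[-1] is the array holding the field's value; overwrite it in place.
reader.fields['tokenizer.ggml.bos_token_id'].parts[-1][0] = 2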

edit: I added some examples demonstrating the reader stuff. With this pull checked out, you can run:

python gguf-py/examples/dump_gguf.py /path/model.gguf

to dump the metadata in a GGUF file.

There is also an example that allows changing simple metadata values in a GGUF file (i.e. integers and floats). Here's an example of changing the BOS token id to 1:

python gguf-py/examples/modify_gguf.py /path/model.gguf tokenizer.ggml.bos_token_id 1

Note: It will ask you to confirm before actually making any changes. That example only supports simple values; however, the GGUFReader API will let you change anything (although the GGUF format itself currently prevents changes where the length of a field would be different).
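
The length restriction is easier to see with a toy layout. The sketch below assumes a length-prefixed string (a 64-bit length followed by the bytes) followed directly by a 32-bit value; the helper name and the exact layout are illustrative assumptions, not a full description of the format.

```python
import struct


def pack_string(text: str) -> bytes:
    """Pack a string as a little-endian uint64 length followed by its UTF-8 bytes."""
    raw = text.encode("utf-8")
    return struct.pack("<Q", len(raw)) + raw


# A string field followed immediately by a uint32 scalar.
buf = bytearray(pack_string("llama") + struct.pack("<I", 1))
value_offset = 8 + len("llama")

# A fixed-size scalar can be overwritten where it sits; nothing else moves.
struct.pack_into("<I", buf, value_offset, 2)
patched_value = struct.unpack_from("<I", buf, value_offset)[0]

# A longer string needs more bytes, so everything after it would have to shift.
rebuilt = pack_string("llama-2") + bytes(buf[value_offset:])
new_value_offset = 8 + len("llama-2")
```

In a memory-mapped file that shift means rewriting the rest of the file, which is why only same-size edits are practical in place.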

@KerfuffleV2 KerfuffleV2 added the enhancement and script labels Nov 7, 2023
@KerfuffleV2
Collaborator Author

@cebtenzzre I'm not really sure how to resolve these conflicts without basically reverting then manually editing in your changes since I moved the file content and then changed it. Would you be okay with that?

Or is there a better way?

@cebtenzzre
Collaborator

> I'm not really sure how to resolve these conflicts without basically reverting then manually editing in your changes since I moved the file content and then changed it. Would you be okay with that?

Yeah, that's basically the only way to do it.

gguf-py/gguf/gguf.py: review thread (outdated, resolved)
@KerfuffleV2 KerfuffleV2 force-pushed the feat-gguf-py-read-refactor branch from 0475e44 to 8047aa1 on November 7, 2023 22:02
gguf-py/gguf/gguf.py: review thread (outdated, resolved)
gguf-py/gguf/constants.py: review thread (outdated, resolved)
@KerfuffleV2 KerfuffleV2 force-pushed the feat-gguf-py-read-refactor branch from 4ab9105 to d7688dc on November 8, 2023 00:34
cebtenzzre and others added 3 commits November 7, 2023 21:12
Move examples to an examples/ directory

Clean up examples

Add an example of modifying keys in a GGUF file

Update documentation with info on examples

Try to support people importing gguf/gguf.py directly
@KerfuffleV2 KerfuffleV2 changed the title from "gguf-py: Refactor and add file reading support" to "gguf-py: Refactor and allow reading/modifying existing GGUF files" Nov 8, 2023
@Galunid
Collaborator

Galunid commented Nov 8, 2023

Hey, we are going to have a conflict once #3838 gets merged. You'll need to revert your changes for convert-*-hf-to-gguf.py scripts, since they'll be deleted. Instead you need to add that change to convert-hf-to-gguf.py to resolve it ;)

Thanks for GGUFReader! It was definitely needed (see the comment in #3838).

@KerfuffleV2
Collaborator Author

@Galunid

> Hey, we are going to have a conflict once #3838 gets merged.

Not a problem. Thanks for the heads up! If you wanted to, you could just make that change in #3838, it isn't specific to the other GGUF changes. Just something weird I noticed when I was looking at those scripts. Whichever way you prefer is fine.

@KerfuffleV2
Collaborator Author

@chenqiny Hi, not sure if you're interested, but this pull is supposed to add the ability to read/modify GGUF files and transparently support working with GGUF files created with a different endianness than the machine where the script is running. So you should be able to use this to open and change a little endian GGUF file on your big endian machine and vice versa.

I don't really have a good way to test that though since all my machines are LE and I'm not sure where to even find a BE GGUF model.
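
For readers wondering how a reader can tell which byte order a file uses, one possible heuristic is sketched below: read the version field that follows the magic as little endian, and if its low 16 bits are zero, assume the file is big endian. The function name is hypothetical, and the sketch assumes the first four bytes read b"GGUF" regardless of byte order and that version numbers stay small.

```python
import struct

GGUF_MAGIC = b"GGUF"


def detect_byte_order(header: bytes) -> str:
    """Guess the byte order of a GGUF-style header from its version field."""
    if header[:4] != GGUF_MAGIC:
        raise ValueError("not a GGUF header")
    (version_le,) = struct.unpack_from("<I", header, 4)
    # A small version number read with the wrong byte order lands in the
    # high bytes, leaving the low 16 bits zero.
    return "little" if version_le & 0xFFFF else "big"


little_header = GGUF_MAGIC + struct.pack("<I", 3)
big_header = GGUF_MAGIC + struct.pack(">I", 3)
```

The heuristic would misfire if the version ever reached 65536, so treat it as a sketch of the approach rather than a robust detector.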

gguf-py/examples/writer.py: review thread (outdated, resolved)
gguf-py/gguf/gguf_reader.py: review thread (outdated, resolved)
Comment on lines 8 to 10
from gguf import GGUFReader, GGUFValueType # noqa: E402

def dump_gguf(filename: str) -> None:
Collaborator

@cebtenzzre cebtenzzre Nov 9, 2023

There should be two blank lines before and after a top-level function. Same with the other two examples.

Also, the examples should be marked executable - otherwise, the shebang lines don't do anything.

Collaborator Author

Thanks for all the suggestions (and especially the actual bugs you caught). I really appreciate the time you've spent helping improve this pull!

What are you using for formatting and would you be able to share your configuration? I'd be perfectly happy to turn on Python auto formatting if there's a standard for the Python code in this repo to follow.

Collaborator

The linters I'm using are:

  • mypy for type checking (there is a mypy.ini in the repo and all new python scripts should pass mypy)
  • isort for import sorting (for this repo, basically isort **/*.py -l 120 --tc -m VERTICAL_HANGING_INDENT)
  • and flake8 for PEP 8 style checking.

My flake8 configuration is messy, but I've done pip install wemake-python-styleguide and then turned off everything I don't care about. This ridiculous command should reproduce the way I'm using flake8 for llama.cpp (most of this is hidden behind a shell alias):

flake8 **/*.py --max-line-length=120 --ignore=D,DAR,I,S,A003,E121,E123,E126,E127,E201,E202,E203,E211,E221,E222,E226,E241,E251,E261,E266,E272,E306,E402,E704,E731,E741,E800,F403,F811,N400,N801,N803,N806,N812,N813,P101,P103,P205,Q000,T001,U101,W503,W504,WPS102,WPS110,WPS111,WPS113,WPS114,WPS115,WPS117,WPS120,WPS122,WPS125,WPS201,WPS202,WPS203,WPS204,WPS210,WPS211,WPS212,WPS213,WPS214,WPS218,WPS220,WPS221,WPS222,WPS223,WPS224,WPS225,WPS226,WPS229,WPS230,WPS231,WPS232,WPS234,WPS235,WPS236,WPS237,WPS238,WPS300,WPS301,WPS302,WPS304,WPS305,WPS306,WPS316,WPS317,WPS318,WPS319,WPS320,WPS322,WPS323,WPS326,WPS331,WPS332,WPS336,WPS337,WPS347,WPS348,WPS352,WPS360,WPS361,WPS362,WPS400,WPS405,WPS407,WPS412,WPS414,WPS420,WPS421,WPS422,WPS427,WPS428,WPS429,WPS430,WPS431,WPS432,WPS433,WPS434,WPS435,WPS436,WPS437,WPS440,WPS441,WPS442,WPS450,WPS457,WPS458,WPS459,WPS460,WPS463,WPS464,WPS501,WPS504,WPS508,WPS509,WPS510,WPS513,WPS518,WPS526,WPS602,WPS604,WPS605,WPS606,WPS608,WPS609,WPS611,WPS613

There is a lot of subjectivity with flake8; even that command leaves some checks enabled that don't really matter IMO. And normally I leave E251 enabled, but the style in this repo seems to use spaces around '=' in keyword arguments.

gguf-py/examples/modify_gguf.py: review thread (outdated, resolved)
gguf-py/examples/modify_gguf.py: review thread (outdated, resolved)
@chenqiny
Contributor

chenqiny commented Nov 9, 2023

> @chenqiny Hi, not sure if you're interested, but this pull is supposed to add the ability to read/modify GGUF files and transparently support working with GGUF files created with a different endianness than the machine where the script is running. So you should be able to use this to open and change a little endian GGUF file on your big endian machine and vice versa.
>
> I don't really have a good way to test that though since all my machines are LE and I'm not sure where to even find a BE GGUF model.

Sure. I will test it over the weekend.

@Galunid
Collaborator

Galunid commented Nov 9, 2023

> If you wanted to, you could just make that change in #3838, it isn't specific to the other GGUF changes. Just something weird I noticed when I was looking at those scripts. Whichever way you prefer is fine.

Sure, I included the path fix in #3838

KerfuffleV2 and others added 2 commits November 9, 2023 00:21
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
gguf-py/examples/modify_gguf.py: review thread (outdated, resolved)
Collaborator

@monatis monatis left a comment

lgtm --thanks for taking this! I think it's good to merge.

@KerfuffleV2
Collaborator Author

> thanks for taking this! I think it's good to merge.

Thanks for taking the time to review it, and also many thanks to cebtenzzre who found a bunch of bugs and cleaned up a lot of stuff! I'll go ahead and merge it later today unless anyone finds problems in the meantime.

@KerfuffleV2
Collaborator Author

KerfuffleV2 commented Nov 10, 2023

Actually, I found one more thing I'm going to change (after testing that it doesn't break anything). convert.py's Q8_0 quantization code has:

quantized_dtype = np.dtype([('d', '<f2'), ('qs', 'i1', (32,))])

This forces the value to little endian. I'm pretty sure that means using convert.py to Q8_0 quantize files is going to be broken on big endian machines. Not sure why I did that, but there's no reason not to let it use the native byte order.
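
To show the difference the byte-order prefix makes, here is a small sketch comparing the forced little endian dtype with a native-order variant, and packing one block into it. The quantization arithmetic (scale = max absolute value / 127) and the variable names are assumptions for the example, not a copy of convert.py.

```python
import sys

import numpy as np

QK8_0 = 32  # values per block, matching the (32,) in the dtype above

# '<f2' pins the scale to little endian; '=f2' follows the host byte order.
forced_le = np.dtype([("d", "<f2"), ("qs", "i1", (QK8_0,))])
native = np.dtype([("d", "=f2"), ("qs", "i1", (QK8_0,))])

# Quantize one block: a float16 scale plus 32 signed bytes.
block = np.linspace(-1.0, 1.0, QK8_0, dtype=np.float32)
scale = np.abs(block).max() / np.float32(127)
packed = np.zeros(1, dtype=native)
packed["d"][0] = scale
packed["qs"][0] = np.round(block / scale).astype(np.int8)
```

On a little endian host the two dtypes behave identically; only on a big endian host does the '<f2' field stop being native, which is where the forced order would matter.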

… ids

gguf-py: SpecialVocab: Try to load merges from merges.txt if not in tokenizer.json

gguf-py: SpecialVocab: Add 'add_bos_token' type bools to GGUF metadata
u
@KerfuffleV2
Collaborator Author

@cebtenzzre @monatis

I made a few extra changes since you approved so I want to double check. I was looking at the code for big endian and Q8_0 and realized it can't really work unless the byteswapping happens before the quantization part. However, currently that's in gguf and occurs too late. Q8_0 would have been broken on big endian systems: I just made it so it doesn't even appear as an option in convert.py

There are also some changes to gguf's SpecialVocab class. I was trying to convert a Yi model and it didn't find the special token ids at all. I realized this was because it would only look at config.json for bos_token_id: 1, etc. if tokenizer.json or tokenizer_config.json did not exist. However, those ids may only be defined in config.json. So I just changed it to check both places. If it has already found an id, it won't overwrite it with the one from config.json, so this change should only have an effect when those special token ids wouldn't have been found previously. I'd call this a bugfix.
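
The "check every source, never overwrite" behaviour can be sketched as follows. This is a simplified illustration with hypothetical names; it treats each source as an already-parsed dict with `*_token_id` entries, which glosses over how the real tokenizer files are interpreted.

```python
from typing import Dict, Iterable, Mapping

SPECIAL_TOKEN_NAMES = ("bos", "eos", "unk", "sep", "pad")


def collect_special_token_ids(sources: Iterable[Mapping[str, object]]) -> Dict[str, int]:
    """Merge special token ids from several sources; earlier sources win."""
    found: Dict[str, int] = {}
    for source in sources:
        for name in SPECIAL_TOKEN_NAMES:
            value = source.get(f"{name}_token_id")
            # Skip ids we already have, and anything that is not a plain integer.
            if name in found or isinstance(value, bool) or not isinstance(value, int):
                continue
            found[name] = value
    return found


# Hypothetical inputs: the tokenizer config defines only EOS, config.json defines both.
tokenizer_cfg = {"eos_token_id": 7}
model_cfg = {"bos_token_id": 1, "eos_token_id": 2}
ids = collect_special_token_ids([tokenizer_cfg, model_cfg])
```

Here the EOS id from the first source is kept, while the BOS id is filled in from the second, so the later source only ever adds what was missing.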

I also made it possible to load merges from merges.txt. This only occurs if merges are requested (BPE vocab type) and they weren't found in tokenizer.json. So this is only a fallback as well.
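
A fallback of this kind might look like the sketch below. The function name is hypothetical, the tokenizer.json content is passed in as a dict, and the merges.txt content as a string to keep the example self-contained; it assumes merges.txt holds one space-separated pair per line with an optional leading "#" header line.

```python
from typing import List, Optional


def load_merges(tokenizer_json: dict, merges_txt: Optional[str]) -> List[str]:
    """Prefer merges from tokenizer.json; fall back to merges.txt content."""
    merges = tokenizer_json.get("model", {}).get("merges")
    if merges:
        return list(merges)
    if merges_txt is None:
        return []
    lines = merges_txt.splitlines()
    # Skip a header line such as "#version: 0.2" if one is present.
    if lines and lines[0].startswith("#"):
        lines = lines[1:]
    result = []
    for line in lines:
        parts = line.split()
        if len(parts) == 2:  # ignore blank or malformed lines
            result.append(f"{parts[0]} {parts[1]}")
    return result


from_json = load_merges({"model": {"merges": ["a b"]}}, "#version: 0.2\nc d\n")
from_txt = load_merges({}, "#version: 0.2\nc d\ne f\n")
```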

The last thing I changed is possibly more controversial. I changed SpecialVocab to read the add_bos_token, etc booleans in config.json and add them to the GGUF file metadata like tokenizer.ggml.add_bos_token (a boolean). This is because some models don't want you to add a BOS or whatever token, however we currently just always add BOS for SPM models. Right now, that metadata field is just informational. Actually handling it on the C++ side would have to be a different pull.

I also didn't add it to the official GGUF keys constants. (But I can, if people think it's a good idea.) I think we should support at least including that information in the metadata. Models that don't want something like a BOS token prepended to their prompt can have severely degraded quality when you do it anyway.
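
As a small sketch of that idea: copy any add_X_token booleans found in a parsed config into metadata under tokenizer.ggml.add_X_token keys. The function name and the plain dict standing in for the GGUF metadata are assumptions, as is the EOS key name, which is inferred by analogy with the BOS one.

```python
from typing import Dict


def add_token_flags(config: dict, metadata: Dict[str, object]) -> Dict[str, object]:
    """Copy add_bos_token / add_eos_token booleans into metadata, if present."""
    for name in ("bos", "eos"):
        flag = config.get(f"add_{name}_token")
        if isinstance(flag, bool):  # ignore missing or non-boolean values
            metadata[f"tokenizer.ggml.add_{name}_token"] = flag
    return metadata


meta = add_token_flags({"add_bos_token": False, "add_eos_token": None}, {})
```

Only values that are actually booleans are recorded, so a missing flag stays absent rather than being written as a default.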

gguf-py/gguf/vocab.py: review thread (outdated, resolved)
@KerfuffleV2
Collaborator Author

@monatis This is just waiting for your review, so if you don't have any issues with the latest changes once you see this you can go ahead and just merge it.

@monatis monatis merged commit 34b0a08 into ggerganov:master Nov 11, 2023
6 checks passed
@monatis
Collaborator

monatis commented Nov 11, 2023

Thanks again @KerfuffleV2 for the good work on this PR and for patiently taking every review into account.

@chenqiny
Contributor

@KerfuffleV2 I got the following error.

chenqiny@datalake:/database/llama.cpp/gguf-py$ /home/chenqiny/.local/bin/gguf-convert-endian /database/models--meta-llama--Llama-2-7b/snapshots/365ffa8f1a6c455d3e2028ae658236b4b85ba824/ggml-model-f16-little.gguf big

* Loading: /database/models--meta-llama--Llama-2-7b/snapshots/365ffa8f1a6c455d3e2028ae658236b4b85ba824/ggml-model-f16-little.gguf
* Host is LITTLE endian, GGUF file seems to be LITTLE endian
Traceback (most recent call last):
  File "/home/chenqiny/.local/bin/gguf-convert-endian", line 8, in <module>
    sys.exit(gguf_convert_endian_entrypoint())
  File "/home/chenqiny/.local/lib/python3.8/site-packages/scripts/gguf-convert-endian.py", line 109, in main
    convert_byteorder(reader, args)
  File "/home/chenqiny/.local/lib/python3.8/site-packages/scripts/gguf-convert-endian.py", line 34, in convert_byteorder
    if file_endian == order:
UnboundLocalError: local variable 'order' referenced before assignment
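
This kind of error typically means a local variable is assigned on only some code paths. The sketch below reproduces that pattern and one possible fix; it is a guess at the shape of the bug for illustration, with hypothetical function names, not the actual script code.

```python
def resolve_order_buggy(requested: str, host_endian: str) -> str:
    # 'order' is only bound when the caller asks for "native";
    # any other request falls through without assigning it.
    if requested == "native":
        order = host_endian
    return order  # raises UnboundLocalError for "big" or "little"


def resolve_order_fixed(requested: str, host_endian: str) -> str:
    # Bind a value on every path: use the host order for "native",
    # otherwise pass the requested order through unchanged.
    return host_endian if requested == "native" else requested


try:
    resolve_order_buggy("big", "little")
    error_name = None
except UnboundLocalError as exc:
    error_name = type(exc).__name__
```

Note that the buggy version works whenever "native" is requested, which is how a problem like this can slip past casual testing.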

olexiyb pushed a commit to Sanctum-AI/llama.cpp that referenced this pull request Nov 23, 2023
…erganov#3981)

* gguf-py: Refactor and add file reading support

* Replay changes from ggerganov#3871

Credit to @cebtenzzre for that pull

* Various type annotation fixes.

* sort imports with isort (again)

* Fix missing return statement in add_tensor

* style cleanup with flake8

* fix NamedTuple and Enum usage

* Fix an issue with state init in GGUFReader

Move examples to an examples/ directory

Clean up examples

Add an example of modifying keys in a GGUF file

Update documentation with info on examples

Try to support people importing gguf/gguf.py directly

* Damagage is not a word.

* Clean up gguf-py/examples/modify_gguf.py whitespace

Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>

* Update gguf-py/examples/modify_gguf.py formatting

Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>

* Update gguf-py/gguf/gguf_reader.py type hint

Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>

* Make examples executable, formatting changes

* Add more information to GGUFReader and examples comments

* Include a gguf Python package version bump

* Add convert-gguf-endian.py script

* cleanup

* gguf-py : bump minor version

* Reorganize scripts

* Make GGUFReader endian detection less arbitrary

* Add JSON dumping support to gguf-dump.py

Which I kind of regret now

* A few for gguf-dump.py cleanups

* Murder accidental tuple in gguf-py/scripts/gguf-dump.py

Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>

* cleanup

* constants : remove unneeded type annotations

* fix python 3.8 compat

* Set up gguf- scripts in pyproject.toml

* And include scripts/__init__.py, derp

* convert.py: We can't currently support Q8_0 on big endian.

* gguf-py: SpecialVocab: Always try available sources for special token ids

gguf-py: SpecialVocab: Try to load merges from merges.txt if not in tokenizer.json

gguf-py: SpecialVocab: Add 'add_bos_token' type bools to GGUF metadata
u

* cleanup

* Promote add_X_token to GGUF metadata for BOS and EOS

---------

Co-authored-by: Jared Van Bortel <jared@nomic.ai>
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
Labels
enhancement (New feature or request), script (Script related)
6 participants