Clean up QK and file and tensor types #678
Conversation
    FTYPE_Q4_0 = 2,
    FTYPE_Q4_1 = 3,
};
static const char * ftype_str[] = { "f32", "f16", "q4_0", "q4_1" };
Unfortunately, pedantic ISO C++ does not allow designated initializers.
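(For context: array designated initializers are a C99 feature that ISO C++, and hence `-pedantic` builds, rejects. A minimal sketch of the two forms, assuming FTYPE_F32 and FTYPE_F16 take the values 0 and 1:)

```c
// Not valid in pedantic ISO C++ (array designated initializers are a C99 feature):
// static const char * ftype_str[] = {
//     [FTYPE_F32]  = "f32",
//     [FTYPE_F16]  = "f16",
//     [FTYPE_Q4_0] = "q4_0",
//     [FTYPE_Q4_1] = "q4_1",
// };

// Portable alternative: plain aggregate initialization, relying on the enum
// values being the consecutive indices 0..3 (assumed here for F32/F16).
static const char * ftype_str[] = { "f32", "f16", "q4_0", "q4_1" };
```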
@@ -100,7 +109,7 @@ struct llama_hparams {
    int32_t n_head = 32;
    int32_t n_layer = 32;
    int32_t n_rot = 64;
-   int32_t f16 = 1;
+   int32_t f16 = FTYPE_F16;
I'm not sure if this is the correct semantics - what's the intention behind `ftype` and `f16`?
`f16` is badly named; it dates from when I was only considering F16 and F32 and no other data types.
@@ -222,7 +179,7 @@ def copy_tensors(fin, fout, part_id, n_parts):
    # ensure tensor data is aligned
    tensor_data_offset = fout.tell()
-   while tensor_data_offset % QK != 0:
+   while tensor_data_offset % 32 != 0:
`llama_model_load` and `llama_model_quantize_internal` have 32 hard-coded as well. What matters is not the quantized block size, but whether the processor can efficiently load and store from a multiple of this number, especially with SIMD instructions.
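As an illustration (not code from this PR), padding a file offset to a 32-byte boundary before writing tensor data might look like this on the C side; the 32 is an I/O-alignment choice, independent of the quantization block size QK:

```c
#include <stdio.h>

// Pad fout with zero bytes until its offset is a multiple of `alignment`,
// so later loads of the tensor data start on a SIMD-friendly boundary.
static void pad_to_alignment(FILE * fout, long alignment) {
    long offset = ftell(fout);
    while (offset % alignment != 0) {
        fputc(0, fout);
        offset++;
    }
}

// e.g. pad_to_alignment(fout, 32); before writing tensor data
```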
I think it would be worthwhile to separate the more straightforward Python changes into a separate PR which can be reviewed and merged sooner than the more complex changes in C.
Agree, but I would like the Python definitions to be similar to C/C++. So I'm looking for opinions; once we have general consensus on whether and how this should be done, I will split the PR.
@@ -32,6 +33,7 @@ def write_header(f_out, header):
    if magic != 0x67676d6c:
Can you make the magic a constant too?
We need old magic constants anyway to detect older models.
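A sketch of what that could look like, with the old magic kept around for detection (the constant and function names here are illustrative, not necessarily what the code uses):

```c
#include <stdint.h>

// 'ggml' in ASCII: the original, unversioned file magic (value from the check above).
#define FILE_MAGIC_UNVERSIONED 0x67676d6c
// The current format would use a different, distinct magic (value omitted here).

// Older models can then be detected explicitly instead of comparing raw literals.
static int is_unversioned_model(uint32_t magic) {
    return magic == FILE_MAGIC_UNVERSIONED;
}
```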
These refactors and code maintenance changes are very helpful.

- `enum e_ftype` can be moved to `llama.h` and be called `enum llama_ftype`
- I think `GGML_FILE` -> `LLAMA_FTYPE`
@ggerganov I'm glad you like the overall direction. I'll make a separate PR with this enum, then we might sync it with the Python code in #545 before that gets merged.
(edit: the python changes will clash with #545)
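For reference, a sketch of what the suggested public enum in llama.h might look like; the F32/F16 values are assumptions, the Q4_0/Q4_1 values follow the diff at the top of this PR:

```c
// Sketch only: exact enumerator names/values would be settled in the follow-up PR.
enum llama_ftype {
    LLAMA_FTYPE_F32  = 0,   // assumed
    LLAMA_FTYPE_F16  = 1,   // assumed
    LLAMA_FTYPE_Q4_0 = 2,   // matches FTYPE_Q4_0 in the diff above
    LLAMA_FTYPE_Q4_1 = 3,   // matches FTYPE_Q4_1 in the diff above
};
```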
This PR has several goals:

For the python scripts, I introduce a new file `ggml.py` at the top level, which contains definitions of the file and tensor types equivalent to those in ggml.h. I have formatted that with `black`, in case #611 returns from the dead.

The changes to the python files on one hand and C/C++ on the other are technically independent, but the discussion will overlap, so I'm keeping this in one draft. I will split it later if it makes sense.
I have tested conversion from pth to ggml with identical outputs, but I have not tested the other conversion scripts.
Open questions:

- Should `enum e_ftype` be moved to `llama.h` (with sensible renaming)? This would allow us to eliminate the hard-coded 2, 3 in the usage string of `quantize.cpp` (see the sketch after this list), and maybe be useful elsewhere. ~~I could not find any other parts of the code that would use this.~~ I see now, it's for GPTQ models. Should I add `FTYPE_GPTQ`? (llama.cpp/llama.cpp, line 515 at 3525899)
- Is the `GGML_FILE` prefix ok or should it be `LLAMA_FILE`? ggml.c doesn't deal with that type.

Some more comments below, looking for your thoughts on this...
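On the first question, a hypothetical sketch of how quantize.cpp could derive its usage string from the enum instead of the hard-coded 2 and 3 (the helper function here is illustrative, not the PR's code):

```c
#include <stdio.h>

// Values as defined in the diff at the top of this PR.
enum e_ftype { FTYPE_Q4_0 = 2, FTYPE_Q4_1 = 3 };

// Hypothetical helper: the literals 2 and 3 no longer appear in the usage text.
static void print_usage(const char * prog) {
    fprintf(stderr, "usage: %s model-f32.bin model-quant.bin type\n", prog);
    fprintf(stderr, "  type = %d - q4_0\n", FTYPE_Q4_0);
    fprintf(stderr, "  type = %d - q4_1\n", FTYPE_Q4_1);
}
```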