[Feature request] Implement GPT-JT #6

pablogranolabar · 2022-12-01T10:13:12Z

e.g. https://www.together.xyz/blog/releasing-v1-of-gpt-jt-powered-by-open-source-ai

ggerganov · 2022-12-01T17:14:24Z

Well this looks like the same model as GPT-J, just different weights. You should already be able to run it - just convert it to ggml format and use the gpt-j example

pablogranolabar · 2022-12-04T01:38:42Z

Ok, converted these weights from GPT-JT and it generated the model file, however I'm getting the following error when loading:

gptj_model_load: f16     = 1
gptj_model_load: ggml ctx size = 13334.86 MB
gptj_model_load: memory_size =  1792.00 MB, n_mem = 57344
gptj_model_load: unknown tensor '       W*ƍyC$B' in model file
main: failed to load model from './ggml-model.bin'

any ideas?

ggerganov · 2022-12-04T08:20:46Z

The convert script assumes that the original weights are FP32 and converts to FP16 when necessary.
However, in the new GPT-JT, the weights are in FP16 by default, so the script has to be adjusted.

Try changing the following:

ggml/examples/gpt-j/convert-h5-to-ggml.py

Lines 118 to 121 in 90ee5c6

    
           if name[-7:] == ".weight" and n_dims == 2: 
        
               print("  Converting to float16") 
        
               data = data.astype(np.float16) 
        
               ftype = 1

to:

    # ftype == 0 -> float32, ftype == 1 -> float16
    ftype = 0;
    if use_f16:
        if name[-7:] == ".weight" and n_dims == 2:
            print("  Converting to float16")
            data = data.astype(np.float16)
            ftype = 1
        else:
            print("  Converting to float32")
            data = data.astype(np.float32)
            ftype = 0

ggerganov · 2022-12-04T16:34:46Z

Just tested and it works: ed09c71

Also fixed unicode support for the GPT-2 and GPT-J models in general

trholding · 2022-12-09T03:31:54Z

@pablogranolabar Is there a noticeable difference in quality of output of GPT-JT compared to GPT-J?

pablogranolabar · 2022-12-12T04:40:20Z

Yes and no, it's getting a lot of conflicting reviews because GPT-JT is fine tuned for task oriented stuff like chain of thought reasoning. So for canned general tasks like causal LM it's potentially worse in whatever you would consider precision and accuracy, but with quality prompt engineering all of these additional tasks can be teased out during inference. So, the inevitable "it depends" is applicable there, depending on target architecture, model handler customization, and inference hyperparameters + prompt injection and optimization during inference.

trholding · 2022-12-12T05:33:51Z

So for canned general tasks like causal LM it's potentially worse in whatever you would consider precision and accuracy, but with quality prompt engineering all of these additional tasks can be teased out during inference.

Would be awesome if you could share some sample outputs.

If there is a way to share large models, I'd be willing to convert it to ggml and share. Maybe IPFS or Torrent, have to figure out. I have bandwidth caps on server.

trholding · 2022-12-12T10:06:34Z

@pablogranolabar Thanks for sharing the great idea about using GPT-JT

@ggerganov Thanks for the fix

I uploaded the model to huggingface so that its easy for people to get hold of the gpt-jt ggml model variant without eating into your hosting bills:

https://huggingface.co/trholding/GPT-JT-6B-v1-ggml

cd models
mkdir gpt-jt-6B ; cd gpt-jt-6B
wget https://huggingface.co/trholding/GPT-JT-6B-v1-ggml/resolve/main/ggml-model.bin
cd ../..

# Run the GPT-JT 6B v1 model (requires 12GB disk space and 16GB CPU RAM)
./bin/gpt-j -m models/gpt-jt-6B/ggml-model.bin -p "This is an example"

pablogranolabar · 2022-12-13T05:26:58Z

probably best suited for a new issue, but @ggerganov what do you think about adding 8-bit inference? this would further cut model memory consumption by 50% and with nominal loss of precision. this is a supported option now for transformers with bitsandbytes via Accelerate.

ggerganov · 2022-12-13T07:51:18Z

@pablogranolabar
Hm, I'm probably missing something - the referenced repo is a CUDA wrapper.
I cannot find any information about Apple Accelerate supporting 8-bit precision.
Can you provide any reference?

pablogranolabar · 2022-12-13T07:53:45Z

Yeah for example: huggingface/transformers#17901

ggerganov · 2022-12-13T08:03:14Z

Again, I might be missing something, but it seems this refers to huggingface/accelerate framework which is all CUDA and does not apply to Apple Accelerate.

Unless there is a way to use Apple framework with direct 8-bit precision support, I think 8-bit support will be very low priority for ggml. It means I will have to implement the quantization from scratch with NEON and I'm not really sure how to do this atm. And even if I achieve it, it will very likely be less performant compared to the existing mixed FP16/FP32 + Accelerate because we will lose the AMX coprocessor benefits that we currently have.

pablogranolabar · 2022-12-13T08:06:06Z

Ah sorry I was referring to the Accelerate framework used with PyTorch. Here's a decent writeup of their 8-bit quantization methods: https://huggingface.co/blog/hf-bitsandbytes-integration

regstuff · 2023-02-28T13:14:31Z

@trholding - your model link gives a 404. Is the GPT-6JT ggml still available anywhere?

ggerganov added the enhancement New feature or request label Dec 1, 2022

ggerganov mentioned this issue Feb 26, 2023

4-bit Integer quantisation #27

Merged

8 tasks

katsu560 mentioned this issue Mar 18, 2023

add OpenBLAS detection and modify tests codes #40

Merged

CCLDArjun pushed a commit to CCLDArjun/ggml that referenced this issue Dec 18, 2023

Include Python dependencies in README (ggerganov#6)

5f2f970

PABannier added a commit to PABannier/ggml that referenced this issue Oct 20, 2024

sync: sync to latest GGML api (ggerganov#6)

6157db7

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Feature request] Implement GPT-JT #6

[Feature request] Implement GPT-JT #6

pablogranolabar commented Dec 1, 2022

ggerganov commented Dec 1, 2022

pablogranolabar commented Dec 4, 2022

ggerganov commented Dec 4, 2022

ggerganov commented Dec 4, 2022

trholding commented Dec 9, 2022

pablogranolabar commented Dec 12, 2022 •

edited

Loading

trholding commented Dec 12, 2022

trholding commented Dec 12, 2022

pablogranolabar commented Dec 13, 2022 •

edited

Loading

ggerganov commented Dec 13, 2022

pablogranolabar commented Dec 13, 2022

ggerganov commented Dec 13, 2022

pablogranolabar commented Dec 13, 2022

regstuff commented Feb 28, 2023 •

edited

Loading

[Feature request] Implement GPT-JT #6

[Feature request] Implement GPT-JT #6

Comments

pablogranolabar commented Dec 1, 2022

ggerganov commented Dec 1, 2022

pablogranolabar commented Dec 4, 2022

ggerganov commented Dec 4, 2022

ggerganov commented Dec 4, 2022

trholding commented Dec 9, 2022

pablogranolabar commented Dec 12, 2022 • edited Loading

trholding commented Dec 12, 2022

trholding commented Dec 12, 2022

pablogranolabar commented Dec 13, 2022 • edited Loading

ggerganov commented Dec 13, 2022

pablogranolabar commented Dec 13, 2022

ggerganov commented Dec 13, 2022

pablogranolabar commented Dec 13, 2022

regstuff commented Feb 28, 2023 • edited Loading

pablogranolabar commented Dec 12, 2022 •

edited

Loading

pablogranolabar commented Dec 13, 2022 •

edited

Loading

regstuff commented Feb 28, 2023 •

edited

Loading