
Update the convert-unversioned-ggml-to-ggml.py script to support GPT4All ggml models #588

Closed
ggerganov opened this issue Mar 29, 2023 · 2 comments
Labels: good first issue, help wanted, high priority, model

Comments

ggerganov (Member) commented Mar 29, 2023

See: https://twitter.com/ggerganov/status/1640945226662420483

The gpt4all ggml model has an extra `<pad>` token, i.e. n_vocab = 32001 instead of 32000.
We need to add it during the conversion. There should be an optional command-line argument to the script that specifies whether the token should be added or not.
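
A minimal sketch of what such a flag could look like (the argument and function names here are illustrative, not the script's actual structure):

```python
# Hypothetical sketch of the proposed optional flag; names are illustrative
# and do not reflect the real internals of convert-unversioned-ggml-to-ggml.py.
import argparse

PAD_TOKEN = "<pad>"

def extend_vocab(tokens, add_pad):
    """Optionally append the GPT4All <pad> token to the base 32000-token vocab."""
    if add_pad:
        return tokens + [PAD_TOKEN]  # 32000 -> 32001 entries
    return tokens

if __name__ == "__main__":
    ap = argparse.ArgumentParser()
    ap.add_argument("--add-pad-token", action="store_true",
                    help="append <pad> for GPT4All models (n_vocab = 32001)")
    args = ap.parse_args()

    base_vocab = [f"tok{i}" for i in range(32000)]  # stand-in for the real vocab
    vocab = extend_vocab(base_vocab, args.add_pad_token)
    print(f"n_vocab = {len(vocab)}")
```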

ggerganov added the good first issue, help wanted, high priority, and model labels on Mar 29, 2023
niansa (Contributor) commented Mar 29, 2023

Currently, it fails to load for me:

```
Process 2053450 launched: '/home/nils/llama.cpp/main' (x86_64)
main: seed = 1680072922
llama_model_load: loading model from './gpt4all-lora-quantized.bin' - please wait ...
llama_model_load: n_vocab = 32001
llama_model_load: n_ctx   = 512
llama_model_load: n_embd  = 4096
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot   = 128
llama_model_load: f16     = 2
llama_model_load: n_ff    = 11008
llama_model_load: n_parts = 1
llama_model_load: type    = 1
llama_model_load: ggml ctx size = 4273.35 MB
llama_model_load: mem required  = 6065.35 MB (+ 1026.00 MB per state)
llama_model_load: loading model part 1/1 from './gpt4all-lora-quantized.bin'
llama_model_load: terminate called after throwing an instance of 'std::__ios_failure'
  what():  basic_ios::clear: iostream error
Process 2053450 stopped
* thread #1, name = 'main', stop reason = signal SIGABRT
    frame #0: 0x00007ffff7ac4ce1 libc.so.6`raise + 321
```
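
The `basic_ios::clear: iostream error` usually means a read on the model stream failed, e.g. the loader ran past the end of the file. A quick way to see what the file actually declares is to dump the header; the sketch below assumes the unversioned ggml layout as I understand it (a uint32 magic followed by seven int32 hparams, in the order the loader prints them above):

```python
import struct
import sys

# Assumed layout of the unversioned ggml header: uint32 magic, then seven
# int32 hparams matching the llama_model_load output above.
with open(sys.argv[1], "rb") as f:
    (magic,) = struct.unpack("<I", f.read(4))
    if magic != 0x67676D6C:  # 'ggml', the unversioned magic as I understand it
        sys.exit(f"unexpected magic: {magic:#x}")
    n_vocab, n_embd, n_mult, n_head, n_layer, n_rot, ftype = \
        struct.unpack("<7i", f.read(28))

print(f"n_vocab={n_vocab} n_embd={n_embd} n_mult={n_mult} "
      f"n_head={n_head} n_layer={n_layer} n_rot={n_rot} ftype={ftype}")
```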

anzz1 (Contributor) commented Mar 29, 2023

The same 32001-token vocabulary with the extra pad token would apply to the Alpaca models too, if the Stanford spec were followed to a tee. However, the LoRA adaptations did not follow it, so they worked out of the box. The problem also exists in the point-alpaca model, since it is "native", i.e. it adheres fully to the original Stanford Alpaca spec.

Currently, if you add the missing token, llama.cpp crashes. I suspect the current code needs the number of tokens and the tensor dimensions to be a multiple of 256, like pretty much all the current models? I'm not sure about this, though. Even if it doesn't need to be a multiple of 256, wouldn't it need to be at least divisible by 2 for the bytepair logic to work? This is just guesswork, as I don't yet fully grasp how the calculations work.
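
For what it's worth, a back-of-the-envelope size calculation suggests the row count itself shouldn't matter for the packing, but that a single extra row is enough to desynchronize the reads. This assumes the old q4_0 block layout (a 4-byte scale plus 16 packed bytes per 32 weights), which I have not verified:

```python
# Assumption: q4_0 stores 32 weights per block as a 4-byte float scale plus
# 16 packed bytes (two 4-bit values per byte), i.e. 20 bytes per block.
BLOCK_VALUES = 32
BLOCK_BYTES = 4 + BLOCK_VALUES // 2      # 20 bytes per block

n_embd = 4096
row_bytes = (n_embd // BLOCK_VALUES) * BLOCK_BYTES   # 128 blocks = 2560 bytes

print(32000 * row_bytes)  # 81920000 bytes for tok_embeddings without <pad>
print(32001 * row_bytes)  # 81922560 bytes with it -- one row (2560 B) longer
```

If that layout is right, the bytepair packing runs along n_embd (within a row), not along n_vocab, so an odd vocab size by itself shouldn't break quantization; but a header that declares 32001 rows while the tensor data holds 32000 (or vice versa) would shift every subsequent read, which would match the iostream error above.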

I "fixed" the problem for palpaca-7B-ggml by simply removing the missing token from the vocabulary and clamping the tensor length to 32000, essentially cutting the token off completely, as seen in my hack job here #303 (comment) . It definitely doesn't seem the right approach to take, but at least with that specific model it worked fine and at least I can't perceive any issues with it.

However, I do not fully understand the repercussions of doing that, so someone more knowledgeable could probably explain it better. My line of thinking was that if the pad token were actually used in any of the calculations, removing it would crash or produce garbled output, and that if it worked, it would be fine. Maybe the tokens are, as their name implies, just padding, and exist as fillers in the data on tensor "paths" which never get taken.

Disclaimer: assume everything I just said is wrong, as, like I said, I do not have a complete enough understanding to confidently state anything on the matter. But maybe my guesswork can help someone reach a 💡 moment and give a proper explanation.
