Steps to support the Dolly model #1308

Closed
wants to merge 3 commits

Conversation


@devkral devkral commented May 3, 2023

What

  • The current glob for model files is very restricted; I relaxed it a little so it can find the Dolly model files.
  • Support for torch's ByteStorage was added. As far as I can see it is uint8; see the sketch below.
  • New: a PretrainedVocab class is added.
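Roughly, the two converter changes amount to something like the following sketch (the names are illustrative, not the exact convert.py internals):

from pathlib import Path

import numpy as np

# Mapping from torch storage class names to numpy dtypes; the new entry is
# ByteStorage, which is treated as plain uint8 bytes.
STORAGE_TYPE_TO_DTYPE = {
    "FloatStorage": np.float32,
    "HalfStorage": np.float16,
    "ByteStorage": np.uint8,
}

def find_model_files(model_dir: Path) -> list[Path]:
    # Relaxed glob: accept any *.bin shard instead of one fixed file name,
    # so checkpoints laid out like Dolly's are also picked up.
    return sorted(model_dir.glob("*.bin"))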

Why

I want to use the Dolly model with llama.cpp. Its checkpoint uses torch's ByteStorage.

Remaining issues

The vocab file is in a completely different format from SentencePiece (Dolly uses a pretrained tokenizer):

Somehow it has to be converted, or another Vocab class has to be added to convert.py; a sketch of what that could look like follows after this list.

  • Dolly uses the gpt_neox architecture, which is different from what llama.cpp understands, so it needs conversion.
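A rough sketch of what such a PretrainedVocab could look like, assuming a Hugging Face-style tokenizer.json (the field names come from the tokenizers file format; this is not the exact code in this PR):

import json
from pathlib import Path
from typing import Iterable, Tuple

class PretrainedVocab:
    def __init__(self, fname_tokenizer: Path) -> None:
        data = json.loads(fname_tokenizer.read_text(encoding="utf-8"))
        # BPE vocab: token string -> id
        self.vocab = data["model"]["vocab"]
        self.added_tokens_list = [t["content"] for t in data.get("added_tokens", [])]
        self.vocab_size_base = len(self.vocab)

    def all_tokens(self) -> Iterable[Tuple[bytes, float]]:
        # A pretrained BPE tokenizer carries no SentencePiece scores, so emit a
        # placeholder score per token: base tokens first, added tokens last.
        for token, _id in sorted(self.vocab.items(), key=lambda kv: kv[1]):
            yield token.encode("utf-8"), 0.0
        for token in self.added_tokens_list:
            yield token.encode("utf-8"), -1000.0

    def __repr__(self) -> str:
        return f"<PretrainedVocab with {self.vocab_size_base} base tokens and {len(self.added_tokens_list)} added tokens>"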

        yield from self.added_tokens()

    def __repr__(self) -> str:
        return f"<SentencePieceVocab with {self.vocab_size_base} base tokens and {len(self.added_tokens_list)} added tokens>"
Collaborator


Suggested change
return f"<SentencePieceVocab with {self.vocab_size_base} base tokens and {len(self.added_tokens_list)} added tokens>"
return f"<PretrainedVocab with {self.vocab_size_base} base tokens and {len(self.added_tokens_list)} added tokens>"

@ggerganov
Member

For a GPT-NeoX implementation using ggml, see the StableLM example.
It will require some extra work to integrate this into llama.cpp.

But before doing that, we need to add a ggml example for Dolly and make sure that it works correctly.

@ggerganov ggerganov closed this May 4, 2023
@devkral
Author

devkral commented May 5, 2023

Nice. Sorry for ghost posting, but why does llama.cpp exist if you have the ggml repo?

I am a beginner when it comes to AI.

@mverrilli

Hi @ggerganov

But before doing that, we need to add a ggml example for Dolly and make sure that it works correctly.

I created this one; I can open a PR if that is OK.

https://github.com/mverrilli/ggml/tree/dolly-v2/examples/dolly-v2

@ggerganov
Member

@mverrilli

Yes, please open a PR.
In your experience, do the ggml results look OK when you compare to the reference Python implementation?
I'll probably do some more rigorous testing later, but would like to get an additional opinion.

@mverrilli

@ggerganov ggml-org/ggml#132

It is pretty comparable. The Q5_0 quantization is significantly faster for the larger model. I was not getting good results until I added special token handling, since the tokenizer would otherwise split a special token into two. I posted some sample runs in the README.
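For anyone curious, the idea behind the special token handling is roughly the following (a hedged Python sketch with made-up names; the actual Dolly example is C++ in the ggml repo): scan the prompt for registered special tokens first and only run the ordinary tokenizer on the plain text in between, so an added marker such as an instruction prefix is emitted as its single id instead of being split in two.

from typing import Callable, Dict, List

def tokenize_with_specials(
    text: str,
    special_tokens: Dict[str, int],        # special token string -> id
    fallback: Callable[[str], List[int]],  # ordinary (BPE) tokenizer for plain text
) -> List[int]:
    ids: List[int] = []
    pos = 0
    while pos < len(text):
        # Find the earliest occurrence of any special token; prefer the
        # longest match when several start at the same position.
        hits = [(text.find(tok, pos), tok) for tok in special_tokens]
        hits = [(i, tok) for i, tok in hits if i != -1]
        if not hits:
            ids.extend(fallback(text[pos:]))
            break
        i, tok = min(hits, key=lambda h: (h[0], -len(h[1])))
        if i > pos:
            ids.extend(fallback(text[pos:i]))
        ids.append(special_tokens[tok])
        pos = i + len(tok)
    return ids

This way the fallback tokenizer never sees the special markers, which matches the behaviour described above.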

@j-f1
Collaborator

j-f1 commented May 5, 2023

Nice. Sorry for ghost posting, but why does llama.cpp exist if you have the ggml repo?

I am a beginner when it comes to AI.

GGML is a general-purpose matrix API that doesn't include support for running specific models directly (I believe). This repo exists to use GGML to implement the specific structures of LLaMA.

@devkral
Author

devkral commented May 11, 2023

The ggml dolly example results look good.

@xingchensong
Contributor

Hi team, any update?

@devkral
Author

devkral commented May 25, 2023

wrong repository

@xingchensong
Contributor

wrong repository

is ggml the right one?
