
Support Adept Persimmon 8B #3410

Conversation

@phillip-kravtsov (Contributor) commented on Sep 29, 2023:

  • Adds Persimmon 8B, which is, architecturally, a standard dense transformer with:
    • Q/K layernorm
    • squared ReLU activations
    • partial RoPE
    • a very large vocab size (mostly unused for text)

To support partial RoPE & squared ReLU, this PR adds concat & square kernels for Metal.
I've confirmed agreement between the GGML & HF implementations up to the tensor values in the last layer.
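For context, here is a minimal plain-C++ sketch of what these two ops compute. This is not the PR's Metal code; the function names, head layout, and the RoPE base of 10000 are illustrative assumptions. Partial RoPE rotates only the first n_rot dimensions of each head and passes the rest through unchanged, which is why a concat kernel is needed:

```cpp
#include <cmath>
#include <vector>

// squared ReLU: f(x) = max(0, x)^2  (illustrative, not the Metal kernel)
inline float squared_relu(float x) {
    const float r = x > 0.0f ? x : 0.0f;
    return r * r;
}

// Partial RoPE over one head (head.size() >= n_rot assumed): rotate the
// pairs in dims [0, n_rot), leave dims [n_rot, head.size()) untouched.
// In the compute graph this is expressed as rope(first slice) followed
// by a concat with the pass-through slice.
void partial_rope(std::vector<float> & head, int n_rot, int pos,
                  float theta_base = 10000.0f) {
    for (int i = 0; i + 1 < n_rot; i += 2) {
        const float theta = pos * std::pow(theta_base, -(float) i / n_rot);
        const float c  = std::cos(theta);
        const float s  = std::sin(theta);
        const float x0 = head[i];
        const float x1 = head[i + 1];
        head[i]     = x0 * c - x1 * s;
        head[i + 1] = x0 * s + x1 * c;
    }
}
```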

@ggerganov added the high priority (Very important issue) and model (Model specific) labels on Sep 30, 2023.
(Inline review threads on ggml-metal.m, gguf-py/gguf/gguf.py, and llama.cpp; all resolved.)
@ggerganov (Owner) commented:

Let's resolve the CI failures and merge.

@phillip-kravtsov force-pushed the phillip-kravtsov/support-adept-persimmon-8b branch from 92acb44 to 5d259d3 on October 5, 2023.
@ggerganov merged commit 0e797c2 into ggerganov:master on Oct 7, 2023. 34 checks passed.
@slaren (Collaborator) commented on Oct 7, 2023:

The switches in llm_load_hparams and llama_build_graph are missing break statements, so Persimmon should be falling through to the Refact graph. Does this work currently?
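For readers unfamiliar with the failure mode slaren describes, here is a minimal, hypothetical C++ sketch (the enum and function are illustrative, not the actual llama.cpp code) of how a missing break lets one switch case fall through into the next:

```cpp
#include <cstdio>

enum llm_arch_example { ARCH_PERSIMMON, ARCH_REFACT };  // illustrative only

static void build_graph(llm_arch_example arch) {
    switch (arch) {
        case ARCH_PERSIMMON:
            printf("building persimmon graph\n");
            // BUG: no `break` here, so execution continues into the next case
        case ARCH_REFACT:
            printf("building refact graph\n");  // also runs for ARCH_PERSIMMON
            break;
    }
}

int main() {
    build_graph(ARCH_PERSIMMON);  // prints both lines due to the fallthrough
    return 0;
}
```

Adding the missing break after each case (or compiling with warnings such as -Wimplicit-fallthrough enabled) restores the intended dispatch.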

@ggerganov (Owner) commented:

@phillip-kravtsov PTAL at @slaren's comment and fix as necessary.

@KerfuffleV2 (Collaborator) commented:

I got tired of seeing the compiler warning and created #3535 (not sure if there are any other issues; I haven't had a chance to test it yet).

@phillip-kravtsov (Contributor, author) commented:

Thanks for the fix @KerfuffleV2 -- that PR should be sufficient.

joelkuiper added a commit to vortext/llama.cpp that referenced this pull request on Oct 12, 2023:
…example

* 'master' of github.com:ggerganov/llama.cpp:
  py : change version of numpy requirement to 1.24.4 (ggerganov#3515)
  quantize : fail fast on write errors (ggerganov#3521)
  metal : support default.metallib load & reuse code for swift package (ggerganov#3522)
  llm : support Adept Persimmon 8B (ggerganov#3410)
  Fix for ggerganov#3454 (ggerganov#3455)
  readme : update models, cuda + ppl instructions (ggerganov#3510)
  server : docs fix default values and add n_probs (ggerganov#3506)