convert : update Falcon script for new HF config #3448
Conversation
is this a continuation of #3049 ? :) |
Yea, but your changes are a subset of the actual changes done for 180B
2c2
< # HF falcon--> gguf conversion
---
> # HF falcon180B--> gguf conversion
16a17
> from safetensors import safe_open
48c49
< if filename.startswith("pytorch_model-"):
---
> if filename.startswith("model-00"):
90c91
< if hparams["architectures"][0] != "RWForCausalLM":
---
> if hparams["architectures"][0] != "FalconForCausalLM":
103c104
< block_count = hparams["n_layer"]
---
> block_count = hparams["num_hidden_layers"]
111,113c112,114
< gguf_writer.add_head_count(hparams["n_head"])
< if "n_head_kv" in hparams:
< gguf_writer.add_head_count_kv(hparams["n_head_kv"])
---
> gguf_writer.add_head_count(hparams["num_attention_heads"])
> if "num_kv_heads" in hparams:
> gguf_writer.add_head_count_kv(hparams["num_kv_heads"])
181,182c182,183
< n_head = hparams["n_head"]
< n_head_kv = hparams["n_head_kv"] if "n_head_kv" in hparams else 1
---
> n_head = hparams["num_attention_heads"]
> n_head_kv = hparams["num_kv_heads"] if "num_kv_heads" in hparams else 1
193c194
< f"pytorch_model-{n:05}-of-{num_parts:05}.bin" for n in range(1, num_parts + 1)
---
> f"model-{n:05}-of-{num_parts:05}.safetensors" for n in range(1, num_parts + 1)
200c201
< model_part = torch.load(dir_model / part_name, map_location="cpu")
---
> with safe_open(dir_model / part_name, framework="pt", device="cpu") as model_part:
203c204
< data = model_part[name]
---
> data = model_part.get_tensor(name)
(I compared an older version of convert-falcon, because there have been some small changes on master since.) |
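For context, the core change in the hunks above is swapping `torch.load` of `pytorch_model-*.bin` shards for safetensors. A minimal sketch of the resulting loading loop, assuming hypothetical values for `dir_model` and `num_parts` (the real script derives these from the command line and from counting the shard files), and with the tensor-processing body elided:

```python
from pathlib import Path

from safetensors import safe_open

# Hypothetical values for illustration only.
dir_model = Path("falcon-180B")
num_parts = 8

part_names = (
    f"model-{n:05}-of-{num_parts:05}.safetensors" for n in range(1, num_parts + 1)
)

for part_name in part_names:
    # safe_open memory-maps the shard rather than loading it whole,
    # which keeps memory use sane for a model the size of Falcon-180B.
    with safe_open(str(dir_model / part_name), framework="pt", device="cpu") as model_part:
        for name in model_part.keys():
            data = model_part.get_tensor(name)
            # ... tensor renaming / reshaping / gguf_writer.add_tensor(...) happens here
```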
Okay, so what do you suggest I do in order to get Falcon <180B working on master again? |
Do we still need #3049 ?
The script in this PR does not know how to read safetensors files, so I think we still need that PR. |
Why not merge #3049 first, then supersede it with this one? No one loses their contribution record, and Falcon inference keeps working in the meantime. |
Force-pushed from 31c88e3 to 94dd85c
Could someone test this script with Falcon-180B? |
Testing atm - will take some time. |
It works with this repo: https://huggingface.co/tiiuae/falcon-180B/tree/main
That is strange, though, since the repo looks like safetensors data, which we don't expect this script to be able to read. |
I recently added safetensors support to this PR so it can support 180B, which is why I asked. |
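For anyone following along, the diff above only switches the shard-prefix check from the old `pytorch_model-` files to the new `model-00` safetensors files. A rough sketch of a part-counting helper that would accept either layout (the name `count_model_parts` mirrors the helper in the script, but the both-prefix check here is an illustrative assumption, not what the PR merged):

```python
import os
from pathlib import Path


def count_model_parts(dir_model: Path) -> int:
    """Count checkpoint shards, accepting both the new and the old HF layout."""
    num_parts = 0
    for filename in os.listdir(dir_model):
        # New layout: model-00001-of-000NN.safetensors
        # Old layout: pytorch_model-00001-of-000NN.bin
        if filename.startswith("model-00") or filename.startswith("pytorch_model-"):
            num_parts += 1
    return num_parts
```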
I don't have the means to test the 180B, but it looks like you merged it nicely.
…example

* 'master' of github.com:ggerganov/llama.cpp:
  kv cache slot search improvements (ggerganov#3493)
  prompts : fix editorconfig checks after ggerganov#3416
  parallel : add option to load external prompt file (ggerganov#3416)
  server : reuse llama_sample_token common util (ggerganov#3494)
  llama : correct hparams comparison (ggerganov#3446)
  ci : fix xcodebuild destinations (ggerganov#3491)
  convert : update Falcon script for new HF config (ggerganov#3448)
  build : use std::make_tuple() for compatibility with older GCC versions (ggerganov#3488)
  common : process escape sequences in reverse prompts (ggerganov#3461)
  CLBlast: Fix handling of on-device tensor data
  server : fix incorrect num_tokens_predicted (ggerganov#3480)
  swift : disable ACCELERATE_NEW_LAPACK (ggerganov#3481)
  ci : add swift build via xcodebuild (ggerganov#3482)
Also adds Falcon-180B support. Closes ggerganov#3049 Co-authored-by: jb <jonathan.t.barnard@gmail.com>
This change breaks the old architecture (RWForCausalLM). Kind of annoying because there was another change that requires reconverting the original HuggingFace model: |
The official falcon-7b, falcon-40b, and falcon-180B models on HF are compatible with the new version of this script. What is your use case? |
I have this model downloaded from Hugging Face: Is this the official version of Falcon? |
The latest commit is 898df13 "Move to in-library checkpoint (for real this time)". That's the one you want. I suppose I could add back support for the older format until the finetunes are updated. |
Oh, I see; they updated the config.json file, but the actual multi-GB model files did not change. I think this might not be clear to some people, who could end up re-downloading the whole model, which is not a good user experience. Also, some fine-tunes may never be updated. In my opinion, it would be preferable to support both formats, but I will let you decide how to prioritize this. Thank you for looking into this issue. |
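If someone does want to keep the older RWForCausalLM-style configs working, one low-effort option is to read the hyperparameters with fallbacks rather than hard-coding the renamed keys. A rough sketch of that idea, not part of the merged script; the key pairs are the renames visible in the diff above, and the `falcon-7b` directory is a hypothetical local checkout:

```python
import json
from pathlib import Path


def get_hparam(hparams: dict, new_key: str, old_key: str, default=None):
    """Prefer the renamed HF config key, falling back to the pre-rename one."""
    if new_key in hparams:
        return hparams[new_key]
    if old_key in hparams:
        return hparams[old_key]
    return default


# hparams is the parsed config.json of the model being converted.
hparams = json.loads((Path("falcon-7b") / "config.json").read_text())

# Resolves correctly for both RWForCausalLM-era and FalconForCausalLM-era configs.
block_count = get_hparam(hparams, "num_hidden_layers", "n_layer")
n_head = get_hparam(hparams, "num_attention_heads", "n_head")
n_head_kv = get_hparam(hparams, "num_kv_heads", "n_head_kv", default=1)
```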
This aligns the Falcon convert script with the recent changes to the HuggingFace models.