[Model] Add support for xverse #6301
Conversation
1. Convert xverse models to gguf; 2. Add LLM_ARCH_XVERSE inference in llama.cpp; 3. Add xverse item in Supported models in README.md;
Can you elaborate: why is this necessary specifically for this model and not others?
convert-hf-to-gguf.py (outdated)
@@ -212,6 +212,7 @@ def from_model_architecture(cls, arch):
         try:
             return cls._model_classes[arch]
         except KeyError:
+            print(f"{cls._model_classes}")
I don't think we need to leave this in. Unless you think the exception message should mention the full list of supported models.
I will remove the redundant logs.
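If the exception message were instead made to list the supported models, as the review comment suggests, it could look like the following minimal sketch. This is a hypothetical illustration, not the actual convert-hf-to-gguf.py code; the `register` decorator and class names are assumptions for the example.

```python
class Model:
    # Hypothetical registry mapping HF architecture names to converter classes.
    _model_classes: dict = {}

    @classmethod
    def register(cls, arch):
        def wrapper(model_cls):
            cls._model_classes[arch] = model_cls
            return model_cls
        return wrapper

    @classmethod
    def from_model_architecture(cls, arch):
        try:
            return cls._model_classes[arch]
        except KeyError:
            # Mention the supported architectures in the error message
            # rather than printing the whole registry dict to stdout.
            supported = ", ".join(sorted(cls._model_classes))
            raise NotImplementedError(
                f"Architecture {arch!r} not supported! Supported: {supported}"
            ) from None
```

This keeps the diagnostic information without leaving a stray `print` in the lookup path.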
* llama: remove the init_mapping_prefetch custom parameter
Nice question! I turned the prefetch switch back on and retested the xverse model loading process locally, and found no problem. The prefetch setting has been restored in the new commit.
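For context, the prefetch switch concerns the memory-mapped model file. A rough Python illustration of the idea follows; llama.cpp itself does this in C++, so the function name and structure here are illustrative only.

```python
import mmap

def map_file(path: str, prefetch: bool = True) -> mmap.mmap:
    """Memory-map a file read-only, optionally hinting the kernel to
    read its pages ahead of first use (what the prefetch switch toggles)."""
    with open(path, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    if prefetch and hasattr(mm, "madvise"):
        # MADV_WILLNEED: the kernel may start reading pages in the
        # background, reducing page-fault stalls during model load.
        mm.madvise(mmap.MADV_WILLNEED)
    return mm
```

Disabling the hint only changes when pages are faulted in, not what is read, which is why loading still works correctly with prefetch on.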
* llama.cpp: Include the changes from ggerganov#6122 to exclude the unused outputs of the last layers.
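The #6122 change referenced in that commit is about not computing outputs that are never read, e.g. projecting only the last token through the output head during generation. A conceptual pure-Python sketch (not the actual graph-building code; names are illustrative):

```python
def lm_head_sparse(hidden, w_out, needed):
    """Project only the requested token rows through the output matrix.

    hidden: n_tokens x n_embd matrix (list of lists)
    w_out:  n_embd x n_vocab matrix (list of lists)
    needed: indices of tokens whose logits are actually used,
            e.g. [n_tokens - 1] during autoregressive generation.
    """
    n_vocab = len(w_out[0])
    logits = []
    for i in needed:
        row = hidden[i]
        # Plain matrix-vector product for each requested row only.
        logits.append([sum(row[k] * w_out[k][j] for k in range(len(row)))
                       for j in range(n_vocab)])
    return logits
```

Skipping the unneeded rows avoids a full n_tokens x n_vocab projection when only one token's logits are consumed.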
Hello @cebtenzzre, @compilade, I've addressed all the feedback in the pull request. The EditorConfig Checker and flake8 Lint checks failed; how can I fix them? Could you please take another look at the code changes and provide your review? Thank you for your time and assistance. Best regards, hxer7963.
You have to fix the formatting issues. Click on "Details" and it will tell you the reason.
- Remove duplicate set kqv_out to llm_build_kv
The macOS runner is occasionally started without access to the GPU, so this is not related to this PR. The Benchmark CI was recently added, so we can expect some instability at the start; nothing to worry about.
Thank you very much for the patient reply, @ggerganov. May I ask how long it usually takes to merge into the main branch?
We can merge after @slaren's approval.
* Support xverse model convert to gguf format.
* 1. Convert xverse models to gguf; 2. Add LLM_ARCH_XVERSE inference in llama.cpp; 3. Add xverse item in Supported models in README.md;
* gguf-py: remove redundant logs
* llama: remove the init_mapping_prefetch custom parameter
* llama.cpp: Include the changes from ggerganov#6122 to exclude the unused outputs of the last layers.
* Fix format issues; remove duplicate set kqv_out to llm_build_kv
* Update llama.cpp

Co-authored-by: willhe <willhe@xverse.cn>
Co-authored-by: willhe <hexin@xverse.cn>
[Model] Add support for xverse
llama.cpp: Add support for Xverse model architecture
gguf-py : Add convert for Xverse model architecture