
[Model] Add support for xverse #6301

Merged
12 commits merged into ggerganov:master on Mar 29, 2024

Conversation

hxer7963
Contributor

[Model] Add support for xverse

llama.cpp: Add support for Xverse model architecture

  • init_mappings: Turn off prefetching when loading the xverse model, to avoid getting stuck when calling mmap on Linux.

gguf-py: Add conversion for the Xverse model architecture

@slaren
Collaborator

slaren commented Mar 25, 2024

  • init_mappings: Turn off prefetching when loading the xverse model, to avoid getting stuck when calling mmap on Linux.

Can you elaborate, why is this necessary specifically for this model and not others?

@@ -212,6 +212,7 @@ def from_model_architecture(cls, arch):
         try:
             return cls._model_classes[arch]
         except KeyError:
+            print(f"{cls._model_classes}")
Collaborator

I don't think we need to leave this in. Unless you think the exception message should mention the full list of supported models.

Contributor Author

I don't think we need to leave this in. Unless you think the exception message should mention the full list of supported models.

I will remove the redundant logs.
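
For context, a minimal sketch of the alternative the reviewer hints at: instead of printing the whole class mapping, the lookup could fold the registered architectures into the exception message. This is illustrative only and not what the PR ended up doing; the class and attribute names here are placeholders that merely mirror the snippet above.

```python
class Model:
    # Illustrative sketch only (not the PR's actual code): a registry keyed by
    # architecture name, with the supported list folded into the error message.
    _model_classes: dict[str, type] = {}

    @classmethod
    def from_model_architecture(cls, arch: str) -> type:
        try:
            return cls._model_classes[arch]
        except KeyError:
            supported = ", ".join(sorted(cls._model_classes)) or "<none registered>"
            raise NotImplementedError(
                f"Architecture {arch!r} is not supported. Known architectures: {supported}"
            ) from None
```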

root and others added 2 commits March 26, 2024 08:32
* llama: remove the init_mapping_prefetch custom parameter
@hxer7963
Contributor Author

  • init_mappings: Turn off prefetching when loading the xverse model, to avoid getting stuck when calling mmap on Linux.

Can you elaborate, why is this necessary specifically for this model and not others?

Nice question!

I just turned the prefetch switch back on and retested the xverse model loading process locally, and found no problems.
During earlier testing the model was very large (a 65B model) and loading was slow, so I mistakenly thought the program was stuck.

The prefetch settings have been restored in the new submission.
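
For readers unfamiliar with the prefetch flag: on Linux, an mmap-based loader can hint the kernel to page the whole weight file in ahead of first access, and for a 65B model that prefetch can keep the disk busy for minutes and look like a hang. A minimal Python sketch of the same idea (the file path is a placeholder; llama.cpp does the equivalent in C++ inside its mmap wrapper):

```python
import mmap

# Sketch of the "prefetch" hint on Linux: madvise(WILLNEED) asks the kernel
# to start reading the mapped file into the page cache ahead of use. For a
# very large model this can take a long time and may look like a hang.
with open("model.gguf", "rb") as f:              # placeholder path
    mm = mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)
    mm.madvise(mmap.MADV_WILLNEED)               # the prefetch hint
    magic = mm[:4]                               # later reads hit warm pages
    mm.close()
```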

@hxer7963
Contributor Author

Hello, @cebtenzzre @compilade

I've addressed all the feedback in the pull request.

The EditorConfig Checker and flake8 lint checks failed; how can I fix them?

Could you please take another look at the code changes and provide your review?

Thank you for your time and assistance.

Best regards,

hxer7963.

@slaren
Collaborator

slaren commented Mar 27, 2024

The EditorConfig Checker and flake8 lint checks failed; how can I fix them?

You have to fix the formatting issues. Click on "Details" and it will tell you the reason.

 convert-hf-to-gguf.py:
	885: Trailing whitespace
	921: Trailing whitespace
./convert-hf-to-gguf.py:781:1: E302 expected 2 blank lines, found 1
./convert-hf-to-gguf.py:885:1: W293 blank line contains whitespace
./convert-hf-to-gguf.py:921:1: W293 blank line contains whitespace
./convert-hf-to-gguf.py:922:1: E302 expected 2 blank lines, found 1
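
For reference, both failures are purely formatting issues: W293 means a blank line contains spaces or tabs, and E302 means a top-level definition is preceded by only one blank line instead of the two that PEP 8 requires. Running `flake8 convert-hf-to-gguf.py` locally reproduces the CI output. A minimal illustration with placeholder class names (not the real converter classes):

```python
# Placeholder classes, only to illustrate the flake8 rules flagged by CI.
# W293: blank lines must contain no trailing spaces or tabs.
# E302: two blank lines are required before each top-level definition.


class Model:
    """Stand-in base class."""


class XverseModel(Model):
    """The two blank lines above this definition satisfy E302."""
```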

hxer7963 requested a review from slaren on March 28, 2024 02:46
@ggerganov
Owner

The macOS runner is occasionally started without access to GPU, so this is not related to this PR. The Benchmark CI was recently added so we can expect some instability at the start - nothing to worry about

@hxer7963
Contributor Author

The macOS runner is occasionally started without access to GPU, so this is not related to this PR. The Benchmark CI was recently added so we can expect some instability at the start - nothing to worry about

Thank you very much, @ggerganov, for the patient reply.

May I ask how long it usually takes to merge to the main branch?

@ggerganov
Owner

We can merge after @slaren's approval

slaren merged commit 0695747 into ggerganov:master on Mar 29, 2024
51 of 57 checks passed
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024
* Support xverse model convert to gguf format.

* 1. Convert xverse models to gguf;
2. Add LLM_ARCH_XVERSE inference in llama.cpp;
3. Add xverse item in Supported models in README.md;

* * gguf-py: remove redundant logs
* llama: remove the init_mapping_prefetch custom parameter

* llama.cpp: Include the changes from ggerganov#6122 to exclude the unused outputs of the last layers.

* - Fix format issues
- Remove duplicate set kqv_out to llm_build_kv

* Update llama.cpp

---------

Co-authored-by: willhe <willhe@xverse.cn>
Co-authored-by: willhe <hexin@xverse.cn>
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 3, 2024
tybalex pushed a commit to rubra-ai/tools.cpp that referenced this pull request Apr 17, 2024