[Model] Add support for Aya-23 8B Model by Cohere #2603
Conversation
@@ -129,7 +129,7 @@ def gen_config(  # pylint: disable=too-many-locals,too-many-arguments,too-many-b
         prefill_chunk_size=model_config.prefill_chunk_size,
         attention_sink_size=getattr(model_config, "attention_sink_size", -1),
         tensor_parallel_shards=model_config.tensor_parallel_shards,
-        conv_template=conversation,
+        conv_template=conversation,  # type: ignore
Just curious why we disable mypy for this line?
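For context, a minimal, self-contained illustration (hypothetical types, not mlc_llm's actual signatures) of why a `# type: ignore` can be needed here: the parameter is annotated as str, but the caller passes a Conversation-like object, so mypy reports an argument-type mismatch even though the runtime code handles both.

```python
# Hypothetical sketch: gen_config and Conversation here are stand-ins,
# not mlc_llm's real classes.
from dataclasses import dataclass


@dataclass
class Conversation:
    name: str


def gen_config(conv_template: str) -> dict:
    # At runtime this accepts either a template name or a Conversation,
    # but the annotation only admits str, so mypy flags non-str callers.
    if isinstance(conv_template, Conversation):
        name = conv_template.name
    else:
        name = conv_template
    return {"conv_template": name}


conversation = Conversation(name="aya-23")
config = gen_config(conversation)  # type: ignore  # str-vs-Conversation mismatch
```

The cleaner long-term fix is widening the annotation (e.g. `Union[str, Conversation]`) instead of suppressing the check.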
return mapping

# def awq(model_config: CohereConfig, quantization: Quantization) -> ExternMapping:
Please remove if we don't support.
Just checked that AWQ weights for Aya-23 are available: https://huggingface.co/alijawad07/aya-23-8B-AWQ-GEMM
In that case we can support it; I will uncomment the AWQ part in cohere_loader.py.
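As a rough idea of what that AWQ mapping has to do (a hypothetical sketch, not mlc_llm's actual ExternMapping API): each quantized weight in the MLC model is backed by the three packed AWQ tensors (qweight, qzeros, scales) stored under the corresponding name in the HuggingFace checkpoint.

```python
# Hypothetical helper: derive the AWQ checkpoint tensor names that back one
# MLC weight parameter. Name scheme is an assumption for illustration only.
def awq_param_names(mlc_name: str) -> list[str]:
    # Drop the trailing ".weight" and point at the three packed AWQ tensors.
    base = mlc_name.rsplit(".", 1)[0]  # e.g. "model.layers.0.self_attn.q_proj"
    return [f"{base}.{suffix}" for suffix in ("qweight", "qzeros", "scales")]
```

The real loader additionally registers a dequantize/concat transform for each mapping, mirroring what the non-quantized huggingface() loader does for plain weights.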
Does this PR support CohereForAI/c4ai-command-r-plus? They're both …
Will look into the tokenizer issue during …
@GunjanDhanuka The tokenizer issue is solved in this PR: #2649. Please tell me if there are any other related problems!
@GunjanDhanuka please rebase and cross-check if things can run.
Force-pushed from f50d58d to 0a2fa68
Yes, the tokenizer issue is now resolved, but there was a delay because of a discrepancy in outputs from … Edit: The prompt seems to be aligned in both cases now; it was a misunderstanding of the blank template that mlc_llm chat passes in the first instance.
Thank you @GunjanDhanuka!
Hi, I tried to convert a new version of aya-expanse-8b using this command:
Is this bug specific to the new version of Aya? Model: … WebLLM demo: …
@DenisSergeevitch The good news is that we were able to load the model using your online demo page, even though we only used a Windows PC with 16 GB of RAM and an 8 GB integrated (onboard) GPU, the Intel UHD Graphics 630, closing every other open window to allow the model to load. That is great news, because we want to get it to work with minimal resources. The bad news is that the output was almost gibberish: the model did respond in several languages, but the responses didn't make much sense, with a lot of newlines and many repeats of the same few words. We aim to let people use this model from anywhere on earth, even with low resources. Perhaps you could try converting the older model, aya-23 (link), and help humanity?
@MasterJH5574 Hello, can you please help guide how to debug the model re-compile in this case?
This PR adds support for the Aya-23 8B model, whose weights and config can be found here: https://huggingface.co/CohereForAI/aya-23-8B/tree/main
Also fixed a typo where LlamaForCausalLM was written as LlamaForCasualLM.
There was an issue with CUDA graph while compiling the model, so pass --opt "flashinfer=1;cublas_gemm=1;cudagraph=0" to the mlc_llm compile command, as suggested by @MasterJH5574.

Solves issue: mlc-ai/web-llm#483
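For reference, the workaround could look something like the following (a sketch: the config path and output name are placeholders, not from this PR; only the --opt string is taken from the discussion above):

```shell
# Hypothetical invocation: compile Aya-23 8B with CUDA graph disabled
# to work around the compilation issue noted above. Paths are placeholders.
mlc_llm compile ./dist/aya-23-8B-q4f16_1-MLC/mlc-chat-config.json \
  --device cuda \
  --opt "flashinfer=1;cublas_gemm=1;cudagraph=0" \
  -o ./dist/libs/aya-23-8B-q4f16_1-cuda.so
```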