Kindly add DeepSeek family for training #1340
Comments
Hey, as said in #1171, it's llama-based, so you can use the llama configs :)
Very cool, thank you!
It doesn't look like it: https://huggingface.co/deepseek-ai/deepseek-moe-16b-base/blob/main/config.json
If it were a Llama, it would say "architectures": ["LlamaForCausalLM"], right?
Oh, I wasn't aware of that model. I thought they were referencing models such as https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-instruct/tree/main, which is llama-based.
Thanks @ehartford for spotting it and reopening the issue. I hope the Axolotl team will come up with a solution.
@ajinkya123-robo, in the meantime, you can just use AutoModelForCausalLM and AutoTokenizer with an existing config and point it to your model. Unfortunately, in this case, sample packing isn't available yet.
@NanoCode012 I am not sure if it will work out, but I will give it a try over the weekend.
Currently experimenting with training DeepSeek Coder and stumbled on this thread when I ran into an error. Changing the config like this seems to work, thanks!

```diff
- base_model: NousResearch/Llama-2-7b-hf
- model_type: LlamaForCausalLM
- tokenizer_type: LlamaTokenizer
+ base_model: deepseek-ai/deepseek-coder-6.7b-instruct
+ model_type: AutoModelForCausalLM
+ tokenizer_type: AutoTokenizer
```
I did some experiments with deepseek-coder-v2, which works using:

```yaml
base_model: deepseek-ai/DeepSeek-Coder-V2-Lite-Base
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
trust_remote_code: true
special_tokens:
  pad_token: "<|EOT|>"
  bos_token: "<|begin▁of▁sentence|>"
  eos_token: "<|end▁of▁sentence|>"
```

Multipack support was added recently in #1712, although it wasn't working for me.
🔖 Feature description
Support for DeepSeek model finetuning in Axolotl. Thanks!
✔️ Solution
SFT, LoRA, and QLoRA will suffice. It would be great to have their models in the Axolotl training platform.
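As a starting point, a minimal QLoRA config sketch for a llama-based DeepSeek model, assuming the Auto-classes workaround discussed in the comments (the transformers class is AutoModelForCausalLM). The dataset path and the LoRA/training hyperparameters below are illustrative placeholders, not values taken from this thread:

```yaml
# Hypothetical axolotl QLoRA config sketch; hyperparameter values are assumptions
base_model: deepseek-ai/deepseek-coder-6.7b-instruct
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

load_in_4bit: true        # QLoRA: quantize base weights to 4-bit
adapter: qlora
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true  # attach adapters to all linear layers

datasets:
  - path: ./data/my_dataset.jsonl   # placeholder dataset path
    type: alpaca

micro_batch_size: 2
gradient_accumulation_steps: 4
num_epochs: 3
learning_rate: 2e-4
```

For plain LoRA, the same sketch with `adapter: lora` and `load_in_4bit: false` should apply; full SFT would drop the adapter keys entirely.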
❓ Alternatives
No response
📝 Additional Context
No response