
Kindly add DeepSeek family for training #1340

Closed
ajinkya123-robo opened this issue Feb 27, 2024 · 9 comments · Fixed by #1877
Labels
enhancement New feature or request

Comments

@ajinkya123-robo

⚠️ Please check that this feature request hasn't been suggested before.

  • I searched previous Ideas in Discussions and didn't find any similar feature requests.
  • I searched previous Issues and didn't find any similar feature requests.

🔖 Feature description

Please add support for DeepSeek model finetuning in Axolotl. Thanks!

✔️ Solution

SFT, LoRA, and QLoRA will suffice. It would be great to have these models available in the Axolotl training platform.

❓ Alternatives

No response

📝 Additional Context

No response

Acknowledgements

  • My issue title is concise, descriptive, and in title casing.
  • I have searched the existing issues to make sure this feature has not been requested yet.
  • I have provided enough information for the maintainers to understand and evaluate this request.
@ajinkya123-robo ajinkya123-robo added the enhancement New feature or request label Feb 27, 2024
@NanoCode012
Collaborator

Hey, as mentioned in #1171, it's llama-based, so you can use the llama configs :)

@ajinkya123-robo
Author

Very cool, thank you!

@ehartford
Collaborator

ehartford commented Feb 27, 2024

It doesn't look like it:

https://huggingface.co/deepseek-ai/deepseek-moe-16b-base/blob/main/config.json

{
  "architectures": [
    "DeepseekForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "auto_map": {
    "AutoConfig": "configuration_deepseek.DeepseekConfig",
    "AutoModel": "modeling_deepseek.DeepseekModel",
    "AutoModelForCausalLM": "modeling_deepseek.DeepseekForCausalLM"
  },
  "bos_token_id": 100000,
  "eos_token_id": 100001,
If it were a Llama, it would say "architectures": ["LlamaForCausalLM"], right?
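
For reference, a minimal sketch of how to check this for any checkpoint using transformers' AutoConfig (trust_remote_code is needed here because the repo ships custom modeling code):

from transformers import AutoConfig

# Load only the config (no weights) to see which architecture the repo declares.
config = AutoConfig.from_pretrained(
    "deepseek-ai/deepseek-moe-16b-base",
    trust_remote_code=True,  # this repo ships custom configuration_deepseek.py
)
print(config.architectures)  # ['DeepseekForCausalLM'], not ['LlamaForCausalLM']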

@ehartford ehartford reopened this Feb 27, 2024
@NanoCode012
Collaborator

Oh, I wasn’t aware of that model. I thought they were referencing models such as https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-instruct/tree/main, which is llama-based.

@ajinkya123-robo
Author

Thanks @ehartford for spotting it and reopening the issue. I hope the Axolotl team will come up with a solution.

@NanoCode012
Collaborator

NanoCode012 commented Feb 28, 2024

@ajinkya123-robo, in the meantime, you can just use AutoModelForCausalLM and AutoTokenizer with an existing config and point it at your model (untested). Unfortunately, in this case, sample packing isn't available yet.
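
A minimal sketch of what that might look like in an Axolotl config (the base_model here is just an assumption for illustration; the remaining options would come from whichever example config you start from):

base_model: deepseek-ai/deepseek-moe-16b-base  # assumed target model
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
trust_remote_code: true   # DeepSeek MoE ships custom modeling code
sample_packing: false     # per the note above, packing isn't supported here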

@ajinkya123-robo
Author

@NanoCode012 I am not sure if it will work out, but I will give it a try over the weekend.

@ZaneH

ZaneH commented Mar 31, 2024

Currently experimenting with training DeepSeek Coder and stumbled on this thread when I ran into:

  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/models/llama/tokenization_llama.py", line 209, in get_spm_processor
    tokenizer.Load(self.vocab_file)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/sentencepiece/__init__.py", line 961, in Load
    return self.LoadFromFile(model_file)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/sentencepiece/__init__.py", line 316, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
TypeError: not a string
Traceback (most recent call last):
  File "/root/miniconda3/envs/py3.10/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 47, in main
    args.func(args)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1023, in launch_command
    simple_launcher(args)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/commands/launch.py", line 643, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)

Using the examples/llama-2/qlora.yml as a reference and changing the following:

- base_model: NousResearch/Llama-2-7b-hf
- model_type: LlamaForCausalLM
- tokenizer_type: LlamaTokenizer
+ base_model: deepseek-ai/deepseek-coder-6.7b-instruct
+ model_type: AutoModelForCausalLM
+ tokenizer_type: AutoTokenizer

seems to work, thanks!

@tmm1
Collaborator

tmm1 commented Aug 21, 2024

I did some experiments with deepseek-coder-v2, which works using:

base_model: deepseek-ai/DeepSeek-Coder-V2-Lite-Base
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
trust_remote_code: true

special_tokens:
  pad_token: "<|EOT|>"
  bos_token: "<|begin▁of▁sentence|>"
  eos_token: "<|end▁of▁sentence|>"

Multipack support was added recently in #1712, although it wasn't working for me:

File "/root/micromamba/envs/dev/lib/python3.11/site-packages/flash_attn/flash_attn_interface.py", line 90, in _flash_attn_varlen_forward
    out, q, k, v, out_padded, softmax_lse, S_dmask, rng_state = flash_attn_cuda.varlen_fwd(
                                                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: an illegal memory access was encountered
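
If anyone else hits this, one possible workaround (an assumption on my part, not verified here) is to disable packing so training avoids the flash-attn varlen path entirely:

sample_packing: false  # hypothetical workaround: skip the crashing multipack kernel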
