
Kindly add DeepSeek family for training #1340

Closed
ajinkya123-robo opened this issue Feb 27, 2024 · 9 comments · Fixed by #1877
Labels
enhancement New feature or request

Comments

@ajinkya123-robo

⚠️ Please check that this feature request hasn't been suggested before.

  • I searched previous Ideas in Discussions and didn't find any similar feature requests.
  • I searched previous Issues and didn't find any similar feature requests.

🔖 Feature description

Please add support for DeepSeek model finetuning in Axolotl. Thanks!

✔️ Solution

SFT, LoRA, and QLoRA will suffice. It would be great to have these models available in the Axolotl training platform.

❓ Alternatives

No response

📝 Additional Context

No response

Acknowledgements

  • My issue title is concise, descriptive, and in title casing.
  • I have searched the existing issues to make sure this feature has not been requested yet.
  • I have provided enough information for the maintainers to understand and evaluate this request.
@ajinkya123-robo ajinkya123-robo added the enhancement New feature or request label Feb 27, 2024
@NanoCode012
Collaborator

Hey, as mentioned in #1171, it's llama-based, so you can use the llama configs :)

@ajinkya123-robo
Author

Very cool, thank you!

@ehartford
Collaborator

ehartford commented Feb 27, 2024

It doesn't look like it:

https://huggingface.co/deepseek-ai/deepseek-moe-16b-base/blob/main/config.json

{
  "architectures": [
    "DeepseekForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "auto_map": {
    "AutoConfig": "configuration_deepseek.DeepseekConfig",
    "AutoModel": "modeling_deepseek.DeepseekModel",
    "AutoModelForCausalLM": "modeling_deepseek.DeepseekForCausalLM"
  },
  "bos_token_id": 100000,
  "eos_token_id": 100001,
If it were a Llama, it would say "architectures": ["LlamaForCausalLM"], right?
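
For reference, a minimal sketch of how to check this for any checkpoint using transformers' AutoConfig (trust_remote_code is needed here because the repo ships custom modeling code):

from transformers import AutoConfig

# Load only the config (no weights) to see which architecture the repo declares.
config = AutoConfig.from_pretrained(
    "deepseek-ai/deepseek-moe-16b-base",
    trust_remote_code=True,  # this repo ships custom configuration_deepseek.py
)
print(config.architectures)  # ['DeepseekForCausalLM'], not ['LlamaForCausalLM']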

@ehartford ehartford reopened this Feb 27, 2024
@NanoCode012
Collaborator

Oh, I wasn’t aware of that model. I thought they were referencing models such as https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-instruct/tree/main, which is llama-based.

@ajinkya123-robo
Author

Thanks @ehartford for spotting it and reopening the issue. I hope the Axolotl team will come up with a solution.

@NanoCode012
Collaborator

NanoCode012 commented Feb 28, 2024

@ajinkya123-robo, in the meantime, you can just use AutoModelForCausalLM and AutoTokenizer with an existing config and point it at your model (untested). Unfortunately, in this case, sample packing isn't available yet.
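
A minimal sketch of what that might look like in an Axolotl config (the base_model here is just an assumption for illustration; the remaining options would come from whichever example config you start from):

base_model: deepseek-ai/deepseek-moe-16b-base  # assumed target model
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
trust_remote_code: true   # DeepSeek MoE ships custom modeling code
sample_packing: false     # per the note above, packing isn't supported here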

@ajinkya123-robo
Author

@NanoCode012 I am not sure if it will work out, but I will give it a try over the weekend.

@ZaneH

ZaneH commented Mar 31, 2024

Currently experimenting with training DeepSeek Coder and stumbled on this thread when I ran into:

  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/models/llama/tokenization_llama.py", line 209, in get_spm_processor
    tokenizer.Load(self.vocab_file)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/sentencepiece/__init__.py", line 961, in Load
    return self.LoadFromFile(model_file)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/sentencepiece/__init__.py", line 316, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
TypeError: not a string
Traceback (most recent call last):
  File "/root/miniconda3/envs/py3.10/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 47, in main
    args.func(args)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1023, in launch_command
    simple_launcher(args)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/commands/launch.py", line 643, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)

Using the examples/llama-2/qlora.yml as a reference and changing the following:

- base_model: NousResearch/Llama-2-7b-hf
- model_type: LlamaForCausalLM
- tokenizer_type: LlamaTokenizer
+ base_model: deepseek-ai/deepseek-coder-6.7b-instruct
+ model_type: AutoModelForCausalLM
+ tokenizer_type: AutoTokenizer

seems to work, thanks!

@tmm1
Collaborator

tmm1 commented Aug 21, 2024

I did some experiments with deepseek-coder-v2, which works using:

base_model: deepseek-ai/DeepSeek-Coder-V2-Lite-Base
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
trust_remote_code: true

special_tokens:
  pad_token: "<|EOT|>"
  bos_token: "<|begin▁of▁sentence|>"
  eos_token: "<|end▁of▁sentence|>"

Multipack support was added recently in #1712, although it wasn't working for me:

File "/root/micromamba/envs/dev/lib/python3.11/site-packages/flash_attn/flash_attn_interface.py", line 90, in _flash_attn_varlen_forward
    out, q, k, v, out_padded, softmax_lse, S_dmask, rng_state = flash_attn_cuda.varlen_fwd(
                                                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: an illegal memory access was encountered
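
If anyone else hits this, one possible workaround (an assumption on my part, not verified here) is to disable packing so training avoids the flash-attn varlen path entirely:

sample_packing: false  # hypothetical workaround: skip the crashing multipack kernel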
