
about run_unsloth_peft.sh #2152

Closed
opentld opened this issue Oct 16, 2024 · 5 comments
opentld commented Oct 16, 2024

sft.zip
ruozhiba_cvt.zip

I'm trying to fine-tune on the ruozhiba dataset using unsloth. I downloaded the pre-trained model and the dataset to my local machine and modified some of the .py files; all of the modified files are in the zip packages attached above.
But I get a subprocess.CalledProcessError and don't know how to handle it.
The full output follows:

(llama3) D:\SourceCodes\llama\peft-main\examples\sft>bash run_unsloth_peft_fp16.sh
🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
[2024-10-16 18:39:03,409] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)
test.c
LINK : fatal error LNK1181: cannot open input file "aio.lib"
test.c
LINK : fatal error LNK1181: cannot open input file "cufile.lib"
W1016 18:39:09.417000 23884 torch\distributed\elastic\multiprocessing\redirects.py:28] NOTE: Redirects are currently not supported in Windows or MacOs.
==((====))==  Unsloth 2024.10.0: Fast Llama patching. Transformers = 4.44.2.
   \\   /|    GPU: NVIDIA GeForce RTX 2080. Max memory: 8.0 GB. Platform = Windows.
O^O/ \_/ \    Pytorch: 2.4.1. CUDA = 7.5. CUDA Toolkit = 12.4.
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.28.post1. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:07<00:00,  1.85s/it]
D:\SourceCodes\llama\models\llama3-chinese-8b-v3 does not have a padding token! Will use pad_token = <|reserved_special_token_250|>.
Unsloth: Dropout = 0 is supported for fast patching. You are using dropout = 0.1.
Unsloth will patch all other layers, except LoRA matrices, causing a performance hit.
Unsloth 2024.10.0 patched 32 layers with 0 QKV layers, 0 O layers and 0 MLP layers.
Size of the train set: 1496. Size of the validation set: 1
A sample of train dataset: {'text': '<s>[INST]只剩一个心脏了还能活吗?[/INST] 能,人本来就只有一个心脏。</s>'}
PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): LlamaForCausalLM(
      (model): LlamaModel(
        (embed_tokens): Embedding(128256, 4096)
        (layers): ModuleList(
          (0-31): 32 x LlamaDecoderLayer(
            (self_attn): LlamaAttention(
              (q_proj): lora.Linear4bit(
                (base_layer): Linear4bit(in_features=4096, out_features=4096, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.1, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (k_proj): lora.Linear4bit(
                (base_layer): Linear4bit(in_features=4096, out_features=1024, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.1, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=1024, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (v_proj): lora.Linear4bit(
                (base_layer): Linear4bit(in_features=4096, out_features=1024, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.1, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=1024, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (o_proj): lora.Linear4bit(
                (base_layer): Linear4bit(in_features=4096, out_features=4096, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.1, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (rotary_emb): LlamaRotaryEmbedding()
            )
            (mlp): LlamaMLP(
              (gate_proj): lora.Linear4bit(
                (base_layer): Linear4bit(in_features=4096, out_features=14336, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.1, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=14336, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (up_proj): lora.Linear4bit(
                (base_layer): Linear4bit(in_features=4096, out_features=14336, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.1, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=14336, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (down_proj): lora.Linear4bit(
                (base_layer): Linear4bit(in_features=14336, out_features=4096, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.1, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=14336, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (act_fn): SiLU()
            )
            (input_layernorm): LlamaRMSNorm((4096,), eps=1e-05)
            (post_attention_layernorm): LlamaRMSNorm((4096,), eps=1e-05)
          )
        )
        (norm): LlamaRMSNorm((4096,), eps=1e-05)
        (rotary_emb): LlamaRotaryEmbedding()
      )
      (lm_head): Linear(in_features=4096, out_features=128256, bias=False)
    )
  )
)
trainable params: 20,971,520 || all params: 8,051,232,768 || trainable%: 0.2605
start_training....
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 75 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 8 | Gradient Accumulation steps = 8
\        /    Total batch size = 64 | Total steps = 1
 "-____-"     Number of trainable parameters = 20,971,520
  0%|                                                                                                                                                              | 0/1 [00:00<?, ?it/s]ptxas info    : 11 bytes gmem
ptxas info    : Compiling entry function '_rms_layernorm_forward_0d1de2d3de4d5c6d7c8de9' for 'sm_75'
ptxas info    : Function properties for _rms_layernorm_forward_0d1de2d3de4d5c6d7c8de9
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 39 registers, 408 bytes cmem[0]
C:\Users\JAMESM~1\AppData\Local\Temp\ccr6NQ10.o:main.c:(.text+0x1b4): undefined reference to `__imp_PyArg_ParseTuple'
C:\Users\JAMESM~1\AppData\Local\Temp\ccr6NQ10.o:main.c:(.text+0x1cb): undefined reference to `__imp__Py_NoneStruct'
C:\Users\JAMESM~1\AppData\Local\Temp\ccr6NQ10.o:main.c:(.text+0x1d9): undefined reference to `__imp_PyObject_CallObject'
C:\Users\JAMESM~1\AppData\Local\Temp\ccr6NQ10.o:main.c:(.text+0x211): undefined reference to `__imp_PyObject_GetAttrString'
C:\Users\JAMESM~1\AppData\Local\Temp\ccr6NQ10.o:main.c:(.text+0x225): undefined reference to `__imp_PyTuple_New'
C:\Users\JAMESM~1\AppData\Local\Temp\ccr6NQ10.o:main.c:(.text+0x237): undefined reference to `__imp_PyObject_Call'
C:\Users\JAMESM~1\AppData\Local\Temp\ccr6NQ10.o:main.c:(.text+0x276): undefined reference to `__imp_PyLong_AsUnsignedLongLong'
C:\Users\JAMESM~1\AppData\Local\Temp\ccr6NQ10.o:main.c:(.text+0x2b1): undefined reference to `__imp_PyObject_GetAttrString'
C:\Users\JAMESM~1\AppData\Local\Temp\ccr6NQ10.o:main.c:(.text+0x2c5): undefined reference to `__imp_PyTuple_New'
C:\Users\JAMESM~1\AppData\Local\Temp\ccr6NQ10.o:main.c:(.text+0x2d7): undefined reference to `__imp_PyObject_Call'
C:\Users\JAMESM~1\AppData\Local\Temp\ccr6NQ10.o:main.c:(.text+0x319): undefined reference to `__imp_PyLong_AsUnsignedLongLong'
C:\Users\JAMESM~1\AppData\Local\Temp\ccr6NQ10.o:main.c:(.text+0x354): undefined reference to `__imp_PyObject_GetAttrString'
C:\Users\JAMESM~1\AppData\Local\Temp\ccr6NQ10.o:main.c:(.text+0x368): undefined reference to `__imp_PyTuple_New'
C:\Users\JAMESM~1\AppData\Local\Temp\ccr6NQ10.o:main.c:(.text+0x37a): undefined reference to `__imp_PyObject_Call'
C:\Users\JAMESM~1\AppData\Local\Temp\ccr6NQ10.o:main.c:(.text+0x3bb): undefined reference to `__imp_PyLong_AsUnsignedLongLong'
C:\Users\JAMESM~1\AppData\Local\Temp\ccr6NQ10.o:main.c:(.text+0x3f6): undefined reference to `__imp_PyObject_GetAttrString'
C:\Users\JAMESM~1\AppData\Local\Temp\ccr6NQ10.o:main.c:(.text+0x40a): undefined reference to `__imp_PyTuple_New'
C:\Users\JAMESM~1\AppData\Local\Temp\ccr6NQ10.o:main.c:(.text+0x41c): undefined reference to `__imp_PyObject_Call'
C:\Users\JAMESM~1\AppData\Local\Temp\ccr6NQ10.o:main.c:(.text+0x45b): undefined reference to `__imp_PyLong_AsUnsignedLongLong'
C:\Users\JAMESM~1\AppData\Local\Temp\ccr6NQ10.o:main.c:(.text+0x4e7): undefined reference to `__imp_PyEval_SaveThread'
C:\Users\JAMESM~1\AppData\Local\Temp\ccr6NQ10.o:main.c:(.text+0x6d7): undefined reference to `__imp_PyEval_RestoreThread'
C:\Users\JAMESM~1\AppData\Local\Temp\ccr6NQ10.o:main.c:(.text+0x6dd): undefined reference to `__imp_PyErr_Occurred'
C:\Users\JAMESM~1\AppData\Local\Temp\ccr6NQ10.o:main.c:(.text+0x6f8): undefined reference to `__imp_PyObject_CallObject'
C:\Users\JAMESM~1\AppData\Local\Temp\ccr6NQ10.o:main.c:(.text+0x713): undefined reference to `__imp_PyExc_TypeError'
C:\Users\JAMESM~1\AppData\Local\Temp\ccr6NQ10.o:main.c:(.text+0x723): undefined reference to `__imp_PyErr_SetString'
C:\Users\JAMESM~1\AppData\Local\Temp\ccr6NQ10.o:main.c:(.text+0x766): undefined reference to `__imp_PyExc_TypeError'
C:\Users\JAMESM~1\AppData\Local\Temp\ccr6NQ10.o:main.c:(.text+0x776): undefined reference to `__imp_PyErr_SetString'
C:\Users\JAMESM~1\AppData\Local\Temp\ccr6NQ10.o:main.c:(.text+0x792): undefined reference to `__imp_PyLong_AsUnsignedLongLong'
C:\Users\JAMESM~1\AppData\Local\Temp\ccr6NQ10.o:main.c:(.text+0x7b2): undefined reference to `__imp_PyLong_AsUnsignedLongLong'
C:\Users\JAMESM~1\AppData\Local\Temp\ccr6NQ10.o:main.c:(.text+0x7c5): undefined reference to `__imp__Py_Dealloc'
C:\Users\JAMESM~1\AppData\Local\Temp\ccr6NQ10.o:main.c:(.text+0x7d5): undefined reference to `__imp__Py_Dealloc'
C:\Users\JAMESM~1\AppData\Local\Temp\ccr6NQ10.o:main.c:(.text+0x7f2): undefined reference to `__imp_PyLong_AsUnsignedLongLong'
C:\Users\JAMESM~1\AppData\Local\Temp\ccr6NQ10.o:main.c:(.text+0x805): undefined reference to `__imp__Py_Dealloc'
C:\Users\JAMESM~1\AppData\Local\Temp\ccr6NQ10.o:main.c:(.text+0x815): undefined reference to `__imp__Py_Dealloc'
C:\Users\JAMESM~1\AppData\Local\Temp\ccr6NQ10.o:main.c:(.text+0x873): undefined reference to `__imp__Py_Dealloc'
C:\Users\JAMESM~1\AppData\Local\Temp\ccr6NQ10.o:main.c:(.text+0x887): undefined reference to `__imp_PyLong_AsUnsignedLongLong'
C:\Users\JAMESM~1\AppData\Local\Temp\ccr6NQ10.o:main.c:(.text+0x8d4): undefined reference to `__imp__Py_Dealloc'
C:\Users\JAMESM~1\AppData\Local\Temp\ccr6NQ10.o:main.c:(.text+0x8f5): undefined reference to `__imp__Py_Dealloc'
C:\Users\JAMESM~1\AppData\Local\Temp\ccr6NQ10.o:main.c:(.text+0x905): undefined reference to `__imp__Py_Dealloc'
C:\Users\JAMESM~1\AppData\Local\Temp\ccr6NQ10.o:main.c:(.text+0x953): undefined reference to `__imp__Py_Dealloc'
C:\Users\JAMESM~1\AppData\Local\Temp\ccr6NQ10.o:main.c:(.text+0x96a): undefined reference to `__imp__Py_Dealloc'
C:\Users\JAMESM~1\AppData\Local\Temp\ccr6NQ10.o:main.c:(.text+0x978): more undefined references to `__imp__Py_Dealloc' follow
C:\Users\JAMESM~1\AppData\Local\Temp\ccr6NQ10.o:main.c:(.text+0xa4a): undefined reference to `__imp_PyGILState_Ensure'
C:\Users\JAMESM~1\AppData\Local\Temp\ccr6NQ10.o:main.c:(.text+0xa56): undefined reference to `__imp_PyExc_RuntimeError'
C:\Users\JAMESM~1\AppData\Local\Temp\ccr6NQ10.o:main.c:(.text+0xa5f): undefined reference to `__imp_PyErr_SetString'
C:\Users\JAMESM~1\AppData\Local\Temp\ccr6NQ10.o:main.c:(.text+0xa67): undefined reference to `__imp_PyGILState_Release'
C:\Users\JAMESM~1\AppData\Local\Temp\ccr6NQ10.o:main.c:(.text+0xb83): undefined reference to `__imp_PyExc_ValueError'
C:\Users\JAMESM~1\AppData\Local\Temp\ccr6NQ10.o:main.c:(.text+0xb96): undefined reference to `__imp_PyErr_Format'
C:\Users\JAMESM~1\AppData\Local\Temp\ccr6NQ10.o:main.c:(.text+0xbbf): undefined reference to `__imp_PyExc_ValueError'
C:\Users\JAMESM~1\AppData\Local\Temp\ccr6NQ10.o:main.c:(.text+0xbd5): undefined reference to `__imp_PyErr_Format'
C:\Users\JAMESM~1\AppData\Local\Temp\ccr6NQ10.o:main.c:(.text+0xbfe): undefined reference to `__imp_PyExc_ValueError'
C:\Users\JAMESM~1\AppData\Local\Temp\ccr6NQ10.o:main.c:(.text+0xc14): undefined reference to `__imp_PyErr_Format'
C:\Users\JAMESM~1\AppData\Local\Temp\ccr6NQ10.o:main.c:(.text+0xc3b): undefined reference to `__imp_PyExc_ValueError'
C:\Users\JAMESM~1\AppData\Local\Temp\ccr6NQ10.o:main.c:(.text+0xc51): undefined reference to `__imp_PyErr_Format'
C:\Users\JAMESM~1\AppData\Local\Temp\ccr6NQ10.o:main.c:(.text+0xc78): undefined reference to `__imp_PyExc_RuntimeError'
C:\Users\JAMESM~1\AppData\Local\Temp\ccr6NQ10.o:main.c:(.text+0xc8a): undefined reference to `__imp_PyErr_SetString'
C:\Users\JAMESM~1\AppData\Local\Temp\ccr6NQ10.o:main.c:(.text+0xc96): undefined reference to `__imp_PyExc_RuntimeError'
C:\Users\JAMESM~1\AppData\Local\Temp\ccr6NQ10.o:main.c:(.text+0xca8): undefined reference to `__imp_PyErr_SetString'
C:\Users\JAMESM~1\AppData\Local\Temp\ccr6NQ10.o:main.c:(.text+0xcd3): undefined reference to `__imp_PyModule_Create2'
C:\Users\JAMESM~1\AppData\Local\Temp\ccr6NQ10.o:main.c:(.text+0xceb): undefined reference to `__imp_PyModule_AddFunctions'
collect2.exe: error: ld returned 1 exit status
Traceback (most recent call last):
  File "D:\SourceCodes\llama\peft-main\examples\sft\train.py", line 164, in <module>
    main(model_args, data_args, training_args)
  File "D:\SourceCodes\llama\peft-main\examples\sft\train.py", line 148, in main
    trainer.train(resume_from_checkpoint=checkpoint)
  File "<string>", line 142, in train
  File "<string>", line 363, in _fast_inner_training_loop
  File "D:\DevelopTools\anaconda3\envs\llama3\Lib\site-packages\transformers\trainer.py", line 3318, in training_step
    loss = self.compute_loss(model, inputs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\DevelopTools\anaconda3\envs\llama3\Lib\site-packages\transformers\trainer.py", line 3363, in compute_loss
    outputs = model(**inputs)
              ^^^^^^^^^^^^^^^
  File "D:\DevelopTools\anaconda3\envs\llama3\Lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\DevelopTools\anaconda3\envs\llama3\Lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\DevelopTools\anaconda3\envs\llama3\Lib\site-packages\torch\_compile.py", line 31, in inner
    return disable_fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\DevelopTools\anaconda3\envs\llama3\Lib\site-packages\torch\_dynamo\eval_frame.py", line 600, in _fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "D:\DevelopTools\anaconda3\envs\llama3\Lib\site-packages\unsloth\models\llama.py", line 1039, in PeftModelForCausalLM_fast_forward
    return self.base_model(
           ^^^^^^^^^^^^^^^^
  File "D:\DevelopTools\anaconda3\envs\llama3\Lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\DevelopTools\anaconda3\envs\llama3\Lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\DevelopTools\anaconda3\envs\llama3\Lib\site-packages\peft\tuners\tuners_utils.py", line 197, in forward
    return self.model.forward(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\DevelopTools\anaconda3\envs\llama3\Lib\site-packages\accelerate\hooks.py", line 170, in new_forward
    output = module._old_forward(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\DevelopTools\anaconda3\envs\llama3\Lib\site-packages\unsloth\models\llama.py", line 937, in _CausalLM_fast_forward
    outputs = self.model(
              ^^^^^^^^^^^
  File "D:\DevelopTools\anaconda3\envs\llama3\Lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\DevelopTools\anaconda3\envs\llama3\Lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\DevelopTools\anaconda3\envs\llama3\Lib\site-packages\accelerate\hooks.py", line 170, in new_forward
    output = module._old_forward(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\DevelopTools\anaconda3\envs\llama3\Lib\site-packages\unsloth\models\llama.py", line 789, in LlamaModel_fast_forward
    layer_outputs = torch.utils.checkpoint.checkpoint(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\DevelopTools\anaconda3\envs\llama3\Lib\site-packages\torch\_compile.py", line 31, in inner
    return disable_fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\DevelopTools\anaconda3\envs\llama3\Lib\site-packages\torch\_dynamo\eval_frame.py", line 600, in _fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "D:\DevelopTools\anaconda3\envs\llama3\Lib\site-packages\torch\utils\checkpoint.py", line 481, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\DevelopTools\anaconda3\envs\llama3\Lib\site-packages\torch\autograd\function.py", line 574, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\DevelopTools\anaconda3\envs\llama3\Lib\site-packages\torch\utils\checkpoint.py", line 255, in forward
    outputs = run_function(*args)
              ^^^^^^^^^^^^^^^^^^^
  File "D:\DevelopTools\anaconda3\envs\llama3\Lib\site-packages\unsloth\models\llama.py", line 785, in custom_forward
    return module(*inputs, past_key_value, output_attentions, padding_mask = padding_mask)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\DevelopTools\anaconda3\envs\llama3\Lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\DevelopTools\anaconda3\envs\llama3\Lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\DevelopTools\anaconda3\envs\llama3\Lib\site-packages\accelerate\hooks.py", line 170, in new_forward
    output = module._old_forward(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\DevelopTools\anaconda3\envs\llama3\Lib\site-packages\unsloth\models\llama.py", line 485, in LlamaDecoderLayer_fast_forward
    hidden_states = fast_rms_layernorm(self.input_layernorm, hidden_states)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\DevelopTools\anaconda3\envs\llama3\Lib\site-packages\unsloth\kernels\rms_layernorm.py", line 192, in fast_rms_layernorm
    out = Fast_RMS_Layernorm.apply(X, W, eps, gemma)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\DevelopTools\anaconda3\envs\llama3\Lib\site-packages\torch\autograd\function.py", line 574, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\DevelopTools\anaconda3\envs\llama3\Lib\site-packages\unsloth\kernels\rms_layernorm.py", line 144, in forward
    fx[(n_rows,)](
  File "D:\DevelopTools\anaconda3\envs\llama3\Lib\site-packages\triton\runtime\jit.py", line 541, in run
    self.cache[device][key] = compile(
                              ^^^^^^^^
  File "D:\DevelopTools\anaconda3\envs\llama3\Lib\site-packages\triton\compiler\compiler.py", line 202, in compile
    so_path = backend.make_launcher_stub(src, metadata)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\DevelopTools\anaconda3\envs\llama3\Lib\site-packages\triton\compiler\backends\cuda.py", line 224, in make_launcher_stub
    return make_stub(src.name, src.signature, constants, ids, enable_warp_specialization=enable_warp_specialization)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\DevelopTools\anaconda3\envs\llama3\Lib\site-packages\triton\compiler\make_launcher.py", line 37, in make_stub
    so = _build(name, src_path, tmpdir)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\DevelopTools\anaconda3\envs\llama3\Lib\site-packages\triton\common\build.py", line 124, in _build
    ret = subprocess.check_call(cc_cmd)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\DevelopTools\anaconda3\envs\llama3\Lib\subprocess.py", line 413, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['C:\\MinGW\\bin\\gcc.EXE', 'C:\\Users\\JAMESM~1\\AppData\\Local\\Temp\\tmpuu9daoqv\\main.c', '-O3', '-shared', '-ID:\\DevelopTools\\anaconda3\\envs\\llama3\\Lib\\site-packages\\triton\\common\\..\\third_party\\cuda\\include', '-ID:\\DevelopTools\\anaconda3\\envs\\llama3\\Include', '-IC:\\Users\\JAMESM~1\\AppData\\Local\\Temp\\tmpuu9daoqv', '-LC:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.4\\lib\\x64', '-LD:\\DevelopTools\\anaconda3\\envs\\llama3\\libs', '-lcuda', '-o', 'C:\\Users\\JAMESM~1\\AppData\\Local\\Temp\\tmpuu9daoqv\\_rms_layernorm_forward.cp311-win_amd64.pyd']' returned non-zero exit status 1.
  0%|                                                                                                                                                              | 0/1 [00:02<?, ?it/s]
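For context on where this error comes from: the bottom of the traceback shows triton's `_build` helper invoking gcc through `subprocess.check_call`, which raises `CalledProcessError` whenever the spawned command exits non-zero (here, because the linker step fails). A minimal illustration of that mechanism (this is a sketch, not triton's actual code; the `SystemExit(1)` subprocess stands in for the failing gcc command):

```python
import subprocess
import sys

# Stand-in for the gcc command triton spawns: any command that exits
# with a non-zero status makes check_call raise CalledProcessError.
cmd = [sys.executable, "-c", "raise SystemExit(1)"]
try:
    subprocess.check_call(cmd)
except subprocess.CalledProcessError as e:
    print(f"Command failed with exit status {e.returncode}")  # prints status 1
```

So the CalledProcessError is only the messenger; the real failure is the linker errors above it.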
@BenjaminBossan (Member)

I don't have experience with unsloth, so I'm not sure I can help with this error. It looks like you're running this on Windows, are you using WSL? AFAIK, triton kernels can be problematic with Windows. Also please ensure that you're on the latest versions of all important packages, especially unsloth, torch, triton, accelerate, transformers, and of course PEFT.
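As a quick way to check the installed versions of the packages mentioned above, a small standard-library sketch (the names are the usual PyPI distribution names; adjust if your environment differs):

```python
from importlib import metadata

# Report installed versions of the packages relevant to this thread.
for pkg in ["unsloth", "torch", "triton", "accelerate", "transformers", "peft"]:
    try:
        print(f"{pkg}: {metadata.version(pkg)}")
    except metadata.PackageNotFoundError:
        print(f"{pkg}: not installed")
```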


opentld commented Oct 17, 2024

> I don't have experience with unsloth, so I'm not sure I can help with this error. It looks like you're running this on Windows, are you using WSL? AFAIK, triton kernels can be problematic with Windows. Also please ensure that you're on the latest versions of all important packages, especially unsloth, torch, triton, accelerate, transformers, and of course PEFT.

I'm using Windows 10 without WSL. Here are all my packages:

(llama3) D:\SourceCodes\llama\peft-main\examples\sft>pip list
Package                  Version
------------------------ ------------
accelerate 1.0.1
aiohappyeyeballs 2.4.3
aiohttp 3.10.10
aiosignal 1.3.1
annotated-types 0.7.0
attrs 24.2.0
bitsandbytes 0.44.1
certifi 2024.8.30
charset-normalizer 3.4.0
cmake 3.30.4
colorama 0.4.6
datasets 3.0.1
datatrove 0.3.0
deepspeed 0.15.3+unknown
dill 0.3.8
docstring_parser 0.16
einops 0.8.0
filelock 3.13.1
flash_attn 2.6.3
frozenlist 1.4.1
fsspec 2024.2.0
gmpy2 2.1.2
hf_transfer 0.1.8
hjson 3.1.0
huggingface-hub 0.25.2
humanize 4.11.0
idna 3.10
iniconfig 2.0.0
Jinja2 3.1.4
loguru 0.7.2
markdown-it-py 3.0.0
MarkupSafe 2.1.3
mdurl 0.1.2
mpmath 1.3.0
msgpack 1.1.0
multidict 6.1.0
multiprocess 0.70.16
networkx 3.3
ninja 1.11.1.1
numpy 1.26.4
nvidia-ml-py 12.560.30
packaging 24.1
pandas 2.2.3
peft 0.13.2
pip 24.2
pluggy 1.5.0
propcache 0.2.0
protobuf 3.20.3
psutil 6.0.0
py-cpuinfo 9.0.0
pyarrow 17.0.0
pybind11 2.13.6
pydantic 2.9.2
pydantic_core 2.23.4
Pygments 2.18.0
pytest 8.3.3
python-dateutil 2.9.0.post0
pytz 2024.2
PyYAML 6.0.2
regex 2024.9.11
requests 2.32.3
rich 13.9.2
safetensors 0.4.5
scipy 1.14.1
sentencepiece 0.2.0
setuptools 75.1.0
shtab 1.7.1
six 1.16.0
sympy 1.13.2
tokenizers 0.19.1
torch 2.4.1
tqdm 4.66.5
transformers 4.44.2
triton 2.1.0
trl 0.11.1
typing_extensions 4.11.0
tyro 0.8.12
tzdata 2024.2
unsloth 2024.10.0
unsloth_zoo 2024.10.1
urllib3 2.2.3
wheel 0.44.0
win32-setctime 1.1.0
xformers 0.0.28.post1
xxhash 3.5.0
yarl 1.15.3

@JINO-ROHIT (Contributor)

Hi @opentld, if I remember correctly, there is no support for triton on Windows yet; you can read more here: triton-lang/triton#4045.

Maybe that's the cause. Could you try WSL or Ubuntu instead?

@BenjaminBossan (Member)

I agree; AFAIK triton does not work on Windows directly, so you should try using WSL.
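Before retrying the training run, a quick sanity check can confirm whether the process is actually running under WSL (where triton is expected to work) rather than native Windows. This is a rough standard-library sketch; `triton_env_report` is a hypothetical helper, and the "microsoft" substring check is a common heuristic for WSL kernels, not a guarantee:

```python
import platform
import shutil

def triton_env_report() -> dict:
    """Rough environment probe: triton needs Linux plus a working C compiler."""
    release = platform.release().lower()
    return {
        "os": platform.system(),                   # "Linux" inside WSL, "Windows" natively
        "looks_like_wsl": "microsoft" in release,  # WSL kernel strings usually contain "microsoft"
        "c_compiler": shutil.which("cc") or shutil.which("gcc"),
    }

print(triton_env_report())
```

If `os` is "Windows", the triton launcher-stub compilation in the traceback above is expected to fail regardless of package versions.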


This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
