Microsoft Windows [Version 10.0.22631.3155]
(c) Microsoft Corporation. All rights reserved.

C:\Users\Training\Documents\Finetune>python dpo.py --model_name_or_path microsoft/phi-2 --per_device_train_batch_size 1 --max_steps 1000 --learning_rate 2e-5 --gradient_accumulation_steps 3 --logging_steps 10 --eval_steps 0 --output_dir C:\Users\Training\Documents\Finetune\DPOphi2 --warmup_steps 150 --report_to wandb --logging_first_step --no_remove_unused_columns --optim rmsprop_bnb_8bit --use_peft --lora_r 32 --lora_alpha 16 --lr_scheduler_type cosine --trust_remote_code --save_steps 333 --load_in_4bit --bnb_4bit_quant_type nf4 --use_bnb_nested_quant true --lora_target_modules q_proj, v_proj, k_proj, dense, lm_head, fc1, fc2 --dataloader_persistent_workers true --dataloader_pin_memory true --dataloader_num_workers 6 --load_in_4bit --bnb_4bit_quant_type nf4 --use_bnb_nested_quant true --max_length 1028 --max_prompt_length 512 --attn_implementation sdpa
WARNING:tensorflow:From C:\Users\Training\AppData\Local\Programs\Python\Python311\Lib\site-packages\keras\src\losses.py:2976: The name tf.losses.sparse_softmax_cross_entropy is deprecated. Please use tf.compat.v1.losses.sparse_softmax_cross_entropy instead.

[2024-02-20 21:45:40,157] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-02-20 21:45:40,505] torch.distributed.elastic.multiprocessing.redirects: [WARNING] NOTE: Redirects are currently not supported in Windows or MacOs.
Traceback (most recent call last):
  File "C:\Users\Training\Documents\Finetune\dpo.py", line 140, in <module>
    model = AutoModelForCausalLM.from_pretrained(model_config.model_name_or_path, **model_kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Training\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\models\auto\auto_factory.py", line 556, in from_pretrained
    return model_class.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Training\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\modeling_utils.py", line 3369, in from_pretrained
    config = cls._autoset_attn_implementation(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Training\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\modeling_utils.py", line 1369, in _autoset_attn_implementation
    config = cls._check_and_enable_sdpa(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Training\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\modeling_utils.py", line 1525, in _check_and_enable_sdpa
    raise ValueError(
ValueError: PhiForCausalLM does not support an attention implementation through torch.nn.functional.scaled_dot_product_attention yet. Please request the support for this architecture: https://github.com/huggingface/transformers/issues/28005. If you believe this error is a bug, please open an issue in Transformers GitHub repository and load your model with the argument `attn_implementation="eager"` meanwhile.
Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="eager")`
Exception ignored in atexit callback:
Traceback (most recent call last):
  File "C:\Users\Training\AppData\Local\Programs\Python\Python311\Lib\site-packages\deepspeed\ops\transformer\inference\triton\matmul_ext.py", line 444, in matmul_ext_update_autotune_table
    fp16_matmul._update_autotune_table()
  File "C:\Users\Training\AppData\Local\Programs\Python\Python311\Lib\site-packages\deepspeed\ops\transformer\inference\triton\matmul_ext.py", line 421, in _update_autotune_table
    TritonMatmul._update_autotune_table(__class__.__name__ + "_2d_kernel", __class__._2d_kernel)
  File "C:\Users\Training\AppData\Local\Programs\Python\Python311\Lib\site-packages\deepspeed\ops\transformer\inference\triton\matmul_ext.py", line 150, in _update_autotune_table
    cache_manager.put(autotune_table)
  File "C:\Users\Training\AppData\Local\Programs\Python\Python311\Lib\site-packages\deepspeed\ops\transformer\inference\triton\matmul_ext.py", line 69, in put
    os.rename(self.file_path + ".tmp", self.file_path)
FileExistsError: [WinError 183] Cannot create a file when that file already exists: 'C:\\Users\\Training\\.triton\\autotune\\Fp16Matmul_2d_kernel.pickle.tmp' -> 'C:\\Users\\Training\\.triton\\autotune\\Fp16Matmul_2d_kernel.pickle'
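The ValueError comes from this transformers build not having SDPA wired up for PhiForCausalLM, and its own message points at the workaround: load the model with the "eager" attention implementation. A minimal sketch of that workaround for this setup, assuming the same trust_remote_code and half-precision settings as the command line (variable names are illustrative, not taken from dpo.py):

import torch
from transformers import AutoModelForCausalLM

# Fall back to the manual attention implementation until this transformers
# build supports SDPA for the Phi architecture.
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2",
    attn_implementation="eager",   # instead of "sdpa"
    trust_remote_code=True,
    torch_dtype=torch.float16,     # assumption: fp16, as in a 4-bit QLoRA setup
)

The other route is upgrading transformers in case a newer build has since added SDPA for Phi, which is what the next step tries.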
C:\Users\Training\Documents\Finetune>pip install -U git+https://github.com/huggingface/transformers
Collecting git+https://github.com/huggingface/transformers
  Cloning https://github.com/huggingface/transformers to c:\users\training\appdata\local\temp\pip-req-build-bhkbaxy1
  Running command git clone --filter=blob:none --quiet https://github.com/huggingface/transformers 'C:\Users\Training\AppData\Local\Temp\pip-req-build-bhkbaxy1'
  Resolved https://github.com/huggingface/transformers to commit e770f0316d2a9b787c9d1440f204fcb65e176682
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: filelock in c:\users\training\appdata\local\programs\python\python311\lib\site-packages (from transformers==4.38.0.dev0) (3.9.0)
Requirement already satisfied: huggingface-hub<1.0,>=0.19.3 in c:\users\training\appdata\local\programs\python\python311\lib\site-packages (from transformers==4.38.0.dev0) (0.20.3)
Requirement already satisfied: numpy>=1.17 in c:\users\training\appdata\local\programs\python\python311\lib\site-packages (from transformers==4.38.0.dev0) (1.26.4)
Requirement already satisfied: packaging>=20.0 in c:\users\training\appdata\local\programs\python\python311\lib\site-packages (from transformers==4.38.0.dev0) (23.2)
Requirement already satisfied: pyyaml>=5.1 in c:\users\training\appdata\local\programs\python\python311\lib\site-packages (from transformers==4.38.0.dev0) (6.0.1)
Requirement already satisfied: regex!=2019.12.17 in c:\users\training\appdata\local\programs\python\python311\lib\site-packages (from transformers==4.38.0.dev0) (2023.12.25)
Requirement already satisfied: requests in c:\users\training\appdata\local\programs\python\python311\lib\site-packages (from transformers==4.38.0.dev0) (2.31.0)
Requirement already satisfied: tokenizers<0.19,>=0.14 in c:\users\training\appdata\local\programs\python\python311\lib\site-packages (from transformers==4.38.0.dev0) (0.15.2)
Requirement already satisfied: safetensors>=0.4.1 in c:\users\training\appdata\local\programs\python\python311\lib\site-packages (from transformers==4.38.0.dev0) (0.4.1)
Requirement already satisfied: tqdm>=4.27 in c:\users\training\appdata\local\programs\python\python311\lib\site-packages (from transformers==4.38.0.dev0) (4.66.1)
Requirement already satisfied: fsspec>=2023.5.0 in c:\users\training\appdata\local\programs\python\python311\lib\site-packages (from huggingface-hub<1.0,>=0.19.3->transformers==4.38.0.dev0) (2023.10.0)
Requirement already satisfied: typing-extensions>=3.7.4.3 in c:\users\training\appdata\local\programs\python\python311\lib\site-packages (from huggingface-hub<1.0,>=0.19.3->transformers==4.38.0.dev0) (4.9.0)
Requirement already satisfied: colorama in c:\users\training\appdata\local\programs\python\python311\lib\site-packages (from tqdm>=4.27->transformers==4.38.0.dev0) (0.4.6)
Requirement already satisfied: charset-normalizer<4,>=2 in c:\users\training\appdata\local\programs\python\python311\lib\site-packages (from requests->transformers==4.38.0.dev0) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in c:\users\training\appdata\local\programs\python\python311\lib\site-packages (from requests->transformers==4.38.0.dev0) (3.6)
Requirement already satisfied: urllib3<3,>=1.21.1 in c:\users\training\appdata\local\programs\python\python311\lib\site-packages (from requests->transformers==4.38.0.dev0) (2.2.0)
Requirement already satisfied: certifi>=2017.4.17 in c:\users\training\appdata\local\programs\python\python311\lib\site-packages (from requests->transformers==4.38.0.dev0) (2024.2.2)

[notice] A new release of pip is available: 23.2.1 -> 24.0
[notice] To update, run: python.exe -m pip install --upgrade pip
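The install resolves to transformers==4.38.0.dev0, yet the rerun below raises the same SDPA error from the same modeling_utils.py line numbers, which suggests the interpreter is still importing the previously installed copy, or that this dev commit still lacks Phi SDPA support. A quick, purely illustrative check of what actually gets imported (and, assuming the private `_supports_sdpa` class attribute present in this generation of transformers, whether the installed Phi class advertises SDPA):

python -c "import transformers; print(transformers.__version__, transformers.__file__)"
python -c "from transformers import PhiForCausalLM; print(PhiForCausalLM._supports_sdpa)"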
C:\Users\Training\Documents\Finetune>python dpo.py --model_name_or_path microsoft/phi-2 --per_device_train_batch_size 1 --max_steps 1000 --learning_rate 2e-5 --gradient_accumulation_steps 3 --logging_steps 10 --eval_steps 0 --output_dir C:\Users\Training\Documents\Finetune\DPOphi2 --warmup_steps 150 --report_to wandb --logging_first_step --no_remove_unused_columns --optim rmsprop_bnb_8bit --use_peft --lora_r 32 --lora_alpha 16 --lr_scheduler_type cosine --trust_remote_code --save_steps 333 --load_in_4bit --bnb_4bit_quant_type nf4 --use_bnb_nested_quant true --lora_target_modules q_proj, v_proj, k_proj, dense, lm_head, fc1, fc2 --dataloader_persistent_workers true --dataloader_pin_memory true --dataloader_num_workers 6 --load_in_4bit --bnb_4bit_quant_type nf4 --use_bnb_nested_quant true --max_length 1028 --max_prompt_length 512 --attn_implementation sdpa
WARNING:tensorflow:From C:\Users\Training\AppData\Local\Programs\Python\Python311\Lib\site-packages\keras\src\losses.py:2976: The name tf.losses.sparse_softmax_cross_entropy is deprecated. Please use tf.compat.v1.losses.sparse_softmax_cross_entropy instead.

[2024-02-20 21:46:58,513] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-02-20 21:46:58,868] torch.distributed.elastic.multiprocessing.redirects: [WARNING] NOTE: Redirects are currently not supported in Windows or MacOs.
Traceback (most recent call last):
  File "C:\Users\Training\Documents\Finetune\dpo.py", line 140, in <module>
    model = AutoModelForCausalLM.from_pretrained(model_config.model_name_or_path, **model_kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Training\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\models\auto\auto_factory.py", line 556, in from_pretrained
    return model_class.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Training\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\modeling_utils.py", line 3369, in from_pretrained
    config = cls._autoset_attn_implementation(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Training\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\modeling_utils.py", line 1369, in _autoset_attn_implementation
    config = cls._check_and_enable_sdpa(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Training\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\modeling_utils.py", line 1525, in _check_and_enable_sdpa
    raise ValueError(
ValueError: PhiForCausalLM does not support an attention implementation through torch.nn.functional.scaled_dot_product_attention yet. Please request the support for this architecture: https://github.com/huggingface/transformers/issues/28005. If you believe this error is a bug, please open an issue in Transformers GitHub repository and load your model with the argument `attn_implementation="eager"` meanwhile.
Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="eager")`
Exception ignored in atexit callback:
Traceback (most recent call last):
  File "C:\Users\Training\AppData\Local\Programs\Python\Python311\Lib\site-packages\deepspeed\ops\transformer\inference\triton\matmul_ext.py", line 444, in matmul_ext_update_autotune_table
    fp16_matmul._update_autotune_table()
  File "C:\Users\Training\AppData\Local\Programs\Python\Python311\Lib\site-packages\deepspeed\ops\transformer\inference\triton\matmul_ext.py", line 421, in _update_autotune_table
    TritonMatmul._update_autotune_table(__class__.__name__ + "_2d_kernel", __class__._2d_kernel)
  File "C:\Users\Training\AppData\Local\Programs\Python\Python311\Lib\site-packages\deepspeed\ops\transformer\inference\triton\matmul_ext.py", line 150, in _update_autotune_table
    cache_manager.put(autotune_table)
  File "C:\Users\Training\AppData\Local\Programs\Python\Python311\Lib\site-packages\deepspeed\ops\transformer\inference\triton\matmul_ext.py", line 69, in put
    os.rename(self.file_path + ".tmp", self.file_path)
FileExistsError: [WinError 183] Cannot create a file when that file already exists: 'C:\\Users\\Training\\.triton\\autotune\\Fp16Matmul_2d_kernel.pickle.tmp' -> 'C:\\Users\\Training\\.triton\\autotune\\Fp16Matmul_2d_kernel.pickle'
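A note on the recurring triton FileExistsError: it is raised inside an atexit callback ("Exception ignored"), so it is noise from DeepSpeed's autotune cache writer at interpreter shutdown rather than part of the training failure. The traceback shows matmul_ext.py calling os.rename, and on Windows os.rename refuses to overwrite an existing target (WinError 183), whereas os.replace overwrites on every platform. A small, self-contained sketch of that difference, using throwaway temp files rather than the real .triton cache:

import os
import tempfile
from pathlib import Path

# Reproduce the rename-vs-replace difference behind WinError 183.
workdir = Path(tempfile.mkdtemp())
tmp_file = workdir / "autotune.pickle.tmp"
final_file = workdir / "autotune.pickle"
tmp_file.write_text("new table")
final_file.write_text("old table")

try:
    os.rename(tmp_file, final_file)        # raises FileExistsError on Windows
    print("os.rename overwrote the target (POSIX behaviour)")
except FileExistsError as err:
    print("os.rename refused to overwrite (Windows behaviour):", err)
    os.replace(tmp_file, final_file)       # overwrites on every platform
print(final_file.read_text())              # -> "new table"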
C:\Users\Training\Documents\Finetune>python dpo.py --model_name_or_path microsoft/phi-2 --per_device_train_batch_size 1 --max_steps 1000 --learning_rate 2e-5 --gradient_accumulation_steps 3 --logging_steps 10 --eval_steps 0 --output_dir C:\Users\Training\Documents\Finetune\DPOphi2 --warmup_steps 150 --report_to wandb --logging_first_step --no_remove_unused_columns --optim rmsprop_bnb_8bit --use_peft --lora_r 32 --lora_alpha 16 --lr_scheduler_type cosine --trust_remote_code --save_steps 333 --load_in_4bit --bnb_4bit_quant_type nf4 --use_bnb_nested_quant true --lora_target_modules q_proj, v_proj, k_proj, dense, lm_head, fc1, fc2 --dataloader_persistent_workers true --dataloader_pin_memory true --dataloader_num_workers 6 --load_in_4bit --bnb_4bit_quant_type nf4 --use_bnb_nested_quant true --max_length 1028 --max_prompt_length 512 --attn_implementation SDPA
WARNING:tensorflow:From C:\Users\Training\AppData\Local\Programs\Python\Python311\Lib\site-packages\keras\src\losses.py:2976: The name tf.losses.sparse_softmax_cross_entropy is deprecated. Please use tf.compat.v1.losses.sparse_softmax_cross_entropy instead.

[2024-02-20 21:47:35,792] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-02-20 21:47:36,128] torch.distributed.elastic.multiprocessing.redirects: [WARNING] NOTE: Redirects are currently not supported in Windows or MacOs.
Traceback (most recent call last):
  File "C:\Users\Training\Documents\Finetune\dpo.py", line 140, in <module>
    model = AutoModelForCausalLM.from_pretrained(model_config.model_name_or_path, **model_kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Training\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\models\auto\auto_factory.py", line 556, in from_pretrained
    return model_class.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Training\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\modeling_utils.py", line 3369, in from_pretrained
    config = cls._autoset_attn_implementation(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Training\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\modeling_utils.py", line 1348, in _autoset_attn_implementation
    raise ValueError(message + ".")
ValueError: Specified `attn_implementation="SDPA"` is not supported. The only possible arguments are `attn_implementation="eager"` (manual attention implementation), `"attn_implementation=flash_attention_2"` (implementation using flash attention 2).
Exception ignored in atexit callback:
Traceback (most recent call last):
  File "C:\Users\Training\AppData\Local\Programs\Python\Python311\Lib\site-packages\deepspeed\ops\transformer\inference\triton\matmul_ext.py", line 444, in matmul_ext_update_autotune_table
    fp16_matmul._update_autotune_table()
  File "C:\Users\Training\AppData\Local\Programs\Python\Python311\Lib\site-packages\deepspeed\ops\transformer\inference\triton\matmul_ext.py", line 421, in _update_autotune_table
    TritonMatmul._update_autotune_table(__class__.__name__ + "_2d_kernel", __class__._2d_kernel)
  File "C:\Users\Training\AppData\Local\Programs\Python\Python311\Lib\site-packages\deepspeed\ops\transformer\inference\triton\matmul_ext.py", line 150, in _update_autotune_table
    cache_manager.put(autotune_table)
  File "C:\Users\Training\AppData\Local\Programs\Python\Python311\Lib\site-packages\deepspeed\ops\transformer\inference\triton\matmul_ext.py", line 69, in put
    os.rename(self.file_path + ".tmp", self.file_path)
FileExistsError: [WinError 183] Cannot create a file when that file already exists: 'C:\\Users\\Training\\.triton\\autotune\\Fp16Matmul_2d_kernel.pickle.tmp' -> 'C:\\Users\\Training\\.triton\\autotune\\Fp16Matmul_2d_kernel.pickle'
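This failure shows the attn_implementation value is matched case-sensitively: lowercase "sdpa" at least reached the SDPA check in the first runs, while uppercase "SDPA" is rejected outright, and this build's error message only advertises "eager" and "flash_attention_2". The next run below switches to flash_attention_2 together with the 4-bit NF4 settings already on the command line. A rough Python sketch of what those flags amount to when loading the model, assuming the flash-attn package is installed (the exact model_kwargs construction inside dpo.py may differ):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization with nested (double) quantization, mirroring
# --load_in_4bit --bnb_4bit_quant_type nf4 --use_bnb_nested_quant true.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,      # assumption: fp16 compute dtype
)

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2",
    quantization_config=bnb_config,
    attn_implementation="flash_attention_2",   # lowercase, exactly as listed in the error
    trust_remote_code=True,
    torch_dtype=torch.float16,
)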
C:\Users\Training\Documents\Finetune>python dpo.py --model_name_or_path microsoft/phi-2 --per_device_train_batch_size 1 --max_steps 1000 --learning_rate 2e-5 --gradient_accumulation_steps 3 --logging_steps 10 --eval_steps 0 --output_dir C:\Users\Training\Documents\Finetune\DPOphi2 --warmup_steps 150 --report_to wandb --logging_first_step --no_remove_unused_columns --optim rmsprop_bnb_8bit --use_peft --lora_r 32 --lora_alpha 16 --lr_scheduler_type cosine --trust_remote_code --save_steps 333 --load_in_4bit --bnb_4bit_quant_type nf4 --use_bnb_nested_quant true --lora_target_modules q_proj, v_proj, k_proj, dense, lm_head, fc1, fc2 --dataloader_persistent_workers true --dataloader_pin_memory true --dataloader_num_workers 6 --load_in_4bit --bnb_4bit_quant_type nf4 --use_bnb_nested_quant true --max_length 1028 --max_prompt_length 512 --attn_implementation flash_attention_2
WARNING:tensorflow:From C:\Users\Training\AppData\Local\Programs\Python\Python311\Lib\site-packages\keras\src\losses.py:2976: The name tf.losses.sparse_softmax_cross_entropy is deprecated. Please use tf.compat.v1.losses.sparse_softmax_cross_entropy instead.

[2024-02-20 21:49:41,033] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-02-20 21:49:41,370] torch.distributed.elastic.multiprocessing.redirects: [WARNING] NOTE: Redirects are currently not supported in Windows or MacOs.
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:04<00:00, 2.03s/it]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
wandb: Currently logged in as: bobnick0703. Use `wandb login --relogin` to force relogin
wandb: wandb version 0.16.3 is available! To upgrade, please run:
wandb: $ pip install wandb --upgrade
wandb: Tracking run with wandb version 0.16.2
wandb: Run data is saved locally in C:\Users\Training\Documents\Finetune\wandb\run-20240220_214959-rlsw58rv
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run golden-chrysanthemum-48
wandb: View project at https://wandb.ai/bobnick0703/huggingface
wandb: View run at https://wandb.ai/bobnick0703/huggingface/runs/rlsw58rv
  0%| | 0/1000 [00:00