PhiForCausalLM does not support Flash Attention 2.0 #28381
Comments
Hi, I would like to work on this issue.
Support for Phi-2 is still WIP, you can follow the progress here: #28163
Hi @gmittal, Flash Attention is already implemented for Phi (see the PR linked above). It seems that you are using the hub version of the model. First update to the latest transformers version, then run:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Note: Flash Attention 2 requires the model to run on a CUDA device.
model = AutoModelForCausalLM.from_pretrained("susnato/phi-2",
                                             use_flash_attention_2=True,
                                             torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained("susnato/phi-2")

inputs = tokenizer('''def print_prime(n):
    """
    Print all primes between 1 and n
    """''', return_tensors="pt", return_attention_mask=False)

outputs = model.generate(**inputs, max_length=200)
text = tokenizer.batch_decode(outputs)[0]
print(text)
```

Let me know if this works or not.
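(Side note, not from the original comment: on newer transformers releases the `attn_implementation` argument is the preferred way to request Flash Attention 2, as `use_flash_attention_2` was deprecated around 4.36. A minimal sketch, assuming flash-attn is installed and a CUDA GPU is available:)

```python
# Minimal sketch, assuming transformers >= 4.36, flash-attn installed, and a CUDA GPU.
# attn_implementation="flash_attention_2" is the newer spelling of use_flash_attention_2=True.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "susnato/phi-2",
    attn_implementation="flash_attention_2",
    torch_dtype=torch.float16,
).to("cuda")
tokenizer = AutoTokenizer.from_pretrained("susnato/phi-2")
```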
I would like to work on this issue.
Using the HF alignment notebook, the DPO script gives me this error regardless of transformers version (I already force-updated with pip). When I remove flash attention from the yaml it works (after a bit of code adjustment). I am able to fine-tune with one of my SFT scripts using flash attention, which is the strange part.
Hello everyone! This should be fixed in transformers 4.37.0.dev. If not using that version, please make sure that
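(A small sketch of the version check implied above, not part of the original comment; the 4.37.0.dev0 threshold is taken from this thread, and `packaging` is assumed to be available, as it is a transformers dependency:)

```python
# Quick sanity check of the installed transformers version before trying to load
# Phi with Flash Attention 2; the fix referenced above landed on the 4.37 dev branch.
import transformers
from packaging import version

print(transformers.__version__)
assert version.parse(transformers.__version__) >= version.parse("4.37.0.dev0"), (
    "Upgrade transformers, e.g. pip install git+https://github.com/huggingface/transformers"
)
```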
Thanks! Closing as this was fixed in #28163.
I installed from source, so now I am on transformers 4.37.dev0, and I am still getting the Incompatible error, even with trust remote code set to True.

```
C:\Users\PC\Documents\Code-Trainer\FineTune>py FINETUNERphiFP16.py --model_name_or_path C:\Users\PC\Documents\NEWGEN\text-generation-webui-main\models\dolphin-2_6-phi-2 --data_path MiniCoderW.json --output_dir C:\Users\PC\Documents\NEWGEN\text-generation-webui-main\models\TrainedPhi --num_train_epochs 3 --model_max_length 1024 --per_device_train_batch_size 1 --evaluation_strategy "no" --save_strategy "steps" --save_steps 1000 --save_total_limit 10 --learning_rate 2e-5 --warmup_steps 10 --logging_steps 10 --lr_scheduler_type "cosine" --report_to "tensorboard" --bf16 False --dataloader_num_workers 12 --optim paged_adamw_8bit
```

Here is the script I am using:

```python
import copy
import torch

IGNORE_INDEX = -100

def build_instruction_prompt(instruction: str):
    return '''
Instruction:{}
Response:'''.format(instruction.strip()).lstrip()

@dataclass
...

@dataclass
...

@dataclass
...

def safe_save_model_for_hf_trainer(trainer: transformers.Trainer, output_dir: str):
    ...

def _tokenize_fn(strings: Sequence[str], tokenizer: transformers.PreTrainedTokenizer) -> Dict:
    ...

def preprocess(
    ...

@dataclass
...

def train_tokenize_function(examples, tokenizer):
    ...

def train():
    ...

if __name__ == "__main__":
    ...
```
Hi @NickWithBotronics, if you set `trust_remote_code=True`, the modeling code from the hub is used rather than the native implementation in Transformers. Hence it's recommended to convert the weights to the Transformers format.

@ArthurZucker should we host the converted phi-2 weights as part of the Microsoft organization? Cause currently one will get a lot of mismatched keys when loading the hub checkpoint directly, due to the model in Transformers using a single matrix for queries, keys and values whereas the code on the hub uses separate matrices.
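(For illustration only, a rough sketch of what fusing separate query/key/value projections into a single matrix looks like; the key names `q_proj`, `k_proj`, `v_proj`, and `qkv_proj` are placeholders rather than the exact checkpoint keys, and the conversion script in the transformers repo, where available, should be preferred:)

```python
# Illustrative sketch: concatenate separate Q/K/V projection weights into one fused
# QKV matrix. Key names are hypothetical placeholders, not the real checkpoint keys.
import torch

def fuse_qkv(state_dict: dict) -> dict:
    converted = {}
    for key in list(state_dict):
        if key.endswith("q_proj.weight"):
            prefix = key[: -len("q_proj.weight")]
            # Stack Q, K and V along the output dimension to form the fused projection.
            converted[prefix + "qkv_proj.weight"] = torch.cat(
                [
                    state_dict[prefix + "q_proj.weight"],
                    state_dict[prefix + "k_proj.weight"],
                    state_dict[prefix + "v_proj.weight"],
                ],
                dim=0,
            )
        elif not key.endswith(("k_proj.weight", "v_proj.weight")):
            # Copy everything that is not a separate K/V projection unchanged.
            converted[key] = state_dict[key]
    return converted
```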
Thank you <3 !!!! That fixed that error (using the new modeling.py and converted HF format). Now onto a new error that's due to my script, I think? :(

Edit: fixed it by downloading the latest generation_config.json, config.json, configuration_phi.py, and modeling_phi.py.
While I got it working, the training loss was very wack. It started at 6 and went to 2 (after 3 epochs), but when I used the old config without flash attention it went from 0.6 to ~0.29 (also 3 epochs), with the same dataset, same setup, same model; just different config files and flash attention. I saw someone else experience the same thing on Twitter.
Can you open a separate issue for this, with a reproducible snippet?
Gotcha, I’ll move to this ticket: #28488