Why can MPS never be used successfully? #32035

Closed
2 of 4 tasks
AimoneAndex opened this issue Jul 18, 2024 · 10 comments

@AimoneAndex

System Info

Device: Apple M3 Pro
OS: macOS Sonoma 14.1
packages:
datasets 2.20.1.dev0
evaluate 0.4.2
huggingface-hub 0.23.5
tokenizers 0.19.1
torch 2.5.0.dev20240717
torchaudio 2.4.0.dev20240717
torchvision 0.20.0.dev20240717
transformers 4.43.0.dev0

Who can help?

@ArthurZucker @muellerzr

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

import os
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    DataCollatorForSeq2Seq,
    TrainingArguments,
    Trainer
)
import torch
from peft import LoraConfig, TaskType, get_peft_model

# Set up paths
current_dir = os.getcwd()
model_dir = os.path.join(current_dir, 'model', 'zh-7b')
save_dir = os.path.join(current_dir, 'model', 'zh-7b-saved')
target_file_path = os.path.join(current_dir, 'dats.csv')

# Load the dataset
dataset = load_dataset("csv", data_files=target_file_path, split="train")

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_dir)
tokenizer.padding_side = "right"
tokenizer.pad_token_id = 2

# Data preprocessing function
def process_func(example):
    MAX_LENGTH = 384
    instruction = tokenizer(f"Human: {example['instruction']}\n{example['input']}\n\nAssistant: ", add_special_tokens=False)
    response = tokenizer(example['output'], add_special_tokens=False)
    input_ids = instruction['input_ids'] + response['input_ids'] + [tokenizer.eos_token_id]
    attention_mask = instruction['attention_mask'] + response['attention_mask'] + [1]
    labels = [-100] * len(instruction['input_ids']) + response['input_ids'] + [tokenizer.eos_token_id]
    if len(input_ids) > MAX_LENGTH:
        input_ids = input_ids[:MAX_LENGTH]
        attention_mask = attention_mask[:MAX_LENGTH]
        labels = labels[:MAX_LENGTH]
    return {'input_ids': input_ids, 'attention_mask': attention_mask, 'labels': labels}

# Preprocess the dataset
tokenized_dataset = dataset.map(process_func, remove_columns=dataset.column_names)

# Load the model onto the MPS device
device = torch.device("mps")
model = AutoModelForCausalLM.from_pretrained(model_dir, low_cpu_mem_usage=True, torch_dtype=torch.half, device_map="mps")

# Load the LoRA configuration
config = LoraConfig(task_type=TaskType.CAUSAL_LM)
model = get_peft_model(model, config)

# Set the training arguments
training_args = TrainingArguments(
    output_dir=save_dir,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    logging_steps=10,
    num_train_epochs=1,
)

# Create the Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset.select(range(3)),  # limit the training set size, for testing only
    data_collator=DataCollatorForSeq2Seq(tokenizer, padding=True),
)

# Start training
trainer.train()

# Save the model
model.save_pretrained(save_dir)

Then it prints:
RuntimeError: Placeholder storage has not been allocated on MPS device!

Expected behavior

Train successfully.
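
For reference, this error usually indicates that the model weights and the input batch ended up on different devices. A minimal sketch (reusing model, tokenizer, and tokenized_dataset from the script above) to see where each side lives before trainer.train() is called:

from torch.utils.data import DataLoader
from transformers import DataCollatorForSeq2Seq

# Collate a small batch the same way the Trainer would (tensors start on the CPU).
collator = DataCollatorForSeq2Seq(tokenizer, padding=True)
loader = DataLoader(tokenized_dataset.select(range(3)), batch_size=2, collate_fn=collator)
batch = next(iter(loader))

# The model should report mps:0; the raw batch is still on the CPU, and it is
# the Trainer/accelerate that is responsible for moving it onto the MPS device.
print(next(model.parameters()).device)
print({k: v.device for k, v in batch.items()})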

@jponf

jponf commented Jul 21, 2024

@AimoneAndex I was looking for this too; luckily it has already been fixed, see #31812. It is a matter of waiting for a new release. In the meantime you can install the package from GitHub to get the fix now:

pip install git+https://github.com/huggingface/transformers.git

@ArthurZucker could you tell us when the next release is planned? Not being able to use MPS on Mac devices is quite annoying 😥
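
A quick, hypothetical sanity check (not part of the fix itself) to confirm the source install is actually the one being imported:

import transformers

# A source install should report a version with a ".dev0" suffix.
print(transformers.__version__)
print(transformers.__file__)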

@ArthurZucker
Collaborator

The release will be this week! Sorry all for the trouble, I am also a Mac user and this sucks!

@AimoneAndex
Author

AimoneAndex commented Jul 23, 2024

pip install git+https://github.com/huggingface/transformers.git

@ArthurZucker could you tell us when the next release is planned? Not being able to use MPS on Mac devices is quite annoying 😥

Thank you! I will update as soon as the new version is released!

@AimoneAndex
Author

The release will be this week! Sorry all for the trouble, I am also a Mac user and this sucks!

Thanks a lot! It is developers like you who make Transformers easier for everyone to build our dreams. Truly, thank you all so much!

@AimoneAndex
Author

AimoneAndex commented Jul 23, 2024

I've seen that the new version is coming, and I'll try it as soon as I get back to my computer. Thanks, everyone!

@AimoneAndex
Author

AimoneAndex commented Jul 24, 2024

Everything works in the new version. Thanks, everyone!

@AimoneAndex
Author

Can MPS use FP16 for training? Why can't I?
ValueError Traceback (most recent call last)
Cell In[15], line 1
----> 1 trainer = Trainer(
2 model=model,
3 args=training_args,
4 train_dataset=tokenized_dataset,
5 data_collator=DataCollatorForSeq2Seq(tokenizer=tokenizer, padding=True),
6 )

File ~/Data/AIHub/Trans-Penv/transformers/src/transformers/trainer.py:409, in Trainer.init(self, model, args, data_collator, train_dataset, eval_dataset, tokenizer, model_init, compute_metrics, callbacks, optimizers, preprocess_logits_for_metrics)
406 self.deepspeed = None
407 self.is_in_train = False
--> 409 self.create_accelerator_and_postprocess()
411 # memory metrics - must set up as early as possible
412 self._memory_tracker = TrainerMemoryTracker(self.args.skip_memory_metrics)

File ~/Data/AIHub/Trans-Penv/transformers/src/transformers/trainer.py:4648, in Trainer.create_accelerator_and_postprocess(self)
4645 args.update(accelerator_config)
4647 # create accelerator object
-> 4648 self.accelerator = Accelerator(**args)
4649 # some Trainer classes need to use gather instead of gather_for_metrics, thus we store a flag
4650 self.gather_function = self.accelerator.gather_for_metrics

File /opt/anaconda3/envs/tfs/lib/python3.12/site-packages/accelerate/accelerator.py:467, in Accelerator.init(self, device_placement, split_batches, mixed_precision, gradient_accumulation_steps, cpu, dataloader_config, deepspeed_plugin, fsdp_plugin, megatron_lm_plugin, rng_types, log_with, project_dir, project_config, gradient_accumulation_plugin, dispatch_batches, even_batches, use_seedable_sampler, step_scheduler_with_optimizer, kwargs_handlers, dynamo_backend)
...
--> 467 raise ValueError(f"fp16 mixed precision requires a GPU (not {self.device.type!r}).")
468 kwargs = self.scaler_handler.to_kwargs() if self.scaler_handler is not None else {}
469 if self.distributed_type == DistributedType.FSDP:

ValueError: fp16 mixed precision requires a GPU (not 'mps').
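
For context, this ValueError comes from accelerate whenever fp16 mixed precision is requested and the only available accelerator is MPS, which suggests fp16=True was set in TrainingArguments. A hedged workaround sketch (reusing save_dir from the script above) is to only request mixed precision where accelerate supports it:

import torch
from transformers import TrainingArguments

# Hypothetical workaround: only ask for fp16 autocast where it is supported,
# so an MPS-only machine falls back to full precision instead of raising.
training_args = TrainingArguments(
    output_dir=save_dir,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    logging_steps=10,
    num_train_epochs=1,
    fp16=torch.cuda.is_available(),
)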

AimoneAndex reopened this Aug 13, 2024
@amyeroberts
Collaborator

cc @muellerzr as this appears to be raised in accelerate

@muellerzr
Contributor

Correct, there's nothing we can do for now until stable torch supports mixed precision on MPS.

It looks like the nightlies may have it, so soon!
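
A rough, hypothetical way to probe whether an installed torch build already exposes the MPS autocast support being referred to here:

import torch

print(torch.__version__)
try:
    # Older torch builds raise a RuntimeError for an unsupported autocast device type.
    with torch.autocast(device_type="mps", dtype=torch.float16):
        pass
    print("this torch build accepts an MPS autocast context")
except RuntimeError as err:
    print(f"MPS autocast not available here: {err}")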

@AimoneAndex
Author

Correct, there's nothing we can do for now until stable torch supports mixed precision on MPS.

It looks like the nightlies may have it, so soon!

That's OK! Thanks a lot!
