Why can MPS never be used successfully? #32035

Closed
2 of 4 tasks
AimoneAndex opened this issue Jul 18, 2024 · 10 comments

@AimoneAndex

System Info

Device: Apple M3 Pro
OS: macOS Sonoma 14.1
packages:
datasets 2.20.1.dev0
evaluate 0.4.2
huggingface-hub 0.23.5
tokenizers 0.19.1
torch 2.5.0.dev20240717
torchaudio 2.4.0.dev20240717
torchvision 0.20.0.dev20240717
transformers 4.43.0.dev0

Who can help?

@ArthurZucker @muellerzr

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

import os
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    DataCollatorForSeq2Seq,
    TrainingArguments,
    Trainer
)
import torch
from peft import LoraConfig, TaskType, get_peft_model

# Set up paths
current_dir = os.getcwd()
model_dir = os.path.join(current_dir, 'model', 'zh-7b')
save_dir = os.path.join(current_dir, 'model', 'zh-7b-saved')
target_file_path = os.path.join(current_dir, 'dats.csv')

# Load the dataset
dataset = load_dataset("csv", data_files=target_file_path, split="train")

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_dir)
tokenizer.padding_side = "right"
tokenizer.pad_token_id = 2

# Data preprocessing function
def process_func(example):
    MAX_LENGTH = 384
    instruction = tokenizer(f"Human: {example['instruction']}\n{example['input']}\n\nAssistant: ", add_special_tokens=False)
    response = tokenizer(example['output'], add_special_tokens=False)
    input_ids = instruction['input_ids'] + response['input_ids'] + [tokenizer.eos_token_id]
    attention_mask = instruction['attention_mask'] + response['attention_mask'] + [1]
    labels = [-100] * len(instruction['input_ids']) + response['input_ids'] + [tokenizer.eos_token_id]
    if len(input_ids) > MAX_LENGTH:
        input_ids = input_ids[:MAX_LENGTH]
        attention_mask = attention_mask[:MAX_LENGTH]
        labels = labels[:MAX_LENGTH]
    return {'input_ids': input_ids, 'attention_mask': attention_mask, 'labels': labels}

# Preprocess the dataset
tokenized_dataset = dataset.map(process_func, remove_columns=dataset.column_names)

# Load the model onto the MPS device
device = torch.device("mps")
model = AutoModelForCausalLM.from_pretrained(model_dir, low_cpu_mem_usage=True, torch_dtype=torch.half, device_map="mps")

# Load the LoRA configuration
config = LoraConfig(task_type=TaskType.CAUSAL_LM)
model = get_peft_model(model, config)

# Set the training arguments
training_args = TrainingArguments(
    output_dir=save_dir,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    logging_steps=10,
    num_train_epochs=1,
)

# Create the Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset.select(range(3)),  # limit the training set size, for testing only
    data_collator=DataCollatorForSeq2Seq(tokenizer, padding=True),
)

# Start training
trainer.train()

# Save the model
model.save_pretrained(save_dir)

Then it prints:
RuntimeError: Placeholder storage has not been allocated on MPS device!

Expected behavior

Train successfully.
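
For reference, this error usually indicates that the model weights and the input batch ended up on different devices. A minimal sketch (reusing model, tokenizer, and tokenized_dataset from the script above) to see where each side lives before trainer.train() is called:

from torch.utils.data import DataLoader
from transformers import DataCollatorForSeq2Seq

# Collate a small batch the same way the Trainer would (tensors start on the CPU).
collator = DataCollatorForSeq2Seq(tokenizer, padding=True)
loader = DataLoader(tokenized_dataset.select(range(3)), batch_size=2, collate_fn=collator)
batch = next(iter(loader))

# The model should report mps:0; the raw batch is still on the CPU, and it is
# the Trainer/accelerate that is responsible for moving it onto the MPS device.
print(next(model.parameters()).device)
print({k: v.device for k, v in batch.items()})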

@jponf

jponf commented Jul 21, 2024

@AimoneAndex I was looking for this too; luckily it has already been fixed, see #31812. It is a matter of waiting for a new release. In the meantime you can install the package from GitHub to get the fix now:

pip install git+https://github.com/huggingface/transformers.git

@ArthurZucker could you tell us when the next release is planned? Not being able to use MPS on Mac devices is quite annoying 😥
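
A quick, hypothetical sanity check (not part of the fix itself) to confirm the source install is actually the one being imported:

import transformers

# A source install should report a version with a ".dev0" suffix.
print(transformers.__version__)
print(transformers.__file__)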

@ArthurZucker
Collaborator

The release will be this week! Sorry all for the trouble, I am also a Mac user and this sucks!

@AimoneAndex
Author

AimoneAndex commented Jul 23, 2024

pip install git+https://github.com/huggingface/transformers.git

@ArthurZucker could you tell us when the next release is planned? Not being able to use MPS on Mac devices is quite annoying 😥

Thank you! I will update as soon as the new version is released!

@AimoneAndex
Author

The release will be this week! Sorry all for the trouble, I am also a Mac user and this sucks!

Thanks a lot! It is developers like you who make Transformers easier for everyone to build our dreams. Truly, thank you all so much!

@AimoneAndex
Author

AimoneAndex commented Jul 23, 2024

I've seen that the new version is coming, and I'll try it as soon as I get back to my computer. Thanks, everyone!

@AimoneAndex
Author

AimoneAndex commented Jul 24, 2024

Everything works in the new version. Thanks, everyone!

@AimoneAndex
Author

Can MPS use FP16 for training? Why can't I?
ValueError Traceback (most recent call last)
Cell In[15], line 1
----> 1 trainer = Trainer(
2 model=model,
3 args=training_args,
4 train_dataset=tokenized_dataset,
5 data_collator=DataCollatorForSeq2Seq(tokenizer=tokenizer, padding=True),
6 )

File ~/Data/AIHub/Trans-Penv/transformers/src/transformers/trainer.py:409, in Trainer.init(self, model, args, data_collator, train_dataset, eval_dataset, tokenizer, model_init, compute_metrics, callbacks, optimizers, preprocess_logits_for_metrics)
406 self.deepspeed = None
407 self.is_in_train = False
--> 409 self.create_accelerator_and_postprocess()
411 # memory metrics - must set up as early as possible
412 self._memory_tracker = TrainerMemoryTracker(self.args.skip_memory_metrics)

File ~/Data/AIHub/Trans-Penv/transformers/src/transformers/trainer.py:4648, in Trainer.create_accelerator_and_postprocess(self)
4645 args.update(accelerator_config)
4647 # create accelerator object
-> 4648 self.accelerator = Accelerator(**args)
4649 # some Trainer classes need to use gather instead of gather_for_metrics, thus we store a flag
4650 self.gather_function = self.accelerator.gather_for_metrics

File /opt/anaconda3/envs/tfs/lib/python3.12/site-packages/accelerate/accelerator.py:467, in Accelerator.init(self, device_placement, split_batches, mixed_precision, gradient_accumulation_steps, cpu, dataloader_config, deepspeed_plugin, fsdp_plugin, megatron_lm_plugin, rng_types, log_with, project_dir, project_config, gradient_accumulation_plugin, dispatch_batches, even_batches, use_seedable_sampler, step_scheduler_with_optimizer, kwargs_handlers, dynamo_backend)
...
--> 467 raise ValueError(f"fp16 mixed precision requires a GPU (not {self.device.type!r}).")
468 kwargs = self.scaler_handler.to_kwargs() if self.scaler_handler is not None else {}
469 if self.distributed_type == DistributedType.FSDP:

ValueError: fp16 mixed precision requires a GPU (not 'mps').
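
For context, this ValueError comes from accelerate whenever fp16 mixed precision is requested and the only available accelerator is MPS, which suggests fp16=True was set in TrainingArguments. A hedged workaround sketch (reusing save_dir from the script above) is to only request mixed precision where accelerate supports it:

import torch
from transformers import TrainingArguments

# Hypothetical workaround: only ask for fp16 autocast where it is supported,
# so an MPS-only machine falls back to full precision instead of raising.
training_args = TrainingArguments(
    output_dir=save_dir,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    logging_steps=10,
    num_train_epochs=1,
    fp16=torch.cuda.is_available(),
)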

AimoneAndex reopened this Aug 13, 2024
@amyeroberts
Collaborator

cc @muellerzr as this appears to be raised in accelerate

@muellerzr
Contributor

Correct, there's nothing we can do for now until stable torch supports mixed precision on MPS.

It looks like the nightlies may have it, so soon!
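
A rough, hypothetical way to probe whether an installed torch build already exposes the MPS autocast support being referred to here:

import torch

print(torch.__version__)
try:
    # Older torch builds raise a RuntimeError for an unsupported autocast device type.
    with torch.autocast(device_type="mps", dtype=torch.float16):
        pass
    print("this torch build accepts an MPS autocast context")
except RuntimeError as err:
    print(f"MPS autocast not available here: {err}")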

@AimoneAndex
Author

Correct, there's nothing we can do for now until stable torch supports mixed precision on MPS.

It looks like the nightlies may have it, so soon!

That's OK! Thanks a lot!
