
terminating due to uncaught exception of type c10::TypeError: Trying to convert BFloat16 to the MPS backend but it does not have support for that dtype. #882

Closed
Aniket22156 opened this issue Jun 1, 2023 · 2 comments

Comments

@Aniket22156
Aniket22156 commented Jun 1, 2023

Folder 100_heer: 26 images found
Folder 100_heer: 2600 steps
Total steps: 2600
Train batch size: 1
Gradient accumulation steps: 1.0
Epoch: 1
Regularization factor: 1
max_train_steps (2600 / 1 / 1.0 * 1 * 1) = 2600
stop_text_encoder_training = 0
lr_warmup_steps = 260
accelerate launch --num_cpu_threads_per_process=2 "train_network.py" --enable_bucket --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" --train_data_dir="/Users/aniketsharma/Documents/Sharma/image" --resolution=512,512 --output_dir="/Users/aniketsharma/Documents/Sharma/model" --logging_dir="/Users/aniketsharma/Documents/Sharma/log" --network_alpha="1" --save_model_as=safetensors --network_module=networks.lora --text_encoder_lr=5e-05 --unet_lr=0.0001 --network_dim=8 --output_name="last" --lr_scheduler_num_cycles="1" --learning_rate="0.0001" --lr_scheduler="cosine" --lr_warmup_steps="260" --train_batch_size="1" --max_train_steps="2600" --save_every_n_epochs="1" --mixed_precision="bf16" --save_precision="bf16" --cache_latents --optimizer_type="AdamW8bit" --max_data_loader_n_workers="0" --bucket_reso_steps=64 --xformers --bucket_no_upscale
prepare tokenizer
Use DreamBooth method.
prepare images.
found directory /Users/aniketsharma/Documents/Sharma/image/100_heer contains 26 image files
2600 train images with repeating.
0 reg images.
no regularization images / 正則化画像が見つかりませんでした
[Dataset 0]
batch_size: 1
resolution: (512, 512)
enable_bucket: True
min_bucket_reso: 256
max_bucket_reso: 1024
bucket_reso_steps: 64
bucket_no_upscale: True

[Subset 0 of Dataset 0]
image_dir: "/Users/aniketsharma/Documents/Sharma/image/100_heer"
image_count: 26
num_repeats: 100
shuffle_caption: False
keep_tokens: 0
caption_dropout_rate: 0.0
caption_dropout_every_n_epoches: 0
caption_tag_dropout_rate: 0.0
color_aug: False
flip_aug: False
face_crop_aug_range: None
random_crop: False
token_warmup_min: 1,
token_warmup_step: 0,
is_reg: False
class_tokens: heer
caption_extension: .caption

[Dataset 0]
loading image sizes.
100%|██████████████████████████████████| 26/26 [00:00<00:00, 2822.98it/s]
make buckets
min_bucket_reso and max_bucket_reso are ignored if bucket_no_upscale is set, because bucket reso is defined by image size automatically / bucket_no_upscaleが指定された場合は、bucketの解像度は画像サイズから自動計算されるため、min_bucket_resoとmax_bucket_resoは無視されます
number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む)
bucket 0: resolution (512, 512), count: 2600
mean ar error (without repeats): 0.0
prepare accelerator
/Users/aniketsharma/Documents/taining/kohya_ss/venv/lib/python3.10/site-packages/accelerate/accelerator.py:249: FutureWarning: logging_dir is deprecated and will be removed in version 0.18.0 of 🤗 Accelerate. Use project_dir instead.
warnings.warn(
Using accelerator 0.15.0 or above.
loading model for process 0/1
load Diffusers pretrained models: runwayml/stable-diffusion-v1-5
Fetching 15 files: 100%|██████████████| 15/15 [00:00<00:00, 12036.46it/s]
/Users/aniketsharma/Documents/taining/kohya_ss/venv/lib/python3.10/site-packages/safetensors/torch.py:98: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
with safe_open(filename, framework="pt", device=device) as f:
/Users/aniketsharma/Documents/taining/kohya_ss/venv/lib/python3.10/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
return self.fget.get(instance, owner)()
/Users/aniketsharma/Documents/taining/kohya_ss/venv/lib/python3.10/site-packages/torch/storage.py:899: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
storage = cls(wrap_storage=untyped_storage)
/Users/aniketsharma/Documents/taining/kohya_ss/venv/lib/python3.10/site-packages/transformers/modeling_utils.py:402: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
with safe_open(checkpoint_file, framework="pt") as f:
You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing safety_checker=None. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at huggingface/diffusers#254 .
CrossAttention.forward has been replaced to enable xformers.
[Dataset 0]
caching latents.
0%| | 0/26 [00:00<?, ?it/s]libc++abi: terminating due to uncaught exception of type c10::TypeError: Trying to convert BFloat16 to the MPS backend but it does not have support for that dtype.
Exception raised from getMPSScalarType at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/mps/OperationUtils.mm:91 (most recent call first):
frame #0: at::native::mps::getMPSScalarType(c10::ScalarType) + 180 (0x1162f9278 in libtorch_cpu.dylib)
frame #1: invocation function for block in at::native::_mps_convolution_impl(at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&, c10::ArrayRef, c10::ArrayRef, c10::ArrayRef, long long, c10::optional<c10::ArrayRef>) + 592 (0x116323f40 in libtorch_cpu.dylib)
frame #2: invocation function for block in at::native::mps::MPSGraphCache::CreateCachedGraph(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, at::native::mps::MPSCachedGraph* () block_pointer) + 216 (0x11630f1f8 in libtorch_cpu.dylib)
frame #3: _dispatch_client_callout + 20 (0x1a1330400 in libdispatch.dylib)
frame #4: _dispatch_lane_barrier_sync_invoke_and_complete + 56 (0x1a133f97c in libdispatch.dylib)
frame #5: at::native::mps::MPSGraphCache::CreateCachedGraph(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, at::native::mps::MPSCachedGraph* () block_pointer) + 160 (0x1162fd304 in libtorch_cpu.dylib)
frame #6: at::native::_mps_convolution_impl(at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&, c10::ArrayRef, c10::ArrayRef, c10::ArrayRef, long long, c10::optional<c10::ArrayRef>) + 3204 (0x116322850 in libtorch_cpu.dylib)
frame #7: at::native::_mps_convolution(at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&, c10::ArrayRef, c10::ArrayRef, c10::ArrayRef, long long) + 40 (0x116324130 in libtorch_cpu.dylib)
frame #8: at::_ops::_mps_convolution::call(at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&, c10::ArrayRef, c10::ArrayRef, c10::ArrayRef, long long) + 356 (0x112acef30 in libtorch_cpu.dylib)
frame #9: at::native::_convolution(at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&, c10::ArrayRef, c10::ArrayRef, c10::ArrayRef, bool, c10::ArrayRef, long long, bool, bool, bool, bool) + 11884 (0x11206eaf4 in libtorch_cpu.dylib)
frame #10: at::(anonymous namespace)::(anonymous namespace)::wrapper_CompositeExplicitAutograd___convolution(at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&, c10::ArrayRef, c10::ArrayRef<c10::SymInt>, c10::ArrayRef, bool, c10::ArrayRef<c10::SymInt>, long long, bool, bool, bool, bool) + 184 (0x113247f30 in libtorch_cpu.dylib)
frame #11: at::_ops::_convolution::call(at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&, c10::ArrayRef, c10::ArrayRef<c10::SymInt>, c10::ArrayRef, bool, c10::ArrayRef<c10::SymInt>, long long, bool, bool, bool, bool) + 432 (0x1128cfd58 in libtorch_cpu.dylib)
frame #12: at::_convolution(at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&, c10::ArrayRef, c10::ArrayRef, c10::ArrayRef, bool, c10::ArrayRef, long long, bool, bool, bool, bool) + 172 (0x112065ee4 in libtorch_cpu.dylib)
frame #13: at::native::convolution(at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&, c10::ArrayRef, c10::ArrayRef, c10::ArrayRef, bool, c10::ArrayRef, long long) + 288 (0x112065bc4 in libtorch_cpu.dylib)
frame #14: at::_ops::convolution::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&, c10::ArrayRef, c10::ArrayRef<c10::SymInt>, c10::ArrayRef, bool, c10::ArrayRef<c10::SymInt>, long long) + 184 (0x1128cf538 in libtorch_cpu.dylib)
frame #15: torch::autograd::VariableType::(anonymous namespace)::convolution(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&, c10::ArrayRef, c10::ArrayRef<c10::SymInt>, c10::ArrayRef, bool, c10::ArrayRef<c10::SymInt>, long long) + 2160 (0x1146c3190 in libtorch_cpu.dylib)
frame #16: at::_ops::convolution::call(at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&, c10::ArrayRef, c10::ArrayRef<c10::SymInt>, c10::ArrayRef, bool, c10::ArrayRef<c10::SymInt>, long long) + 372 (0x1128cf094 in libtorch_cpu.dylib)
frame #17: at::native::conv2d(at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&, c10::ArrayRef, c10::ArrayRef, c10::ArrayRef, long long) + 880 (0x11205cbd0 in libtorch_cpu.dylib)
frame #18: at::_ops::conv2d::call(at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&, c10::ArrayRef, c10::ArrayRef, c10::ArrayRef, long long) + 356 (0x112f10ea4 in libtorch_cpu.dylib)
frame #19: torch::autograd::THPVariable_conv2d(_object*, _object*, _object*) + 1088 (0x10690c3e0 in libtorch_python.dylib)
frame #20: cfunction_call + 60 (0x104787b24 in Python)
frame #21: _PyObject_MakeTpCall + 136 (0x10473669c in Python)
frame #22: call_function + 272 (0x104830d0c in Python)
frame #23: _PyEval_EvalFrameDefault + 43104 (0x10482e494 in Python)
frame #24: _PyEval_Vector + 376 (0x1048229e0 in Python)
frame #25: method_vectorcall + 124 (0x104739904 in Python)
frame #26: call_function + 128 (0x104830c7c in Python)
frame #27: _PyEval_EvalFrameDefault + 43104 (0x10482e494 in Python)
frame #28: _PyEval_Vector + 376 (0x1048229e0 in Python)
frame #29: method_vectorcall + 288 (0x1047399a8 in Python)
frame #30: _PyEval_EvalFrameDefault + 43560 (0x10482e65c in Python)
frame #31: _PyEval_Vector + 376 (0x1048229e0 in Python)
frame #32: _PyObject_FastCallDictTstate + 96 (0x104736904 in Python)
frame #33: slot_tp_call + 196 (0x1047ab664 in Python)
frame #34: _PyObject_MakeTpCall + 136 (0x10473669c in Python)
frame #35: call_function + 272 (0x104830d0c in Python)
frame #36: _PyEval_EvalFrameDefault + 43104 (0x10482e494 in Python)
frame #37: _PyEval_Vector + 376 (0x1048229e0 in Python)
frame #38: method_vectorcall + 288 (0x1047399a8 in Python)
frame #39: _PyEval_EvalFrameDefault + 43560 (0x10482e65c in Python)
frame #40: _PyEval_Vector + 376 (0x1048229e0 in Python)
frame #41: _PyObject_FastCallDictTstate + 96 (0x104736904 in Python)
frame #42: slot_tp_call + 196 (0x1047ab664 in Python)
frame #43: _PyObject_MakeTpCall + 136 (0x10473669c in Python)
frame #44: call_function + 272 (0x104830d0c in Python)
frame #45: _PyEval_EvalFrameDefault + 43104 (0x10482e494 in Python)
frame #46: _PyEval_Vector + 376 (0x1048229e0 in Python)
frame #47: method_vectorcall + 124 (0x104739904 in Python)
frame #48: call_function + 128 (0x104830c7c in Python)
frame #49: _PyEval_EvalFrameDefault + 43104 (0x10482e494 in Python)
frame #50: _PyEval_Vector + 376 (0x1048229e0 in Python)
frame #51: call_function + 128 (0x104830c7c in Python)
frame #52: _PyEval_EvalFrameDefault + 42984 (0x10482e41c in Python)
frame #53: _PyEval_Vector + 376 (0x1048229e0 in Python)
frame #54: call_function + 128 (0x104830c7c in Python)
frame #55: _PyEval_EvalFrameDefault + 42984 (0x10482e41c in Python)
frame #56: _PyEval_Vector + 376 (0x1048229e0 in Python)
frame #57: call_function + 128 (0x104830c7c in Python)
frame #58: _PyEval_EvalFrameDefault + 43144 (0x10482e4bc in Python)
frame #59: _PyEval_Vector + 376 (0x1048229e0 in Python)
frame #60: PyEval_EvalCode + 104 (0x104822854 in Python)
frame #61: run_eval_code_obj + 84 (0x10487e8e8 in Python)
frame #62: run_mod + 112 (0x10487e84c in Python)
frame #63: pyrun_file + 148 (0x10487e4e0 in Python)

Traceback (most recent call last):
File "/Users/aniketsharma/Documents/taining/kohya_ss/venv/bin/accelerate", line 8, in
sys.exit(main())
File "/Users/aniketsharma/Documents/taining/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
args.func(args)
File "/Users/aniketsharma/Documents/taining/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 923, in launch_command
simple_launcher(args)
File "/Users/aniketsharma/Documents/taining/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 579, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/Users/aniketsharma/Documents/taining/kohya_ss/venv/bin/python', 'train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5', '--train_data_dir=/Users/aniketsharma/Documents/Sharma/image', '--resolution=512,512', '--output_dir=/Users/aniketsharma/Documents/Sharma/model', '--logging_dir=/Users/aniketsharma/Documents/Sharma/log', '--network_alpha=1', '--save_model_as=safetensors', '--network_module=networks.lora', '--text_encoder_lr=5e-05', '--unet_lr=0.0001', '--network_dim=8', '--output_name=last', '--lr_scheduler_num_cycles=1', '--learning_rate=0.0001', '--lr_scheduler=cosine', '--lr_warmup_steps=260', '--train_batch_size=1', '--max_train_steps=2600', '--save_every_n_epochs=1', '--mixed_precision=bf16', '--save_precision=bf16', '--cache_latents', '--optimizer_type=AdamW8bit', '--max_data_loader_n_workers=0', '--bucket_reso_steps=64', '--xformers', '--bucket_no_upscale']' died with <Signals.SIGABRT: 6>.
/opt/homebrew/Cellar/python@3.10/3.10.11/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 3 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
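
For context before the replies: at the time of this report, PyTorch's MPS backend had no BFloat16 support at all (it was added later, in PyTorch 2.3 on macOS 14), so the --mixed_precision="bf16" run aborts as soon as latent caching pushes a bf16 tensor through the VAE on the mps device. A minimal sketch, not taken from this thread, that reproduces the same c10::TypeError on an affected PyTorch build on Apple Silicon:

    import torch

    # Hypothetical repro (assumes a PyTorch build without MPS bfloat16
    # support): moving a bf16 tensor to the mps device raises the same
    # c10::TypeError reported in the title.
    if torch.backends.mps.is_available():
        x = torch.randn(4, 4, dtype=torch.bfloat16)  # fine on CPU
        try:
            y = x.to("mps")  # "Trying to convert BFloat16 to the MPS backend ..."
        except TypeError as err:
            print(err)

In train_network.py the failing conversion appears to happen inside a libdispatch block while the MPS convolution graph is being built (frames #1-#6 above), which would explain why it surfaces as an uncaught C++ exception and a SIGABRT rather than a normal Python traceback.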

@Vadim170

Vadim170 commented Jun 8, 2023

@Aniket22156 Has the problem been solved?

@Aniket22156
Author

@Aniket22156 Has the problem been solved?

no
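
For later readers: the crash traces back to the two bf16 flags in the command. A hedged workaround, assuming the same kohya_ss setup as above (not verified in this thread), is to keep the command identical but drop BFloat16 entirely:

    accelerate launch --num_cpu_threads_per_process=2 "train_network.py" \
        ... same arguments as above ... \
        --mixed_precision="no" --save_precision="float"

Both values are accepted by the same flags the original command already uses; alternatively, upgrading to a PyTorch build with MPS bfloat16 support (2.3+ on macOS 14) should avoid the dtype conversion error altogether.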
