SDXL training cannot continue from latest #851
Comments
Can you check main?
(#851) remove shard merge code on load hook
Yes, main is working fine. I was on release.
Well, somehow I keep getting OOM at step 1002 after resuming from step 1000. I was reaching 2000 steps before, but that was on release, so something in main must have increased VRAM usage. I can't change the batch size when resuming, correct?
You can change the batch size at any time.
Did you have quanto enabled before?
No quanto, I was doing a full finetune.
Good to know. I remember you said the learning rate and schedule are not changeable? What if the learning rate is set to linear or sine?
2024-08-23 07:26:53,615 [INFO] (main) Resuming from checkpoint checkpoint-1000
Could not load model: 'Namespace' object has no attribute 'unet', traceback: Traceback (most recent call last):
File "/root/autodl-tmp/SimpleTuner/helpers/training/save_hooks.py", line 429, in _load_full_model
if self.args.controlnet or self.args.unet:
AttributeError: 'Namespace' object has no attribute 'unet'
Traceback (most recent call last):
File "/root/autodl-tmp/SimpleTuner/train.py", line 2490, in <module>
main()
File "/root/autodl-tmp/SimpleTuner/train.py", line 1225, in main
accelerator.load_state(os.path.join(args.output_dir, path))
File "/root/miniconda3/lib/python3.10/site-packages/accelerate/accelerator.py", line 3131, in load_state
hook(models, input_dir)
File "/root/autodl-tmp/SimpleTuner/helpers/training/save_hooks.py", line 469, in load_model_hook
self._load_full_model(models=models, input_dir=input_dir)
File "/root/autodl-tmp/SimpleTuner/helpers/training/save_hooks.py", line 452, in _load_full_model
raise Exception(return_exception)
Exception: Could not load model: 'Namespace' object has no attribute 'unet', traceback: Traceback (most recent call last):
File "/root/autodl-tmp/SimpleTuner/helpers/training/save_hooks.py", line 429, in _load_full_model
if self.args.controlnet or self.args.unet:
AttributeError: 'Namespace' object has no attribute 'unet'
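For anyone hitting the same traceback: the failure comes from reading `args.unet` on an `argparse.Namespace` that never had that attribute set. The snippet below is only a minimal sketch of the failure mode and one defensive way to read optional flags. It is not the change that went into main, and `wants_full_model` is a hypothetical helper name, not a SimpleTuner function.

```python
from argparse import Namespace


def wants_full_model(args: Namespace) -> bool:
    """Hypothetical helper: True if either flag is set, missing attributes count as False."""
    return bool(getattr(args, "controlnet", False) or getattr(args, "unet", False))


# A Namespace that lacks `unet`, like the one in the traceback above.
args = Namespace(controlnet=False)

try:
    # Direct attribute access raises AttributeError because `unet` was never defined.
    _ = args.controlnet or args.unet
    direct_access_failed = False
except AttributeError:
    direct_access_failed = True

# The guarded lookup falls back to False instead of raising.
guarded_result = wants_full_model(args)
```

Using `getattr` with a default keeps a load hook from crashing when a flag has been renamed or dropped between versions, at the cost of hiding a genuinely missing option, so it may not be the right choice everywhere.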