Ademamix mixing #11
Does it need more VRAM? I don't know why, but it gets stuck after 2 steps. I can use the vanilla AdEMAMix with my 16 GB card for Stable Diffusion LoRA training smoothly.
Ah, interesting. It shouldn't, since we are only dividing by a value that already exists and we aren't making a deepcopy of it.
Oh, maybe the precision? Could you make an 8-bit version? It seems this one is float32. bnb has an 8-bit implementation of the vanilla optimizer: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/bitsandbytes/optim/ademamix.py
I modified line 88 of the bnb implementation, under the # Update the EMAs section, in \Lib\site-packages\bitsandbytes\optim\ademamix.py.
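For anyone else editing that section, here is a minimal sketch of what the AdEMAMix EMA update does in plain PyTorch. This is just an illustration of the algorithm from the paper, not the actual bitsandbytes code; the function name and state keys are made up:

```python
import torch

def ademamix_ema_update(p, grad, state, lr, beta1, beta2, beta3, alpha, eps, step):
    # Fast EMA of the gradient (standard Adam-style first moment)
    state["m1"].mul_(beta1).add_(grad, alpha=1 - beta1)
    # Slow EMA of the gradient (the extra AdEMAMix moment)
    state["m2"].mul_(beta3).add_(grad, alpha=1 - beta3)
    # Second moment (EMA of squared gradients)
    state["nu"].mul_(beta2).addcmul_(grad, grad, value=1 - beta2)

    bias1 = 1 - beta1 ** step
    bias2 = 1 - beta2 ** step
    denom = (state["nu"] / bias2).sqrt_().add_(eps)
    # Mix the fast and slow EMAs in the numerator
    update = (state["m1"] / bias1 + alpha * state["m2"]) / denom
    p.add_(update, alpha=-lr)
```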
I also noticed that ADOPT has been updated with add_clip to help stabilize the early stages of training.
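As I understand it, the clipping added to ADOPT bounds the element-wise normalized gradient before it feeds the first-moment EMA. A rough sketch (the growing bound `step ** 0.25` is the value I've seen suggested; treat it and the function name as assumptions):

```python
import torch

def adopt_clipped_normalize(grad, v, step, eps=1e-6):
    # Normalize the gradient by the previous step's second-moment estimate
    normed = grad / torch.clamp(v.sqrt(), min=eps)
    # Clip element-wise; the bound grows with the step count,
    # which mainly stabilizes the earliest updates
    clip = step ** 0.25
    return normed.clamp_(-clip, clip)
```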
Try adding this: Cautious Optimizer (C-Optim): Improving Training with One Line of Code, and StableAdamW. Not sure if it's correct, but it seems pretty good judging by the training results.
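In case it saves someone some digging, here is roughly what those two tweaks do, as I understand them (a sketch, not the exact code from either repo; the function name and argument layout are made up). The cautious mask zeroes update components whose sign disagrees with the current gradient, and the StableAdamW-style clip shrinks the learning rate when the gradient is large relative to the second-moment estimate:

```python
import torch

def apply_cautious_stable(p, grad, update, v, lr, eps=1e-8):
    # Cautious mask (C-Optim): keep only the components of the update whose
    # sign agrees with the current gradient, rescaled so the average
    # magnitude is preserved
    mask = (update * grad > 0).to(update.dtype)
    mask = mask * (mask.numel() / (mask.sum() + 1))
    update = update * mask

    # StableAdamW-style clipping: shrink the learning rate when the gradient
    # is large relative to the second-moment estimate
    rms = (grad.pow(2) / v.clamp(min=eps ** 2)).mean().sqrt()
    step_lr = lr / max(1.0, rms.item())

    p.add_(update, alpha=-step_lr)
```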
That's great! I tried 60 epochs on my small datasets for SDXL LoRA training (also used with immiscible noise), and it seems better than the 150-epoch run with the old method.
@And233 Thank you. Another modification: add automatic warm-up like RAdam. You can try using a higher learning rate combined with a decay scheduler (e.g., 1e-3). From my testing, it looks much better than before. The changes again start at line 88 and continue down to the return loss.
Do you mean trying a scheduler like CosineAnnealing? I used to set a 20% warm-up and a constant scheduler with a high lr (5e-3). Could this RAdam take the place of the warm-up steps?
RAdam automatically adjusts the learning rate to wait until the variance estimate is reliable, so it should be able to replace warm-up.
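Concretely, the rectification factor that implements that "wait" looks like this (notation follows the RAdam paper; the helper name is illustrative):

```python
import math

def radam_rectification(beta2: float, step: int):
    """Return (use_adaptive_lr, rect) for the current step."""
    rho_inf = 2.0 / (1.0 - beta2) - 1.0
    # Length of the approximated simple moving average at this step
    rho_t = rho_inf - 2.0 * step * beta2 ** step / (1.0 - beta2 ** step)
    if rho_t <= 4.0:
        # Variance is not yet tractable: fall back to a plain momentum
        # update, which is what gives the built-in warm-up effect
        return False, 1.0
    # Variance rectification term from the RAdam paper
    rect = math.sqrt(
        ((rho_t - 4.0) * (rho_t - 2.0) * rho_inf)
        / ((rho_inf - 4.0) * (rho_inf - 2.0) * rho_t)
    )
    return True, rect
```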
FYI, mixing with AdEMAMix gives great performance:
https://github.com/edmondja/AdEMAMix-ADOPT-Optimizer-Pytorch/blob/main/AdEMAMix-ADOPT.py
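I haven't reproduced the linked file here, so purely as a sketch of what such a mix could look like: normalize ADOPT-style with the previous step's second moment plus clipping, then feed the normalized gradient into AdEMAMix's fast and slow EMAs. Bias corrections are omitted and every name below is illustrative; the linked file is the authoritative version:

```python
import torch

def mixed_step(p, grad, state, lr, beta1=0.9, beta2=0.9999, beta3=0.9999,
               alpha=5.0, eps=1e-6, step=1):
    # ADOPT-style: normalize by the *previous* second moment, then clip
    denom = torch.clamp(state["v"].sqrt(), min=eps)
    normed = (grad / denom).clamp_(-(step ** 0.25), step ** 0.25)

    # AdEMAMix-style: fast and slow EMAs of the normalized gradient
    state["m1"].mul_(beta1).add_(normed, alpha=1 - beta1)
    state["m2"].mul_(beta3).add_(normed, alpha=1 - beta3)

    p.add_(state["m1"] + alpha * state["m2"], alpha=-lr)

    # Update the second moment last, so step t+1 normalizes with v_t
    state["v"].mul_(beta2).addcmul_(grad, grad, value=1 - beta2)
```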