LoRA
LoRA is a method for quick fine-tuning of diffusion models. What you end up with after training is a much smaller 'model' that works together with your other models
LoRA injects trainable layers that steer the cross-attention layers in multiple parts of the original model
It was originally introduced by Microsoft as a way of tuning Large Language Models and was later adopted for other use cases
In the case of Stable Diffusion, current LoRA implementations can create layers for the diffusion model itself (the UNet denoiser) as well as for the text encoder
Currently, the most popular ways to train LoRA are:
- https://github.com/cloneofsimo
  Original adaptation of LoRA for Stable Diffusion
- https://huggingface.co/docs/diffusers/main/en/training/lora
  The original author's collaboration with Huggingface, as LoRA support is added to the diffusers library
- https://github.com/kohya-ss/sd-scripts
  Standalone script(s)
  Originally based on CloneofSimo's work, but since heavily modified
- https://github.com/d8ahazard/sd_dreambooth_extension
  Extension for Automatic WebUI
  Uses ideas from CloneofSimo's work, but adapted to work with the existing dreambooth training workflow
How to use LoRA?
Simply add it to your prompt and optionally use any activation tags/keywords:
photo of "sara" in the city <lora:lora-sara:1.0>
Where:
- sara is the activation tag set during model training
- lora-sara is the name of the LoRA model
- 1.0 is the activation strength
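The tag syntax above can be illustrated with a small parser. This is only a sketch: the `LORA_TAG` pattern and the `split_prompt` helper are hypothetical names made up for this example, and this is not how the WebUI itself processes prompts.

```python
import re

# Matches tags of the form <lora:NAME:STRENGTH>, as used in the prompt above.
# Hypothetical pattern for illustration only.
LORA_TAG = re.compile(r"<lora:([^:>]+):([0-9]*\.?[0-9]+)>")


def split_prompt(prompt):
    """Return (prompt text without LoRA tags, list of (name, strength) pairs)."""
    loras = [(name, float(strength)) for name, strength in LORA_TAG.findall(prompt)]
    # Remove the tags and collapse any leftover whitespace
    text = " ".join(LORA_TAG.sub("", prompt).split())
    return text, loras


text, loras = split_prompt('photo of "sara" in the city <lora:lora-sara:1.0>')
print(text)   # prompt text, including the activation tag "sara"
print(loras)  # LoRA model names with their activation strengths
```

Note that the activation tag stays in the prompt text, while the `<lora:...>` tag only selects the LoRA model and its strength.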
The method chosen in this repository is Kohya's, and its repository is registered as a submodule in /modules/lora
However, the solution is heavily wrapped in custom pre-processing and post-processing scripts to make it work with the existing training workflow
cli/train-lora.py
Steps:
- Pre-process input images
- Prepare captions and tags
- Create metadata file
- Create VAE normalization latents
- Run actual training
Processing and training can be split into separate steps, which allows you to batch-process and prepare multiple datasets before the actual training. For example:
- run train-lora.py --notrain to run only the processing steps
- run train-lora.py --noprocess to run only the training steps
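The way two such flags can gate the processing and training stages might be sketched as follows. This is a minimal illustration using argparse. It is not the actual implementation of train-lora.py, and the `plan` helper and stage names are assumptions made for the example.

```python
import argparse


def build_parser():
    # Only the two flags described above are modelled here
    parser = argparse.ArgumentParser(description="sketch of a process/train split")
    parser.add_argument("--notrain", action="store_true", help="run processing steps only")
    parser.add_argument("--noprocess", action="store_true", help="run training steps only")
    return parser


def plan(argv):
    """Return the list of stages that would run for the given arguments."""
    args = build_parser().parse_args(argv)
    stages = []
    if not args.noprocess:
        stages.append("process")  # hypothetical stage name
    if not args.notrain:
        stages.append("train")    # hypothetical stage name
    return stages


print(plan([]))
print(plan(["--notrain"]))
print(plan(["--noprocess"]))
```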
Pre-processing is a highly complex and customizable process performed by cli/modules/process.py and includes a number of optional operations
Note that pre-processing requires the WebUI server to be running, as it uses existing models for captioning, face restoration, etc.
However, that can cause memory issues, as LoRA training is memory-intensive. To avoid this:
- run train-lora.py --shutdown to automatically shut down the WebUI server after processing and before training
There is a large number of additional tunable parameters (although far fewer than in the underlying solution, as many values are predetermined based on best practices), but the minimum that should be provided is:
- --model MODEL: original model to use as a base for training
- --input INPUT: input folder with training images
- --output OUTPUT: LoRA name
- --tag TAG: primary tag word(s) that can be used for model activation in prompts
- --dir DIR: folder containing LoRA checkpoints
Additionally, depending on your training dataset, you may want to adjust:
- --steps STEPS: total number of training steps
  Adjust based on the size of your dataset, as a larger dataset requires more steps to train
- --dim DIM: network dimension, which is the actual size of the created LoRA
  This determines its capacity to learn and should be proportional to the size and complexity of the training dataset
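To get a feel for how the network dimension affects LoRA size, the sketch below counts trainable parameters for a single linear layer. It assumes that --dim corresponds to the rank r of the two low-rank matrices (A of shape r x d_in and B of shape d_out x r) that LoRA adds next to a frozen d_out x d_in weight. The 768 x 768 layer size is an illustrative assumption, not a value taken from this repository.

```python
def full_params(d_in, d_out):
    """Parameters in the original (frozen) weight matrix."""
    return d_in * d_out


def lora_params(d_in, d_out, rank):
    """Trainable parameters added by LoRA: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


d_in = d_out = 768  # illustrative layer size (assumption)
for rank in (8, 64, 192):
    added = lora_params(d_in, d_out, rank)
    ratio = added / full_params(d_in, d_out)
    print(f"dim={rank}: {added} trainable parameters ({ratio:.1%} of the full layer)")
```

The count grows linearly with the dimension, so a larger value gives more capacity but also a proportionally larger LoRA file.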
Example:
train-lora.py --model /models/stable-diffusion/sd-v15-runwayml.ckpt --dir /models/lora --tag sara --output lora-sara --input sara-images/ --dim 192 --steps 20000
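When preparing several datasets, the command line can be assembled programmatically. The sketch below only builds and prints the commands, using the flags listed above. The `build_command` helper and the second dataset ("john") are hypothetical, and combining --notrain with the other flags in one call is an assumption based on the description of batch-processing.

```python
import shlex


def build_command(model, lora_dir, tag, output, input_dir, dim=None, steps=None, extra=()):
    """Assemble a train-lora.py argument list from the documented flags."""
    cmd = ["train-lora.py", "--model", model, "--dir", lora_dir,
           "--tag", tag, "--output", output, "--input", input_dir]
    if dim is not None:
        cmd += ["--dim", str(dim)]
    if steps is not None:
        cmd += ["--steps", str(steps)]
    cmd += list(extra)
    return cmd


# "sara" is the dataset from the example above; "john" is a made-up second one
datasets = [("sara", "sara-images/"), ("john", "john-images/")]
for tag, folder in datasets:
    cmd = build_command(
        model="/models/stable-diffusion/sd-v15-runwayml.ckpt",
        lora_dir="/models/lora",
        tag=tag,
        output=f"lora-{tag}",
        input_dir=folder,
        extra=["--notrain"],  # processing only; training would be run later
    )
    print(shlex.join(cmd))
```

The resulting lists could be passed to subprocess.run once the commands look right.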