Fine-tune Stable Diffusion on medical images #11589
Replies: 2 comments 1 reply
-
Hi! You're being too generic to receive real help: you're doing something very specific but only providing very basic information. From what you've shared we can try to guess, but essentially the only way to get real help is if you provide the full details of what you're doing.
What I can say is that if you're looking to get something realistic and detailed from a diffusion model that you can use in the real world, you will need to do a full fine-tune of the model, and with a lot more images than 2,500; most full fine-tunes use at least a couple of million images, and even then you will still see some artifacts and errors, as you can in existing fine-tuned models. If you just need lower-quality, somewhat random generations for a specific class like "cancer", you can probably do that with a LoRA like you're doing.
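To make the "full fine-tune" suggestion concrete, this is a minimal sketch of how the diffusers example script is usually launched. It assumes `accelerate config` has already been run; the model id, dataset layout, paths, and hyperparameter values are illustrative assumptions, not tested settings:

```python
# Minimal sketch of a full fine-tune (no LoRA) with the diffusers
# example script. Assumes ./medical_dataset is an imagefolder dataset
# with a metadata.jsonl providing a "text" caption column.
# All paths and hyperparameters here are illustrative assumptions.
import subprocess

subprocess.run([
    "accelerate", "launch", "train_text_to_image.py",
    "--pretrained_model_name_or_path", "runwayml/stable-diffusion-v1-5",
    "--train_data_dir", "./medical_dataset",
    "--caption_column", "text",
    "--resolution", "512",
    "--train_batch_size", "1",
    "--gradient_accumulation_steps", "4",
    "--gradient_checkpointing",
    "--mixed_precision", "fp16",
    "--learning_rate", "1e-5",
    "--max_train_steps", "15000",
    "--output_dir", "./sd-medical-full",
], check=True)
```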
-
Hi @asomoza, thanks for your answer. Before SD I tried to train an LDM (from the original CompVis GitHub repo, not from HF) class-conditioned on a smaller dataset, and after fine-tuning the VAE and the UNet I got pretty good results, so I hoped I could at least get reasonable images from SD fine-tuning. Maybe in this case too (SD instead of LDM) I should fine-tune the UNet and VAE (and the text encoder? I'm not sure) before training the LoRA? If so, how can I do this? With the train_text_to_image.py script (without LoRA)? Thanks again
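In other words, the two-stage workflow I'm imagining is roughly the following; this is only a sketch under the assumption that a prior full run wrote its checkpoint to ./sd-medical-full, and the paths and hyperparameters are placeholders:

```python
# Rough sketch of the two-stage idea: train the LoRA on top of the
# fully fine-tuned checkpoint instead of the base model. The paths
# below are assumptions (./sd-medical-full from a prior full run).
import subprocess

subprocess.run([
    "accelerate", "launch", "train_text_to_image_lora.py",
    "--pretrained_model_name_or_path", "./sd-medical-full",
    "--train_data_dir", "./medical_dataset",
    "--caption_column", "text",
    "--resolution", "512",
    "--train_batch_size", "1",
    "--learning_rate", "1e-4",
    "--max_train_steps", "5000",
    "--rank", "8",
    "--output_dir", "./sd-medical-lora",
], check=True)
```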
-
Hi all,
I'm trying to fine-tune an SD model on a custom medical dataset. The goal is to be able to create new samples from a text prompt.
I have ~20 different classes and, for each image, a binary mask of the relevant region that also determines the image's class.
I wrote ~30 generic caption templates into which I insert the class name, plus a location and size derived from the binary mask, and from those I build an image-caption dataset (~2,500 pairs).
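For illustration, the caption-generation step looks roughly like this; the file layout, the class-in-filename convention, the templates, and the size threshold are all simplified assumptions:

```python
# Sketch of the caption-generation step. Assumes masks live in ./masks
# as PNGs, the class name is encoded in the filename, each mask is
# non-empty, and images sit in medical_dataset/images/. The templates
# and the 5% area threshold are arbitrary illustrative choices.
import json
import random
from pathlib import Path

import numpy as np
from PIL import Image

TEMPLATES = [
    "a scan showing a {size} {cls} in the {loc} region",
    "medical image with a {size} {cls} located at the {loc}",
]

def describe_mask(mask: np.ndarray) -> tuple[str, str]:
    """Derive coarse location and size words from a binary mask."""
    ys, xs = np.nonzero(mask)
    cy, cx = ys.mean() / mask.shape[0], xs.mean() / mask.shape[1]
    loc = ("upper " if cy < 0.5 else "lower ") + ("left" if cx < 0.5 else "right")
    size = "small" if mask.mean() < 0.05 else "large"
    return loc, size

records = []
for mask_path in sorted(Path("masks").glob("*.png")):
    mask = np.array(Image.open(mask_path).convert("L")) > 0
    loc, size = describe_mask(mask)
    cls = mask_path.stem.split("_")[0]  # e.g. "tumor_0001.png" -> "tumor"
    caption = random.choice(TEMPLATES).format(cls=cls, loc=loc, size=size)
    records.append({"file_name": f"images/{mask_path.name}", "text": caption})

# HF imagefolder datasets pick up captions from a metadata.jsonl that
# maps each file_name to a "text" column (the script's default).
with open("medical_dataset/metadata.jsonl", "w") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")
```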
Next I run the train_text_to_image_lora.py script on this data, but I'm getting invalid images (how I generate with the trained LoRA is sketched after these questions).
What am I doing wrong?
Should I train a separate LoRA for each class?
Should I fine-tune the entire SD model first? The VAE/tokenizer/UNet? I saw that they are frozen in the train_text_to_image_lora.py script.
Do you know of any hyperparameters I should tweak?
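This is roughly how I generate with the trained LoRA, in case the problem is in that step; the model id, LoRA path, and prompt are simplified placeholders:

```python
# Sanity check: load the base model, attach the trained LoRA, and
# generate from one of the training captions. Paths are assumptions.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.load_lora_weights("./sd-medical-lora")  # directory written by the script

image = pipe(
    "a scan showing a small tumor in the upper left region",
    num_inference_steps=30,
    guidance_scale=7.5,
).images[0]
image.save("lora_check.png")
```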
I'd appreciate any help.
Thanks