can't reproduce the results #8
What text are you using for inference?
yeah, i used
Can you please:
hm, train:
inference:
The images should be in your
fyi - I've got it working and I'm very impressed. I'm interested to know how to boost the quality / dimensions of the output; I'll have to dig into the docs.
I trained all his photos as "cinematic". I can then prime it with "photo of *", "pixelart of *", or "watercolor of *":
```sh
python scripts/txt2img.py --ddim_eta 0.0 \
  --n_samples 8 \
  --n_iter 2 \
  --scale 10.0 \
  --ddim_steps 50 \
  --embedding_path /home/jp/Documents/gitWorkspace/textual_inversion/logs/gregory_crewdson_-_Google_Search2022-08-24T23-09-43_leavanny_attempt_one/checkpoints/embeddings_gs-9999.pt \
  --ckpt_path ../stable-diffusion/models/ldm/text2img-large/model.ckpt \
  --prompt "pixelart of *"
```
@johndpope Glad to see some positive results 😄 As a temporary alternative, you should be able to just invert these results into the stable diffusion model and let it come up with new variations at a higher resolution (using just 'a photo of *').
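For anyone following along, here is a rough sketch of what that re-inversion into Stable Diffusion could look like, loosely following the commands in the repo README; the paths, run name, and step number are placeholders, and flag names should be double-checked against your own checkout:

```sh
# 1) Invert the selected LDM outputs into the Stable Diffusion model.
python main.py --base configs/stable-diffusion/v1-finetune.yaml \
  -t \
  --actual_resume /path/to/sd/model.ckpt \
  -n sd_reinversion \
  --gpus 0, \
  --data_root /path/to/selected/ldm/outputs

# 2) Sample new, higher-resolution variations with the learned embedding.
#    (The checkpoint flag may be --ckpt or --ckpt_path depending on the script version.)
python scripts/stable_txt2img.py --ddim_eta 0.0 \
  --n_samples 8 \
  --n_iter 2 \
  --scale 10.0 \
  --ddim_steps 50 \
  --embedding_path logs/<run_dir>/checkpoints/embeddings_gs-<step>.pt \
  --ckpt_path /path/to/sd/model.ckpt \
  --prompt "a photo of *"
```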
Hi, when I train the embedding and run the generation command, I can obtain samples that share some high-level similarity with my training inputs; however, they still look quite different in the details (far less similar than the demo images in the paper). Given that the reconstruction is perfect, is there a way to control the variation and make the generated samples look more similar to the inputs? Thanks!
@XavierXiao First of all, just to make sure, you're using the LDM version, yes? If that's the case, then you have several options:
Other than that, you'll see in our paper that we report that the results are typically 'best of 16'. There are certainly cases where only 3-4 images out of a batch of 16 were 'good'. And of course like with all txt2img models, some prompts just don't work. If you can show me some examples, I could maybe point you towards specific solutions.
Thanks! I am using the LDM version, with the default settings in the readme. I will give the things you mentioned a try, especially the LR. Here are some examples. I am trying to invert some images from MVTec for industrial quality inspection, and I attached the input (some capsules) and generated samples at 5k steps. Does this look reasonable? The inputs have very little variation (they look very similar to each other); could that be the cause?
The one on the right is more or less what I'd expect to get. If you're still having bad results during training, then seed changes etc. probably won't help. Either increase LR, or have a look at the output images and see if there's still progress, in which case you can probably just train for more time. I'll try a run myself and see what I can get. |
@andorxornot This is what I get with your data: Training outputs (@5k): Watercolor painting of *: A photo of * on the beach:
@XavierXiao I cropped out and trained on these 2 samples from your image: Current outputs @4k steps with default parameters: If you're using the default parameters but only 1 GPU, the difference might be because the LDM training script automatically scales LR by your number of GPUs and the batch size. Your effective LR is half of mine, which might be causing the difference. Can you try training with double the LR and letting me know if that improves things? If so, I might need to disable this scaling by default / add a warning to the readme. |
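As a rough illustration of that scaling (the base LR, GPU count, and batch size below are assumptions for the sake of the example, not values taken from this thread): with LR scaling enabled, the training script multiplies the config's base_learning_rate by the number of GPUs, the batch size, and the gradient-accumulation factor, so the effective LR works out along these lines:

```sh
# Illustrative arithmetic only; substitute the values from your own config.
# effective_lr = base_learning_rate * n_gpus * batch_size * accumulate_grad_batches
python3 -c 'print(5.0e-3 * 2 * 4 * 1)'   # 2 GPUs, batch size 4 -> 0.04
python3 -c 'print(5.0e-3 * 1 * 4 * 1)'   # 1 GPU,  batch size 4 -> 0.02 (half the effective LR)
```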
Thanks for the reply. I am using two GPUs, so that shouldn't be an issue. I tried a larger LR, but it is hard to say whether I obtained improvements; I can obtain results similar to yours. Obviously the resulting images are less realistic than the trash-container examples earlier in this thread, so maybe the input images are less familiar to the LDM model. Some possibly unrelated things:
I use the default config with bs=4, and I have 8 training images. Not sure what caused this.
@XavierXiao Both warnings should be fine.
Thanks for your tests! It seems that on one machine I had to raise the LR ten times.
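For anyone wanting to do the same, a minimal sketch of one way to raise it: the LDM configs expose the learning rate as model.base_learning_rate, so you can locate and edit that value in the finetune config you pass via --base (the config path below is an assumption based on the repo layout; check your own checkout):

```sh
# Find the current value, then edit that line in place (e.g. multiply it by 10)
# and rerun main.py with the same --base config.
grep -n "base_learning_rate" configs/latent-diffusion/txt2img-1p4B-finetune.yaml
```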
@andorxornot Well, if everything's working now, feel free to close the issue 😄 Otherwise let me know if you need more help |
@rinongal I think I'm having a similar issue, but I'm not familiar with the format of the learning rate in order to increase it. EDIT: I noticed I'm getting "RuntimeWarning: You are using". Also, this is trying to use the stable diffusion v1_finetune.yaml, and my samples_scaled all just look like noise at and well after 5000 global steps. Loss is pretty much staying at 1 or 0.99. I'll create a new issue if need be.
@XodrocSO I think it might be worth a new issue, but when you open it, could you please:
Hopefully that will be enough to get started on figuring out the problem :) |
@andorxornot Would it be convenient for you to share your images? |
Hi! I trained LDM with three images and the token "container":
Training lasted a few hours and the loss jumps around, but I got exactly the same result as without training:
The config is loaded correctly. Are there any logs besides the loss?