can't reproduce the results #8
What text are you using for inference?
yeah, i used
Can you please:
hm, train:
inference:
The images should be in your
fyi - I've got it working and I'm very impressed. I'm interested to know how to boost the quality / dimensions of the output; I'll have to dig into the docs.
I trained all his photos as "cinematic". I can then prime it with "photo of *", "pixelart of *", or "watercolor of *":
```sh
python scripts/txt2img.py --ddim_eta 0.0 \
  --n_samples 8 \
  --n_iter 2 \
  --scale 10.0 \
  --ddim_steps 50 \
  --embedding_path /home/jp/Documents/gitWorkspace/textual_inversion/logs/gregory_crewdson_-_Google_Search2022-08-24T23-09-43_leavanny_attempt_one/checkpoints/embeddings_gs-9999.pt \
  --ckpt_path ../stable-diffusion/models/ldm/text2img-large/model.ckpt \
  --prompt "pixelart of *"
```
@johndpope Glad to see some positive results 😄 As a temporary alternative, you should be able to just invert these results into the stable diffusion model and let it come up with new variations at a higher resolution (using just 'a photo of *').
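For anyone following along, here is a rough sketch of what that re-inversion into Stable Diffusion could look like, loosely following the commands in the repo README; the paths, run name, and step number are placeholders, and flag names should be double-checked against your own checkout:

```sh
# 1) Invert the selected LDM outputs into the Stable Diffusion model.
python main.py --base configs/stable-diffusion/v1-finetune.yaml \
  -t \
  --actual_resume /path/to/sd/model.ckpt \
  -n sd_reinversion \
  --gpus 0, \
  --data_root /path/to/selected/ldm/outputs

# 2) Sample new, higher-resolution variations with the learned embedding.
#    (The checkpoint flag may be --ckpt or --ckpt_path depending on the script version.)
python scripts/stable_txt2img.py --ddim_eta 0.0 \
  --n_samples 8 \
  --n_iter 2 \
  --scale 10.0 \
  --ddim_steps 50 \
  --embedding_path logs/<run_dir>/checkpoints/embeddings_gs-<step>.pt \
  --ckpt_path /path/to/sd/model.ckpt \
  --prompt "a photo of *"
```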
Hi, when I train the embedding and run the generation command, I can obtain samples that share some high-level similarity with my training inputs; however, they still look quite different in the details (far less similar than the demo images in the paper). Given that the reconstruction is perfect, is there a way to control the variation and make the generated samples look more similar to the inputs? Thanks!
@XavierXiao First of all, just to make sure, you're using the LDM version, yes? If that's the case, then you have several options:
Other than that, you'll see in our paper that we report that the results are typically 'best of 16'. There are certainly cases where only 3-4 images out of a batch of 16 were 'good'. And of course like with all txt2img models, some prompts just don't work. If you can show me some examples, I could maybe point you towards specific solutions.
Thanks! I am using the LDM version, with the default settings in the readme. I will give the things you mentioned a try, especially the LR. Here are some examples. I am trying to invert some images from MVTec for industrial quality inspection, and I attached the input (some capsules) and generated samples at 5k steps. Does this look reasonable? The inputs have very little variation (they look very similar to each other); could that be the cause?
The one on the right is more or less what I'd expect to get. If you're still having bad results during training, then seed changes etc. probably won't help. Either increase LR, or have a look at the output images and see if there's still progress, in which case you can probably just train for more time. I'll try a run myself and see what I can get. |
@andorxornot This is what I get with your data: Training outputs (@5k): Watercolor painting of *: A photo of * on the beach:
@XavierXiao I cropped out and trained on these 2 samples from your image: Current outputs @4k steps with default parameters: If you're using the default parameters but only 1 GPU, the difference might be because the LDM training script automatically scales LR by your number of GPUs and the batch size. Your effective LR is half of mine, which might be causing the difference. Can you try training with double the LR and letting me know if that improves things? If so, I might need to disable this scaling by default / add a warning to the readme. |
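As a rough illustration of that scaling (the base LR, GPU count, and batch size below are assumptions for the sake of the example, not values taken from this thread): with LR scaling enabled, the training script multiplies the config's base_learning_rate by the number of GPUs, the batch size, and the gradient-accumulation factor, so the effective LR works out along these lines:

```sh
# Illustrative arithmetic only; substitute the values from your own config.
# effective_lr = base_learning_rate * n_gpus * batch_size * accumulate_grad_batches
python3 -c 'print(5.0e-3 * 2 * 4 * 1)'   # 2 GPUs, batch size 4 -> 0.04
python3 -c 'print(5.0e-3 * 1 * 4 * 1)'   # 1 GPU,  batch size 4 -> 0.02 (half the effective LR)
```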
Thanks for the reply. I am using two GPUs, so that shouldn't be an issue. I tried a larger LR, but it is hard to say whether I obtained improvements; I can obtain results similar to yours. Obviously the resulting images are less realistic than the trash-container examples earlier in this thread, so maybe the input images are less familiar to the LDM model. Some possibly unrelated things:
I use the default config with bs=4, and I have 8 training images. Not sure what caused this.
@XavierXiao Both warnings should be fine.
Thanks for your tests! It seems that on one machine I had to raise the LR ten times.
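For anyone wanting to do the same, a minimal sketch of one way to raise it: the LDM configs expose the learning rate as model.base_learning_rate, so you can locate and edit that value in the finetune config you pass via --base (the config path below is an assumption based on the repo layout; check your own checkout):

```sh
# Find the current value, then edit that line in place (e.g. multiply it by 10)
# and rerun main.py with the same --base config.
grep -n "base_learning_rate" configs/latent-diffusion/txt2img-1p4B-finetune.yaml
```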
@andorxornot Well, if everything's working now, feel free to close the issue 😄 Otherwise let me know if you need more help |
@rinongal I think I'm having a similar issue, but I'm not familiar with the format of the learning rate in order to increase it. EDIT: I noticed I'm getting "RuntimeWarning: You are using". Also, this is trying to use the stable diffusion v1_finetune.yaml, and my samples_scaled all just look like noise at and well after 5000 global steps. Loss is pretty much staying at 1 or 0.99. I'll create a new issue if need be.
@XodrocSO I think it might be worth a new issue, but when you open it, could you please:
Hopefully that will be enough to get started on figuring out the problem :) |
@andorxornot Would it be convenient for you to share your images? |
Hi! I trained LDM with three images and the token "container":
Training lasted a few hours and the loss jumps around, but I got exactly the same result as without training:
The config is loaded correctly. Are there any logs besides the loss?