Prompt-Tuning for text-to-image diffusion models #2085
I'm not an expert on stable diffusion, but AFAIK, there is no special handling required to fine-tune the text encoder when it comes to PEFT itself. You can use LoRA or any of the other techniques that are implemented. In case the text encoder is using OpenClip or a similar architecture, you'll have to work based on the branch from #1324. When it comes to details like datasets and objectives for training the text encoder, this is outside my domain and you'll have a better chance looking at how other folks fine-tune the text encoder.
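For illustration, a minimal sketch of what "use LoRA" could look like on the stable diffusion text encoder; the hyperparameters and target modules below are assumptions for CLIP's attention layers, not recommendations from this thread:

```python
# A hypothetical sketch of applying LoRA to the CLIP text encoder of a
# Stable Diffusion pipeline; r, lora_alpha, and target_modules are assumed
# illustrative values.
from peft import LoraConfig, get_peft_model
from transformers import CLIPTextModel

text_encoder = CLIPTextModel.from_pretrained(
    "CompVis/stable-diffusion-v1-4", subfolder="text_encoder"
)
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    # query/value projections of CLIP's attention blocks; adjust for other
    # architectures
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.0,
)
text_encoder = get_peft_model(text_encoder, lora_config)
text_encoder.print_trainable_parameters()  # only the LoRA weights are trainable
```

Note that no task type is passed here; for LoRA-style methods, `get_peft_model` works on a plain `transformers` model without one.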
Thank you for sharing your knowledge/experience and that branch. So, from what I understood, it seems that LoRA is the only method under the PEFT umbrella implemented for the CLIP text encoder in stable diffusion (not prompt-tuning, P-tuning, or prefix-tuning). Please correct me if I'm wrong. Also, do you have any plans, now or in the future, to support the other three PEFT methods (prompt-tuning, P-tuning, and prefix-tuning) for stable diffusion (or, equivalently, for its CLIP text encoder, since those methods operate on the text input prompt), similar to what has already been implemented for LLMs?
You should be able to use prompt learning techniques such as prompt-tuning too. What I meant is that methods not based on prompt learning, such as LoRA, IA³, BOFT, etc., cannot be used on OpenClip-based text encoders without the branch from #1324.
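For reference, PEFT's documented prompt-tuning entry point looks like the following. This is the LLM-oriented usage from the PEFT docs, shown only to illustrate the API surface; the model and task type are illustrative assumptions, and whether this maps onto the SD text encoder is exactly the open question in this thread:

```python
# The standard PEFT prompt-tuning setup for a causal LM (illustrative; "gpt2"
# is a placeholder model, not the SD text encoder).
from peft import PromptTuningConfig, PromptTuningInit, TaskType, get_peft_model
from transformers import AutoModelForCausalLM

model_name = "gpt2"  # hypothetical placeholder
model = AutoModelForCausalLM.from_pretrained(model_name)

peft_config = PromptTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    num_virtual_tokens=8,
    # initialize the virtual tokens from the embeddings of a text snippet
    prompt_tuning_init=PromptTuningInit.TEXT,
    prompt_tuning_init_text="A photo of",
    tokenizer_name_or_path=model_name,
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()  # only the virtual-token embeddings train
```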
Yes, I got it, and I have also read discussion 761; thank you for the great contribution there. However, what matters for me now is this: I want to know whether there is any prompt-tuning implementation (any simple example would suffice) that shows how to use prompt-tuning in the peft library to fine-tune the text encoder in the stable diffusion pipeline (e.g. CompVis/stable-diffusion-v1-4). More specifically, I know that the peft implementation offers several TaskTypes for fine-tuning various types/categories of language models. But, honestly, as I am not an expert in language models, I am not sure which of those TaskTypes the text encoder in the diffusion pipeline (which is CLIP) falls under. So, as I could not find any resources or implementations on this, I am looking for a simple example of fine-tuning the CLIP text encoder of the diffusion pipeline using the existing implementation in the peft library. I hope I have stated my question more clearly now.
Unfortunately, I have also never come across a use case for fine-tuning the LM of an SD model, and there are no examples I'm aware of.
Okay, it is very helpful to know that there is at least a way of doing such a use case with PEFT. Thank you for putting time into this, and I will be waiting for your update. Please also let me know if you need more info or anything else from my side.
Hi @BenjaminBossan, I wanted to kindly ask if there is any update on this issue. Thanks!
As an update: I need to do something similar to the following simple script using the PEFT library, but I'm not sure what task type to use and what other changes need to be made in this script:
Note that you don't need to indicate a task type if the task you're training does not correspond to any of the existing ones. As for the rest, it really depends on the data you have, the training objective, etc. If you have an existing example that you want to modify to use PEFT, you can share it here and I can take a look.
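If no existing task type fits, one way to experiment is to implement prompt tuning by hand: freeze the CLIP text encoder and train only a small set of virtual-token embeddings prepended to every prompt. The sketch below is a hypothetical illustration written under assumptions (class name, hook trick, hyperparameters are all made up here), not an official PEFT recipe. Because `CLIPTextModel` has no `inputs_embeds` argument, it prepends dummy token ids and overwrites their embeddings via a forward hook:

```python
# Hand-rolled prompt tuning for the SD text encoder, without PEFT task types.
# Everything here is an illustrative assumption, not an established recipe.
import torch
import torch.nn as nn
from transformers import CLIPTextModel, CLIPTokenizer


class PromptTunedCLIPText(nn.Module):
    """Freezes a CLIP text encoder; learns only `num_virtual_tokens` embeddings."""

    def __init__(self, text_encoder: CLIPTextModel, num_virtual_tokens: int = 8):
        super().__init__()
        self.text_encoder = text_encoder
        self.n = num_virtual_tokens
        for p in self.text_encoder.parameters():
            p.requires_grad = False  # only the virtual tokens will be trained
        embed = text_encoder.get_input_embeddings()
        # initialize the virtual tokens from random vocabulary embeddings
        init_ids = torch.randint(0, embed.num_embeddings, (self.n,))
        self.virtual_tokens = nn.Parameter(embed.weight[init_ids].detach().clone())

        # overwrite the embeddings of the first n (dummy) positions with the
        # learnable virtual tokens; position embeddings, the causal mask, and
        # the rest of the encoder are left untouched
        def swap_prefix(module, inputs, output):
            out = output.clone()
            out[:, : self.n, :] = self.virtual_tokens.to(out.dtype)
            return out

        embed.register_forward_hook(swap_prefix)

    def forward(self, input_ids, attention_mask=None):
        batch = input_ids.shape[0]
        dummy = input_ids.new_zeros(batch, self.n)  # ids irrelevant, swapped above
        input_ids = torch.cat([dummy, input_ids], dim=1)
        if attention_mask is not None:
            attention_mask = torch.cat(
                [attention_mask.new_ones(batch, self.n), attention_mask], dim=1
            )
        return self.text_encoder(input_ids=input_ids, attention_mask=attention_mask)


tokenizer = CLIPTokenizer.from_pretrained(
    "CompVis/stable-diffusion-v1-4", subfolder="tokenizer"
)
text_encoder = CLIPTextModel.from_pretrained(
    "CompVis/stable-diffusion-v1-4", subfolder="text_encoder"
)
model = PromptTunedCLIPText(text_encoder, num_virtual_tokens=8)

# reserve room for the virtual tokens within CLIP's 77-token limit
tokens = tokenizer(
    ["a photo of a cat"],
    padding="max_length",
    max_length=tokenizer.model_max_length - 8,
    truncation=True,
    return_tensors="pt",
)
out = model(**tokens)
# out.last_hidden_state is what the SD UNet consumes via cross-attention
```

With this setup only `model.virtual_tokens` needs to be optimized and saved; the training loop, data, and objective would follow whatever SD fine-tuning recipe you are adapting.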
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Hi, I have been looking for a simple example/script that shows how to use the prompt-tuning technique in the PEFT library to fine-tune the text encoder of a stable diffusion model, but I could not find any. Could you please point me to one if it already exists? If there is no implementation, I would appreciate any help or available resources for fine-tuning the text encoder, with or without the PEFT library. Thanks!