
Prompt-Tuning for text-to-image diffusion models #2085

Closed
AHHHZ975 opened this issue Sep 22, 2024 · 10 comments

Comments


AHHHZ975 commented Sep 22, 2024

Hi, I have been looking for a simple example/script that shows how I can use the prompt-tuning technique in the PEFT library to fine-tune the text encoder of a stable diffusion model, but I could not find any. Could you please point me to one if it already exists? If there is no implementation, I would appreciate any help or available resources for fine-tuning the text encoder, with or without the PEFT library. Thanks!

@BenjaminBossan
Member

I'm not an expert on stable diffusion, but AFAIK, there is no special handling required to fine-tune the text encoder when it comes to PEFT itself. You can use LoRA or any of the other techniques that are implemented. In case the text encoder is using OpenClip or a similar architecture, you'll have to work based on the branch from #1324, as the MultiheadAttention layer is being used and the PR to support it is not merged yet.

When it comes to details like datasets and objectives for training the text encoder, this is outside my domain and you'll have a better chance looking at how other folks fine-tune the text encoder.


AHHHZ975 commented Sep 23, 2024

Thank you for sharing your knowledge/experience and that branch. So, from what I understand, it seems that LoRA is the only PEFT method implemented for the CLIP text encoder in stable diffusion (not prompt-tuning, P-tuning, or prefix-tuning). Please correct me if I'm wrong.

Also, do you have any plans, now or in the future, to support the other three methods (prompt-tuning, P-tuning, and prefix-tuning) for stable diffusion, or equivalently for its CLIP text encoder, since those methods operate on the text input prompt, similar to what has already been implemented for LLMs?

@BenjaminBossan
Member

You should be able to use prompt learning techniques such as prompt-tuning too. What I meant is that methods not based on prompt learning, such as LoRA, IA³, BOFT, etc. cannot be used on MultiheadAttention layers, but for LoRA, there is a branch that implements it.


AHHHZ975 commented Sep 24, 2024

Yes, I got it. I have also read discussion #761, and thank you for the great contribution there.

However, what matters to me now is this: I want to know whether there is any prompt-tuning implementation (any simple example would suffice) that shows how to use prompt-tuning in the peft library to fine-tune the text encoder in the stable diffusion pipeline (e.g. CompVis/stable-diffusion-v1-4). More specifically, I know that the peft implementation offers several TaskTypes for fine-tuning different types/categories of language models. But, honestly, as I am not an expert in language models, I am not sure which of those TaskTypes the text encoder in the diffusion pipeline (i.e. CLIP) falls under. Since I could not find any resources or implementation for this, I am looking for a simple example of fine-tuning the CLIP text encoder of the diffusion pipeline using the existing peft library. I hope my question is clearer now.

@BenjaminBossan
Member

Unfortunately, I have also never come across a use case for fine-tuning the LM of an SD model, and there are no examples I'm aware of. Note that TaskType is optional, so even if your task is not listed, you can still use PEFT. If you have an existing example of fine-tuning the LM part of an SD model and want to adapt it to PEFT, that would be very helpful. I could check it and see what needs to be changed.

@AHHHZ975
Author

Okay, it is very helpful to know that there is at least a way of doing such a use case with PEFT. Thank you for putting time into this; I will be waiting for your update. Please also let me know if you need more information or anything else from my side.

@AHHHZ975
Author

Hi @BenjaminBossan, I wanted to kindly know if there is any update on this issue. Thanks!

@AHHHZ975
Author

As an update: I need to do something similar to the following simple script using the PEFT library, but I'm not sure what task type to use and what other changes need to be made in this script:

from peft import get_peft_model, PromptTuningConfig, TaskType
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")

# SEQ_2_SEQ_LM is a guess; I'm not sure which TaskType fits CLIP's text encoder
peft_config = PromptTuningConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    num_virtual_tokens=5,
    token_dim=768,
    num_layers=12,
    tokenizer_name_or_path="CompVis/stable-diffusion-v1-4"
)

# Apply PEFT to the text encoder of the pipeline
model = get_peft_model(pipe.text_encoder, peft_config)

@BenjaminBossan
Member

Note that you don't need to indicate a task type if the task you're training does not correspond to any of the existing ones. As to the rest, it really depends on the data you have, the training objective, etc. If you have an existing example that you want to modify to use PEFT, you can share it here and I can check.


This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

@github-actions github-actions bot closed this as completed Nov 1, 2024