Add IP-adapter support for stable diffusion #766

Open
wants to merge 13 commits into base: main

Conversation


@JingyaHuang JingyaHuang commented Jan 23, 2025

What does this PR do?

This adds support for Stable Diffusion IP-adapters.

IP-adapter (Image Prompt adapter) is a Stable Diffusion add-on for using images as prompts, similar to Midjourney and DALL·E 3. You can use it to copy the style, composition, or a face from the reference image.

Fixes #718

  • Export of SD models loaded with IP-Adapter weights + image encoder
  • Ensure the caching works
  • Inference: add image encoder to the pipelines
  • Export via CLI
optimum-cli export neuron --model stable-diffusion-v1-5/stable-diffusion-v1-5 \
                          --ip_adapter_ids h94/IP-Adapter \
                          --ip_adapter_subfolders models \
                          --ip_adapter_weight_names ip-adapter-full-face_sd15.bin \
                          --ip_adapter_scales 0.5 \
                          --batch_size 1 --height 512 --width 512 --num_images_per_prompt 1 \
                          --auto_cast matmul --auto_cast_type bf16 ip_adapter_neuron/
  • Export via NeuronModel API
from optimum.neuron import NeuronStableDiffusionPipeline

model_id = "runwayml/stable-diffusion-v1-5"
compiler_args = {"auto_cast": "matmul", "auto_cast_type": "bf16"}
input_shapes = {"batch_size": 1, "height": 512, "width": 512}

stable_diffusion = NeuronStableDiffusionPipeline.from_pretrained(
    model_id, 
    export=True, 
    ip_adapter_ids="h94/IP-Adapter",
    ip_adapter_subfolders="models",
    ip_adapter_weight_names="ip-adapter-full-face_sd15.bin",
    ip_adapter_scales=0.5,
    **compiler_args, 
    **input_shapes,
)

# Save locally or upload to the HuggingFace Hub
save_directory = "ip_adapter_neuron/"
stable_diffusion.save_pretrained(save_directory)
  • Inference

    • With ip_adapter_image as input
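    A minimal inference sketch for this case (not taken verbatim from the PR): it assumes the pipeline compiled and saved in ip_adapter_neuron/ above, and that the Neuron pipeline accepts the diffusers-style ip_adapter_image argument; the reference image URL is a placeholder.

    from diffusers.utils import load_image
    from optimum.neuron import NeuronStableDiffusionPipeline

    # Reload the pre-compiled pipeline saved above; no re-export is needed.
    stable_diffusion = NeuronStableDiffusionPipeline.from_pretrained("ip_adapter_neuron/")

    # Placeholder reference image used as the image prompt.
    image = load_image("https://path/to/reference_face.png")

    images = stable_diffusion(
        prompt="a polar bear sitting in a chair drinking a milkshake",
        ip_adapter_image=image,
        negative_prompt="deformed, ugly, wrong proportion, low res, bad anatomy, worst quality, low quality",
        num_inference_steps=50,
    ).images[0]
    images.save("polar_bear.png")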
    • With ip_adapter_image_embeds as input (encode the image first)
    import torch

    # `image` is the reference image prompt from the example above; `generator` is an optional torch.Generator for reproducible results.
    image_embeds = stable_diffusion.prepare_ip_adapter_image_embeds(
        ip_adapter_image=image,
        ip_adapter_image_embeds=None,
        device=None,
        num_images_per_prompt=1,
        do_classifier_free_guidance=True,
    )
    torch.save(image_embeds, "image_embeds.ipadpt")

    image_embeds = torch.load("image_embeds.ipadpt")
    images = stable_diffusion(
        prompt="a polar bear sitting in a chair drinking a milkshake",
        ip_adapter_image_embeds=image_embeds,
        negative_prompt="deformed, ugly, wrong proportion, low res, bad anatomy, worst quality, low quality",
        num_inference_steps=100,
        generator=generator,
    ).images[0]

    images.save("polar_bear.png")

Next steps

  • Support multiple IP adapters
  • Ensure it works for SDXL
  • Extend the support for diffusion transformers
  • Documentation along with refactoring

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@JingyaHuang JingyaHuang changed the title Add IP-adapter support for stable diffusion (XL) Add IP-adapter support for stable diffusion Feb 5, 2025
optimum/commands/export/neuronx.py
@@ -186,22 +186,21 @@ def validate_model_outputs(
reference_model.eval()
inputs = config.generate_dummy_inputs(return_tuple=False, **input_shapes)
ref_inputs = config.unflatten_inputs(inputs)
if hasattr(reference_model, "config") and getattr(reference_model.config, "is_encoder_decoder", False):
if hasattr(reference_model, "config") and getattr(config._config, "is_encoder_decoder", False):

This is a rather obscure change: does the _config member come from the SD model?

@@ -522,6 +543,10 @@ def load_models_and_neuron_configs(
torch_dtype: Optional[Union[str, torch.dtype]] = None,
tensor_parallel_size: int = 1,
controlnet_ids: Optional[Union[str, List[str]]] = None,
ip_adapter_ids: Optional[Union[str, List[str]]] = None,

This method has too many parameters: please create an intermediate dataclass that can be derived for each model subtype and pass it as a parameter.
Example:

from dataclasses import dataclass
from typing import List, Optional, Union

@dataclass
class ModelAddon:
    pass

@dataclass
class LoraAddon(ModelAddon):
    lora_model_ids: Optional[Union[str, List[str]]]
    lora_weight_names: Optional[Union[str, List[str]]]
    lora_adapter_names: Optional[Union[str, List[str]]]
    lora_scales: Optional[Union[float, List[float]]]

@dataclass
class IPAdapterAddon(ModelAddon):
    ip_adapter_ids: Optional[Union[str, List[str]]] = None
    ip_adapter_subfolders: Optional[Union[str, List[str]]] = None
    ip_adapter_weight_names: Optional[Union[str, List[str]]] = None
    ip_adapter_scales: Optional[Union[float, List[float]]] = None
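
A possible follow-up sketch of how the export entry point could consume these addons (building on the dataclasses above; the helper name and dispatch logic are illustrative, not from the PR):

def apply_addons(model, addons: Optional[List[ModelAddon]] = None) -> None:
    # Illustrative dispatch on the addon type instead of one keyword argument per option.
    for addon in addons or []:
        if isinstance(addon, IPAdapterAddon) and addon.ip_adapter_ids is not None:
            # load_ip_adapter / set_ip_adapter_scale are the diffusers entry points for IP-Adapter weights.
            model.load_ip_adapter(
                addon.ip_adapter_ids,
                subfolder=addon.ip_adapter_subfolders,
                weight_name=addon.ip_adapter_weight_names,
            )
            model.set_ip_adapter_scale(addon.ip_adapter_scales)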

@@ -542,6 +567,10 @@ def load_models_and_neuron_configs(
}
if model is None:
model = TasksManager.get_model_from_task(**model_kwargs)
# Load IP-Adapter if it exists

Here you could just test the type of the ModelAddon parameter


or do a try-except...

@@ -166,6 +167,8 @@ def __init__(
num_beams: Optional[int] = None,
vae_scale_factor: Optional[int] = None,
encoder_hidden_size: Optional[int] = None,
image_encoder_sequence_length: Optional[int] = None,

Here also you should consider grouping parameters into dataclasses specific to each use case (not in this pull request, but in the upcoming refactoring one you mentioned).

@@ -846,6 +882,10 @@ def _export(
lora_adapter_names: Optional[Union[str, List[str]]] = None,
lora_scales: Optional[Union[float, List[float]]] = None,
controlnet_ids: Optional[Union[str, List[str]]] = None,
ip_adapter_ids: Optional[Union[str, List[str]]] = None,

Here reuse the same dataclass.


return image_embeds, uncond_image_embeds

def prepare_ip_adapter_image_embeds(

Where is this method called? Is it part of an expected API? If so, you should inherit from the corresponding abstract class.

@requires_neuronx
def test_export_with_dynamic_batch_size(self, test_name, name, model_name, task, neuron_config_constructor):
self._neuronx_export(test_name, name, model_name, task, neuron_config_constructor, dynamic_batch_size=True)
# class NeuronExportTestCase(unittest.TestCase):

You probably don't want to keep that commented out.


class CLIPVisionModelNeuronWrapper(torch.nn.Module):
def __init__(self, model, input_names: List[str]):
super().__init__()
self.model = model

Wouldn't it be worth checking if the model has vision_model and visual_projection attributes?
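
A sketch of the guard this comment suggests (hypothetical, not part of the PR), placed at the top of the wrapper's constructor:

import torch
from typing import List

class CLIPVisionModelNeuronWrapper(torch.nn.Module):
    def __init__(self, model, input_names: List[str]):
        super().__init__()
        # Fail early if the wrapped model does not expose the submodules the wrapper forwards to.
        if not (hasattr(model, "vision_model") and hasattr(model, "visual_projection")):
            raise ValueError(
                "CLIPVisionModelNeuronWrapper expects a model exposing `vision_model` and `visual_projection`."
            )
        self.model = model
        self.input_names = input_names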

@@ -71,3 +71,65 @@ def _get_add_time_ids(self, *args, **kwargs):
raise ValueError(
f"The pipeline type {self.auto_model_class} is not yet supported by Optimum Neuron, please open an request on: https://github.com/huggingface/optimum-neuron/issues."
)


class NeuronIPAdapterMixin:

This has nothing to do with IPAdapterMixin, right? Would it be worth mentioning where you adapted it from, as you did with NeuronStableDiffusionXLPipelineMixin?
