Add IP-adapter support for stable diffusion #766

Open
wants to merge 13 commits into base: main

Conversation


@JingyaHuang JingyaHuang commented Jan 23, 2025

What does this PR do?

This adds support for Stable Diffusion IP-adapters.

IP-adapter (Image Prompt adapter) is a Stable Diffusion add-on for using images as prompts, similar to Midjourney and DALL·E 3. You can use it to copy the style, composition, or a face from the reference image.

Fixes #718

  • Export of SD models loaded with IP-Adapter weights + image encoder
  • Ensure the caching works
  • Inference: add image encoder to the pipelines
  • Export via CLI
optimum-cli export neuron --model stable-diffusion-v1-5/stable-diffusion-v1-5 \
                          --ip_adapter_ids h94/IP-Adapter \
                          --ip_adapter_subfolders models \
                          --ip_adapter_weight_names ip-adapter-full-face_sd15.bin \
                          --ip_adapter_scales 0.5 \
                          --batch_size 1 --height 512 --width 512 --num_images_per_prompt 1 \
                          --auto_cast matmul --auto_cast_type bf16 ip_adapter_neuron/
  • Export via NeuronModel API
from optimum.neuron import NeuronStableDiffusionPipeline

model_id = "runwayml/stable-diffusion-v1-5"
compiler_args = {"auto_cast": "matmul", "auto_cast_type": "bf16"}
input_shapes = {"batch_size": 1, "height": 512, "width": 512}

stable_diffusion = NeuronStableDiffusionPipeline.from_pretrained(
    model_id, 
    export=True, 
    ip_adapter_ids="h94/IP-Adapter",
    ip_adapter_subfolders="models",
    ip_adapter_weight_names="ip-adapter-full-face_sd15.bin",
    ip_adapter_scales=0.5,
    **compiler_args, 
    **input_shapes,
)

# Save locally or upload to the HuggingFace Hub
save_directory = "ip_adapter_neuron/"
stable_diffusion.save_pretrained(save_directory)
  • Inference

    • With ip_adapter_image as input
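    A minimal inference sketch for this case (not taken verbatim from the PR): it assumes the pipeline compiled and saved in ip_adapter_neuron/ above, and that the Neuron pipeline accepts the diffusers-style ip_adapter_image argument; the reference image URL is a placeholder.

    from diffusers.utils import load_image
    from optimum.neuron import NeuronStableDiffusionPipeline

    # Reload the pre-compiled pipeline saved above; no re-export is needed.
    stable_diffusion = NeuronStableDiffusionPipeline.from_pretrained("ip_adapter_neuron/")

    # Placeholder reference image used as the image prompt.
    image = load_image("https://path/to/reference_face.png")

    images = stable_diffusion(
        prompt="a polar bear sitting in a chair drinking a milkshake",
        ip_adapter_image=image,
        negative_prompt="deformed, ugly, wrong proportion, low res, bad anatomy, worst quality, low quality",
        num_inference_steps=50,
    ).images[0]
    images.save("polar_bear.png")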
    • With ip_adapter_image_embeds as input (encode the image first)
    import torch

    # `image` is the reference image prompt from the example above; `generator` is an optional torch.Generator for reproducible results.
    image_embeds = stable_diffusion.prepare_ip_adapter_image_embeds(
        ip_adapter_image=image,
        ip_adapter_image_embeds=None,
        device=None,
        num_images_per_prompt=1,
        do_classifier_free_guidance=True,
    )
    torch.save(image_embeds, "image_embeds.ipadpt")

    image_embeds = torch.load("image_embeds.ipadpt")
    images = stable_diffusion(
        prompt="a polar bear sitting in a chair drinking a milkshake",
        ip_adapter_image_embeds=image_embeds,
        negative_prompt="deformed, ugly, wrong proportion, low res, bad anatomy, worst quality, low quality",
        num_inference_steps=100,
        generator=generator,
    ).images[0]

    images.save("polar_bear.png")

Next steps

  • Support multiple IP adapters
  • Ensure it works for SDXL
  • Extend the support for diffusion transformers
  • Documentation along with refactoring

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@JingyaHuang JingyaHuang changed the title Add IP-adapter support for stable diffusion (XL) Add IP-adapter support for stable diffusion Feb 5, 2025
optimum/commands/export/neuronx.py
@@ -186,22 +186,21 @@ def validate_model_outputs(
reference_model.eval()
inputs = config.generate_dummy_inputs(return_tuple=False, **input_shapes)
ref_inputs = config.unflatten_inputs(inputs)
if hasattr(reference_model, "config") and getattr(reference_model.config, "is_encoder_decoder", False):
if hasattr(reference_model, "config") and getattr(config._config, "is_encoder_decoder", False):

This is a rather obscure change: does the _config member come from the SD model?

@@ -522,6 +543,10 @@ def load_models_and_neuron_configs(
torch_dtype: Optional[Union[str, torch.dtype]] = None,
tensor_parallel_size: int = 1,
controlnet_ids: Optional[Union[str, List[str]]] = None,
ip_adapter_ids: Optional[Union[str, List[str]]] = None,

This method has too many parameters: please create an intermediate dataclass that can be derived for each model subtype and pass it as a parameter.
Example:

from dataclasses import dataclass
from typing import List, Optional, Union

@dataclass
class ModelAddon:
    pass

@dataclass
class LoraAddon(ModelAddon):
    lora_model_ids: Optional[Union[str, List[str]]]
    lora_weight_names: Optional[Union[str, List[str]]]
    lora_adapter_names: Optional[Union[str, List[str]]]
    lora_scales: Optional[Union[float, List[float]]]

@dataclass
class IPAdapterAddon(ModelAddon):
    ip_adapter_ids: Optional[Union[str, List[str]]] = None
    ip_adapter_subfolders: Optional[Union[str, List[str]]] = None
    ip_adapter_weight_names: Optional[Union[str, List[str]]] = None
    ip_adapter_scales: Optional[Union[float, List[float]]] = None
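
A possible follow-up sketch of how the export entry point could consume these addons (building on the dataclasses above; the helper name and dispatch logic are illustrative, not from the PR):

def apply_addons(model, addons: Optional[List[ModelAddon]] = None) -> None:
    # Illustrative dispatch on the addon type instead of one keyword argument per option.
    for addon in addons or []:
        if isinstance(addon, IPAdapterAddon) and addon.ip_adapter_ids is not None:
            # load_ip_adapter / set_ip_adapter_scale are the diffusers entry points for IP-Adapter weights.
            model.load_ip_adapter(
                addon.ip_adapter_ids,
                subfolder=addon.ip_adapter_subfolders,
                weight_name=addon.ip_adapter_weight_names,
            )
            model.set_ip_adapter_scale(addon.ip_adapter_scales)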

@@ -542,6 +567,10 @@ def load_models_and_neuron_configs(
}
if model is None:
model = TasksManager.get_model_from_task(**model_kwargs)
# Load IP-Adapter if it exists

Here you could just test the type of the ModelAddon parameter


or do a try-except...

@@ -166,6 +167,8 @@ def __init__(
num_beams: Optional[int] = None,
vae_scale_factor: Optional[int] = None,
encoder_hidden_size: Optional[int] = None,
image_encoder_sequence_length: Optional[int] = None,

Here also you should consider grouping parameters into dataclasses specific to each use case (not in this pull request, but in the upcoming refactoring one you mentioned).

@@ -846,6 +882,10 @@ def _export(
lora_adapter_names: Optional[Union[str, List[str]]] = None,
lora_scales: Optional[Union[float, List[float]]] = None,
controlnet_ids: Optional[Union[str, List[str]]] = None,
ip_adapter_ids: Optional[Union[str, List[str]]] = None,

Here reuse the same dataclass.


return image_embeds, uncond_image_embeds

def prepare_ip_adapter_image_embeds(

Where is this method called? Is it part of an expected API? If so, you should inherit from the corresponding abstract class.

@requires_neuronx
def test_export_with_dynamic_batch_size(self, test_name, name, model_name, task, neuron_config_constructor):
self._neuronx_export(test_name, name, model_name, task, neuron_config_constructor, dynamic_batch_size=True)
# class NeuronExportTestCase(unittest.TestCase):

You probably don't want to keep that commented out.


class CLIPVisionModelNeuronWrapper(torch.nn.Module):
def __init__(self, model, input_names: List[str]):
super().__init__()
self.model = model

Wouldn't it be worth checking if the model has vision_model and visual_projection attributes?
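
A sketch of the guard this comment suggests (hypothetical, not part of the PR), placed at the top of the wrapper's constructor:

import torch
from typing import List

class CLIPVisionModelNeuronWrapper(torch.nn.Module):
    def __init__(self, model, input_names: List[str]):
        super().__init__()
        # Fail early if the wrapped model does not expose the submodules the wrapper forwards to.
        if not (hasattr(model, "vision_model") and hasattr(model, "visual_projection")):
            raise ValueError(
                "CLIPVisionModelNeuronWrapper expects a model exposing `vision_model` and `visual_projection`."
            )
        self.model = model
        self.input_names = input_names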

@@ -71,3 +71,65 @@ def _get_add_time_ids(self, *args, **kwargs):
raise ValueError(
f"The pipeline type {self.auto_model_class} is not yet supported by Optimum Neuron, please open an request on: https://github.com/huggingface/optimum-neuron/issues."
)


class NeuronIPAdapterMixin:

This has nothing to do with IPAdapterMixin, right? Would it be worth mentioning where you adapted it from, as you did with NeuronStableDiffusionXLPipelineMixin?
