Add IP-adapter support for stable diffusion #766
base: main
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@@ -186,22 +186,21 @@ def validate_model_outputs(
     reference_model.eval()
     inputs = config.generate_dummy_inputs(return_tuple=False, **input_shapes)
     ref_inputs = config.unflatten_inputs(inputs)
-    if hasattr(reference_model, "config") and getattr(reference_model.config, "is_encoder_decoder", False):
+    if hasattr(reference_model, "config") and getattr(config._config, "is_encoder_decoder", False):
This is a rather obscure change: does the _config member come from the SD model?
@@ -522,6 +543,10 @@ def load_models_and_neuron_configs(
     torch_dtype: Optional[Union[str, torch.dtype]] = None,
     tensor_parallel_size: int = 1,
     controlnet_ids: Optional[Union[str, List[str]]] = None,
+    ip_adapter_ids: Optional[Union[str, List[str]]] = None,
This method has too many parameters: please create an intermediate dataclass that can be derived for each model subtype and pass it as parameter.
Example:
from dataclasses import dataclass
from typing import List, Optional, Union

@dataclass
class ModelAddon:
    pass

@dataclass
class LoraAddon(ModelAddon):
    lora_model_ids: Optional[Union[str, List[str]]] = None
    lora_weight_names: Optional[Union[str, List[str]]] = None
    lora_adapter_names: Optional[Union[str, List[str]]] = None
    lora_scales: Optional[Union[float, List[float]]] = None

@dataclass
class IPAdapterAddon(ModelAddon):
    ip_adapter_ids: Optional[Union[str, List[str]]] = None
    ip_adapter_subfolders: Optional[Union[str, List[str]]] = None
    ip_adapter_weight_names: Optional[Union[str, List[str]]] = None
    ip_adapter_scales: Optional[Union[float, List[float]]] = None
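For illustration, grouping the arguments this way collapses the four `ip_adapter_*` keyword arguments into a single parameter. A minimal sketch, assuming the reviewer's suggested (not-yet-existing) classes, with values mirroring the compilation command in this PR's description:

```python
from dataclasses import dataclass
from typing import List, Optional, Union

@dataclass
class ModelAddon:
    pass

@dataclass
class IPAdapterAddon(ModelAddon):
    ip_adapter_ids: Optional[Union[str, List[str]]] = None
    ip_adapter_subfolders: Optional[Union[str, List[str]]] = None
    ip_adapter_weight_names: Optional[Union[str, List[str]]] = None
    ip_adapter_scales: Optional[Union[float, List[float]]] = None

# One addon argument instead of four keyword arguments in the signature.
addon = IPAdapterAddon(
    ip_adapter_ids="h94/IP-Adapter",
    ip_adapter_subfolders="models",
    ip_adapter_weight_names="ip-adapter-full-face_sd15.bin",
    ip_adapter_scales=0.5,
)
```

Each model subtype would then only receive the addon types it supports, keeping the loader signature stable as new adapters are added.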
@@ -542,6 +567,10 @@ def load_models_and_neuron_configs(
     }
     if model is None:
         model = TasksManager.get_model_from_task(**model_kwargs)
+    # Load IP-Adapter if it exists
Here you could just test the type of the ModelAddon parameter
or do a try-except...
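The two alternatives could be sketched as follows. This is illustrative only: `load_ip_adapter` is the diffusers pipeline method, while `apply_addon`, `apply_addon_eafp`, and the trimmed-down dataclasses here are hypothetical:

```python
from dataclasses import dataclass
from typing import List, Optional, Union

@dataclass
class ModelAddon:
    pass

@dataclass
class IPAdapterAddon(ModelAddon):
    ip_adapter_ids: Optional[Union[str, List[str]]] = None

def apply_addon(model, addon: ModelAddon) -> bool:
    # Option 1: dispatch on the addon's concrete type.
    if isinstance(addon, IPAdapterAddon) and addon.ip_adapter_ids is not None:
        model.load_ip_adapter(addon.ip_adapter_ids)
        return True
    return False

def apply_addon_eafp(model, addon) -> bool:
    # Option 2: try-except (EAFP) — attempt the load, skip if not applicable.
    try:
        model.load_ip_adapter(addon.ip_adapter_ids)
        return True
    except AttributeError:
        return False
```

The isinstance variant is more explicit about which addon types are supported; the EAFP variant avoids enumerating types but can mask unrelated AttributeErrors.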
@@ -166,6 +167,8 @@ def __init__(
     num_beams: Optional[int] = None,
     vae_scale_factor: Optional[int] = None,
     encoder_hidden_size: Optional[int] = None,
+    image_encoder_sequence_length: Optional[int] = None,
Here also, you should consider grouping parameters into dataclasses specific to each use case (not in this pull request, but in the upcoming refactoring one you mentioned).
@@ -846,6 +882,10 @@ def _export(
     lora_adapter_names: Optional[Union[str, List[str]]] = None,
     lora_scales: Optional[Union[float, List[float]]] = None,
     controlnet_ids: Optional[Union[str, List[str]]] = None,
+    ip_adapter_ids: Optional[Union[str, List[str]]] = None,
Here reuse the same dataclass.
    return image_embeds, uncond_image_embeds

def prepare_ip_adapter_image_embeds(
Where is this method called? Is it part of an expected API? If so, you should inherit from the corresponding abstract class.
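For context, in diffusers pipelines a method with this name typically encodes the image prompt and pairs it with a zeroed embedding for classifier-free guidance. A rough sketch of that pattern (not the actual implementation in this PR; the function name and encoder are illustrative):

```python
import torch

def prepare_image_embeds_sketch(image_encoder, image, do_classifier_free_guidance: bool = True):
    # Encode the image prompt into embeddings.
    image_embeds = image_encoder(image)
    # The unconditional branch uses zeroed embeddings of the same shape.
    uncond_image_embeds = torch.zeros_like(image_embeds)
    if do_classifier_free_guidance:
        # Stack uncond first, cond second, matching the usual CFG convention.
        return torch.cat([uncond_image_embeds, image_embeds], dim=0)
    return image_embeds
```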
@requires_neuronx
def test_export_with_dynamic_batch_size(self, test_name, name, model_name, task, neuron_config_constructor):
    self._neuronx_export(test_name, name, model_name, task, neuron_config_constructor, dynamic_batch_size=True)
# class NeuronExportTestCase(unittest.TestCase):
You probably don't want to keep that commented out.
class CLIPVisionModelNeuronWrapper(torch.nn.Module):
    def __init__(self, model, input_names: List[str]):
        super().__init__()
        self.model = model
Wouldn't it be worth checking if model has vision_model and visual_projection attributes?
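Such a defensive check might look like this. A sketch only: the attribute names come from transformers' CLIPVisionModelWithProjection, and the helper name is hypothetical:

```python
def check_vision_model(model) -> None:
    # Fail fast if the wrapped model lacks the submodules the wrapper's
    # forward pass relies on, instead of erroring deep inside tracing.
    missing = [
        name
        for name in ("vision_model", "visual_projection")
        if not hasattr(model, name)
    ]
    if missing:
        raise TypeError(
            f"Expected a CLIP vision model exposing {missing!r}; got {type(model).__name__}"
        )
```

Calling this at the top of `__init__` would turn a confusing trace-time failure into a clear error message.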
@@ -71,3 +71,65 @@ def _get_add_time_ids(self, *args, **kwargs):
     raise ValueError(
         f"The pipeline type {self.auto_model_class} is not yet supported by Optimum Neuron, please open an request on: https://github.com/huggingface/optimum-neuron/issues."
     )
class NeuronIPAdapterMixin:
This has nothing to do with IPAdapterMixin, right? Would it be worth mentioning where you adapted it from, as you did with NeuronStableDiffusionXLPipelineMixin?
What does this PR do?
This adds support for Stable Diffusion IP-adapters.
IP-adapter (Image Prompt adapter) is a Stable Diffusion add-on that lets you use images as prompts, similar to Midjourney and DALL·E 3. You can use it to copy the style, composition, or a face from a reference image.
Fixes #718
optimum-cli export neuron --model stable-diffusion-v1-5/stable-diffusion-v1-5 --ip_adapter_ids h94/IP-Adapter --ip_adapter_subfolders models --ip_adapter_weight_names ip-adapter-full-face_sd15.bin --ip_adapter_scales 0.5 --batch_size 1 --height 512 --width 512 --num_images_per_prompt 1 --auto_cast matmul --auto_cast_type bf16 ip_adapter_neuron/
Inference
- ip_adapter_image as input
- ip_adapter_image_embeds as input (encode the image first)

Next steps
Before submitting