Text to Image C++ Generation Pipeline

Examples in this folder showcase inference of text-to-image models like Stable Diffusion 1.5, 2.1, and LCM. The application deliberately has few configuration options to encourage the reader to explore and modify the source code, for example, to change the inference device to GPU. The sample features ov::genai::Text2ImagePipeline and uses a text prompt as the input source.

There are two sample files:

  • main.cpp demonstrates basic usage of the text to image pipeline
  • lora.cpp shows how to apply LoRA adapters to the pipeline

Users can change the sample code and play with the following generation parameters (a sketch of how they map onto the API follows this list):

  • Change the width or height of the generated image
  • Generate multiple images per prompt
  • Adjust the number of inference steps
  • Play with the guidance scale (read more details)
  • (SD 1.x, 2.x only) Add a negative prompt when the guidance scale is > 1
  • Apply multiple different LoRA adapters and mix them with different blending coefficients
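
For orientation, here is a minimal sketch of how these parameters could be passed to ov::genai::Text2ImagePipeline. The header path, the property helpers (ov::genai::width, ov::genai::height, ov::genai::num_inference_steps, ov::genai::guidance_scale, ov::genai::negative_prompt), and the imwrite helper are assumed to match the samples shipped with this repository; check main.cpp for the calls actually used.

```cpp
#include "openvino/genai/image_generation/text2image_pipeline.hpp"
// imwrite.hpp is the small helper used by these samples to save BMP images.
#include "imwrite.hpp"

int main() {
    // Load the exported model; change "CPU" to "GPU" to switch the inference device.
    ov::genai::Text2ImagePipeline pipe("./dreamlike_anime_1_0_ov/FP16", "CPU");

    // Generate with explicitly chosen parameters (values here are only illustrative).
    ov::Tensor image = pipe.generate(
        "cyberpunk cityscape at dusk, cinematic lighting",
        ov::genai::width(512),
        ov::genai::height(512),
        ov::genai::num_inference_steps(25),
        ov::genai::guidance_scale(7.5f),
        // Negative prompt only takes effect for SD 1.x/2.x with guidance scale > 1.
        ov::genai::negative_prompt("blurry, low quality"));

    imwrite("image.bmp", image, true);
    return 0;
}
```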

Download and convert the models and tokenizers

The --upgrade-strategy eager option is needed to ensure optimum-intel is upgraded to the latest version.

It's not required to install ../../requirements.txt for deployment if the model has already been exported.

pip install --upgrade-strategy eager -r ../../requirements.txt
optimum-cli export openvino --model dreamlike-art/dreamlike-anime-1.0 --task stable-diffusion --weight-format fp16 dreamlike_anime_1_0_ov/FP16

Run

stable_diffusion ./dreamlike_anime_1_0_ov/FP16 'cyberpunk cityscape like Tokyo New York with tall buildings at dusk golden hour cinematic lighting'

Examples

Prompt: cyberpunk cityscape like Tokyo New York with tall buildings at dusk golden hour cinematic lighting

Supported models

Models can be downloaded from HuggingFace. This sample can run the following list of models, but is not limited to them:

Run with optional LoRA adapters

LoRA adapters can be connected to the pipeline to give generated images a certain style, level of detail, or quality. Adapters are supported in the Safetensors format and can be downloaded from public sources like Civitai or HuggingFace, or trained by the user. Only adapters compatible with the base model should be used. A weighted blend of multiple adapters can be applied by specifying multiple adapter files with corresponding alpha parameters on the command line. Check the lora.cpp source code to learn how to enable adapters and specify them in each generate call; a sketch follows below.
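
As a rough sketch (assuming the ov::genai::Adapter, ov::genai::AdapterConfig, and ov::genai::adapters names match the installed release; see lora.cpp for the authoritative version), adapters can be wired up roughly like this:

```cpp
#include "openvino/genai/image_generation/text2image_pipeline.hpp"

int main() {
    // Wrap the Safetensors file in an Adapter and give it a blending coefficient (alpha).
    ov::genai::Adapter adapter("soulcard.safetensors");
    ov::genai::AdapterConfig adapter_config(adapter, 0.7f);

    // Register the adapters when constructing the pipeline...
    ov::genai::Text2ImagePipeline pipe("./dreamlike_anime_1_0_ov/FP16", "CPU",
                                       ov::genai::adapters(adapter_config));

    // ...and select which adapters (and weights) to apply in each generate call.
    ov::Tensor with_lora = pipe.generate("curly-haired unicorn in the forest, anime, line",
                                         ov::genai::adapters(adapter_config));

    // Passing an empty AdapterConfig disables adapters to produce a baseline image.
    ov::Tensor baseline = pipe.generate("curly-haired unicorn in the forest, anime, line",
                                        ov::genai::adapters(ov::genai::AdapterConfig()));
    return 0;
}
```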

Here is an example of how to run the sample with a single adapter. First, download the adapter file from the https://civitai.com/models/67927/soulcard page manually and save it as soulcard.safetensors, or download it from the command line:

wget -O soulcard.safetensors https://civitai.com/api/download/models/72591

Then run the lora_stable_diffusion executable:

./lora_stable_diffusion dreamlike_anime_1_0_ov/FP16 'curly-haired unicorn in the forest, anime, line' soulcard.safetensors 0.7

The sample generates two images, with and without the adapter applied, using the same prompt:

  • lora.bmp with adapters applied
  • baseline.bmp without adapters applied

Check the difference:

With adapter Without adapter

Note

  • The image generated with HuggingFace / Optimum Intel is not the same as the one generated by this C++ sample:

C++ random generation with MT19937 produces different results from numpy.random.randn() and diffusers.utils.randn_tensor, so it is expected that the Python and C++ versions generate different images, because the initial latent tensors are initialized differently. Users can implement their own random generator derived from ov::genai::Generator and pass it to the Text2ImagePipeline::generate method.
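
A minimal sketch of such a custom generator, assuming ov::genai::Generator exposes a virtual float next() method (check the installed headers for the exact interface, which may differ between releases):

```cpp
#include <memory>
#include <random>
#include "openvino/genai/image_generation/text2image_pipeline.hpp"

// Hypothetical generator sampling from a standard normal distribution with MT19937;
// the exact virtual interface of ov::genai::Generator may differ in your release.
class MyGenerator : public ov::genai::Generator {
public:
    explicit MyGenerator(uint32_t seed) : engine(seed), dist(0.0f, 1.0f) {}
    float next() override { return dist(engine); }
private:
    std::mt19937 engine;
    std::normal_distribution<float> dist;
};

// Usage sketch: pass the generator to generate() via the generator property, e.g.
// ov::Tensor image = pipe.generate(prompt, ov::genai::generator(std::make_shared<MyGenerator>(42)));
```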