diff --git a/docs/notebooks/002-openvino-api-with-output.rst b/docs/notebooks/002-openvino-api-with-output.rst index 1c2d01199658b2..ef4eca621d258d 100644 --- a/docs/notebooks/002-openvino-api-with-output.rst +++ b/docs/notebooks/002-openvino-api-with-output.rst @@ -292,13 +292,15 @@ TensorFlow Model TensorFlow models saved in frozen graph format can also be passed to ``read_model`` starting in OpenVINO 2022.3. - **NOTE**: Directly loading TensorFlow models is available as a +.. note:: + + Directly loading TensorFlow models is available as a preview feature in the OpenVINO 2022.3 release. Fully functional support will be provided in the upcoming 2023 releases. Currently support is limited to only frozen graph inference format. Other TensorFlow model formats must be converted to OpenVINO IR using - `model conversion - API `__. + `model conversion API `__. + .. code:: ipython3 @@ -563,9 +565,11 @@ classes (``C``). The output is returned as 32-bit floating point. Doing Inference on a Model -------------------------- - **NOTE** this notebook demonstrates only the basic synchronous - inference API. For an async inference example, please refer to `Async - API notebook <115-async-api-with-output.html>`__ +.. note:: + + This notebook demonstrates only the basic synchronous + inference API. For an async inference example, please refer to + `Async API notebook <115-async-api-with-output.html>`__ The diagram below shows a typical inference pipeline with OpenVINO @@ -926,7 +930,9 @@ model will be loaded to the GPU. After running this cell once, the model will be cached, so subsequent runs of this cell will load the model from the cache. -*Note: Model Caching is also available on CPU devices* +.. note:: + + Model Caching is also available on CPU devices .. code:: ipython3 diff --git a/docs/notebooks/102-pytorch-to-openvino-with-output.rst b/docs/notebooks/102-pytorch-to-openvino-with-output.rst index 310055a2ec8419..cf6e83887ca1f9 100644 --- a/docs/notebooks/102-pytorch-to-openvino-with-output.rst +++ b/docs/notebooks/102-pytorch-to-openvino-with-output.rst @@ -237,14 +237,17 @@ Optimizer Python API should be used for these purposes. More details regarding PyTorch model conversion can be found in OpenVINO `documentation `__ - **Note**: Please, take into account that direct support PyTorch +.. note:: + + Please, take into account that direct support PyTorch models conversion is an experimental feature. Model coverage will be increased in the next releases. For cases, when PyTorch model conversion failed, you still can try to export the model to ONNX - format. Please refer to this + format. Please, refer to this `tutorial <102-pytorch-to-openvino-with-output.html>`__ which explains how to convert PyTorch model to ONNX, then to OpenVINO + The ``convert_model`` function accepts the PyTorch model object and returns the ``openvino.runtime.Model`` instance ready to load on a device using ``core.compile_model`` or save on disk for next usage using @@ -501,8 +504,8 @@ Run OpenVINO Model Inference with Static Input Shape `⇑ <#top>`__ 5: hamper - 2.35% -Benchmark OpenVINO Model Inference with Static Input Shape -`⇑ <#top>`__ +Benchmark OpenVINO Model Inference with Static Input Shape `⇑ <#top>`__ ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ .. code:: ipython3 @@ -645,8 +648,9 @@ OpenVINO IR is similar to the original PyTorch model. 
5: hamper - 2.35% -Benchmark OpenVINO Model Inference Converted From Scripted Model -`⇑ <#top>`__ +Benchmark OpenVINO Model Inference Converted From Scripted Model `⇑ <#top>`__ ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ + .. code:: ipython3 @@ -772,8 +776,8 @@ similar to the original PyTorch model. 5: hamper - 2.35% -Benchmark OpenVINO Model Inference Converted From Traced Model -`⇑ <#top>`__ +Benchmark OpenVINO Model Inference Converted From Traced Model `⇑ <#top>`__ ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ .. code:: ipython3 diff --git a/docs/notebooks/103-paddle-to-openvino-classification-with-output.rst b/docs/notebooks/103-paddle-to-openvino-classification-with-output.rst index fdd3ec526ece5f..082be1d66439ef 100644 --- a/docs/notebooks/103-paddle-to-openvino-classification-with-output.rst +++ b/docs/notebooks/103-paddle-to-openvino-classification-with-output.rst @@ -18,7 +18,7 @@ Source of the **Table of contents**: -- `Preparation <#1preparation>`__ +- `Preparation <#preparation>`__ - `Imports <#imports>`__ - `Settings <#settings>`__ diff --git a/docs/notebooks/104-model-tools-with-output.rst b/docs/notebooks/104-model-tools-with-output.rst index c60c541cece931..62dcd3132ea343 100644 --- a/docs/notebooks/104-model-tools-with-output.rst +++ b/docs/notebooks/104-model-tools-with-output.rst @@ -423,7 +423,9 @@ In the next cell, define the ``benchmark_model()`` function that calls ``benchmark_app``. This makes it easy to try different combinations. In the cell below that, you display available devices on the system. - **Note**: In this notebook, ``benchmark_app`` runs for 15 seconds to +.. note:: + + In this notebook, ``benchmark_app`` runs for 15 seconds to give a quick indication of performance. For more accurate performance, it is recommended to run inference for at least one minute by setting the ``t`` parameter to 60 or higher, and run @@ -432,6 +434,7 @@ the cell below that, you display available devices on the system. command prompt where you have activated the ``openvino_env`` environment. + .. code:: ipython3 def benchmark_model(model_xml, device="CPU", seconds=60, api="async", batch=1): @@ -523,9 +526,7 @@ Benchmark command: .. code:: ipython3 - benchmark_model(model_path, device="GPU", seconds=15, api="async") - - + benchmark_model(model_path, device="GPU", seconds=15, api="async") .. raw:: html @@ -534,8 +535,7 @@ Benchmark command: .. code:: ipython3 - benchmark_model(model_path, device="MULTI:CPU,GPU", seconds=15, api="async") - + benchmark_model(model_path, device="MULTI:CPU,GPU", seconds=15, api="async") .. raw:: html diff --git a/docs/notebooks/105-language-quantize-bert-with-output.rst b/docs/notebooks/105-language-quantize-bert-with-output.rst index f3d9df156fd9b2..cbd1ec2b557456 100644 --- a/docs/notebooks/105-language-quantize-bert-with-output.rst +++ b/docs/notebooks/105-language-quantize-bert-with-output.rst @@ -593,7 +593,9 @@ Finally, measure the inference performance of OpenVINO ``FP32`` and `Benchmark Tool `__ in OpenVINO. - **Note**: The ``benchmark_app`` tool is able to measure the +.. note:: + + The ``benchmark_app`` tool is able to measure the performance of the OpenVINO Intermediate Representation (OpenVINO IR) models only. For more accurate performance, run ``benchmark_app`` in a terminal/command prompt after closing other applications. Run @@ -602,6 +604,7 @@ in OpenVINO. 
Run ``benchmark_app --help`` to see an overview of all command-line options. + .. code:: ipython3 # Inference FP32 model (OpenVINO IR) diff --git a/docs/notebooks/106-auto-device-with-output.rst b/docs/notebooks/106-auto-device-with-output.rst index be32615035c6f5..3e51a92ee2eb82 100644 --- a/docs/notebooks/106-auto-device-with-output.rst +++ b/docs/notebooks/106-auto-device-with-output.rst @@ -36,18 +36,18 @@ first inference. - `Import modules and create Core <#import-modules-and-create-core>`__ - `Convert the model to OpenVINO IR format <#convert-the-model-to-openvino-ir-format>`__ -- `(1) Simplify selection logic <#1-simplify-selection-logic>`__ +- `(1) Simplify selection logic <#simplify-selection-logic>`__ - `Default behavior of Core::compile_model API without device_name <#default-behavior-of-core::compile_model-api-without-device_name>`__ - `Explicitly pass AUTO as device_name to Core::compile_model API <#explicitly-pass-auto-as-device_name-to-core::compile_model-api>`__ -- `(2) Improve the first inference latency <#2-improve-the-first-inference-latency>`__ +- `(2) Improve the first inference latency <#improve-the-first-inference-latency>`__ - `Load an Image <#load-an-image>`__ - `Load the model to GPU device and perform inference <#load-the-model-to-gpu-device-and-perform-inference>`__ - `Load the model using AUTO device and do inference <#load-the-model-using-auto-device-and-do-inference>`__ -- `(3) Achieve different performance for different targets <#3-achieve-different-performance-for-different-targets>`__ +- `(3) Achieve different performance for different targets <#achieve-different-performance-for-different-targets>`__ - `Class and callback definition <#class-and-callback-definition>`__ - `Inference with THROUGHPUT hint <#inference-with-throughput-hint>`__ diff --git a/docs/notebooks/107-speech-recognition-quantization-data2vec-with-output.rst b/docs/notebooks/107-speech-recognition-quantization-data2vec-with-output.rst index c79f51edaabf08..8b1b221b0aa470 100644 --- a/docs/notebooks/107-speech-recognition-quantization-data2vec-with-output.rst +++ b/docs/notebooks/107-speech-recognition-quantization-data2vec-with-output.rst @@ -342,11 +342,11 @@ Create a quantized model from the pre-trained ``FP16`` model and the calibration dataset. The optimization process contains the following steps: -:: - 1. Create a Dataset for quantization. - 2. Run `nncf.quantize` for getting an optimized model. The `nncf.quantize` function provides an interface for model quantization. It requires an instance of the OpenVINO Model and quantization dataset. Optionally, some additional parameters for the configuration quantization process (number of samples for quantization, preset, ignored scope, etc.) can be provided. For more accurate results, we should keep the operation in the postprocessing subgraph in floating point precision, using the `ignored_scope` parameter. `advanced_parameters` can be used to specify advanced quantization parameters for fine-tuning the quantization algorithm. In this tutorial we pass range estimator parameters for activations. For more information see [Tune quantization parameters](https://docs.openvino.ai/2023.0/basic_quantization_flow.html#tune-quantization-parameters). - 3. Serialize OpenVINO IR model using `openvino.runtime.serialize` function. +1. Create a Dataset for quantization. +2. Run ``nncf.quantize`` for getting an optimized model. The ``nncf.quantize`` function provides an interface for model quantization. 
It requires an instance of the OpenVINO Model and quantization dataset. Optionally, some additional parameters for the configuration quantization process (number of samples for quantization, preset, ignored scope, etc.) can be provided. For more accurate results, we should keep the operation in the postprocessing subgraph in floating point precision, using the ``ignored_scope`` parameter. ``advanced_parameters`` can be used to specify advanced quantization parameters for fine-tuning the quantization algorithm. In this tutorial we pass range estimator parameters for activations. For more information see +`Tune quantization parameters `__. +3. Serialize OpenVINO IR model using ``openvino.runtime.serialize`` function. .. code:: ipython3 @@ -663,7 +663,9 @@ Tool `__ in OpenVINO as well as the `Asynchronous Inference notebook `__. + Performance Comparison with benchmark_app `⇑ <#top>`__ ############################################################################################################################### diff --git a/docs/notebooks/109-latency-tricks-with-output.rst b/docs/notebooks/109-latency-tricks-with-output.rst index 5902b753e11cdd..f939f5e5d4afac 100644 --- a/docs/notebooks/109-latency-tricks-with-output.rst +++ b/docs/notebooks/109-latency-tricks-with-output.rst @@ -21,7 +21,9 @@ many hints simultaneously, like more inference threads + shared memory. It should give even better performance, but we recommend testing it anyway. - **NOTE**: We especially recommend trying +.. note:: + + We especially recommend trying ``OpenVINO IR model + CPU + shared memory in latency mode`` or ``OpenVINO IR model + CPU + shared memory + more inference threads``. @@ -29,12 +31,14 @@ The quantization and pre-post-processing API are not included here as they change the precision (quantization) or processing graph (prepostprocessor). You can find examples of how to apply them to optimize performance on OpenVINO IR files in -`111-detection-quantization <../111-detection-quantization>`__ and -`118-optimize-preprocessing <../118-optimize-preprocessing>`__. +`111-detection-quantization <111-yolov5-quantization-migration-with-output.html>`__ and +`118-optimize-preprocessing <118-optimize-preprocessing-with-output.html>`__. |image0| - **NOTE**: Many of the steps presented below will give you better +.. note:: + + Many of the steps presented below will give you better performance. However, some of them may not change anything if they are strongly dependent on either the hardware or the model. Please run this notebook on your computer with your model to learn which of @@ -45,7 +49,7 @@ optimize performance on OpenVINO IR files in result in different performance. A similar notebook focused on the throughput mode is available -`here <109-throughput-tricks.ipynb>`__. +`here <109-throughput-tricks-with-output.html>`__. **Table of contents**: @@ -193,7 +197,9 @@ Hardware `⇑ <#top>`__ The code below lists the available hardware we will use in the benchmarking process. - **NOTE**: The hardware you have is probably completely different from +.. note:: + + The hardware you have is probably completely different from ours. It means you can see completely different results. .. code:: ipython3 @@ -606,9 +612,9 @@ Other tricks `⇑ <#top>`__ There are other tricks for performance improvement, such as quantization and pre-post-processing or dedicated to throughput mode. 
To get even more from your model, please visit -`111-detection-quantization <../111-detection-quantization>`__, -`118-optimize-preprocessing <../118-optimize-preprocessing>`__, and -`109-throughput-tricks <109-throughput-tricks.ipynb>`__. +`111-detection-quantization <111-yolov5-quantization-migration-with-output.html>`__, +`118-optimize-preprocessing <118-optimize-preprocessing-with-output.html>`__, and +`109-throughput-tricks <109-latency-tricks-with-output.html>`__. Performance comparison `⇑ <#top>`__ ############################################################################################################################### diff --git a/docs/notebooks/109-throughput-tricks-with-output.rst b/docs/notebooks/109-throughput-tricks-with-output.rst index a877dab76f2d5d..d01b7d3f3dcfb1 100644 --- a/docs/notebooks/109-throughput-tricks-with-output.rst +++ b/docs/notebooks/109-throughput-tricks-with-output.rst @@ -26,12 +26,14 @@ The quantization and pre-post-processing API are not included here as they change the precision (quantization) or processing graph (prepostprocessor). You can find examples of how to apply them to optimize performance on OpenVINO IR files in -`111-detection-quantization <../111-detection-quantization>`__ and -`118-optimize-preprocessing <../118-optimize-preprocessing>`__. +`111-detection-quantization <111-yolov5-quantization-migration-with-output.html>`__ and +`118-optimize-preprocessing <118-optimize-preprocessing-with-output.html>`__. |image0| - **NOTE**: Many of the steps presented below will give you better +.. note:: + + Many of the steps presented below will give you better performance. However, some of them may not change anything if they are strongly dependent on either the hardware or the model. Please run this notebook on your computer with your model to learn which of @@ -42,7 +44,7 @@ optimize performance on OpenVINO IR files in result in different performance. A similar notebook focused on the latency mode is available -`here <109-latency-tricks.ipynb>`__. +`here <109-latency-tricks-with-output.html>`__. **Table of contents**: @@ -180,7 +182,9 @@ Hardware `⇑ <#top>`__ The code below lists the available hardware we will use in the benchmarking process. - **NOTE**: The hardware you have is probably completely different from +.. note:: + + The hardware you have is probably completely different from ours. It means you can see completely different results. .. code:: ipython3 @@ -616,7 +620,9 @@ automatically spawns the pool of InferRequest objects (also called “jobs”) and provides synchronization mechanisms to control the flow of the pipeline. - **NOTE**: Asynchronous processing cannot guarantee outputs to be in +.. note:: + + Asynchronous processing cannot guarantee outputs to be in the same order as inputs, so be careful in case of applications when the order of frames matters, e.g., videos. @@ -662,9 +668,9 @@ options, quantization and pre-post-processing or dedicated to latency mode. To get even more from your model, please visit `advanced throughput options `__, -`109-latency-tricks <109-latency-tricks.ipynb>`__, -`111-detection-quantization <../111-detection-quantization>`__, and -`118-optimize-preprocessing <../118-optimize-preprocessing>`__. +`109-latency-tricks <109-latency-tricks-with-output.html>`__, +`111-detection-quantization <111-yolov5-quantization-migration-with-output.html>`__, and +`118-optimize-preprocessing <118-optimize-preprocessing-with-output.html>`__. 
Performance comparison `⇑ <#top>`__ ############################################################################################################################### diff --git a/docs/notebooks/110-ct-scan-live-inference-with-output.rst b/docs/notebooks/110-ct-scan-live-inference-with-output.rst index 6115989b4bb066..7d543aa06d8e11 100644 --- a/docs/notebooks/110-ct-scan-live-inference-with-output.rst +++ b/docs/notebooks/110-ct-scan-live-inference-with-output.rst @@ -116,7 +116,9 @@ To measure the inference performance of the IR model, use is a command-line application that can be run in the notebook with ``! benchmark_app`` or ``%sx benchmark_app`` commands. - **Note**: The ``benchmark_app`` tool is able to measure the +.. note:: + + The ``benchmark_app`` tool is able to measure the performance of the OpenVINO Intermediate Representation (OpenVINO IR) models only. For more accurate performance, run ``benchmark_app`` in a terminal/command prompt after closing other applications. Run @@ -125,6 +127,7 @@ is a command-line application that can be run in the notebook with Run ``benchmark_app --help`` to see an overview of all command-line options. + .. code:: ipython3 core = Core() diff --git a/docs/notebooks/110-ct-segmentation-quantize-nncf-with-output.rst b/docs/notebooks/110-ct-segmentation-quantize-nncf-with-output.rst index 01c823c0cb2db6..2ff15e5eed48d3 100644 --- a/docs/notebooks/110-ct-segmentation-quantize-nncf-with-output.rst +++ b/docs/notebooks/110-ct-segmentation-quantize-nncf-with-output.rst @@ -455,19 +455,20 @@ this notebook. advanced algorithms for Neural Networks inference optimization in OpenVINO with minimal accuracy drop. - **Note**: NNCF Post-training Quantization is available in OpenVINO +.. note:: + + NNCF Post-training Quantization is available in OpenVINO 2023.0 release. + Create a quantized model from the pre-trained ``FP32`` model and the calibration dataset. The optimization process contains the following steps: -:: - - 1. Create a Dataset for quantization. - 2. Run `nncf.quantize` for getting an optimized model. - 3. Export the quantized model to ONNX and then convert to OpenVINO IR model. - 4. Serialize the INT8 model using `openvino.runtime.serialize` function for benchmarking. +1. Create a Dataset for quantization. +2. Run ``nncf.quantize`` for getting an optimized model. +3. Export the quantized model to ONNX and then convert to OpenVINO IR model. +4. Serialize the INT8 model using ``openvino.runtime.serialize`` function for benchmarking. .. code:: ipython3 @@ -580,13 +581,16 @@ command line application, part of OpenVINO development tools, that can be run in the notebook with ``! benchmark_app`` or ``%sx benchmark_app``. - **NOTE**: For the most accurate performance estimation, it is +.. note:: + + For the most accurate performance estimation, it is recommended to run ``benchmark_app`` in a terminal/command prompt after closing other applications. Run ``benchmark_app -m model.xml -d CPU`` to benchmark async inference on CPU for one minute. Change ``CPU`` to ``GPU`` to benchmark on GPU. Run ``benchmark_app --help`` to see all command line options. + .. code:: ipython3 # ! benchmark_app --help @@ -759,10 +763,13 @@ slices are annotated as kidney. Run this cell again to show results on a different subset. The random seed is displayed to enable reproducing specific runs of this cell. - **NOTE**: the images are shown after optional augmenting and +.. note:: + + The images are shown after optional augmenting and resizing. 
In the Kits19 dataset all but one of the cases has the ``(512, 512)`` input shape. + .. code:: ipython3 # The sigmoid function is used to transform the result of the network @@ -841,7 +848,9 @@ inference on the specified CT scan has completed, the total time and throughput (fps), including preprocessing and displaying, will be printed. - **NOTE**: If you experience flickering on Firefox, consider using +.. note:: + + If you experience flickering on Firefox, consider using Chrome or Edge to run this notebook. Load Model and List of Image Files `⇑ <#top>`__ diff --git a/docs/notebooks/111-yolov5-quantization-migration-with-output.rst b/docs/notebooks/111-yolov5-quantization-migration-with-output.rst index 46ac3d7ce76b91..230ace7db8c8d6 100644 --- a/docs/notebooks/111-yolov5-quantization-migration-with-output.rst +++ b/docs/notebooks/111-yolov5-quantization-migration-with-output.rst @@ -87,10 +87,7 @@ Download the YOLOv5 model `⇑ <#top>`__ .. parsed-literal:: Download Ultralytics Yolov5 project source: - - - -``git clone https://github.com/ultralytics/yolov5.git -b v7.0`` + ``git clone https://github.com/ultralytics/yolov5.git -b v7.0`` Conversion of the YOLOv5 model to OpenVINO `⇑ <#top>`__ diff --git a/docs/notebooks/112-pytorch-post-training-quantization-nncf-with-output.rst b/docs/notebooks/112-pytorch-post-training-quantization-nncf-with-output.rst index b4e5caafbb384f..16c64286c2bd6d 100644 --- a/docs/notebooks/112-pytorch-post-training-quantization-nncf-with-output.rst +++ b/docs/notebooks/112-pytorch-post-training-quantization-nncf-with-output.rst @@ -20,10 +20,12 @@ downsized to 64×64 colored images. The tutorial will demonstrate that only a tiny part of the dataset is needed for the post-training quantization, not demanding the fine-tuning of the model. - **NOTE**: This notebook requires that a C++ compiler is accessible on +.. note:: + + This notebook requires that a C++ compiler is accessible on the default binary search path of the OS you are running the - notebook. - + notebook. + **Table of contents**: @@ -355,8 +357,7 @@ Create and load original uncompressed model `⇑ <#top>`__ +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ -ResNet-50 from the ```torchivision`` -repository `__ is pre-trained on +ResNet-50 from the `torchivision repository `__ is pre-trained on ImageNet with more prediction classes than Tiny ImageNet, so the model is adjusted by swapping the last FC layer to one with fewer output values. @@ -672,7 +673,9 @@ Benchmark Tool runs inference for 60 seconds in asynchronous mode on CPU. It returns inference speed as latency (milliseconds per image) and throughput (frames per second) values. - **NOTE**: This notebook runs benchmark_app for 15 seconds to give a +.. note:: + + This notebook runs benchmark_app for 15 seconds to give a quick indication of performance. For more accurate performance, it is recommended to run benchmark_app in a terminal/command prompt after closing other applications. Run ``benchmark_app -m model.xml -d CPU`` @@ -680,6 +683,7 @@ throughput (frames per second) values. to benchmark on GPU. Run ``benchmark_app --help`` to see an overview of all command-line options. + .. 
code:: ipython3 device diff --git a/docs/notebooks/113-image-classification-quantization-with-output.rst b/docs/notebooks/113-image-classification-quantization-with-output.rst index 55f40d20ab83e0..d72f5e3e4c061f 100644 --- a/docs/notebooks/113-image-classification-quantization-with-output.rst +++ b/docs/notebooks/113-image-classification-quantization-with-output.rst @@ -324,13 +324,16 @@ models, using `Benchmark Tool `__ - an inference performance measurement tool in OpenVINO. - **NOTE**: For more accurate performance, it is recommended to run +.. note:: + + For more accurate performance, it is recommended to run benchmark_app in a terminal/command prompt after closing other applications. Run ``benchmark_app -m model.xml -d CPU`` to benchmark async inference on CPU for one minute. Change CPU to GPU to benchmark on GPU. Run ``benchmark_app --help`` to see an overview of all command-line options. + .. code:: ipython3 # Inference FP16 model (OpenVINO IR) diff --git a/docs/notebooks/119-tflite-to-openvino-with-output.rst b/docs/notebooks/119-tflite-to-openvino-with-output.rst index b47e777c6fb700..aa0bc8713a3973 100644 --- a/docs/notebooks/119-tflite-to-openvino-with-output.rst +++ b/docs/notebooks/119-tflite-to-openvino-with-output.rst @@ -132,7 +132,7 @@ Load model using OpenVINO TensorFlow Lite Frontend `⇑ <#top>`__ TensorFlow Lite models are supported via ``FrontEnd`` API. You may skip conversion to IR and read models directly by OpenVINO runtime API. For more examples supported formats reading via Frontend API, please look -this `tutorial <../002-openvino-api>`__. +this `tutorial <002-openvino-api-with-output.html>`__. .. code:: ipython3 @@ -224,14 +224,16 @@ Estimate Model Performance `⇑ <#top>`__ is used to measure the inference performance of the model on CPU and GPU. +.. note:: - **NOTE**: For more accurate performance, it is recommended to run + For more accurate performance, it is recommended to run ``benchmark_app`` in a terminal/command prompt after closing other applications. Run ``benchmark_app -m model.xml -d CPU`` to benchmark async inference on CPU for one minute. Change ``CPU`` to ``GPU`` to benchmark on GPU. Run ``benchmark_app --help`` to see an overview of all command-line options. + .. code:: ipython3 print("Benchmark model inference on CPU") diff --git a/docs/notebooks/202-vision-superresolution-image-with-output.rst b/docs/notebooks/202-vision-superresolution-image-with-output.rst index c3902c024e8c38..18ea80db89dbd5 100644 --- a/docs/notebooks/202-vision-superresolution-image-with-output.rst +++ b/docs/notebooks/202-vision-superresolution-image-with-output.rst @@ -44,7 +44,7 @@ pp. 2777-2784, doi: 10.1109/ICPR.2018.8545760. - `Superresolution on full input image <#superresolution-on-full-input-image>`__ - `Compute patches <#compute-patches>`__ - - `Do Inference <#do-inference>`__ + - `Do Inference <#do-the-inference>`__ - `Save superresolution image and the bicubic image <#save-superresolution-image-and-the-bicubic-image>`__ Preparation `⇑ <#top>`__ @@ -260,10 +260,13 @@ Load and Show the Input Image `⇑ <#top>`__ ############################################################################################################################### - **NOTE**: For the best results, use raw images (like ``TIFF``, +.. note:: + + For the best results, use raw images (like ``TIFF``, ``BMP`` or ``PNG``). Compressed images (like ``JPEG``) may appear distorted after processing with the super resolution model. + .. 
code:: ipython3 IMAGE_PATH = Path("./data/tower.jpg") @@ -493,9 +496,12 @@ This may take a while. For the video, the superresolution and bicubic image are resized by a factor of 2 to improve processing speed. This gives an indication of the superresolution effect. The video is saved as an ``.avi`` file. You can click on the link to download the video, or -open it directly from the ``output/`` directory, and play it locally. > -Note: If you run the example in Google Colab, download video files using -the ``Files`` tool. +open it directly from the ``output/`` directory, and play it locally. + +.. note:: + + If you run the example in Google Colab, download video files using the ``Files`` tool. + .. code:: ipython3 @@ -612,6 +618,8 @@ Compute patches `⇑ <#top>`__ The output image will have a width of 11280 and a height of 7280 +.. _do-the-inference: + Do Inference `⇑ <#top>`__ +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ diff --git a/docs/notebooks/202-vision-superresolution-video-with-output.rst b/docs/notebooks/202-vision-superresolution-video-with-output.rst index 653f92d0d80485..840d31c84ee1fc 100644 --- a/docs/notebooks/202-vision-superresolution-video-with-output.rst +++ b/docs/notebooks/202-vision-superresolution-video-with-output.rst @@ -16,9 +16,12 @@ Resolution,” `__ 2018 24th International Conference on Pattern Recognition (ICPR), 2018, pp. 2777-2784, doi: 10.1109/ICPR.2018.8545760. - **NOTE**: The Single Image Super Resolution (SISR) model used in this +.. note:: + + The Single Image Super Resolution (SISR) model used in this demo is not optimized for a video. Results may vary depending on the - video. + video. + **Table of contents**: @@ -220,10 +223,13 @@ with superresolution. By default, only the first 100 frames of the video are processed. Change ``NUM_FRAMES`` in the cell below to modify this. - **NOTE**: The resulting video does not contain audio. The input video +.. note:: + + The resulting video does not contain audio. The input video should be a landscape video and have an input resolution of 360p (640x360) for the 1032 model, or 480p (720x480) for the 1033 model. + Settings `⇑ <#top>`__ +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ diff --git a/docs/notebooks/204-segmenter-semantic-segmentation-with-output.rst b/docs/notebooks/204-segmenter-semantic-segmentation-with-output.rst index 1db7b6e309c527..c516000c84ca82 100644 --- a/docs/notebooks/204-segmenter-semantic-segmentation-with-output.rst +++ b/docs/notebooks/204-segmenter-semantic-segmentation-with-output.rst @@ -384,7 +384,7 @@ file and create torch dummy input. Input dimensions are in our case - ``H`` - model input image height - ``W`` - model input image width -.. +.. note:: Note that H and W are here fixed to 512, as this is required by the model. Resizing is done inside the inference function from the @@ -604,19 +604,20 @@ Finally, use the OpenVINO `Benchmark Tool `__ to measure the inference performance of the model. - NOTE: For more accurate performance, it is recommended to run - ``benchmark_app`` in a terminal/command prompt after closing other - applications. Run ``benchmark_app -m model.xml -d CPU`` to benchmark - async inference on CPU for one minute. Change ``CPU`` to ``GPU`` to - benchmark on GPU. Run ``benchmark_app --help`` to see an overview of - all command-line options. 
+Note that for more accurate performance, it is recommended to run +``benchmark_app`` in a terminal/command prompt after closing other +applications. Run ``benchmark_app -m model.xml -d CPU`` to benchmark +async inference on CPU for one minute. Change ``CPU`` to ``GPU`` to +benchmark on GPU. Run ``benchmark_app --help`` to see an overview of +all command-line options. -.. +.. note:: Keep in mind that the authors of original paper used V100 GPU, which is significantly more powerful than the CPU used to obtain the following throughput. Therefore, FPS can’t be compared directly. + .. code:: ipython3 device diff --git a/docs/notebooks/206-vision-paddlegan-anime-with-output.rst b/docs/notebooks/206-vision-paddlegan-anime-with-output.rst index da7002cc9e99c2..7974ce25de12e8 100644 --- a/docs/notebooks/206-vision-paddlegan-anime-with-output.rst +++ b/docs/notebooks/206-vision-paddlegan-anime-with-output.rst @@ -359,8 +359,7 @@ level to ``CRITICAL`` to ignore warnings that are irrelevant for this demo. For information about setting the parameters, see this `page `__. -**Convert ONNX Model to OpenVINO IR with**\ `Model Conversion Python -API `__ +**Convert ONNX Model to OpenVINO IR with** `Model Conversion Python API `__ .. code:: ipython3 diff --git a/docs/notebooks/208-optical-character-recognition-with-output.rst b/docs/notebooks/208-optical-character-recognition-with-output.rst index 79845f408ee1aa..0815ae2d3cd700 100644 --- a/docs/notebooks/208-optical-character-recognition-with-output.rst +++ b/docs/notebooks/208-optical-character-recognition-with-output.rst @@ -38,7 +38,7 @@ information, refer to the - `Text Recognition <#text-recognition>`__ - `Load Text Recognition Model <#load-text-recognition-model>`__ - - `Do Inference <#do-inference>`__ + - `Do Inference <#do-the-inference>`__ - `Show Results <#show-results>`__ @@ -536,6 +536,9 @@ Load Text Recognition Model `⇑ <#top>`__ # Get the height and width of the input layer. _, _, H, W = recognition_input_layer.shape + +.. _do-the-inference: + Do Inference `⇑ <#top>`__ +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ diff --git a/docs/notebooks/212-pyannote-speaker-diarization-with-output.rst b/docs/notebooks/212-pyannote-speaker-diarization-with-output.rst index ecd9c800c0ecdf..8fabfbf8b90a1e 100644 --- a/docs/notebooks/212-pyannote-speaker-diarization-with-output.rst +++ b/docs/notebooks/212-pyannote-speaker-diarization-with-output.rst @@ -114,7 +114,9 @@ method by providing a path to the directory with pipeline configuration or identification from `HuggingFace hub `__. - **Note**: This tutorial uses a non-official version of model +.. note:: + + This tutorial uses a non-official version of model ``philschmid/pyannote-speaker-diarization-endpoint``, provided only for demo purposes. The original model (``pyannote/speaker-diarization``) requires you to accept the model @@ -128,6 +130,7 @@ hub `__. You can log in on HuggingFace Hub in the notebook environment using the following code: + .. code:: python diff --git a/docs/notebooks/218-vehicle-detection-and-recognition-with-output.rst b/docs/notebooks/218-vehicle-detection-and-recognition-with-output.rst index e8f4c4aafb690e..c5237117f8a960 100644 --- a/docs/notebooks/218-vehicle-detection-and-recognition-with-output.rst +++ b/docs/notebooks/218-vehicle-detection-and-recognition-with-output.rst @@ -70,7 +70,9 @@ model is already downloaded. 
The selected model comes from the public directory, which means it must be converted into OpenVINO Intermediate Representation (OpenVINO IR). - **Note**: To change the model, replace the name of the model in the +.. note:: + + To change the model, replace the name of the model in the code below, for example to ``"vehicle-detection-0201"`` or ``"vehicle-detection-0202"``. Keep in mind that they support different image input sizes in detection. Also, you can change the @@ -81,6 +83,7 @@ Representation (OpenVINO IR). ``"FP16"``, and ``"FP16-INT8"``. A different type has a different model size and a precision value. + .. code:: ipython3 # A directory where the model will be downloaded. diff --git a/docs/notebooks/219-knowledge-graphs-conve-with-output.rst b/docs/notebooks/219-knowledge-graphs-conve-with-output.rst index bf586695e67413..c623c3cfd0018e 100644 --- a/docs/notebooks/219-knowledge-graphs-conve-with-output.rst +++ b/docs/notebooks/219-knowledge-graphs-conve-with-output.rst @@ -536,9 +536,8 @@ https://docs.openvino.ai/2023.0/openvino_docs_optimization_guide_dldt_optimizati References `⇑ <#top>`__ +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ - 1. Convolutional 2D Knowledge Graph -Embeddings, Tim Dettmers et al. (https://arxiv.org/abs/1707.01476) 2. -Model implementation: https://github.com/TimDettmers/ConvE +1. Convolutional 2D Knowledge Graph Embeddings, Tim Dettmers et al. (https://arxiv.org/abs/1707.01476) +2. Model implementation: https://github.com/TimDettmers/ConvE The ConvE model implementation used in this notebook is licensed under the MIT License. The license is displayed below: MIT License diff --git a/docs/notebooks/220-cross-lingual-books-alignment-with-output.rst b/docs/notebooks/220-cross-lingual-books-alignment-with-output.rst index 95db77642789e9..cd34355ccf9d6f 100644 --- a/docs/notebooks/220-cross-lingual-books-alignment-with-output.rst +++ b/docs/notebooks/220-cross-lingual-books-alignment-with-output.rst @@ -53,7 +53,7 @@ Prerequisites - `Visualize Sentence Alignment <#visualize-sentence-alignment>`__ - `Speed up Embeddings Computation <#speed-up-embeddings-computation>`__ -.. |image0| image:: https://user-images.githubusercontent.com/51917466/254582697-18f3ab38-e264-4b2c-a088-8e54b855c1b2.png%22 +.. |image0| image:: https://user-images.githubusercontent.com/51917466/254582697-18f3ab38-e264-4b2c-a088-8e54b855c1b2.png .. code:: ipython3 @@ -356,10 +356,13 @@ code `__, as the rules for splitting text into sentences may vary for different languages. - **Hint**: The ``book_metadata`` obtained from the Gutendex contains +.. hint:: + + The ``book_metadata`` obtained from the Gutendex contains the language code as well, enabling automation of this part of the pipeline. + .. code:: ipython3 import pysbd @@ -410,7 +413,7 @@ translation pairs. This makes LaBSE a great choice for our task and it can be reused for different language pairs still producing good results. -.. |image01| image:: https://user-images.githubusercontent.com/51917466/254582913-51531880-373b-40cb-bbf6-1965859df2eb.png%22 +.. |image01| image:: https://user-images.githubusercontent.com/51917466/254582913-51531880-373b-40cb-bbf6-1965859df2eb.png .. code:: ipython3 @@ -952,9 +955,12 @@ advance and fill it in as the inference requests are executed. Let’s compare the models and plot the results. - **Note**: To get a more accurate benchmark, use the `Benchmark Python +.. 
note:: + + To get a more accurate benchmark, use the `Benchmark Python Tool `__ + .. code:: ipython3 number_of_chars = 15_000 diff --git a/docs/notebooks/222-vision-image-colorization-with-output.rst b/docs/notebooks/222-vision-image-colorization-with-output.rst index 6117364879d78b..5985afd3fedb0f 100644 --- a/docs/notebooks/222-vision-image-colorization-with-output.rst +++ b/docs/notebooks/222-vision-image-colorization-with-output.rst @@ -223,7 +223,7 @@ respectively Loading the Model `⇑ <#top>`__ ############################################################################################################################### - Load the model in OpenVINO Runtime with +Load the model in OpenVINO Runtime with ``ie.read_model`` and compile it for the specified device with ``ie.compile_model``. diff --git a/docs/notebooks/223-text-prediction-with-output.rst b/docs/notebooks/223-text-prediction-with-output.rst index cb50b852e26d2d..ef77dd1d3e04f1 100644 --- a/docs/notebooks/223-text-prediction-with-output.rst +++ b/docs/notebooks/223-text-prediction-with-output.rst @@ -494,7 +494,8 @@ The ``text`` variable below is the input used to generate a predicted sequence. Selected Model is PersonaGPT. Please select GPT-Neo or GPT-2 in the first cell to generate text sequences -# Conversation with PersonaGPT using OpenVINO™ `⇑ <#top>`__ +Conversation with PersonaGPT using OpenVINO™ `⇑ <#top>`__ +############################################################################################################################### User Input is tokenized with ``eos_token`` concatenated in the end. Model input is tokenized text, which serves as initial condition for diff --git a/docs/notebooks/225-stable-diffusion-text-to-image-with-output.rst b/docs/notebooks/225-stable-diffusion-text-to-image-with-output.rst index 812d448e31d40b..90f6243f6c3dda 100644 --- a/docs/notebooks/225-stable-diffusion-text-to-image-with-output.rst +++ b/docs/notebooks/225-stable-diffusion-text-to-image-with-output.rst @@ -64,7 +64,9 @@ Prerequisites `⇑ <#top>`__ **The following is needed only if you want to use the original model. If not, you do not have to do anything. Just run the notebook.** - **Note**: The original model (for example, ``stable-diffusion-v1-4``) +.. note:: + + The original model (for example, ``stable-diffusion-v1-4``) requires you to accept the model license before downloading or using its weights. Visit the `stable-diffusion-v1-4 card `__ to @@ -76,6 +78,7 @@ not, you do not have to do anything. Just run the notebook.** You can login on Hugging Face Hub in notebook environment, using following code: + .. code:: python @@ -870,9 +873,12 @@ Now, you can define a text prompt for image generation and run inference pipeline. Optionally, you can also change the random generator seed for latent state initialization and number of steps. - **Note**: Consider increasing ``steps`` to get more precise results. +.. note:: + + Consider increasing ``steps`` to get more precise results. A suggested value is ``50``, but it will take longer time to process. + .. code:: ipython3 import ipywidgets as widgets diff --git a/docs/notebooks/226-yolov7-optimization-with-output.rst b/docs/notebooks/226-yolov7-optimization-with-output.rst index c04fb0c6263cf9..330d988cab3802 100644 --- a/docs/notebooks/226-yolov7-optimization-with-output.rst +++ b/docs/notebooks/226-yolov7-optimization-with-output.rst @@ -772,10 +772,13 @@ OpenVINO with minimal accuracy drop. 
We will use 8-bit quantization in post-training mode (without the fine-tuning pipeline) to optimize YOLOv7. - **Note**: NNCF Post-training Quantization is available as a preview +.. note:: + + NNCF Post-training Quantization is available as a preview feature in OpenVINO 2022.3 release. Fully functional support will be provided in the next releases. + The optimization process contains the following steps: 1. Create a Dataset for quantization. @@ -910,13 +913,16 @@ Tool `__ notebook first to generate OpenVINO IR model that is used for quantization. @@ -180,9 +180,12 @@ model. Create a quantized model from the pre-trained ``FP16`` model. - **NOTE**: Quantization is time and memory consuming operation. +.. note:: + + Quantization is time and memory consuming operation. Running quantization code below may take a long time. + .. code:: ipython3 import logging @@ -342,10 +345,13 @@ Compare inference time of the FP16 IR and quantized models we can approximately estimate the speed up of the dynamic quantized models. - **NOTE**: For the most accurate performance estimation, it is +.. note:: + + For the most accurate performance estimation, it is recommended to run ``benchmark_app`` in a terminal/command prompt after closing other applications with static shapes. + .. code:: ipython3 import time diff --git a/docs/notebooks/230-yolov8-optimization-with-output.rst b/docs/notebooks/230-yolov8-optimization-with-output.rst index e27522d664d22b..28d3b14a05169e 100644 --- a/docs/notebooks/230-yolov8-optimization-with-output.rst +++ b/docs/notebooks/230-yolov8-optimization-with-output.rst @@ -685,10 +685,13 @@ in the YOLOv8 repo, we also need to download annotations in the format used by the author of the model, for use with the original model evaluation function. - **Note**: The initial dataset download may take a few minutes to +.. note:: + + The initial dataset download may take a few minutes to complete. The download speed will vary depending on the quality of your internet connection. + .. code:: ipython3 from zipfile import ZipFile @@ -863,13 +866,19 @@ validator class instance. After definition test function and validator creation, we are ready for -getting accuracy metrics >\ **Note**: Model evaluation is time consuming -process and can take several minutes, depending on the hardware. For -reducing calculation time, we define ``num_samples`` parameter with -evaluation subset size, but in this case, accuracy can be noncomparable -with originally reported by the authors of the model, due to validation -subset difference. *To validate the models on the full dataset set -``NUM_TEST_SAMPLES = None``.* +getting accuracy metrics. + +.. note:: + + Model evaluation is time consuming + process and can take several minutes, depending on the hardware. For + reducing calculation time, we define ``num_samples`` parameter with + evaluation subset size, but in this case, accuracy can be noncomparable + with originally reported by the authors of the model, due to validation + subset difference. + +To validate the models on the full dataset set +``NUM_TEST_SAMPLES = None``. .. code:: ipython3 @@ -1005,9 +1014,12 @@ asymmetric quantization of activations. For more accurate results, we should keep the operation in the postprocessing subgraph in floating point precision, using the ``ignored_scope`` parameter. - **Note**: Model post-training quantization is time-consuming process. +.. note:: + + Model post-training quantization is time-consuming process. Be patient, it can take several minutes depending on your hardware. 
+ .. code:: ipython3 ignored_scope = nncf.IgnoredScope( @@ -1189,7 +1201,9 @@ Tool -d CPU -shape ""`` to @@ -1198,6 +1212,7 @@ models. ``benchmark_app --help`` to see an overview of all command-line options. + Compare performance object detection models `⇑ <#top>`__ ------------------------------------------------------------------------------------------------------------------------------- @@ -1637,13 +1652,13 @@ meets passing criteria. Next steps `⇑ <#top>`__ ############################################################################################################################### - This section contains suggestions on how to +This section contains suggestions on how to additionally improve the performance of your application using OpenVINO. Async inference pipeline `⇑ <#top>`__ +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ - The key advantage of the Async +The key advantage of the Async API is that when a device is busy with inference, the application can perform other tasks in parallel (for example, populating inputs or scheduling other requests) rather than wait for the current inference to @@ -1692,7 +1707,7 @@ preprocessing and postprocessing steps for a model. Define input data format `⇑ <#top>`__ ------------------------------------------------------------------------------------------------------------------------------- - To address particular input of +To address particular input of a model/preprocessor, the ``input(input_id)`` method, where ``input_id`` is a positional index or input tensor name for input in ``model.inputs``, if a model has a single input, ``input_id`` can be @@ -1917,13 +1932,16 @@ using a front-facing camera. Some web browsers, especially Mozilla Firefox, may cause flickering. If you experience flickering, set \ ``use_popup=True``. - **NOTE**: To use this notebook with a webcam, you need to run the +.. note:: + + To use this notebook with a webcam, you need to run the notebook on a computer with a webcam. If you run the notebook on a remote server (for example, in Binder or Google Colab service), the webcam will not work. By default, the lower cell will run model inference on a video file. If you want to try live inference on your webcam set ``WEBCAM_INFERENCE = True`` + Run the object detection: .. code:: ipython3 diff --git a/docs/notebooks/231-instruct-pix2pix-image-editing-with-output.rst b/docs/notebooks/231-instruct-pix2pix-image-editing-with-output.rst index ac325b2bf11202..bf63a422e49bcf 100644 --- a/docs/notebooks/231-instruct-pix2pix-image-editing-with-output.rst +++ b/docs/notebooks/231-instruct-pix2pix-image-editing-with-output.rst @@ -117,7 +117,9 @@ just a few lines of code provided as part First, we load the pre-trained weights of all components of the model. - **NOTE**: Initially, model loading can take some time due to +.. note:: + + Initially, model loading can take some time due to downloading the weights. Also, the download speed depends on your internet connection. @@ -961,9 +963,12 @@ by the model on this need inspiration. Optionally, you can also change the random generator seed for latent state initialization and number of steps. - **Note**: Consider increasing ``steps`` to get more precise results. +.. note:: + + Consider increasing ``steps`` to get more precise results. A suggested value is ``100``, but it will take more time to process. + .. 
code:: ipython3 style = {'description_width': 'initial'} @@ -986,9 +991,10 @@ seed for latent state initialization and number of steps. VBox(children=(Text(value=' Make it in galaxy', description='your text'), IntSlider(value=42, description='see… +.. note:: + + Diffusion process can take some time, depending on what hardware you select. - **Note**: Diffusion process can take some time, depending on what - hardware you select. .. code:: ipython3 diff --git a/docs/notebooks/236-stable-diffusion-v2-infinite-zoom-with-output.rst b/docs/notebooks/236-stable-diffusion-v2-infinite-zoom-with-output.rst index a07b5c22ebbd1c..6916ae2fd5f239 100644 --- a/docs/notebooks/236-stable-diffusion-v2-infinite-zoom-with-output.rst +++ b/docs/notebooks/236-stable-diffusion-v2-infinite-zoom-with-output.rst @@ -1265,9 +1265,11 @@ Configure Inference Pipeline `⇑ <#top>`__ +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ -Configuration steps: 1. Load models on device 2. Configure tokenizer and -scheduler 3. Create instance of ``OVStableDiffusionInpaintingPipeline`` -class +Configuration steps: + +1. Load models on device. +2. Configure tokenizer and scheduler. +3. Create instance of ``OVStableDiffusionInpaintingPipeline`` class. .. code:: ipython3 diff --git a/docs/notebooks/236-stable-diffusion-v2-optimum-demo-comparison-with-output.rst b/docs/notebooks/236-stable-diffusion-v2-optimum-demo-comparison-with-output.rst index 14bb516e284e50..59df2505a79d6f 100644 --- a/docs/notebooks/236-stable-diffusion-v2-optimum-demo-comparison-with-output.rst +++ b/docs/notebooks/236-stable-diffusion-v2-optimum-demo-comparison-with-output.rst @@ -20,7 +20,10 @@ accelerate end-to-end pipelines on Intel architectures. More details in this `repository `__. -``Note: We suggest you to create a different environment and run the following installation command there.`` +.. note:: + + We suggest you to create a different environment and run the following installation command there. + .. code:: ipython3 @@ -43,11 +46,14 @@ you have integrated GPU (iGPU) and discrete GPU (dGPU), it will show If you just have either an iGPU or dGPU that will be assigned to ``"GPU"`` -Note: For more details about GPU with OpenVINO visit this -`link `__. -If you have been facing any issue in Ubuntu 20.04 or Windows 11 read -this -`blog `__. +.. note:: + + For more details about GPU with OpenVINO visit this + `link `__. + If you have been facing any issue in Ubuntu 20.04 or Windows 11 read + this + `blog `__. + .. code:: ipython3 diff --git a/docs/notebooks/236-stable-diffusion-v2-optimum-demo-with-output.rst b/docs/notebooks/236-stable-diffusion-v2-optimum-demo-with-output.rst index 6ee58bbacd0c38..59641538c131ce 100644 --- a/docs/notebooks/236-stable-diffusion-v2-optimum-demo-with-output.rst +++ b/docs/notebooks/236-stable-diffusion-v2-optimum-demo-with-output.rst @@ -20,16 +20,18 @@ accelerate end-to-end pipelines on Intel architectures. More details in this `repository `__. -``Note: We suggest you to create a different environment and run the following installation command there.`` +.. note:: + + We suggest you to create a different environment and run the following installation command there. .. code:: ipython3 %pip install -q "optimum-intel[openvino,diffusers]" "ipywidgets" -.. parsed-literal:: +.. hint:: - Note: you may need to restart the kernel to use updated packages. + You may need to restart the kernel to use updated packages. 
Stable Diffusion pipeline should brings 6 elements together, a text @@ -65,11 +67,13 @@ you have integrated GPU (iGPU) and discrete GPU (dGPU), it will show If you just have either an iGPU or dGPU that will be assigned to ``"GPU"`` -Note: For more details about GPU with OpenVINO visit this -`link `__. -If you have been facing any issue in Ubuntu 20.04 or Windows 11 read -this -`blog `__. +.. note:: + + For more details about GPU with OpenVINO visit this + `link `__. + If you have been facing any issue in Ubuntu 20.04 or Windows 11 read + this + `blog `__. .. code:: ipython3 diff --git a/docs/notebooks/236-stable-diffusion-v2-text-to-image-demo-with-output.rst b/docs/notebooks/236-stable-diffusion-v2-text-to-image-demo-with-output.rst index 2c54a64fc99d5c..fc0468612224fe 100644 --- a/docs/notebooks/236-stable-diffusion-v2-text-to-image-demo-with-output.rst +++ b/docs/notebooks/236-stable-diffusion-v2-text-to-image-demo-with-output.rst @@ -13,15 +13,18 @@ including being able to use more data, employ more training, and has less restrictive filtering of the dataset. All of these features give us promising results for selecting a wide range of input text prompts! -**Note:** This is a shorter version of the -`236-stable-diffusion-v2-text-to-image `__ -notebook for demo purposes and to get started quickly. This version does -not have the full implementation of the helper utilities needed to -convert the models from PyTorch to ONNX to OpenVINO, and the OpenVINO -``OVStableDiffusionPipeline`` within the notebook directly. If you would -like to see the full implementation of stable diffusion for text to -image, please visit -`236-stable-diffusion-v2-text-to-image `__. +.. note:: + + This is a shorter version of the + `236-stable-diffusion-v2-text-to-image `__ + notebook for demo purposes and to get started quickly. This version does + not have the full implementation of the helper utilities needed to + convert the models from PyTorch to ONNX to OpenVINO, and the OpenVINO + ``OVStableDiffusionPipeline`` within the notebook directly. If you would + like to see the full implementation of stable diffusion for text to + image, please visit + `236-stable-diffusion-v2-text-to-image `__. + **Table of contents**: diff --git a/docs/notebooks/236-stable-diffusion-v2-text-to-image-with-output.rst b/docs/notebooks/236-stable-diffusion-v2-text-to-image-with-output.rst index e1196e0625a96c..826dc04d7ee881 100644 --- a/docs/notebooks/236-stable-diffusion-v2-text-to-image-with-output.rst +++ b/docs/notebooks/236-stable-diffusion-v2-text-to-image-with-output.rst @@ -73,10 +73,13 @@ Notebook contains the following steps: API. 3. Run Stable Diffusion v2 Text-to-Image pipeline with OpenVINO. -**Note:** This is the full version of the Stable Diffusion text-to-image -implementation. If you would like to get started and run the notebook -quickly, check out `236-stable-diffusion-v2-text-to-image-demo -notebook `__. +.. note:: + + This is the full version of the Stable Diffusion text-to-image + implementation. If you would like to get started and run the notebook + quickly, check out `236-stable-diffusion-v2-text-to-image-demo + notebook `__. + **Table of contents**: @@ -380,8 +383,10 @@ When running Text-to-Image pipeline, we will see that we **only need the VAE decoder**, but preserve VAE encoder conversion, it will be useful in next chapter of our tutorial. -Note: This process will take a few minutes and use significant amount of -RAM (recommended at least 32GB). +.. 
note:: + + This process will take a few minutes and use significant amount of RAM (recommended at least 32GB). + .. code:: ipython3 @@ -964,9 +969,12 @@ Now, you can define a text prompts for image generation and run inference pipeline. Optionally, you can also change the random generator seed for latent state initialization and number of steps. - **Note**: Consider increasing ``steps`` to get more precise results. +.. note:: + + Consider increasing ``steps`` to get more precise results. A suggested value is ``50``, but it will take longer time to process. + .. code:: ipython3 import gradio as gr diff --git a/docs/notebooks/237-segment-anything-with-output.rst b/docs/notebooks/237-segment-anything-with-output.rst index a8109bf23ee053..454adae0660af3 100644 --- a/docs/notebooks/237-segment-anything-with-output.rst +++ b/docs/notebooks/237-segment-anything-with-output.rst @@ -1425,7 +1425,9 @@ result, we will use a ``mixed`` quantization preset. It provides symmetric quantization of weights and asymmetric quantization of activations. - **Note**: Model post-training quantization is time-consuming process. +.. note:: + + Model post-training quantization is time-consuming process. Be patient, it can take several minutes depending on your hardware. .. code:: ipython3 diff --git a/docs/notebooks/239-image-bind-convert-with-output.rst b/docs/notebooks/239-image-bind-convert-with-output.rst index a4407207b50321..bc4a983a5a21ba 100644 --- a/docs/notebooks/239-image-bind-convert-with-output.rst +++ b/docs/notebooks/239-image-bind-convert-with-output.rst @@ -149,6 +149,8 @@ Currently, there is only one ImageBind model available for downloading, ``imagebind_huge``, more details about it can be found in `model card `__. +.. note:: + Please note, depending on internet connection speed, the model downloading process can take some time. It also requires at least 5 GB of free space on disk for saving model checkpoint. diff --git a/docs/notebooks/240-dolly-2-instruction-following-with-output.rst b/docs/notebooks/240-dolly-2-instruction-following-with-output.rst index ff759a53d6816a..bbc1e2401599b8 100644 --- a/docs/notebooks/240-dolly-2-instruction-following-with-output.rst +++ b/docs/notebooks/240-dolly-2-instruction-following-with-output.rst @@ -427,8 +427,8 @@ Helpers for output parsing `⇑ <#top>`__ +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ -Model was retrained to finish generation using special token ``### End`` -the code below find its id for using it as generation stop-criteria. +Model was retrained to finish generation using special token ``### End``. +The code below find its id for using it as generation stop-criteria. .. code:: ipython3 diff --git a/docs/notebooks/242-freevc-voice-conversion-with-output.rst b/docs/notebooks/242-freevc-voice-conversion-with-output.rst index 30c08385bd5037..5fcb41ebaf590d 100644 --- a/docs/notebooks/242-freevc-voice-conversion-with-output.rst +++ b/docs/notebooks/242-freevc-voice-conversion-with-output.rst @@ -44,17 +44,22 @@ Prerequisites `⇑ <#top>`__ ############################################################################################################################### This steps can be done manually or will be performed automatically during the execution of the notebook, but in -minimum necessary scope. 1. Clone this repo: git clone -https://github.com/OlaWod/FreeVC.git. 2. Download -`WavLM-Large `__ -and put it under directory ``FreeVC/wavlm/``. 3. 
You can download the -`VCTK `__ dataset. For -this example we download only two of them from `Hugging Face FreeVC -example `__. 4. -Download `pretrained -models `__ -and put it under directory ‘checkpoints’ (for current example only -``freevc.pth`` are required). +minimum necessary scope. + +1. Clone this repo: + +.. code-block:: sh + + git clone https://github.com/OlaWod/FreeVC.git + +2. Download `WavLM-Large `__ + and put it under directory ``FreeVC/wavlm/``. +3. You can download the `VCTK `__ dataset. For + this example we download only two of them from + `Hugging Face FreeVC example `__. +4. Download `pretrained models `__ + and put it under directory ‘checkpoints’ (for current example only + ``freevc.pth`` are required). Install extra requirements diff --git a/docs/notebooks/243-tflite-selfie-segmentation-with-output.rst b/docs/notebooks/243-tflite-selfie-segmentation-with-output.rst index 9e2ba45e11eda8..69a2c1eecdde0b 100644 --- a/docs/notebooks/243-tflite-selfie-segmentation-with-output.rst +++ b/docs/notebooks/243-tflite-selfie-segmentation-with-output.rst @@ -534,13 +534,16 @@ using a front-facing camera. Some web browsers, especially Mozilla Firefox, may cause flickering. If you experience flickering, set \ ``use_popup=True``. - **NOTE**: To use this notebook with a webcam, you need to run the +.. note:: + + To use this notebook with a webcam, you need to run the notebook on a computer with a webcam. If you run the notebook on a remote server (for example, in Binder or Google Colab service), the webcam will not work. By default, the lower cell will run model inference on a video file. If you want to try to live inference on your webcam set ``WEBCAM_INFERENCE = True`` + .. code:: ipython3 WEBCAM_INFERENCE = False diff --git a/docs/notebooks/246-depth-estimation-videpth-with-output.rst b/docs/notebooks/246-depth-estimation-videpth-with-output.rst index b515d590807305..eb91c99950ab26 100644 --- a/docs/notebooks/246-depth-estimation-videpth-with-output.rst +++ b/docs/notebooks/246-depth-estimation-videpth-with-output.rst @@ -1011,17 +1011,17 @@ directories and files which were created during the download process. Concluding notes ~~~~~~~~~~~~~~~~ - 1. The code for this tutorial is adapted from the `VI-Depth - repository `__. - 2. Users may choose to download the original and raw datasets from - the `VOID - dataset `__. - 3. The `isl-org/VI-Depth `__ - works on a slightly older version of released model assets from - its `MiDaS sibling - repository `__. However, the new - releases beginning from - `v3.1 `__ - directly have OpenVINO™ ``.xml`` and ``.bin`` model files as their - assets thereby rendering the **major pre-processing and model - compilation step irrelevant**. +1. The code for this tutorial is adapted from the `VI-Depth + repository `__. +2. Users may choose to download the original and raw datasets from + the `VOID + dataset `__. +3. The `isl-org/VI-Depth `__ + works on a slightly older version of released model assets from + its `MiDaS sibling + repository `__. However, the new + releases beginning from + `v3.1 `__ + directly have OpenVINO™ ``.xml`` and ``.bin`` model files as their + assets thereby rendering the **major pre-processing and model + compilation step irrelevant**. 
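To make the last point concrete: an IR pair downloaded from those newer MiDaS
releases can be read and compiled directly with the OpenVINO Runtime, so no
conversion code is needed at all. The sketch below is illustrative only; the
file name and the input shape are placeholders, not part of the MiDaS release
contents.

.. code:: ipython3

    # Minimal sketch: consume a ready-made OpenVINO IR directly.
    # "midas_v31_small.xml" is a placeholder name; the matching .bin file is
    # found automatically when it sits next to the .xml file.
    import numpy as np
    from openvino.runtime import Core

    core = Core()
    model = core.read_model("midas_v31_small.xml")
    compiled_model = core.compile_model(model, device_name="CPU")

    # Dummy input just to confirm the asset runs as-is; replace it with a real
    # preprocessed frame matching the model's actual input shape.
    dummy = np.zeros((1, 3, 256, 256), dtype=np.float32)
    depth = compiled_model([dummy])[compiled_model.output(0)]
    print(depth.shape)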
diff --git a/docs/notebooks/247-code-language-id-with-output.rst b/docs/notebooks/247-code-language-id-with-output.rst index 68fb8ba5b90a91..778ddf7e6e1ca4 100644 --- a/docs/notebooks/247-code-language-id-with-output.rst +++ b/docs/notebooks/247-code-language-id-with-output.rst @@ -4,11 +4,11 @@ Programming Language Classification with OpenVINO Overview -------- -This tutorial will be divided in 2 parts: 1. Create a simple inference -pipeline with a pre-trained model using the OpenVINO™ IR format. 2. -Conduct `post-training quantization `__ -on a pre-trained model using Hugging Face Optimum and benchmark -performance. +This tutorial will be divided in 2 parts: + +1. Create a simple inference pipeline with a pre-trained model using the OpenVINO™ IR format. +2. Conduct `post-training quantization `__ + on a pre-trained model using Hugging Face Optimum and benchmark performance. Feel free to use the notebook outline in Jupyter or your IDE for easy navigation. @@ -69,7 +69,7 @@ will allow to automatically convert models to the OpenVINO™ IR format. Install prerequisites ~~~~~~~~~~~~~~~~~~~~~ -First, complete the `repository installation steps <../../README.md>`__. +First, complete the `repository installation steps <../notebooks_installation.html>`__. Then, the following cell will install: - HuggingFace Optimum with OpenVINO support - HuggingFace Evaluate to benchmark results @@ -305,8 +305,10 @@ Define constants and functions Load resources ~~~~~~~~~~~~~~ -NOTE: the base model is loaded using -``AutoModelForSequenceClassification`` from ``Transformers`` +.. note:: + + The base model is loaded using ``AutoModelForSequenceClassification`` from ``Transformers``. + .. code:: ipython3 @@ -330,8 +332,10 @@ Load calibration dataset The ``get_dataset_sample()`` function will sample up to ``num_samples``, with an equal number of examples across the 6 programming languages. -NOTE: Uncomment the method below to download and use the full dataset -(5+ Gb). +.. note:: + + Uncomment the method below to download and use the full dataset (5+ Gb). + .. code:: ipython3 @@ -491,8 +495,9 @@ dataset to quantize and save the model Load quantized model ~~~~~~~~~~~~~~~~~~~~ -NOTE: the argument ``export=True`` is not required since the quantized -model is already in the OpenVINO format. +.. note:: + + The argument ``export=True`` is not required since the quantized model is already in the OpenVINO format. .. code:: ipython3 @@ -531,8 +536,10 @@ Inference on new input using quantized model Load evaluation set ~~~~~~~~~~~~~~~~~~~ -NOTE: Uncomment the method below to download and use the full dataset -(5+ Gb). +.. note:: + + Uncomment the method below to download and use the full dataset (5+ Gb). + .. code:: ipython3 diff --git a/docs/notebooks/248-stable-diffusion-xl-with-output.rst b/docs/notebooks/248-stable-diffusion-xl-with-output.rst index 5631ca77c0ce3d..457c66ce5399b9 100644 --- a/docs/notebooks/248-stable-diffusion-xl-with-output.rst +++ b/docs/notebooks/248-stable-diffusion-xl-with-output.rst @@ -62,9 +62,9 @@ The tutorial consists of the following steps: Optimum `__. - Run 2-stages Stable Diffusion XL pipeline -.. +.. note:: - **Note**: Some demonstrated models can require at least 64GB RAM for + Some demonstrated models can require at least 64GB RAM for conversion and running. 
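As a rough sketch of the two-stage flow listed above, the cell below assumes
the Stable Diffusion XL wrappers from Optimum Intel
(``OVStableDiffusionXLPipeline`` for the base model and
``OVStableDiffusionXLImg2ImgPipeline`` for the refiner); the model
identifiers, prompt, and step counts are illustrative, and the memory caveat
from the note above applies.

.. code:: ipython3

    # Hedged sketch of the 2-stage Stable Diffusion XL flow with Optimum Intel.
    # Class names and model IDs are assumptions based on the Optimum Intel
    # diffusers integration, not a prescription from this tutorial.
    from optimum.intel import OVStableDiffusionXLPipeline, OVStableDiffusionXLImg2ImgPipeline

    prompt = "cute cat 4k, highly detailed, studio lighting"

    # Stage 1: the base model generates the initial image
    # (export=True converts the PyTorch weights to OpenVINO IR on the fly).
    base = OVStableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0", export=True
    )
    image = base(prompt, num_inference_steps=15).images[0]

    # Stage 2: the refiner polishes the base output in an image-to-image pass.
    refiner = OVStableDiffusionXLImg2ImgPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-refiner-1.0", export=True
    )
    image = refiner(prompt=prompt, image=image, num_inference_steps=15).images[0]
    image.save("sdxl_result.png")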
**Table of contents**: diff --git a/docs/notebooks/252-fastcomposer-image-generation-with-output.rst b/docs/notebooks/252-fastcomposer-image-generation-with-output.rst new file mode 100644 index 00000000000000..891e1dd364663c --- /dev/null +++ b/docs/notebooks/252-fastcomposer-image-generation-with-output.rst @@ -0,0 +1,1067 @@ +`FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention `__ +===================================================================================================================== + +.. _top: + +FastComposer uses subject embeddings extracted by an image encoder to +augment the generic text conditioning in diffusion models, enabling +personalized image generation based on subject images and textual +instructions with only forward passes. Moreover it addresses two +problems: + +- **The identity blending problem.** To address the problem in the + multi-subject generation it proposes cross-attention localization + supervision during training, enforcing the attention of reference + subjects localized to the correct regions in the target images. + +- **Subject overfitting.** Naively conditioning on subject embeddings + results in subject overfitting. FastComposer proposes delayed subject + conditioning in the denoising step to maintain both identity and + editability in subject-driven image generation. + +FastComposer generates images of multiple unseen individuals with +different styles, actions, and contexts. + +.. image:: 252-fastcomposer-image-generation-with-output_files/multi-subject.png + +.. note:: + + ``model.py`` is slightly changed ``model.py`` from + fastcomposer repository. There are two main changes: - some unused + lines of code are removed to avoid errors if there are no CUDA + drivers in the system - changes to have compatibility with + transformers >= 4.30.1 (due to security vulnerability) + +**Table of contents**: + +- `Install Prerequisites <#install-prerequisites>`__ +- `Convert models to OpenVINO Intermediate representation (IR) format <#convert-models-to-openvino-intermediate-representation-ir-format>`__ +- `Convert text_encoder <#convert-text_encoder>`__ +- `The Object Transform <#the-object-transform>`__ +- `The Image Encoder <#the-image-encoder>`__ +- `Postfuse module <#postfuse-module>`__ +- `Convert Unet <#convert-unet>`__ +- `Rebuild pipeline <#rebuild-pipeline>`__ +- `Inference <#inference>`__ +- `Run Gradio <#run-gradio>`__ + +.. important:: + + This tutorial requires about 25-28GB of free memory to generate one image. Each extra image requires ~11GB of free memory. + + +Install Prerequisites `⇑ <#top>`__ +############################################################################################################################### + +Install required packages. + +.. code:: ipython2 + + !pip install -q --upgrade pip + !pip install -q torch torchvision huggingface-hub + !pip install -q transformers accelerate "diffusers==0.16.1" gradio + !pip install -q "openvino==2023.1.0.dev20230811" + +Clone FastComposer project from GitHub + +.. code:: ipython2 + + from pathlib import Path + + + # clone FastComposer repo + if not Path("fastcomposer").exists(): + !git clone https://github.com/mit-han-lab/fastcomposer.git + else: + print("FastComposer repo already cloned") + +Download pretrained model. + +.. 
code:: ipython2 + + from huggingface_hub import hf_hub_download + + + model_path = hf_hub_download(repo_id='mit-han-lab/fastcomposer', filename='pytorch_model.bin') + +Convert models to OpenVINO Intermediate representation (IR) format `⇑ <#top>`__ +############################################################################################################################### + +Define a configuration and make instance of ``FastComposerModel``. + +.. code:: ipython2 + + from dataclasses import dataclass + + import torch + + from model import FastComposerModel + + + @dataclass() + class Config: + finetuned_model_path = str(model_path) + image_encoder_name_or_path = 'openai/clip-vit-large-patch14' + localization_layers = 5 + mask_loss = False + mask_loss_prob = 0.5 + non_ema_revision = None + object_localization = False + object_localization_weight = 0.01 + object_resolution = 256 + pretrained_model_name_or_path = 'runwayml/stable-diffusion-v1-5' + revision = None + + + config = Config() + model = FastComposerModel.from_pretrained(config) + model.load_state_dict(torch.load(config.finetuned_model_path, map_location="cpu"), strict=False) + +Pipeline consist of next models: ``Unet``, ``TextEncoder``, +``ImageEncoder`` and ``PostfuseModule`` (MLP), ``object_transforms`` . + +.. figure:: https://github.com/openvinotoolkit/openvino_notebooks/assets/29454499/1d858a65-e7c7-43f8-83df-1e896d745725 + :alt: inference-pipeline + + inference-pipeline + +So, convert the models into OpenVINO IR format. + +Convert text_encoder `⇑ <#top>`__ ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ + +Model components are PyTorch modules, that can be converted with +openvino.convert_model function directly. We also use +openvino.save_model function to serialize the result of conversion. +Let’s create a helper function. + +.. code:: ipython2 + + import gc + import openvino + + + def convert(model: torch.nn.Module, xml_path: str, example_input): + xml_path = Path(xml_path) + if not xml_path.exists(): + xml_path.parent.mkdir(parents=True, exist_ok=True) + with torch.no_grad(): + converted_model = openvino.convert_model(model, example_input=example_input) + openvino.save_model(converted_model, xml_path) + + # cleanup memory + torch._C._jit_clear_class_registry() + torch.jit._recursive.concrete_type_store = torch.jit._recursive.ConcreteTypeStore() + torch.jit._state._clear_class_state() + +The text encoder is responsible for converting the input prompt into an +embedding space that can be fed to the next stage’s U-Net. Typically, it +is a transformer-based encoder that maps a sequence of input tokens to a +sequence of text embeddings. + +The input for the text encoder consists of a tensor ``input_ids``, which +contains token indices from the text processed by the tokenizer and +padded to the maximum length accepted by the model. + +.. code:: ipython2 + + text_encoder_ir_xml_path = Path('models/text_encoder_ir.xml') + example_input = torch.zeros((1, 77), dtype=torch.int64) + + model.text_encoder.eval() + convert(model.text_encoder, text_encoder_ir_xml_path, example_input) + + del model.text_encoder + gc.collect(); + +The Object Transform `⇑ <#top>`__ ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ + +It pads an incoming user image to +square and resize it. An input is a tensor of size [3, height, width]. + +.. 
code:: ipython2 + + from collections import OrderedDict + from torchvision import transforms as T + from fastcomposer.fastcomposer.transforms import PadToSquare + + + object_transforms = torch.nn.Sequential( + OrderedDict( + [ + ("pad_to_square", PadToSquare(fill=0, padding_mode="constant")), + ( + "resize", + T.Resize( + (config.object_resolution, config.object_resolution), + interpolation=T.InterpolationMode.BILINEAR, + antialias=True, + ), + ), + ("convert_to_float", T.ConvertImageDtype(torch.float32)), + ] + ) + ) + + object_transforms_ir_xml_path = Path('models/object_transforms_ir.xml') + example_input = torch.zeros([3, 1500, 1453], dtype=torch.uint8) + + object_transforms.eval() + convert(object_transforms, object_transforms_ir_xml_path, example_input) + + del object_transforms + gc.collect(); + +The Image Encoder `⇑ <#top>`__ ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ + +The image encoder is a CLIP +(Contrastive Language-Image Pretraining) Image Encoder. It takes a +transformed image from the previous step as input and transforms it into +a high-dimensional vector or embeddings. + +.. code:: ipython2 + + image_encoder_ir_xml_path = Path('models/image_encoder_ir.xml') + example_input = torch.zeros((1, 2, 3, 256, 256), dtype=torch.float32) + + model.image_encoder.eval() + convert(model.image_encoder, image_encoder_ir_xml_path, example_input) + + del model.image_encoder + gc.collect(); + +Postfuse module `⇑ <#top>`__ ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ + +On this step it is employed a multilayer +perceptron (MLP) to augment the text embeddings with visual features +extracted from the reference subjects. The Postfuse module concatenates +the word embeddings with the visual features and feeds the resulting +augmented embeddings into the MLP. + +.. code:: ipython2 + + postfuse_module_ir_xml_path = Path('models/postfuse_module_ir.xml') + + example_input = [ + torch.zeros((1, 77, 768), dtype=torch.float32), + torch.zeros((1, 2, 1, 768), dtype=torch.float32), + torch.zeros((1, 77), dtype=torch.bool), + torch.zeros((1,), dtype=torch.int64) + ] + + model.postfuse_module.eval() + convert(model.postfuse_module, postfuse_module_ir_xml_path, example_input) + + del model.postfuse_module + gc.collect(); + +Convert Unet `⇑ <#top>`__ ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ + +U-Net model gradually denoises latent image +representation guided by text encoder hidden state. + +.. code:: ipython2 + + unet_ir_xml_path = Path('models/unet_ir.xml') + + example_input = [ + torch.zeros((8, 4, 64, 64), dtype=torch.float32), + torch.zeros((), dtype=torch.int64), + torch.zeros((8, 77, 768), dtype=torch.float32) + ] + model.unet.eval() + convert(model.unet, unet_ir_xml_path, example_input) + + + del model + del example_input + + gc.collect() + +Rebuild pipeline `⇑ <#top>`__ +############################################################################################################################### + +Also, it needs to modify some internal +FastComposer entities, to use OpenVINO models. First of all, how to get +results. For example, to convert outputs from numpy to torch types. + +.. 
code:: ipython2 + + import numpy as np + from diffusers.pipelines.stable_diffusion.safety_checker import StableDiffusionSafetyChecker + from diffusers.pipelines.stable_diffusion import StableDiffusionPipelineOutput + from diffusers.pipelines.stable_diffusion import StableDiffusionPipeline + from diffusers.loaders import TextualInversionLoaderMixin + from diffusers.models import AutoencoderKL, UNet2DConditionModel + from typing import Any, Callable, Dict, List, Optional, Union + from diffusers.schedulers import KarrasDiffusionSchedulers + from transformers import CLIPImageProcessor, CLIPTokenizer + from PIL import Image + + from model import FastComposerTextEncoder + + + class StableDiffusionFastCompposerPipeline(StableDiffusionPipeline): + r""" + Pipeline for text-to-image generation using FastComposer (https://arxiv.org/abs/2305.10431). + + This model inherits from [`StableDiffusionPipeline`]. Check the superclass documentation for the generic methods the + library implements for all the pipelines (such as downloading or saving, running on a particular device, etc.) + """ + def __init__( + self, + vae: AutoencoderKL, + text_encoder: FastComposerTextEncoder, + tokenizer: CLIPTokenizer, + unet: UNet2DConditionModel, + scheduler: KarrasDiffusionSchedulers, + safety_checker: StableDiffusionSafetyChecker, + feature_extractor: CLIPImageProcessor, + requires_safety_checker: bool = True, + ): + super().__init__( + vae, + text_encoder, + tokenizer, + unet, + scheduler, + safety_checker, + feature_extractor, + requires_safety_checker, + ) + + + @torch.no_grad() + def _tokenize_and_mask_noun_phrases_ends(self, caption): + input_ids = self.special_tokenizer.encode(caption) + noun_phrase_end_mask = [False for _ in input_ids] + clean_input_ids = [] + clean_index = 0 + + for i, id in enumerate(input_ids): + if id == self.image_token_id: + noun_phrase_end_mask[clean_index - 1] = True + else: + clean_input_ids.append(id) + clean_index += 1 + + max_len = self.special_tokenizer.model_max_length + + if len(clean_input_ids) > max_len: + clean_input_ids = clean_input_ids[:max_len] + else: + clean_input_ids = clean_input_ids + [self.tokenizer.pad_token_id] * ( + max_len - len(clean_input_ids) + ) + + if len(noun_phrase_end_mask) > max_len: + noun_phrase_end_mask = noun_phrase_end_mask[:max_len] + else: + noun_phrase_end_mask = noun_phrase_end_mask + [False] * ( + max_len - len(noun_phrase_end_mask) + ) + + clean_input_ids = torch.tensor(clean_input_ids, dtype=torch.long) + noun_phrase_end_mask = torch.tensor(noun_phrase_end_mask, dtype=torch.bool) + return clean_input_ids.unsqueeze(0), noun_phrase_end_mask.unsqueeze(0) + + @torch.no_grad() + def _encode_augmented_prompt(self, prompt: str, reference_images: List[Image.Image], device: torch.device, weight_dtype: torch.dtype): + # TODO: check this + # encode reference images + object_pixel_values = [] + for image in reference_images: + image_tensor = torch.from_numpy(np.array(image.convert("RGB"))).permute(2, 0, 1) + image = torch.from_numpy((self.object_transforms(image_tensor)[0])) + object_pixel_values.append(image) + + object_pixel_values = torch.stack(object_pixel_values, dim=0).to(memory_format=torch.contiguous_format).float() + object_pixel_values = object_pixel_values.unsqueeze(0).to(dtype=torch.float32, device=device) + object_embeds = self.image_encoder(object_pixel_values)[0] + object_embeds = torch.from_numpy(object_embeds) + + # augment the text embedding + input_ids, image_token_mask = self._tokenize_and_mask_noun_phrases_ends(prompt) + input_ids, 
image_token_mask = input_ids.to(device), image_token_mask.to(device) + + num_objects = image_token_mask.sum(dim=1) + + text_embeds = torch.from_numpy(self.text_encoder(input_ids)[0]) + augmented_prompt_embeds = self.postfuse_module([ + text_embeds, + object_embeds, + image_token_mask, + num_objects + ])[0] + return torch.from_numpy(augmented_prompt_embeds) + + def _encode_prompt( + self, + prompt, + device, + num_images_per_prompt, + do_classifier_free_guidance, + negative_prompt=None + ): + r""" + Encodes the prompt into text encoder hidden states. + + Args: + prompt (`str` or `List[str]`, *optional*): + prompt to be encoded + device: (`torch.device`): + torch device + num_images_per_prompt (`int`): + number of images that should be generated per prompt + do_classifier_free_guidance (`bool`): + whether to use classifier free guidance or not + negative_prompt (`str` or `List[str]`, *optional*): + The prompt or prompts not to guide the image generation. If not defined, one has to pass + `negative_prompt_embeds` instead. Ignored when not using guidance (i.e., ignored if `guidance_scale` is + less than `1`). + """ + if isinstance(prompt, str): + batch_size = 1 + elif isinstance(prompt, list): + batch_size = len(prompt) + + # textual inversion: procecss multi-vector tokens if necessary + if isinstance(self, TextualInversionLoaderMixin): + prompt = self.maybe_convert_prompt(prompt, self.tokenizer) + + text_inputs = self.tokenizer( + prompt, + padding="max_length", + max_length=self.tokenizer.model_max_length, + truncation=True, + return_tensors="pt", + ) + text_input_ids = text_inputs.input_ids + untruncated_ids = self.tokenizer(prompt, padding="longest", return_tensors="pt").input_ids + + if untruncated_ids.shape[-1] >= text_input_ids.shape[-1] and not torch.equal( + text_input_ids, untruncated_ids + ): + removed_text = self.tokenizer.batch_decode( + untruncated_ids[:, self.tokenizer.model_max_length - 1 : -1] + ) + print( + "The following part of your input was truncated because CLIP can only handle sequences up to" + f" {self.tokenizer.model_max_length} tokens: {removed_text}" + ) + + prompt_embeds = self.text_encoder(text_input_ids.to(device))[0] + prompt_embeds = torch.from_numpy(prompt_embeds) + + bs_embed, seq_len, _ = prompt_embeds.shape + # duplicate text embeddings for each generation per prompt, using mps friendly method + prompt_embeds = prompt_embeds.repeat(1, num_images_per_prompt, 1) + prompt_embeds = prompt_embeds.view(bs_embed * num_images_per_prompt, seq_len, -1) + + # get unconditional embeddings for classifier free guidance + if do_classifier_free_guidance: + uncond_tokens: List[str] + if negative_prompt is None: + uncond_tokens = [""] * batch_size + elif type(prompt) is not type(negative_prompt): + raise TypeError( + f"`negative_prompt` should be the same type to `prompt`, but got {type(negative_prompt)} !=" + f" {type(prompt)}." + ) + elif isinstance(negative_prompt, str): + uncond_tokens = [negative_prompt] + elif batch_size != len(negative_prompt): + raise ValueError( + f"`negative_prompt`: {negative_prompt} has batch size {len(negative_prompt)}, but `prompt`:" + f" {prompt} has batch size {batch_size}. Please make sure that passed `negative_prompt` matches" + " the batch size of `prompt`." 
+ ) + else: + uncond_tokens = negative_prompt + + # textual inversion: procecss multi-vector tokens if necessary + if isinstance(self, TextualInversionLoaderMixin): + uncond_tokens = self.maybe_convert_prompt(uncond_tokens, self.tokenizer) + + max_length = prompt_embeds.shape[1] + uncond_input = self.tokenizer( + uncond_tokens, + padding="max_length", + max_length=max_length, + truncation=True, + return_tensors="pt", + ) + + negative_prompt_embeds = self.text_encoder(uncond_input.input_ids.to(device))[0] + negative_prompt_embeds = torch.from_numpy(negative_prompt_embeds) + + if do_classifier_free_guidance: + # duplicate unconditional embeddings for each generation per prompt, using mps friendly method + seq_len = negative_prompt_embeds.shape[1] + + negative_prompt_embeds = negative_prompt_embeds.to(dtype=torch.float32, device=device) + + negative_prompt_embeds = negative_prompt_embeds.repeat(1, num_images_per_prompt, 1) + negative_prompt_embeds = negative_prompt_embeds.view(batch_size * num_images_per_prompt, seq_len, -1) + + # For classifier free guidance, we need to do two forward passes. + # Here we concatenate the unconditional and text embeddings into a single batch + # to avoid doing two forward passes + prompt_embeds = torch.cat([negative_prompt_embeds, prompt_embeds]) + + return prompt_embeds + + + @torch.no_grad() + def __call__( + self, + prompt: Union[str, List[str]] = None, + height: Optional[int] = None, + width: Optional[int] = None, + num_inference_steps: int = 50, + guidance_scale: float = 7.5, + negative_prompt: Optional[Union[str, List[str]]] = None, + num_images_per_prompt: Optional[int] = 1, + eta: float = 0.0, + generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None, + latents: Optional[torch.FloatTensor] = None, + prompt_embeds: Optional[torch.FloatTensor] = None, + negative_prompt_embeds: Optional[torch.FloatTensor] = None, + output_type: Optional[str] = "pil", + return_dict: bool = True, + callback: Optional[Callable[[int, int, torch.FloatTensor], None]] = None, + callback_steps: int = 1, + cross_attention_kwargs: Optional[Dict[str, Any]] = None, + alpha_: float = 0.7, + reference_subject_images: List[Image.Image] = None, + augmented_prompt_embeds: Optional[torch.FloatTensor] = None + ): + r""" + Function invoked when calling the pipeline for generation. + + Args: + prompt (`str` or `List[str]`, *optional*): + The prompt or prompts to guide the image generation. If not defined, one has to pass `prompt_embeds`. + instead. + height (`int`, *optional*, defaults to self.unet.config.sample_size * self.vae_scale_factor): + The height in pixels of the generated image. + width (`int`, *optional*, defaults to self.unet.config.sample_size * self.vae_scale_factor): + The width in pixels of the generated image. + num_inference_steps (`int`, *optional*, defaults to 50): + The number of denoising steps. More denoising steps usually lead to a higher quality image at the + expense of slower inference. + guidance_scale (`float`, *optional*, defaults to 7.5): + Guidance scale as defined in [Classifier-Free Diffusion Guidance](https://arxiv.org/abs/2207.12598). + `guidance_scale` is defined as `w` of equation 2. of [Imagen + Paper](https://arxiv.org/pdf/2205.11487.pdf). Guidance scale is enabled by setting `guidance_scale > + 1`. Higher guidance scale encourages to generate images that are closely linked to the text `prompt`, + usually at the expense of lower image quality. 
+ negative_prompt (`str` or `List[str]`, *optional*): + The prompt or prompts not to guide the image generation. If not defined, one has to pass + `negative_prompt_embeds` instead. Ignored when not using guidance (i.e., ignored if `guidance_scale` is + less than `1`). + num_images_per_prompt (`int`, *optional*, defaults to 1): + The number of images to generate per prompt. + eta (`float`, *optional*, defaults to 0.0): + Corresponds to parameter eta (η) in the DDIM paper: https://arxiv.org/abs/2010.02502. Only applies to + [`schedulers.DDIMScheduler`], will be ignored for others. + generator (`torch.Generator` or `List[torch.Generator]`, *optional*): + One or a list of [torch generator(s)](https://pytorch.org/docs/stable/generated/torch.Generator.html) + to make generation deterministic. + latents (`torch.FloatTensor`, *optional*): + Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image + generation. Can be used to tweak the same generation with different prompts. If not provided, a latents + tensor will ge generated by sampling using the supplied random `generator`. + prompt_embeds (`torch.FloatTensor`, *optional*): + Pre-generated text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting. If not + provided, text embeddings will be generated from `prompt` input argument. + negative_prompt_embeds (`torch.FloatTensor`, *optional*): + Pre-generated negative text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt + weighting. If not provided, negative_prompt_embeds will be generated from `negative_prompt` input + argument. + output_type (`str`, *optional*, defaults to `"pil"`): + The output format of the generate image. Choose between + [PIL](https://pillow.readthedocs.io/en/stable/): `PIL.Image.Image` or `np.array`. + return_dict (`bool`, *optional*, defaults to `True`): + Whether or not to return a [`~pipelines.stable_diffusion.StableDiffusionPipelineOutput`] instead of a + plain tuple. + callback (`Callable`, *optional*): + A function that will be called every `callback_steps` steps during inference. The function will be + called with the following arguments: `callback(step: int, timestep: int, latents: torch.FloatTensor)`. + callback_steps (`int`, *optional*, defaults to 1): + The frequency at which the `callback` function will be called. If not specified, the callback will be + called at every step. + cross_attention_kwargs (`dict`, *optional*): + A kwargs dictionary that if specified is passed along to the `AttentionProcessor` as defined under + `self.processor` in + [diffusers.cross_attention](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/cross_attention.py). + alpha_ (`float`, defaults to 0.7): + The ratio of subject conditioning. If `alpha_` is 0.7, the beginning 30% of denoising steps use text prompts, while the + last 70% utilize image-augmented prompts. Increase alpha for identity preservation, decrease it for prompt consistency. + reference_subject_images (`List[PIL.Image.Image]`): + a list of PIL images that are used as reference subjects. The number of images should be equal to the number of augmented + tokens in the prompts. + augmented_prompt_embeds: (`torch.FloatTensor`, *optional*): + Pre-generated image augmented text embeddings. If not provided, embeddings will be generated from `prompt` and + `reference_subject_images`. 
+ Examples: + + Returns: + [`~pipelines.stable_diffusion.StableDiffusionPipelineOutput`] or `tuple`: + [`~pipelines.stable_diffusion.StableDiffusionPipelineOutput`] if `return_dict` is True, otherwise a `tuple. + When returning a tuple, the first element is a list with the generated images, and the second element is a + list of `bool`s denoting whether the corresponding generated image likely represents "not-safe-for-work" + (nsfw) content, according to the `safety_checker`. + """ + # 0. Default height and width to unet + height = height or self.unet.config.sample_size * self.vae_scale_factor + width = width or self.unet.config.sample_size * self.vae_scale_factor + + # 1. Check inputs. Raise error if not correct + self.check_inputs( + prompt, + height, + width, + callback_steps, + negative_prompt, + prompt_embeds, + negative_prompt_embeds, + ) + + assert (prompt is not None and reference_subject_images is not None) or (prompt_embeds is not None and augmented_prompt_embeds is not None), \ + "Prompt and reference subject images or prompt_embeds and augmented_prompt_embeds must be provided." + + # 2. Define call parameters + if prompt is not None and isinstance(prompt, str): + batch_size = 1 + elif prompt is not None and isinstance(prompt, list): + batch_size = len(prompt) + else: + batch_size = prompt_embeds.shape[0] + + device = self._execution_device + # here `guidance_scale` is defined analog to the guidance weight `w` of equation (2) + # of the Imagen paper: https://arxiv.org/pdf/2205.11487.pdf . `guidance_scale = 1` + # corresponds to doing no classifier free guidance. + do_classifier_free_guidance = guidance_scale > 1.0 + + assert do_classifier_free_guidance + + # 3. Encode input prompt + prompt_text_only = prompt.replace("", "") + + prompt_embeds = self._encode_prompt( + prompt_text_only, + device, + num_images_per_prompt, + do_classifier_free_guidance, + negative_prompt, + ) + + if augmented_prompt_embeds is None: + augmented_prompt_embeds = self._encode_augmented_prompt(prompt, reference_subject_images, device, prompt_embeds.dtype) + augmented_prompt_embeds = augmented_prompt_embeds.repeat(num_images_per_prompt, 1, 1) + + prompt_embeds = torch.cat([prompt_embeds, augmented_prompt_embeds], dim=0) + + # 4. Prepare timesteps + self.scheduler.set_timesteps(num_inference_steps, device=device) + timesteps = self.scheduler.timesteps + + # 5. Prepare latent variables + # num_channels_latents = self.unet.in_channels + num_channels_latents = 4 + latents = self.prepare_latents( + batch_size * num_images_per_prompt, + num_channels_latents, + height, + width, + prompt_embeds.dtype, + device, + generator, + latents, + ) + + start_subject_conditioning_step = (1 - alpha_) * num_inference_steps + + extra_step_kwargs = self.prepare_extra_step_kwargs(generator, eta) + ( + null_prompt_embeds, + text_prompt_embeds, + augmented_prompt_embeds + ) = prompt_embeds.chunk(3) + + # 7. 
Denoising loop + num_warmup_steps = len(timesteps) - num_inference_steps * self.scheduler.order + with self.progress_bar(total=num_inference_steps) as progress_bar: + for i, t in enumerate(timesteps): + latent_model_input = ( + torch.cat([latents] * 2) if do_classifier_free_guidance else latents + ) + latent_model_input = self.scheduler.scale_model_input(latent_model_input, t) + + if i <= start_subject_conditioning_step: + current_prompt_embeds = torch.cat( + [null_prompt_embeds, text_prompt_embeds], dim=0 + ) + else: + current_prompt_embeds = torch.cat( + [null_prompt_embeds, augmented_prompt_embeds], dim=0 + ) + + # predict the noise residual + noise_pred = self.unet([ + latent_model_input, + t, + current_prompt_embeds, + # cross_attention_kwargs + ], + )[0] + noise_pred = torch.from_numpy(noise_pred) + + + # perform guidance + if do_classifier_free_guidance: + noise_pred_uncond, noise_pred_text = noise_pred.chunk(2) + noise_pred = noise_pred_uncond + guidance_scale * ( + noise_pred_text - noise_pred_uncond + ) + else: + assert 0, "Not Implemented" + + # compute the previous noisy sample x_t -> x_t-1 + latents = self.scheduler.step( + noise_pred, t, latents, **extra_step_kwargs + ).prev_sample + + # call the callback, if provided + if i == len(timesteps) - 1 or ( + (i + 1) > num_warmup_steps and (i + 1) % self.scheduler.order == 0 + ): + progress_bar.update() + if callback is not None and i % callback_steps == 0: + callback(i, t, latents) + + if output_type == "latent": + image = latents + has_nsfw_concept = None + elif output_type == "pil": + # 8. Post-processing + image = self.decode_latents(latents) + + # 9. Run safety checker + image, has_nsfw_concept = self.run_safety_checker( + image, device, prompt_embeds.dtype + ) + + # 10. Convert to PIL + image = self.numpy_to_pil(image) + else: + # 8. Post-processing + image = self.decode_latents(latents) + + # 9. Run safety checker + image, has_nsfw_concept = self.run_safety_checker( + image, device, prompt_embeds.dtype + ) + + # Offload last model to CPU + if hasattr(self, "final_offload_hook") and self.final_offload_hook is not None: + self.final_offload_hook.offload() + + if not return_dict: + return (image, has_nsfw_concept) + + return StableDiffusionPipelineOutput( + images=image, nsfw_content_detected=has_nsfw_concept + ) + +And replace all model in the pipeline by converted models. + +.. 
code:: ipython2 + + import PIL + from transformers import CLIPTokenizer + + + def create_pipeline( + args, + *, + text_encoder, + image_encoder, + unet, + object_transforms, + postfuse_module, + device + ): + weight_dtype = torch.float32 + + tokenizer = CLIPTokenizer.from_pretrained( + args.pretrained_model_name_or_path, + subfolder="tokenizer", + revision=args.revision, + ) + tokenizer.add_tokens(["img"], special_tokens=True) + image_token_id = tokenizer.convert_tokens_to_ids("img") + + pipe = StableDiffusionFastCompposerPipeline.from_pretrained( + args.pretrained_model_name_or_path, torch_dtype=weight_dtype + ).to(device) + + pipe.object_transforms = object_transforms + pipe.unet = unet + pipe.text_encoder = text_encoder + pipe.postfuse_module = postfuse_module + pipe.image_encoder = image_encoder + pipe.image_token_id = image_token_id + pipe.special_tokenizer = tokenizer + + return pipe + + + class ModelWrapper: + def __init__(self, model): + super().__init__() + self.model = model + + def inference( + self, + image1: PIL.Image.Image, + image2: PIL.Image.Image, + prompt: str, + negative_prompt: str, + seed: int, + guidance_scale: float, + alpha_: float, + num_steps: int, + num_images: int, + ): + print("Running model inference...") + image = [] + if image1 is not None: + image.append(image1) + + if image2 is not None: + image.append(image2) + + if len(image) == 0: + return [], "You need to upload at least one image." + + num_subject_in_text = ( + np.array(self.model.special_tokenizer.encode(prompt)) + == self.model.image_token_id + ).sum() + if num_subject_in_text != len(image): + return ( + [], + f"Number of subjects in the text description doesn't match the number of reference images, #text subjects: {num_subject_in_text} #reference image: {len(image)}", + ) + + if seed == -1: + seed = np.random.randint(0, 1000000) + + generator = torch.manual_seed(seed) + + return ( + self.model( + prompt=prompt, + negative_prompt=negative_prompt, + height=512, + width=512, + num_inference_steps=num_steps, + guidance_scale=guidance_scale, + num_images_per_prompt=num_images, + generator=generator, + alpha_=alpha_, + reference_subject_images=image, + ).images, + "run successfully", + ) + + + core = openvino.Core() + compiled_unet = core.compile_model(unet_ir_xml_path) + compiled_text_encoder = core.compile_model(text_encoder_ir_xml_path) + compiled_image_encoder = core.compile_model(image_encoder_ir_xml_path) + compiled_postfuse_module = core.compile_model(postfuse_module_ir_xml_path) + compiled_object_transforms = core.compile_model(object_transforms_ir_xml_path) + + wrapped_model = ModelWrapper( + create_pipeline( + config, + text_encoder=compiled_text_encoder, + image_encoder=compiled_image_encoder, + unet=compiled_unet, + object_transforms=compiled_object_transforms, + postfuse_module=compiled_postfuse_module, + device='cpu' + ) + ) + +Inference `⇑ <#top>`__ +############################################################################################################################### + +And now it is possible to make inference. You +can provide 1 or 2 images (``image1`` and ``image2``). If you want to +provide only one image pass in inference ``None`` instead image. +``prompt`` describes context in what objects from user images will be +generated. Word ``img`` is a token that correlates with input images. + +.. 
code:: ipython2 + + image1 = Image.open('fastcomposer/data/newton_einstein/einstein/0.png') + image2 = Image.open('fastcomposer/data/newton_einstein/newton/0.png') + prompt = 'A man img and a man img sitting in a park' + negative_prompt = '((((ugly)))), (((duplicate))), ((morbid)), ((mutilated)), [out of frame], extra fingers, mutated hands, ((poorly drawn hands)), ((poorly drawn face)), (((mutation))), (((deformed))), ((ugly)), blurry, ((bad anatomy)), (((bad proportions))), ((extra limbs)), cloned face, (((disfigured))). out of frame, ugly, extra limbs, (bad anatomy), gross proportions, (malformed limbs), ((missing arms)), ((missing legs)), (((extra arms))), (((extra legs))), mutated hands, (fused fingers), (too many fingers), (((long neck)))' + alpha_ = 0.7 + num_images = 1 # each extra image requires ~11GB of free memory + num_steps = 50 + guidance_scale = 5 + seed = -1 + + + result = wrapped_model.inference( + image1, + image2, + prompt, + negative_prompt, + seed, + guidance_scale, + alpha_, + num_steps, + num_images + ) + +Result consists of several (``num_images``) images and now it possible +to display them. + +.. code:: ipython2 + + display(result[0][0]) + +Run Gradio `⇑ <#top>`__ +############################################################################################################################### + +Also, it is possible to run with Gradio + +.. code:: ipython2 + + import gradio as gr + + + def create_demo(): + TITLE = "# [FastComposer Demo](https://github.com/mit-han-lab/fastcomposer) with OpenVINO" + + DESCRIPTION = """To run the demo, you should: + 1. Upload your images. The order of image1 and image2 needs to match the order of the subects in the prompt. You only need 1 image for single subject generation. + 2. Input proper text prompts, such as "A woman img and a man img in the snow" or "A painting of a man img in the style of Van Gogh", where "img" specifies the token you want to augment and comes after the word. + 3. Click the Run button. You can also adjust the hyperparameters to improve the results. Look at the job status to see if there are any errors with your input. + As a result, pictures with person or persons from input images will be generated in accordance with the description in the prompt. + """ + + with gr.Blocks() as demo: + gr.Markdown(TITLE) + gr.Markdown(DESCRIPTION) + with gr.Row(): + with gr.Column(): + with gr.Box(): + image1 = gr.Image(label="Image 1", type="pil") + gr.Examples( + examples=["fastcomposer/data/newton.jpeg"], + inputs=image1, + ) + image2 = gr.Image(label="Image 2", type="pil") + gr.Examples( + examples=["fastcomposer/data/einstein.jpeg"], + inputs=image2, + ) + gr.Markdown("Upload the image for your subject") + + prompt = gr.Text( + value="A man img and a man img sitting in a park", + label="Prompt", + placeholder='e.g. "A woman img and a man img in the snow", "A painting of a man img in the style of Van Gogh"', + info='Use "img" to specify the word you want to augment.', + ) + negative_prompt = gr.Text( + value="((((ugly)))), (((duplicate))), ((morbid)), ((mutilated)), [out of frame], extra fingers, mutated hands, ((poorly drawn hands)), ((poorly drawn face)), (((mutation))), (((deformed))), ((ugly)), blurry, ((bad anatomy)), (((bad proportions))), ((extra limbs)), cloned face, (((disfigured))). 
out of frame, ugly, extra limbs, (bad anatomy), gross proportions, (malformed limbs), ((missing arms)), ((missing legs)), (((extra arms))), (((extra legs))), mutated hands, (fused fingers), (too many fingers), (((long neck)))", + label="Negative Prompt", + info='Features that you want to avoid.', + ) + alpha_ = gr.Slider( + label="alpha", + minimum=0, + maximum=1, + step=0.05, + value=0.75, + info="A smaller alpha aligns images with text better, but may deviate from the subject image. Increase alpha to improve identity preservation, decrease it for prompt consistency.", + ) + num_images = gr.Slider( + label="Number of generated images", + minimum=1, + maximum=8, + step=1, + value=1, + info="Each extra image requires ~11GB of free memory.", + ) + run_button = gr.Button("Run") + with gr.Accordion(label="Advanced options", open=False): + seed = gr.Slider( + label="Seed", + minimum=-1, + maximum=1000000, + step=1, + value=-1, + info="If set to -1, a different seed will be used each time.", + ) + guidance_scale = gr.Slider( + label="Guidance scale", + minimum=1, + maximum=10, + step=1, + value=5, + ) + num_steps = gr.Slider( + label="Steps", + minimum=1, + maximum=300, + step=1, + value=50, + ) + with gr.Column(): + result = gr.Gallery(label="Generated Images").style( + grid=[2], height="auto" + ) + error_message = gr.Text(label="Job Status") + + inputs = [ + image1, + image2, + prompt, + negative_prompt, + seed, + guidance_scale, + alpha_, + num_steps, + num_images, + ] + run_button.click( + fn=wrapped_model.inference, inputs=inputs, outputs=[result, error_message] + ) + return demo + + + demo = create_demo() + + if __name__ == "__main__": + try: + demo.launch(debug=True) + except Exception: + demo.launch(share=True, debug=True) + # if you are launching remotely, specify server_name and server_port + # demo.launch(server_name='your server name', server_port='server port in int') + # Read more in the docs: https://gradio.app/docs/ diff --git a/docs/notebooks/252-fastcomposer-image-generation-with-output_files/multi-subject.png b/docs/notebooks/252-fastcomposer-image-generation-with-output_files/multi-subject.png new file mode 100644 index 00000000000000..306c414299d0f6 --- /dev/null +++ b/docs/notebooks/252-fastcomposer-image-generation-with-output_files/multi-subject.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8760283d06f1b29e26a3f684c22afe65d809208a2cd624b70acda3e6a9b87a1f +size 16854851 diff --git a/docs/notebooks/301-tensorflow-training-openvino-nncf-with-output.rst b/docs/notebooks/301-tensorflow-training-openvino-nncf-with-output.rst index dec5b056bff10c..353297f180564d 100644 --- a/docs/notebooks/301-tensorflow-training-openvino-nncf-with-output.rst +++ b/docs/notebooks/301-tensorflow-training-openvino-nncf-with-output.rst @@ -331,8 +331,10 @@ OpenVINO with minimal accuracy drop. Create a quantized model from the pre-trained FP32 model and the calibration dataset. The optimization process contains the following -steps: 1. Create a Dataset for quantization. 2. Run nncf.quantize for -getting an optimized model. +steps: + +1. Create a Dataset for quantization. +2. Run ``nncf.quantize`` for getting an optimized model. The validation dataset already defined in the training notebook. @@ -601,10 +603,13 @@ In the next cells, inference speed will be measured for the original and quantized model on CPU. If an iGPU is available, inference speed will be measured for CPU+GPU as well. The number of seconds is set to 15. 
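The shape of that measurement is sketched below; the IR file names are
placeholders, ``-t 15`` matches the 15-second budget mentioned above, and the
``MULTI:CPU,GPU`` device string is one way to express the CPU+GPU case.

.. code:: ipython3

    # Illustrative benchmark_app invocations; point the -m paths at the IR
    # files produced earlier in the notebook.
    !benchmark_app -m model/flower_ir.xml -d CPU -t 15 -api async
    !benchmark_app -m model/quantized_flower_ir.xml -d CPU -t 15 -api async

    # If an iGPU is available, the same measurement can cover CPU+GPU:
    # !benchmark_app -m model/quantized_flower_ir.xml -d MULTI:CPU,GPU -t 15 -api async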
- **NOTE**: For the most accurate performance estimation, it is +.. note:: + + For the most accurate performance estimation, it is recommended to run ``benchmark_app`` in a terminal/command prompt after closing other applications. + .. code:: ipython3 # print the available devices on this system diff --git a/docs/notebooks/301-tensorflow-training-openvino-with-output.rst b/docs/notebooks/301-tensorflow-training-openvino-with-output.rst index 8350f1843f48b4..53b511021f88dc 100644 --- a/docs/notebooks/301-tensorflow-training-openvino-with-output.rst +++ b/docs/notebooks/301-tensorflow-training-openvino-with-output.rst @@ -397,8 +397,10 @@ range by using a Rescaling layer. normalization_layer = layers.Rescaling(1./255) -Note: The Keras Preprocessing utilities and layers introduced in this -section are currently experimental and may change. + +.. note:: + + The Keras Preprocessing utilities and layers introduced in this section are currently experimental and may change. There are two ways to use this layer. You can apply it to the dataset by calling map: @@ -428,11 +430,13 @@ calling map: Or, you can include the layer inside your model definition, which can simplify deployment. Let’s use the second approach here. -Note: you previously resized images using the ``image_size`` argument of -``image_dataset_from_directory``. If you want to include the resizing -logic in your model as well, you can use the -`Resizing `__ -layer. +.. note:: + + You previously resized images using the ``image_size`` argument of + ``image_dataset_from_directory``. If you want to include the resizing + logic in your model as well, you can use the + `Resizing `__ + layer. Create the Model `⇑ <#top>`__ ############################################################################################################################### @@ -482,7 +486,9 @@ Model Summary `⇑ <#top>`__ View all the layers of the network using the model’s ``summary`` method. - **NOTE:** This section is commented out for performance reasons. +.. note:: + + This section is commented out for performance reasons. Please feel free to uncomment these to compare the results. .. code:: ipython3 @@ -816,8 +822,10 @@ Predict on New Data `⇑ <#top>`__ Finally, let us use the model to classify an image that was not included in the training or validation sets. - **Note**: Data augmentation and Dropout layers are inactive at - inference time. +.. note:: + + Data augmentation and Dropout layers are inactive at inference time. + .. code:: ipython3 diff --git a/docs/notebooks/302-pytorch-quantization-aware-training-with-output.rst b/docs/notebooks/302-pytorch-quantization-aware-training-with-output.rst index 693329f641dffa..766537b933d2df 100644 --- a/docs/notebooks/302-pytorch-quantization-aware-training-with-output.rst +++ b/docs/notebooks/302-pytorch-quantization-aware-training-with-output.rst @@ -29,7 +29,10 @@ notebook. Using the smaller model and dataset will speed up training and download time. To see other ResNet models, visit `PyTorch hub `__. - **NOTE**: This notebook requires a C++ compiler. +.. note:: + + This notebook requires a C++ compiler. + **Table of contents**: @@ -58,7 +61,9 @@ for the model, and the image width and height that will be used for the network. Also define paths where PyTorch, ONNX and OpenVINO IR versions of the models will be stored. - **NOTE**: All NNCF logging messages below ERROR level (INFO and +.. note:: + + All NNCF logging messages below ERROR level (INFO and WARNING) are disabled to simplify the tutorial. 
For production use, it is recommended to enable logging by removing ``set_log_level(logging.ERROR)``. @@ -732,7 +737,9 @@ Benchmark Tool runs inference for 60 seconds in asynchronous mode on CPU. It returns inference speed as latency (milliseconds per image) and throughput (frames per second) values. - **NOTE**: This notebook runs ``benchmark_app`` for 15 seconds to give +.. note:: + + This notebook runs ``benchmark_app`` for 15 seconds to give a quick indication of performance. For more accurate performance, it is recommended to run ``benchmark_app`` in a terminal/command prompt after closing other applications. Run @@ -741,6 +748,7 @@ throughput (frames per second) values. ``benchmark_app --help`` to see an overview of all command-line options. + .. code:: ipython3 def parse_benchmark_output(benchmark_output): diff --git a/docs/notebooks/305-tensorflow-quantization-aware-training-with-output.rst b/docs/notebooks/305-tensorflow-quantization-aware-training-with-output.rst index 85acba00ec728b..b4673eb4c3e4bf 100644 --- a/docs/notebooks/305-tensorflow-quantization-aware-training-with-output.rst +++ b/docs/notebooks/305-tensorflow-quantization-aware-training-with-output.rst @@ -41,11 +41,14 @@ Import NNCF and all auxiliary packages from your Python code. Set a name for the size, used batch size, and the learning rate. Also, define paths where Frozen Graph and OpenVINO IR versions of the models will be stored. - **NOTE**: All NNCF logging messages below ERROR level (INFO and +.. note:: + + All NNCF logging messages below ERROR level (INFO and WARNING) are disabled to simplify the tutorial. For production use, it is recommended to enable logging by removing ``set_log_level(logging.ERROR)``. + .. code:: ipython3 !pip install -q "openvino-dev>=2023.0.0" "nncf>=2.5.0" @@ -261,10 +264,13 @@ Pre-train a Floating-Point Model `⇑ <#top>`__ Using NNCF for model compression assumes that the user has a pre-trained model and a training pipeline. - **NOTE** For the sake of simplicity of the tutorial, it is +.. note:: + + For the sake of simplicity of the tutorial, it is recommended to skip ``FP32`` model training and load the weights that are provided. + .. code:: ipython3 # Load the floating-point weights. @@ -471,7 +477,9 @@ Benchmark Tool runs inference for 60 seconds in asynchronous mode on CPU. It returns inference speed as latency (milliseconds per image) and throughput (frames per second) values. - **NOTE**: This notebook runs ``benchmark_app`` for 15 seconds to give +.. note:: + + This notebook runs ``benchmark_app`` for 15 seconds to give a quick indication of performance. For more accurate performance, it is recommended to run ``benchmark_app`` in a terminal/command prompt after closing other applications. Run @@ -480,6 +488,7 @@ throughput (frames per second) values. ``benchmark_app --help`` to see an overview of all command-line options. + .. code:: ipython3 serialize(model_ir_fp32, str(fp32_ir_path)) diff --git a/docs/notebooks/402-pose-estimation-with-output.rst b/docs/notebooks/402-pose-estimation-with-output.rst index f6f9773a6b36eb..fbee0c5e4708fa 100644 --- a/docs/notebooks/402-pose-estimation-with-output.rst +++ b/docs/notebooks/402-pose-estimation-with-output.rst @@ -11,10 +11,12 @@ Zoo `__. Final part of this notebook shows live inference results from a webcam. Additionally, you can also upload a video file. - **NOTE**: To use a webcam, you must run this Jupyter notebook on a +.. note:: + + To use a webcam, you must run this Jupyter notebook on a computer with a webcam. 
If you run on a server, the webcam will not work. However, you can still do inference on a video in the final - step. + step. **Table of contents**: @@ -73,7 +75,10 @@ selected model. If you want to download another model, replace the name of the model and precision in the code below. - **NOTE**: This may require a different pose decoder. +.. note:: + + This may require a different pose decoder. + .. code:: ipython3 @@ -421,12 +426,15 @@ using a front-facing camera. Some web browsers, especially Mozilla Firefox, may cause flickering. If you experience flickering, set ``use_popup=True``. - **NOTE**: To use this notebook with a webcam, you need to run the +.. note:: + + To use this notebook with a webcam, you need to run the notebook on a computer with a webcam. If you run the notebook on a server (for example, Binder), the webcam will not work. Popup mode may not work if you run this notebook on a remote computer (for example, Binder). + Run the pose estimation: .. code:: ipython3 diff --git a/docs/notebooks/403-action-recognition-webcam-with-output.rst b/docs/notebooks/403-action-recognition-webcam-with-output.rst index 7a08d4f335d831..d0cb4b74b57b00 100644 --- a/docs/notebooks/403-action-recognition-webcam-with-output.rst +++ b/docs/notebooks/403-action-recognition-webcam-with-output.rst @@ -153,10 +153,13 @@ This tutorial uses `Kinetics-400 dataset `__, and also provides the text file embedded into this notebook. - **NOTE**: If you want to run +.. note:: + + If you want to run ``"driver-action-recognition-adas-0002"`` model, replace the ``kinetics.txt`` file to ``driver_actions.txt``. + .. code:: ipython3 labels = "../data/text/kinetics.txt" @@ -664,11 +667,14 @@ Run Action Recognition Using a Webcam `⇑ <#top>`__ Now, try to see yourself in your webcam. - **NOTE**: To use a webcam, you must run this Jupyter notebook on a +.. note:: + + To use a webcam, you must run this Jupyter notebook on a computer with a webcam. If you run on a server, the webcam will not work. However, you can still do inference on a video file in the final step. + .. code:: ipython3 run_action_recognition(source=0, flip=False, use_popup=False, skip_first_frames=0) diff --git a/docs/notebooks/404-style-transfer-with-output.rst b/docs/notebooks/404-style-transfer-with-output.rst index 4854386268c1d5..7c5d9c1022830d 100644 --- a/docs/notebooks/404-style-transfer-with-output.rst +++ b/docs/notebooks/404-style-transfer-with-output.rst @@ -397,11 +397,14 @@ starting at 0. Set ``flip=True`` when using a front-facing camera. Some web browsers, especially Mozilla Firefox, may cause flickering. If you experience flickering, set ``use_popup=True``. - **NOTE**: To use a webcam, you must run this Jupyter notebook on a +.. note:: + + To use a webcam, you must run this Jupyter notebook on a computer with a webcam. If you run it on a server, you will not be able to access the webcam. However, you can still perform inference on a video file in the final step. + .. code:: ipython3 run_style_transfer(source=0, flip=True, use_popup=False) diff --git a/docs/notebooks/406-3D-pose-estimation-with-output.rst b/docs/notebooks/406-3D-pose-estimation-with-output.rst index 82a00c18827eb3..9038ce3098118c 100644 --- a/docs/notebooks/406-3D-pose-estimation-with-output.rst +++ b/docs/notebooks/406-3D-pose-estimation-with-output.rst @@ -16,17 +16,19 @@ extension `__\ **and been using JupyterLab to run the demo as suggested in the ``README.md``** - **NOTE**: *To use a webcam, you must run this Jupyter notebook on a +.. 
note:: + + To use a webcam, you must run this Jupyter notebook on a computer with a webcam. If you run on a remote server, the webcam will not work. However, you can still do inference on a video file in the final step. This demo utilizes the Python interface in ``Three.js`` integrated with WebGL to process data from the model inference. These results are processed and displayed in the - notebook.* + notebook. -*To ensure that the results are displayed correctly, run the code in a -recommended browser on one of the following operating systems:* *Ubuntu, -Windows: Chrome* *macOS: Safari* +To ensure that the results are displayed correctly, run the code in a +recommended browser on one of the following operating systems: Ubuntu, +Windows: Chrome, macOS: Safari. **Table of contents**: @@ -178,7 +180,7 @@ directory structure and downloads the selected model. Convert Model to OpenVINO IR format `⇑ <#top>`__ +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ - The selected model +The selected model comes from the public directory, which means it must be converted into OpenVINO Intermediate Representation (OpenVINO IR). We use ``omz_converter`` to convert the ONNX format model to the OpenVINO IR @@ -588,7 +590,7 @@ using a front-facing camera. Some web browsers, especially Mozilla Firefox, may cause flickering. If you experience flickering, set ``use_popup=True``. - **NOTE**: +.. note:: *1. To use this notebook with a webcam, you need to run the notebook on a computer with a webcam. If you run the notebook on a server @@ -597,6 +599,7 @@ Firefox, may cause flickering. If you experience flickering, set *2. Popup mode may not work if you run this notebook on a remote computer (e.g. Binder).* + Using the following method, you can click and move your mouse over the picture on the left to interact. diff --git a/docs/notebooks/407-person-tracking-with-output.rst b/docs/notebooks/407-person-tracking-with-output.rst index bb12d03b91059a..abc808bb273289 100644 --- a/docs/notebooks/407-person-tracking-with-output.rst +++ b/docs/notebooks/407-person-tracking-with-output.rst @@ -252,10 +252,12 @@ Load model `⇑ <#top>`__ Define a common class for model loading and predicting. There are four main steps for OpenVINO model initialization, and they -are required to run for only once before inference loop. 1. Initialize -OpenVINO Runtime. 2. Read the network from ``*.bin`` and ``*.xml`` files -(weights and architecture). 3. Compile the model for device. 4. Get -input and output names of nodes. +are required to run for only once before inference loop. + +1. Initialize OpenVINO Runtime. +2. Read the network from ``*.bin`` and ``*.xml`` files (weights and architecture). +3. Compile the model for device. +4. Get input and output names of nodes. In this case, we can put them all in a class constructor function. @@ -344,10 +346,12 @@ Select device from dropdown list for running inference using OpenVINO: Data Processing `⇑ <#top>`__ ############################################################################################################################### -Data Processing includes data preprocess and postprocess functions. - Data preprocess function is used to change -the layout and shape of input data, according to requirement of the -network input format. - Data postprocess function is used to extract the -useful information from network’s original output and visualize it. +Data Processing includes data preprocess and postprocess functions. 
+
+- Data preprocess function is used to change the layout and shape of the
+  input data to match the requirements of the network input format.
+- Data postprocess function is used to extract the useful information from
+  the network’s original output and visualize it.

 .. code:: ipython3

diff --git a/docs/tutorials.md b/docs/tutorials.md
index 2b0a08e47d147f..234d529a4e99cf 100644
--- a/docs/tutorials.md
+++ b/docs/tutorials.md
@@ -87,7 +87,7 @@ Tutorials that explain how to optimize and quantize models with OpenVINO tools.
 +------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+
 | `103-paddle-onnx-to-openvino `__ |br| |n103| | Convert PaddlePaddle models to OpenVINO IR. | |n103-img1| |
 +------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+
-| `104-model-tools `__ |br| |n104| | Download, convert and benchmark models from Open Model Zoo. | |n104-img1| |
+| `121-convert-to-openvino `__ |br| |n121| |br| |c121| | Learn OpenVINO model conversion API. | |
 +------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+

 .. dropdown:: Explore more notebooks here.

@@ -97,6 +97,8 @@ Tutorials that explain how to optimize and quantize models with OpenVINO tools.
    +====================================================================================================================================================+==================================================================================================================================+
    | `102-pytorch-onnx-to-openvino `__ | Convert PyTorch models to OpenVINO IR. |
    +----------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------+
+   | `104-model-tools `__ |br| |n104| | Download, convert and benchmark models from Open Model Zoo. |
+   +----------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------+
    | `105-language-quantize-bert `__ | Optimize and quantize a pre-trained BERT model. |
    +----------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------+
    | `106-auto-device `__ |br| |n106| | Demonstrates how to use AUTO Device. |
@@ -129,8 +131,6 @@ Tutorials that explain how to optimize and quantize models with OpenVINO tools.
+----------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------+ | `120-tensorflow-object-detection-to-openvino `__ |br| |n120| |br| |c120| | Convert TensorFlow Object Detection models to OpenVINO IR | +----------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------+ - | `121-convert-to-openvino `__ |br| |n121| |br| |c121| | Learn OpenVINO model conversion API | - +----------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------+ Model Demos @@ -261,6 +261,8 @@ Demos that demonstrate inference on a particular model. +-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+ | `251-tiny-sd-image-generation `__ |br| |c251| | Image Generation with Tiny-SD and OpenVINO™. | |n251-img1| | +-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+ + | `252-fastcomposer-image-generation `__ | Image generation with FastComposer and OpenVINO™. | | + +-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+ Model Training diff --git a/src/bindings/python/src/compatibility/openvino/requirements-dev.txt b/src/bindings/python/src/compatibility/openvino/requirements-dev.txt index aca50982d0dc53..ff31321fd4a4c1 100644 --- a/src/bindings/python/src/compatibility/openvino/requirements-dev.txt +++ b/src/bindings/python/src/compatibility/openvino/requirements-dev.txt @@ -1 +1 @@ -cython>=0.29.32 +cython>=0.29.32,<=3.0.0
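The 407-person-tracking hunk above lists four model-initialization steps and describes separate preprocess and postprocess helpers. The sketch below is illustrative only — it is not the notebook's actual class, and the names (``Model``, ``model_path``), the static NCHW input assumption, and the use of OpenCV for resizing are assumptions — but it shows how those steps typically map onto the OpenVINO Python API:

.. code:: ipython3

    import cv2
    import numpy as np
    from openvino.runtime import Core


    class Model:
        """Minimal illustrative wrapper: load the model once, then run synchronous inference."""

        def __init__(self, model_path: str, device: str = "CPU"):
            # 1. Initialize OpenVINO Runtime.
            core = Core()
            # 2. Read the network from the *.xml and *.bin files.
            model = core.read_model(model=model_path)
            # 3. Compile the model for the target device.
            self.compiled_model = core.compile_model(model=model, device_name=device)
            # 4. Get the input and output nodes (assumes a single input and a single output).
            self.input_layer = self.compiled_model.input(0)
            self.output_layer = self.compiled_model.output(0)
            # Assumes a static NCHW input shape.
            _, _, self.height, self.width = self.input_layer.shape

        def preprocess(self, image: np.ndarray) -> np.ndarray:
            # Resize and change layout HWC -> NCHW to match the network input.
            # Depending on the model, casting to float32 may also be required.
            resized = cv2.resize(image, (self.width, self.height))
            return np.expand_dims(resized.transpose(2, 0, 1), axis=0)

        def predict(self, input_data: np.ndarray) -> np.ndarray:
            # Synchronous inference; returns the raw output tensor for postprocessing.
            return self.compiled_model([input_data])[self.output_layer]

The postprocessing step is model-specific (for the person-tracking demo it involves decoding detections and extracting re-identification features), so the notebook's own implementation remains the reference.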