Not able to get good performance for diffusion models when doing single image inference with batch size 1 #1195

basantaxpatra · 2024-08-02T18:19:22Z

System Info

System Configuration: Single node Habana Gaudi setup
Firmware Version: hl-1.15.0-fw-48.2.1.1
Software Stack: Synapse AI 1.15

Information

The official example scripts
My own modified scripts

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)

Reproduction

$ docker pull vault.habana.ai/gaudi-docker/1.15.0/ubuntu22.04/habanalabs/pytorch-installer-2.2.0:latest
$ docker run --rm -it vault.habana.ai/gaudi-docker/1.15.0/ubuntu22.04/habanalabs/pytorch-installer-2.2.0:latest bash
$ git clone git@github.com:huggingface/optimum-habana.git
$ cd optimum-habana
$ pip install .
$ cd examples/stable-diffusion
$ pip install -r requirements.txt
$ python text_to_image_generation.py \
    --model_name_or_path stabilityai/stable-diffusion-xl-base-1.0 \
    --prompts "Sailing ship painting by Van Gogh" "A shiny flying horse taking off" \
    --num_images_per_prompt 20 \
    --batch_size 8 \
    --image_save_dir /tmp/stable_diffusion_xl_images \
    --scheduler euler_discrete \
    --use_habana \
    --use_hpu_graphs \
    --gaudi_config Habana/stable-diffusion \
    --bf16

Logs for reference:
2 prompt(s) received, 20 generation(s) per prompt, 8 sample(s) per batch, 5 total batch(es).
{'generation_runtime': 470.2324, 'generation_samples_per_second': 0.219, 'generation_steps_per_second': 0.068}

Initial compilation took about 170 seconds. Disregarding that, generation took roughly 300 seconds for 32 images, which is ~9.2 seconds per image on SDXL (H100s take around 2-3 seconds, depending on sampling parameters).
[{"metric_name": "graph_compilation", "triggered_by": "metric_change", "generated_on": "2024-06-10T19:17:58.694850", "statistics": {"TotalNumber": 1, "TotalTime": 2406683, "AvgTime": 2406683.0}},
{"metric_name": "graph_compilation", "triggered_by": "metric_change", "generated_on": "2024-06-10T19:19:04.525621", "statistics": {"TotalNumber": 2, "TotalTime": 66733949, "AvgTime": 33366974.5}},
{"metric_name": "graph_compilation", "triggered_by": "metric_change", "generated_on": "2024-06-10T19:19:05.394485", "statistics": {"TotalNumber": 3, "TotalTime": 66871477, "AvgTime": 22290492.333333332}},
{"metric_name": "graph_compilation", "triggered_by": "metric_change", "generated_on": "2024-06-10T19:20:08.701577", "statistics": {"TotalNumber": 4, "TotalTime": 130001484, "AvgTime": 32500371.0}},
{"metric_name": "graph_compilation", "triggered_by": "metric_change", "generated_on": "2024-06-10T19:20:09.602500", "statistics": {"TotalNumber": 5, "TotalTime": 130138275, "AvgTime": 26027655.0}},
{"metric_name": "graph_compilation", "triggered_by": "metric_change", "generated_on": "2024-06-10T19:20:58.849669", "statistics": {"TotalNumber": 6, "TotalTime": 144735532, "AvgTime": 24122588.666666668}},
{"metric_name": "graph_compilation", "triggered_by": "metric_change", "generated_on": "2024-06-10T19:22:02.477322", "statistics": {"TotalNumber": 7, "TotalTime": 207751639, "AvgTime": 29678805.57142857}},
{"metric_name": "graph_compilation", "triggered_by": "metric_change", "generated_on": "2024-06-10T19:22:03.371944", "statistics": {"TotalNumber": 8, "TotalTime": 207892568, "AvgTime": 25986571.0}},
{"metric_name": "graph_compilation", "triggered_by": "metric_change", "generated_on": "2024-06-10T19:23:06.978577", "statistics": {"TotalNumber": 9, "TotalTime": 271316124, "AvgTime": 30146236.0}},
{"metric_name": "graph_compilation", "triggered_by": "metric_change", "generated_on": "2024-06-10T19:23:56.499370", "statistics": {"TotalNumber": 10, "TotalTime": 285510855, "AvgTime": 28551085.5}},
{"metric_name": "graph_compilation", "triggered_by": "metric_change", "generated_on": "2024-06-10T19:23:57.930979", "statistics": {"TotalNumber": 11, "TotalTime": 285652606, "AvgTime": 25968418.727272727}},
{"metric_name": "graph_compilation", "triggered_by": "metric_change", "generated_on": "2024-06-10T19:23:58.791526", "statistics": {"TotalNumber": 12, "TotalTime": 285788064, "AvgTime": 23815672.0}},
{"metric_name": "graph_compilation", "triggered_by": "metric_change", "generated_on": "2024-06-10T19:26:00.652013", "statistics": {"TotalNumber": 13, "TotalTime": 299983406, "AvgTime": 23075646.615384616}},
{"metric_name": "graph_compilation", "triggered_by": "metric_change", "generated_on": "2024-06-10T19:26:01.511422", "statistics": {"TotalNumber": 14, "TotalTime": 300058888, "AvgTime": 21432777.714285713}},
{"metric_name": "graph_compilation", "triggered_by": "process_exit", "generated_on": "2024-06-10T19:26:15.341419", "statistics": {"TotalNumber": 14, "TotalTime": 300058888, "AvgTime": 21432777.714285713}},
{"metric_name": "cpu_fallback", "triggered_by": "process_exit", "generated_on": "2024-06-10T19:26:15.341498", "statistics": {"TotalNumber": 0, "FallbackOps": {}}},
{"metric_name": "memory_defragmentation", "triggered_by": "process_exit", "generated_on": "2024-06-10T19:26:15.341520", "statistics": {"TotalNumber": 0, "TotalSuccessful": 0, "AvgTime": 0, "MaxTime": 0}}]

Expected behavior

Initial compilation took about 170 seconds. Disregarding that, generation took roughly 300 seconds for 32 images, which is ~9.2 seconds per image on SDXL. Expected performance is ~2-3 seconds per image.
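
The per-image estimate above is simple arithmetic on the reported runtime, and a small sketch makes the inputs explicit. The runtime comes from the `generation_runtime` value in the log and the 170-second compilation figure from the estimate above. The image count is left as a parameter: the estimate uses 32 images, while the log line reports 2 prompts with 20 generations each, so both counts are shown. The function name `seconds_per_image` is hypothetical.

```python
def seconds_per_image(total_runtime_s, compilation_s, num_images):
    """Average generation time per image after subtracting compilation time."""
    if num_images <= 0:
        raise ValueError("num_images must be positive")
    return (total_runtime_s - compilation_s) / num_images


GENERATION_RUNTIME_S = 470.2324  # 'generation_runtime' from the log
COMPILATION_S = 170.0            # compilation estimate quoted in the issue

# 32 is the count used in the issue's estimate; 40 is 2 prompts x 20 generations.
for image_count in (32, 40):
    latency = seconds_per_image(GENERATION_RUNTIME_S, COMPILATION_S, image_count)
    print(f"{image_count} images: {latency:.2f} s per image")
```

With either count, the result stays well above the 2-3 seconds per image that the issue expects.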

regisss · 2024-10-21T18:46:31Z

@basantaxpatra Are you still seeing this issue on newer versions of the lib?

basantaxpatra added the bug label on Aug 2, 2024
