
support_deepcache_svd_pipeline #514

Merged
merged 12 commits into main from support_deepcache_svd_pipeline on Jan 19, 2024

Conversation

clackhan
Contributor

@clackhan clackhan commented Jan 11, 2024

• A100

| Configuration | PyTorch | Torch-Compile | OneFlow | OneDiff-DeepCache |
| --- | --- | --- | --- | --- |
| 576 x 1024, 25 frames, decode chunk size 5 | 50.930s | 43.376s | 31.933s | 18.486s |

@@ -0,0 +1,257 @@
# Run with ONEFLOW_RUN_GRAPH_BY_VM=1 to get faster
MODEL = 'stabilityai/stable-video-diffusion-img2vid-xt'
Collaborator

@strint strint Jan 11, 2024


Let's put this example under the diffusers adaptation folder (https://github.com/siliconflow/onediff/tree/main/onediff_diffusers_extensions), create a video subfolder there, and add a video README that explains how to use it and its performance.

Collaborator


The examples directory will later be used for testing compile. Everything related to the diffusers adaptation should go under https://github.com/siliconflow/onediff/tree/main/onediff_diffusers_extensions

@isidentical
Contributor

I'm testing this pipeline; although the inference seems to work, it fails before returning the output (so I suspect it might be VAE related?) when using OneDiff's Enterprise edition (VM Mode=1)

Logs
2024-01-11 18:42:10.210 [stderr   ] libibverbs not available, ibv_fork_init skipped
2024-01-11 18:42:15.135 [stdout   ] In our loading pipeline
2024-01-11 18:42:15.969 [stderr   ] 
2024-01-11 18:42:15.969 [stderr   ] Loading pipeline components...:   0%|          | 0/5 [00:00<?, ?it/s]
2024-01-11 18:42:19.617 [stderr   ] 
2024-01-11 18:42:19.617 [stderr   ] Loading pipeline components...:  20%|██        | 1/5 [00:03<00:14,  3.65s/it]
2024-01-11 18:42:19.782 [stderr   ] 
2024-01-11 18:42:19.782 [stderr   ] Loading pipeline components...:  40%|████      | 2/5 [00:03<00:04,  1.60s/it]
2024-01-11 18:42:20.060 [stderr   ] 
2024-01-11 18:42:20.060 [stderr   ] Loading pipeline components...:  80%|████████  | 4/5 [00:04<00:00,  1.46it/s]
2024-01-11 18:42:20.700 [stderr   ] 
2024-01-11 18:42:20.700 [stderr   ] Loading pipeline components...: 100%|██████████| 5/5 [00:04<00:00,  1.49it/s]
2024-01-11 18:42:20.700 [stderr   ] 
2024-01-11 18:42:20.700 [stderr   ] Loading pipeline components...: 100%|██████████| 5/5 [00:04<00:00,  1.06it/s]
2024-01-11 18:42:27.030 [stdout   ] Image size: (1024, 576)
2024-01-11 18:43:01.403 [stderr   ] 
2024-01-11 18:43:01.403 [stderr   ]   0%|          | 0/20 [00:00<?, ?it/s]
2024-01-11 18:46:34.227 [stderr   ] 
2024-01-11 18:46:34.227 [stderr   ]   5%|▌         | 1/20 [03:32<1:07:23, 212.82s/it]
2024-01-11 18:46:40.903 [stderr   ] 
2024-01-11 18:46:40.903 [stderr   ]  10%|█         | 2/20 [03:39<27:28, 91.56s/it]   
2024-01-11 18:46:41.469 [stderr   ] 
2024-01-11 18:46:41.469 [stderr   ]  15%|█▌        | 3/20 [03:40<14:10, 50.01s/it]
2024-01-11 18:46:42.508 [stderr   ] 
2024-01-11 18:46:42.508 [stderr   ]  20%|██        | 4/20 [03:41<08:10, 30.68s/it]
2024-01-11 18:46:43.074 [stderr   ] 
2024-01-11 18:46:43.074 [stderr   ]  25%|██▌       | 5/20 [03:41<04:57, 19.82s/it]
2024-01-11 18:46:43.639 [stderr   ] 
2024-01-11 18:46:43.639 [stderr   ]  30%|███       | 6/20 [03:42<03:05, 13.27s/it]
2024-01-11 18:46:44.678 [stderr   ] 
2024-01-11 18:46:44.678 [stderr   ]  35%|███▌      | 7/20 [03:43<02:00,  9.27s/it]
2024-01-11 18:46:45.245 [stderr   ] 
2024-01-11 18:46:45.245 [stderr   ]  40%|████      | 8/20 [03:43<01:18,  6.50s/it]
2024-01-11 18:46:45.810 [stderr   ] 
2024-01-11 18:46:45.810 [stderr   ]  45%|████▌     | 9/20 [03:44<00:51,  4.65s/it]
2024-01-11 18:46:46.850 [stderr   ] 
2024-01-11 18:46:46.850 [stderr   ]  50%|█████     | 10/20 [03:45<00:35,  3.53s/it]
2024-01-11 18:46:47.416 [stderr   ] 
2024-01-11 18:46:47.416 [stderr   ]  55%|█████▌    | 11/20 [03:46<00:23,  2.62s/it]
2024-01-11 18:46:47.982 [stderr   ] 
2024-01-11 18:46:47.982 [stderr   ]  60%|██████    | 12/20 [03:46<00:15,  2.00s/it]
2024-01-11 18:46:49.021 [stderr   ] 
2024-01-11 18:46:49.021 [stderr   ]  65%|██████▌   | 13/20 [03:47<00:11,  1.71s/it]
2024-01-11 18:46:49.588 [stderr   ] 
2024-01-11 18:46:49.588 [stderr   ]  70%|███████   | 14/20 [03:48<00:08,  1.36s/it]
2024-01-11 18:46:50.153 [stderr   ] 
2024-01-11 18:46:50.153 [stderr   ]  75%|███████▌  | 15/20 [03:48<00:05,  1.12s/it]
2024-01-11 18:46:51.192 [stderr   ] 
2024-01-11 18:46:51.192 [stderr   ]  80%|████████  | 16/20 [03:49<00:04,  1.10s/it]
2024-01-11 18:46:51.759 [stderr   ] 
2024-01-11 18:46:51.759 [stderr   ]  85%|████████▌ | 17/20 [03:50<00:02,  1.07it/s]
2024-01-11 18:46:52.324 [stderr   ] 
2024-01-11 18:46:52.324 [stderr   ]  90%|█████████ | 18/20 [03:50<00:01,  1.21it/s]
2024-01-11 18:46:53.363 [stderr   ] 
2024-01-11 18:46:53.364 [stderr   ]  95%|█████████▌| 19/20 [03:51<00:00,  1.12it/s]
2024-01-11 18:46:53.930 [stderr   ] 
2024-01-11 18:46:53.930 [stderr   ] 100%|██████████| 20/20 [03:52<00:00,  1.26it/s]
2024-01-11 18:46:53.930 [stderr   ] 
2024-01-11 18:46:53.930 [stderr   ] 100%|██████████| 20/20 [03:52<00:00, 11.63s/it]
2024-01-11 18:47:40.710 [stderr   ] F20240111 18:47:40.710036    67 cutlass_conv_tuner_impl.cpp:123] Check failed: cudaEventSynchronize(end) : an illegal memory access was encountered (700) 
2024-01-11 18:47:41.602 [stderr   ] *** Check failure stack trace: ***
2024-01-11 18:47:41.962 [stderr   ]     @     0x7f58d75938da  google::LogMessage::Fail()
2024-01-11 18:47:41.973 [stderr   ]     @     0x7f58d7596811  google::LogMessage::SendToLog()
2024-01-11 18:47:41.979 [stderr   ]     @     0x7f58d7593409  google::LogMessage::Flush()
2024-01-11 18:47:41.985 [stderr   ]     @     0x7f58d75970f9  google::LogMessageFatal::~LogMessageFatal()
2024-01-11 18:47:42.065 [stderr   ]     @     0x7f58cfe1f87f  oneflow::CutlassConvTunerImpl<>::Find()
2024-01-11 18:47:42.068 [stderr   ]     @     0x7f58ce8dc45b  _ZZNK7oneflow12_GLOBAL__N_127CutlassConvTuningWarmupPass5ApplyEPNS_3JobEPNS_10JobPassCtxEENKUlPKNS_6OpNodeEE_clES8_
2024-01-11 18:47:42.070 [stderr   ]     @     0x7f58ce8de065  oneflow::(anonymous namespace)::CutlassConvTuningWarmupPass::Apply()
2024-01-11 18:47:42.072 [stderr   ]     @     0x7f58ce703e04  _ZZN7oneflow23LazyJobBuildAndInferCtx8CompleteEvENKUlRKSsiE2_clES2_i
2024-01-11 18:47:42.079 [stderr   ]     @     0x7f58ce7095b1  oneflow::LazyJobBuildAndInferCtx::Complete()
2024-01-11 18:47:42.082 [stderr   ]     @     0x7f59ec50abde  oneflow::CurJobBuildAndInferCtx_Complete()
2024-01-11 18:47:42.086 [stderr   ]     @     0x7f59ec50b9db  (unknown)
2024-01-11 18:47:42.090 [stderr   ]     @     0x7f59ec288ea8  (unknown)
2024-01-11 18:47:42.107 [stderr   ]     @     0x7f5b2b9c5493  cfunction_call
2024-01-11 18:47:42.110 [stderr   ]     @     0x7f5b2b973e37  _PyObject_MakeTpCall
2024-01-11 18:47:42.113 [stderr   ]     @     0x7f5b2b91671b  _PyEval_EvalFrameDefault
2024-01-11 18:47:42.115 [stderr   ]     @     0x7f5b2ba6e621  _PyEval_Vector
2024-01-11 18:47:42.118 [stderr   ]     @     0x7f5b2b976ff8  method_vectorcall
2024-01-11 18:47:42.120 [stderr   ]     @     0x7f5b2b973c30  _PyObject_Call
2024-01-11 18:47:42.123 [stderr   ]     @     0x7f5b2b916e8d  _PyEval_EvalFrameDefault
2024-01-11 18:47:42.125 [stderr   ]     @     0x7f5b2ba6e621  _PyEval_Vector
2024-01-11 18:47:42.128 [stderr   ]     @     0x7f5b2b976ff8  method_vectorcall
2024-01-11 18:47:42.130 [stderr   ]     @     0x7f5b2b973c30  _PyObject_Call
2024-01-11 18:47:42.133 [stderr   ]     @     0x7f5b2b916e8d  _PyEval_EvalFrameDefault
2024-01-11 18:47:42.136 [stderr   ]     @     0x7f5b2ba6e621  _PyEval_Vector
2024-01-11 18:47:42.138 [stderr   ]     @     0x7f5b2b976ff8  method_vectorcall
2024-01-11 18:47:42.140 [stderr   ]     @     0x7f5b2b973c30  _PyObject_Call
2024-01-11 18:47:42.143 [stderr   ]     @     0x7f5b2b916e8d  _PyEval_EvalFrameDefault
2024-01-11 18:47:42.146 [stderr   ]     @     0x7f5b2ba6e621  _PyEval_Vector
2024-01-11 18:47:42.148 [stderr   ]     @     0x7f5b2b976ff8  method_vectorcall
2024-01-11 18:47:42.150 [stderr   ]     @     0x7f5b2b973c30  _PyObject_Call
2024-01-11 18:47:42.153 [stderr   ]     @     0x7f5b2b916e8d  _PyEval_EvalFrameDefault
2024-01-11 18:47:42.156 [stderr   ]     @     0x7f5b2ba6e621  _PyEval_Vector
2024-01-11 18:47:42.161 [stderr   ] Stack trace (most recent call last) in thread 67:
2024-01-11 18:47:42.180 [stderr   ]    Object "/root/.cache/isolate/virtualenv/5371bcd257691477725a108bc5f62c9ccb52f53db92ecf3cddd4ceb4111f24d0/lib/python3.11/site-packages/oneflow/_oneflow_internal.cpython-311-x86_64-linux-gnu.so", at 0x7f59ec288ea7, in 
2024-01-11 18:47:42.180 [stderr   ]    Object "/root/.cache/isolate/virtualenv/5371bcd257691477725a108bc5f62c9ccb52f53db92ecf3cddd4ceb4111f24d0/lib/python3.11/site-packages/oneflow/_oneflow_internal.cpython-311-x86_64-linux-gnu.so", at 0x7f59ec50b9da, in 
2024-01-11 18:47:42.180 [stderr   ]    Object "/root/.cache/isolate/virtualenv/5371bcd257691477725a108bc5f62c9ccb52f53db92ecf3cddd4ceb4111f24d0/lib/python3.11/site-packages/oneflow/_oneflow_internal.cpython-311-x86_64-linux-gnu.so", at 0x7f59ec50abdd, in CurJobBuildAndInferCtx_Complete()
2024-01-11 18:47:42.180 [stderr   ]    Object "/root/.cache/isolate/virtualenv/5371bcd257691477725a108bc5f62c9ccb52f53db92ecf3cddd4ceb4111f24d0/lib/python3.11/site-packages/oneflow/../oneflow.libs/liboneflow-cca397a1.so", at 0x7f58ce7095b0, in LazyJobBuildAndInferCtx::Complete()
2024-01-11 18:47:42.180 [stderr   ]    Object "/root/.cache/isolate/virtualenv/5371bcd257691477725a108bc5f62c9ccb52f53db92ecf3cddd4ceb4111f24d0/lib/python3.11/site-packages/oneflow/../oneflow.libs/liboneflow-cca397a1.so", at 0x7f58ce703e03, in 
2024-01-11 18:47:42.180 [stderr   ]    Object "/root/.cache/isolate/virtualenv/5371bcd257691477725a108bc5f62c9ccb52f53db92ecf3cddd4ceb4111f24d0/lib/python3.11/site-packages/oneflow/../oneflow.libs/liboneflow-cca397a1.so", at 0x7f58ce8de064, in 
2024-01-11 18:47:42.181 [stderr   ]    Object "/root/.cache/isolate/virtualenv/5371bcd257691477725a108bc5f62c9ccb52f53db92ecf3cddd4ceb4111f24d0/lib/python3.11/site-packages/oneflow/../oneflow.libs/liboneflow-cca397a1.so", at 0x7f58ce8dc45a, in 
2024-01-11 18:47:42.181 [stderr   ]    Object "/root/.cache/isolate/virtualenv/5371bcd257691477725a108bc5f62c9ccb52f53db92ecf3cddd4ceb4111f24d0/lib/python3.11/site-packages/oneflow/../oneflow.libs/liboneflow-cca397a1.so", at 0x7f58cfe1f87e, in CutlassConvTunerImpl<cutlass::library::Conv2dConfiguration, cutlass::library::ConvArguments>::Find(ep::CudaStream*, cutlass::library::ConvFunctionalKey, cutlass::library::Conv2dConfiguration const&, cutlass::library::ConvArguments const&, void*, unsigned long)
2024-01-11 18:47:42.181 [stderr   ]    Object "/root/.cache/isolate/virtualenv/5371bcd257691477725a108bc5f62c9ccb52f53db92ecf3cddd4ceb4111f24d0/lib/python3.11/site-packages/oneflow/../oneflow.libs/liboneflow-cca397a1.so", at 0x7f58d75970f8, in 
2024-01-11 18:47:42.181 [stderr   ]    Object "/root/.cache/isolate/virtualenv/5371bcd257691477725a108bc5f62c9ccb52f53db92ecf3cddd4ceb4111f24d0/lib/python3.11/site-packages/oneflow/../oneflow.libs/liboneflow-cca397a1.so", at 0x7f58d7593408, in 
2024-01-11 18:47:42.181 [stderr   ]    Object "/root/.cache/isolate/virtualenv/5371bcd257691477725a108bc5f62c9ccb52f53db92ecf3cddd4ceb4111f24d0/lib/python3.11/site-packages/oneflow/../oneflow.libs/liboneflow-cca397a1.so", at 0x7f58d7596810, in 
2024-01-11 18:47:42.181 [stderr   ]    Object "/root/.cache/isolate/virtualenv/5371bcd257691477725a108bc5f62c9ccb52f53db92ecf3cddd4ceb4111f24d0/lib/python3.11/site-packages/oneflow/../oneflow.libs/liboneflow-cca397a1.so", at 0x7f58d75938d9, in 
2024-01-11 18:47:42.181 [stderr   ]    Object "/root/.cache/isolate/virtualenv/5371bcd257691477725a108bc5f62c9ccb52f53db92ecf3cddd4ceb4111f24d0/lib/python3.11/site-packages/oneflow/../oneflow.libs/liboneflow-cca397a1.so", at 0x7f58c76c946e, in 
2024-01-11 18:47:42.181 [stderr   ] 
2024-01-11 18:47:42.181 [stderr   ] Aborted (Signal sent by tkill() 38 0)

@isidentical
Contributor

isidentical commented Jan 11, 2024

If I stop compiling image_encoder and vae (which I assume is the main problem), it works (so only compiling unet). However, it only returns black images with the following warning on the fp16 variant; not sure if you tested it?:

2024-01-11 19:16:46.772 [stderr   ] 100%|██████████| 20/20 [00:15<00:00,  1.39it/s]
2024-01-11 19:16:46.773 [stderr   ] 
2024-01-11 19:16:46.773 [stderr   ] 100%|██████████| 20/20 [00:15<00:00,  1.27it/s]
2024-01-11 19:16:46.773 [stderr   ] 
2024-01-11 19:16:51.932 [stderr   ] /root/.cache/isolate/virtualenv/5371bcd257691477725a108bc5f62c9ccb52f53db92ecf3cddd4ceb4111f24d0/lib/python3.11/site-packages/diffusers/image_processor.py:97: RuntimeWarning: invalid value encountered in cast
2024-01-11 19:16:51.932 [stderr   ]   images = (images * 255).round().astype("uint8")

Looking into the underlying tensors (output_type="latent"), they are all NaN, so it seems like something is broken (this doesn't happen when I omit compilation of the unet in this pipeline, so something weird is going on there).
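
For reference, a minimal sketch of that check, assuming the standard diffusers SVD call signature (the `pipe` and `image` variables are illustrative, not from this PR):

```python
import torch

# Ask the pipeline for latents instead of decoded frames and look for NaNs.
out = pipe(image, decode_chunk_size=5, output_type="latent")
latents = out.frames  # with output_type="latent", the frames field holds the latent tensor
print("any NaN in latents:", torch.isnan(latents).any().item())
```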

@lixiang007666
Contributor

lixiang007666 commented Jan 11, 2024

> I'm testing this pipeline; although the inference seems to work, it fails before returning the output (so I suspect it might be VAE related?) when using OneDiff's Enterprise edition (VM Mode=1)
>
> Logs

You can try reducing the --decode-chunk-size, for example, to 7.
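
For context, --decode-chunk-size maps onto the pipeline's decode_chunk_size argument. A rough sketch of the trade-off (variable names illustrative):

```python
# Smaller decode_chunk_size lowers peak memory during the VAE decode at some
# cost in decode speed; values tried in this thread include 12, 8, 7, 6, 5, and 4.
frames = pipe(
    image,
    num_frames=25,
    decode_chunk_size=5,  # drop further (e.g. to 4) if the VAE decode still crashes
).frames[0]
```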

@lixiang007666
Contributor

lixiang007666 commented Jan 12, 2024

> If I stop compiling image_encoder and vae (which I assume is the main problem), it works (so only compiling unet). However, it only returns black images with the following warning on the fp16 variant; not sure if you tested it?

This script already limits the fp16 range to avoid overflow by setting ONEFLOW_ATTENTION_ALLOW_HALF_PRECISION_SCORE_ACCUMULATION_MAX_M = 0.
If you still get black videos, you can additionally try disabling ONEFLOW_ATTENTION_ALLOW_HALF_PRECISION_ACCUMULATION.
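
A minimal sketch of applying these settings when not using the script (assuming they are picked up from the environment, so they should be set before oneflow/onediff is imported; the "0" value for the second variable is an assumption about how it is disabled):

```python
import os

# Limit fp16 attention score accumulation to avoid overflow (black/NaN frames).
os.environ["ONEFLOW_ATTENTION_ALLOW_HALF_PRECISION_SCORE_ACCUMULATION_MAX_M"] = "0"
# If black videos persist, additionally disable half-precision accumulation
# (assuming "0" disables it).
os.environ["ONEFLOW_ATTENTION_ALLOW_HALF_PRECISION_ACCUMULATION"] = "0"
```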

@isidentical
Contributor

Thanks for the hint; setting ONEFLOW_ATTENTION_ALLOW_HALF_PRECISION_SCORE_ACCUMULATION_MAX_M to 0 in my case (I wasn't using this script, just the pipeline) fixed the issue.

@isidentical
Contributor

isidentical commented Jan 12, 2024

> You can try reducing the --decode-chunk-size, for example, to 7.

I've tried setting it to 6 and 8 (instead of what we had, which was 12 and works without compiling the vae decoder), but it still crashes (running on an A100 40G as a point of reference). Actually, taking that back: setting it to 4 prevents the crashes!

def __init__(
self,
vae: AutoencoderKLTemporalDecoder,
image_encoder: CLIPVisionModelWithProjection,
Contributor


Seems like this component fails to get saved, FYI:

2024-01-12 01:11:46.164 [stderr   ]   File "/root/.cache/isolate/virtualenv/5371bcd257691477725a108bc5f62c9ccb52f53db92ecf3cddd4ceb4111f24d0/lib/python3.11/site-packages/onediff/infer_compiler/with_oneflow_compile.py", line 406, in save_graph
2024-01-12 01:11:46.164 [stderr   ]     flow.save(state_dict, file_path)
2024-01-12 01:11:46.164 [stderr   ]   File "/root/.cache/isolate/virtualenv/5371bcd257691477725a108bc5f62c9ccb52f53db92ecf3cddd4ceb4111f24d0/lib/python3.11/site-packages/oneflow/framework/check_point_v2.py", line 720, in save
2024-01-12 01:11:46.164 [stderr   ]     pickled_bytes = pickle.dumps(obj)
2024-01-12 01:11:46.164 [stderr   ]                     ^^^^^^^^^^^^^^^^^
2024-01-12 01:11:46.164 [stderr   ] _pickle.PicklingError: Can't pickle <class 'transformers.models.clip.modeling_clip.CLIPVisionModelOutput'>: it's not the same object as transformers.models.clip.modeling_clip.CLIPVisionModelOutput

Contributor

@lixiang007666 lixiang007666 Jan 12, 2024


Compilation of image_encoder can be skipped; we have not tested its graph save.

@isidentical
Contributor

Another bug occurs when I try two different image resolutions back to back, though I'm not sure if it is a DeepCache problem or a generic SVD problem (ONEFLOW_RUN_GRAPH_BY_VM=1):

2024-01-12 20:25:46.360 [stdout   ] Image size:
2024-01-12 20:25:46.360 [stdout   ]  (256, 256)
2024-01-12 20:25:46.714 [stderr   ] 
2024-01-12 20:25:46.714 [stderr   ]   0%|          | 0/20 [00:00<?, ?it/s]
2024-01-12 20:25:46.722 [stdout   ] [ERROR](GRAPH:OneflowGraph_0:OneflowGraph) run got error: <class 'oneflow._oneflow_internal.exception.RuntimeError'> Error: Reshape infered output element count is different with input in op_name: model.down_blocks.0.resnets.0-reshape-72 input shape is : (1,50,320,32,32) , output shape is : (2,2,320,72,128) , output logical shape is (2,2,320,72,128) , and reshape shape conf is : (2,-1,320,72,128) op_loc: 
2024-01-12 20:25:46.723 [stdout   ] 
2024-01-12 20:25:46.723 [stderr   ] ERROR [2024-01-12 20:25:46] - Exception in __call__: e=RuntimeError('\x1b[1m\x1b[38;2;255;000;000mError\x1b[0m: Reshape infered output element count is different with input in op_name: model.down_blocks.0.resnets.0-reshape-72 input shape is : (1,50,320,32,32) , output shape is : (2,2,320,72,128) , output logical shape is (2,2,320,72,128) , and reshape shape conf is : (2,-1,320,72,128) op_loc: \n')

(and this takes a really long time to complete)

@lixiang007666
Contributor

> Another bug occurs when I try two different image resolutions back to back, though I'm not sure if it is a DeepCache problem or a generic SVD problem (ONEFLOW_RUN_GRAPH_BY_VM=1): [reshape error logs quoted above]
>
> (and this takes a really long time to complete)

Support for dynamically switching resolution in SVD is currently under development.
For the time being, it is necessary to set export VM_REBUILD_DYNAMIC_SHAPE="1" to avoid this issue.
If you use the save_graph and load_graph functions and save the required resolutions in advance, there will be no additional compile time when loading a graph that matches that resolution.
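
A rough sketch of that workflow, assuming the compiled modules expose save_graph/load_graph as in the traceback above, that VM_REBUILD_DYNAMIC_SHAPE is read from the environment before oneflow is used, and with illustrative file paths:

```python
import os

# Workaround for the reshape error when switching resolutions back to back.
os.environ["VM_REBUILD_DYNAMIC_SHAPE"] = "1"

# After a warm-up run at a given resolution, persist the compiled graph ...
pipe.unet.save_graph("graphs/unet_576x1024")

# ... and in a later process, load it back before inference at that resolution,
# so no extra compilation happens for sizes that were saved in advance.
pipe.unet.load_graph("graphs/unet_576x1024")
```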

@isidentical
Contributor

we have been playing with this for a while and it seems to work really well 💯

@clackhan
Contributor Author

clackhan commented Jan 18, 2024

> we have been playing with this for a while and it seems to work really well 💯

Have you tried compiling fast_unet? We forgot to do this in the previous example; compiling fast_unet will make the pipeline faster.
[image attached]

@isidentical
Contributor

Yep! We are compiling unet, fast_unet, and vae.decoder (the rest seem to be problematic for save/load and didn't provide enough speedup to warrant investigating why).
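
For readers following along, a sketch of that selective compilation, assuming onediff's oneflow_compile wrapper and a DeepCache SVD pipeline that exposes fast_unet (exact attribute names may differ from this PR):

```python
from onediff.infer_compiler import oneflow_compile

# Compile only the submodules reported to work and to pay off:
pipe.unet = oneflow_compile(pipe.unet)                # main denoiser
pipe.fast_unet = oneflow_compile(pipe.fast_unet)      # DeepCache shallow branch
pipe.vae.decoder = oneflow_compile(pipe.vae.decoder)  # VAE decoder

# image_encoder (and the rest of the VAE) are intentionally left uncompiled:
# compiling them caused crashes / graph-save failures earlier in this thread
# and did not provide a meaningful speedup.
```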

@clackhan clackhan merged commit b7a976b into main Jan 19, 2024
4 of 5 checks passed
@clackhan clackhan deleted the support_deepcache_svd_pipeline branch January 19, 2024 05:11