This repository was archived by the owner on Aug 1, 2025. It is now read-only.
[Bug]: RuntimeErrors when using TorchInductor for stable diffusion #1743
Closed
Description
🐛 Describe the bug
Hi,
when running a stable diffusion example I am experiencing a high number of non-deterministic runtime errors (roughly 50% of runs with inductor). I observed two flavors.
First error is:
RuntimeError: Triton Error [CUDA]: an illegal memory access was encountered
It's much more frequent than the second, which is:
RuntimeError: CUDA: Error- illegal address
+
RuntimeError: false INTERNAL ASSERT FAILED at "../c10/cuda/CUDAGraphsC10Utils.h":73, please report a bug to PyTorch. Unknown CUDA graph CaptureStatus32650
@soumith reported here that he saw the same set of errors.
Any thoughts on how I can help to collect more data or could create more reliable repros?
My specific example looks as follows:
Repro:
pip install --upgrade diffusers transformers
Then run:
import torchdynamo
from diffusers import StableDiffusionPipeline
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4") # ACCESS TOKEN REQUIRED
pipe = pipe.to("cuda")
@torchdynamo.optimize("inductor")
def apply(x):
return pipe(x)
prompt = "a photo of an astronaut riding a horse on mars"
image = apply(prompt).images[0]
I collected the stack traces of the failed attempts here: https://gist.github.com/mreso/982ef0b4c915eea6170fd81f0de16e98
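Since the crash is non-deterministic, a small harness can help quantify the failure rate over repeated runs. This is a generic sketch (the `run_once` callable, which stands in for one full pipeline invocation, is hypothetical):

```python
def failure_rate(run_once, runs=20):
    """Invoke run_once() repeatedly and return the fraction of runs
    that die with a RuntimeError (the error type seen above)."""
    failures = 0
    for i in range(runs):
        try:
            run_once()
        except RuntimeError as exc:
            failures += 1
            print(f"run {i} failed: {exc}")
    return failures / runs

# Hypothetical usage with the repro above:
# rate = failure_rate(lambda: apply(prompt).images[0])
# print(f"observed failure rate: {rate:.0%}")
```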
Error logs
Example for the first error is:
Traceback (most recent call last):
File "/home/mreso/torchdynamo/test.py", line 40, in <module>
image = apply(prompt).images[0]
File "/home/mreso/miniconda3/envs/torchdynamo2/lib/python3.9/site-packages/torch/_dynamo/eval_frame.py", line 157, in _fn
return fn(*args, **kwargs)
File "/home/mreso/torchdynamo/test.py", line 29, in apply
return pipe(x)
File "/home/mreso/miniconda3/envs/torchdynamo2/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/mreso/miniconda3/envs/torchdynamo2/lib/python3.9/site-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py", line 214, in __call__
text_inputs = self.tokenizer(
File "/home/mreso/miniconda3/envs/torchdynamo2/lib/python3.9/site-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py", line 269, in <graph break in __call__>
uncond_embeddings = self.text_encoder(uncond_input.input_ids.to(self.device))[0]
File "/home/mreso/miniconda3/envs/torchdynamo2/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1357, in _call_impl
return forward_call(*input, **kwargs)
File "/home/mreso/miniconda3/envs/torchdynamo2/lib/python3.9/site-packages/transformers/models/clip/modeling_clip.py", line 705, in forward
@add_start_docstrings_to_model_forward(CLIP_TEXT_INPUTS_DOCSTRING)
File "/home/mreso/miniconda3/envs/torchdynamo2/lib/python3.9/site-packages/torch/_dynamo/eval_frame.py", line 157, in _fn
return fn(*args, **kwargs)
File "/home/mreso/miniconda3/envs/torchdynamo2/lib/python3.9/site-packages/functorch/_src/aot_autograd.py", line 870, in forward
return compiled_f(
File "/home/mreso/miniconda3/envs/torchdynamo2/lib/python3.9/site-packages/functorch/_src/aot_autograd.py", line 861, in new_func
return compiled_fn(args)
File "/home/mreso/miniconda3/envs/torchdynamo2/lib/python3.9/site-packages/functorch/_src/aot_autograd.py", line 299, in new_fn
fw_outs = call_func_with_args(compiled_fw, args, disable_amp=disable_amp)
File "/home/mreso/miniconda3/envs/torchdynamo2/lib/python3.9/site-packages/functorch/_src/aot_autograd.py", line 255, in call_func_with_args
out = normalize_as_list(f(args))
File "/home/mreso/miniconda3/envs/torchdynamo2/lib/python3.9/site-packages/torch/_inductor/compile_fx.py", line 194, in run
return model(new_inputs_to_cuda)
File "/tmp/torchinductor_mreso/qt/cqt3tpyb2awwxhp3vhrmtamvb6b57ctovraioa5mbqzemitewns2.py", line 915, in call
kernel2.run(arg197_1, arg0_1, arg196_1, arg1_1, arg2_1, arg3_1, buf7, 77, 768, grid=grid(77), stream=stream0)
File "/home/mreso/miniconda3/envs/torchdynamo2/lib/python3.9/site-packages/torch/_inductor/triton_ops/autotune.py", line 170, in run
result = launcher(
File "<string>", line 4, in launcher
RuntimeError: Triton Error [CUDA]: an illegal memory access was encountered
For the second error it's:
[2022-10-21 00:11:28,203] torch._dynamo.convert_frame: [ERROR] WON'T CONVERT forward /home/mreso/miniconda3/envs/torchdynamo2/lib/python3.9/site-packages/diffusers/models/unet_2d_condition.py line 232
due to:
Traceback (most recent call last):
File "/home/mreso/miniconda3/envs/torchdynamo2/lib/python3.9/site-packages/torch/_dynamo/utils.py", line 401, in clone_input
result.copy_(x.clone())
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/mreso/miniconda3/envs/torchdynamo2/lib/python3.9/site-packages/torch/_dynamo/utils.py", line 410, in clone_input
y = torch.clone(x)
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
from user code:
File "/home/mreso/miniconda3/envs/torchdynamo2/lib/python3.9/site-packages/diffusers/models/unet_2d_condition.py", line 286, in forward
t_emb = t_emb.to(dtype=self.dtype)
Set torch._dynamo.config.verbose=True for more information
==========
0%| | 0/51 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/home/mreso/miniconda3/envs/torchdynamo2/lib/python3.9/site-packages/functorch/_src/aot_autograd.py", line 65, in preserve_rng_state
yield
File "/home/mreso/miniconda3/envs/torchdynamo2/lib/python3.9/site-packages/functorch/_src/aot_autograd.py", line 579, in create_aot_dispatcher_function
return aot_dispatch_base(flat_fn, fake_flat_tensor_args, aot_config)
File "/home/mreso/miniconda3/envs/torchdynamo2/lib/python3.9/site-packages/functorch/_src/aot_autograd.py", line 295, in aot_dispatch_base
compiled_fw = aot_config.fw_compiler(fw_module, flat_args)
File "/home/mreso/miniconda3/envs/torchdynamo2/lib/python3.9/site-packages/torch/_dynamo/utils.py", line 85, in time_wrapper
r = func(*args, **kwargs)
File "/home/mreso/miniconda3/envs/torchdynamo2/lib/python3.9/site-packages/torch/_inductor/compile_fx.py", line 362, in fw_compiler
return compile_fx_inner(
File "/home/mreso/miniconda3/envs/torchdynamo2/lib/python3.9/site-packages/torch/_dynamo/debug_utils.py", line 464, in debug_wrapper
compiled_fn = compiler_fn(gm, example_inputs, **kwargs)
File "/home/mreso/miniconda3/envs/torchdynamo2/lib/python3.9/site-packages/torch/_inductor/debug.py", line 178, in inner
return fn(*args, **kwargs)
File "/home/mreso/miniconda3/envs/torchdynamo2/lib/python3.9/contextlib.py", line 79, in inner
return func(*args, **kwds)
File "/home/mreso/miniconda3/envs/torchdynamo2/lib/python3.9/site-packages/torch/_inductor/compile_fx.py", line 127, in compile_fx_inner
compiled_fn = graph.compile_to_fn()
File "/home/mreso/miniconda3/envs/torchdynamo2/lib/python3.9/site-packages/torch/_inductor/graph.py", line 339, in compile_to_fn
return self.compile_to_module().call
File "/home/mreso/miniconda3/envs/torchdynamo2/lib/python3.9/site-packages/torch/_dynamo/utils.py", line 85, in time_wrapper
r = func(*args, **kwargs)
File "/home/mreso/miniconda3/envs/torchdynamo2/lib/python3.9/site-packages/torch/_inductor/graph.py", line 329, in compile_to_module
mod = PyCodeCache.load(code)
File "/home/mreso/miniconda3/envs/torchdynamo2/lib/python3.9/site-packages/torch/_inductor/codecache.py", line 212, in load
exec(code, mod.__dict__, mod.__dict__)
File "/tmp/torchinductor_mreso/gb/cgb5kyxk5qcj67xbzxmiuwpxlrcjspq7dyscbrllcrgknkoahswm.py", line 94, in <module>
async_compile.wait(globals())
File "/home/mreso/miniconda3/envs/torchdynamo2/lib/python3.9/site-packages/torch/_inductor/codecache.py", line 352, in wait
scope[key] = result.result()
File "/home/mreso/miniconda3/envs/torchdynamo2/lib/python3.9/site-packages/torch/_inductor/codecache.py", line 260, in result
kernel = self.kernel = _load_kernel(self.source_code)
File "/home/mreso/miniconda3/envs/torchdynamo2/lib/python3.9/site-packages/torch/_inductor/codecache.py", line 246, in _load_kernel
kernel.precompile()
File "/home/mreso/miniconda3/envs/torchdynamo2/lib/python3.9/site-packages/torch/_inductor/triton_ops/autotune.py", line 58, in precompile
self.launchers = [
File "/home/mreso/miniconda3/envs/torchdynamo2/lib/python3.9/site-packages/torch/_inductor/triton_ops/autotune.py", line 59, in <listcomp>
self._precompile_config(c, warm_cache_only_with_cc)
File "/home/mreso/miniconda3/envs/torchdynamo2/lib/python3.9/site-packages/torch/_inductor/triton_ops/autotune.py", line 83, in _precompile_config
binary = triton.compile(
File "/home/mreso/miniconda3/envs/torchdynamo2/lib/python3.9/site-packages/triton/compiler.py", line 1263, in compile
return CompiledKernel(name, so_cache_manager._make_path(so_name), fn_cache_manager.cache_dir, device)
File "/home/mreso/miniconda3/envs/torchdynamo2/lib/python3.9/site-packages/triton/compiler.py", line 1296, in __init__
mod, func, n_regs, n_spills = _triton.code_gen.load_binary(metadata["name"], self.asm["cubin"], self.shared, device)
RuntimeError: CUDA: Error- illegal address
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/mreso/torchdynamo/test.py", line 24, in <module>
image = apply(prompt).images[0]
File "/home/mreso/miniconda3/envs/torchdynamo2/lib/python3.9/site-packages/torch/_dynamo/eval_frame.py", line 157, in _fn
return fn(*args, **kwargs)
File "/home/mreso/torchdynamo/test.py", line 21, in apply
return pipe(x)
File "/home/mreso/miniconda3/envs/torchdynamo2/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/mreso/miniconda3/envs/torchdynamo2/lib/python3.9/site-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py", line 214, in __call__
text_inputs = self.tokenizer(
File "/home/mreso/miniconda3/envs/torchdynamo2/lib/python3.9/site-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py", line 326, in <graph break in __call__>
noise_pred = self.unet(latent_model_input, t, encoder_hidden_states=text_embeddings).sample
File "/home/mreso/miniconda3/envs/torchdynamo2/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1357, in _call_impl
return forward_call(*input, **kwargs)
File "/home/mreso/miniconda3/envs/torchdynamo2/lib/python3.9/site-packages/diffusers/models/unet_2d_condition.py", line 281, in forward
t_emb = self.time_proj(timesteps)
File "/home/mreso/miniconda3/envs/torchdynamo2/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1357, in _call_impl
return forward_call(*input, **kwargs)
File "/home/mreso/miniconda3/envs/torchdynamo2/lib/python3.9/site-packages/diffusers/models/embeddings.py", line 91, in forward
def forward(self, timesteps):
File "/home/mreso/miniconda3/envs/torchdynamo2/lib/python3.9/site-packages/torch/_dynamo/eval_frame.py", line 157, in _fn
return fn(*args, **kwargs)
File "/home/mreso/miniconda3/envs/torchdynamo2/lib/python3.9/site-packages/functorch/_src/aot_autograd.py", line 870, in forward
return compiled_f(
File "/home/mreso/miniconda3/envs/torchdynamo2/lib/python3.9/site-packages/functorch/_src/aot_autograd.py", line 856, in new_func
compiled_fn = create_aot_dispatcher_function(
File "/home/mreso/miniconda3/envs/torchdynamo2/lib/python3.9/site-packages/functorch/_src/aot_autograd.py", line 579, in create_aot_dispatcher_function
return aot_dispatch_base(flat_fn, fake_flat_tensor_args, aot_config)
File "/home/mreso/miniconda3/envs/torchdynamo2/lib/python3.9/contextlib.py", line 137, in __exit__
self.gen.throw(typ, value, traceback)
File "/home/mreso/miniconda3/envs/torchdynamo2/lib/python3.9/site-packages/functorch/_src/aot_autograd.py", line 69, in preserve_rng_state
torch.cuda.set_rng_state(cuda_rng_state)
File "/home/mreso/miniconda3/envs/torchdynamo2/lib/python3.9/site-packages/torch/cuda/random.py", line 64, in set_rng_state
_lazy_call(cb)
File "/home/mreso/miniconda3/envs/torchdynamo2/lib/python3.9/site-packages/torch/cuda/__init__.py", line 176, in _lazy_call
callable()
File "/home/mreso/miniconda3/envs/torchdynamo2/lib/python3.9/site-packages/torch/cuda/random.py", line 62, in cb
default_generator.set_state(new_state_copy)
RuntimeError: false INTERNAL ASSERT FAILED at "../c10/cuda/CUDAGraphsC10Utils.h":73, please report a bug to PyTorch. Unknown CUDA graph CaptureStatus32650
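The log above suggests passing CUDA_LAUNCH_BLOCKING=1 so that CUDA errors are reported synchronously and the stack trace points at the faulting kernel. One way to apply it (a sketch; the variable must be set before torch initializes CUDA, i.e. at the very top of the script, or on the command line as `CUDA_LAUNCH_BLOCKING=1 python test.py`):

```python
import os

# Must be set before the first CUDA call (i.e. before torch touches the
# GPU); otherwise the setting has no effect on the running process.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
```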
Did Dynamo succeed?
- Does dynamo.optimize("eager") succeed?
Did AOT succeed?
- Does dynamo.optimize("aot_eager") succeed?
Did Inductor succeed?
- Does dynamo.optimize("inductor") succeed?
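The triage questions above can be run as a small loop that tries each backend in order and records which ones fail. A generic sketch (the `make_run` factory is hypothetical; with the repro above it could return a callable that wraps `pipe(prompt)` in `torchdynamo.optimize(backend)`):

```python
def triage_backends(make_run, backends=("eager", "aot_eager", "inductor")):
    """Run one forward pass per backend and record ok/failure.

    make_run(backend) should return a zero-argument callable that
    executes the model once under that backend.
    """
    results = {}
    for backend in backends:
        try:
            make_run(backend)()
            results[backend] = "ok"
        except Exception as exc:
            results[backend] = f"failed: {type(exc).__name__}: {exc}"
    return results
```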
Minified repro
Not applicable, as the generated minified examples do not reproduce the errors.
$python /home/mreso/miniconda3/envs/torchdynamo2/lib/python3.9/site-packages/minifier_launcher.py
Traceback (most recent call last):
File "/home/mreso/miniconda3/envs/torchdynamo2/lib/python3.9/site-packages/minifier_launcher.py", line 1028, in <module>
minifier(
File "/home/mreso/miniconda3/envs/torchdynamo2/lib/python3.9/site-packages/functorch/_src/fx_minifier.py", line 96, in minifier
raise RuntimeError("Input graph did not fail the tester")
RuntimeError: Input graph did not fail the tester