
[PyTorch] Adapt new kernel launch API #830

Closed
whitneywhtsang opened this issue Apr 7, 2024 · 1 comment
Assignees
Labels
enhancement New feature or request

Comments

@whitneywhtsang (Contributor)

Triton recently changed its kernel launch API. This issue tracks adapting the inductor-side call sites so they work with both the old and the new Triton APIs. We merged 88abff6 in #825, which causes benchmark failures:

$ ./inductor_xpu_test.sh huggingface float32 inference accuracy xpu 0 static 1 0 AlbertForMaskedLM
Testing model AlbertForMaskedLM
loading model: 0it [01:22, ?it/s]
xpu  eval  AlbertForMaskedLM                  
skipping cudagraphs for unknown reason
ERROR:common:function takes exactly 18 arguments (23 given)
Traceback (most recent call last):
  File "/cache/pytorch-3.10-22ce6c6508d1d13b263d4c8b1fd6b98505983e92-4/benchmarks/dynamo/common.py", line 2144, in check_accuracy
    new_result = optimized_model_iter_fn(model_copy, example_inputs)
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 328, in _fn
    return fn(*args, **kwargs)
  File "/cache/pytorch-3.10-22ce6c6508d1d13b263d4c8b1fd6b98505983e92-4/benchmarks/dynamo/common.py", line 1908, in run_n_iterations
    self.model_iter_fn(mod, inputs, collect_outputs=False)
  File "/cache/pytorch-3.10-22ce6c6508d1d13b263d4c8b1fd6b98505983e92-4/benchmarks/dynamo/huggingface.py", line 550, in forward_pass
    def forward_pass(self, mod, inputs, collect_outputs=True):
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 328, in _fn
    return fn(*args, **kwargs)
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/torch/_dynamo/external_utils.py", line 17, in inner
    return fn(*args, **kwargs)
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 3905, in forward
    return compiled_fn(full_args)
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1482, in g
    return f(*args)
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 2533, in runtime_wrapper
    all_outs = call_func_with_args(
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1506, in call_func_with_args
    out = normalize_as_list(f(args))
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1594, in rng_functionalization_wrapper
    return compiled_fw(args)
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 944, in wrapper
    return optimized_function(args_new)
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 378, in __call__
    return self.get_current_callable()(inputs)
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 405, in _run_from_cache
    return compiled_graph.compiled_artifact(inputs)
  File "/tmp/torchinductor_runner/73/c73tbo7eozox67v4sg7jsajlqbh5f4quu4bemhxsp6piqsmow2cv.py", line 659, in call
    triton_per_fused_add_embedding_native_layer_norm_0.run(arg31_1, constant0, constant7, constant8, constant1, constant2, buf3, 512, 128, grid=grid(512), stream=stream0)
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/torch/_inductor/triton_heuristics.py", line 513, in run
    return launcher(
  File "<string>", line 8, in launcher
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/triton/backends/intel/driver.py", line 386, in __call__
    self.launch(*args, **kwargs)
TypeError: function takes exactly 18 arguments (23 given)
libcudart.so.12: cannot open shared object file: No such file or directory
libcudart.so.12: cannot open shared object file: No such file or directory
TorchDynamo optimized model failed to run because of following error
fail_to_run
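The `TypeError: function takes exactly 18 arguments (23 given)` above is an arity mismatch: the compiled launcher was built against the old Triton launch signature, while the caller passes the extra arguments introduced by the new API. One way a call site can support both APIs is to inspect the launcher's arity and drop the extra arguments for older Triton versions. The sketch below is purely illustrative, assuming Python-level launchers that `inspect.signature` can introspect; `adapt_launch`, `legacy_args`, and `extra_args` are hypothetical names, and the real C-level `self.launch` in `triton/backends/intel/driver.py` would need a version check instead:

```python
import inspect

def adapt_launch(launch_fn, legacy_args, extra_args):
    """Dispatch between two launcher signatures by inspecting arity.

    legacy_args: positional args every launcher version accepts.
    extra_args:  args introduced by the newer Triton launch API.
    Names and argument lists are illustrative, not the actual API.
    """
    n_params = len(inspect.signature(launch_fn).parameters)
    if n_params >= len(legacy_args) + len(extra_args):
        # New API: launcher accepts the additional launch arguments.
        return launch_fn(*legacy_args, *extra_args)
    # Old API: passing the extras raises
    # "function takes exactly N arguments (M given)", so drop them.
    return launch_fn(*legacy_args)

# Toy stand-ins for the old and new launcher signatures:
def old_launch(a, b, c):
    return ("old", a, b, c)

def new_launch(a, b, c, stream, warmup):
    return ("new", a, b, c, stream, warmup)

adapt_launch(old_launch, (1, 2, 3), ("s0", False))  # ("old", 1, 2, 3)
adapt_launch(new_launch, (1, 2, 3), ("s0", False))  # ("new", 1, 2, 3, "s0", False)
```

Note that `inspect.signature` cannot introspect every C extension callable, so a real fix would more likely key off the installed Triton version than off arity.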
@whitneywhtsang
Contributor Author

whitneywhtsang commented Apr 7, 2024

Verified that the fix in IPEX works, thanks @Stonepia.

Stonepia added a commit to intel/intel-extension-for-pytorch that referenced this issue Apr 8, 2024
@vlad-penkin vlad-penkin added the enhancement New feature or request label Apr 17, 2024
Stonepia added a commit to intel/intel-extension-for-pytorch that referenced this issue Nov 7, 2024