
[PyTorch] Adapt new kernel launch API #830

Closed
whitneywhtsang opened this issue Apr 7, 2024 · 1 comment
Assignees
Labels
enhancement New feature or request

Comments

@whitneywhtsang (Contributor)

Triton recently changed its kernel launch API. This issue tracks adapting the inductor-side call sites so they work with both the old and the new Triton APIs. We merged 88abff6 in #825, which causes benchmark failures:

$ ./inductor_xpu_test.sh huggingface float32 inference accuracy xpu 0 static 1 0 AlbertForMaskedLM
Testing model AlbertForMaskedLM
loading model: 0it [01:22, ?it/s]
xpu  eval  AlbertForMaskedLM                  
skipping cudagraphs for unknown reason
ERROR:common:function takes exactly 18 arguments (23 given)
Traceback (most recent call last):
  File "/cache/pytorch-3.10-22ce6c6508d1d13b263d4c8b1fd6b98505983e92-4/benchmarks/dynamo/common.py", line 2144, in check_accuracy
    new_result = optimized_model_iter_fn(model_copy, example_inputs)
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 328, in _fn
    return fn(*args, **kwargs)
  File "/cache/pytorch-3.10-22ce6c6508d1d13b263d4c8b1fd6b98505983e92-4/benchmarks/dynamo/common.py", line 1908, in run_n_iterations
    self.model_iter_fn(mod, inputs, collect_outputs=False)
  File "/cache/pytorch-3.10-22ce6c6508d1d13b263d4c8b1fd6b98505983e92-4/benchmarks/dynamo/huggingface.py", line 550, in forward_pass
    def forward_pass(self, mod, inputs, collect_outputs=True):
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 328, in _fn
    return fn(*args, **kwargs)
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/torch/_dynamo/external_utils.py", line 17, in inner
    return fn(*args, **kwargs)
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 3905, in forward
    return compiled_fn(full_args)
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1482, in g
    return f(*args)
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 2533, in runtime_wrapper
    all_outs = call_func_with_args(
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1506, in call_func_with_args
    out = normalize_as_list(f(args))
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1594, in rng_functionalization_wrapper
    return compiled_fw(args)
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 944, in wrapper
    return optimized_function(args_new)
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 378, in __call__
    return self.get_current_callable()(inputs)
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 405, in _run_from_cache
    return compiled_graph.compiled_artifact(inputs)
  File "/tmp/torchinductor_runner/73/c73tbo7eozox67v4sg7jsajlqbh5f4quu4bemhxsp6piqsmow2cv.py", line 659, in call
    triton_per_fused_add_embedding_native_layer_norm_0.run(arg31_1, constant0, constant7, constant8, constant1, constant2, buf3, 512, 128, grid=grid(512), stream=stream0)
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/torch/_inductor/triton_heuristics.py", line 513, in run
    return launcher(
  File "<string>", line 8, in launcher
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/triton/backends/intel/driver.py", line 386, in __call__
    self.launch(*args, **kwargs)
TypeError: function takes exactly 18 arguments (23 given)
libcudart.so.12: cannot open shared object file: No such file or directory
libcudart.so.12: cannot open shared object file: No such file or directory
TorchDynamo optimized model failed to run because of following error
fail_to_run
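The `TypeError: function takes exactly 18 arguments (23 given)` above is an arity mismatch: the compiled launcher was built against the old Triton launch signature, while the caller passes the extra arguments introduced by the new API. One way a call site can support both APIs is to inspect the launcher's arity and drop the extra arguments for older Triton versions. The sketch below is purely illustrative, assuming Python-level launchers that `inspect.signature` can introspect; `adapt_launch`, `legacy_args`, and `extra_args` are hypothetical names, and the real C-level `self.launch` in `triton/backends/intel/driver.py` would need a version check instead:

```python
import inspect

def adapt_launch(launch_fn, legacy_args, extra_args):
    """Dispatch between two launcher signatures by inspecting arity.

    legacy_args: positional args every launcher version accepts.
    extra_args:  args introduced by the newer Triton launch API.
    Names and argument lists are illustrative, not the actual API.
    """
    n_params = len(inspect.signature(launch_fn).parameters)
    if n_params >= len(legacy_args) + len(extra_args):
        # New API: launcher accepts the additional launch arguments.
        return launch_fn(*legacy_args, *extra_args)
    # Old API: passing the extras raises
    # "function takes exactly N arguments (M given)", so drop them.
    return launch_fn(*legacy_args)

# Toy stand-ins for the old and new launcher signatures:
def old_launch(a, b, c):
    return ("old", a, b, c)

def new_launch(a, b, c, stream, warmup):
    return ("new", a, b, c, stream, warmup)

adapt_launch(old_launch, (1, 2, 3), ("s0", False))  # ("old", 1, 2, 3)
adapt_launch(new_launch, (1, 2, 3), ("s0", False))  # ("new", 1, 2, 3, "s0", False)
```

Note that `inspect.signature` cannot introspect every C extension callable, so a real fix would more likely key off the installed Triton version than off arity.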
@whitneywhtsang
Contributor Author

whitneywhtsang commented Apr 7, 2024

Verified that the fix in IPEX works, thanks @Stonepia.

Stonepia added a commit to intel/intel-extension-for-pytorch that referenced this issue Apr 8, 2024
@vlad-penkin vlad-penkin added the enhancement New feature or request label Apr 17, 2024
Stonepia added a commit to intel/intel-extension-for-pytorch that referenced this issue Nov 7, 2024