Describe the issue
As mentioned in #3503:
Calling binder() and packing the call args is very expensive:
bound_args, specialization, options = binder(*args, **kwargs)
Even when an already-compiled kernel is called repeatedly, the JIT still calls this function on every launch to generate the key for comparison, which adds overhead to each call. Can we simplify this?
def specialize_impl(arg, specialize_extra, is_const=False, specialize_value=True, align=True):
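To make the pattern described above concrete, here is a minimal toy model of it. It is a sketch, not Triton's actual implementation: `ToyJIT`, its `binder`, `specialize`, and `run` methods, and the simplified key format are all hypothetical names invented for illustration. It only shows that the compile step is cached while the key-building step runs on every launch.

```python
class ToyJIT:
    """Toy model of a JIT launcher (hypothetical; not Triton's real code)."""

    def __init__(self):
        self.cache = {}         # specialization key -> "compiled" kernel
        self.binder_calls = 0   # how often the key was rebuilt
        self.compile_calls = 0  # how often we actually "compiled"

    @staticmethod
    def specialize(arg):
        # Simplified per-argument key (assumption): type name plus,
        # for ints, a divisibility-by-16 hint.
        if isinstance(arg, int):
            return ("int", arg % 16 == 0)
        return (type(arg).__name__, None)

    def binder(self, *args, **kwargs):
        # Stand-in for the generated binder: inspects every argument
        # on every call to build the specialization key.
        self.binder_calls += 1
        specialization = tuple(self.specialize(a) for a in args)
        options = tuple(sorted(kwargs.items()))
        return args, specialization, options

    def run(self, *args, **kwargs):
        # The key is recomputed on each launch, even on a cache hit.
        bound_args, specialization, options = self.binder(*args, **kwargs)
        key = (specialization, options)
        kernel = self.cache.get(key)
        if kernel is None:
            self.compile_calls += 1

            def kernel(*call_args):
                # Placeholder "kernel": sums the integer arguments.
                return sum(a for a in call_args if isinstance(a, int))

            self.cache[key] = kernel
        return kernel(*bound_args)


jit = ToyJIT()
for _ in range(1000):
    jit.run(32, 64, num_warps=4)

print(jit.compile_calls, jit.binder_calls)
```

In this toy model the kernel is compiled once, but the binder still runs once per launch, which is the overhead this issue is about.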
Environment details
Hardware independent
The latest version of Triton