[Fusion] Enhance the prologue epilogue fusion #277

yaoyaoding · 2023-06-11T00:59:44Z

Previously, automatic prologue/epilogue fusion required the template to be written as a single kernel function (e.g., a CUDA kernel or a CPU kernel), and the prologue/epilogue tensors had to be used directly in that kernel function. This PR allows the user to define a launch function with an arbitrary implementation (e.g., calling multiple kernel functions or using a workspace) while still supporting fusion. This enhancement is necessary to support the parallel-k optimization at the operator level.
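To make the motivation concrete, here is a minimal pure-Python sketch of a parallel-k matmul whose launch function calls two "kernels" and allocates a workspace, with a prologue and an epilogue fused in through accessor functions. It is only an illustration of the idea, not hidet's implementation or API: every name in it (`partial_matmul_kernel`, `reduce_kernel`, `launch`, `load_a`, `load_b`, `store_c`, `fused_scale_matmul_relu`) is hypothetical, and plain Python loops stand in for GPU kernels.

```python
# Illustrative sketch only: plain Python stands in for GPU kernels, and none of
# these names come from hidet.

def partial_matmul_kernel(load_a, load_b, workspace, m, k, n, k_parts):
    """'Kernel' 1: each k-slice p writes its partial product into workspace[p]."""
    chunk = (k + k_parts - 1) // k_parts  # ceil(k / k_parts)
    for p in range(k_parts):
        lo, hi = p * chunk, min((p + 1) * chunk, k)
        for i in range(m):
            for j in range(n):
                workspace[p][i][j] = sum(
                    load_a(i, kk) * load_b(kk, j) for kk in range(lo, hi)
                )


def reduce_kernel(workspace, store_c, m, n):
    """'Kernel' 2: sum the partial products and store each final element."""
    for i in range(m):
        for j in range(n):
            store_c(i, j, sum(part[i][j] for part in workspace))


def launch(load_a, load_b, store_c, m, k, n, k_parts=2):
    """Launch function: allocates the workspace and calls both kernels."""
    workspace = [[[0.0] * n for _ in range(m)] for _ in range(k_parts)]
    partial_matmul_kernel(load_a, load_b, workspace, m, k, n, k_parts)
    reduce_kernel(workspace, store_c, m, n)


def fused_scale_matmul_relu(x, b, k_parts=2):
    """Computes relu((2 * x) @ b) without materializing 2 * x or the pre-relu result."""
    m, k, n = len(x), len(b), len(b[0])
    out = [[0.0] * n for _ in range(m)]

    def load_a(i, kk):
        return 2.0 * x[i][kk]  # prologue fused into the input load

    def load_b(kk, j):
        return b[kk][j]  # plain load, no prologue

    def store_c(i, j, value):
        out[i][j] = max(value, 0.0)  # epilogue fused into the output store

    launch(load_a, load_b, store_c, m, k, n, k_parts)
    return out
```

In this sketch the prologue is consumed by the first kernel and the epilogue by the second, with a workspace in between. That is exactly the situation a single-kernel template cannot express.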


**Describe the bug**
**gemma 2-b** is failing to compile due to missing **torch.any** operator support in hidet

**To Reproduce**
Run [this](https://drive.google.com/file/d/11ovSzoiHGG2f_qWucwoRxhRjCRRuCH72/view?usp=drive_link) script with **gemma-2b** model.

---------

Co-authored-by: Zhumakhan <nazirzhumakhan@gmail.com>


a3d597b


yaoyaoding force-pushed the compute-ops branch from 5a0bae1 to a3d597b Compare June 11, 2023 01:03

yaoyaoding merged commit 09463e8 into hidet-org:main Jun 11, 2023

yaoyaoding deleted the compute-ops branch June 11, 2023 03:07
