rfcs: graph api: Constant Block Weight Mechanism Graph API design #2280
base: rfcs
Conversation
> graph library cannot directly store the constant weight cache in the compiled
> partition cache. Instead, it needs to distinguish between these different
> constant weight tensors. However, distinguishing constant weights requires
> additional information from the user side. This raises new considerations for
The problematic case is not quite clear to me: is the issue that different data can be supplied to a single compiled partition, so caching it is not possible? When exactly does reuse of a compiled partition happen? What is the real-world scenario for it?
Hi, Dima, thanks for the review!
> is the issue that different data can be supplied to a single compiled partition, so caching it is not possible? When exactly does reuse of a compiled partition happen?
Consider a model that contains two subgraphs. The first subgraph is compiled, and its compiled partition, including the constant weight cache, is stored in the compiled partition cache. With the constant cache feature enabled, compilation of the second subgraph may hit that cache entry (it's possible) and reuse the constant tensor cache stored inside it. However, the second subgraph has a different weight input, so the cached constant tensor from the first compiled partition cannot be reused for it, leading to an accuracy issue.
What's the real-world scenario for it?
We are currently integrating the graph API in PyTorch for direct SDPA op optimization. In this workflow, each SDPA call compiles and executes through the graph API in every iteration, so a compiled partition cache hit and reuse may occur with different inputs.
We are safe for now, since SDPA does not contain any constant tensor cache (constant weights), so compiled partition cache reuse introduces no issues. However, we are planning other ops that may need the constant tensor cache, such as the MLP optimization on the OpenVINO side.
> the second subgraph compilation may hit (it's possible) the first compiled partition cache

Doesn't a compiled partition have an ID that should be checked by the CP cache? Would the underlying logical tensors' IDs also be ignored in such a potential hit?
> Doesn't a compiled partition have an ID that should be checked by the CP cache? Would the underlying logical tensors' IDs also be ignored in such a potential hit?
We do hash IDs (both logical tensor and op IDs) into the CP cache key. This works very well for the legacy integrations in IPEX and ITEX, because there the whole model is mapped to the graph API with unique IDs.
We are designing a new API/mechanism for the new integration approach of direct op optimization. Such an integration lives inside a single operation, so the same IDs may be reused to create ops and logical tensors, and the CP cache will hit. In fact, for this integration approach compilation happens in every iteration, so more CP cache hits help eliminate compilation time. That is why we want to address the issue without hurting the CP cache hit rate.
Force-pushed from 6d686ac to 2560609.
This RFC proposes a design for handling the constant block weight mechanism in the graph API.
Rendered version: link