[ Spun off from issue #780 ]
@reillyeon mentioned to me the idea of introducing a caching mechanism for MLGraph (e.g. saving it for later use to avoid repeated graph compilation). Such a mechanism might help here.
I'd like to discuss this proposal from @reillyeon a bit more. I believe the group's current working assumption is that graph compilation could take a long time and that caching would improve the user experience on subsequent visits to the same site (cross-origin caching is a harder, separate problem). Depending on the size of the model, the underlying implementation, and other factors, this could be a significant performance and UX improvement.
@reillyeon have you thought about this more since you came up with the idea? Are there any known implementation blockers?
@bbernhar what could we learn from the WebGPU compilation caches for shaders and pipelines? I see some toggles in the Dawn code to control caching, and you've done work in this space, e.g. in https://issues.chromium.org/issues/41479574, suggesting you might have insights to share.
Also paging @huningxin @fdwr and @RafaelCintron for thoughts. Interested in all insights in the spirit of brainstorming.
A few additional questions:
- Do we foresee the caching of compiled graphs to be purely an implementation detail?
- What is the privacy impact? We already discuss caching-related timing attack vectors in the privacy considerations and reference the WebGPU compilation cache considerations. Depending on which way we go, we might want to revise these considerations.