Support output tensor in native memory format #173
Comments
Do we have enough info here to close this issue, given what has been discussed in the related thread?
I suppose this is proposed as a v2 feature. I'm fine with closing it for now. Or could we mark it as v2 and treat it as lower priority for v1? @anssiko, any thoughts?
I created a new label "v2" and tagged this issue with it. This allows us to "park" an issue we feel is not making progress toward consensus at this point of development, to be revisited later if/when new information emerges.
"MLTensor" can be useful in the model loader API (it best matches the concepts in backends like TFLite, where input/output are tensors). So to make model loader and WebNN APIs consistent, maybe we should consider using "MLTensor" in WebNN too. Besides, "MLInput", in its current definition (data+dimensions), are exactly tensors. So "MLTensor" is a more accurate name because such a structure is not necessarily an input. |
I think the core problem we're trying to summarize in this issue is how WebNN can support a block format, or a data blob that is optimized toward certain classes of hardware but not publicly defined in a way that can fully interoperate in a web scenario. I don't think it necessarily implies that it's going to be compatible with …
Yeah, I understood MLTensor was proposed for another reason. I just want to point out that "MLTensor" may be a more accurate name than "MLInput" for this structure, which can have other use cases, e.g., in the model loader API.
Seems appropriate to close, but I'd like to check with @huningxin to make sure all relevant information from this issue and proposals linked to it have been ported over to the active MLBuffer discussion.
@huningxin do you mind checking whether we can close this? See also above.
This issue is to track the ongoing discussion in PR #166 for a potential v2 feature.
Requirement
Allow the compiled graph to produce its result in a native (platform-specific) memory format, to accommodate frameworks that use WebNN in an op-by-op calling pattern.
Background
Some native ML APIs, such as oneDNN, use optimized (platform-specific) memory layouts for better vectorization and cache reuse on different platforms. These APIs require not only the constants (such as weights) but also the input and output tensors to be in the optimized memory layout for best performance. So when JS frameworks call WebNN compiled graphs in an op-by-op pattern without accessing the intermediate outputs, the memory conversions for those intermediate outputs are unnecessary and could be avoided if WebNN supported output tensors in the native memory format.
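To make that calling pattern concrete, here is a minimal sketch of how a framework might drive WebNN op by op. The WebNN names (`createContext`, `MLGraphBuilder`, `compute`) follow the API at a high level, but the exact signatures are approximate and have changed across spec revisions; treat this as an illustration, not spec-accurate code.

```ts
// Sketch of the op-by-op pattern (approximate WebNN signatures; see note above).
// Ambient WebNN typings are assumed to be unavailable, so `any` is used for them.
declare const MLGraphBuilder: any;

async function opByOp(input: Float32Array, weights: Float32Array): Promise<Float32Array> {
  const context = await (navigator as any).ml.createContext();

  // Each framework op becomes its own single-node WebNN graph.
  const b1 = new MLGraphBuilder(context);
  const x = b1.input("x", { dataType: "float32", dimensions: [1, 256] });
  const w = b1.constant({ dataType: "float32", dimensions: [256, 256] }, weights);
  const matmulGraph = await b1.build({ y: b1.matmul(x, w) });

  const b2 = new MLGraphBuilder(context);
  const y = b2.input("y", { dataType: "float32", dimensions: [1, 256] });
  const reluGraph = await b2.build({ z: b2.relu(y) });

  // The intermediate result `y` is only ever fed into the next graph, yet it must be
  // materialized in the standard (row-major) layout. A backend such as oneDNN therefore
  // converts out of its optimized (blocked) layout after the first compute and back into
  // it before the second; that round trip is what this issue wants to avoid.
  const intermediate = new Float32Array(256);
  await context.compute(matmulGraph, { x: input }, { y: intermediate });

  const output = new Float32Array(256);
  await context.compute(reluGraph, { y: intermediate }, { z: output });
  return output;
}
```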
Proposals
An MLTensor interface that supports converting the output data layout asynchronously.
MLGraphBuilder.convertToStandardFormat by @wchao1115, which allows native-memory-format graph output as a special case through an optional flag in the graph builder, coupled with an explicit conversion API (see the sketch after this list).
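A hypothetical sketch of how the second proposal might be used, continuing the op-by-op example above. The graph-builder flag name (`nativeOutput`) is made up for illustration; `convertToStandardFormat` is the method name from the proposal, and the compute signatures remain approximate, so none of this should be read as spec text.

```ts
// Hypothetical usage of proposal 2 (illustrative names; not part of the WebNN spec).
declare const MLGraphBuilder: any;

async function opByOpNative(context: any, matmulGraph: any, reluGraph: any,
                            input: Float32Array): Promise<Float32Array> {
  // Both graphs are assumed to have been built with the proposed opt-in flag, e.g.
  //   new MLGraphBuilder(context, { nativeOutput: true })  // flag name is illustrative
  // so their outputs stay in the backend's optimized (e.g. blocked) memory format.

  // No layout round trip: the opaque native-format blob flows straight from one
  // single-op graph into the next.
  const intermediate = await context.compute(matmulGraph, { x: input });
  const nativeResult = await context.compute(reluGraph, { y: intermediate });

  // Only when the framework actually needs to read the values does it pay for a
  // conversion back to the standard (row-major) layout via the explicit API.
  const builder = new MLGraphBuilder(context);
  return await builder.convertToStandardFormat(nativeResult);
}
```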