Define the set of operations and their specification #17
Thanks for bringing this issue up @huningxin – this is something we've been pondering as well. We would like to propose that this CG not duplicate the efforts of the open ONNX community. That community has produced an MIT-licensed operator specification based on years of investigation of the top ML models to standardize around. The community brings together platform, software, web, and hardware companies to provide a standardized approach that allows any team across the stack to leverage the same set of operators, making ML development more efficient for everyone involved. A few of the companies involved in this open community are Amazon, Facebook, Microsoft, Intel, IBM, Unity, Nvidia, Qualcomm, AMD, and more. We recommend that we follow the same pattern used by other specifications within the W3C and other standards organizations by referencing the work done by others rather than duplicating efforts and creating numerous standards. A few examples of this are:
Additionally, this is an open community, so if there are any changes we desire to this specification we can make them (and they also welcome them). Now, this doesn't mean that this API needs to support ALL of the operators listed by default; we can create a list of operations from within that specification that must be supported by a UA in order to fully support the WebNN API. But we shouldn't redefine the operators and their signatures, inputs, and outputs.
PROPOSED RESOLUTION: The specification will reference the ONNX operations, and if there are any improvements desired for ONNX, the work should happen there. We will be resolving this on the next telecon; let us know if you have any objections to this.
As discussed on today's call, please provide your input in this issue on the proposed resolution documented above prior to our next call on 8 August 2019. During that call we'll resolve this issue as proposed unless objections are recorded in this issue.
(That's 6 weeks of review time to provide feedback, due to the holiday season in some parts of the world.)
An important part of this specification will be ensuring this set of ops is compatible with the major ML JavaScript frameworks (e.g. conv2d paddings are compatible, the memory layout of tensors is compatible, etc.). It's not possible for us to move forward with this resolution without understanding compatibility. Would somebody be willing to take on the work of detailing each op and whether it could be used by major frameworks in a doc that can be shared with the group? Thanks!
A related question: what's the plan for dealing with versioning? I'd expect ONNX to evolve and add ops over time. Also, the ops would be versioned, and details may change over time. Is the idea to reference an explicit list of ONNX ops with versions, and include them by reference in the web standard? If yes, then the next step would be to agree on which specific ops + versions to reference, ensuring that they're compatible with the major ML JavaScript frameworks we want to support, as well as the backend providers. Maybe the answer is easy, and it's all of the ONNX ops, in their latest versions. I agree that we (the community group) need to do the homework to determine if that's the case.
@nsthorat and @jbingham, below are links to existing documents and code that should help answer your questions.
On the compatibility front, there already exist conversion tools to and from other frameworks and model types.
Thanks, @RafaelCintron! We'll take a closer look. I see the compatibility matrix. Great! It would be good for us to understand what compatibility really means here. Some questions that you may already have great answers for: versioning, converting to ONNX from JS, custom ops, and how many ops. Thanks in advance for explaining!
@jbingham: re. your versioning question: ONNX has the notion of ops and opsets. An opset consists of a collection of ops with a given specification. Typically an op has the same specification in a number of consecutive opsets until it is updated. Thus, if we consider the Equal op, there is a version introduced in opset 1, which had the same spec in opsets 1 through 6, and was updated in opset 7 (to add broadcasting support). I think the notation used in the compatibility matrix reflects this. It lists versions 1 and 7 for the Equal op because opset versions 1 through 6 have the same spec for Equal, and version 7 onward has the same spec (up to the current opset, that is).
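To make the opset mechanics concrete, here is a minimal sketch of that resolution rule. The registry and helper are invented for illustration (this is not ONNX API); only the Equal version data comes from the comment above.

```js
// Hypothetical registry: for each op, the opset versions at which its
// spec changed ("since" versions). Only Equal's data is from the thread.
const opSinceVersions = {
  Equal: [1, 7], // same spec in opsets 1-6; broadcasting added in opset 7
};

// The spec version that applies under a target opset is the largest
// "since" version that does not exceed the target.
function resolveOpVersion(opName, targetOpset) {
  const versions = opSinceVersions[opName];
  if (!versions) throw new Error(`unknown op: ${opName}`);
  const applicable = versions.filter(v => v <= targetOpset);
  if (applicable.length === 0) {
    throw new Error(`${opName} is not available in opset ${targetOpset}`);
  }
  return Math.max(...applicable);
}

console.log(resolveOpVersion('Equal', 6));  // 1: opsets 1-6 share one spec
console.log(resolveOpVersion('Equal', 10)); // 7: opset 7 onward
```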
To also add some more history and context for @jbingham's questions:

Versioning: For TensorFlow the journey has been a bit longer. Since ONNX adopted the lowest-common-denominator set of ops needed for inference, TF had a much larger set of ops (there are 500+ in the tensorflow::ops C++ namespace), contrib ops, experimentation variations, training, etc. Just like TOCO went through its journey to convert from TF to TFLite, we went through a similar journey with tf2onnx. LSTM is a great example of another issue we hit: Keras has LSTM primitives, where TF is more loosely structured (we solved the LSTM with our converters, btw!). Regarding the compatibility matrix link you posted, @gramalingam's response is spot on. All of the opsets roll forward and include the previous versions. You target a single opset. Opset 7 is where we supported most TF models. And we just keep getting better every day. We focus on the data science community and the TF operators they need most for their production deployed models. An example is opset 9, when we added NonMaxSuppression. If your model absolutely required NMS, you could use opset 7 plus a custom operator, or roll forward to opset 9 and the model would just work. Again, very tractable.

Using ONNX in JS: This particular GitHub issue would track the idea that the op schema, namespaces, versioning, and semantic definitions could all be driven from ONNX. The idea being: no matter which IDL we land with, we still need a common currency for describing op kernels, schema, and behavior.

Custom ops:

How many ops?
I would imagine for WebML we would want the same. Execution providers that provide enough ops to run most models without leaving the provider resource boundary. Right? What training frameworks are you thinking about? With ONNX opset 7 we were able to run almost all of the models we had targeted (tf, pytorch, scikit-learn, LightGBM, CoreML). That was our goal: to be able to fully represent the models across ML frameworks. I'm not sure what the right answer is, but I like your idea of starting with a straw man. What if we took opset 10 (the latest opset) as a start to reason around? @nsthorat totally agree, compatibility is a huge deal. ONNX has two things it drives here:
I imagine the community would totally jump in and help fill any testing and documentation gaps as we find them. The entire goal here is to enable framework developers to work with it. We have a couple of frameworks supporting ONNX natively (ORT, Caffe2, pytorch, CNTK) and would love to keep adding more!
@gramalingam, thanks for clarifying. So if we wanted to cover a large number of common operators across many real-world models in multiple frameworks, we'd want to select a high enough opset version, such as opset 10. That makes sense. @walrusmcd Lots of great info there. We will likely want to go deeper on some of these topics, possibly in the other GitHub issues you mention, or in new ones. Here's a summary of some of what you describe: imagine a future web platform that lets you train an ML model in any ML framework, and as long as you can convert it to a compatible graph representation, with a standardized set of ops, you can perform inference on any device, taking advantage of hardware acceleration, from any web app, running in any web browser. While I'm digesting, I have another thought, which I'll riff on in a separate comment.
Here's that other thought: given how complex this all is, and how many ops and ML frameworks there are, is it realistic to expect that there will ever be more than one code base that fully implements ONNX, much less keeps up with the evolution of opsets? From what I see on GitHub, well over a hundred people -- and not just random people, but knowledgeable, highly technical people with deep domain knowledge -- have contributed code to get ONNX to where it is today. That would be difficult to reproduce, especially in a compatible way.

And if there is likely to be only one ONNX implementation ever, does that mean that to achieve standardization across web browsers, ML frameworks, and opsets, the only viable path, realistically, is for the browser vendors to agree to ship the exact same binary? This has been done before, so it's not unprecedented or impossible. But it would take some real effort to make happen, assuming the web should have a standard at this level of abstraction, as opposed to a few low-level shaders, which would be less controversial.

Putting it all together: would the decision to standardize on ONNX opsets mean that the web standards community would be effectively agreeing to adopt the first and, in practice, only implementation of ONNX, and the browser vendors would be agreeing to ship it? If that's what's at stake, webnn issue 17 seems kind of important. We might want to put together a detailed case for why this is really where the community should go, and why the ML frameworks and browser makers should all be onboard. It's also possible that I just took an accidental turn down a rabbit hole, in which case, please help :)
Hi @jbingham,
Issue #6 is for the custom ops discussion. So far, the proposal is that JS ML frameworks offload supported sub-graph execution to WebNN and execute their own (custom) WASM/WebGL/GPU kernels for ops that are not supported. This would require that WebNN support high-performance data/tensor exchange between WebNN execution and WASM/WebGL/GPU kernel execution. There are two early prototypes of TensorFlow.js/WebNN integration and ONNX.js/WebNN integration based on this proposal and the WebNN POC.
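For illustration, here is a minimal sketch of that offload idea, with invented names and hypothetical executor callbacks (none of this is actual WebNN API):

```js
// The framework walks the graph, sending supported ops to WebNN and
// running its own WASM/WebGL kernels for the rest.
async function runPartitioned(nodes, webnnOps, executeOnWebNN, runCustomKernel, input) {
  let tensor = input;
  for (const node of nodes) {
    tensor = webnnOps.has(node.op)
      ? await executeOnWebNN(node, tensor)   // offloaded sub-graph path
      : await runCustomKernel(node, tensor); // framework's custom kernel
  }
  return tensor;
}

// Usage with stub executors; a real integration hinges on the fast
// tensor exchange between WebNN and WASM/WebGL discussed above.
runPartitioned(
  [{ op: 'Conv' }, { op: 'MyCustomOp' }, { op: 'Relu' }],
  new Set(['Conv', 'Relu']),
  async (node, t) => t, // stand-in for WebNN execution
  async (node, t) => t, // stand-in for a WASM/WebGL kernel
  new Float32Array([0])
).then(result => console.log(result));
```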
It would depend on which models WebNN would support, and the models follow from the use cases. (I'd like to provide some input in the next comment.)
Based on our investigation of custom ops, we have two key observations:
For the starting op set, there is some input from the use case perspective. The idea is to derive the op set from the models that support the defined use cases. This table maps the WebNN use cases to related networks.
There are pre-trained models in TFLite or ONNX for the above networks. This table lists the required ops of each model.
Notes:
The following table maps the TFLite ops in the above TFLite models to ONNX ops.
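A few illustrative entries of the kind of correspondence such a table captures (these are well-known TFLite-to-ONNX mappings, not the referenced table itself):

```js
const tfliteToOnnx = {
  CONV_2D: 'Conv',
  DEPTHWISE_CONV_2D: 'Conv', // ONNX Conv with `group` set to the input channel count
  AVERAGE_POOL_2D: 'AveragePool',
  MAX_POOL_2D: 'MaxPool',
  SOFTMAX: 'Softmax',
  ADD: 'Add',
  CONCATENATION: 'Concat',
  RESHAPE: 'Reshape',
};
```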
Great info, Ningxin. This is a ton of great work digging into these details. Summarizing a bunch of the comments above:
Related to my previous comment about there being only one ONNX implementation, a similar case can be made for providers. Realistically, maybe there will be only one provider implementation per hardware platform. Eg, for a given Android, iOS, or Windows device and chip set, there would be only one provider implementation that takes advantage of the hardware. The reason is that it's unlikely that anyone other than the device and chip manufacturers would be able to write a performant provider and ship it with the device. Pure-play browser vendors wouldn't be in a position to do it, would they?
@jbingham: I am not sure I understand the comment about there being only one ONNX implementation (or the related comment about providers above). TensorRT, OpenVino, NGraph, Menoh, NCNN, and others all support ONNX models to various degrees. Most convert ONNX models to their own formats, either via scripts outside the runtime or as a preprocessing step in their runtime. It seems to me that the question you are asking is an essential aspect of this WebML standardization effort, and it was an essential aspect of the ONNX standardization effort as well. I am unclear whether you are suggesting that standardizing the spec for ops is too difficult and not worth doing. I agree that we can separate the questions of (a) which ops are important enough to include in the WebML spec, and (b) the standardization of the ops selected in (a). But it seems to me that (b) is essential.
@gramalingam Yes, agree that we need to standardize the ops that we choose to include in the spec. And agree that ONNX looks like a reasonable proposal. I'm still trying to understand the layering here, of what's ML framework vs WebNN vs ONNX vs provider vs platform vs hardware, assuming those are the right categories.

In my earlier comment, I think I may have misunderstood what ONNX is. IIUC, you're saying that ONNX is just the model interchange format, not a library or provider or ML framework. There are partial conversions implemented for multiple ML frameworks already. So there's only one ONNX model format, but translation layers aren't that painful to write, since several have been created already. Is that right?

The only complication is around the "to various degrees" caveat you added. As we add more ops, the likelihood of them all being implemented correctly by every ML framework and provider decreases. At a certain point, it could become quite difficult and impractical to standardize. If there's a way to minimize the number of ops that are standardized, that's one potential solution. Another solution is to share code, to ease the burden of implementing a huge set of ops and versions. But if ONNX is an interchange format only, maybe there's no opportunity for code sharing at that level. Maybe at the provider level, as I was suggesting above?
Popping back up a level: we on the Google side are coming up to speed on the overall proposal, and have dumped a bunch of thoughts into this issue, many only tangentially related. In the interests of keeping this specific issue scoped narrowly enough to be able to close it, let's split other topics into separate issues. IIUC, there are at least 2 fundamental premises:
TBH, we on Chrome and the TensorFlow team are not yet fully convinced of the first premise, though it's certainly plausible. Let's move that discussion to another thread (eg, Issue: "Decide if a graph API is the right thing to standardize on"), and ideally spend some time f2f or separately coming up to speed on the thinking that led the group here. One thing that might help is to write out a use case or user story for what we want to enable on the web. Eg, Issue: "Agree on a user story for ML on the web".

Assuming a graph API, premise 2 seems pretty reasonable. The options seem to be: TBH, I can't rule out A, because I'm not yet persuaded that a graph API is the right level of abstraction for a web standard. If the operations are a black box, the whole graph is. Let's move that discussion to a separate thread as well (eg, Issue: "Determine if a black box ML model is useful"). Assuming a graph API, B seems like it wouldn't provide any good way to benefit from hardware acceleration, and would make a graph API pretty useless. I feel like we can safely rule it out.

C and D are in bounds for the proposal on this thread. The proposed solution is to use ONNX for the specifications, and defer their details to the ONNX project. If we have a set of versioned operations with specifications, I agree that ONNX looks like the best interoperable standard currently out there. Sure, later, we might encounter some reason why it's not ideal, but if so, we can deal with it in a separate issue then.

It seems like next up for this GitHub issue is to choose a specific set of operations, within ONNX, that will be included in the graph API, so that we can confirm that there's adequate compatibility in the major ML frameworks. Then we can close out this issue, with the caveat that it relies on some assumptions that we haven't yet agreed to, and should address in separate issues. Does anyone have a concrete proposal? Eg, something like, "Let's use the complete ONNX Opset 9." Or: "Let's use ops A, B, C... of Opset 7 as a proof-of-concept". Totally made up, but you get the idea. @gramalingam @huningxin @walrusmcd wdyt?
Hi @jbingham: a couple of quick comments. Re. your point (1): in terms of background, there has been a discussion about this question. My thoughts are as summarized here: #11 (comment). I think (2) (at least the "The WebNN API requires a set of operations and their specification" part) is orthogonal to (1). I think we mostly require this, even if we decide to go with an "operation API" (just for executing operations, no building graphs) … the one exception might be something like option (B) you mention (though I am not sure I understand this). I don't know what option (A) is: is it like the "load model" API discussed previously (in another issue/thread)? I am not sure what (B) means either. I agree that for the graph API (or even otherwise), (C) and/or (D) make sense and seem most relevant.
@jbingham, thanks for your comments. The next step you proposed sounds good. Based on my #17 (comment), I propose supporting the following 14 operators of Opset 10 as the initial proof of concept. They are:
+1. There are some existing issues related to what you mentioned.
The use cases were defined as a start; they include both the application level and the framework level. Feel free to propose new ones.
There was a discussion about a high-level vs. low-level API. It then split into executing models and executing operations (eager or graph).
Thanks, Rafael and Ningxin! Great links and summaries. I'll read through and follow up where appropriate.
One follow-up thought on @jbingham's point about "understanding the layering (frameworks, providers, platforms, hardware, etc.)", as well as the related issue of "Is a graph-builder API the right one?". One way of looking at it is to ask where compilers, especially those that optimize across an entire model (or across multiple operations), and especially hardware-specific optimizers, fit in the WebML picture/stack. Do we want WebML to enable the implementation of such compilers/optimizers behind the WebML API (in the browser layer or beneath it), or do we want to enable their implementation outside (in which case the compilers/optimizers would emit JavaScript code containing WebML)? In the first case, WebML serves as the source for the compiler/optimizer, while in the second case WebML serves as the target. These are two different scenarios.
I agree with @jbingham that the API should be more than just "custom ops", option B in his list. The point of the API is for web developers to access hardware-accelerated capabilities that are not available in other APIs such as WebGL and WebGPU. Just doing custom operations means WebML would essentially be a "helper function" and not be very compelling over what TF.js and ONNX.js provide today. One approach is to do option C in @jbingham's list and structure the API in a similar manner to the DirectML workflow, but with WebGPU concepts instead of D3D12 concepts. The developer would put weights into input resources, bind the resources into input and output tensors, and record the operations into a command list. Executing the command list would dump the result into output buffers. In this model, to do "custom ops", the developer would interleave compute shaders (an already-existing concept in WebGPU) in between the operations defined by WebNN. We may be able to do this as an extension to WebGPU.
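To illustrate the "record, then execute" pattern this describes, here is a self-contained toy. All names are invented; a real version would bind WebGPU resources and record WebNN-defined ops rather than JS closures:

```js
class CommandList {
  constructor() { this.commands = []; }
  record(name, fn) { this.commands.push({ name, fn }); }
  execute() { for (const c of this.commands) c.fn(); }
}

const input = Float32Array.from([-1, 2, -3, 4]);
const output = new Float32Array(input.length);

const list = new CommandList();
// Record a built-in op (ReLU here) ...
list.record('relu', () => {
  for (let i = 0; i < input.length; i++) output[i] = Math.max(0, input[i]);
});
// ... and interleave a "custom op" (doubling), standing in for a
// developer-supplied compute shader.
list.record('customDouble', () => {
  for (let i = 0; i < output.length; i++) output[i] *= 2;
});

list.execute();      // executing the command list ...
console.log(output); // ... dumps results into the output buffer: [0, 4, 0, 8]
```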
I'm in support of something along those lines, but by itself it doesn't solve every use case: it doesn't open us up for interop with CPU accelerators (e.g. tuned SIMD kernels) or standalone accelerators (e.g. Edge TPU), both of which seem important. At the same time, I think if we can extend that model to cover those too (CPU: WASM+SIMD; TPU: not sure), it would be very nice.
I personally think we should explore option C as long as what Kai stated is possible. The primary thing webdevs want from the solution here is to be able to access the hardware for perf benefits if the device has it available. How we get there, graph vs. lower-level commands, doesn't really matter as much to me, as long as we can agree on the commands and set of ops and ensure that hardware access is possible in an interoperable manner. I also think that we need to come to a conclusion on this before we spend too much more time on a graph API, if we decide to go with the lower-level commands. cc: @huningxin @anssiko
@kainino0x if WASM will be adding support for SIMD kernels in the near future, then ML framework authors can choose whether to use those or a WebGPU extension to implement their inference engine. On the other hand, if we have the WebML API decide between "SIMD vs. compute shader", then WebML needs to be more high-level than what I described above. "Custom ops" will need to be first-class citizens, and framework authors will need to be prepared to provide either compute shaders or JavaScript/WASM for WebML to do its job.
WASM is indeed adding SIMD, but just like on the GPU, it may not be able to be tuned for every chip and take advantage of every feature (due to having to be abstractable). (OTOH, SIMD is much simpler than CUDA, and maybe WASM SIMD can keep up with hardware in the long run, since it's smaller and easily emulated.) Re: the WebML API deciding, I was imagining explicitly exposing the "providers" (e.g. "CPU" or "GPU") to the app so it can choose between them.
@kainino0x, if the SIMD aspect is not exposed to developers such that it can be specialized for all hardware, then I agree it would be good to expose it as a WebML provider.
Thanks @jbingham for a bunch of great writeups and ideas. Sorry I was offline for a bit. I'll go through them all and digest and reply. First comment: I agree with you 100% that a key part here is how we layer everything together. Having a solid drawing and concept around the layers will give us a ton of clarity.
@walrusmcd, I happen to have a concept diagram of the existing proposal. We may use it as a starting point. Remarks on the digit labels in the diagram:
As @gramalingam mentioned, the WebNN API may enable the hardware-specific optimizations of a graph compiler/optimizer. It doesn't seem straightforward to me to integrate that capability into the "executing command list" approach. Some standalone accelerators, e.g. the Edge TPU, may require graph compilation before execution (more details in my next comment). This usage may not fit well into the "executing command list" approach. IMO, and as you mentioned, the "executing command list" approach could be an ML extension of WebGPU. Similar to DirectML on native, WebGPU with this extension would allow webdevs to interleave ML workloads and rendering workloads for GPU-efficient AI + gaming usage on the web.
Re @kainino0x
I agree that the programming model should support devices beyond GPU.
The current proposal only supports implicitly setting an execution preference: "fast-answer", "sustained-speed", or "low-power". I think it makes sense to extend the support to cover device provider enumeration and selection, with an appropriate permission mechanism. It may also require querying the capabilities of a device provider, since different providers may support different ops, data types, and architectures. A JS ML framework needs this info to identify the WebNN-supported sub-graph. For example, WebNN may support the Edge TPU via a device provider backed by the Edge TPU compiler and runtime. According to the doc, the Edge TPU compiler can partition a model and compile the sub-graph with supported ops for Edge TPU runtime execution. The unsupported ops are still executed by framework kernels, as in TF-Lite. For web usage, a JS ML framework could partition the graph based on the capabilities of the WebNN Edge TPU provider and delegate the supported sub-graph to WebNN for Edge TPU compilation and execution. The unsupported operations could still be executed by framework kernels written in WebGPU compute shaders or WebAssembly. This usage would require the following functionalities of WebNN ("+" means supported in the current proposal, "-" means gaps):
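To illustrate the enumeration and capability-query gaps mentioned above, here is a hedged sketch with invented names (nothing here is current WebNN API):

```js
// Imagined provider descriptors with queryable op support.
const providers = [
  { name: 'gpu',      ops: new Set(['conv2d', 'relu', 'maxPool', 'softmax']) },
  { name: 'edge-tpu', ops: new Set(['conv2d', 'relu', 'maxPool']) },
];

// A JS ML framework could pick the provider covering the most required ops
// and route the rest to its own WebGPU/WASM kernels.
function selectProvider(requiredOps) {
  let best = null, bestCount = -1;
  for (const p of providers) {
    const count = requiredOps.filter(op => p.ops.has(op)).length;
    if (count > bestCount) { best = p; bestCount = count; }
  }
  return best;
}

const choice = selectProvider(['conv2d', 'relu', 'nonMaxSuppression']);
console.log(choice.name); // "gpu"; nonMaxSuppression would fall back to
                          // framework kernels (compute shader or WASM)
```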
It makes sense to select a provider via a hint like "low-power" if the caller has no custom ops. But if they have custom ops written e.g. in WebGPU, they have to be guaranteed to get a WebGPU-capable provider (a GPU).
If we have multiple providers, CPU and GPU, I would expect the provider selection would be done via a hint (as @kainino0x describes) combined with what is available on the system. If the GPU provider is available, then the web developer would need to describe their inputs and outputs in terms of GPU resources (e.g. WebGPU buffers). If the CPU provider gets used, then the web developer would need to describe their inputs and outputs in terms of CPU-visible memory (e.g. ArrayBuffers).

I think there will exist a set of graph optimizations that can be done in a device-neutral manner. The ML framework libraries should be able to handle these. If there exist graph optimizations that need device-specific information, then I think we have no choice but to add a graph API to the spec that sits on top of the operator API.

I'm of two minds when it comes to how much we unify the two providers. If we provide a more unified API to developers, then the API speaks in terms of generic buffers.
this issue has a resolution - any reason to keep it open? |
Thanks for noting this has indeed been addressed and we have a resolution on it. Closing. |
As raised in the CG meeting, the first foundation spec only lists 32 operation types without information about how to use them.
We need to define the set of operations and their specification/semantics.
The set of operations could be derived from the use cases and corresponding models. For the WebNN POC, there is a spreadsheet that lists supported models and their required operations. It can be used as a starting point.
By following the spirit of the WebML CG charter, the specification will be implementable on top of existing major platform APIs, such as Android NNAPI, Windows DirectML, and macOS/iOS MPS/BNNS. So when specifying operations, the platform API mapping/support needs to be looked into. For the WebNN POC, there is another spreadsheet that captures the native API mapping of supported operations. It can also be leveraged.
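As an illustration only, one entry in such an operation specification might capture a signature, attributes, and the platform mapping. The object shape below is invented, though the platform operator names (NNAPI, DirectML, MPS) are real:

```js
const conv2dSpec = {
  name: 'conv2d',
  inputs: ['input (4-D tensor)', 'filter (4-D tensor)', 'bias (optional 1-D tensor)'],
  attributes: { padding: 'explicit or same/valid', strides: '[h, w]', dilations: '[h, w]' },
  output: '4-D tensor',
  platformMapping: {
    'Android NNAPI': 'ANEURALNETWORKS_CONV_2D',
    'Windows DirectML': 'DML_OPERATOR_CONVOLUTION',
    'macOS/iOS MPS': 'MPSCNNConvolution',
  },
};
```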
We can file an individual issue for each operation specification and use this one as the meta issue.