Allow no-op graphs? #614
Open to hearing more thoughts that can sway this. Either direction, it would be good to document this, and if we prohibit no-op graphs, to remark that inserting a dummy identity is an effective work-around.
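For reference, a minimal sketch of that work-around (names and shapes are illustrative; `identity()` is WebNN's element-wise identity op):

```js
const builder = new MLGraphBuilder(context);
const weights = builder.constant(
    {dataType: 'float32', dimensions: [2, 2]},
    new Float32Array([1, 2, 3, 4]));
// Routing the constant through identity() gives the graph one real
// operator, so build() no longer sees a no-op graph.
const output = builder.identity(weights);
const graph = await builder.build({'output': output});
```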
This constant-only graph seems useful, especially for GPU or NPU devices. Once this graph is compiled, the constant data would be uploaded to dedicated device memory, and its output could be used by other graphs on the same device as an input. Today, when the ONNXRuntime WebNN EP inferences transformer decoders like Whisper, the weights shared by the two decoder sub-graphs end up duplicated in device memory, because WebNN doesn't support sharing constant data between graphs. /cc @Honry
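Concretely, the constant-only graph being proposed would look something like this sketch (`weightsData` is an assumed Float32Array; under the restriction discussed in this thread, the `build()` call rejects because the constant is not produced by any operator):

```js
const builder = new MLGraphBuilder(context);
// The decoder weights shared by both sub-graphs, uploaded once.
const weights = builder.constant(
    {dataType: 'float32', dimensions: [1000, 768]}, weightsData);
// A graph with zero operators: its only "output" is the constant itself.
const constantGraph = await builder.build({'weights': weights});
```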
Thanks for providing this use case, @huningxin! A few thoughts...
Taking a step back... isn't this the point of
In this case, there's no need to create an
Just to clarify, are you suggesting that the weights would be shared with the two decoder sub-graphs as inputs or constants? Currently
This relates back to #542 (FYI @bbernhar), since using an
Yeah, it's for optimization. When you build a graph with constants directly, the runtime can reformat them ahead of time (AOT) for better execution performance. But this also creates a copy (in a new format). You don't want web developers reading or writing this data like normal inputs, because it's owned by the runtime, and it would be illegal for them to access these "runtime owned" constant buffers. The usage flags would need to be more specific than

Hope that helps. =)
Thanks for the additional info, @bbernhar! I assume by "runtime owned" constant buffers you're referring to

Looking back at @huningxin's proposal:

I'm still confused how compiling constant subgraphs will reduce data copies and memory duplication. Wouldn't the output of the constant sub-graphs just result in another data copy? According to the docs, the

Here's my understanding of the constant subgraph flow. Please let me know if anything is incorrect!

```js
const constantBuilder = new MLGraphBuilder(context);

// Copies `someArrayBuffer`'s data into `constantBuilder`.
const constantOperand = constantBuilder.constant(someArrayBuffer);

// Copies the data into `constantGraph`. This copy is an implementation detail,
// though it's hard to remove if `build()` can be called multiple times (see #567).
// Then the DML implementation reformats (another copy) this data into a
// "runtime owned" constant buffer.
const constantGraph = await constantBuilder.build({'output': constantOperand});

// _Copies_ the data to `mlBuffer`.
// - The "runtime owned" constant buffer is tied to `constantGraph`, so
//   presumably other `MLGraph` instances cannot use it?
// - I assume it's destroyed alongside all other `constantGraph` resources when
//   `constantGraph` is GCed?
constantGraph.dispatch({inputs: /*none!*/{}, outputs: {'output': mlBuffer}});

// Now, to use this constant data in other graphs...
const subGraphBuilder = new MLGraphBuilder(context);

// Build up some operands with `subGraphBuilder` to yield a `subGraph`
// ...

// Since we're passing `mlBuffer` as an input rather than a constant, we cannot
// take advantage of the AOT reformatting optimization.
subGraph.dispatch({inputs: {'input': mlBuffer}, outputs: ...});
```

Looking again at the

This means that sharing this buffer between
Agreed, I was just trying to emphasize that "output" implies writable whereas "constant" implies read-only, so it seems unlikely that we'd allow a given
Yes, thanks for pointing this out.
@huningxin may correct me here, but as I understand it, if we dispatch a
I meant that the shared constant data only needs to be uploaded to the constant-graph once, so there is only one copy of the constant data in device memory that is shared by the two decoder graphs (no_past and with_past). I agree it still needs a data copy when passing the constant data from the constant-graph to the following decoder graphs as an input. And in this scenario, the constant data is used by the decoder graphs as an input without
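To make the pattern concrete, here is a rough sketch reusing `constantGraph` and `mlBuffer` from the earlier snippet; the decoder graphs (`decoderNoPast`, `decoderWithPast`) and their output maps are assumed to exist, and the dispatch shape follows the MLBuffer-style API sketched above:

```js
// One upload: the constant-only graph writes the shared weights into
// device memory exactly once.
constantGraph.dispatch({inputs: {}, outputs: {'output': mlBuffer}});

// Both decoder graphs read the same device-side buffer, so only one copy
// of the weights exists in device memory...
decoderNoPast.dispatch({
  inputs: {'weights': mlBuffer /* plus the decoder's other inputs */},
  outputs: noPastOutputs,
});
decoderWithPast.dispatch({
  inputs: {'weights': mlBuffer /* plus the decoder's other inputs */},
  outputs: withPastOutputs,
});
// ...but each graph sees the weights as an *input* rather than a constant,
// so neither can apply the AOT constant-reformatting optimization to them.
```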
AFAIK, the persistent resource is initialized by a particular graph initializer. I am not sure a persistent resource initialized by one graph can be bound to other graphs. @fdwr?
It works. But as I mentioned, because the constant data is now passed as an input resource (without setting
I'm trying to square this comment
with this comment
Just to make sure I'm understanding correctly... Are you suggesting that the optimized constant data can be shared between these graphs? Or that the same constant data in the MLBuffer can be passed to both graphs, but then each graph will need to create its own optimized copy? I think you're saying the latter? So in summary:
Returning to the original question... Where do constant-only graphs fit into this picture?
I think this still holds? :)
@huningxin: 🤔 It's not well defined - my teammate Justin Stoecker says he wouldn't rely on doing that.
I meant the latter. As @fdwr mentioned, we should not rely on the former.
I'd say passing an
I would be supportive of having "constant MLBuffer" usage (for the purposes of re-using constant input data). We could spec it such that after

@a-sully @huningxin Shall we add this to the prototype TODO?
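For illustration only, a hypothetical sketch of what script usage could look like; none of the `createBuffer()` shape, `MLBufferUsage.CONSTANT` flag, `writeBuffer()` call, or buffer-accepting `constant()` overload shown here is specced, and `weightsData`/`descriptor` are assumed to exist:

```js
// Hypothetical: a buffer whose contents become immutable after the first
// write, letting the runtime reformat it AOT and share it across graphs.
const weightsBuffer = context.createBuffer({
  size: weightsData.byteLength,
  usage: MLBufferUsage.CONSTANT,  // hypothetical usage flag
});
context.writeBuffer(weightsBuffer, weightsData);  // last permitted write

// Each decoder's builder references the same device-side data as a
// constant, so both graphs can apply constant-specific optimizations
// without a second upload.
const w1 = noPastBuilder.constant(descriptor, weightsBuffer);    // hypothetical overload
const w2 = withPastBuilder.constant(descriptor, weightsBuffer);  // hypothetical overload
```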
SGTM 👍
SGTM! Thanks @bbernhar!
Can we close this in favour of "constant MLBuffer"? Please check we have migrated relevant information from here over to the appropriate MLBuffer issue where that is tracked, and cross-link to that spec issue, and to a Chromium prototyping issue too as appropriate. Specifically, I haven't heard any use cases not addressed by the "constant MLBuffer" proposal. We discussed this in https://www.w3.org/2024/04/18-webmachinelearning-minutes.html#t08
I'm content either way (like I said, inserting a dummy identity is an effective work-around).
If there's no clear use case (that's not addressed by "constant MLBuffer"), I'm fine with closing this.
+1
There is another case: an input or a constant operand can be specified as an output even though the graph has ops.

```js
a = builder.input('a', {dataType: 'float32'});
b = builder.relu(a);
graph = await builder.build({a, b});
// TypeError: Failed to execute 'build' on 'MLGraphBuilder': The operand with name "a" is not an output operand.
```
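For contrast, a sketch of the accepted form, where only operator-produced operands are named as graph outputs (same builder as above):

```js
// Naming only the relu result is accepted; 'a' remains a graph input
// that gets bound at dispatch time instead of being listed as an output.
graph = await builder.build({b});  // shorthand for {'b': b}
```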
Agreed that the current spec text and Chromium impl. are more restrictive than just the "no-op graphs". Are you calling out a use case (where we should reconsider the behavior) or just noting the restriction? I'd argue for keeping the restriction until a concrete use case turns up, but it's not a strongly held opinion.
I was just noting that this not only restricts no-op graphs but also prevents setting graph outputs to input/constant operands. I am not aware of any use cases for the latter behavior.
+1
Sounds like we've got consensus. Should we just close this out? Do we want to add a note to the spec e.g.:
Closing out SGTM! Let's also not forget to add appropriate WPT coverage (FYI @BruceDai)
In #603 a step was added to build() to match the Chromium implementation, which errors out if an input operand or a constant operand is specified as an output.
@fdwr writes that this

But also:
Raising this per WG telecon discussion