
Allow no-op graphs? #614

Closed
inexorabletash opened this issue Mar 21, 2024 · 24 comments · Fixed by #665

@inexorabletash
Member

In #603, a step was added to build() to match the Chromium implementation, which errors out if an input operand or a constant operand is specified as an output.

@fdwr writes that this

...would nullify the possibility of a nop graph (say one with just a constant and output), which conceptually seems valid to me, as one scenario is where you have a chained sequence of models with one step that can be an action or a nop, and it may be easier to express by always passing a valid (but empty) graph to the component that executes them all.

But also:

I'm not too worried about it either, since a caller can always insert a dummy identity node to satisfy this constraint.

Raising this per WG telecon discussion.
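
For concreteness, here's a minimal sketch of the prohibited pattern and the identity workaround (illustrative only; a scalar constant is assumed):

const builder = new MLGraphBuilder(context);
const c = builder.constant({dataType: 'float32'}, new Float32Array([42]));

// No-op graph: a constant operand passed directly as a graph output.
// After #603, build() rejects this.
// const graph = await builder.build({output: c});

// Workaround: route the value through a dummy identity() op.
const graph = await builder.build({output: builder.identity(c)});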

@fdwr
Collaborator

fdwr commented Mar 21, 2024

Open to hearing more thoughts that could sway this. Either direction, it would be good to document this and, if we prohibit nop graphs, to remark that inserting a dummy identity is an effective workaround.

@huningxin
Contributor

say one with just a constant and output

This constant-only graph seems useful, especially for GPU or NPU devices. Once this graph is compiled, the constant data would be uploaded to dedicated device memory, and its output could be used by other graphs on the same device as an input (via MLBuffer), which would avoid cross-device data copies and memory duplication.

Today, when the ONNX Runtime WebNN EP runs inference on transformer decoders such as Whisper, two WebNN sub-graphs are built because WebNN doesn't support the If operator: one with a past key/value (KV) cache and one without. The inference code runs the sub-graph without the cache for the first iteration and the sub-graph with the past KV cache for the following iterations. The two sub-graphs actually share some common weights. In the existing implementation, the weights are uploaded by the two sub-graphs separately, causing memory duplication. If a constants-only graph were allowed, the common weights could be uploaded once through it and shared with the two decoder sub-graphs (via MLBuffer).
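
A rough sketch of that flow, assuming a constants-only graph were allowed and that a dispatch output MLBuffer can be fed to other graphs as a regular input (names, buffer usage, and dispatch shapes are illustrative):

// Upload the shared weights once, via a constants-only graph.
const weightBuilder = new MLGraphBuilder(context);
const weights = weightBuilder.constant(weightsDesc, weightsArrayBuffer);
const weightGraph = await weightBuilder.build({w: weights}); // rejected today

const sharedWeights = context.createBuffer({
  size: weightsArrayBuffer.byteLength,
  usage: MLBufferUsage.OUTPUT,
});
weightGraph.dispatch({inputs: {}, outputs: {w: sharedWeights}});

// Both decoder graphs then consume `sharedWeights` as a normal input,
// instead of each uploading its own copy of the weights.
noPastGraph.dispatch({inputs: {w: sharedWeights /* , ... */}, outputs: noPastOutputs});
withPastGraph.dispatch({inputs: {w: sharedWeights /* , ... */}, outputs: withPastOutputs});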

/cc @Honry

@a-sully
Contributor

a-sully commented Mar 25, 2024

Thanks for providing this use case, @huningxin! A few thoughts...

Once this graph is compiled, the constant data would be uploaded to dedicated device memory, and its output could be used by other graphs on the same device as an input

Taking a step back... isn't this the point of writeBuffer?

const inputMlBuffer = context.createBuffer({
  size: someArrayBuffer.byteLength,
  usage: MLBufferUsage.INPUT,
});
context.writeBuffer(inputMlBuffer, {srcData: someArrayBuffer});
graph.dispatch({inputs: inputMlBuffer, outputs});

In this case, there's no need to create an MLGraph just for the sake of loading constant values to the device memory... Is there something I'm missing here? :)

If a constants-only graph were allowed, the common weights could be uploaded once through it and shared with the two decoder sub-graphs (via MLBuffer).

Just to clarify, are you suggesting that the weights would be shared with the two decoder sub-graphs as inputs or constants?

Currently MLBuffer is only proposed as being used for inputs/outputs, though of course it would be nice to expand this to constants in the future. That being said, constants are, well, constant, so I don't expect we would allow something like this:

builder.constant(mlBuffer)
//...
graph.dispatch({inputs, outputs: mlBuffer});

This relates back to #542 (FYI @bbernhar), since using an MLBuffer as a constant() would presumably require creating it with "read-only" usage flags, whereas "output" tensors must be writable.

@bbernhar

In this case, there's no need to create an MLGraph just for the sake of loading constant values to the device memory... Is there something I'm missing here? :)

Yeah, it's for optimization. When you build a graph with constants directly, the runtime can reformat them ahead of time (AOT) for better execution performance. But this also creates a copy (in a new format). You don't want web developers reading or writing this data like normal inputs, because it's owned by the runtime and it would be illegal for them to access these "runtime owned" constant buffers.

The usage flags would need to be more specific than read-only (i.e. a graph-based or context/dispatch-based constant).

Hope that helps. =)

@a-sully
Contributor

a-sully commented Mar 25, 2024

Thanks for the additional info, @bbernhar!

I assume by "runtime owned" constant buffers you're referring to DML_TENSOR_FLAG_OWNED_BY_DML in DML?

Looking back at @huningxin's proposal:

Once this graph is compiled, the constant data would be uploaded to dedicated device memory, and its output could be used by other graphs on the same device as an input (via MLBuffer), which would avoid cross-device data copies and memory duplication

I'm still confused how compiling constant subgraphs will reduce data copies and memory duplication. Wouldn't the output of the constant sub-graphs just result in another data copy? According to the docs, the DML_TENSOR_FLAG_OWNED_BY_DML flag can only be used on input tensors. I do not expect we can create MLBuffers backed by this type of buffer (and as @bbernhar mentioned, these tensors should not be exposed to web developers)

Here's my understanding of the constant subgraph flow. Please let me know if anything is incorrect!

const constantBuilder = new MLGraphBuilder(context);

// Copies `someArrayBuffer`'s data into `constantBuilder`
const constantOperand = constantBuilder.constant(someArrayBuffer);

// Copies the data into `constantGraph`. This copy is an implementation detail,
// though it's hard to remove if `build()` can be called multiple times (see #567).
// Then the DML implementation reformats (another copy) this data into a
// "runtime owned" constant buffer
const constantGraph = await constantBuilder.build({'output': constantOperand});

// _copies_ the data to `mlBuffer`
//  - The "runtime owned" constant buffer is tied to `constantGraph`, so
//    presumably other `MLGraph` instances cannot use it?
//  - I assume it's destroyed alongside all other `constantGraph` resources when
//    `constantGraph` is GCed?
constantGraph.dispatch({inputs: /*none!*/{}, outputs: mlBuffer});

// Now, to use this constant data in other graphs...

const subGraphBuilder = new MLGraphBuilder(context);
// Build up some operands with `subGraphBuilder` to yield a `subGraph`
//  ...

// Since we're passing `mlBuffer` as an input rather than a constant, we cannot
// take advantage of the AOT reformatting optimization
subGraph.dispatch({inputs: mlBuffer, outputs: ... });

Looking again at the DML_TENSOR_FLAG_OWNED_BY_DML docs:

The effect of this flag is that DirectML makes a copy of the tensor data during initialization of an operator, storing it in the persistent resource.

This means that sharing this buffer between MLGraphs would require sharing that specific persistent resource, no?


The usage flags would be need to more specific than read-only (ie. graph or context /dispatch based constant).

Agreed, I was just trying to emphasize that "output" implies writable whereas "constant" implies read-only, so it seems unlikely that we'd allow a given MLBuffer to be used for both :)

@bbernhar

I assume by "runtime owned" constant buffers you're referring to DML_TENSOR_FLAG_OWNED_BY_DML in DML?
This means that sharing this buffer between MLGraphs would require sharing that specific persistent resource, no?

Yes, thanks for pointing this out.

Wouldn't the output of the constant sub-graphs just result in another data copy?

@huningxin may correct me here, but as I understand it, if we dispatch an MLGraph of constant() operands, the dispatched output will stay on the device, allowing us to re-use the output MLBuffer, as a normal input, in another MLGraph with the same constant data. This gets around requiring a special MLBuffer for constant().

@huningxin
Contributor

huningxin commented Mar 26, 2024

@a-sully

I'm still confused how compiling constant subgraphs will reduce data copies and memory duplication.

I meant that the shared constant data only needs to be uploaded to the constant-graph once, so there is only one copy of the constant data in device memory, shared by the two decoder graphs (no_past and with_past).

I agree it still needs a data copy when passing the constant data from the constant-graph to the following decoder graphs as an input. And in this scenario, the constant data is used by the decoder graphs as an input without the DML_TENSOR_FLAG_OWNED_BY_DML flag, which may cost some inference performance.

This means that sharing this buffer between MLGraphs would require sharing that specific persistent resource, no?

AFAIK, the persistent resource is initialized by a particular graph initializer. I am not sure a persistent resource initialized by one graph can be bound for other graphs. @fdwr ?

@huningxin
Contributor

huningxin commented Mar 26, 2024

@bbernhar

@huningxin may correct me here, but as I understand it, if we dispatch an MLGraph of constant() operands, the dispatched output will stay on the device, allowing us to re-use the output MLBuffer, as a normal input, in another MLGraph with the same constant data. This gets around requiring a special MLBuffer for constant().

It works. But as I mentioned, because the constant data is now passed as an input resource (without the DML_TENSOR_FLAG_OWNED_BY_DML flag set), this would prevent DML from optimizing that resource for inference performance. That optimization happens at graph initialization time rather than inference time.

@huningxin
Contributor

huningxin commented Mar 26, 2024

If builder.constant() can take an MLBuffer, builder.build() of graphs sharing the same weights may bind that MLBuffer to tensors set with the DML_TENSOR_FLAG_OWNED_BY_DML flag at graph initialization. This would not only eliminate the cross-device data copies (the MLBuffer only needs to be written once) but also ensure the constant data is optimized by each graph initializer.
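
Roughly, with a hypothetical constant(MLBuffer) overload (not in the spec today; names and usage flags are illustrative):

// Write the shared weights into an MLBuffer once.
const weightBuffer = context.createBuffer({
  size: weightsArrayBuffer.byteLength,
  usage: MLBufferUsage.CONSTANT, // hypothetical usage flag
});
context.writeBuffer(weightBuffer, {srcData: weightsArrayBuffer});

// Each builder binds the same MLBuffer as a constant; each graph initializer
// can then make its own DML_TENSOR_FLAG_OWNED_BY_DML-optimized copy without
// another cross-device upload.
const wNoPast = noPastBuilder.constant(weightBuffer);
const wWithPast = withPastBuilder.constant(weightBuffer);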

@a-sully
Contributor

a-sully commented Mar 26, 2024

I'm trying to square this comment

If builder.constant() can take an MLBuffer, builder.build() of graphs sharing the same weights may bind that MLBuffer to tensors set with the DML_TENSOR_FLAG_OWNED_BY_DML flag at graph initialization.

with this comment

AFAIK, the persistent resource is initialized by a particular graph initializer. I am not sure a persistent resource initialized by one graph can be bound for other graphs. @fdwr ?

Just to make sure I'm understanding correctly... Are you suggesting that the optimized constant data can be shared between these graphs? Or that the same constant data in the MLBuffer can be passed to both graphs, but then each graph will need to create its own optimized copy? I think you're saying the latter?

So in summary:

  • passing an MLBuffer as an input() allows the same data to be passed to multiple MLGraph instances (no additional copies). This data cannot be optimized during graph initialization because it's passed as an input rather than a constant.
  • passing an MLBuffer as a constant() ensures there's only one copy of the unoptimized constant data. Each MLGraph this buffer is passed to may create its own optimized copy of the constant data during graph initialization.

Returning to the original question... Where do constant-only graphs fit into this picture?

Once this graph compiled, the constant data would be uploaded to the dedicated device memory and it's output can be used by other graphs on the same device as input

Taking a step back... isn't this the point of writeBuffer?

I think this still holds? :)

@fdwr
Collaborator

fdwr commented Mar 26, 2024

I am not sure a persistent resource initialized by one graph can be bound for other graphs. @fdwr?

@huningxin: 🤔 It's not well defined - my teammate Justin Stoecker says he wouldn't rely on doing that.

@huningxin
Contributor

huningxin commented Mar 27, 2024

@a-sully

Just to make sure I'm understanding correctly... Are you suggesting that the optimized constant data can be shared between these graphs? Or that the same constant data in the MLBuffer can be passed to both graphs, but then each graph will need to create its own optimized copy? I think you're saying the latter?

I meant the latter. As @fdwr mentioned, we should not rely on the former.

Returning to the original question... Where do constant-only graphs fit into this picture?

I'd say passing an MLBuffer as a constant() is better than constant-only graph for my use case ("no_past" and "with_past" decoder graphs share common weight).

@bbernhar

I would be supportive of having a "constant MLBuffer" usage (for the purposes of re-using constant input data). We could spec it such that after builder.constant(mlBuffer), the buffer becomes read-only (e.g. no writeBuffer() or use as a dispatch() output allowed).
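
In other words (sketch only; exact API shape TBD):

context.writeBuffer(buffer, {srcData: weights}); // OK: buffer is still writable
builder.constant(buffer);                        // buffer becomes read-only
// context.writeBuffer(buffer, {srcData: other}); // now disallowed
// graph.dispatch({inputs, outputs: buffer});     // now disallowed as an output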

@a-sully @huningxin Shall we add this to the prototype TODO?

@a-sully
Contributor

a-sully commented Mar 28, 2024

SGTM 👍

@huningxin
Contributor

Shall we add this to the prototype TODO?

SGTM! Thanks @bbernhar!

@anssiko
Member

anssiko commented Apr 25, 2024

Can we close this in favour of "constant MLBuffer"?

Please check that we have migrated the relevant information from here over to the appropriate MLBuffer issue where that is tracked, and cross-link to that spec issue, and to a Chromium prototyping issue too, as appropriate.

Specifically, I haven't heard of any use cases not addressed by the constant MLBuffer proposal. We discussed this in https://www.w3.org/2024/04/18-webmachinelearning-minutes.html#t08

@fdwr
Collaborator

fdwr commented Apr 25, 2024

Can we close this in favour of "constant MLBuffer"?

I'm content either way (like I said, inserting a dummy identity suffices for my original concern). Deferring to Ningxin/Austin/Bryan/Josh.

@a-sully
Contributor

a-sully commented Apr 26, 2024

If there's no clear use case (that's not addressed by MLBuffer) then I propose not allowing no-op graphs. We can always reconsider this stance later

@huningxin
Contributor

@a-sully

If there's no clear use case (that's not addressed by MLBuffer) then I propose not allowing no-op graphs. We can always reconsider this stance later

+1

@inexorabletash

In #603, a step was added to build() to match the Chromium implementation, which errors out if an input operand or a constant operand is specified as an output.

There is another case in which an input or a constant operand can be specified as an output even though the graph has ops:

const a = builder.input('a', {dataType: 'float32'});
const b = builder.relu(a);
const graph = await builder.build({a, b});
// TypeError: Failed to execute 'build' on 'MLGraphBuilder': The operand with name "a" is not an output operand.

@inexorabletash
Member Author

Agreed that the current spec text and Chromium implementation are more restrictive than just prohibiting no-op graphs.

Are you calling out a use case (where we should reconsider the behavior) or just noting the restriction? I'd argue for keeping the restriction until a concrete use case turns up, but it's not a strongly held opinion.

@huningxin
Contributor

@inexorabletash

Are you calling out a use case (where we should reconsider the behavior) or just noting the restriction?

I was just noting that this not only restricts no-op graphs but also prevents setting graph outputs to input/constant operands. I am not aware of any use cases for the latter behavior.

I'd argue for keeping the restriction until a concrete use case turns up

+1

@inexorabletash
Member Author

Sounds like we've got consensus. Should we just close this out? Do we want to add a note to the spec e.g.:

NOTE: Specifying an input operand or constant operand as a graph output results in an error, as this is usually an incorrect usage of the API. Callers can work around this by introducing an identity() operator.

@fdwr
Collaborator

fdwr commented Apr 29, 2024

Sounds like we've got consensus. Should we just close this out? Do we want to add a note to the spec e.g.:

  • Close out. 👍
  • Nice little note, just like you wrote. 👍

@a-sully
Contributor

a-sully commented Apr 29, 2024

Closing out SGTM! Let's also not forget to add appropriate WPT coverage (FYI @BruceDai )

@inexorabletash inexorabletash self-assigned this Apr 29, 2024
inexorabletash added a commit to inexorabletash/webnn that referenced this issue Apr 29, 2024