Support rounding type of pool2d operations #208

Merged: 2 commits, Sep 30, 2021
17 changes: 16 additions & 1 deletion index.bs
@@ -1553,13 +1553,20 @@ partial interface MLGraphBuilder {
### pooling operations ### {#api-mlgraphbuilder-pool2d}
Compute a *mean*, *L2 norm*, or *max* reduction operation across all the elements within the moving window over the input tensor. See the description of each type of reduction in [[#api-mlgraphbuilder-reduce]].
<script type=idl>
enum MLRoundingType {
"floor",
"ceil"
};
Comment on lines +1556 to +1559

Collaborator (@wchao1115) commented:

See my comment on #198. Rounding mode for tensor size calculation should be done above WebNN, similar to how it is done in ONNX Runtime above DirectML.

Contributor Author replied:

@wchao1115 , thanks for your comments on #198.

However, a caller may not calculate the tensor shapes itself and may instead rely on WebNN to do that, e.g. the OpenCV WebNN backend mentioned in #198. The MLOperand interface of WebNN doesn't allow querying the shape, so if the caller doesn't infer shapes itself, it would be hard to compute the output shape from the input shape with the desired rounding type, especially when the input is an intermediate operand. That's why I propose letting callers just configure the rounding type instead of calculating the output shape themselves.

I suppose this would map well to ONNX's ceil_mode. What do you think?
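
A minimal sketch (not taken from the PR) of the scenario described above, assuming the MLGraphBuilder input/constant/conv2d/maxPool2d methods and a context created via navigator.ml.createContext() (the exact context-creation call may differ):

```js
// Sketch only: a graph where the caller never computes shapes itself.
const context = navigator.ml.createContext();
const builder = new MLGraphBuilder(context);

const input = builder.input(
    'input', {type: 'float32', dimensions: [1, 3, 224, 224]});
const filter = builder.constant(
    {type: 'float32', dimensions: [64, 3, 7, 7]},
    new Float32Array(64 * 3 * 7 * 7));

// 'conv' is an intermediate MLOperand; the API offers no way to query its
// shape, so the caller cannot easily compute the pooled output size itself.
const conv = builder.conv2d(input, filter,
    {padding: [3, 3, 3, 3], strides: [2, 2]});

// With the proposed option, the caller just picks the rounding behavior and
// lets WebNN infer the output shape.
const pooled = builder.maxPool2d(conv,
    {windowDimensions: [3, 3], strides: [2, 2], roundingType: 'ceil'});
```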

Collaborator @wchao1115 replied (Sep 16, 2021):

That's a fair point. The current WebNN design does allow dynamic shape inference, which would make it somewhat easier to call the API directly (since the caller need not worry about correctly implementing shape inference for all cases themselves), but it does come at the cost of additional work on the implementer's side and, in theory, some long-term maintenance cost of the API due to the additional policy that must be implemented for such a caller.

If we want to continue to allow dynamic shape inference in the WebNN API, then adding an optional rounding mode for the output size would not be out of line. In that case, I would suggest that we also add outputSizes, so that if framework callers already calculate the output size themselves (most frameworks do), they can just ignore the rounding mode.

Contributor Author replied:

> In that case, I would suggest that we also add outputSizes, so that if framework callers already calculate the output size themselves (most frameworks do), they can just ignore the rounding mode.

It sounds good to me.

@wchao1115, regarding the DirectML backend implementation, how do the pooling ops of DirectMLX, such as AveragePooling, support a rounding mode? They calculate the output sizes internally and don't allow configuring either the rounding type or the output sizes.

Collaborator @wchao1115 replied (Sep 16, 2021):

DirectMLX is just a helper library for DirectML. It doesn't add any meaningful feature; it literally just reduces typing and makes DirectML easier to access. The library also doesn't do shape inference at runtime, only at construction time, so if the framework that uses it supports runtime shape inference, it needs to handle that before passing things down to DirectMLX. I also wanted to point out that ONNX Runtime doesn't actually use DirectMLX, but TensorFlow does.

Contributor Author replied:

Thanks @wchao1115.

outputSizes has been added in the latest commit. Please take another look.
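
A brief illustrative sketch (not part of this change) of how the two options would be used together, assuming `builder` is an MLGraphBuilder and `intermediate` an existing MLOperand:

```js
// Option A: let WebNN infer the pooled output shape, choosing the rounding.
const pooledA = builder.averagePool2d(intermediate,
    {windowDimensions: [2, 2], strides: [2, 2], roundingType: 'ceil'});

// Option B: a framework that has already computed the output sizes passes
// them explicitly; roundingType is then ignored.
const pooledB = builder.averagePool2d(intermediate,
    {windowDimensions: [2, 2], strides: [2, 2], outputSizes: [56, 56]});
```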


dictionary MLPool2dOptions {
sequence<long> windowDimensions;
sequence<long> padding;
sequence<long> strides;
sequence<long> dilations;
MLAutoPad autoPad = "explicit";
MLInputOperandLayout layout = "nchw";
MLRoundingType roundingType = "floor";
sequence<long> outputSizes;
};

partial interface MLGraphBuilder {
@@ -1594,10 +1601,18 @@ partial interface MLGraphBuilder {
"nhwc":
- input tensor: [batches, height, width, channels]
- output tensor: [batches, height, width, channels]
- *roundingType*: an {{MLRoundingType}}. The option specifies the rounding function used to compute the output shape.
- *outputSizes*: a sequence of long of length 2. The sizes of the two spatial dimensions of the output tensor. When the output sizes are explicitly specified, *options.roundingType* is ignored. If not specified, the output sizes are automatically computed.

**Returns:** an {{MLOperand}}. The output 4-D tensor that contains the
result of the reduction. The logical shape is interpreted according to the
value of *layout*.
value of *layout*. More specifically, if *options.roundingType* is *"floor"*, the spatial dimensions of the output tensor can be calculated as follows:

*output size = floor(1 + (input size - filter size + beginning padding + ending padding) / stride)*

or if *options.roundingType* is *"ceil"*:

*output size = ceil(1 + (input size - filter size + beginning padding + ending padding) / stride)*
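
As an illustration (not part of the spec text), a small helper applying the two formulas, with a case where the two rounding types give different results:

```js
// One spatial output dimension of pool2d, per the formulas above.
function pool2dOutputSize(inputSize, filterSize, beginPad, endPad, stride,
                          roundingType) {
  const size = 1 + (inputSize - filterSize + beginPad + endPad) / stride;
  return roundingType === 'ceil' ? Math.ceil(size) : Math.floor(size);
}

pool2dOutputSize(112, 3, 0, 0, 2, 'floor'); // 55
pool2dOutputSize(112, 3, 0, 0, 2, 'ceil');  // 56
```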

<div class="note">
A *global* pooling operation, such as one for the max pooling operation, is a variant of pooling where the window dimensions are the spatial dimensions (last two dimensions) of the input shape, as follows.
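
A rough sketch of such a call, assuming the window dimensions default to the input's spatial dimensions when windowDimensions is not given:

```js
// 'Global' max pooling: the window spans the input's entire spatial extent,
// so both spatial output dimensions are 1.
const globalMax = builder.maxPool2d(input);
```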