
How to define the algorithm of L2_Pool2d? #278

Closed
mingmingtasd opened this issue Jul 1, 2022 · 21 comments

Comments

@mingmingtasd
Contributor

As you know, the algorithm of L2_Pool2d is based on the
Lp-normalization function, which should be Y = (X1^P + X2^P + ... + Xn^P) ^ (1/P).

But for L2_pool2d, I am not sure whether we need to average the sum of squared elements, as
Y = ((X1^2 + X2^2 + ... + Xn^2)/n) ^ (1/2), or directly use the Lp-normalization function, as
Y = (X1^2 + X2^2 + ... + Xn^2) ^ (1/2).
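For concreteness, a toy check in Python (my own illustration, not from any spec) showing that the two candidate definitions give different results on the same window:

import math

window = [1.0, 2.0, 3.0, 4.0]
sum_sq = sum(x ** 2 for x in window)            # X1^2 + ... + Xn^2 = 30.0

no_average = math.sqrt(sum_sq)                  # (30)^(1/2)     ~= 5.477
with_average = math.sqrt(sum_sq / len(window))  # (30 / 4)^(1/2) ~= 2.739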

I found two papers, https://sci-hub.yncjkj.com/10.1109/cvpr.2011.5995370 and https://sci-hub.yncjkj.com/10.1109/tcsvt.2015.2461978; they describe the Lp-normalization as below:

[Screenshots: Lp-norm definitions from the two papers]

So I can confirm that the Lp-normalization function should be Y = (X1^P + X2^P + ... + Xn^P) ^ (1/P), but I am still not sure whether we need to average the sum of elements for L2_pool2d. I went through some framework API specs and found the descriptions below:

1. NNAPI ANEURALNETWORKS_L2_POOL_2D:
output[b, i, j, c] =
sqrt(sum_{di, dj} pow(input[b, strides[1] * i + di, strides[2] * j + dj, c], 2) /
sum(1))
2. ONNX LpPool:
LpPool consumes an input tensor X and applies Lp pooling across the tensor according to kernel sizes, stride sizes, and pad lengths. Lp pooling consisting of computing the Lp norm on all values of a subset of the input tensor according to the kernel size and downsampling the data into the output tensor Y for further processing.
3. OpenVINO: Not Supported
4. DML DML_LP_POOLING_OPERATOR_DESC :
Computes the Lp-normalized value across the elements within the sliding window over the input tensor. The value of the P variable in the Lp-normalization function Y = (X1^P + X2^P + ... + Xn^P) ^ (1/P), where X1 to Xn representing each of the values within the sliding window. In common use cases, this value is either set to 1 or 2, representing either the L1 or L2 normalization respectively.

So it seems that NNAPI ANEURALNETWORKS_L2_POOL_2D does average, but after verifying on DML, DML_LP_POOLING_OPERATOR_DESC doesn't. Thus the algorithm and implementation of l2_pool2d may differ across these frameworks.

@anssiko
Member

anssiko commented Sep 1, 2022

@huningxin, please report your proposed approach in this issue for the WG to review when you've discussed this issue with @mingmingtasd.

@fdwr
Collaborator

fdwr commented Feb 6, 2024

...I am not sure whether we need to average the sum of elements...

@mingmingtasd : What sources are you seeing that average elements before the ^ (1/p)? Everything above appears consistent, including the papers, ONNX, DML, and NNAPI, all supporting Lebesgue Pooling for p = 2 as Y = (X1^2 + X2^2 + ... + Xn^2) ^ (1/2). The / sum(1) in NNAPI is odd but ignorable, as it doesn't affect the result.

@mingmingtasd
Contributor Author

mingmingtasd commented Feb 6, 2024

...I am not sure whether we need to average the sum of elements...

@mingmingtasd : What sources are you seeing that average elements before the ^ (1/p)? Everything above appears consistent, including the papers, ONNX, DML, and NNAPI, all supporting Lebesgue Pooling for p = 2 as Y = (X1^2 + X2^2 + ... + Xn^2) ^ (1/2). The / sum(1) in NNAPI is odd but ignorable.

Agreed. L2Pool2d should follow and be based on the Lp-normalization function, Y = (X1^P + X2^P + ... + Xn^P) ^ (1/P).

@fdwr
Collaborator

fdwr commented Feb 11, 2024

@mingmingtasd Is there anything remaining unresolved on this one, or can it be closed?

@mingmingtasd
Contributor Author

@mingmingtasd Is there anything remaining unresolved on this one, or can it be closed?

Let's close it, thanks.

@fujunwei

fujunwei commented Jun 6, 2024

TFLite also averages over the count of summed elements, ((X1^2 + X2^2 + ... + Xn^2)/n) ^ (1/2); here is the l2_pool2d kernel. So how do we keep compatibility between TFLite models and ONNX models?

@huningxin
Contributor

@fujunwei

TFLite also averages over the count of summed elements, ((X1^2 + X2^2 + ... + Xn^2)/n) ^ (1/2)

As discussed before, L2 normalization should be calculated as Y = (X1^2 + X2^2 + ... + Xn^2) ^ (1/2). Would this be an issue with the TFLite implementation? @reillyeon

BTW, do we have implementation experience on CoreML's l2_pool? https://apple.github.io/coremltools/source/coremltools.converters.mil.mil.ops.defs.html#coremltools.converters.mil.mil.ops.defs.iOS15.pool.l2_pool @philloooo

@fdwr
Collaborator

fdwr commented Aug 13, 2024

Would this be an issue with the TFLite implementation?

🤔 If TFLite defines it that way, that may be a useful operation, but it's something besides L2 pooling 👀.

function poolLebesgue(input, axes, windowDimensions, padding, strides, dilations, exponent)
    // y = (x1^p + x2^p + ... + xn^p) ^ (1/p)    // y is the reduced output for all applicable inputs

    return root(poolSum(pow(input, exponent), axes, windowDimensions, padding, strides, dilations), exponent)
endfunction
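For reference, a minimal runnable Python sketch of the pseudocode above (my own hypothetical helper, assuming a single-channel 2D floating-point input, with no padding or dilation for brevity):

import numpy as np

def pool_lebesgue(x, window, strides=(1, 1), p=2.0):
    # y = (x1^p + x2^p + ... + xn^p) ^ (1/p) over each sliding window
    kh, kw = window
    sh, sw = strides
    out_h = (x.shape[0] - kh) // sh + 1
    out_w = (x.shape[1] - kw) // sw + 1
    y = np.empty((out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i * sh:i * sh + kh, j * sw:j * sw + kw]
            y[i, j] = np.sum(patch ** p) ** (1.0 / p)
    return y

With p = 2 this reduces to the L2 pooling formula discussed above.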

@a-sully
Contributor

a-sully commented Aug 13, 2024

🤔 If TFLite defines it that way, that may be a useful operation, but it's something besides L2 pooling 👀.

Agreed. It seems like a TFLite WebNN backend will have to decompose l2pool2d... that raises the questions:

1. Is this decomposition expressible in WebNN?

Recall that we eventually need to clearly define all WebNN operators #462

return root(poolSum(pow(input, exponent), axes, windowDimensions, padding, strides, dilations), exponent)

(poolSum is not an existing WebNN operator)

2. Is this decomposition expressible in TFLite?

Anything is possible - especially if a device type is not mandated #749 - but potentially with severe performance cliffs, especially for non-CPU backends (e.g.), and especially if the operator implementation has to be hand-rolled.

FWIW #689 has the same issue: if at least two backends (i.e. Core ML and DML) have consistent behavior, then maybe that's okay (at least once discrepancies like #180 are resolved!)

@a-sully
Contributor

a-sully commented Aug 13, 2024

especially if the operator implementation has to be hand-rolled

I just realized #180 (comment) provides a decomposition of l2pool2d using sqrt(conv2d(pow(input, 2), filterOfOnes, {dilations...})) - thanks @fdwr!

@fdwr
Collaborator

fdwr commented Aug 14, 2024

poolSum is not [an existing WebNN operator]

@a-sully The more time I devote to thinking of expressing aggregate operators in terms of more fundamental operators, the more I realize some primitive ops (like poolSum) are missing. Even if not directly useful by themselves for ML, they can be useful for composition of others. At least for poolSum (see here), there's an easy decomposition using convolution:

function poolSum(input, axes, windowDimensions, padding, strides, dilations)
    return poolGeneric(input, axes, windowDimensions, padding, strides, dilations, add, 0)
    // OR  convolve(input, filter = ones(windowDimensions), axes, windowDimensions, padding, strides, dilations)
endfunction
  1. Is this decomposition expressible in WebNN?

Yep.

  2. Is this decomposition expressible in TFLite?

Yep.

if at least two backends (i.e. Core ML and DML) have consistent behavior, then maybe that's okay

Well, these {ONNX, DML, NNAPI, the original paper} agree (and I suspect CoreML too).

@huningxin
Contributor

@fdwr

function poolSum(input, axes, windowDimensions, padding, strides, dilations)
    return poolGeneric(input, axes, windowDimensions, padding, strides, dilations, add, 0)
    // OR  convolve(input, filter = ones(windowDimensions), axes, windowDimensions, padding, strides, dilations)
endfunction

Thanks for the decomposition! It's very helpful. IIUC, we may need to set the convolution groups to the number of input channels, and make the all-ones filter of shape [groups, 1, windowDimensions.height, windowDimensions.width], as sketched below.
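To illustrate, a minimal numpy sketch of that grouped (depthwise) convolution decomposition (my own hypothetical names; NCHW layout, stride 1, no padding), where groups equals the input channel count and the all-ones filter has shape [groups, 1, kh, kw]:

import numpy as np

def l2_pool_via_conv(x, kh, kw):
    # x: [N, C, H, W]; depthwise convolution with groups = C and an
    # all-ones filter of shape [C, 1, kh, kw], applied to the squared input
    n, c, h, w = x.shape
    ones_filter = np.ones((c, 1, kh, kw), dtype=x.dtype)
    squared = x ** 2
    out_h, out_w = h - kh + 1, w - kw + 1
    y = np.empty((n, c, out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            patch = squared[:, :, i:i + kh, j:j + kw]            # [N, C, kh, kw]
            # each channel is convolved with its own ones kernel (groups = C)
            y[:, :, i, j] = np.sum(patch * ones_filter[:, 0], axis=(2, 3))
    return np.sqrt(y)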

@a-sully
Contributor

a-sully commented Aug 14, 2024

Well, these {ONNX, DML, NNAPI} agree (and I suspect CoreML too).

Not related to l2Pool2d, but some thoughts regarding operator compatibility across platforms:

  • I understand that ONNX and DML are different things, but in these discussions, I consider them to be ~one backend. They're cooperating layers of the same stack :)
  • NNAPI is being deprecated in Android 15; it should not factor into decisions about WebNN.

@fujunwei

fujunwei commented Aug 15, 2024

I filed the issue in TensorFlow; they also consider it an issue in the TFLite kernel implementation, and they may fix it later.

@fdwr
Collaborator

fdwr commented Aug 15, 2024

I filed the issue in TensorFlow; they also consider it an issue in the TFLite kernel implementation, and they may fix it later.

Junwei: Thanks for filing. So it appears WebNN's TFLite backend would need a decomposition until any future TFLite fix.

potentially with severe performance cliffs ... especially if the operator implementation has to be hand-rolled

Austin: If it's any perf consolation, LP pooling is evidently not so common (only a few models in my little stash of 1411 model files).

@huningxin
Contributor

@a-sully

  • I understand that ONNX and DML are different things, but in these discussions, I consider them to be ~one backend. They're cooperating layers of the same stack :)

Regarding our new op proposal checklist, there are two aspects: cross-framework support and cross-platform implementability. I understand we usually study ONNX / ONNXRuntime as one example of a framework, alongside TensorFlow and PyTorch etc., and investigate DML as one example of a platform API, alongside TFLite and CoreML.

@a-sully
Contributor

a-sully commented Aug 15, 2024

Austin: If it's any perf consolation, LP pooling is evidently not so common (only a few models in my little stash of 1411 model files).

If that's the case and there's a straightforward decomposition which could be performed in "userspace"... is this operator needed in WebNN at all?

Put another way, if this operator wasn't already in the WebNN spec, would it pass the new op proposal checklist?

copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this issue Aug 19, 2024
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this issue Aug 20, 2024
@fdwr
Collaborator

fdwr commented Aug 23, 2024

It seems like a TFLite WebNN backend will have to decompose l2pool2d..
...
copybara-service bot mentioned this issue 3 days ago

Woot, it appears that TFLite is already fixed (per https://github.com/tensorflow/tensorflow/pull/74079/files), which means it's just contingent on Chromium updating its TF version. Given there will now be a direct call to TFLite with no decomposition, does that change the difficulty of implementing this?

is this operator needed in WebNN at all? ...
if this operator wasn't already in the WebNN spec, would it pass the new op proposal checklist?

I think it's still worth keeping to complete the collection of pooling operations, and several backends offer implementations faster than the decomposition, suggesting it's useful. Even if it's rare in my mini-model collection, I do see people asking questions about it on forums, indicating utility. Barring that one implementation bug (now fixed), the backends implement it consistently too (unlike the potentially more dubious localResponseNormalization, where multiple implementations have small complicating differences).

  • Use Case. What user scenarios or experiences will benefit from this operation, and why aren't existing operations sufficient?
    Computer vision and pattern recognition.
  • Sample models. What are specific models that enable the target use case? One or more sample models as references are required.
    VGG with L2 pooling in Geodesics of learned representations https://arxiv.org/abs/1511.06394
  • Cross-framework support. Is an identical or similar operation supported by multiple popular frameworks? What are they?
    Yes.
  • Cross-platform implementability. Is the operation implementable in more than one platform? What are they?
    Yes.

@a-sully
Contributor

a-sully commented Aug 23, 2024

Thanks for thoroughly following through with this issue, @fdwr. TFLite's alignment with the other platforms does improve the "Cross-framework support" line item. Seems reasonable to me 👍. Filed https://crbug.com/361717758 to track implementation in Chromium.

Can we close this issue?

@fdwr
Collaborator

fdwr commented Aug 23, 2024

Mingming, I'm closing it from the spec perspective, as Austin created a Chromium issue for it. 👍 Thank you for raising it.

@fdwr fdwr closed this as completed Aug 23, 2024
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this issue Aug 23, 2024
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this issue Sep 4, 2024
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this issue Sep 5, 2024
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this issue Sep 6, 2024
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this issue Sep 11, 2024