
Average pooling clamped divisor should be done on all conditions where the kernel can go out of bounds #4144

Open
wants to merge 5 commits into main

Conversation

ivangarcia44
Contributor

In the average pooling computation, the divisor is usually the product of the kernel dimensions. But sometimes the divisor computation needs to discount some elements (e.g., when the kernel window is clamped at the input boundary).

Below is the full condition that determines whether the divisor is the plain product of kernel dimensions or uses the clamped divisor formula. The clamped divisor formula is computed in the createAvgPoolValueCountIncludePadFalseCase method; if the condition below is true, that method returns early and the divisor is computed as just the product of the kernel dimensions.

The condition was previously incomplete, which was not caught in torch-mlir because there were no tests covering these cases (addressed in this change). The issue was caught in the IREE project - #4079.

In summary, the clamped divisor is needed if count_include_pad is false and there is padding, or if count_include_pad is false, ceil_mode is true, and at least one stride is non-unitary. The latter clause is the key of this change: previously, the clamped divisor computation was skipped whenever there was no padding. But even without padding, if ceil_mode is true and the strides are not all unitary, the kernel window can go out of bounds, so the divisor computation must be clamped. PyTorch does this (verified experimentally).
...
createAvgPoolValueCountIncludePadFalseCase(
    bool ceilMode, bool countIncludePad, OpTy op,
    ...
    SmallVectorImpl<int64_t> &strideInts,
    SmallVectorImpl<int64_t> &paddingInts,
    ...) {
  ...
  bool hasPadding =
      !llvm::all_of(paddingInts, [](int64_t p) { return p == 0; });
  bool allStridesUnitary =
      llvm::all_of(strideInts, [](int64_t s) { return s == 1; });
  bool canKernelWindowGoOutOfBounds =
      hasPadding || (ceilMode && !allStridesUnitary);

  if (countIncludePad || !canKernelWindowGoOutOfBounds) {
    // These cases are not handled here.
    return std::nullopt;
  }
  ...
}

See https://pytorch.org/docs/stable/generated/torch.nn.functional.avg_pool2d.html for more information.

@AmosLewis
@rsuderman
@nirvedhmeshram
@sahas3
@Hanumanth04
@dixinzhou
@rafaelubalmw

@ivangarcia44 ivangarcia44 marked this pull request as draft April 24, 2025 15:26
…rnel/stride/padding elements have to be processed in reversed order relative to the spatial dimensions.
@ivangarcia44 ivangarcia44 marked this pull request as ready for review April 24, 2025 17:11