Average pooling clamped divisor should be done on all conditions where the kernel can go out of bounds #4144
In average pooling, the divisor is usually the product of the kernel dimensions. Sometimes, however, the divisor must exclude positions the window does not actually cover (e.g., when the kernel window is clamped at the input boundary).
Below is the full condition that determines whether the divisor is the product of the kernel dimensions or is computed with the clamped divisor formula. The clamped divisor is computed in the createAvgPoolValueCountIncludePadFalseCase method. When the condition below is true, the method returns early and the divisor is simply the product of the kernel dimensions.
The condition was previously incomplete. torch-mlir did not catch this because it had no tests covering these cases (this change adds them). The issue was caught in the IREE project - #4079.
In summary, the clamped divisor is needed if count_include_pad is false and either there is padding, or ceil_mode is true and at least one stride is non-unitary. The second clause is the key of this change. Previously, the clamped divisor was not computed when there was no padding. But even without padding, if ceil_mode is true and the strides are not all unitary, the kernel window can go out of bounds, so the divisor must be clamped. PyTorch does this (verified experimentally).
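To make the out-of-bounds case concrete, here is a minimal 1-D sketch. It is not code from this change: the helper names pooledOutputSize and lastWindowEnd are hypothetical, and the output-size rule reflects my understanding of PyTorch's pooling shape computation (in ceil mode, a window that would start past the input plus left padding is dropped). It assumes the kernel fits inside the padded input.

```cpp
#include <cstdint>

// Hypothetical helper (not part of torch-mlir): 1-D pooled output length.
// Assumes inputSize + 2 * pad >= kernel and stride >= 1.
constexpr int64_t pooledOutputSize(int64_t inputSize, int64_t kernel,
                                   int64_t stride, int64_t pad,
                                   bool ceilMode) {
  int64_t numerator = inputSize + 2 * pad - kernel;
  int64_t out =
      (ceilMode ? (numerator + stride - 1) / stride : numerator / stride) + 1;
  // In ceil mode the last window must start inside the input or the
  // left padding; otherwise it is dropped.
  if (ceilMode && (out - 1) * stride >= inputSize + pad)
    --out;
  return out;
}

// Hypothetical helper: exclusive end index of the last window, expressed
// in unpadded input coordinates (valid indices are 0 .. inputSize - 1).
constexpr int64_t lastWindowEnd(int64_t inputSize, int64_t kernel,
                                int64_t stride, int64_t pad, bool ceilMode) {
  return (pooledOutputSize(inputSize, kernel, stride, pad, ceilMode) - 1) *
             stride -
         pad + kernel;
}

// Input of length 6, kernel 3, no padding.
// ceil_mode with stride 2: three windows, and the last one ends at 7 > 6.
static_assert(pooledOutputSize(6, 3, 2, 0, true) == 3, "three windows");
static_assert(lastWindowEnd(6, 3, 2, 0, true) == 7, "runs past the input");
// ceil_mode with stride 1: the last window ends exactly at the boundary.
static_assert(lastWindowEnd(6, 3, 1, 0, true) == 6, "stays in bounds");
// floor mode with stride 2: the last window also stays in bounds.
static_assert(lastWindowEnd(6, 3, 2, 0, false) == 5, "stays in bounds");
```

In this example only the combination of ceil_mode and a non-unit stride pushes the window past the input, which matches the clause added to the condition.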
...
createAvgPoolValueCountIncludePadFalseCase(
    bool ceilMode, bool countIncludePad, OpTy op,
    ...
    SmallVectorImpl<int64_t> &strideInts,
    SmallVectorImpl<int64_t> &paddingInts,
    ...) {
  ...
  bool hasPadding =
      !llvm::all_of(paddingInts, [](int64_t p) { return p == 0; });
  bool allStridesUnitary =
      llvm::all_of(strideInts, [](int64_t s) { return s == 1; });
  bool canKernelWindowGoOutOfBounds =
      hasPadding || (ceilMode && !allStridesUnitary);
  if (countIncludePad || !canKernelWindowGoOutOfBounds) {
    // These cases are not handled here.
    return std::nullopt;
  }
  ...
}
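For reference, the clamped divisor itself can be sketched in one dimension as follows. This is an illustration under stated assumptions, not the lowering in createAvgPoolValueCountIncludePadFalseCase: clampedDivisor1D is a hypothetical name, and the logic mirrors my understanding of how PyTorch counts elements when count_include_pad is false (only positions that fall inside the real input are counted). For a 2-D pool the divisor would be the product of the per-dimension counts.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper (not part of torch-mlir): number of real input
// elements covered by the window that produces output index outIdx.
constexpr int64_t clampedDivisor1D(int64_t outIdx, int64_t inputSize,
                                   int64_t kernel, int64_t stride,
                                   int64_t pad) {
  int64_t start = outIdx * stride - pad; // may be negative (left padding)
  int64_t end = start + kernel;          // may exceed inputSize
  // Clamp both ends to the valid input range [0, inputSize).
  return std::min(end, inputSize) - std::max(start, int64_t{0});
}

// Input 6, kernel 3, stride 2, no padding, ceil_mode giving 3 outputs:
static_assert(clampedDivisor1D(0, 6, 3, 2, 0) == 3, "full window [0, 3)");
static_assert(clampedDivisor1D(1, 6, 3, 2, 0) == 3, "full window [2, 5)");
static_assert(clampedDivisor1D(2, 6, 3, 2, 0) == 2, "window [4, 7) clamped");
// Input 4, kernel 3, stride 1, padding 1: the first window overlaps padding.
static_assert(clampedDivisor1D(0, 4, 3, 1, 1) == 2, "window [-1, 2) clamped");
```

The third assertion is the case this change targets: there is no padding, yet the last window covers only two real elements, so dividing by the kernel size 3 would give the wrong average.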
See https://pytorch.org/docs/stable/generated/torch.nn.functional.avg_pool2d.html for more information.
@AmosLewis
@rsuderman
@nirvedhmeshram
@sahas3
@Hanumanth04
@dixinzhou
@rafaelubalmw