Generalize noop_with_empty_axes handling across all Reduce operators #26436
yuslepukhin merged 11 commits into microsoft:main
|
@microsoft-github-policy-service agree |
…h for all reduction operators) Fixes microsoft#26288 - Handle all Reduce* operators consistently when axes are empty and noop_with_empty_axes=1 - Apply both Pre-op and Post-op elementwise without performing reduction - Added comprehensive unit tests for ReduceSumSquare, ReduceL1, ReduceL2, ReduceLogSum, and ReduceLogSumExp covering empty-axes scenarios
…ith_empty_axes behavior
34ce393 to 70c5112
Fix CI failures on non-CPU EPs by explicitly configuring DefaultCpuExecutionProvider() and skipping when CPU EP is unavailable. Other EPs do not yet implement the spec-aligned behavior for noop_with_empty_axes, so limiting these tests to CPU keeps CI runs stable. Refs: microsoft#26288, PR microsoft#26436.
…nly the intended test update
|
@fdwr, I’ve pushed fixes for the failed checks but don’t have permissions to re-run them. |
@naomiOvad: Restarted CI's 🫡. |
|
cc @xadupre, @yuslepukhin, @skottmckay Could you please review my PR? |
|
/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline, Windows GPU CUDA CI Pipeline, Windows GPU DML CI Pipeline, Windows GPU Doc Gen CI Pipeline, Windows GPU TensorRT CI Pipeline, Windows OpenVINO CI Pipeline, Windows x64 QNN CI Pipeline |
|
Azure Pipelines successfully started running 4 pipeline(s). |
Pull request overview
This PR generalizes noop_with_empty_axes handling across all CPU reduction operators to align with the ONNX specification. When noop_with_empty_axes=1 and axes=[], reduction operators should apply their element-wise transformations (such as square, abs, log) without performing reduction, or return the input unchanged for identity operations.
Key changes:
- Introduced a compile-time trait system (ReduceAggTraits) to distinguish aggregators with element-wise transformations (PreOp/PostOp) from identity operations
- Added an ApplyNoopEmptyAxesElementwise<AGG>() helper function that applies transformations element-wise or uses memcpy for identity cases
- Updated CommonReduce1Loop and CommonReduce2Loops to invoke the new helper when axes=[] and noop_with_empty_axes=1
- Fixed CommonFastReduceCopy to return false instead of performing unsafe memcpy for empty axes cases
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| onnxruntime/test/providers/cpu/reduction/reduction_ops_test.cc | Corrected existing test expectations and added comprehensive test coverage for ReduceSumSquare, ReduceL1, ReduceL2, ReduceLogSum, and ReduceLogSumExp with noop_with_empty_axes behavior |
| onnxruntime/core/providers/cpu/reduction/reduction_ops.h | Introduced ReduceAggTraits template system with specializations for L1, L2, SumSquare, LogSum, and LogSumExp aggregators |
| onnxruntime/core/providers/cpu/reduction/reduction_ops.cc | Implemented ApplyNoopEmptyAxesElementwise helper, updated common reduction paths, and fixed CommonFastReduceCopy to delegate empty axes handling upstream |
|
/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline, Windows GPU CUDA CI Pipeline, Windows GPU DML CI Pipeline, Windows GPU Doc Gen CI Pipeline, Windows GPU TensorRT CI Pipeline, Windows OpenVINO CI Pipeline, Windows x64 QNN CI Pipeline |
|
Azure Pipelines successfully started running 4 pipeline(s). |
|
/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline |
|
Azure Pipelines successfully started running 4 pipeline(s). |
|
I see that the Linux x64 training CI is consistently failing. |
|
1: [ RUN ] GradientCheckerTest.AddGrad |
|
I fixed the handling of axes when they are provided as an input tensor. Previously this case was not handled, so axes were incorrectly treated as empty, which triggered the noop_with_empty_axes path and led to the invalid reshape in GradientCheckerTest.AddGrad. |
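To make the failure mode above concrete, here is a minimal sketch. It assumes axes can arrive either as an attribute or as an optional input tensor (modeled with std::optional). ResolveAxesSketch and TakesNoopPathSketch are hypothetical names for illustration, not functions from the onnxruntime code base.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper, not the onnxruntime API: picks the effective axes.
// Axes given as an input tensor (modeled as std::optional) take precedence
// over axes given as an attribute.
inline std::vector<std::int64_t> ResolveAxesSketch(
    const std::vector<std::int64_t>& attr_axes,
    const std::optional<std::vector<std::int64_t>>& input_axes) {
  if (input_axes.has_value()) return *input_axes;
  return attr_axes;
}

// The no-op path applies only when the *effective* axes are empty.
inline bool TakesNoopPathSketch(const std::vector<std::int64_t>& effective_axes,
                                bool noop_with_empty_axes) {
  return noop_with_empty_axes && effective_axes.empty();
}
```

In this sketch, checking only the (empty) attribute would report the no-op path even though the input tensor carries axes {0}; resolving the axes first avoids that.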
|
/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline |
|
Azure Pipelines successfully started running 4 pipeline(s). |
|
Thanks for the reviews! |
|
Hi, Just checking in — I see that all checks have passed. |
…icrosoft#26436)

### Description

This PR fixes the behavior of the reduction operators so it's aligned with the ONNX specification. [See ONNX ReduceSumSquare Specification](https://onnx.ai/onnx/operators/onnx__ReduceSumSquare.html) for the definition of noop_with_empty_axes and expected behavior when axes=[] or when the axes input is not provided.

Main changes:

- Added a new helper function ApplyNoopEmptyAxesElementwise<AGG>() to handle the case axes=[] and noop_with_empty_axes=1 for all reduction operators (ReduceSumSquare, ReduceL1, ReduceL2, ReduceLogSum, ReduceLogSumExp, etc.). This function performs elementwise operations according to the aggregator type: If the aggregator defines Pre/Post operations (e.g., abs, square, sqrt, log), they are applied elementwise on each element of the input without reduction. Otherwise, a direct memory copy (memcpy) is performed to efficiently produce an identical output.
- Introduced a compile-time trait system ReduceAggTraits to detect whether each aggregator defines a PreOp and/or PostOp. This allows compile-time specialization and avoids redundant runtime checks.
- Updated the generic reduction paths (CommonReduce1Loop and CommonReduce2Loops) to invoke ApplyNoopEmptyAxesElementwise<AGG>() when axes=[] and noop_with_empty_axes=1. These paths are used by all reduction operators, ensuring consistent handling across all of them.
- Fixed the conditional inside CommonFastReduceCopy: replaced the previous unconditional memcpy and return true with a safe return false, since empty-axes cases are now fully handled upstream by ApplyNoopEmptyAxesElementwise<AGG>(). This keeps the control flow explicit and prevents unintended fallback copies.
- Added unit tests for all affected reduction operators (ReduceSumSquare, ReduceL1, ReduceL2, ReduceLogSum, ReduceLogSumExp), and corrected one test that previously expected identity output but now correctly applies Pre/Post operations per the ONNX specification.

These updates ensure spec-compliant behavior for all Reduce ops and slightly improve performance for identity cases.

### Motivation and Context

Before this fix, some Reduce operators (e.g., ReduceSumSquare, ReduceL1, ReduceL2, ReduceLogSum, ReduceLogSumExp) did not follow the ONNX spec when axes=[] and noop_with_empty_axes=1. Per the ONNX specification: No reduction should occur if axes is empty and noop_with_empty_axes=1. Operators with Pre/Post (e.g., abs, square, sqrt, log) must apply them elementwise. Others should return the input unchanged.

Fixes microsoft#26288
Fixes microsoft#25095

**Note:** This fix currently applies to the CPU Execution Provider only. Other EPs will need a similar update to align with this behavior.
Description
This PR fixes the behavior of the reduction operators so it's aligned with the ONNX specification.
See the ONNX ReduceSumSquare specification (https://onnx.ai/onnx/operators/onnx__ReduceSumSquare.html) for the definition of noop_with_empty_axes and the expected behavior when axes=[] or when the axes input is not provided.
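Main changes:
Added a new helper function ApplyNoopEmptyAxesElementwise<AGG>() to handle the case axes=[] and noop_with_empty_axes=1 for all reduction operators (ReduceSumSquare, ReduceL1, ReduceL2, ReduceLogSum, ReduceLogSumExp, etc.).
This function performs elementwise operations according to the aggregator type: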
If the aggregator defines Pre/Post operations (e.g., abs, square, sqrt, log), they are applied elementwise on each element of the input without reduction.
Otherwise, a direct memory copy (memcpy) is performed to efficiently produce an identical output.
Introduced a compile-time trait system ReduceAggTraits to detect whether each aggregator defines a PreOp and/or PostOp.
This allows compile-time specialization and avoids redundant runtime checks.
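A minimal sketch of how a trait-driven helper of this kind could look, assuming C++17 and float data. The aggregator structs, AggTraitsSketch, and ApplyNoopElementwiseSketch are hypothetical stand-ins for illustration; they do not reproduce the actual ReduceAggTraits or ApplyNoopEmptyAxesElementwise<AGG>() code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstring>

// Hypothetical stand-ins for aggregator classes; names and signatures are
// illustrative and do not mirror the real onnxruntime types.
struct SumAggSketch {};  // identity: no elementwise transform
struct SumSquareAggSketch {
  static float PreOp(float x) { return x * x; }
};
struct L2AggSketch {
  static float PreOp(float x) { return x * x; }
  static float PostOp(float x) { return std::sqrt(x); }
};

// Primary template: by default an aggregator has neither PreOp nor PostOp.
template <typename AGG>
struct AggTraitsSketch {
  static constexpr bool kHasPreOp = false;
  static constexpr bool kHasPostOp = false;
};
// Specializations opt individual aggregators into the elementwise path.
template <>
struct AggTraitsSketch<SumSquareAggSketch> {
  static constexpr bool kHasPreOp = true;
  static constexpr bool kHasPostOp = false;
};
template <>
struct AggTraitsSketch<L2AggSketch> {
  static constexpr bool kHasPreOp = true;
  static constexpr bool kHasPostOp = true;
};

// Copies the input for identity aggregators; otherwise applies PreOp and
// PostOp to every element. The branch is chosen at compile time.
template <typename AGG>
void ApplyNoopElementwiseSketch(const float* in, float* out, std::size_t n) {
  if (n == 0) return;
  assert(in != nullptr && out != nullptr);
  using Traits = AggTraitsSketch<AGG>;
  if constexpr (!Traits::kHasPreOp && !Traits::kHasPostOp) {
    std::memcpy(out, in, n * sizeof(float));
  } else {
    for (std::size_t i = 0; i < n; ++i) {
      float v = in[i];
      if constexpr (Traits::kHasPreOp) v = AGG::PreOp(v);
      if constexpr (Traits::kHasPostOp) v = AGG::PostOp(v);
      out[i] = v;
    }
  }
}
```

Because the traits are constexpr, the identity case compiles down to a plain memcpy with no per-element branching, which is one plausible reading of the "avoids redundant runtime checks" point above.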
Updated the generic reduction paths (CommonReduce1Loop and CommonReduce2Loops) to invoke ApplyNoopEmptyAxesElementwise<AGG>() when axes=[] and noop_with_empty_axes=1.
These paths are used by all reduction operators, ensuring consistent handling across all of them.
Fixed the conditional inside CommonFastReduceCopy: replaced the previous unconditional memcpy and return true with a safe return false, since empty-axes cases are now fully handled upstream by ApplyNoopEmptyAxesElementwise<AGG>().
This keeps the control flow explicit and prevents unintended fallback copies.
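The decision order described here can be sketched as a three-way choice made before any fast path runs. ReduceModeSketch and ChooseReduceModeSketch are hypothetical names; the sketch only models the branching, not the actual CommonReduce1Loop or CommonFastReduceCopy code.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical classification of what a Reduce op should do with its input.
enum class ReduceModeSketch {
  kNoopElementwise,  // axes empty, noop_with_empty_axes=1: no reduction
  kReduceAll,        // axes empty, noop_with_empty_axes=0: reduce every axis
  kReduceSelected    // axes non-empty: reduce only the listed axes
};

// Decides the mode up front, so a later fast-copy path never has to guess
// whether an empty axes list means "copy" or "transform elementwise".
inline ReduceModeSketch ChooseReduceModeSketch(
    const std::vector<std::int64_t>& axes, bool noop_with_empty_axes) {
  if (axes.empty()) {
    return noop_with_empty_axes ? ReduceModeSketch::kNoopElementwise
                                : ReduceModeSketch::kReduceAll;
  }
  return ReduceModeSketch::kReduceSelected;
}
```

With the empty-axes case settled first, a fast-copy routine that declines it (returns false) cannot silently copy data for operators such as ReduceL1 that still need an elementwise transform.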
Added unit tests for all affected reduction operators (ReduceSumSquare, ReduceL1, ReduceL2, ReduceLogSum, ReduceLogSumExp), and corrected one test that previously expected identity output but now correctly applies Pre/Post operations per the ONNX specification.
These updates ensure spec-compliant behavior for all Reduce ops and slightly improve performance for identity cases.
Motivation and Context
Before this fix, some Reduce operators (e.g., ReduceSumSquare, ReduceL1, ReduceL2, ReduceLogSum, ReduceLogSumExp) did not follow the ONNX spec when axes=[] and noop_with_empty_axes=1.
Per the ONNX specification:
No reduction should occur if axes is empty and noop_with_empty_axes=1.
Operators with Pre/Post (e.g., abs, square, sqrt, log) must apply them elementwise.
Others should return the input unchanged.
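As a worked illustration of these rules, each input element x_i can be treated as its own one-element reduction. Under that reading (an assumption made here for illustration, not a quote from the spec), the per-element outputs y_i would be:

```latex
\begin{aligned}
\text{ReduceL1:} \quad & y_i = |x_i| \\
\text{ReduceL2:} \quad & y_i = \sqrt{x_i^2} = |x_i| \\
\text{ReduceSumSquare:} \quad & y_i = x_i^2 \\
\text{ReduceLogSum:} \quad & y_i = \log x_i \\
\text{ReduceLogSumExp:} \quad & y_i = \log\left(e^{x_i}\right) = x_i \\
\text{ReduceSum, ReduceMax, \dots:} \quad & y_i = x_i
\end{aligned}
```

For example, with input [-3, 4], ReduceSumSquare would yield [9, 16] and ReduceL2 would yield [3, 4], while ReduceSum would return [-3, 4] unchanged.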
Fixes #26288
Fixes #25095
Note: This fix currently applies to the CPU Execution Provider only.
Other EPs will need a similar update to align with this behavior.