Generalize noop_with_empty_axes handling across all Reduce operators#26436

Merged
yuslepukhin merged 11 commits into microsoft:main from
naomiOvad:fix/reduction-noop-empty-axes
Jan 5, 2026

Conversation

@naomiOvad
Contributor

@naomiOvad naomiOvad commented Oct 29, 2025

Description

This PR fixes the behavior of the reduction operators so that it aligns with the ONNX specification.
See the ONNX ReduceSumSquare specification (https://onnx.ai/onnx/operators/onnx__ReduceSumSquare.html)
for the definition of noop_with_empty_axes and the expected behavior when axes=[] or when the axes input is not provided.

Main changes:

  • Added a new helper function ApplyNoopEmptyAxesElementwise() to handle the case axes=[] and noop_with_empty_axes=1 for all reduction operators (ReduceSumSquare, ReduceL1, ReduceL2, ReduceLogSum, ReduceLogSumExp, etc.).
    This function performs elementwise operations according to the aggregator type:
      • If the aggregator defines Pre/Post operations (e.g., abs, square, sqrt, log), they are applied elementwise on each element of the input without reduction.
      • Otherwise, a direct memory copy (memcpy) is performed to efficiently produce an identical output.

  • Introduced a compile-time trait system ReduceAggTraits to detect whether each aggregator defines a PreOp and/or PostOp.
    This allows compile-time specialization and avoids redundant runtime checks.

  • Updated the generic reduction paths (CommonReduce1Loop and CommonReduce2Loops)
    to invoke ApplyNoopEmptyAxesElementwise() when axes=[] and noop_with_empty_axes=1.
    These paths are used by all reduction operators, ensuring consistent handling across the entire operator family.

  • Fixed the conditional inside CommonFastReduceCopy: replaced the previous unconditional memcpy and return true with a safe return false, since empty-axes cases are now fully handled upstream by ApplyNoopEmptyAxesElementwise().
    This keeps the control flow explicit and prevents unintended fallback copies.

  • Added unit tests for all affected reduction operators (ReduceSumSquare, ReduceL1, ReduceL2, ReduceLogSum, ReduceLogSumExp), and corrected one test that previously expected identity output but now correctly applies Pre/Post operations per the ONNX specification.

These updates ensure spec-compliant behavior for all Reduce ops and slightly improve performance for identity cases.
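
The trait-plus-helper idea described above can be illustrated with a minimal, self-contained sketch. This is not the code from the PR: the names SumAgg, SumSquareAgg, L2Agg, NoopTraits, and NoopEmptyAxes are hypothetical, and the sketch assumes float data held in a std::vector. It only shows how a compile-time trait can pick between a plain memcpy and an elementwise pre/post transformation.

```cpp
#include <cmath>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical aggregator tags (illustrative names, not onnxruntime's classes).
struct SumAgg {};        // like ReduceSum: no elementwise pre/post step
struct SumSquareAgg {};  // like ReduceSumSquare: square each element first
struct L2Agg {};         // like ReduceL2: square first, sqrt afterwards

// Primary trait: with nothing to reduce, the aggregator is an identity.
template <typename AGG>
struct NoopTraits {
  static constexpr bool kIsIdentity = true;
  static float Apply(float x) { return x; }
};

template <>
struct NoopTraits<SumSquareAgg> {
  static constexpr bool kIsIdentity = false;
  static float Apply(float x) { return x * x; }  // pre-op only
};

template <>
struct NoopTraits<L2Agg> {
  static constexpr bool kIsIdentity = false;
  static float Apply(float x) { return std::sqrt(x * x); }  // pre-op, then post-op
};

// Produces the output for axes=[] and noop_with_empty_axes=1: same shape as
// the input, with the aggregator's elementwise steps applied and no reduction.
template <typename AGG>
std::vector<float> NoopEmptyAxes(const std::vector<float>& input) {
  std::vector<float> output(input.size());
  if (input.empty()) return output;
  if (NoopTraits<AGG>::kIsIdentity) {
    // Identity aggregators only need a raw copy.
    std::memcpy(output.data(), input.data(), input.size() * sizeof(float));
  } else {
    for (std::size_t i = 0; i < input.size(); ++i) {
      output[i] = NoopTraits<AGG>::Apply(input[i]);
    }
  }
  return output;
}
```

Because kIsIdentity is a compile-time constant, the compiler can drop the unused branch for each aggregator, which is the kind of specialization the description refers to.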

Motivation and Context

Before this fix, some Reduce operators (e.g., ReduceSumSquare, ReduceL1, ReduceL2, ReduceLogSum, ReduceLogSumExp) did not follow the ONNX spec when axes=[] and noop_with_empty_axes=1.

Per the ONNX specification:

  • No reduction should occur if axes is empty and noop_with_empty_axes=1.
  • Operators with Pre/Post (e.g., abs, square, sqrt, log) must apply them elementwise.
  • Others should return the input unchanged.
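
To make the rules above concrete, here is a small reference model for ReduceL1 on a rank-1 tensor. It is a sketch under stated assumptions, not onnxruntime code: the function name ReduceL1Rank1 is hypothetical, keepdims=1 is assumed, and negative axes are left out for brevity.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical reference model of ReduceL1 for a rank-1 tensor, keepdims=1.
std::vector<float> ReduceL1Rank1(const std::vector<float>& x,
                                 const std::vector<std::int64_t>& axes,
                                 bool noop_with_empty_axes) {
  // A rank-1 input only has axis 0 (negative axes are omitted in this sketch).
  assert(axes.empty() || (axes.size() == 1 && axes[0] == 0));

  if (axes.empty() && noop_with_empty_axes) {
    // No reduction: apply the operator's elementwise step (abs) to each value.
    std::vector<float> y;
    y.reserve(x.size());
    for (float v : x) y.push_back(std::fabs(v));
    return y;
  }

  // Otherwise reduce: empty axes with noop_with_empty_axes=0 means "all axes",
  // which for a rank-1 tensor is the same as axes={0}.
  float sum = 0.0f;
  for (float v : x) sum += std::fabs(v);
  return std::vector<float>{sum};
}
```

For the input {-1, 2, -3}, the no-op case yields {1, 2, 3} rather than a copy of the input, while both reducing cases yield {6}.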

Fixes #26288
Fixes #25095

Note: This fix currently applies to the CPU Execution Provider only.
Other EPs will need a similar update to align with this behavior.

@naomiOvad naomiOvad closed this Oct 29, 2025
@naomiOvad naomiOvad reopened this Oct 29, 2025
@naomiOvad naomiOvad marked this pull request as ready for review October 30, 2025 07:03
@naomiOvad
Contributor Author

@microsoft-github-policy-service agree

…h for all reduction operators)

Fixes microsoft#26288

- Handle all Reduce* operators consistently when axes are empty and noop_with_empty_axes=1
- Apply both Pre-op and Post-op elementwise without performing reduction
- Added comprehensive unit tests for ReduceSumSquare, ReduceL1, ReduceL2, ReduceLogSum, and ReduceLogSumExp covering empty-axes scenarios
Fix CI failures on non-CPU EPs by explicitly configuring DefaultCpuExecutionProvider() and skipping when CPU EP is unavailable. Other EPs do not yet implement the spec-aligned behavior for noop_with_empty_axes, so limiting these tests to CPU keeps CI runs stable. Refs: microsoft#26288, PR microsoft#26436.
@naomiOvad
Contributor Author

@fdwr, I’ve pushed fixes for the failed checks but don’t have permissions to re-run them.
Could you please re-run the CI checks? Thanks!

@fdwr
Contributor

fdwr commented Nov 13, 2025

@fdwr, I’ve pushed fixes for the failed checks but don’t have permissions to re-run them. Could you please re-run the CI checks? Thanks!

@naomiOvad: Restarted CI's 🫡.

@naomiOvad
Contributor Author

naomiOvad commented Dec 16, 2025

cc @xadupre, @yuslepukhin, @skottmckay Could you please review my PR?
Thanks!

@yuslepukhin
Member

/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline, Windows GPU CUDA CI Pipeline, Windows GPU DML CI Pipeline, Windows GPU Doc Gen CI Pipeline, Windows GPU TensorRT CI Pipeline, Windows OpenVINO CI Pipeline, Windows x64 QNN CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 4 pipeline(s).

Contributor

Copilot AI left a comment


Pull request overview

This PR generalizes noop_with_empty_axes handling across all CPU reduction operators to align with the ONNX specification. When noop_with_empty_axes=1 and axes=[], reduction operators should apply their element-wise transformations (such as square, abs, log) without performing reduction, or return the input unchanged for identity operations.

Key changes:

  • Introduced a compile-time trait system (ReduceAggTraits) to distinguish aggregators with element-wise transformations (PreOp/PostOp) from identity operations
  • Added ApplyNoopEmptyAxesElementwise<AGG>() helper function that applies transformations element-wise or uses memcpy for identity cases
  • Updated CommonReduce1Loop and CommonReduce2Loops to invoke the new helper when axes=[] and noop_with_empty_axes=1
  • Fixed CommonFastReduceCopy to return false instead of performing unsafe memcpy for empty axes cases
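
The control flow implied by these changes can be sketched as follows. This is a toy model, not the onnxruntime implementation: Path, TryFastPath, and ChoosePath are hypothetical names, and the fast-path criterion is invented for illustration. The point is only the ordering: the no-op case is decided first, and the fast-path probe declines empty axes instead of copying.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class Path { kNoopElementwise, kFastReduce, kGeneralLoop };

// Hypothetical fast-path probe. It returns true only when it would handle the
// reduction itself. With empty axes it declines rather than doing a memcpy.
bool TryFastPath(const std::vector<std::int64_t>& axes, std::size_t rank) {
  if (axes.empty()) return false;
  // Toy criterion (an assumption): the fast path covers "reduce every axis".
  return axes.size() == rank;
}

// Hypothetical dispatcher mirroring the order of checks described above.
Path ChoosePath(const std::vector<std::int64_t>& axes, std::size_t rank,
                bool noop_with_empty_axes) {
  assert(axes.size() <= rank);
  if (axes.empty() && noop_with_empty_axes) return Path::kNoopElementwise;
  if (TryFastPath(axes, rank)) return Path::kFastReduce;
  return Path::kGeneralLoop;
}
```

Having the probe return false for empty axes keeps a single place responsible for the no-op behavior, so an identity copy cannot be produced by accident for operators that need a pre or post step.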

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| onnxruntime/test/providers/cpu/reduction/reduction_ops_test.cc | Corrected existing test expectations and added comprehensive test coverage for ReduceSumSquare, ReduceL1, ReduceL2, ReduceLogSum, and ReduceLogSumExp with noop_with_empty_axes behavior |
| onnxruntime/core/providers/cpu/reduction/reduction_ops.h | Introduced ReduceAggTraits template system with specializations for L1, L2, SumSquare, LogSum, and LogSumExp aggregators |
| onnxruntime/core/providers/cpu/reduction/reduction_ops.cc | Implemented ApplyNoopEmptyAxesElementwise helper, updated common reduction paths, and fixed CommonFastReduceCopy to delegate empty axes handling upstream |

@yuslepukhin
Member

/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline, Windows GPU CUDA CI Pipeline, Windows GPU DML CI Pipeline, Windows GPU Doc Gen CI Pipeline, Windows GPU TensorRT CI Pipeline, Windows OpenVINO CI Pipeline, Windows x64 QNN CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 4 pipeline(s).

@fdwr
Contributor

fdwr commented Dec 18, 2025

/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 4 pipeline(s).

yuslepukhin
yuslepukhin previously approved these changes Dec 18, 2025
Member

@yuslepukhin yuslepukhin left a comment


:shipit:

@naomiOvad
Contributor Author

I see that the Linux x64 training CI is consistently failing.
I'm currently investigating whether this is related to my change and trying to reproduce it locally.

@yuslepukhin
Member

1: [ RUN ] GradientCheckerTest.AddGrad
1: 2025-12-19 01:55:18.034439132 [E:onnxruntime:Add, sequential_executor.cc:572 ExecuteKernel] Non-zero status code returned while running Reshape node. Name:'node1_Grad/Reshape_4' Status Message: /onnxruntime_src/onnxruntime/core/providers/cpu/tensor/reshape_helper.h:45 onnxruntime::ReshapeHelper::ReshapeHelper(const onnxruntime::TensorShape&, onnxruntime::TensorShapeVector&, bool) input_shape_size == size was false. The input tensor cannot be reshaped to the requested shape. Input shape:{2,3,2,3}, requested shape:{}

@naomiOvad
Contributor Author

I fixed the handling of axes when they are provided as an input tensor. Previously this case was not handled, so the axes were incorrectly treated as empty, which triggered the noop_with_empty_axes path and led to the invalid reshape in GradientCheckerTest.AddGrad.
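
The failure mode described here can be illustrated with a short sketch. It is not the actual fix: ReduceAxesSource, ResolveAxes, and TakesNoopPath are hypothetical names, and the sketch assumes axes can arrive either as an attribute or as an optional input tensor. The no-op decision has to look at the resolved axes, not only at the attribute.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical view of where a Reduce node's axes can come from.
struct ReduceAxesSource {
  std::vector<std::int64_t> axes_attribute;               // axes given as an attribute
  const std::vector<std::int64_t>* axes_input = nullptr;  // optional input tensor; nullptr when absent
};

// Prefer the input tensor when it is present, otherwise use the attribute.
std::vector<std::int64_t> ResolveAxes(const ReduceAxesSource& src) {
  if (src.axes_input != nullptr) return *src.axes_input;
  return src.axes_attribute;
}

// The no-op path applies only when the resolved axes are really empty.
// Checking axes_attribute alone would misclassify nodes whose axes arrive
// through the input tensor, which is the kind of bug described above.
bool TakesNoopPath(const ReduceAxesSource& src, bool noop_with_empty_axes) {
  return noop_with_empty_axes && ResolveAxes(src).empty();
}
```

With axes {0, 1} supplied through the input tensor and an empty attribute, TakesNoopPath returns false, so the node is reduced normally instead of being passed through with its original shape.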

Member

@yuslepukhin yuslepukhin left a comment


:shipit:

Contributor

@fdwr fdwr left a comment


👍

@fdwr
Contributor

fdwr commented Dec 25, 2025

/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 4 pipeline(s).

@naomiOvad
Contributor Author

Thanks for the reviews!
Let me know if there’s anything else I should do.

@naomiOvad
Contributor Author

Hi, Just checking in — I see that all checks have passed.

@yuslepukhin yuslepukhin merged commit 838b17a into microsoft:main Jan 5, 2026
93 of 94 checks passed
alex-spacemit pushed a commit to spacemit-com/onnxruntime that referenced this pull request Jan 20, 2026
…icrosoft#26436)


Development

Successfully merging this pull request may close these issues.

  • Spec changed for reduction ops when noop_with_empty_axes is True
  • All the reduce ops cause an error with None axis and noop_with_empty_axes=1

3 participants