Skip to content

Fix integer overflow in TensorPrimitives IndexOf methods#124274

Open
pine919 wants to merge 7 commits intodotnet:mainfrom
pine919:fix-tensorprimitives-indexofmax
Open

Fix integer overflow in TensorPrimitives IndexOf methods#124274
pine919 wants to merge 7 commits intodotnet:mainfrom
pine919:fix-tensorprimitives-indexofmax

Conversation

@pine919
Copy link

@pine919 pine919 commented Feb 11, 2026

Fixes #124233

Description

IndexOfMax, IndexOfMin, IndexOfMaxMagnitude, and IndexOfMinMagnitude were using Vector<T> for index tracking, where T is the element type. This caused integer overflow when the array length exceeded the maximum value that T could represent.

For example:

  • A byte array with 258 elements would incorrectly return index 1 instead of 257
  • A short array with 32770 elements would incorrectly return index 1 instead of 32769

Changes

Changed index tracking to use Vector<int> throughout the implementation, preventing overflow regardless of array size.

  • Updated IIndexOfOperator<T> interface signatures to use Vector128/256/512<int> for index parameters
  • Modified all four IndexOf operators to use Vector<int> for index tracking
  • Simplified IndexLessThan helper methods to return Vector<int> directly
  • Removed redundant type conversion helper functions
  • Added regression test covering byte, sbyte, short, and ushort types

Testing

Added IndexOfMax_NoIntegerOverflow test that verifies correct behavior for arrays exceeding the element type's maximum value:

  • byte[258] → expects index 257
  • sbyte[258] → expects index 257
  • short[32770] → expects index 32769
  • ushort[65538] → expects index 65537

cc: @dotnet/area-system-numerics-tensors

@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Feb 11, 2026
@pine919
Copy link
Author

pine919 commented Feb 11, 2026

@dotnet-policy-service agree

@pine919
Copy link
Author

pine919 commented Feb 11, 2026

@dhartglassMSFT Please review my PR

Comment on lines 15 to 17
static abstract void Invoke(ref Vector128<T> result, Vector128<T> current, ref Vector128<int> resultIndex, Vector128<int> currentIndex);
static abstract void Invoke(ref Vector256<T> result, Vector256<T> current, ref Vector256<int> resultIndex, Vector256<int> currentIndex);
static abstract void Invoke(ref Vector512<T> result, Vector512<T> current, ref Vector512<int> resultIndex, Vector512<int> currentIndex);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a lot of extra churn here that makes it harder to see what is fix vs what is "cleanup" and which will make it harder to backport. Ideally you separate out such changes, particularly for bug fixes, so that we have the minimum churn possible.

Additionally, this particular approach is going to be more expensive and halve or quarter the throughput for the small integer types. It would likely be better to have an additional ref int baseIndex parameter that can be used to adjust resultIndex in the final step based on the T.MaxValue wraparound. This way the core of the logic doesn't change at all and instead we only have a correcting step at the end to account for the mod wraparound

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your time

This commit fixes integer overflow issues in IndexOf methods for byte and short types.

Changes:
- Reverted broken Vector<int> approach that couldn't track all elements
- Added unsigned comparison for short types to handle wrapped indices correctly
- Added overflow detection with scalar fallback for large arrays (>256 for byte, >65536 for short)
- Applied fixes to all IndexOf variants (Max, Min, MaxMagnitude, MinMagnitude)

The fix ensures correctness for arrays where indices exceed the storage capacity
of byte (>255) or short (>32767/65535) types while maintaining performance for
smaller arrays.
@pine919 pine919 force-pushed the fix-tensorprimitives-indexofmax branch from 5ff8f2f to e16748e Compare February 11, 2026 16:07
@stephentoub
Copy link
Member

🤖 Copilot Code Review — PR #124274

Holistic Assessment

Motivation: The bug is real and well-documented in issue #124233. The vectorized IndexOf* methods store indices in Vector<T> where T is the element type, which overflows when array length exceeds byte.MaxValue (256) or short.MaxValue (32768) / ushort.MaxValue (65536).

Approach: The PR takes a scalar-fallback approach rather than fixing the vectorized path to use wider index types. While this correctly fixes the bug, it abandons vectorization for larger arrays of small element types — a potentially significant performance regression for common use cases.

Summary: ⚠️ Needs Human Review. The fix is correct for the reported bug, but the approach may have unintended performance consequences that warrant maintainer input. There are also several issues with the implementation details.


Detailed Findings

⚠️ Performance Regression — Scalar fallback for large arrays

The fix forces scalar processing for arrays larger than 256 elements (byte/sbyte) or 65536 elements (short/ushort). This is a significant performance cliff:

  • byte[] arrays > 256 elements will use scalar path — this is a very small threshold
  • short[] arrays > 65536 elements will use scalar path

For context, a 65KB byte[] is common in many workloads. The performance difference between vectorized and scalar paths can be 4-16x depending on hardware. Consider instead widening the index vector to Vector<int> or Vector<long> to maintain vectorization while supporting larger indices.

Location: TensorPrimitives.IndexOfMax.cs lines 152-157, 243-248, 334-339

⚠️ Incomplete Fix — Byte case not fully addressed

The PR adds unsigned comparison for short types (Vector*.LessThan(indices1.AsUInt16(), indices2.AsUInt16())), but for byte it continues using signed AsByte(). While byte is unsigned by nature in C#, the IndexOfFinalAggregate method's return at line 98 still does:

return resultIndex.As<T, byte>().ToScalar();

This only returns a byte (0-255), but the indices in the vector could have wrapped around. For consistency with the short fix, this should cast to int:

return (int)resultIndex.As<T, byte>().ToScalar();

Although byte is unsigned so this won't produce negative results, it's inconsistent with the short path fix.

Location: TensorPrimitives.IIndexOfOperator.cs line 98

⚠️ Off-by-one threshold — 256 should be 255

The check uses > 256 for byte types, but a byte can represent values 0-255 (256 distinct values). An array of length 256 would have indices 0-255, which fit in a byte. The threshold should be > 255 (or equivalently >= 256).

Similarly, for unsigned short the threshold > 65536 is correct (indices 0-65535 fit), but the comment says "byte and short types" — signed short can only represent indices 0-32767 before going negative. The code treats short and ushort identically with threshold 65536, which is only correct for ushort.

For signed short, index 32768 would be stored as -32768, and unsigned comparison would treat it incorrectly. The threshold for short should be > 32767.

Location: TensorPrimitives.IndexOfMax.cs lines 152-157

⚠️ Test structure — Should use [Theory] pattern

The test uses a single [ConditionalFact] with runtime type checks instead of the preferred [Theory] with [InlineData] or [MemberData]. The existing test patterns in this file use [Fact] per type-parameterized test class, so the runtime if (typeof(T) == ...) checks effectively make this test run only for specific type instantiations while silently passing for others.

Additionally, the test only covers IndexOfMax but the bug affects IndexOfMin, IndexOfMaxMagnitude, and IndexOfMinMagnitude as well.

Location: TensorPrimitivesTests.cs lines 1137-1181

💡 Suggestion — Comment accuracy

The added comment says "For short/ushort, use unsigned comparison to handle indices up to 65535" but this is in the sizeof(T) == 2 block which also handles Half (floating-point). The comment should clarify this is for the index comparison, not the value comparison.

Location: TensorPrimitives.IIndexOfOperator.cs line 56


Summary

The PR correctly identifies and addresses a real bug, but the chosen approach (scalar fallback) has meaningful performance implications that should be discussed. A fix that maintains vectorization by widening the index storage type would be preferable if feasible. The implementation also has threshold accuracy issues (256 vs 255, signed vs unsigned short handling).

I recommend a maintainer weigh in on whether:

  1. The scalar fallback performance trade-off is acceptable, or
  2. A more complex fix that preserves vectorization is warranted

If the scalar fallback approach is accepted, the threshold values need correction.

- Separate signed short (>32768) and unsigned short (>65536) thresholds
- Add explicit int cast for byte case for consistency
- Improve comments to clarify index comparison vs value comparison
- Maintain correct behavior while addressing reviewer concerns
Change overflow protection from > to >= to correctly handle edge cases.
For byte types, max index is 255, so arrays with 256+ elements need
scalar path. Similarly for short (32768) and ushort (65536).

Fixes IndexOfMax_NoIntegerOverflow test failures on multiple platforms.
@dotnet-policy-service
Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-numerics-tensors
See info in area-owners.md if you want to be subscribed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area-System.Numerics.Tensors community-contribution Indicates that the PR has been added by a community member

Projects

None yet

Development

Successfully merging this pull request may close these issues.

TensorPrimitives.IndexOfMax produces incorrect results with vectorized paths

4 participants