@jmarble jmarble commented Nov 18, 2025

Summary

Fixes #20262 by implementing transitive comparison for SORT_REGULAR. This ensures deterministic sort results when arrays contain mixed types, which previously could produce non-deterministic output due to non-transitive comparisons.
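To illustrate how a non-transitive comparison can arise, here is a minimal C sketch of a simplified "smart" string comparison: two numeric strings compare by value, and any other pair compares bytewise. The `toy_` names are hypothetical and this is only a model of the behaviour, not the engine's code (for instance, `strtod()` accepts some inputs that PHP's numeric-string rules do not).

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: true if the whole string parses as a number. */
static bool toy_is_numeric(const char *s, double *out)
{
    char *end;
    double v = strtod(s, &end);

    if (end == s || *end != '\0') {
        return false; /* empty, or trailing non-numeric bytes */
    }
    *out = v;
    return true;
}

/* Simplified model of a loose string comparison:
 * numeric vs numeric compares by value, anything else bytewise. */
static int toy_loose_strcmp(const char *a, const char *b)
{
    double x, y;

    if (toy_is_numeric(a, &x) && toy_is_numeric(b, &y)) {
        return (x > y) - (x < y);
    }
    int r = strcmp(a, b);
    return (r > 0) - (r < 0);
}
```

Under this model, "10" < "1a" and "1a" < "9" (both bytewise), yet "9" < "10" (numerically), so the three values form a cycle and no sort order can satisfy all three comparisons at once.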

Highlights

  • Add transitive comparison functions (php_array_compare_transitive(), php_array_smart_strcmp_transitive(), etc.) with deterministic ordering:
    • Numeric types and numeric strings compare numerically
    • Non-numeric strings sort after numeric types and numeric strings
    • NaN sorts after all other numeric values (IEEE 754 totalOrder)
    • Arrays recurse through transitive comparison
    • Objects (same class) recurse through transitive property comparison
    • Enums sort by object handle (stable grouping for array_unique)
  • Wire php_array_key_compare_unstable_i and php_array_data_compare_unstable_i to use the transitive comparator. All functions using SORT_REGULAR now use the transitive path.
  • Add regression tests for array_unique() (scalars, objects, nested arrays), sort()/ksort() numeric-string edge cases, and enum ordering stability.
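The scalar ordering rules listed above can be sketched as a rank-then-compare function. This is a minimal illustration on a hypothetical `toy_val` type covering only integers, doubles and strings; it is not the PR's implementation, which also handles arrays, objects, enums and the other PHP types. Every name below is an assumption made for the example.

```c
#include <math.h>
#include <stdlib.h>
#include <string.h>

typedef enum { TOY_INT, TOY_DOUBLE, TOY_STRING } toy_type;

typedef struct {
    toy_type type;
    long i;          /* used when type == TOY_INT */
    double d;        /* used when type == TOY_DOUBLE */
    const char *s;   /* used when type == TOY_STRING */
} toy_val;

/* Rank 0: ordinary numbers and numeric strings; rank 1: NaN;
 * rank 2: non-numeric strings. Writes the numeric value to *num. */
static int toy_classify(const toy_val *v, double *num)
{
    if (v->type == TOY_INT) {
        *num = (double)v->i;
    } else if (v->type == TOY_DOUBLE) {
        *num = v->d;
    } else {
        char *end;
        double parsed = strtod(v->s, &end);
        if (end == v->s || *end != '\0') {
            return 2; /* not fully numeric */
        }
        *num = parsed;
    }
    return isnan(*num) ? 1 : 0;
}

/* Total preorder: compare ranks first, then within the rank. */
static int toy_compare_transitive(const toy_val *a, const toy_val *b)
{
    double x = 0.0, y = 0.0;
    int ra = toy_classify(a, &x);
    int rb = toy_classify(b, &y);

    if (ra != rb) {
        return (ra > rb) - (ra < rb);
    }
    if (ra == 0) {
        return (x > y) - (x < y);   /* numeric comparison */
    }
    if (ra == 1) {
        return 0;                   /* all NaNs tie in this sketch */
    }
    int r = strcmp(a->s, b->s);     /* both are non-numeric strings */
    return (r > 0) - (r < 0);
}

/* Adapter so the comparator can be passed to qsort(). */
static int toy_qsort_cmp(const void *p, const void *q)
{
    return toy_compare_transitive(p, q);
}
```

Because each value is first mapped to a fixed rank and values are only compared by one rule within a rank, the result cannot depend on which pairs the sort algorithm happens to compare.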

Performance Impact

Status: Unoptimized Implementation

This initial implementation prioritizes correctness and transitivity to fix the underlying stability issues. It is not yet optimized.

Overall, across 142 benchmarked operations, the changes result in an average ~1.5% performance improvement, with 92 operations faster and 50 slower. No regressions in standard comparison operations or sorts with other flags (e.g., SORT_NUMERIC).

However, there are known regressions in specific sort operations due to the overhead of the new dispatch logic:

  • Standard Key Sorts (ksort/krsort) on non-mixed keys: ~3–11% slower.

Even in this unoptimized state, the new architecture yields significant wins in common scenarios:

  • Integer Sorts: ~5–20% faster (higher gains in reverse sorts like rsort/arsort).
  • Associative Arrays (Mixed Alphanumeric Keys): >30% faster (up to ~50% in key sorts).

Roadmap:
Once the transitive comparison logic is approved, I will submit a follow-up PR with finely tuned optimizations. These changes are expected to eliminate the current regressions and dramatically improve performance across all remaining operations.

Member

@Girgias Girgias left a comment

Various comments and questions. This also needs a rebase, as I refactored the sorting code to remove a bunch of duplication.

Author

jmarble commented Nov 18, 2025

@Girgias thank you for taking the time to provide the careful review! Looks like I was able to capture your sorting code refactor when I created this new branch. I'll push a fresh commit that addresses your code comments. Thanks again for the help!

Member

@Girgias Girgias left a comment

Please only do the fix for the transitivity.

Optimizations can be decided later, but currently it just pollutes the PR and makes it harder to review and merge.

@jmarble jmarble marked this pull request as draft November 21, 2025 16:15
@jmarble jmarble force-pushed the fix-php-array-sort-regular branch from 374a660 to 2ff1700 Compare November 21, 2025 22:03
Author

jmarble commented Nov 21, 2025

@Girgias yes, I clearly got a bit carried away haha. I decided to reimplement and force push a clean commit. Sorry for the mess I made of this PR.

I have a bag full of optimizations we can save for a follow-up PR. One worth calling out would be to split zendi_smart_strcmp() so the transitive comparator doesn’t need to re-run the non-transitive fast paths. I also found an opportunity to add a single-bucket fast path in zend_compare_symbol_tables(), which showed close to a 1.25x speedup on array comparison.

@jmarble jmarble marked this pull request as ready for review November 21, 2025 22:48
Comment on lines 439 to 434
/* Mirrors zend_std_compare_objects(), but recurses via php_array_compare_transitive()
* so nested properties obey SORT_REGULAR's transitive ordering. */
static int php_array_compare_transitive_objects(zval *o1, zval *o2) /* {{{ */
Member

What I think might make more sense is to create a zend_std_compare_objects_ex() function that takes a function pointer for the prop table comparison if this is identical.

Hopefully the compiler will inline the behaviour properly in zend_std_compare_objects(), so it should be equivalent. For quite a while I was trying to understand what the point of this function is.
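A rough sketch of the function-pointer shape being suggested, using a hypothetical toy object type rather than the real zend_object structures: one `_ex` routine walks the properties, and the caller supplies the per-property comparator. All `toy_` names, the integer-only properties, and the sentinel-based second comparator are assumptions made purely for illustration.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical per-property comparator signature. */
typedef int (*toy_prop_compare_fn)(int lhs, int rhs);

typedef struct {
    int class_id;       /* stands in for the class entry */
    size_t nprops;
    const int *props;   /* toy property table: plain ints */
} toy_object;

/* Shared walk over the property tables; only the comparator varies. */
static int toy_compare_objects_ex(const toy_object *o1, const toy_object *o2,
                                  toy_prop_compare_fn cmp)
{
    if (o1->class_id != o2->class_id || o1->nprops != o2->nprops) {
        return 1; /* treated as uncomparable in this sketch */
    }
    for (size_t i = 0; i < o1->nprops; i++) {
        int r = cmp(o1->props[i], o2->props[i]);
        if (r != 0) {
            return r;
        }
    }
    return 0;
}

/* Default comparator: plain integer ordering. */
static int toy_default_cmp(int a, int b)
{
    return (a > b) - (a < b);
}

/* Stand-in for a "transitive" comparator: INT_MIN plays the role of a
 * special value that must sort after everything else. */
static int toy_sentinel_last_cmp(int a, int b)
{
    if (a == INT_MIN || b == INT_MIN) {
        return (a == INT_MIN) - (b == INT_MIN);
    }
    return toy_default_cmp(a, b);
}

/* Thin wrappers: same walk, different continuation. */
static int toy_compare_objects(const toy_object *o1, const toy_object *o2)
{
    return toy_compare_objects_ex(o1, o2, toy_default_cmp);
}

static int toy_compare_objects_transitive(const toy_object *o1,
                                          const toy_object *o2)
{
    return toy_compare_objects_ex(o1, o2, toy_sentinel_last_cmp);
}
```

The point of the design is that the class check and the property loop exist once; whether a compiler inlines the callback in the wrappers depends on the compiler and optimization level, so any performance claim would need benchmarking.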

Author

@jmarble jmarble Nov 23, 2025

I just gave it a try and benchmarked it. I saw a small, almost negligible regression: the Time-Weighted ΔMedian% rose by ~0.9 percentage points (from -1.20% to -0.31%).

Author

I tried a different idea. I created a zend_object_compare_kind enum in zend_object_handlers.h and added zend_std_compare_objects_ex(), so the standard object comparator can flip between zend_compare() and a transitive variant (zend_compare_transitive()) without going through a function-pointer callback.

To make that transitive mode reusable everywhere, I moved the SORT_REGULAR compare logic into Zend itself (zend_compare_transitive(), plus zend_compare_symbol_tables_transitive() and the enum-aware helpers).

This design showed a negligible difference (within measurement noise) in my benchmarks compared to the current implementation.

I'm happy to push another commit with this change if you'd like to see.

Member

I would prefer that you use the function pointer approach, because duplicating a lot of code when only the continuation changes is not something I'm a fan of.

@jmarble jmarble marked this pull request as draft November 23, 2025 21:51
@jmarble jmarble force-pushed the fix-php-array-sort-regular branch from 2ff1700 to 9026917 Compare November 24, 2025 06:11
@jmarble jmarble marked this pull request as ready for review November 25, 2025 01:52
@jmarble jmarble marked this pull request as draft November 25, 2025 23:00
Add transitive comparison functions with deterministic ordering:
  - Numeric types and numeric strings compare numerically
  - Non-numeric strings sort after numeric types and numeric strings
  - NaN sorts after all other numeric values
  - Arrays recurse through transitive comparison
  - Objects (same class) recurse through transitive property comparison
  - Enums sort by object handle (stable grouping for array_unique)

Fixes phpGH-20262
@jmarble jmarble force-pushed the fix-php-array-sort-regular branch from 8b258f6 to fcede41 Compare November 26, 2025 06:47
@jmarble jmarble marked this pull request as ready for review November 26, 2025 08:05
@jmarble jmarble requested a review from dstogov as a code owner November 26, 2025 08:05
Development

Successfully merging this pull request may close these issues.

array_unique() with SORT_REGULAR returns duplicate values