[libc] Improve qsort #120450
Conversation
This change serves both immediate purposes and lays the groundwork for further changes. Previously, the qsort implementation used a comparator construct that dispatched the comparison function through a tagged union, which incurs an additional branch on each comparison to check the tag `comp_type`. There are also structural changes to the quick_sort implementation to lay the groundwork for future changes to the partition and equal-element handling logic.

The comparator also performed an equal-index check for every comparison. While this may sound like a nice optimization for expensive-to-compare types, a well-constructed implementation will never compare an element to itself. In effect, this change reduces the required comparisons for integer sorting from 4 to 2 per comparison operation in the implementation.

Performance impact: the tested types i32, u64, string and f128 improve by ~1.5x, and 1k by ~2.4x, for the random pattern. Equal-element patterns see an algorithmic degradation, but this will be addressed by a future change. Nice to have: the `[[clang::no_sanitize("function")]]` attribute is no longer needed.
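To make the dispatch difference concrete, here is a minimal hypothetical sketch (not the patch's code) of a sort routine templated on its comparator: the `is_less` call compiles down to a direct or single indirect call, with no per-comparison tag check like the old `comp_type` branch.

```cpp
#include <cstddef>

// Sorting routine templated on the comparator type F. Because the
// comparator is a template parameter, the compiler sees the concrete
// callable and needs no tagged-union dispatch per comparison.
template <typename F>
void insertion_sort(int *a, size_t n, const F &is_less) {
  for (size_t i = 1; i < n; ++i) {
    for (size_t j = i; j > 0 && is_less(a[j], a[j - 1]); --j) {
      int tmp = a[j];
      a[j] = a[j - 1];
      a[j - 1] = tmp;
    }
  }
}
```

`qsort` and `qsort_r` can then each wrap their C comparison function in a small lambda, as the patch does in `qsort.cpp` and `qsort_r.cpp`.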
This commit ports the glidesort recursive median pivot selection. Without a dedicated small-sort, this pivot selection is worse for cheap-to-compare types, but it is a precursor to optimally leveraging pdqsort-style equal-element filtering.
This commit ports the technique introduced by Orson Peters in pdqsort to efficiently handle inputs with repeated elements, for example as found in zipfian or low-cardinality distributions. The random_z1 pattern sees a ~3.5x performance improvement for input length 10k across the different types. This change implies O(N x log(K)) run-time for inputs with K distinct elements.
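As a simplified illustration of equal-element handling (a plain three-way partition, not pdqsort's exact filtering scheme): elements equal to the pivot are grouped in the middle and never recursed into, which is what caps the work at O(N x log(K)) for K distinct values.

```cpp
#include <cstddef>
#include <utility>

// Three-way partition: on return, a[0..lt_end) < pivot,
// a[lt_end..eq_end) == pivot, a[eq_end..n) > pivot. A quicksort only
// recurses into the "<" and ">" groups, skipping runs of equal keys.
void partition3(int *a, size_t n, int pivot, size_t &lt_end, size_t &eq_end) {
  size_t lo = 0, mid = 0, hi = n;
  while (mid < hi) {
    if (a[mid] < pivot)
      std::swap(a[lo++], a[mid++]);
    else if (a[mid] > pivot)
      std::swap(a[mid], a[--hi]);
    else
      ++mid;
  }
  lt_end = lo;
  eq_end = mid;
}
```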
The combination of a high-quality pivot selection function with the equal-element filtering added in the previous commit makes for a quicksort that in practice hardly ever hits quadratic run-time scaling, but malicious inputs and accidents are possible. By limiting the recursion depth and switching to heapsort (introsort, the second "i" in idisort) we can guarantee a worst-case time to sort the data of O(N x log(N)).
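The depth-limit mechanism can be sketched as follows (a hedged toy version for `int`, not the PR's code): quicksort carries a recursion budget of roughly 2 x log2(N), and once the budget is exhausted the remaining slice is finished with heapsort, bounding the worst case at O(N x log(N)).

```cpp
#include <algorithm>

// Quicksort with a recursion budget; once the budget hits zero the
// slice is finished with heapsort, as in introsort.
void intro_sort_rec(int *first, int *last, int depth_limit) {
  while (last - first > 1) {
    if (depth_limit == 0) {
      std::make_heap(first, last);
      std::sort_heap(first, last);
      return;
    }
    --depth_limit;
    int pivot = first[(last - first) / 2];
    // Three-way split: [< pivot) [== pivot) [> pivot). The middle part
    // is already in place, so each iteration always makes progress.
    int *mid1 = std::partition(first, last,
                               [pivot](int x) { return x < pivot; });
    int *mid2 = std::partition(mid1, last,
                               [pivot](int x) { return !(pivot < x); });
    intro_sort_rec(first, mid1, depth_limit);
    first = mid2; // iterate on the "> pivot" part instead of recursing
  }
}

void intro_sort(int *a, int n) {
  int depth = 2;
  for (int m = n; m > 1; m /= 2) // budget ~ 2 * log2(n)
    depth += 2;
  intro_sort_rec(a, a + n, depth);
}
```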
By itself this optimization only nets a ~1.2x improvement for types like i32, u64 and f128, but it unlocks ~2x improvements for partitioning. The cost is a modest increase in the binary-size of libc: `unstable_sort` is only instantiated twice (once for `qsort`, once for `qsort_r`), but the size specializations raise the total from 2 to 8 sort instantiations in the library. Since the fixed-size instantiations have much cheaper swap routines, they only add ~1.5k each vs the ~4.5k generic instantiation. In effect, binary-size for qsort and qsort_r goes from ~9k to ~18k.
Uses a simplified version of the branchless partition scheme used in ipnsort. This achieves a ~2.5x perf improvement for i32, u64 and f128 for random patterns. This is mainly achieved by avoiding branch misprediction penalties.
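Specialized to `int` for illustration, the patch's branchless Lomuto loop (`partition_lomuto_branchless`) works like this: every element is unconditionally swapped toward `left`, and `left` advances by the boolean result of the comparison, so the loop body contains no data-dependent branch for the predictor to miss.

```cpp
#include <cstddef>
#include <utility>

// Branchless Lomuto partition, int-specialized sketch. Returns the
// number of elements < pivot; on return they occupy a[0..left).
size_t partition_lomuto_branchless(int *a, size_t len, int pivot) {
  size_t left = 0;
  size_t right = 0;
  while (right < len) {
    const bool right_is_lt = a[right] < pivot;
    std::swap(a[left], a[right]); // unconditional swap, no branch
    left += static_cast<size_t>(right_is_lt);
    right += 1;
  }
  return left;
}
```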
The new impl is better for types < 100 bytes and worse for very large types like 1k, but still not bad. In addition, it has a smaller binary-size footprint and avoids both a surprising performance cliff past `STACK_ARRAY_SIZE` and the associated heuristics. Also remove the inline "hints": the situation is simple enough that the compiler should know best, and such hints are the wrong way to influence it anyway.
The pivot selection will pick 0 as the pivot index if the array length is < 8, which is a common recursion leaf case. This isn't a big perf win, more on the order of 1-2%, but it's trivial and more or less free.
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified. If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository, in which case you can instead tag reviewers by name in a comment. If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment "Ping". The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers. If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.
@llvm/pr-subscribers-libc

Author: Lukas Bergdoll (Voultapher)

Benchmark setup
Benchmark results

The used methodology is explained in detail here. The current LLVM libc qsort implementation is referenced as

<img src="https://github.com/user-attachments/assets/13592b7c-edd1-4e4a-998a-6e1da9593d69" width=700 />

This graph compares the performance of the two implementations against each other: values above zero imply speedups for the new implementation, and values below zero imply slowdowns. Each dot represents one input length for a specific pattern combination.

<img src="https://github.com/user-attachments/assets/2d64bd19-bf55-4de8-aab3-6848ff2658da" width=960 />

Expand this for additional benchmark results with other types:

<details>

<img src="https://github.com/user-attachments/assets/a618dca0-2304-4fcd-9923-aee108d34b2c" width=960 />
<img src="https://github.com/user-attachments/assets/c99d5795-c77b-49d7-903c-7ff71b6746d8" width=960 />

<img src="https://github.com/user-attachments/assets/cf3ef6fd-656a-4199-b567-d70e4136f074" width=960 />

</details>

This PR is best reviewed by individual commits; they represent logical increments that are all individually buildable and correct. The commit messages explain the reasoning behind the changes in more depth. This is my first time contributing to the LLVM libc, so please point out structural mistakes.

Patch is 39.79 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/120450.diff

14 Files Affected:
diff --git a/libc/src/stdlib/heap_sort.h b/libc/src/stdlib/heap_sort.h
index ccb9ec5f82149e..de08984497782a 100644
--- a/libc/src/stdlib/heap_sort.h
+++ b/libc/src/stdlib/heap_sort.h
@@ -18,11 +18,12 @@ namespace internal {
// A simple in-place heapsort implementation.
// Follow the implementation in https://en.wikipedia.org/wiki/Heapsort.
-LIBC_INLINE void heap_sort(const Array &array) {
- size_t end = array.size();
+template <typename A, typename F>
+void heap_sort(const A &array, const F &is_less) {
+ size_t end = array.len();
size_t start = end / 2;
- auto left_child = [](size_t i) -> size_t { return 2 * i + 1; };
+ const auto left_child = [](size_t i) -> size_t { return 2 * i + 1; };
while (end > 1) {
if (start > 0) {
@@ -40,12 +41,11 @@ LIBC_INLINE void heap_sort(const Array &array) {
while (left_child(root) < end) {
size_t child = left_child(root);
// If there are two children, set child to the greater.
- if (child + 1 < end &&
- array.elem_compare(child, array.get(child + 1)) < 0)
+ if ((child + 1 < end) && is_less(array.get(child), array.get(child + 1)))
++child;
// If the root is less than the greater child
- if (array.elem_compare(root, array.get(child)) >= 0)
+ if (!is_less(array.get(root), array.get(child)))
break;
// Swap the root with the greater child and continue sifting down.
diff --git a/libc/src/stdlib/qsort.cpp b/libc/src/stdlib/qsort.cpp
index 65a63c239f5c0d..1402f7963c389e 100644
--- a/libc/src/stdlib/qsort.cpp
+++ b/libc/src/stdlib/qsort.cpp
@@ -18,14 +18,11 @@ namespace LIBC_NAMESPACE_DECL {
LLVM_LIBC_FUNCTION(void, qsort,
(void *array, size_t array_size, size_t elem_size,
int (*compare)(const void *, const void *))) {
- if (array == nullptr || array_size == 0 || elem_size == 0)
- return;
- internal::Comparator c(compare);
-
- auto arr = internal::Array(reinterpret_cast<uint8_t *>(array), array_size,
- elem_size, c);
-
- internal::sort(arr);
+ internal::unstable_sort(
+ array, array_size, elem_size,
+ [compare](const void *a, const void *b) noexcept -> bool {
+ return compare(a, b) < 0;
+ });
}
} // namespace LIBC_NAMESPACE_DECL
diff --git a/libc/src/stdlib/qsort_data.h b/libc/src/stdlib/qsort_data.h
index c529d55ca46ffd..f7446ec2bb637d 100644
--- a/libc/src/stdlib/qsort_data.h
+++ b/libc/src/stdlib/qsort_data.h
@@ -11,96 +11,155 @@
#include "src/__support/CPP/cstddef.h"
#include "src/__support/macros/config.h"
+#include "src/string/memory_utils/inline_memcpy.h"
+#include "src/string/memory_utils/inline_memmove.h"
#include <stdint.h>
namespace LIBC_NAMESPACE_DECL {
namespace internal {
-using Compare = int(const void *, const void *);
-using CompareWithState = int(const void *, const void *, void *);
-
-enum class CompType { COMPARE, COMPARE_WITH_STATE };
-
-struct Comparator {
- union {
- Compare *comp_func;
- CompareWithState *comp_func_r;
- };
- const CompType comp_type;
-
- void *arg;
-
- Comparator(Compare *func)
- : comp_func(func), comp_type(CompType::COMPARE), arg(nullptr) {}
-
- Comparator(CompareWithState *func, void *arg_val)
- : comp_func_r(func), comp_type(CompType::COMPARE_WITH_STATE),
- arg(arg_val) {}
-
-#if defined(__clang__)
- // Recent upstream changes to -fsanitize=function find more instances of
- // function type mismatches. One case is with the comparator passed to this
- // class. Libraries will tend to pass comparators that take pointers to
- // varying types while this comparator expects to accept const void pointers.
- // Ideally those tools would pass a function that strictly accepts const
- // void*s to avoid UB, or would use qsort_r to pass their own comparator.
- [[clang::no_sanitize("function")]]
-#endif
- int comp_vals(const void *a, const void *b) const {
- if (comp_type == CompType::COMPARE) {
- return comp_func(a, b);
- } else {
- return comp_func_r(a, b, arg);
- }
+// Returns the max amount of bytes deemed reasonable - based on the target
+// properties - for use in local stack arrays.
+constexpr size_t max_stack_array_size() {
+ // Uses target pointer size as heuristic how much memory is available and
+ // unlikely to run into stack overflow and perf problems.
+ constexpr size_t ptr_diff_size = sizeof(ptrdiff_t);
+
+ if constexpr (ptr_diff_size >= 8) {
+ return 4096;
}
-};
-class Array {
- uint8_t *array;
- size_t array_size;
+ if constexpr (ptr_diff_size == 4) {
+ return 512;
+ }
+
+ // 8-bit platforms are just not gonna work well with libc, qsort
+ // won't be the problem.
+ // 16-bit platforms ought to be able to store 64 bytes on the stack.
+ return 64;
+}
+
+class ArrayGenericSize {
+ uint8_t *array_base;
+ size_t array_len;
size_t elem_size;
- Comparator compare;
+
+ uint8_t *get_internal(size_t i) const noexcept {
+ return array_base + (i * elem_size);
+ }
public:
- Array(uint8_t *a, size_t s, size_t e, Comparator c)
- : array(a), array_size(s), elem_size(e), compare(c) {}
-
- uint8_t *get(size_t i) const { return array + i * elem_size; }
-
- void swap(size_t i, size_t j) const {
- uint8_t *elem_i = get(i);
- uint8_t *elem_j = get(j);
- for (size_t b = 0; b < elem_size; ++b) {
- uint8_t temp = elem_i[b];
- elem_i[b] = elem_j[b];
- elem_j[b] = temp;
- }
+ ArrayGenericSize(uint8_t *a, size_t s, size_t e) noexcept
+ : array_base(a), array_len(s), elem_size(e) {}
+
+ static constexpr bool has_fixed_size() { return false; }
+
+ void *get(size_t i) const noexcept {
+ return reinterpret_cast<void *>(get_internal(i));
}
- int elem_compare(size_t i, const uint8_t *other) const {
- // An element must compare equal to itself so we don't need to consult the
- // user provided comparator.
- if (get(i) == other)
- return 0;
- return compare.comp_vals(get(i), other);
+ void swap(size_t i, size_t j) const noexcept {
+ // It's possible to use 8 byte blocks with `uint64_t`, but that
+ // generates more machine code as the remainder loop gets
+ // unrolled, plus 4 byte operations are more likely to be
+ // efficient on a wider variety of hardware. On x86 LLVM tends
+ // to unroll the block loop again into 2 16 byte swaps per
+ // iteration which is another reason that 4 byte blocks yields
+ // good performance even for big types.
+ using block_t = uint32_t;
+ constexpr size_t BLOCK_SIZE = sizeof(block_t);
+
+ uint8_t *elem_i = get_internal(i);
+ uint8_t *elem_j = get_internal(j);
+
+ const size_t elem_size_rem = elem_size % BLOCK_SIZE;
+ const block_t *elem_i_block_end =
+ reinterpret_cast<block_t *>(elem_i + (elem_size - elem_size_rem));
+
+ block_t *elem_i_block = reinterpret_cast<block_t *>(elem_i);
+ block_t *elem_j_block = reinterpret_cast<block_t *>(elem_j);
+
+ while (elem_i_block != elem_i_block_end) {
+ block_t tmp = *elem_i_block;
+ *elem_i_block = *elem_j_block;
+ *elem_j_block = tmp;
+ elem_i_block += 1;
+ elem_j_block += 1;
+ }
+
+ elem_i = reinterpret_cast<uint8_t *>(elem_i_block);
+ elem_j = reinterpret_cast<uint8_t *>(elem_j_block);
+ for (size_t n = 0; n < elem_size_rem; ++n) {
+ uint8_t tmp = elem_i[n];
+ elem_i[n] = elem_j[n];
+ elem_j[n] = tmp;
+ }
}
- size_t size() const { return array_size; }
+ size_t len() const noexcept { return array_len; }
- // Make an Array starting at index |i| and size |s|.
- LIBC_INLINE Array make_array(size_t i, size_t s) const {
- return Array(get(i), s, elem_size, compare);
+ // Make an Array starting at index |i| and length |s|.
+ ArrayGenericSize make_array(size_t i, size_t s) const noexcept {
+ return ArrayGenericSize(get_internal(i), s, elem_size);
}
- // Reset this Array to point at a different interval of the same items.
- LIBC_INLINE void reset_bounds(uint8_t *a, size_t s) {
- array = a;
- array_size = s;
+ // Reset this Array to point at a different interval of the same
+ // items starting at index |i|.
+ void reset_bounds(size_t i, size_t s) noexcept {
+ array_base = get_internal(i);
+ array_len = s;
}
};
-using SortingRoutine = void(const Array &);
+// Having a specialized Array type for sorting that knowns at
+// compile-time what the size of the element is, allows for much more
+// efficient swapping and for cheaper offset calculations.
+template <size_t ELEM_SIZE> class ArrayFixedSize {
+ uint8_t *array_base;
+ size_t array_len;
+
+ uint8_t *get_internal(size_t i) const noexcept {
+ return array_base + (i * ELEM_SIZE);
+ }
+
+public:
+ ArrayFixedSize(uint8_t *a, size_t s) noexcept : array_base(a), array_len(s) {}
+
+ // Beware this function is used a heuristic for cheap to swap types, so
+ // instantiating `ArrayFixedSize` with `ELEM_SIZE > 100` is probably a bad
+ // idea perf wise.
+ static constexpr bool has_fixed_size() { return true; }
+
+ void *get(size_t i) const noexcept {
+ return reinterpret_cast<void *>(get_internal(i));
+ }
+
+ void swap(size_t i, size_t j) const noexcept {
+ alignas(32) uint8_t tmp[ELEM_SIZE];
+
+ uint8_t *elem_i = get_internal(i);
+ uint8_t *elem_j = get_internal(j);
+
+ inline_memcpy(tmp, elem_i, ELEM_SIZE);
+ inline_memmove(elem_i, elem_j, ELEM_SIZE);
+ inline_memcpy(elem_j, tmp, ELEM_SIZE);
+ }
+
+ size_t len() const noexcept { return array_len; }
+
+ // Make an Array starting at index |i| and length |s|.
+ ArrayFixedSize<ELEM_SIZE> make_array(size_t i, size_t s) const noexcept {
+ return ArrayFixedSize<ELEM_SIZE>(get_internal(i), s);
+ }
+
+ // Reset this Array to point at a different interval of the same
+ // items starting at index |i|.
+ void reset_bounds(size_t i, size_t s) noexcept {
+ array_base = get_internal(i);
+ array_len = s;
+ }
+};
} // namespace internal
} // namespace LIBC_NAMESPACE_DECL
diff --git a/libc/src/stdlib/qsort_pivot.h b/libc/src/stdlib/qsort_pivot.h
new file mode 100644
index 00000000000000..68716d01c47a0b
--- /dev/null
+++ b/libc/src/stdlib/qsort_pivot.h
@@ -0,0 +1,92 @@
+//===-- Implementation header for qsort utilities ---------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_LIBC_SRC_STDLIB_QSORT_PIVOT_H
+#define LLVM_LIBC_SRC_STDLIB_QSORT_PIVOT_H
+
+#include "src/stdlib/qsort_pivot.h"
+
+#include <stdint.h>
+
+namespace LIBC_NAMESPACE_DECL {
+namespace internal {
+
+// Recursively select a pseudomedian if above this threshold.
+constexpr size_t PSEUDO_MEDIAN_REC_THRESHOLD = 64;
+
+// Selects a pivot from `array`. Algorithm taken from glidesort by Orson Peters.
+//
+// This chooses a pivot by sampling an adaptive amount of points, approximating
+// the quality of a median of sqrt(n) elements.
+template <typename A, typename F>
+size_t choose_pivot(const A &array, const F &is_less) {
+ const size_t len = array.len();
+
+ if (len < 8) {
+ return 0;
+ }
+
+ const size_t len_div_8 = len / 8;
+
+ const size_t a = 0; // [0, floor(n/8))
+ const size_t b = len_div_8 * 4; // [4*floor(n/8), 5*floor(n/8))
+ const size_t c = len_div_8 * 7; // [7*floor(n/8), 8*floor(n/8))
+
+ if (len < PSEUDO_MEDIAN_REC_THRESHOLD) {
+ return median3(array, a, b, c, is_less);
+ } else {
+ return median3_rec(array, a, b, c, len_div_8, is_less);
+ }
+}
+
+// Calculates an approximate median of 3 elements from sections a, b, c, or
+// recursively from an approximation of each, if they're large enough. By
+// dividing the size of each section by 8 when recursing we have logarithmic
+// recursion depth and overall sample from f(n) = 3*f(n/8) -> f(n) =
+// O(n^(log(3)/log(8))) ~= O(n^0.528) elements.
+template <typename A, typename F>
+size_t median3_rec(const A &array, size_t a, size_t b, size_t c, size_t n,
+ const F &is_less) {
+ if (n * 8 >= PSEUDO_MEDIAN_REC_THRESHOLD) {
+ const size_t n8 = n / 8;
+ a = median3_rec(array, a, a + (n8 * 4), a + (n8 * 7), n8, is_less);
+ b = median3_rec(array, b, b + (n8 * 4), b + (n8 * 7), n8, is_less);
+ c = median3_rec(array, c, c + (n8 * 4), c + (n8 * 7), n8, is_less);
+ }
+ return median3(array, a, b, c, is_less);
+}
+
+/// Calculates the median of 3 elements.
+template <typename A, typename F>
+size_t median3(const A &array, size_t a, size_t b, size_t c, const F &is_less) {
+ const void *a_ptr = array.get(a);
+ const void *b_ptr = array.get(b);
+ const void *c_ptr = array.get(c);
+
+ const bool x = is_less(a_ptr, b_ptr);
+ const bool y = is_less(a_ptr, c_ptr);
+ if (x == y) {
+ // If x=y=0 then b, c <= a. In this case we want to return max(b, c).
+ // If x=y=1 then a < b, c. In this case we want to return min(b, c).
+ // By toggling the outcome of b < c using XOR x we get this behavior.
+ const bool z = is_less(b_ptr, c_ptr);
+ if (z ^ x) {
+ return c;
+ } else {
+ return b;
+ }
+ } else {
+ // Either c <= a < b or b <= a < c, thus a is our median.
+ return a;
+ }
+}
+
+} // namespace internal
+} // namespace LIBC_NAMESPACE_DECL
+
+#endif // LLVM_LIBC_SRC_STDLIB_QSORT_PIVOT_H
diff --git a/libc/src/stdlib/qsort_r.cpp b/libc/src/stdlib/qsort_r.cpp
index bf61a40e847341..1d340560bcbe12 100644
--- a/libc/src/stdlib/qsort_r.cpp
+++ b/libc/src/stdlib/qsort_r.cpp
@@ -19,13 +19,12 @@ LLVM_LIBC_FUNCTION(void, qsort_r,
(void *array, size_t array_size, size_t elem_size,
int (*compare)(const void *, const void *, void *),
void *arg)) {
- if (array == nullptr || array_size == 0 || elem_size == 0)
- return;
- internal::Comparator c(compare, arg);
- auto arr = internal::Array(reinterpret_cast<uint8_t *>(array), array_size,
- elem_size, c);
- internal::sort(arr);
+ internal::unstable_sort(
+ array, array_size, elem_size,
+ [compare, arg](const void *a, const void *b) noexcept -> bool {
+ return compare(a, b, arg) < 0;
+ });
}
} // namespace LIBC_NAMESPACE_DECL
diff --git a/libc/src/stdlib/qsort_util.h b/libc/src/stdlib/qsort_util.h
index d42adde06d9762..ee3f737926f530 100644
--- a/libc/src/stdlib/qsort_util.h
+++ b/libc/src/stdlib/qsort_util.h
@@ -27,11 +27,46 @@
namespace LIBC_NAMESPACE_DECL {
namespace internal {
+template <typename A, typename F> void sort_inst(A &array, const F &is_less) {
#if LIBC_QSORT_IMPL == LIBC_QSORT_QUICK_SORT
-constexpr auto sort = quick_sort;
+ quick_sort(array, is_less);
#elif LIBC_QSORT_IMPL == LIBC_QSORT_HEAP_SORT
-constexpr auto sort = heap_sort;
+ heap_sort(array, is_less);
#endif
+}
+
+template <typename F>
+void unstable_sort(void *array, size_t array_len, size_t elem_size,
+ const F &is_less) {
+ if (array == nullptr || array_len == 0 || elem_size == 0) {
+ return;
+ }
+
+ uint8_t *array_base = reinterpret_cast<uint8_t *>(array);
+
+ switch (elem_size) {
+ case 4: {
+ auto arr_fixed_size = internal::ArrayFixedSize<4>(array_base, array_len);
+ sort_inst(arr_fixed_size, is_less);
+ return;
+ }
+ case 8: {
+ auto arr_fixed_size = internal::ArrayFixedSize<8>(array_base, array_len);
+ sort_inst(arr_fixed_size, is_less);
+ return;
+ }
+ case 16: {
+ auto arr_fixed_size = internal::ArrayFixedSize<16>(array_base, array_len);
+ sort_inst(arr_fixed_size, is_less);
+ return;
+ }
+ default:
+ auto arr_generic_size =
+ internal::ArrayGenericSize(array_base, array_len, elem_size);
+ sort_inst(arr_generic_size, is_less);
+ return;
+ }
+}
} // namespace internal
} // namespace LIBC_NAMESPACE_DECL
diff --git a/libc/src/stdlib/quick_sort.h b/libc/src/stdlib/quick_sort.h
index 82b90a7d511d99..13cce7ee5d86cd 100644
--- a/libc/src/stdlib/quick_sort.h
+++ b/libc/src/stdlib/quick_sort.h
@@ -9,84 +9,172 @@
#ifndef LLVM_LIBC_SRC_STDLIB_QUICK_SORT_H
#define LLVM_LIBC_SRC_STDLIB_QUICK_SORT_H
-#include "src/__support/macros/attributes.h"
+#include "src/__support/CPP/cstddef.h"
+#include "src/__support/big_int.h"
#include "src/__support/macros/config.h"
-#include "src/stdlib/qsort_data.h"
+#include "src/stdlib/qsort_pivot.h"
#include <stdint.h>
namespace LIBC_NAMESPACE_DECL {
namespace internal {
-// A simple quicksort implementation using the Hoare partition scheme.
-LIBC_INLINE size_t partition(const Array &array) {
- const size_t array_size = array.size();
- size_t pivot_index = array_size / 2;
- uint8_t *pivot = array.get(pivot_index);
- size_t i = 0;
- size_t j = array_size - 1;
+// Branchless Lomuto partition based on the implementation by Lukas
+// Bergdoll and Orson Peters
+// https://github.com/Voultapher/sort-research-rs/blob/main/writeup/lomcyc_partition/text.md.
+// Simplified to avoid having to stack allocate.
+template <typename A, typename F>
+size_t partition_lomuto_branchless(const A &array, const void *pivot,
+ const F &is_less) {
+ const size_t array_len = array.len();
+
+ size_t left = 0;
+ size_t right = 0;
+
+ while (right < array_len) {
+ const bool right_is_lt = is_less(array.get(right), pivot);
+ array.swap(left, right);
+ left += static_cast<size_t>(right_is_lt);
+ right += 1;
+ }
+
+ return left;
+}
+
+// Optimized for large types that are expensive to move. Not optimized
+// for integers. It's possible to use a cyclic permutation here for
+// large types as done in ipnsort but the advantages of this are limited
+// as `is_less` is a small wrapper around a call to a function pointer
+// and won't incur much binary-size overhead. The other reason to use
+// cyclic permutation is to have more efficient swapping, but we don't
+// know the element size so this isn't applicable here either.
+template <typename A, typename F>
+size_t partition_hoare_branchy(const A &array, const void *pivot,
+ const F &is_less) {
+ const size_t array_len = array.len();
+
+ size_t left = 0;
+ size_t right = array_len;
while (true) {
- int compare_i, compare_j;
-
- while ((compare_i = array.elem_compare(i, pivot)) < 0)
- ++i;
- while ((compare_j = array.elem_compare(j, pivot)) > 0)
- --j;
-
- // At some point i will crossover j so we will definitely break out of
- // this while loop.
- if (i >= j)
- return j + 1;
-
- array.swap(i, j);
-
- // The pivot itself might have got swapped so we will update the pivot.
- if (i == pivot_index) {
- pivot = array.get(j);
- pivot_index = j;
- } else if (j == pivot_index) {
- pivot = array.get(i);
- pivot_index = i;
+ while (left < right && is_less(array.get(left), pivot))
+ ++left;
+
+ while (true) {
+ --right;
+ if (left >= right || is_less(array.get(right), pivot)) {
+ break;
+ }
}
- if (compare_i == 0 && compare_j == 0) {
- // If we do not move the pointers, we will end up with an
- // infinite loop as i and j will be stuck without advancing.
- ++i;
- --j;
- }
+ if (left >= right)
+ break;
+
+ array.swap(left, right);
+ ++left;
+ }
+
+ return left;
+}
+
+template <typename A, typename F>
+size_t partition(const A &array, size_t pivot_index, const F &is_less) {
+ // Place the pivot at the beginning of the array.
+ if (pivot_index != 0) {
+ array.swap(0, pivot_index);
}
+
+ const A array_without_pivot = array.make_array(1, array.len() - 1);
+ const void *pivot = array.get(0);
+
+ size_t num_lt;
+ if constexpr (A::has_fixed_size()) {
+ // Branchless Lomuto avoid branch misprediction penalties, but
+ // it also swaps more often which only is faster if the swap a
+ // constant operation.
+ num_lt = partition_lomuto_branchless(array_without_pivot, pivot, is_less);
+ } else {
+ num_lt = partition_hoare_branchy(array_without_pivot, pivot, is_less);
+ }
+
+ // Place the pivot between the two partitions.
+ array.swap(0, num_lt);
+
+ return num_lt;
}
-LIBC_INLINE void quick_sort(Array array) {
+template <typename A, typename F>
+void quick_sort_impl(A &array, const void *ancestor_pivot, size_t limit,
+ ...
[truncated]
Thanks for the patch! Improving the qsort algorithm is very much appreciated. I've left some comments, and I hope we can land this soon.
Thank you very much for the patch. I added some comments during my review. I haven't finished reviewing the later parts, but the code and results look very promising to me.
I've combined all the style and minor changes into a first commit. I'll tackle the incorrect
There might be platforms where addressing the user-provided array of unknown alignment as `uint32_t` would be incorrect. More generally, the previous code violated strict aliasing and thus invoked UB. By using the "magic" properties of `memcpy` we can sidestep the issue by letting it do the load and store for us. In practice this means that on platforms where addressing the memory as 4-byte regions is safe and efficient - which includes most widely used platforms - the compiler can insert the appropriate `mov` instructions inline, and if a platform does not support this operation it will likely insert a call to the `memcpy` function.
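A sketch of the `memcpy`-based swap this describes (simplified to one fixed 4-byte block; the actual patch loops over blocks and handles the remainder): the loads and stores go through `memcpy` into a properly typed local, so no `uint32_t` lvalue ever aliases the user's bytes.

```cpp
#include <cstdint>
#include <cstring>

// Swaps one 4-byte block between two buffers of unknown alignment.
// memcpy performs the (possibly unaligned) load and store, avoiding a
// strict-aliasing violation; with a constant size, compilers typically
// lower this to plain 32-bit mov instructions.
void swap_u32_block(unsigned char *p, unsigned char *q) {
  uint32_t a, b;
  std::memcpy(&a, p, sizeof(uint32_t));
  std::memcpy(&b, q, sizeof(uint32_t));
  std::memcpy(p, &b, sizeof(uint32_t));
  std::memcpy(q, &a, sizeof(uint32_t));
}
```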
I've also fixed the strict aliasing violation; this should be everything that was raised in the review so far, please take another look.
LGTM with some minor comments.
const size_t len = array.len();

if (len < 8) {
  return 0;
This brace can also be removed.
There are some changes I'd like to make before this gets merged. Typing them up now.
This is getting close to complete, I think after one more round of review it'll be good to go. I'm hopeful we can get this landed before I go on break for the holidays.
Looks pretty much done, I've left one last comment but it should be fine to land after addressing that.
I think this should be it now, no more unresolved topics as far as I can tell.
Merging according to previous approval. Thank you again for the patch!
@Voultapher Congratulations on having your first Pull Request (PR) merged into the LLVM Project! Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR. Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues. How to do this, and the rest of the post-merge process, is covered in detail here. If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again. If you don't get any reports, no action is required from you. Your changes are working as expected, well done!
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/93/builds/12783 Here is the relevant piece of the build log for the reference
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/131/builds/13071 Here is the relevant piece of the build log for the reference
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/43/builds/12974 Here is the relevant piece of the build log for the reference
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/147/builds/12612 Here is the relevant piece of the build log for the reference
@Voultapher can you make a quick fix for the build error? If you are not available right now, I will need to revert the patch and reland it with your fix.
This reverts commit d2e71c9.
I will go ahead and revert this to keep the bots happy.
@michaelrj-google from what I know, Google has extensive performance monitoring in production; once this change reaches systems, it would be interesting to get some feedback on whether it actually improves real-world cases. A lot of the perf-critical sorting is done via vqsort as far as I understand, but this could still have some impact. It would be great to know, with the appropriate lack of details of course.
@Voultapher In case you are interested: I did not mention too many details about __builtin_memcpy in the earlier comment since both functions work as expected. Basically, as a GNU extension, many libc functions have builtin counterparts. They are used for libcall optimizations. So semantically, calling __builtin_XXX should be understood as the same as calling the C function ::XXX. However, due to libcall optimizations, all of them are usually transformed into compiler intrinsics. For trivial cases such as fixed-size copies, they are optimized into direct hardware instructions. For large sizes, or cases where it is hard to decide the size or the most efficient copy algorithm, the compiler falls back to a call to memcpy. __builtin_inline_memcpy is a clang-specific intrinsic. It supports only static sizes and does not have the same semantics as ::memcpy: it guarantees that no libcall will ever be emitted. Hence it is a good candidate if one wants to use builtin intrinsics to implement memcpy itself.
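As a small illustration of the above (assuming GCC or Clang, since `__builtin_memcpy` is a compiler extension): for a small constant size the builtin behaves exactly like `::memcpy` and is typically lowered to direct load/store instructions rather than a library call.

```cpp
#include <cstring>

// Copies 4 bytes via __builtin_memcpy. Semantically identical to
// std::memcpy(dst, src, 4); with a known constant size, GCC/Clang
// lower this to a single 32-bit load and store instead of a libcall.
void copy4(unsigned char *dst, const unsigned char *src) {
  __builtin_memcpy(dst, src, 4);
}
```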
@SchrodingerZhu thanks for the explanation. |
@Voultapher I'll be sure to keep you up to date when we roll this out (though that may take a while, the testing process is slow). It will be replacing an older version of glibc's qsort so I'm optimistic there will be decent performance gains. Anyone internally who is specifically using another implementation (like vqsort) won't be affected, since we're just replacing the C symbol.
This PR improves the libc `qsort` and `qsort_r` implementations in a variety of ways:

- Vastly improved performance, a 2+x improvement in many scenarios, see below.
- Improved usage safety. The current implementation allows for trivial out-of-bounds read and write UB if the comparison function does not implement a strict weak ordering. I've written about this in more depth here. For reference, glibc qsort considers this a safety-critical bug and does bounds checking. The current implementation contains code like this

  ```c
  while ((compare_i = array.elem_compare(i, pivot)) < 0) ++i;
  ```

  which goes out-of-bounds if the comparison function does not implement a strict weak ordering. The new implementation either does bounds checking or uses loop bounds independent of the comparison result.
- Better worst-case algorithmic complexity. The current implementation is a classical quicksort with a worst-case expected time to sort the data of O(N^2). The new implementation improves that to O(N x log(N)) by limiting the recursion depth and switching to heapsort if the limit is hit; this idea comes from introsort. In addition, the new implementation achieves O(N x log(K)), where K is the number of distinct elements, by using equal-element filtering as pioneered by pdqsort.
There is one downside: binary-size. The current implementation is very basic and uses dynamic dispatch to re-use the same implementation for `qsort` and `qsort_r`. In practice, with default `-O3` settings on x86-64, the binary-size that `qsort` and `qsort_r` contribute in total to libc goes from 1kB to 18kB, a net 17kB increase. It's possible to reduce binary-size further, but doing so comes at the price of a significant performance reduction. The new implementation - idisort (ipnsort-derived introsort) - makes rather conservative choices; with a higher binary-size budget, things like efficient full ascending and descending handling or more size specializations are possible.

For some background, together with @orlp I worked on improving the Rust standard library sort implementations, which were merged earlier this year and have been part of stable Rust for several months now. The ideas in idisort are largely derived from the learnings gained by developing ipnsort. The major performance difference comes from having an opaque comparison function, as forced by the `qsort` interface, which limits the kind of optimizations that can be done in code and by the compiler.

Benchmark setup
Benchmark results
The methodology used is explained in detail here.
The current LLVM libc qsort implementation is referenced as `c_llvm_libc_unstable`, the new implementation as `c_idisort_unstable`. Other implementations are included for a comparison with the state of the art.

This graph compares the performance of two implementations against each other; values above zero imply speedups for the new implementation, and values below zero imply slowdowns. Each dot represents one input length for a specific pattern combination.
Expand this for additional benchmark results with other types:
`random_p5` at an input length of 22k peaks at a 558x performance improvement. Keep in mind this is an artificial test, but generally large types see quite bad performance in the current implementation.
This PR is best reviewed by individual commits, they represent logical increments that are all individually buildable and correct. The commit messages explain the reasoning behind changes in more depth.
This is my first time contributing to the LLVM libc so please point out structural mistakes.