sort: immediately compare whole lines if they parse as numbers #7567

MoSal · 2025-03-25T07:09:53Z

Numeric sort (`sort -n`) can be relatively slow on inputs that are wholly or
mostly numbers. The gap is clearest when compared with the speed of
GeneralNumeric (`sort -g`).

This change parses whole lines as f64 and stores that info in
LineData. This is faster than doing the parsing two lines at
a time in compare_by().
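
The idea of parsing once per line instead of once per comparison can be sketched as follows. This is only an illustration under stated assumptions, not the code from this change: `CachedLine` and `sort_numeric` are hypothetical names, the real `LineData` layout is not shown in this thread, and `str::parse::<f64>` accepts more spellings than `sort -n` does, so the real change may qualify fewer lines.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in for a per-line cache entry (not the real `LineData`).
struct CachedLine<'a> {
    text: &'a str,
    /// Result of parsing the whole line once, before sorting starts.
    number: Option<f64>,
}

/// Sort lines numerically, parsing each line exactly once up front.
fn sort_numeric<'a>(input: &[&'a str]) -> Vec<&'a str> {
    let mut lines: Vec<CachedLine<'a>> = input
        .iter()
        .map(|&text| CachedLine {
            text,
            // Assumption for this sketch: only finite values are cached,
            // so NaN can never reach the comparator.
            number: text.trim().parse::<f64>().ok().filter(|v| v.is_finite()),
        })
        .collect();

    // The comparator only reads cached values; no parsing happens here.
    lines.sort_by(|a, b| match (a.number, b.number) {
        (Some(x), Some(y)) => x
            .partial_cmp(&y)
            .unwrap_or(Ordering::Equal)
            .then_with(|| a.text.cmp(b.text)),
        // Simplification: lines that did not parse are placed first.
        (None, Some(_)) => Ordering::Less,
        (Some(_), None) => Ordering::Greater,
        (None, None) => a.text.cmp(b.text),
    });

    lines.into_iter().map(|line| line.text).collect()
}
```

A sort over n lines performs on the order of n log n comparisons, so moving the parse out of the comparator reduces the number of parses from two per comparison to one per line.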

Benchmarks

# mimalloc = "0.1.44"
# snmalloc-rs = { version = "0.3.8", features = ["native-cpu", "lto"] }
shuf -i 1-1000000 -n 1000000 > /tmp/shuffled.txt
hyperfine --warmup=4 -r10 '<sort_cmd> -n /tmp/shuffled.txt'

Before

default_release

Benchmark 1: /tmp/before_coreutils_defaults sort -n /tmp/shuffled.txt
  Time (mean ± σ):     363.2 ms ±  10.2 ms    [User: 1906.8 ms, System: 15.9 ms]
  Range (min … max):   350.3 ms … 380.8 ms    10 runs

codegen-units=1 -C target-cpu=native

Benchmark 1: /tmp/before_coreutils_native sort -n /tmp/shuffled.txt
  Time (mean ± σ):     357.9 ms ±   9.8 ms    [User: 1892.2 ms, System: 18.5 ms]
  Range (min … max):   343.9 ms … 375.8 ms    10 runs

codegen-units=1 -C target-cpu=native, global_allocator=mimalloc

Benchmark 1: /tmp/before_coreutils_native_mimalloc sort -n /tmp/shuffled.txt
  Time (mean ± σ):     342.5 ms ±   5.8 ms    [User: 1810.1 ms, System: 19.9 ms]
  Range (min … max):   332.1 ms … 351.3 ms    10 runs

-C target-cpu=native, global_allocator=snmalloc

Benchmark 1: /tmp/before_coreutils_native_snmalloc sort -n /tmp/shuffled.txt
  Time (mean ± σ):     339.1 ms ±   7.5 ms    [User: 1809.9 ms, System: 14.1 ms]
  Range (min … max):   332.3 ms … 351.7 ms    10 runs

After

default_release

Benchmark 1: /tmp/fixed_coreutils_defaults sort -n /tmp/shuffled.txt
  Time (mean ± σ):     173.2 ms ±   5.0 ms    [User: 535.1 ms, System: 19.3 ms]
  Range (min … max):   168.1 ms … 182.6 ms    10 runs

codegen-units=1 -C target-cpu=native

Benchmark 1: /tmp/fixed_coreutils_native sort -n /tmp/shuffled.txt
  Time (mean ± σ):     172.3 ms ±   3.8 ms    [User: 535.3 ms, System: 19.9 ms]
  Range (min … max):   166.2 ms … 179.7 ms    10 runs

codegen-units=1 -C target-cpu=native, global_allocator=mimalloc

Benchmark 1: /tmp/fixed_coreutils_native_mimalloc sort -n /tmp/shuffled.txt
  Time (mean ± σ):     168.6 ms ±   2.0 ms    [User: 529.3 ms, System: 19.7 ms]
  Range (min … max):   164.2 ms … 171.4 ms    10 runs

codegen-units=1 -C target-cpu=native, global_allocator=snmalloc

Benchmark 1: /tmp/fixed_coreutils_native_snmalloc sort -n /tmp/shuffled.txt
  Time (mean ± σ):     165.8 ms ±   3.5 ms    [User: 528.6 ms, System: 11.7 ms]
  Range (min … max):   162.1 ms … 173.7 ms    10 runs

GNU

gcc -march=x86-64 -mtune=generic -O2 ... (Arch package)

Benchmark 1: sort -n /tmp/shuffled.txt
  Time (mean ± σ):     197.8 ms ±   3.4 ms    [User: 891.4 ms, System: 22.3 ms]
  Range (min … max):   193.5 ms … 202.7 ms    10 runs

clang -march=native -O3 -pipe -fstack-protector-strong -fno-plt

Benchmark 1: /tmp/arch_coreutils/pkg-llvm/coreutils/usr/bin/sort -n /tmp/readme.txt
  Time (mean ± σ):     189.7 ms ±   7.5 ms    [User: 825.8 ms, System: 19.9 ms]
  Range (min … max):   182.4 ms … 209.1 ms    10 runs

gcc -march=native -O3 -pipe -fstack-protector-strong -fno-plt

Benchmark 1: /tmp/arch_coreutils/pkg/coreutils/usr/bin/sort -n /tmp/shuffled.txt
  Time (mean ± σ):     182.8 ms ±   5.9 ms    [User: 807.3 ms, System: 22.6 ms]
  Range (min … max):   173.6 ms … 194.8 ms    10 runs

`sort -g` Numbers for comparison

GNU

gcc -march=x86-64 -mtune=generic -O2 ... (Arch package)

Benchmark 1: sort -g /tmp/shuffled.txt
  Time (mean ± σ):     713.8 ms ±  14.5 ms    [User: 3943.0 ms, System: 36.0 ms]
  Range (min … max):   687.3 ms … 737.3 ms    10 runs

clang -march=native -O3 -pipe -fstack-protector-strong -fno-plt

Benchmark 1: /tmp/arch_coreutils/pkg-llvm/coreutils/usr/bin/sort -g /tmp/shuffled.txt
  Time (mean ± σ):     694.7 ms ±  15.0 ms    [User: 3798.5 ms, System: 43.2 ms]
  Range (min … max):   668.0 ms … 712.8 ms    10 runs

gcc -march=native -O3 -pipe -fstack-protector-strong -fno-plt

Benchmark 1: /tmp/arch_coreutils/pkg/coreutils/usr/bin/sort -g /tmp/shuffled.txt
  Time (mean ± σ):     693.9 ms ±   9.8 ms    [User: 3810.0 ms, System: 38.7 ms]
  Range (min … max):   676.5 ms … 709.6 ms    10 runs

uutils

default_release

Benchmark 1: /tmp/fixed_coreutils_defaults sort -g /tmp/shuffled.txt
  Time (mean ± σ):     256.5 ms ±   5.4 ms    [User: 945.8 ms, System: 15.7 ms]
  Range (min … max):   248.7 ms … 266.4 ms    10 runs

codegen-units=1 -C target-cpu=native

Benchmark 1: /tmp/fixed_coreutils_native sort -g /tmp/shuffled.txt
  Time (mean ± σ):     255.8 ms ±   5.0 ms    [User: 952.6 ms, System: 15.9 ms]
  Range (min … max):   249.2 ms … 263.7 ms    10 runs

codegen-units=1 -C target-cpu=native, global_allocator=mimalloc

Benchmark 1: /tmp/fixed_coreutils_native_mimalloc sort -g /tmp/shuffled.txt
  Time (mean ± σ):     245.1 ms ±   7.4 ms    [User: 926.3 ms, System: 18.7 ms]
  Range (min … max):   236.1 ms … 258.0 ms    10 runs

codegen-units=1 -C target-cpu=native, global_allocator=snmalloc

Benchmark 1: /tmp/fixed_coreutils_native_snmalloc sort -g /tmp/shuffled.txt
  Time (mean ± σ):     240.3 ms ±   6.2 ms    [User: 922.7 ms, System: 11.2 ms]
  Range (min … max):   230.3 ms … 249.2 ms    10 runs

sylvestre · 2025-03-25T07:58:38Z

Could you please run hyperfine with all the commands at once?
See
https://github.com/uutils/coreutils/blob/main/docs/src/performance.md

And what do clang/gcc have to do with this? :)

github-actions · 2025-03-25T08:03:59Z

GNU testsuite comparison:

Skip an intermittent issue tests/misc/stdbuf (fails in this run but passes in the 'main' branch)

MoSal · 2025-03-25T08:16:27Z

> Could you please run hyperfine with all the commands at once ? see https://github.com/uutils/coreutils/blob/main/docs/src/performance.md

% hyperfine --warmup 3 '/tmp/gnu-sort -n /tmp/shuffled.txt' '/tmp/before_coreutils sort -n /tmp/shuffled.txt' '/tmp/after_coreutils sort -n /tmp/shuffled.txt'
Benchmark 1: /tmp/gnu-sort -n /tmp/shuffled.txt
  Time (mean ± σ):     198.2 ms ±   5.8 ms    [User: 884.6 ms, System: 22.0 ms]
  Range (min … max):   187.3 ms … 207.4 ms    15 runs

Benchmark 2: /tmp/before_coreutils sort -n /tmp/shuffled.txt
  Time (mean ± σ):     361.3 ms ±   8.7 ms    [User: 1898.7 ms, System: 18.9 ms]
  Range (min … max):   350.4 ms … 375.3 ms    10 runs

Benchmark 3: /tmp/after_coreutils sort -n /tmp/shuffled.txt
  Time (mean ± σ):     175.1 ms ±   6.7 ms    [User: 536.8 ms, System: 21.6 ms]
  Range (min … max):   169.3 ms … 197.0 ms    16 runs

Summary
  /tmp/after_coreutils sort -n /tmp/shuffled.txt ran
    1.13 ± 0.05 times faster than /tmp/gnu-sort -n /tmp/shuffled.txt
    2.06 ± 0.09 times faster than /tmp/before_coreutils sort -n /tmp/shuffled.txt

sylvestre · 2025-03-25T08:39:17Z

well done
how did you generate shuffled.txt ?

could you please add your benchmark: https://github.com/uutils/coreutils/blob/main/src/uu/sort/BENCHMARKING.md

MoSal · 2025-03-25T08:52:46Z

> how did you generate shuffled.txt ?

A simple

shuf -i 1-1000000 -n 1000000 > /tmp/shuffled.txt

or

seq 1 1000000 | sort -R > /tmp/shuffled.txt

sylvestre · 2025-03-25T08:58:39Z

thanks :)

sylvestre · 2025-03-25T12:20:51Z

please update the .md and we are good!

MoSal · 2025-03-25T16:37:40Z

> please update the .md and we are good!

I'm sorry, but I'm not sure what you're referring to.

dezgeg · 2025-03-25T16:59:52Z

What happens with integer numbers that cannot be represented precisely as f64? For example 123456789012345678 and 123456789012345679 should parse into identical f64.

MoSal · 2025-03-25T18:01:45Z

@dezgeg

`Ordering::Equal` is not trusted unless the lines are also fully equal as strings. This is commented in the change.

        if let Some(cmp) = a_f64.partial_cmp(b_f64) {
            // don't trust `Ordering::Equal` if lines are not fully equal
            if cmp != Ordering::Equal || a.line == b.line {
                return if global_settings.reverse {
                    cmp.reverse()
                } else {
                    cmp
                };
            }
        }
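
To make the concern and the guard concrete, here is a small self-contained sketch. It assumes `str::parse::<f64>` as the parser, and `compare_numeric_lines` and `exact_fallback` are hypothetical names; the fallback is only a stand-in for the existing slower numeric comparison and is valid only for non-negative integers without leading zeros.

```rust
use std::cmp::Ordering;

/// Stand-in for the slower, exact comparison (assumption: inputs are
/// non-negative integers without leading zeros, so length then bytes decide).
fn exact_fallback(a: &str, b: &str) -> Ordering {
    a.len().cmp(&b.len()).then_with(|| a.cmp(b))
}

/// Compare two lines via f64 first, distrusting `Equal` unless the
/// lines themselves are identical.
fn compare_numeric_lines(a: &str, b: &str) -> Ordering {
    if let (Ok(a_f64), Ok(b_f64)) = (a.parse::<f64>(), b.parse::<f64>()) {
        if let Some(cmp) = a_f64.partial_cmp(&b_f64) {
            // Two different integers can round to the same f64, so an
            // `Equal` result is only accepted for identical lines.
            if cmp != Ordering::Equal || a == b {
                return cmp;
            }
        }
    }
    exact_fallback(a, b)
}
```

With the two values from the question, both lines parse to the same `f64`, the fast path reports `Equal`, the lines differ as strings, and the comparison falls through to the exact path, which orders `123456789012345678` before `123456789012345679`.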

sylvestre · 2025-03-26T07:48:45Z

@MoSal could you please add your benchmark: https://github.com/uutils/coreutils/blob/main/src/uu/sort/BENCHMARKING.md :)
(not the result but the way you generated the file + the hyperfine command)

Numeric sort can be relatively slow on inputs that are wholly or
mostly numbers. This is more clear when comparing with the speed of
GeneralNumeric.

This change parses whole lines as f64 and stores that info in
`LineData`. This is faster than doing the parsing two lines at
a time in `compare_by()`.

# Benchmarks

`shuf -i 1-1000000 -n 1000000 > /tmp/shuffled.txt`

% hyperfine --warmup 3 \
'/tmp/gnu-sort -n /tmp/shuffled.txt' '/tmp/before_coreutils sort -n /tmp/shuffled.txt' '/tmp/after_coreutils sort -n /tmp/shuffled.txt'
Benchmark 1: /tmp/gnu-sort -n /tmp/shuffled.txt
  Time (mean ± σ):     198.2 ms ±   5.8 ms    [User: 884.6 ms, System: 22.0 ms]
  Range (min … max):   187.3 ms … 207.4 ms    15 runs

Benchmark 2: /tmp/before_coreutils sort -n /tmp/shuffled.txt
  Time (mean ± σ):     361.3 ms ±   8.7 ms    [User: 1898.7 ms, System: 18.9 ms]
  Range (min … max):   350.4 ms … 375.3 ms    10 runs

Benchmark 3: /tmp/after_coreutils sort -n /tmp/shuffled.txt
  Time (mean ± σ):     175.1 ms ±   6.7 ms    [User: 536.8 ms, System: 21.6 ms]
  Range (min … max):   169.3 ms … 197.0 ms    16 runs

Summary
  /tmp/after_coreutils sort -n /tmp/shuffled.txt ran
    1.13 ± 0.05 times faster than /tmp/gnu-sort -n /tmp/shuffled.txt
    2.06 ± 0.09 times faster than /tmp/before_coreutils sort -n /tmp/shuffled.txt

Signed-off-by: Mohammad AlSaleh <CE.Mohammad.AlSaleh@gmail.com>

MoSal · 2025-03-26T09:15:31Z

> @MoSal could you please add your benchmark: https://github.com/uutils/coreutils/blob/main/src/uu/sort/BENCHMARKING.md :) (not the result but the way you generated the file + the hyperfine command)

Done. Also shortened the original commit message, replacing all the redundant benchmarks with the single hyperfine run.

RenjiSann · 2025-04-01T10:14:46Z

Thank you for your contribution !

MoSal force-pushed the faster_sort_n branch from 35a94bf to 937a4bc Compare March 25, 2025 07:25

MoSal force-pushed the faster_sort_n branch from 937a4bc to af045cd Compare March 26, 2025 09:08

MoSal added 2 commits March 26, 2025 12:12

sort: expand numeric sort section in BENCHMARKING.md a bit

410da77

Signed-off-by: Mohammad AlSaleh <CE.Mohammad.AlSaleh@gmail.com>

MoSal force-pushed the faster_sort_n branch from af045cd to 410da77 Compare March 26, 2025 09:13

RenjiSann merged commit fb16585 into uutils:main Apr 1, 2025
67 of 68 checks passed

BrewTestBot mentioned this pull request May 24, 2025

uutils-coreutils 0.1.0 Homebrew/homebrew-core#224645

Merged

moonfruit mentioned this pull request May 26, 2025

uutils-selected 0.1.0 moonfruit/homebrew-tap#243

Closed
