@MoSal
Contributor

@MoSal MoSal commented Mar 25, 2025

Numeric sort can be relatively slow on inputs that are wholly or
mostly numbers. This becomes clear when comparing against the speed of
GeneralNumeric.

This change parses whole lines as f64 and stores that info in
LineData. This is faster than doing the parsing two lines at
a time in compare_by().
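
The idea of parsing once per line instead of once per comparison can be sketched as follows. This is a minimal illustration and not the actual uutils code: `ParsedLine`, `precompute`, `compare`, and `sort_numeric` are hypothetical names, the string tie-break is an assumption of the sketch, and Rust's `f64` parser accepts forms (such as exponents) that `sort -n` does not treat as numbers.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in for per-line data; not the real uutils `LineData`.
struct ParsedLine<'a> {
    line: &'a str,
    /// `Some(value)` when the whole line parses as a finite `f64`.
    value: Option<f64>,
}

/// Parse every line exactly once, before any comparison happens.
fn precompute(input: &str) -> Vec<ParsedLine<'_>> {
    input
        .lines()
        .map(|line| ParsedLine {
            line,
            // NaN and infinities are treated as "not a number" so that the
            // comparison below stays a total order.
            value: line.trim().parse::<f64>().ok().filter(|v| v.is_finite()),
        })
        .collect()
}

/// Compare using only the cached values; no parsing happens here.
/// Lines without a cached value sort first (an assumption of this sketch).
fn compare(a: &ParsedLine, b: &ParsedLine) -> Ordering {
    match (a.value, b.value) {
        (Some(x), Some(y)) => x
            .partial_cmp(&y)
            .unwrap_or(Ordering::Equal)
            // Break float ties on the raw text to keep the order total.
            .then_with(|| a.line.cmp(b.line)),
        (None, Some(_)) => Ordering::Less,
        (Some(_), None) => Ordering::Greater,
        (None, None) => a.line.cmp(b.line),
    }
}

/// Sort the lines of `input` by their cached numeric value.
fn sort_numeric(input: &str) -> Vec<&str> {
    let mut lines = precompute(input);
    lines.sort_by(|a, b| compare(a, b));
    lines.into_iter().map(|l| l.line).collect()
}
```

A comparison sort calls the comparator on the order of n log n times, so parsing inside the comparator repeats the same work for each line many times, while the cached variant parses each line once.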

Benchmarks

# mimalloc = "0.1.44"
# snmalloc-rs = { version = "0.3.8", features = ["native-cpu", "lto"] }
shuf -i 1-1000000 -n 1000000 > /tmp/shuffled.txt
hyperfine --warmup=4 -r10 '<sort_cmd> -n /tmp/shuffled.txt'

Before

default_release

Benchmark 1: /tmp/before_coreutils_defaults sort -n /tmp/shuffled.txt
  Time (mean ± σ):     363.2 ms ±  10.2 ms    [User: 1906.8 ms, System: 15.9 ms]
  Range (min … max):   350.3 ms … 380.8 ms    10 runs

codegen-units=1 -C target-cpu=native

Benchmark 1: /tmp/before_coreutils_native sort -n /tmp/shuffled.txt
  Time (mean ± σ):     357.9 ms ±   9.8 ms    [User: 1892.2 ms, System: 18.5 ms]
  Range (min … max):   343.9 ms … 375.8 ms    10 runs

codegen-units=1 -C target-cpu=native, global_allocator=mimalloc

Benchmark 1: /tmp/before_coreutils_native_mimalloc sort -n /tmp/shuffled.txt
  Time (mean ± σ):     342.5 ms ±   5.8 ms    [User: 1810.1 ms, System: 19.9 ms]
  Range (min … max):   332.1 ms … 351.3 ms    10 runs

-C target-cpu=native, global_allocator=snmalloc

Benchmark 1: /tmp/before_coreutils_native_snmalloc sort -n /tmp/shuffled.txt
  Time (mean ± σ):     339.1 ms ±   7.5 ms    [User: 1809.9 ms, System: 14.1 ms]
  Range (min … max):   332.3 ms … 351.7 ms    10 runs

After

default_release

Benchmark 1: /tmp/fixed_coreutils_defaults sort -n /tmp/shuffled.txt
  Time (mean ± σ):     173.2 ms ±   5.0 ms    [User: 535.1 ms, System: 19.3 ms]
  Range (min … max):   168.1 ms … 182.6 ms    10 runs

codegen-units=1 -C target-cpu=native

Benchmark 1: /tmp/fixed_coreutils_native sort -n /tmp/shuffled.txt
  Time (mean ± σ):     172.3 ms ±   3.8 ms    [User: 535.3 ms, System: 19.9 ms]
  Range (min … max):   166.2 ms … 179.7 ms    10 runs

codegen-units=1 -C target-cpu=native, global_allocator=mimalloc

Benchmark 1: /tmp/fixed_coreutils_native_mimalloc sort -n /tmp/shuffled.txt
  Time (mean ± σ):     168.6 ms ±   2.0 ms    [User: 529.3 ms, System: 19.7 ms]
  Range (min … max):   164.2 ms … 171.4 ms    10 runs

codegen-units=1 -C target-cpu=native, global_allocator=snmalloc

Benchmark 1: /tmp/fixed_coreutils_native_snmalloc sort -n /tmp/shuffled.txt
  Time (mean ± σ):     165.8 ms ±   3.5 ms    [User: 528.6 ms, System: 11.7 ms]
  Range (min … max):   162.1 ms … 173.7 ms    10 runs

GNU

gcc -march=x86-64 -mtune=generic -O2 ... (Arch package)

Benchmark 1: sort -n /tmp/shuffled.txt
 Time (mean ± σ):     197.8 ms ±   3.4 ms    [User: 891.4 ms, System: 22.3 ms]
 Range (min … max):   193.5 ms … 202.7 ms    10 runs

clang -march=native -O3 -pipe -fstack-protector-strong -fno-plt

Benchmark 1: /tmp/arch_coreutils/pkg-llvm/coreutils/usr/bin/sort -n /tmp/readme.txt
 Time (mean ± σ):     189.7 ms ±   7.5 ms    [User: 825.8 ms, System: 19.9 ms]
 Range (min … max):   182.4 ms … 209.1 ms    10 runs

gcc -march=native -O3 -pipe -fstack-protector-strong -fno-plt

Benchmark 1: /tmp/arch_coreutils/pkg/coreutils/usr/bin/sort -n /tmp/shuffled.txt
 Time (mean ± σ):     182.8 ms ±   5.9 ms    [User: 807.3 ms, System: 22.6 ms]
 Range (min … max):   173.6 ms … 194.8 ms    10 runs

sort -g numbers for comparison

GNU

gcc -march=x86-64 -mtune=generic -O2 ... (Arch package)

Benchmark 1: sort -g /tmp/shuffled.txt
  Time (mean ± σ):     713.8 ms ±  14.5 ms    [User: 3943.0 ms, System: 36.0 ms]
  Range (min … max):   687.3 ms … 737.3 ms    10 runs

clang -march=native -O3 -pipe -fstack-protector-strong -fno-plt

Benchmark 1: /tmp/arch_coreutils/pkg-llvm/coreutils/usr/bin/sort -g /tmp/shuffled.txt
  Time (mean ± σ):     694.7 ms ±  15.0 ms    [User: 3798.5 ms, System: 43.2 ms]
  Range (min … max):   668.0 ms … 712.8 ms    10 runs

gcc -march=native -O3 -pipe -fstack-protector-strong -fno-plt

Benchmark 1: /tmp/arch_coreutils/pkg/coreutils/usr/bin/sort -g /tmp/shuffled.txt
  Time (mean ± σ):     693.9 ms ±   9.8 ms    [User: 3810.0 ms, System: 38.7 ms]
  Range (min … max):   676.5 ms … 709.6 ms    10 runs

uutils

default_release

Benchmark 1: /tmp/fixed_coreutils_defaults sort -g /tmp/shuffled.txt
  Time (mean ± σ):     256.5 ms ±   5.4 ms    [User: 945.8 ms, System: 15.7 ms]
  Range (min … max):   248.7 ms … 266.4 ms    10 runs

codegen-units=1 -C target-cpu=native

Benchmark 1: /tmp/fixed_coreutils_native sort -g /tmp/shuffled.txt
  Time (mean ± σ):     255.8 ms ±   5.0 ms    [User: 952.6 ms, System: 15.9 ms]
  Range (min … max):   249.2 ms … 263.7 ms    10 runs

codegen-units=1 -C target-cpu=native, global_allocator=mimalloc

Benchmark 1: /tmp/fixed_coreutils_native_mimalloc sort -g /tmp/shuffled.txt
  Time (mean ± σ):     245.1 ms ±   7.4 ms    [User: 926.3 ms, System: 18.7 ms]
  Range (min … max):   236.1 ms … 258.0 ms    10 runs

codegen-units=1 -C target-cpu=native, global_allocator=snmalloc

Benchmark 1: /tmp/fixed_coreutils_native_snmalloc sort -g /tmp/shuffled.txt
  Time (mean ± σ):     240.3 ms ±   6.2 ms    [User: 922.7 ms, System: 11.2 ms]
  Range (min … max):   230.3 ms … 249.2 ms    10 runs

@sylvestre
Contributor

Could you please run hyperfine with all the commands at once? See
https://github.com/uutils/coreutils/blob/main/docs/src/performance.md

And what do clang and gcc have to do with this? :)

@github-actions

GNU testsuite comparison:

Skip an intermittent issue tests/misc/stdbuf (fails in this run but passes in the 'main' branch)

@MoSal
Contributor Author

MoSal commented Mar 25, 2025

> Could you please run hyperfine with all the commands at once? See https://github.com/uutils/coreutils/blob/main/docs/src/performance.md

% hyperfine --warmup 3 '/tmp/gnu-sort -n /tmp/shuffled.txt' '/tmp/before_coreutils sort -n /tmp/shuffled.txt' '/tmp/after_coreutils sort -n /tmp/shuffled.txt'
Benchmark 1: /tmp/gnu-sort -n /tmp/shuffled.txt
  Time (mean ± σ):     198.2 ms ±   5.8 ms    [User: 884.6 ms, System: 22.0 ms]
  Range (min … max):   187.3 ms … 207.4 ms    15 runs

Benchmark 2: /tmp/before_coreutils sort -n /tmp/shuffled.txt
  Time (mean ± σ):     361.3 ms ±   8.7 ms    [User: 1898.7 ms, System: 18.9 ms]
  Range (min … max):   350.4 ms … 375.3 ms    10 runs

Benchmark 3: /tmp/after_coreutils sort -n /tmp/shuffled.txt
  Time (mean ± σ):     175.1 ms ±   6.7 ms    [User: 536.8 ms, System: 21.6 ms]
  Range (min … max):   169.3 ms … 197.0 ms    16 runs

Summary
  /tmp/after_coreutils sort -n /tmp/shuffled.txt ran
    1.13 ± 0.05 times faster than /tmp/gnu-sort -n /tmp/shuffled.txt
    2.06 ± 0.09 times faster than /tmp/before_coreutils sort -n /tmp/shuffled.txt

@sylvestre
Contributor

Well done!
How did you generate shuffled.txt?

Could you please add your benchmark: https://github.com/uutils/coreutils/blob/main/src/uu/sort/BENCHMARKING.md

@MoSal
Contributor Author

MoSal commented Mar 25, 2025

> How did you generate shuffled.txt?

A simple

shuf -i 1-1000000 -n 1000000 > /tmp/shuffled.txt

or

seq 1 1000000 | sort -R > /tmp/shuffled.txt

@sylvestre
Contributor

thanks :)

@sylvestre
Contributor

please update the .md and we are good!

@MoSal
Contributor Author

MoSal commented Mar 25, 2025

> please update the .md and we are good!

I'm sorry, but I'm not sure what you're referring to.

@dezgeg
Contributor

dezgeg commented Mar 25, 2025

What happens with integer numbers that cannot be represented precisely as f64? For example 123456789012345678 and 123456789012345679 should parse into identical f64.
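
The collision described here can be checked with a few lines of Rust. This is only an illustrative sketch; `collide_as_f64` is a hypothetical helper and not part of the change. At this magnitude (between 2^56 and 2^57) adjacent `f64` values are 16 apart, so both strings round to the same value.

```rust
/// Returns true when two strings differ as text but parse to the same `f64`.
/// Hypothetical helper for illustration only.
fn collide_as_f64(a: &str, b: &str) -> bool {
    match (a.parse::<f64>(), b.parse::<f64>()) {
        (Ok(x), Ok(y)) => a != b && x == y,
        // If either string is not a valid float there is nothing to compare.
        _ => false,
    }
}
```

Calling `collide_as_f64("123456789012345678", "123456789012345679")` returns `true`, which is why a comparison based only on the parsed floats cannot order these two lines.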

@MoSal
Copy link
Contributor Author

MoSal commented Mar 25, 2025

@dezgeg

Ordering::Equal is not trusted if the lines are not fully equal as strings. This is noted in a comment in the change.

        if let Some(cmp) = a_f64.partial_cmp(b_f64) {
            // don't trust `Ordering::Equal` if lines are not fully equal
            if cmp != Ordering::Equal || a.line == b.line {
                return if global_settings.reverse {
                    cmp.reverse()
                } else {
                    cmp
                };
            }
        }
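
A self-contained sketch of the same tie-break idea, under stated assumptions: `cmp_numeric` and `cmp_digits` are hypothetical names, the exact fallback only handles non-negative decimal integers made of ASCII digits, and the floats are parsed inline here purely to keep the example short (the change itself stores the parsed value ahead of time). The real fallback in uutils is the existing numeric string comparison, which this sketch does not reproduce.

```rust
use std::cmp::Ordering;

/// Exact comparison of two non-negative decimal integers given as digit strings.
/// Hypothetical slow path for this sketch; assumes ASCII digits only.
fn cmp_digits(a: &str, b: &str) -> Ordering {
    let a = a.trim_start_matches('0');
    let b = b.trim_start_matches('0');
    // With leading zeros removed, a longer number is larger; equal lengths
    // compare digit by digit, which matches lexicographic order.
    a.len().cmp(&b.len()).then_with(|| a.cmp(b))
}

/// Fast path on `f64`, exact path when the floats tie but the text differs.
fn cmp_numeric(a: &str, b: &str, reverse: bool) -> Ordering {
    if let (Ok(x), Ok(y)) = (a.parse::<f64>(), b.parse::<f64>()) {
        if let Some(cmp) = x.partial_cmp(&y) {
            // don't trust `Ordering::Equal` if lines are not fully equal
            if cmp != Ordering::Equal || a == b {
                return if reverse { cmp.reverse() } else { cmp };
            }
        }
    }
    // The floats were equal (or unusable) while the text differs: decide exactly.
    let cmp = cmp_digits(a, b);
    if reverse { cmp.reverse() } else { cmp }
}
```

With this shape, `cmp_numeric("123456789012345678", "123456789012345679", false)` takes the slow path and returns `Ordering::Less`, while ordinary inputs such as `"2"` and `"10"` are decided by the float comparison alone.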

@sylvestre
Contributor

@MoSal could you please add your benchmark: https://github.com/uutils/coreutils/blob/main/src/uu/sort/BENCHMARKING.md :)
(not the result but the way you generated the file + the hyperfine command)

MoSal added 2 commits March 26, 2025 12:12
 Numeric sort can be relatively slow on inputs that are wholly or
 mostly numbers. This is more clear when comparing with the speed of
 GeneralNumeric.

 This change parses whole lines as f64 and stores that info in
 `LineData`. This is faster than doing the parsing two lines at
 a time in `compare_by()`.

 # Benchmarks

 `shuf -i 1-1000000 -n 1000000 > /tmp/shuffled.txt`

 % hyperfine --warmup 3 \
     '/tmp/gnu-sort -n /tmp/shuffled.txt' \
     '/tmp/before_coreutils sort -n /tmp/shuffled.txt' \
     '/tmp/after_coreutils sort -n /tmp/shuffled.txt'
 Benchmark 1: /tmp/gnu-sort -n /tmp/shuffled.txt
   Time (mean ± σ):     198.2 ms ±   5.8 ms    [User: 884.6 ms, System: 22.0 ms]
   Range (min … max):   187.3 ms … 207.4 ms    15 runs

 Benchmark 2: /tmp/before_coreutils sort -n /tmp/shuffled.txt
   Time (mean ± σ):     361.3 ms ±   8.7 ms    [User: 1898.7 ms, System: 18.9 ms]
   Range (min … max):   350.4 ms … 375.3 ms    10 runs

 Benchmark 3: /tmp/after_coreutils sort -n /tmp/shuffled.txt
   Time (mean ± σ):     175.1 ms ±   6.7 ms    [User: 536.8 ms, System: 21.6 ms]
   Range (min … max):   169.3 ms … 197.0 ms    16 runs

 Summary
   /tmp/after_coreutils sort -n /tmp/shuffled.txt ran
     1.13 ± 0.05 times faster than /tmp/gnu-sort -n /tmp/shuffled.txt
     2.06 ± 0.09 times faster than /tmp/before_coreutils sort -n /tmp/shuffled.txt

Signed-off-by: Mohammad AlSaleh <CE.Mohammad.AlSaleh@gmail.com>
Signed-off-by: Mohammad AlSaleh <CE.Mohammad.AlSaleh@gmail.com>
@MoSal
Contributor Author

MoSal commented Mar 26, 2025

> @MoSal could you please add your benchmark: https://github.com/uutils/coreutils/blob/main/src/uu/sort/BENCHMARKING.md :) (not the result but the way you generated the file + the hyperfine command)

Done. Also shortened the original commit message, replacing all the redundant benchmarks with the single hyperfine run.

@RenjiSann RenjiSann merged commit fb16585 into uutils:main Apr 1, 2025
67 of 68 checks passed
@RenjiSann
Collaborator

Thank you for your contribution !
