Benchmark is wrong #52

qezz · 2019-08-02T19:57:34Z

Issue

In the benchmark, I noticed that you weren't using the -p option, which makes sd print everything to stdout. The sed commands, however, do print everything to stdout.

Also, once sd "(\w+)" "$1$1" dump.json >/dev/null has run, every word in the file is deleted. This happens because the shell replaces $1 with an empty string, and sd performs the replacement in place.
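To see why, here is a minimal sketch. It assumes hyperfine hands each command string to sh -c (its default on Unix-like systems, as far as I know), so no positional parameters are set. printf stands in for sd and only shows the argument sd would receive:

```shell
#!/bin/sh
# With `sh -c '...'` and no extra arguments, $1 is unset, so inside
# double quotes "$1$1" expands to the empty string.
sh -c 'printf "[%s]\n" "$1$1"'      # prints []

# Escaping the dollar signs keeps them literal, so sd would receive $1$1.
sh -c 'printf "[%s]\n" "\$1\$1"'    # prints [$1$1]

# The pattern itself survives: a backslash before "w" is kept in double quotes.
sh -c 'printf "[%s]\n" "(\w+)"'     # prints [(\w+)]
```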

Experiment

Here is my run on a small file:

➜  ~/tmp echo '{"hello": "world"}' > test.txt

➜  ~/tmp cat test.txt
{"hello": "world"}

➜  ~/tmp hyperfine 'sd "(\w+)" "$1$1" test.txt'
Benchmark #1: sd "(\w+)" "$1$1" test.txt
  Time (mean ± σ):       6.7 ms ±   1.1 ms    [User: 3.2 ms, System: 1.8 ms]
  Range (min … max):     5.6 ms …  12.2 ms    245 runs

➜  ~/tmp cat test.txt
{"": ""}

Note the second cat output: the words are gone.
This is why almost every run of sd except the first is so fast: after the first run there is nothing left to replace, so sd does nothing but read the file.
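To catch this kind of mistake before timing anything, one can check that a benchmarked command leaves its input file intact. Below is a minimal POSIX sh sketch; assert_unchanged is a hypothetical helper (not part of sd or hyperfine) that compares cksum output before and after a single run:

```shell
#!/bin/sh
# Hypothetical helper (not part of sd or hyperfine): run a command once and
# fail if it modified FILE, comparing POSIX cksum output before and after.
# Usage: assert_unchanged FILE CMD [ARGS...]
assert_unchanged() {
  file=$1
  shift
  before=$(cksum < "$file")
  "$@" > /dev/null
  after=$(cksum < "$file")
  if [ "$before" != "$after" ]; then
    echo "error: $file was modified by: $*" >&2
    return 1
  fi
}
```

For example, assert_unchanged test.txt sd -p '(\w+)' '$1$1' test.txt should succeed, while the same call without -p should fail, because sd then rewrites test.txt in place. Typed directly at the prompt, the single quotes keep $1 literal.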

For a fair comparison with sed, the following command should be used:

hyperfine 'sd -p "(\w+)" "\$1\$1" test.txt > /dev/null'

Note the escaped groups (\$1) and the preview option (-p).

Experiment Results

Here are my results for a 120 MB file:

➜  ~/tmp l dump.json
.rw-r--r--@ 120M sergey  2 Aug 22:20 dump.json

➜  ~/tmp hyperfine \
'sed -E "s:(\w+):\1\1:g" dump.json >/dev/null' \
"sed 's:\(\w\+\):\1\1:g' dump.json >/dev/null" \
'sd -p "(\w+)" "\$1\$1" dump.json >/dev/null'
Benchmark #1: sed -E "s:(\w+):\1\1:g" dump.json >/dev/null
  Time (mean ± σ):      5.724 s ±  0.056 s    [User: 5.489 s, System: 0.146 s]
  Range (min … max):    5.656 s …  5.849 s    10 runs

Benchmark #2: sed 's:\(\w\+\):\1\1:g' dump.json >/dev/null
  Time (mean ± σ):      2.614 s ±  0.034 s    [User: 2.493 s, System: 0.084 s]
  Range (min … max):    2.569 s …  2.676 s    10 runs

Benchmark #3: sd -p "(\w+)" "\$1\$1" dump.json >/dev/null
  Time (mean ± σ):     12.590 s ±  0.216 s    [User: 12.087 s, System: 0.303 s]
  Range (min … max):   12.403 s … 13.150 s    10 runs

Summary
  'sed 's:\(\w\+\):\1\1:g' dump.json >/dev/null' ran
    2.19 ± 0.04 times faster than 'sed -E "s:(\w+):\1\1:g" dump.json >/dev/null'
    4.82 ± 0.10 times faster than 'sd -p "(\w+)" "\$1\$1" dump.json >/dev/null'

➜  ~/tmp l dump.json
.rw-r--r--@ 120M sergey  2 Aug 22:20 dump.json
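The Summary figures above follow from the mean times: each "times faster" value is the ratio of two means, and its ± term matches first-order error propagation of the two standard deviations. That this is how hyperfine computes it is an assumption, but it reproduces the numbers shown. A small sketch using awk; rel_speed is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: ratio of two hyperfine means and its propagated
# uncertainty, r * sqrt((s1/m1)^2 + (s2/m2)^2).
# Usage: rel_speed SLOW_MEAN SLOW_SD FAST_MEAN FAST_SD
rel_speed() {
  awk -v m1="$1" -v s1="$2" -v m2="$3" -v s2="$4" 'BEGIN {
    r = m1 / m2
    e = r * sqrt((s1 / m1) ^ 2 + (s2 / m2) ^ 2)
    printf "%.2f ± %.2f\n", r, e
  }'
}

rel_speed 5.724 0.056 2.614 0.034    # sed -E vs. sed (BRE): 2.19 ± 0.04
rel_speed 12.590 0.216 2.614 0.034   # sd -p vs. sed (BRE):  4.82 ± 0.10
```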

Thoughts

~~Even if we fixed the benchmark, I do think that we are capped by pipe throughput.~~

UPD: OK, apparently the pipe is not a problem.

Platform

MBP 2015, 2.7 GHz Intel Core i5


qezz · 2019-08-02T23:55:47Z

So, an important update!

Even my benchmark above is broken: sed on macOS (a BSD sed) is not the same as GNU sed on Linux. Therefore, I switched to gsed (GNU sed):

➜  ~/tmp hyperfine \
'gsed -E "s:(\w+):\1\1:g" dump.json >/dev/null' \
"gsed 's:\(\w\+\):\1\1:g' dump.json >/dev/null" \
'sd -p "(\w+)" "\$1\$1" dump.json >/dev/null'
Benchmark #1: gsed -E "s:(\w+):\1\1:g" dump.json >/dev/null
  Time (mean ± σ):     39.251 s ±  2.217 s    [User: 37.303 s, System: 0.765 s]
  Range (min … max):   37.511 s … 43.916 s    10 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark #2: gsed 's:\(\w\+\):\1\1:g' dump.json >/dev/null
  Time (mean ± σ):     37.544 s ±  0.723 s    [User: 36.282 s, System: 0.594 s]
  Range (min … max):   36.911 s … 38.991 s    10 runs

Benchmark #3: sd -p "(\w+)" "\$1\$1" dump.json >/dev/null
  Time (mean ± σ):     12.599 s ±  0.183 s    [User: 12.076 s, System: 0.307 s]
  Range (min … max):   12.430 s … 12.940 s    10 runs

Summary
  'sd -p "(\w+)" "\$1\$1" dump.json >/dev/null' ran
    2.98 ± 0.07 times faster than 'gsed 's:\(\w\+\):\1\1:g' dump.json >/dev/null'
    3.12 ± 0.18 times faster than 'gsed -E "s:(\w+):\1\1:g" dump.json >/dev/null'

sd is about 3 times faster than gsed, but that is still far from the advertised 11x.
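Since BSD sed and GNU sed can interpret the same expression differently, it helps to confirm that every command in a benchmark produces the intended output on a tiny sample before timing it. A minimal POSIX sh sketch; check_output is a hypothetical helper, and the command it wraps must read from stdin:

```shell
#!/bin/sh
# Hypothetical helper: feed SAMPLE to a command on stdin and fail unless its
# output equals EXPECTED (trailing newlines are stripped by $(...)).
# Usage: check_output SAMPLE EXPECTED CMD [ARGS...]
check_output() {
  sample=$1
  expected=$2
  shift 2
  actual=$(printf '%s\n' "$sample" | "$@")
  if [ "$actual" != "$expected" ]; then
    printf 'mismatch for %s: got %s\n' "$*" "$actual" >&2
    return 1
  fi
}
```

With GNU sed, check_output '{"hello": "world"}' '{"hellohello": "worldworld"}' gsed -E 's:(\w+):\1\1:g' should succeed; a sed that does not understand \w would fail the check instead of silently producing a fast but meaningless timing.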

chmln · 2019-08-03T01:27:23Z

Hello @qezz

> In benchmark, I've noticed that you weren't using -p option for sd to print everything to stdout. Nevertheless, sed-commands print everything to stdout.

The command for the benchmark was written before the -p option was introduced.

I'm glad you tried to replicate the results. I will investigate potential performance regressions as soon as I get some free time.

Linus789 · 2021-05-07T16:15:18Z

I tried to replicate the results as well, even at commit 324fd1c, where the benchmarks were added to the README.md, but without success: I can't reach the advertised 11x either. As @qezz already mentioned, it seems the benchmark is wrong:

> Also, once sd "(\w+)" "$1$1" dump.json >/dev/null is performed, every word in file is deleted. This happens because $1 is replaced by shell with (empty string) and sd performs 'in-place' (or 'inline') replacement.

My benchmark at commit 324fd1c:

Benchmark #1: sed -i -E "s:(\w+):\1\1:g" dump.json
  Time (mean ± σ):      7.791 s ±  0.076 s    [User: 7.583 s, System: 0.166 s]
  Range (min … max):    7.723 s …  7.935 s    10 runs
 
Benchmark #2: sed -i 's:\(\w\+\):\1\1:g' dump.json
  Time (mean ± σ):      7.877 s ±  0.157 s    [User: 7.672 s, System: 0.160 s]
  Range (min … max):    7.712 s …  8.121 s    10 runs
 
Benchmark #3: sd -i "(\w+)" "\$1\$1" dump.json
  Time (mean ± σ):      4.292 s ±  0.040 s    [User: 3.983 s, System: 0.271 s]
  Range (min … max):    4.240 s …  4.372 s    10 runs
 
Summary
  'sd -i "(\w+)" "\$1\$1" dump.json' ran
    1.82 ± 0.02 times faster than 'sed -i -E "s:(\w+):\1\1:g" dump.json'
    1.84 ± 0.04 times faster than 'sed -i 's:\(\w\+\):\1\1:g' dump.json'

CosmicHorrorDev self-assigned this May 12, 2023

CosmicHorrorDev added the M-needs triage (Meta: Maintainer label me!) label May 17, 2023

CosmicHorrorDev added this to the v0.8.0 Release milestone May 17, 2023

CosmicHorrorDev modified the milestones: v1.0.0 Release, v1.1.0 Release Aug 19, 2023

zamazan4ik mentioned this issue Oct 7, 2023 in Evaluate Profile-Guided Optimization (PGO) and LLVM BOLT #237 (Open)

CosmicHorrorDev added the C-docs (Category: Anything and everything related to documentation) label and removed the M-needs triage (Meta: Maintainer label me!) label Oct 22, 2023
