Benchmark is wrong #52

Open
qezz opened this issue Aug 2, 2019 · 3 comments
Labels
C-docs Category: Anything and everything related to documentation

Comments

qezz commented Aug 2, 2019

Issue

In the benchmark, sd is run without the -p option, so it never prints its result to stdout, whereas the sed commands print everything to stdout.

Also, once sd "(\w+)" "$1$1" dump.json >/dev/null has run, every word in the file is deleted. This happens because the shell expands $1 inside double quotes to an empty string, and sd performs the replacement in place.
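
The quoting problem can be reproduced without sd at all. A minimal sketch, assuming a POSIX shell with no positional parameters set (which is what a freshly spawned benchmark shell would normally see); the variable names are for illustration only:

```shell
# Clear positional parameters so $1 is unset, as in a freshly spawned shell.
set --

double="$1$1"      # double quotes: the shell expands $1 to an empty string
single='$1$1'      # single quotes: passed through literally
escaped="\$1\$1"   # escaped dollars inside double quotes: also literal

printf 'double=[%s]\n'  "$double"
printf 'single=[%s]\n'  "$single"
printf 'escaped=[%s]\n' "$escaped"
```

Only the second and third forms hand the text $1$1 to the program being run.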

Experiment

Here is a run on a minimal example:

➜  ~/tmp echo '{"hello": "world"}' > test.txt

➜  ~/tmp cat test.txt
{"hello": "world"}

➜  ~/tmp hyperfine 'sd "(\w+)" "$1$1" test.txt'
Benchmark #1: sd "(\w+)" "$1$1" test.txt
  Time (mean ± σ):       6.7 ms ±   1.1 ms    [User: 3.2 ms, System: 1.8 ms]
  Range (min … max):     5.6 ms …  12.2 ms    245 runs

➜  ~/tmp cat test.txt
{"": ""}

Note the second cat output: every word is gone.
This is why every run of sd except the first is so fast: after the first run there are no words left to replace, so sd does nothing but read the file.

For a fair comparison with sed, the following command should be used:

hyperfine 'sd -p "(\w+)" "\$1\$1" test.txt > /dev/null'

Please note the escaped groups \$1 and the preview option -p.
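
A quick sanity check for any such benchmark is that the input file is byte-identical before and after the command runs. A rough sketch, assuming sd is on the PATH and that -p leaves the file untouched as described above; the scratch file is created only for this illustration:

```shell
# Create a small throwaway input file.
scratch=$(mktemp)
printf '%s\n' '{"hello": "world"}' > "$scratch"

before=$(cksum < "$scratch")

# Run sd in preview mode only if it is installed; single quotes keep $1 literal.
if command -v sd >/dev/null 2>&1; then
    sd -p '(\w+)' '$1$1' "$scratch" >/dev/null || echo "sd exited with an error" >&2
fi

after=$(cksum < "$scratch")

# Compare checksums taken before and after the run.
if [ "$before" = "$after" ]; then
    result="input unchanged"
else
    result="input was modified"
fi
echo "$result"
rm -f "$scratch"
```

If the check reports a modified input, later benchmark runs are no longer measuring the same work as the first one.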

Experiment Results

Here are my results for a 120 MB file:

➜  ~/tmp l dump.json
.rw-r--r--@ 120M sergey  2 Aug 22:20 dump.json

➜  ~/tmp hyperfine \
'sed -E "s:(\w+):\1\1:g" dump.json >/dev/null' \
"sed 's:\(\w\+\):\1\1:g' dump.json >/dev/null" \
'sd -p "(\w+)" "\$1\$1" dump.json >/dev/null'
Benchmark #1: sed -E "s:(\w+):\1\1:g" dump.json >/dev/null
  Time (mean ± σ):      5.724 s ±  0.056 s    [User: 5.489 s, System: 0.146 s]
  Range (min … max):    5.656 s …  5.849 s    10 runs

Benchmark #2: sed 's:\(\w\+\):\1\1:g' dump.json >/dev/null
  Time (mean ± σ):      2.614 s ±  0.034 s    [User: 2.493 s, System: 0.084 s]
  Range (min … max):    2.569 s …  2.676 s    10 runs

Benchmark #3: sd -p "(\w+)" "\$1\$1" dump.json >/dev/null
  Time (mean ± σ):     12.590 s ±  0.216 s    [User: 12.087 s, System: 0.303 s]
  Range (min … max):   12.403 s … 13.150 s    10 runs

Summary
  'sed 's:\(\w\+\):\1\1:g' dump.json >/dev/null' ran
    2.19 ± 0.04 times faster than 'sed -E "s:(\w+):\1\1:g" dump.json >/dev/null'
    4.82 ± 0.10 times faster than 'sd -p "(\w+)" "\$1\$1" dump.json >/dev/null'

➜  ~/tmp l dump.json
.rw-r--r--@ 120M sergey  2 Aug 22:20 dump.json

Thoughts

Even with the benchmark fixed, I think we are capped by pipe throughput.

UPD: OK, apparently the pipe is not the problem.

Platform

MBP 2015, 2.7 GHz Intel Core i5

qezz (Author) commented Aug 2, 2019

So, an important update!

Even my benchmark above is broken: sed on macOS is BSD sed, not the GNU sed found on Linux. I therefore switched to gsed (GNU sed):

➜  ~/tmp hyperfine \
'gsed -E "s:(\w+):\1\1:g" dump.json >/dev/null' \
"gsed 's:\(\w\+\):\1\1:g' dump.json >/dev/null" \
'sd -p "(\w+)" "\$1\$1" dump.json >/dev/null'
Benchmark #1: gsed -E "s:(\w+):\1\1:g" dump.json >/dev/null
  Time (mean ± σ):     39.251 s ±  2.217 s    [User: 37.303 s, System: 0.765 s]
  Range (min … max):   37.511 s … 43.916 s    10 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark #2: gsed 's:\(\w\+\):\1\1:g' dump.json >/dev/null
  Time (mean ± σ):     37.544 s ±  0.723 s    [User: 36.282 s, System: 0.594 s]
  Range (min … max):   36.911 s … 38.991 s    10 runs

Benchmark #3: sd -p "(\w+)" "\$1\$1" dump.json >/dev/null
  Time (mean ± σ):     12.599 s ±  0.183 s    [User: 12.076 s, System: 0.307 s]
  Range (min … max):   12.430 s … 12.940 s    10 runs

Summary
  'sd -p "(\w+)" "\$1\$1" dump.json >/dev/null' ran
    2.98 ± 0.07 times faster than 'gsed 's:\(\w\+\):\1\1:g' dump.json >/dev/null'
    3.12 ± 0.18 times faster than 'gsed -E "s:(\w+):\1\1:g" dump.json >/dev/null'
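
The ratios in the summary are the gsed mean times divided by sd's mean time. As a quick check using the figures reported above (awk is used here only for the floating-point division; the variable names are illustrative):

```shell
# Mean times in seconds, copied from the hyperfine output above.
sd_mean=12.599
gsed_basic_mean=37.544
gsed_extended_mean=39.251

ratio_basic=$(awk -v a="$gsed_basic_mean" -v b="$sd_mean" 'BEGIN { printf "%.2f", a / b }')
ratio_extended=$(awk -v a="$gsed_extended_mean" -v b="$sd_mean" 'BEGIN { printf "%.2f", a / b }')

echo "gsed (basic regex) / sd:    $ratio_basic"     # 2.98
echo "gsed (extended regex) / sd: $ratio_extended"  # 3.12
```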

sd is about 3 times faster than gsed, but that is still far from the advertised 11x.
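
Because the sed flavor changed the outcome this much, a benchmark script may want to record which sed it is actually running. A small sketch, assuming a reasonably recent GNU sed whose --version output starts with "sed (GNU sed)"; anything else, including the BSD sed shipped with macOS (which rejects --version), is reported as "other":

```shell
# GNU sed answers --version with a first line such as "sed (GNU sed) 4.8".
# BSD sed rejects the option, so the pipeline produces no matching line.
if sed --version 2>/dev/null | head -n 1 | grep -q '^sed (GNU sed)'; then
    sed_flavor="gnu"
else
    sed_flavor="other"
fi
echo "sed flavor: $sed_flavor"
```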

chmln (Owner) commented Aug 3, 2019

Hello @qezz

> In the benchmark, sd is run without the -p option, so it never prints its result to stdout, whereas the sed commands print everything to stdout.

The command for the benchmark was written before the -p option was introduced.

I'm glad you tried to replicate the results. I will investigate potential performance regressions as soon as I get some free time.

Linus789 (Contributor) commented May 7, 2021

I tried to replicate the results as well, including at commit 324fd1c, where the benchmarks were added to README.md, but without success: I can't reach the advertised 11x either. As @qezz already mentioned, the benchmark itself seems to be wrong:

> Also, once sd "(\w+)" "$1$1" dump.json >/dev/null has run, every word in the file is deleted. This happens because the shell expands $1 inside double quotes to an empty string, and sd performs the replacement in place.

My benchmark for the commit 324fd1c:

Benchmark #1: sed -i -E "s:(\w+):\1\1:g" dump.json
  Time (mean ± σ):      7.791 s ±  0.076 s    [User: 7.583 s, System: 0.166 s]
  Range (min … max):    7.723 s …  7.935 s    10 runs
 
Benchmark #2: sed -i 's:\(\w\+\):\1\1:g' dump.json
  Time (mean ± σ):      7.877 s ±  0.157 s    [User: 7.672 s, System: 0.160 s]
  Range (min … max):    7.712 s …  8.121 s    10 runs
 
Benchmark #3: sd -i "(\w+)" "\$1\$1" dump.json
  Time (mean ± σ):      4.292 s ±  0.040 s    [User: 3.983 s, System: 0.271 s]
  Range (min … max):    4.240 s …  4.372 s    10 runs
 
Summary
  'sd -i "(\w+)" "\$1\$1" dump.json' ran
    1.82 ± 0.02 times faster than 'sed -i -E "s:(\w+):\1\1:g" dump.json'
    1.84 ± 0.04 times faster than 'sed -i 's:\(\w\+\):\1\1:g' dump.json'
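
Since the -i runs rewrite dump.json, each run sees a different input unless the file is restored in between; the output above does not show whether that was done. One possible setup, sketched under the assumption that a pristine copy is kept as dump.orig.json (a hypothetical name), uses hyperfine's --prepare option, which runs a command before every timing run. The benchmarked commands are copied from the run above and may need adjusting for other sed or sd versions:

```shell
# Command that restores the input before each timing run (hypothetical file name).
prepare_cmd='cp dump.orig.json dump.json'

if command -v hyperfine >/dev/null 2>&1 && [ -f dump.orig.json ]; then
    hyperfine --prepare "$prepare_cmd" \
        'sed -i -E "s:(\w+):\1\1:g" dump.json' \
        'sd -i "(\w+)" "\$1\$1" dump.json'
else
    echo "hyperfine or dump.orig.json not found; skipping" >&2
fi
```

The time spent in the prepare command is not included in the reported timings, so the copy does not skew the comparison.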

@CosmicHorrorDev CosmicHorrorDev self-assigned this May 12, 2023
@CosmicHorrorDev CosmicHorrorDev added the M-needs triage Meta: Maintainer label me! label May 17, 2023
@CosmicHorrorDev CosmicHorrorDev added this to the v0.8.0 Release milestone May 17, 2023
@CosmicHorrorDev CosmicHorrorDev added C-docs Category: Anything and everything related to documentation and removed M-needs triage Meta: Maintainer label me! labels Oct 22, 2023