Benchmark

Yasuhiro Yamada edited this page Jun 25, 2020 · 4 revisions

Preconditions for the benchmark

Environment

  • Platform: AWS t3.medium (vCPU x 2, Memory 4 GiB)

  • Storage: EBS volume gp2 / 200 GiB (600 IOPS)

$ cat /etc/issue
Ubuntu 18.04.1 LTS \n \l

$ uname -r -v -m -o
5.3.0-1019-aws #21~18.04.1-Ubuntu SMP Mon May 11 12:33:03 UTC 2020 x86_64 GNU/Linux
$ sed --version
sed (GNU sed) 4.4
$ awk --version
GNU Awk 4.1.4, API: 1.1 (GNU MPFR 4.0.1, GNU MP 6.1.2)
$ zsh --version
zsh 5.4.2 (x86_64-ubuntu-linux-gnu)

teip was built with cargo build --release --target x86_64-unknown-linux-musl.

$ teip --version
teip: 1.2.0

$ ldd $(which teip)
        not a dynamic executable

File to be processed

$ wc test_secure
  1078333  13068857 104857674 test_secure

$ cat test_secure
May 26 03:19:26 localhost sshd[17872]: Received disconnect from 192.0.2.152 port 29864:11:  [preauth]
May 26 03:19:26 localhost sshd[17872]: Disconnected from 192.0.2.78 port 29864 [preauth]
May 26 03:21:10 localhost sshd[17927]: Invalid user amavis1 from 192.0.2.148 port 53364
May 26 03:21:10 localhost sshd[17927]: input_userauth_request: invalid user amavis1 [preauth]
May 26 03:21:10 localhost sshd[17927]: Received disconnect from 192.0.2.189 port 53364:11: Bye Bye [preauth]
...

$ grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}' test_secure | wc -l
761231

Benchmark method

The benchmark measures the time taken to mask every IP address in the file.

  • Replace every IP address in the file with @@@.@@@.@@@.@@@, like this:
May 26 03:19:26 localhost sshd[17872]: Received disconnect from @@@.@@@.@@@.@@@ port 29864:11:  [preauth]
May 26 03:19:26 localhost sshd[17872]: Disconnected from @@@.@@@.@@@.@@@ port 29864 [preauth]
...
  • Print the result to /dev/null during the benchmark

  • Clear the page cache beforehand.

  • The regular expression ([0-9]{1,3}\.){3}[0-9]{1,3} is used to match the IP address

  • Input is given by the redirection < test_secure on Zsh (the awk(gsub) runs recorded in the details pass the file name as an argument instead)

  • time and pv commands are used to measure the actual processing time

  • Each case is run three times and the mean is calculated

  • Here are the cases for benchmarking

  • (1) awk(gsub)

$ awk '{gsub("([0-9]{1,3}\\.){3}[0-9]{1,3}","@@@.@@@.@@@.@@@",$0);print}' < test_secure
  • (2) sed(s//)
$ sed -r 's/([0-9]{1,3}\.){3}[0-9]{1,3}/@@@.@@@.@@@.@@@/g' < test_secure
  • (3) teip + awk(gsub)
$ teip -og '([0-9]{1,3}\.){3}[0-9]{1,3}' -- awk '{gsub("([0-9]{1,3}\\.){3}[0-9]{1,3}","@@@.@@@.@@@.@@@",$0);print}' < test_secure
  • (4) teip + sed(s//)
$ teip -og '([0-9]{1,3}\.){3}[0-9]{1,3}' -- sed -r 's/([0-9]{1,3}\.){3}[0-9]{1,3}/@@@.@@@.@@@.@@@/g' < test_secure
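
The procedure in the list above (clear the page cache, discard the output, run three times) can be sketched as a small script. This is only an illustration, not the script used for this page: the function name bench and the DROP_CACHES switch are hypothetical, and it measures wall-clock time with GNU date instead of the time and pv commands used here.

```shell
#!/bin/sh
# Sketch of the measurement loop. "bench" and "DROP_CACHES" are illustrative
# names, not part of teip or of the original benchmark.

# bench RUNS CMD: run the shell command line CMD RUNS times and print the
# elapsed wall-clock seconds of each run, one per line.
bench() {
    runs=$1
    cmd=$2
    i=1
    while [ "$i" -le "$runs" ]; do
        # Clearing the page cache needs root, so it is opt-in in this sketch.
        if [ "${DROP_CACHES:-0}" = 1 ]; then
            sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
        fi
        start=$(date +%s.%N)        # GNU date: seconds.nanoseconds
        sh -c "$cmd" > /dev/null    # discard the output, as in the benchmark
        end=$(date +%s.%N)
        awk -v s="$start" -v e="$end" 'BEGIN { printf "%.3f\n", e - s }'
        i=$((i + 1))
    done
}

# Example (assumes test_secure is in the current directory):
# DROP_CACHES=1 bench 3 "sed -r 's/([0-9]{1,3}\.){3}[0-9]{1,3}/@@@.@@@.@@@.@@@/g' < test_secure"
```

Each printed value corresponds to one of the 1st/2nd/3rd columns; because the timing method differs, absolute numbers from this sketch are not directly comparable with the ones recorded on this page.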

However, cases (3) and (4) may be unfair to teip, because the same regular expression is applied twice: once by teip and once by the target command. The following two cases apply the regular expression only once per execution; the target commands just print @@@.@@@.@@@.@@@.

  • (5) teip + awk(only print)
$ teip -og '([0-9]{1,3}\.){3}[0-9]{1,3}' -- awk '{print "@@@.@@@.@@@.@@@"}' < test_secure
  • (6) teip + sed(i text)
$ teip -og '([0-9]{1,3}\.){3}[0-9]{1,3}' -- sed -n 'i@@@.@@@.@@@.@@@' < test_secure
  • Check that all the results are the same before the benchmark
$ awk '{gsub("([0-9]{1,3}\\.){3}[0-9]{1,3}","@@@.@@@.@@@.@@@",$0);print}' < test_secure > by_awk
$ sed -r 's/([0-9]{1,3}\.){3}[0-9]{1,3}/@@@.@@@.@@@.@@@/g' < test_secure > by_sed
$ teip -og '([0-9]{1,3}\.){3}[0-9]{1,3}' -- awk '{gsub("([0-9]{1,3}\\.){3}[0-9]{1,3}","@@@.@@@.@@@.@@@",$0);print}' < test_secure > by_teip_awk
$ teip -og '([0-9]{1,3}\.){3}[0-9]{1,3}' -- sed -r 's/([0-9]{1,3}\.){3}[0-9]{1,3}/@@@.@@@.@@@.@@@/g' < test_secure > by_teip_sed
$ teip -og '([0-9]{1,3}\.){3}[0-9]{1,3}' -- awk '{print "@@@.@@@.@@@.@@@"}' < test_secure > by_teip_awk2
$ teip -og '([0-9]{1,3}\.){3}[0-9]{1,3}' -- sed -n 'i@@@.@@@.@@@.@@@' < test_secure > by_teip_sed2
$ md5sum by_*
f6a06ada3478e650a01731325f262508  by_awk
f6a06ada3478e650a01731325f262508  by_sed
f6a06ada3478e650a01731325f262508  by_teip_awk
f6a06ada3478e650a01731325f262508  by_teip_awk2
f6a06ada3478e650a01731325f262508  by_teip_sed
f6a06ada3478e650a01731325f262508  by_teip_sed2
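
Comparing the digests by eye works for six files; the same check can be automated. A minimal sketch, assuming GNU md5sum is available and using a hypothetical helper name all_same:

```shell
#!/bin/sh
# all_same FILE...: succeed only if every file has the same MD5 digest.
# ("all_same" is an illustrative name, not an existing command.)
all_same() {
    # md5sum prints "<digest>  <name>"; keep the digest and count distinct values.
    distinct=$(md5sum "$@" | awk '{ print $1 }' | sort -u | wc -l)
    [ "$distinct" -eq 1 ]
}

# Example: all_same by_* && echo "all outputs are identical"
```

The function returns a non-zero status as soon as one output differs, so it can gate the benchmark run in a script.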

Benchmark result

case                    1st(sec)  2nd(sec)  3rd(sec)  mean(sec)  MiB/sec
awk(gsub)                  8.753     8.204     8.212      8.390   11.919
sed(s//)                   5.430     5.436     5.312      5.393   18.544
teip + awk(gsub)           4.248     4.383     4.288      4.306   23.222
teip + sed(s//)            3.871     3.886     3.628      3.795   26.350
teip + awk(only print)     2.099     2.303     1.916      2.106   47.483
teip + sed(i text)         1.798     1.831     1.878      1.836   54.476
  • The mean values are rounded to three decimal places.
  • MiB/sec = 104857674 / 2^20 / mean, where 104857674 is the size of test_secure in bytes
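
The two derived columns can be recomputed from the three measurements. The sketch below uses a hypothetical helper name, summarize; it divides by the unrounded mean, which is what the table values are consistent with (for sed(s//), dividing by the rounded 5.393 would give 18.543 rather than 18.544).

```shell
#!/bin/sh
# summarize BYTES T1 T2 T3: print the mean time (sec) and the throughput (MiB/sec).
# ("summarize" is an illustrative name.)
summarize() {
    awk -v bytes="$1" -v a="$2" -v b="$3" -v c="$4" 'BEGIN {
        mean = (a + b + c) / 3
        # 2^20 = 1048576 bytes per MiB; divide by the unrounded mean
        printf "%.3f %.3f\n", mean, bytes / 1048576 / mean
    }'
}

summarize 104857674 8.753 8.204 8.212   # awk(gsub) row, prints: 8.390 11.919
```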

Here are the details.

awk(gsub)

$ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
$ time awk '{gsub("([0-9]{1,3}\\.){3}[0-9]{1,3}","@@@.@@@.@@@.@@@",$0);print}' test_secure | pv >/dev/null
 103MiB 0:00:08 [11.8MiB/s] [                  <=>                                                                                  ]
awk '{gsub("([0-9]{1,3}\\.){3}[0-9]{1,3}","@@@.@@@.@@@.@@@",$0);print}'   8.41s user 0.17s system 98% cpu 8.753 total
pv > /dev/null  0.08s user 0.32s system 4% cpu 8.752 total
$ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
$ time awk '{gsub("([0-9]{1,3}\\.){3}[0-9]{1,3}","@@@.@@@.@@@.@@@",$0);print}' test_secure | pv >/dev/null
 103MiB 0:00:08 [12.6MiB/s] [                  <=>                                                                                  ]
awk '{gsub("([0-9]{1,3}\\.){3}[0-9]{1,3}","@@@.@@@.@@@.@@@",$0);print}'   7.96s user 0.19s system 99% cpu 8.204 total
pv > /dev/null  0.05s user 0.30s system 4% cpu 8.203 total
$ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
$ time awk '{gsub("([0-9]{1,3}\\.){3}[0-9]{1,3}","@@@.@@@.@@@.@@@",$0);print}' test_secure | pv >/dev/null
 103MiB 0:00:08 [12.6MiB/s] [                  <=>                                                                                  ]
awk '{gsub("([0-9]{1,3}\\.){3}[0-9]{1,3}","@@@.@@@.@@@.@@@",$0);print}'   7.94s user 0.17s system 98% cpu 8.212 total
pv > /dev/null  0.03s user 0.19s system 2% cpu 8.210 total

sed(s//)

$ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
$ time sed -r 's/([0-9]{1,3}\.){3}[0-9]{1,3}/@@@.@@@.@@@.@@@/g' < test_secure | pv > /dev/null
 103MiB 0:00:05 [19.0MiB/s] [            <=>                                                                                        ]
sed -r 's/([0-9]{1,3}\.){3}[0-9]{1,3}/@@@.@@@.@@@.@@@/g' < test_secure  5.21s user 0.19s system 99% cpu 5.430 total
pv > /dev/null  0.06s user 0.35s system 7% cpu 5.428 total
$ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
$ time sed -r 's/([0-9]{1,3}\.){3}[0-9]{1,3}/@@@.@@@.@@@.@@@/g' < test_secure | pv > /dev/null
 103MiB 0:00:05 [19.0MiB/s] [            <=>                                                                                        ]
sed -r 's/([0-9]{1,3}\.){3}[0-9]{1,3}/@@@.@@@.@@@.@@@/g' < test_secure  5.26s user 0.16s system 99% cpu 5.436 total
pv > /dev/null  0.08s user 0.35s system 7% cpu 5.436 total
$ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
$ time sed -r 's/([0-9]{1,3}\.){3}[0-9]{1,3}/@@@.@@@.@@@.@@@/g' < test_secure | pv > /dev/null
 103MiB 0:00:05 [19.5MiB/s] [            <=>                                                                                        ]
sed -r 's/([0-9]{1,3}\.){3}[0-9]{1,3}/@@@.@@@.@@@.@@@/g' < test_secure  5.11s user 0.20s system 99% cpu 5.312 total
pv > /dev/null  0.12s user 0.23s system 6% cpu 5.312 total

teip + awk(gsub)

$ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
$ time teip -og '([0-9]{1,3}\.){3}[0-9]{1,3}' -- awk '{gsub("([0-9]{1,3}\\.){3}[0-9]{1,3}","@@@.@@@.@@@.@@@",$0);print}' < test_secure | pv > /dev/null
 103MiB 0:00:04 [24.4MiB/s] [          <=>                                                                                          ]
teip -og '([0-9]{1,3}\.){3}[0-9]{1,3}' -- awk  < test_secure  3.11s user 0.21s system 78% cpu 4.248 total
pv > /dev/null  0.02s user 0.10s system 2% cpu 4.247 total
$ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
$ time teip -og '([0-9]{1,3}\.){3}[0-9]{1,3}' -- awk '{gsub("([0-9]{1,3}\\.){3}[0-9]{1,3}","@@@.@@@.@@@.@@@",$0);print}' < test_secure | pv > /dev/null
 103MiB 0:00:04 [23.6MiB/s] [          <=>                                                                                          ]
teip -og '([0-9]{1,3}\.){3}[0-9]{1,3}' -- awk  < test_secure  3.25s user 0.20s system 78% cpu 4.383 total
pv > /dev/null  0.01s user 0.08s system 1% cpu 4.382 total
$ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
$ time teip -og '([0-9]{1,3}\.){3}[0-9]{1,3}' -- awk '{gsub("([0-9]{1,3}\\.){3}[0-9]{1,3}","@@@.@@@.@@@.@@@",$0);print}' < test_secure | pv > /dev/null
 103MiB 0:00:04 [24.1MiB/s] [          <=>                                                                                          ]
teip -og '([0-9]{1,3}\.){3}[0-9]{1,3}' -- awk  < test_secure  3.08s user 0.23s system 77% cpu 4.288 total
pv > /dev/null  0.02s user 0.09s system 2% cpu 4.288 total

teip + sed(s//)

$ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
$ time teip -og '([0-9]{1,3}\.){3}[0-9]{1,3}' -- sed -r 's/([0-9]{1,3}\.){3}[0-9]{1,3}/@@@.@@@.@@@.@@@/g' < test_secure | pv > /dev/null
 103MiB 0:00:03 [27.1MiB/s] [        <=>                                                                                            ]
teip -og '([0-9]{1,3}\.){3}[0-9]{1,3}' -- sed -r  < test_secure  3.26s user 0.22s system 89% cpu 3.871 total
pv > /dev/null  0.02s user 0.09s system 2% cpu 3.869 total
$ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
$ time teip -og '([0-9]{1,3}\.){3}[0-9]{1,3}' -- sed -r 's/([0-9]{1,3}\.){3}[0-9]{1,3}/@@@.@@@.@@@.@@@/g' < test_secure | pv > /dev/null
 103MiB 0:00:03 [26.6MiB/s] [        <=>                                                                                            ]
teip -og '([0-9]{1,3}\.){3}[0-9]{1,3}' -- sed -r  < test_secure  3.45s user 0.16s system 92% cpu 3.886 total
pv > /dev/null  0.03s user 0.10s system 3% cpu 3.886 total
$ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
$ time teip -og '([0-9]{1,3}\.){3}[0-9]{1,3}' -- sed -r 's/([0-9]{1,3}\.){3}[0-9]{1,3}/@@@.@@@.@@@.@@@/g' < test_secure | pv > /dev/null
 103MiB 0:00:03 [28.5MiB/s] [        <=>                                                                                            ]
teip -og '([0-9]{1,3}\.){3}[0-9]{1,3}' -- sed -r  < test_secure  3.31s user 0.17s system 96% cpu 3.628 total
pv > /dev/null  0.02s user 0.10s system 3% cpu 3.628 total

teip + awk(only print)

$ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
$ time teip -og '([0-9]{1,3}\.){3}[0-9]{1,3}' -- awk '{print "@@@.@@@.@@@.@@@"}' < test_secure | pv > /dev/null
 103MiB 0:00:02 [49.4MiB/s] [      <=>                                                                                              ]
teip -og '([0-9]{1,3}\.){3}[0-9]{1,3}' -- awk '{print "@@@.@@@.@@@.@@@"}' <   2.64s user 0.23s system 136% cpu 2.099 total
pv > /dev/null  0.03s user 0.07s system 4% cpu 2.099 total
$ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
$ time teip -og '([0-9]{1,3}\.){3}[0-9]{1,3}' -- awk '{print "@@@.@@@.@@@.@@@"}' < test_secure | pv > /dev/null
 103MiB 0:00:02 [44.9MiB/s] [      <=>                                                                                              ]
teip -og '([0-9]{1,3}\.){3}[0-9]{1,3}' -- awk '{print "@@@.@@@.@@@.@@@"}' <   3.09s user 0.18s system 141% cpu 2.303 total
pv > /dev/null  0.04s user 0.07s system 4% cpu 2.303 total
$ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
$ time teip -og '([0-9]{1,3}\.){3}[0-9]{1,3}' -- awk '{print "@@@.@@@.@@@.@@@"}' < test_secure | pv > /dev/null
 103MiB 0:00:01 [54.1MiB/s] [    <=>                                                                                                ]
teip -og '([0-9]{1,3}\.){3}[0-9]{1,3}' -- awk '{print "@@@.@@@.@@@.@@@"}' <   2.59s user 0.26s system 148% cpu 1.916 total
pv > /dev/null  0.02s user 0.10s system 5% cpu 1.916 total

teip + sed(i text)

$ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
$ time teip -og '([0-9]{1,3}\.){3}[0-9]{1,3}' -- sed -n 'i@@@.@@@.@@@.@@@' < test_secure | pv > /dev/null
 103MiB 0:00:01 [57.6MiB/s] [    <=>                                                                                                ]
teip -og '([0-9]{1,3}\.){3}[0-9]{1,3}' -- sed -n 'i@@@.@@@.@@@.@@@' <   2.46s user 0.21s system 148% cpu 1.798 total
pv > /dev/null  0.02s user 0.09s system 6% cpu 1.797 total
$ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
$ time teip -og '([0-9]{1,3}\.){3}[0-9]{1,3}' -- sed -n 'i@@@.@@@.@@@.@@@' < test_secure | pv > /dev/null
 103MiB 0:00:01 [56.7MiB/s] [    <=>                                                                                                ]
teip -og '([0-9]{1,3}\.){3}[0-9]{1,3}' -- sed -n 'i@@@.@@@.@@@.@@@' <   2.52s user 0.19s system 147% cpu 1.831 total
pv > /dev/null  0.03s user 0.09s system 6% cpu 1.830 total
$ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
$ time teip -og '([0-9]{1,3}\.){3}[0-9]{1,3}' -- sed -n 'i@@@.@@@.@@@.@@@' < test_secure | pv > /dev/null
 103MiB 0:00:01 [55.2MiB/s] [    <=>                                                                                                ]
teip -og '([0-9]{1,3}\.){3}[0-9]{1,3}' -- sed -n 'i@@@.@@@.@@@.@@@' <   2.46s user 0.27s system 145% cpu 1.878 total
pv > /dev/null  0.04s user 0.08s system 6% cpu 1.878 total
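
The per-run values in the result table are the "total" field that zsh's time prints for the measured command. If the transcripts above are saved to a file, those values can be pulled out with a short filter. This is a sketch with a hypothetical function name, elapsed_totals, and it assumes the log has exactly the line layout shown on this page.

```shell
#!/bin/sh
# elapsed_totals: read saved zsh `time` output on stdin and print the elapsed
# ("total") seconds of each measured command, skipping the lines for pv.
# ("elapsed_totals" is an illustrative name.)
elapsed_totals() {
    awk '$NF == "total" && $1 != "pv" { print $(NF - 1) }'
}

# Example: elapsed_totals < saved_log   (saved_log is a hypothetical file name)
```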