Add scan and search benchmark #111

Merged 1 commit on Nov 3, 2024

naitoh
Contributor

@naitoh naitoh commented Oct 21, 2024

Why?

To improve the parsing process, I would like to add benchmarks for all parsing processes.

scan

  • scan_full(regexp, false, true) == StringScanner#check
  • scan_full(regexp, false, false) == StringScanner#match?
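
As a minimal sketch of the equivalence listed above (assuming Ruby's bundled `strscan`; the sample string is made up for illustration), both methods match at the current scan pointer without advancing it and differ only in what they return:

```ruby
require "strscan"

scanner = StringScanner.new("test string")

# check returns the matched substring and leaves the scan pointer alone.
checked = scanner.check(/\w+/)          # "test"
# match? returns only the match length, so no String is built.
matched_length = scanner.match?(/\w+/)  # 4

# scan_full(pattern, advance_pointer, return_string) is the general form.
same_as_check = scanner.scan_full(/\w+/, false, true)   # "test"
same_as_match = scanner.scan_full(/\w+/, false, false)  # 4

puts checked == same_as_check, matched_length == same_as_match
puts scanner.pos  # still 0
```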

CRuby

$ benchmark-driver benchmark/scan.yaml
Warming up --------------------------------------
          check(reg)    10.558M i/s -     10.848M times in 1.027445s (94.71ns/i)
          check(str)    13.368M i/s -     13.782M times in 1.030978s (74.80ns/i)
         match?(reg)    16.080M i/s -     16.247M times in 1.010340s (62.19ns/i)
         match?(str)    23.336M i/s -     23.501M times in 1.007088s (42.85ns/i)
Calculating -------------------------------------
          check(reg)    11.601M i/s -     31.675M times in 2.730287s (86.20ns/i)
          check(str)    15.217M i/s -     40.104M times in 2.635475s (65.72ns/i)
         match?(reg)    18.781M i/s -     48.241M times in 2.568662s (53.25ns/i)
         match?(str)    29.441M i/s -     70.007M times in 2.377840s (33.97ns/i)

Comparison:
         match?(str):  29441324.5 i/s
         match?(reg):  18780543.7 i/s - 1.57x  slower
          check(str):  15217130.1 i/s - 1.93x  slower
          check(reg):  11601371.2 i/s - 2.54x  slower
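
The "x slower" column is each entry's throughput relative to the fastest entry. A small sketch that recomputes it from the i/s values printed above (the method and variable names are only for illustration):

```ruby
# Returns, for each entry, how many times slower it is than the fastest one.
def slowdown_factors(ips)
  fastest = ips.values.max
  ips.transform_values { |value| (fastest / value).round(2) }
end

# Iterations per second copied from the CRuby "Comparison" output above.
cruby_scan = {
  "match?(str)" => 29_441_324.5,
  "match?(reg)" => 18_780_543.7,
  "check(str)"  => 15_217_130.1,
  "check(reg)"  => 11_601_371.2,
}

slowdown_factors(cruby_scan).each do |name, ratio|
  puts format("%-12s %.2fx", name, ratio)
end
```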

JRuby

$ benchmark-driver benchmark/scan.yaml
Warming up --------------------------------------
          check(reg)     8.129M i/s -      8.090M times in 0.995222s (123.02ns/i)
          check(str)    16.691M i/s -     16.616M times in 0.995519s (59.91ns/i)
         match?(reg)     8.979M i/s -      9.001M times in 1.002440s (111.37ns/i)
         match?(str)    26.138M i/s -     26.011M times in 0.995150s (38.26ns/i)
Calculating -------------------------------------
          check(reg)    11.808M i/s -     24.387M times in 2.065238s (84.69ns/i)
          check(str)    31.762M i/s -     50.072M times in 1.576495s (31.48ns/i)
         match?(reg)    13.944M i/s -     26.936M times in 1.931719s (71.71ns/i)
         match?(str)    50.872M i/s -     78.414M times in 1.541392s (19.66ns/i)

Comparison:
         match?(str):  50872250.2 i/s
          check(str):  31761544.3 i/s - 1.60x  slower
         match?(reg):  13944219.6 i/s - 3.65x  slower
          check(reg):  11808244.1 i/s - 4.31x  slower

search

  • search_full(regexp, false, true) == StringScanner#check_until
  • search_full(regexp, false, false) == StringScanner#exist?
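
A minimal sketch of the search-side equivalence listed above (assuming Ruby's bundled `strscan`; the sample string is illustrative). Only Regexp patterns are used here, because String patterns for these methods need a recent strscan:

```ruby
require "strscan"

scanner = StringScanner.new("Fri Dec 12 1975")

# check_until returns everything from the scan pointer through the end of the match.
substring = scanner.check_until(/12/)  # "Fri Dec 12"
# exist? returns the distance from the scan pointer to the end of the match.
distance = scanner.exist?(/12/)        # 10

# search_full(pattern, advance_pointer, return_string) is the general form.
same_as_check_until = scanner.search_full(/12/, false, true)   # "Fri Dec 12"
same_as_exist       = scanner.search_full(/12/, false, false)  # 10

puts substring == same_as_check_until, distance == same_as_exist
puts scanner.pos  # still 0
```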

CRuby

$ benchmark-driver benchmark/search.yaml
Warming up --------------------------------------
    check_until(reg)     9.338M i/s -      9.456M times in 1.012573s (107.09ns/i)
    check_until(str)    11.385M i/s -     11.979M times in 1.052173s (87.83ns/i)
         exist?(reg)    13.416M i/s -     13.517M times in 1.007532s (74.54ns/i)
         exist?(str)    17.976M i/s -     18.677M times in 1.038981s (55.63ns/i)
Calculating -------------------------------------
    check_until(reg)    10.297M i/s -     28.015M times in 2.720634s (97.11ns/i)
    check_until(str)    12.684M i/s -     34.156M times in 2.692853s (78.84ns/i)
         exist?(reg)    15.184M i/s -     40.249M times in 2.650786s (65.86ns/i)
         exist?(str)    21.426M i/s -     53.928M times in 2.517008s (46.67ns/i)

Comparison:
         exist?(str):  21425527.1 i/s
         exist?(reg):  15183679.9 i/s - 1.41x  slower
    check_until(str):  12684053.7 i/s - 1.69x  slower
    check_until(reg):  10297134.8 i/s - 2.08x  slower

JRuby

$ benchmark-driver benchmark/search.yaml
Warming up --------------------------------------
    check_until(reg)     7.646M i/s -      7.649M times in 1.000381s (130.78ns/i)
    check_until(str)    13.075M i/s -     13.010M times in 0.995048s (76.48ns/i)
         exist?(reg)     8.728M i/s -      8.684M times in 0.994921s (114.57ns/i)
         exist?(str)    20.609M i/s -     20.514M times in 0.995399s (48.52ns/i)
Calculating -------------------------------------
    check_until(reg)     9.371M i/s -     22.939M times in 2.447900s (106.71ns/i)
    check_until(str)    22.760M i/s -     39.225M times in 1.723414s (43.94ns/i)
         exist?(reg)    11.758M i/s -     26.185M times in 2.226997s (85.05ns/i)
         exist?(str)    34.564M i/s -     61.827M times in 1.788749s (28.93ns/i)

Comparison:
         exist?(str):  34564306.2 i/s
    check_until(str):  22759878.4 i/s - 1.52x  slower
         exist?(reg):  11757927.4 i/s - 2.94x  slower
    check_until(reg):   9371009.3 i/s - 3.69x  slower

naitoh added a commit to naitoh/rexml that referenced this pull request Oct 21, 2024
## Why?
`StringScanner#match?()` is faster than `StringScanner#check()`.

See: ruby/strscan#111
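
A hedged sketch of the kind of substitution this describes (the helper below is hypothetical and is not REXML's actual code): when only a yes/no answer is needed, `match?` can stand in for `check`, since it returns an Integer or `nil` instead of building the matched String.

```ruby
require "strscan"

# Hypothetical helper: reports whether a comment starts at the scan pointer.
# match? returns the match length (or nil), so the matched text is never
# materialized as a String the way it would be with check.
def comment_ahead?(scanner)
  !scanner.match?(/<!--/).nil?
end

scanner = StringScanner.new("<!-- note --><root/>")
puts comment_ahead?(scanner)  # true
puts scanner.pos              # 0, the pointer is not advanced
```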

## Benchmark
```
RUBYLIB= BUNDLER_ORIG_RUBYLIB= /Users/naitoh/.rbenv/versions/3.3.4/bin/ruby -v -S benchmark-driver /Users/naitoh/ghq/github.com/naitoh/rexml/benchmark/parse.yaml
ruby 3.3.4 (2024-07-09 revision be1089c8ec) [arm64-darwin22]
Calculating -------------------------------------
                         before       after  before(YJIT)  after(YJIT)
                 dom     18.849      19.403        32.551       34.728 i/s -     100.000 times in 5.305314s 5.153743s 3.072111s 2.879488s
                 sax     27.706      29.435        48.126       52.247 i/s -     100.000 times in 3.609376s 3.397367s 2.077880s 1.913973s
                pull     31.817      33.907        56.941       58.925 i/s -     100.000 times in 3.142961s 2.949250s 1.756193s 1.697082s
              stream     31.120      33.186        52.530       55.816 i/s -     100.000 times in 3.213334s 3.013325s 1.903689s 1.791600s

Comparison:
                              dom
         after(YJIT):        34.7 i/s
        before(YJIT):        32.6 i/s - 1.07x  slower
               after:        19.4 i/s - 1.79x  slower
              before:        18.8 i/s - 1.84x  slower

                              sax
         after(YJIT):        52.2 i/s
        before(YJIT):        48.1 i/s - 1.09x  slower
               after:        29.4 i/s - 1.78x  slower
              before:        27.7 i/s - 1.89x  slower

                             pull
         after(YJIT):        58.9 i/s
        before(YJIT):        56.9 i/s - 1.03x  slower
               after:        33.9 i/s - 1.74x  slower
              before:        31.8 i/s - 1.85x  slower

                           stream
         after(YJIT):        55.8 i/s
        before(YJIT):        52.5 i/s - 1.06x  slower
               after:        33.2 i/s - 1.68x  slower
              before:        31.1 i/s - 1.79x  slower
```

- YJIT=ON : 1.03x - 1.09x faster
- YJIT=OFF : 1.03x - 1.06x faster
@kou
Member

kou commented Oct 22, 2024

I agree that we should add this benchmark, but can we use a more meaningful file name than "full"?
"full" is too implementation-specific and meaningless for this case. (scan_full()/search_full() are named that way because they accept the "full" set of options. The name does not describe any feature.)

@naitoh
Contributor Author

naitoh commented Oct 22, 2024

@kou

but can we use a more meaningful file name than "full"?

How about the following file name?

  • scan_search.yaml

@kou
Member

kou commented Oct 22, 2024

It's better than "full".

Do we need to use one file for all cases?
If we use two files, one for check/match? and one for check_until/exist?, can we use a better name than "scan_search"?

@naitoh naitoh force-pushed the add_full_benchmark branch from abb447d to 056c258 Compare October 22, 2024 23:38
@naitoh
Contributor Author

naitoh commented Oct 22, 2024

I have renamed it to scan_and_search.yaml.

Do we need to use one file for all cases?

In JRuby, the results differ from run to run, so I would like to run everything in one file.

JRuby

$ benchmark-driver benchmark/scan_and_search.yaml
Warming up --------------------------------------
         check(reg1)     7.249M i/s -      7.207M times in 0.994265s (137.95ns/i)
         check(str1)    19.899M i/s -     19.847M times in 0.997392s (50.25ns/i)
   check_until(reg2)     7.918M i/s -      7.918M times in 0.999911s (126.29ns/i)
   check_until(str2)    15.581M i/s -     15.463M times in 0.992425s (64.18ns/i)
        match?(reg1)     8.717M i/s -      8.742M times in 1.002787s (114.71ns/i)
        match?(str1)    25.952M i/s -     25.689M times in 0.989893s (38.53ns/i)
        exist?(reg2)     8.715M i/s -      8.700M times in 0.998353s (114.75ns/i)
        exist?(str2)    20.071M i/s -     19.938M times in 0.993373s (49.82ns/i)
Calculating -------------------------------------
         check(reg1)    12.048M i/s -     21.747M times in 1.804991s (83.00ns/i)
         check(str1)    30.335M i/s -     59.697M times in 1.967916s (32.97ns/i)
   check_until(reg2)     9.803M i/s -     23.755M times in 2.423189s (102.01ns/i)
   check_until(str2)    24.277M i/s -     46.743M times in 1.925398s (41.19ns/i)
        match?(reg1)    13.778M i/s -     26.152M times in 1.898151s (72.58ns/i)
        match?(str1)    51.994M i/s -     77.855M times in 1.497389s (19.23ns/i)
        exist?(reg2)    11.644M i/s -     26.144M times in 2.245393s (85.88ns/i)
        exist?(str2)    32.540M i/s -     60.213M times in 1.850455s (30.73ns/i)

Comparison:
        match?(str1):  51994015.6 i/s 
        exist?(str2):  32539799.8 i/s - 1.60x  slower
         check(str1):  30335149.7 i/s - 1.71x  slower
   check_until(str2):  24277229.9 i/s - 2.14x  slower
        match?(reg1):  13777722.9 i/s - 3.77x  slower
         check(reg1):  12048062.8 i/s - 4.32x  slower
        exist?(reg2):  11643598.9 i/s - 4.47x  slower
   check_until(reg2):   9803153.3 i/s - 5.30x  slower

$ benchmark-driver benchmark/scan_and_search.yaml
Warming up --------------------------------------
         check(reg1)     8.691M i/s -      8.643M times in 0.994453s (115.06ns/i)
         check(str1)    19.273M i/s -     19.116M times in 0.991856s (51.89ns/i)
   check_until(reg2)     7.308M i/s -      7.276M times in 0.995534s (136.83ns/i)
   check_until(str2)    15.596M i/s -     15.458M times in 0.991141s (64.12ns/i)
        match?(reg1)     8.627M i/s -      8.608M times in 0.997741s (115.91ns/i)
        match?(str1)    25.974M i/s -     25.865M times in 0.995829s (38.50ns/i)
        exist?(reg2)     8.417M i/s -      8.475M times in 1.006861s (118.80ns/i)
        exist?(str2)     9.009M i/s -      8.976M times in 0.996368s (111.00ns/i)
Calculating -------------------------------------
         check(reg1)    11.344M i/s -     26.073M times in 2.298316s (88.15ns/i)
         check(str1)    35.082M i/s -     57.819M times in 1.648119s (28.50ns/i)
   check_until(reg2)     9.956M i/s -     21.925M times in 2.202211s (100.44ns/i)
   check_until(str2)    23.878M i/s -     46.788M times in 1.959452s (41.88ns/i)
        match?(reg1)    11.412M i/s -     25.882M times in 2.267979s (87.63ns/i)
        match?(str1)    49.088M i/s -     77.921M times in 1.587374s (20.37ns/i)
        exist?(reg2)    11.628M i/s -     25.252M times in 2.171684s (86.00ns/i)
        exist?(str2)    32.934M i/s -     27.027M times in 0.820653s (30.36ns/i)

Comparison:
        match?(str1):  49087994.1 i/s 
         check(str1):  35081736.9 i/s - 1.40x  slower
        exist?(str2):  32933597.9 i/s - 1.49x  slower
   check_until(str2):  23878238.9 i/s - 2.06x  slower
        exist?(reg2):  11627809.5 i/s - 4.22x  slower
        match?(reg1):  11411983.2 i/s - 4.30x  slower
         check(reg1):  11344492.2 i/s - 4.33x  slower
   check_until(reg2):   9955951.4 i/s - 4.93x  slower

@naitoh naitoh changed the title Add full benchmark Add scan and search benchmark Oct 22, 2024
@kou
Member

kou commented Oct 23, 2024

Do we need to compare check and check_until (match? and exist?)?
They are different operations with different use cases. (check/match? match only at the current scan pointer, while check_until/exist? search forward from the current scan pointer.)
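
A small sketch of that difference (assuming Ruby's bundled `strscan`; the input string and pointer position are made up for illustration):

```ruby
require "strscan"

scanner = StringScanner.new("key: value")
scanner.pos = 3  # the scan pointer now sits on the ":"

# Anchored at the pointer: "value" does not start here, so both return nil.
anchored_string = scanner.check(/value/)
anchored_length = scanner.match?(/value/)

# Searching forward from the pointer: both find "value" further along.
found_string   = scanner.check_until(/value/)  # ": value"
found_distance = scanner.exist?(/value/)       # 7

p anchored_string, anchored_length, found_string, found_distance
```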

I think that JRuby's unstable results show a different problem. We may need to use a longer target string (and/or pattern) so that the target operations dominate the benchmark.
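
One way to picture this suggestion, purely as a hypothetical sketch (it is not the benchmark added in this pull request, and it times with `Process.clock_gettime` rather than benchmark-driver): make the match sit at the end of a longer target so that the search itself accounts for more of each call.

```ruby
require "strscan"

# Hypothetical target: the pattern only matches at the very end, so each
# exist? call has to search the whole string before it succeeds.
def build_target(length)
  ("a" * length) + "b"
end

# Times `iterations` calls of exist? and returns the elapsed seconds.
def time_exist(target, pattern, iterations)
  scanner = StringScanner.new(target)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { scanner.exist?(pattern) }
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

[10, 1_000, 10_000].each do |length|
  elapsed = time_exist(build_target(length), /b/, 1_000)
  puts format("length=%-6d %.6fs", length, elapsed)
end
```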

naitoh added a commit to naitoh/rexml that referenced this pull request Oct 26, 2024
naitoh added a commit to naitoh/rexml that referenced this pull request Oct 26, 2024
## Why?
`StringScanner#match?()` is faster than `StringScanner#check()`.

See: ruby/strscan#111

## Benchmark
```
RUBYLIB= BUNDLER_ORIG_RUBYLIB= /Users/naitoh/.rbenv/versions/3.3.4/bin/ruby -v -S benchmark-driver /Users/naitoh/ghq/github.com/naitoh/rexml/benchmark/parse.yaml
ruby 3.3.4 (2024-07-09 revision be1089c8ec) [arm64-darwin22]
Calculating -------------------------------------
                         before       after  before(YJIT)  after(YJIT)
                 dom     18.819      19.362        32.846       34.708 i/s -     100.000 times in 5.313905s 5.164791s 3.044500s 2.881200s
                 sax     28.188      29.982        48.386       52.554 i/s -     100.000 times in 3.547597s 3.335304s 2.066732s 1.902809s
                pull     31.962      33.902        57.868       60.662 i/s -     100.000 times in 3.128689s 2.949690s 1.728071s 1.648467s
              stream     31.436      33.030        52.808       56.647 i/s -     100.000 times in 3.181095s 3.027574s 1.893635s 1.765304s

Comparison:
                              dom
         after(YJIT):        34.7 i/s
        before(YJIT):        32.8 i/s - 1.06x  slower
               after:        19.4 i/s - 1.79x  slower
              before:        18.8 i/s - 1.84x  slower

                              sax
         after(YJIT):        52.6 i/s
        before(YJIT):        48.4 i/s - 1.09x  slower
               after:        30.0 i/s - 1.75x  slower
              before:        28.2 i/s - 1.86x  slower

                             pull
         after(YJIT):        60.7 i/s
        before(YJIT):        57.9 i/s - 1.05x  slower
               after:        33.9 i/s - 1.79x  slower
              before:        32.0 i/s - 1.90x  slower

                           stream
         after(YJIT):        56.6 i/s
        before(YJIT):        52.8 i/s - 1.07x  slower
               after:        33.0 i/s - 1.72x  slower
              before:        31.4 i/s - 1.80x  slower

```

- YJIT=ON : 1.06x - 1.09x faster
- YJIT=OFF : 1.02x - 1.06x faster

Co-authored-by: Sutou Kouhei <kou@clear-code.com>
kou added a commit to ruby/rexml that referenced this pull request Oct 27, 2024
## Why?
`StringScanner#match?` is faster than `StringScanner#check`.

See: ruby/strscan#111

## Benchmark
```
RUBYLIB= BUNDLER_ORIG_RUBYLIB= /Users/naitoh/.rbenv/versions/3.3.4/bin/ruby -v -S benchmark-driver /Users/naitoh/ghq/github.com/naitoh/rexml/benchmark/parse.yaml
ruby 3.3.4 (2024-07-09 revision be1089c8ec) [arm64-darwin22]
Calculating -------------------------------------
                         before       after  before(YJIT)  after(YJIT)
                 dom     18.819      19.362        32.846       34.708 i/s -     100.000 times in 5.313905s 5.164791s 3.044500s 2.881200s
                 sax     28.188      29.982        48.386       52.554 i/s -     100.000 times in 3.547597s 3.335304s 2.066732s 1.902809s
                pull     31.962      33.902        57.868       60.662 i/s -     100.000 times in 3.128689s 2.949690s 1.728071s 1.648467s
              stream     31.436      33.030        52.808       56.647 i/s -     100.000 times in 3.181095s 3.027574s 1.893635s 1.765304s

Comparison:
                              dom
         after(YJIT):        34.7 i/s
        before(YJIT):        32.8 i/s - 1.06x  slower
               after:        19.4 i/s - 1.79x  slower
              before:        18.8 i/s - 1.84x  slower

                              sax
         after(YJIT):        52.6 i/s
        before(YJIT):        48.4 i/s - 1.09x  slower
               after:        30.0 i/s - 1.75x  slower
              before:        28.2 i/s - 1.86x  slower

                             pull
         after(YJIT):        60.7 i/s
        before(YJIT):        57.9 i/s - 1.05x  slower
               after:        33.9 i/s - 1.79x  slower
              before:        32.0 i/s - 1.90x  slower

                           stream
         after(YJIT):        56.6 i/s
        before(YJIT):        52.8 i/s - 1.07x  slower
               after:        33.0 i/s - 1.72x  slower
              before:        31.4 i/s - 1.80x  slower

```

- YJIT=ON : 1.05x - 1.09x faster
- YJIT=OFF : 1.02x - 1.06x faster

---------

Co-authored-by: Sutou Kouhei <kou@clear-code.com>
@naitoh naitoh force-pushed the add_full_benchmark branch from 056c258 to 085505d Compare November 3, 2024 14:21
@naitoh
Contributor Author

naitoh commented Nov 3, 2024

Do we need to compare check and check_until (match? and exist?)?

It does not seem necessary to compare check and check_until.
I split it into two files.

I think that JRuby's unstable results show a different problem. We may need to use a longer target string (and/or pattern) so that the target operations dominate the benchmark.

This one has been left as is because I could not find a good solution.

@kou kou merged commit 81a80a1 into ruby:master Nov 3, 2024
37 checks passed
@kou
Member

kou commented Nov 3, 2024

Thanks.

@naitoh naitoh deleted the add_full_benchmark branch November 4, 2024 06:49
matzbot pushed a commit to ruby/ruby that referenced this pull request Nov 27, 2024
(ruby/strscan#111)

ruby/strscan@81a80a176b
kou pushed a commit to ruby/rexml that referenced this pull request Dec 19, 2024
…ntil(string)` (#226)

## Why?
`StringScanner#check_until(string)` is faster than
`StringScanner#check_until(regex)`.

See:
- ruby/strscan#106
- ruby/strscan#111
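
A hedged sketch of using the String form (the helper is hypothetical and is not REXML's actual code). It assumes that strscan versions without String support for `check_until` raise `TypeError`, and falls back to an escaped Regexp in that case:

```ruby
require "strscan"

# Hypothetical helper: search for a literal string, preferring the String
# form of check_until. Assumption: older strscan versions that only accept
# a Regexp here raise TypeError, so fall back to an escaped Regexp.
def check_until_literal(scanner, literal)
  scanner.check_until(literal)
rescue TypeError
  scanner.check_until(Regexp.new(Regexp.escape(literal)))
end

scanner = StringScanner.new("<a>text</a>")
puts check_until_literal(scanner, "</a>")
```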

## Benchmark
```
RUBYLIB= BUNDLER_ORIG_RUBYLIB= /Users/naitoh/.rbenv/versions/3.3.4/bin/ruby -v -S benchmark-driver /Users/naitoh/ghq/github.com/naitoh/rexml/benchmark/parse.yaml
ruby 3.3.4 (2024-07-09 revision be1089c8ec) [arm64-darwin22]
Calculating -------------------------------------
                         before       after  before(YJIT)  after(YJIT)
                 dom     19.459      19.840        35.035       35.786 i/s -     100.000 times in 5.139034s 5.040369s 2.854304s 2.794367s
                 sax     30.057      30.026        52.986       53.716 i/s -     100.000 times in 3.326998s 3.330499s 1.887303s 1.861652s
                pull     33.777      34.415        62.294       64.020 i/s -     100.000 times in 2.960622s 2.905668s 1.605284s 1.562002s
              stream     33.789      34.003        60.174       60.411 i/s -     100.000 times in 2.959521s 2.940916s 1.661845s 1.655334s

Comparison:
                              dom
         after(YJIT):        35.8 i/s
        before(YJIT):        35.0 i/s - 1.02x  slower
               after:        19.8 i/s - 1.80x  slower
              before:        19.5 i/s - 1.84x  slower

                              sax
         after(YJIT):        53.7 i/s
        before(YJIT):        53.0 i/s - 1.01x  slower
              before:        30.1 i/s - 1.79x  slower
               after:        30.0 i/s - 1.79x  slower

                             pull
         after(YJIT):        64.0 i/s
        before(YJIT):        62.3 i/s - 1.03x  slower
               after:        34.4 i/s - 1.86x  slower
              before:        33.8 i/s - 1.90x  slower

                           stream
         after(YJIT):        60.4 i/s
        before(YJIT):        60.2 i/s - 1.00x  slower
               after:        34.0 i/s - 1.78x  slower
              before:        33.8 i/s - 1.79x  slower

```

- YJIT=ON : 1.00x - 1.03x faster
- YJIT=OFF : 1.00x - 1.02x faster