Use StringScanner#peek_byte to get double or single quotation mark
## Why?
`StringScanner#peek_byte` is fast because it returns the next byte as an Integer and does not allocate a String object.
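
As a rough illustration of the idea (a minimal sketch, not the REXML code itself), the helper below checks the next byte as an Integer when `peek_byte` is available and otherwise falls back to a regexp scan. The helper name `quote_at` and the sample inputs are hypothetical; `peek_byte` and `scan_byte` are assumed to be present only on strscan 3.1.1 or later.

```ruby
require "strscan"

# Hypothetical helper: consumes and returns the quote character at the
# scan pointer, or returns nil if the next byte is not a quote.
def quote_at(scanner)
  if scanner.respond_to?(:peek_byte)
    # The next byte is compared as an Integer; no substring is cut
    # out of the input just to inspect it.
    case scanner.peek_byte
    when 34 # '"'.ord
      scanner.scan_byte
      '"'
    when 39 # "'".ord
      scanner.scan_byte
      "'"
    end
  else
    # Older strscan: the regexp scan builds a new String for the match.
    scanner.scan(/['"]/)
  end
end

double = StringScanner.new(%q("value"))
bare = StringScanner.new("value")

p quote_at(double) # the double quote is consumed and returned
p double.pos       # the scan pointer has advanced past the quote
p quote_at(bare)   # nil, and the scan pointer does not move
```

Both branches return the same values, so callers do not need to know which strscan version is installed.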

## Benchmark
```
RUBYLIB= BUNDLER_ORIG_RUBYLIB= /Users/naitoh/.rbenv/versions/3.3.4/bin/ruby -v -S benchmark-driver /Users/naitoh/ghq/github.com/naitoh/rexml/benchmark/parse.yaml
ruby 3.3.4 (2024-07-09 revision be1089c8ec) [arm64-darwin22]
Calculating -------------------------------------
                         before       after  before(YJIT)  after(YJIT)
                 dom     19.753      19.888        35.641       35.928 i/s -     100.000 times in 5.062402s 5.028121s 2.805792s 2.783339s
                 sax     30.349      30.978        53.485       57.885 i/s -     100.000 times in 3.295012s 3.228103s 1.869671s 1.727567s
                pull     34.170      35.436        61.713       66.534 i/s -     100.000 times in 2.926534s 2.821955s 1.620404s 1.502996s
              stream     33.121      35.268        60.751       63.276 i/s -     100.000 times in 3.019222s 2.835443s 1.646065s 1.580374s

Comparison:
                              dom
         after(YJIT):        35.9 i/s
        before(YJIT):        35.6 i/s - 1.01x  slower
               after:        19.9 i/s - 1.81x  slower
              before:        19.8 i/s - 1.82x  slower

                              sax
         after(YJIT):        57.9 i/s
        before(YJIT):        53.5 i/s - 1.08x  slower
               after:        31.0 i/s - 1.87x  slower
              before:        30.3 i/s - 1.91x  slower

                             pull
         after(YJIT):        66.5 i/s
        before(YJIT):        61.7 i/s - 1.08x  slower
               after:        35.4 i/s - 1.88x  slower
              before:        34.2 i/s - 1.95x  slower

                           stream
         after(YJIT):        63.3 i/s
        before(YJIT):        60.8 i/s - 1.04x  slower
               after:        35.3 i/s - 1.79x  slower
              before:        33.1 i/s - 1.91x  slower

```
- YJIT=ON : 1.01x - 1.08x faster
- YJIT=OFF : 1.01x - 1.06x faster
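
For reference, the speedup factors can be recomputed from the i/s columns of the table above as `after / before`. The following is a small sketch of that arithmetic; the names `RESULTS` and `speedup` are made up for illustration, and the numbers are copied from the benchmark output.

```ruby
# Iterations per second, copied from the benchmark table.
RESULTS = {
  dom:    { before: 19.753, after: 19.888, before_yjit: 35.641, after_yjit: 35.928 },
  sax:    { before: 30.349, after: 30.978, before_yjit: 53.485, after_yjit: 57.885 },
  pull:   { before: 34.170, after: 35.436, before_yjit: 61.713, after_yjit: 66.534 },
  stream: { before: 33.121, after: 35.268, before_yjit: 60.751, after_yjit: 63.276 },
}.freeze

# Hypothetical helper: how many times faster `after` is than `before`,
# rounded to two decimal places.
def speedup(before, after)
  (after / before).round(2)
end

RESULTS.each do |name, r|
  off = speedup(r[:before], r[:after])
  on = speedup(r[:before_yjit], r[:after_yjit])
  puts format("%-6s YJIT=OFF %.2fx  YJIT=ON %.2fx", name, off, on)
end
```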

Co-authored-by: Sutou Kouhei <kou@clear-code.com>
naitoh and kou committed Dec 20, 2024
1 parent bb0bedd commit 4956690
Showing 2 changed files with 28 additions and 2 deletions.
22 changes: 20 additions & 2 deletions lib/rexml/parsers/baseparser.rb
@@ -766,6 +766,25 @@ def process_instruction
         [:processing_instruction, name, content]
       end

+      if StringScanner::Version < "3.1.1"
+        def scan_quote
+          @source.match(/(['"])/, true)&.[](1)
+        end
+      else
+        def scan_quote
+          case @source.peek_byte
+          when 34 # '"'.ord
+            @source.scan_byte
+            '"'
+          when 39 # "'".ord
+            @source.scan_byte
+            "'"
+          else
+            nil
+          end
+        end
+      end
+
       def parse_attributes(prefixes)
         attributes = {}
         expanded_names = {}
@@ -785,11 +804,10 @@ def parse_attributes(prefixes)
               message = "Missing attribute equal: <#{name}>"
               raise REXML::ParseException.new(message, @source)
             end
-            unless match = @source.match(/(['"])/, true)
+            unless quote = scan_quote
               message = "Missing attribute value start quote: <#{name}>"
               raise REXML::ParseException.new(message, @source)
             end
-            quote = match[1]
             start_position = @source.position
             value = @source.read_until(quote)
             unless value.chomp!(quote)
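
A side note on the version gate in this file: it compares version strings directly, which is fine for the strscan versions involved here. String comparison is lexicographic, though, so a sketch of the same check using `Gem::Version` (an alternative technique, not what this commit does) might look like the following. The names `version_at_least?` and `HAS_BYTE_API` are hypothetical.

```ruby
require "strscan"
require "rubygems" # provides Gem::Version; usually loaded already

# Lexicographic comparison looks at characters, not numeric segments.
p "10.0.0" < "3.1.1" # true, because "1" sorts before "3"

# Hypothetical helper: compares version segments numerically.
def version_at_least?(current, minimum)
  Gem::Version.new(current) >= Gem::Version.new(minimum)
end

p version_at_least?("10.0.0", "3.1.1") # true
p version_at_least?("3.0.9", "3.1.1")  # false

# Hypothetical constant: whether the installed strscan is new enough
# for peek_byte and scan_byte.
HAS_BYTE_API = version_at_least?(StringScanner::Version, "3.1.1")
p HAS_BYTE_API
```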
8 changes: 8 additions & 0 deletions lib/rexml/source.rb
@@ -158,6 +158,14 @@ def position=(pos)
       @scanner.pos = pos
     end

+    def peek_byte
+      @scanner.peek_byte
+    end
+
+    def scan_byte
+      @scanner.scan_byte
+    end
+
     # @return true if the Source is exhausted
     def empty?
       @scanner.eos?