Commit 1d0c362

Optimize IOSource#read_until method (#210)
## Why?

The result of `encode(term)` can be cached.

## Benchmark

```
RUBYLIB= BUNDLER_ORIG_RUBYLIB= /Users/naitoh/.rbenv/versions/3.3.4/bin/ruby -v -S benchmark-driver /Users/naitoh/ghq/github.com/naitoh/rexml/benchmark/parse.yaml
ruby 3.3.4 (2024-07-09 revision be1089c8ec) [arm64-darwin22]
Calculating -------------------------------------
                         before       after  before(YJIT)  after(YJIT)
                 dom     17.546      18.512        32.282       32.306 i/s -     100.000 times in 5.699323s 5.402026s 3.097658s 3.095448s
                 sax     25.435      28.294        47.526       50.074 i/s -     100.000 times in 3.931613s 3.534310s 2.104122s 1.997057s
                pull     29.471      31.870        54.400       57.554 i/s -     100.000 times in 3.393211s 3.137793s 1.838222s 1.737494s
              stream     29.169      31.153        51.613       52.898 i/s -     100.000 times in 3.428318s 3.209941s 1.937508s 1.890424s

Comparison:
                              dom
         after(YJIT):        32.3 i/s
        before(YJIT):        32.3 i/s - 1.00x  slower
               after:        18.5 i/s - 1.75x  slower
              before:        17.5 i/s - 1.84x  slower

                              sax
         after(YJIT):        50.1 i/s
        before(YJIT):        47.5 i/s - 1.05x  slower
               after:        28.3 i/s - 1.77x  slower
              before:        25.4 i/s - 1.97x  slower

                             pull
         after(YJIT):        57.6 i/s
        before(YJIT):        54.4 i/s - 1.06x  slower
               after:        31.9 i/s - 1.81x  slower
              before:        29.5 i/s - 1.95x  slower

                           stream
         after(YJIT):        52.9 i/s
        before(YJIT):        51.6 i/s - 1.02x  slower
               after:        31.2 i/s - 1.70x  slower
              before:        29.2 i/s - 1.81x  slower
```

- YJIT=ON : 1.00x - 1.06x faster
- YJIT=OFF : 1.05x - 1.11x faster
1 parent 622011f commit 1d0c362

2 files changed: +36 -1 lines changed

lib/rexml/source.rb

+2 -1
@@ -77,6 +77,7 @@ def initialize(arg, encoding=nil)
         detect_encoding
       end
       @line = 0
+      @term_encord = {}
     end
 
     # The current buffer (what we're going to read next)
@@ -227,7 +228,7 @@ def read(term = nil, min_bytes = 1)
 
     def read_until(term)
       pattern = Private::PRE_DEFINED_TERM_PATTERNS[term] || /#{Regexp.escape(term)}/
-      term = encode(term)
+      term = @term_encord[term] ||= encode(term)
       until str = @scanner.scan_until(pattern)
         break if @source.nil?
         break if @source.eof?
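
The change above memoizes the encoded terminator in a per-instance Hash using `||=`. A minimal sketch of that pattern follows, assuming a hypothetical `TermEncoder` class that stands in for the source object; the class, its `encode_calls` counter, and the use of `String#encode` are illustrative assumptions, not REXML's actual implementation.

```ruby
# Hypothetical stand-in for the source object: only the caching idea is modeled.
class TermEncoder
  attr_reader :encode_calls

  def initialize(encoding)
    @encoding = encoding
    @cache = {}        # term => encoded term (plays the role of @term_encord)
    @encode_calls = 0  # counts how often the expensive step actually runs
  end

  # Returns the encoded terminator, encoding each distinct term only once.
  def encoded(term)
    @cache[term] ||= encode(term)
  end

  private

  # Assumed expensive step; here it simply transcodes the string.
  def encode(term)
    @encode_calls += 1
    term.encode(@encoding)
  end
end

encoder = TermEncoder.new("UTF-16LE")
3.times { encoder.encoded(">") }
puts encoder.encode_calls            # 1
puts encoder.encoded(">").bytesize   # 2
```

Because `||=` only assigns when the cached value is `nil` or `false`, and an encoded String is always truthy, each distinct terminator is encoded at most once per instance.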

test/test_document.rb

+34
@@ -403,6 +403,40 @@ def test_utf_16
           assert_equal(expected_xml, actual_xml)
         end
       end
+
+      class ReadUntilTest < Test::Unit::TestCase
+        def test_utf_8
+          xml = <<-EOX.force_encoding("ASCII-8BIT")
+<?xml version="1.0" encoding="UTF-8"?>
+<message testing=">">Hello world!</message>
+          EOX
+          document = REXML::Document.new(xml)
+          assert_equal("UTF-8", document.encoding)
+          assert_equal(">", REXML::XPath.match(document, "/message")[0].attribute("testing").value)
+        end
+
+        def test_utf_16le
+          xml = <<-EOX.encode("UTF-16LE").force_encoding("ASCII-8BIT")
+<?xml version="1.0" encoding="UTF-16"?>
+<message testing=">">Hello world!</message>
+          EOX
+          bom = "\ufeff".encode("UTF-16LE").force_encoding("ASCII-8BIT")
+          document = REXML::Document.new(bom + xml)
+          assert_equal("UTF-16", document.encoding)
+          assert_equal(">", REXML::XPath.match(document, "/message")[0].attribute("testing").value)
+        end
+
+        def test_utf_16be
+          xml = <<-EOX.encode("UTF-16BE").force_encoding("ASCII-8BIT")
+<?xml version="1.0" encoding="UTF-16"?>
+<message testing=">">Hello world!</message>
+          EOX
+          bom = "\ufeff".encode("UTF-16BE").force_encoding("ASCII-8BIT")
+          document = REXML::Document.new(bom + xml)
+          assert_equal("UTF-16", document.encoding)
+          assert_equal(">", REXML::XPath.match(document, "/message")[0].attribute("testing").value)
+        end
+      end
     end
   end
 end
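
The tests above hand REXML raw bytes (`ASCII-8BIT`) with a byte order mark prepended. As a rough illustration, not part of the commit, the sketch below prints what the `>` terminator and the BOM look like at the byte level in each UTF-16 variant; this is presumably why the terminator has to be encoded before it is compared against the buffer. The variable names are chosen here for illustration only.

```ruby
# Illustration only: byte-level view of the inputs the tests construct.
term = ">"

# The terminator as it appears inside UTF-16 encoded input.
term_le = term.encode("UTF-16LE").force_encoding("ASCII-8BIT")
term_be = term.encode("UTF-16BE").force_encoding("ASCII-8BIT")

# The byte order mark built the same way the tests build it.
bom_le = "\ufeff".encode("UTF-16LE").force_encoding("ASCII-8BIT")
bom_be = "\ufeff".encode("UTF-16BE").force_encoding("ASCII-8BIT")

puts term_le.unpack1("H*")  # 3e00
puts term_be.unpack1("H*")  # 003e
puts bom_le.unpack1("H*")   # fffe
puts bom_be.unpack1("H*")   # feff
```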
