Commit 52a255b

naitoh and kou committed
Optimize BaseParser#unnormalize method to replace "\r\n" with "\n" only when "\r\n" is included
## Why?

See: ruby#158 (comment)

## Benchmark

```
RUBYLIB= BUNDLER_ORIG_RUBYLIB= /Users/naitoh/.rbenv/versions/3.3.3/bin/ruby -v -S benchmark-driver /Users/naitoh/ghq/github.com/naitoh/rexml/benchmark/parse.yaml
ruby 3.3.3 (2024-06-12 revision f1c7b6f435) [arm64-darwin22]
Calculating -------------------------------------
                         before       after  before(YJIT)  after(YJIT)
                 dom     17.674      17.567        32.759       32.316 i/s -     100.000 times in 5.657973s 5.692371s 3.052595s 3.094448s
                 sax     25.261      25.377        48.889       49.911 i/s -     100.000 times in 3.958626s 3.940640s 2.045460s 2.003575s
                pull     28.968      29.121        61.584       61.774 i/s -     100.000 times in 3.452132s 3.433967s 1.623789s 1.618809s
              stream     28.395      28.803        55.289       57.970 i/s -     100.000 times in 3.521761s 3.471812s 1.808673s 1.725029s

Comparison:
                              dom
        before(YJIT):        32.8 i/s
         after(YJIT):        32.3 i/s - 1.01x  slower
              before:        17.7 i/s - 1.85x  slower
               after:        17.6 i/s - 1.86x  slower

                              sax
         after(YJIT):        49.9 i/s
        before(YJIT):        48.9 i/s - 1.02x  slower
               after:        25.4 i/s - 1.97x  slower
              before:        25.3 i/s - 1.98x  slower

                             pull
         after(YJIT):        61.8 i/s
        before(YJIT):        61.6 i/s - 1.00x  slower
               after:        29.1 i/s - 2.12x  slower
              before:        29.0 i/s - 2.13x  slower

                           stream
         after(YJIT):        58.0 i/s
        before(YJIT):        55.3 i/s - 1.05x  slower
               after:        28.8 i/s - 2.01x  slower
              before:        28.4 i/s - 2.04x  slower
```

- YJIT=ON : 0.98x - 1.05x faster
- YJIT=OFF : 0.98x - 1.02x faster

---------

Co-authored-by: Sutou Kouhei <kou@clear-code.com>
1 parent 78b2913 commit 52a255b

2 files changed: +26 -1 lines changed

lib/rexml/parsers/baseparser.rb

+5 -1

@@ -511,7 +511,11 @@ def normalize( input, entities=nil, entity_filter=nil )
 
       # Unescapes all possible entities
       def unnormalize( string, entities=nil, filter=nil )
-        rv = string.gsub( Private::CARRIAGE_RETURN_NEWLINE_PATTERN, "\n" )
+        if string.include?("\r")
+          rv = string.gsub( Private::CARRIAGE_RETURN_NEWLINE_PATTERN, "\n" )
+        else
+          rv = string.dup
+        end
         matches = rv.scan( REFERENCE_RE )
         return rv if matches.size == 0
         rv.gsub!( Private::CHARACTER_REFERENCES ) {
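
The guard in the hunk above can be tried in isolation. The following is a minimal sketch, not the REXML implementation: `normalize_line_breaks` is a hypothetical helper that mirrors only the first step of `unnormalize`, and `CR_NEWLINE_PATTERN` is an assumed stand-in for `Private::CARRIAGE_RETURN_NEWLINE_PATTERN`, whose definition is not part of this diff.

```ruby
# Assumption: stand-in for Private::CARRIAGE_RETURN_NEWLINE_PATTERN.
# The real constant is defined in REXML and is not shown in this commit.
CR_NEWLINE_PATTERN = /\r\n?/

# Hypothetical helper: only the line-break step of unnormalize.
def normalize_line_breaks(string)
  if string.include?("\r")
    # Slow path: a carriage return is present, so run the regexp substitution.
    string.gsub(CR_NEWLINE_PATTERN, "\n")
  else
    # Fast path: a plain substring check found no "\r", so skip the regexp.
    # Still return a copy, because the caller mutates the result in place.
    string.dup
  end
end

plain   = "B\n"
windows = "C\r\n"

p normalize_line_breaks(plain)               # "B\n"
p normalize_line_breaks(windows)             # "C\n"
p normalize_line_breaks(plain).equal?(plain) # false (a copy, not the same object)
```

The `else` branch uses `string.dup` rather than returning `string` itself because `unnormalize` later calls `rv.gsub!`, which would otherwise modify the caller's string.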

test/test_pullparser.rb

+21

@@ -82,6 +82,27 @@ def test_character_references
       end
     end
 
+    def test_text_content_with_line_breaks
+      source = "<root><a>A</a><b>B\n</b><c>C\r\n</c></root>"
+      parser = REXML::Parsers::PullParser.new( source )
+
+      events = {}
+      element_name = ''
+      while parser.has_next?
+        event = parser.pull
+        case event.event_type
+        when :start_element
+          element_name = event[0]
+        when :text
+          events[element_name] = event[1]
+        end
+      end
+
+      assert_equal('A', events['a'])
+      assert_equal("B\n", events['b'])
+      assert_equal("C\n", events['c'])
+    end
+
     def test_peek_unshift
       source = "<a><b/></a>"
       REXML::Parsers::PullParser.new(source)
