parser.c: optimize json_string_unescape #894

samyron · 2025-11-16T03:21:49Z

This PR optimizes json_string_unescape.

Two commits:

Use ARM Neon to scan for \. While scanning, copy the current chunk to the output.
Add a fast path when unescaping a single character.
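
To make the two commits concrete, here is a minimal sketch of the idea, not the code from this PR. The names `copy_until_backslash`, `simple_escape` and `unescape_sketch` are hypothetical. It assumes the output buffer holds at least as many bytes as the input, handles only single-character escapes (`\uXXXX` is left out), and falls back to a scalar loop when AArch64 NEON is unavailable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#define SKETCH_HAVE_NEON 1
#endif

/* Hypothetical helper: copy bytes from src to dst until the first '\\' or
 * until len bytes are consumed. Returns the number of bytes before the
 * backslash. dst must have room for len bytes. */
static size_t copy_until_backslash(const char *src, size_t len, char *dst)
{
    size_t i = 0;
#ifdef SKETCH_HAVE_NEON
    const uint8x16_t backslash = vdupq_n_u8('\\');
    while (len - i >= 16) {
        uint8x16_t chunk = vld1q_u8((const uint8_t *)(src + i));
        /* Copy while scanning: store the whole chunk unconditionally.
         * Bytes at or after a backslash are overwritten later. */
        vst1q_u8((uint8_t *)(dst + i), chunk);
        /* Lanes equal to '\\' become 0xFF; a non-zero max means a hit. */
        if (vmaxvq_u8(vceqq_u8(chunk, backslash)) != 0) break;
        i += 16;
    }
#endif
    /* Scalar tail, which also locates the backslash inside a hit chunk. */
    while (i < len && src[i] != '\\') {
        dst[i] = src[i];
        i++;
    }
    return i;
}

/* Fast path for single-character escapes. Returns 0 for anything else
 * (for example 'u', which this sketch does not decode). */
static char simple_escape(char c)
{
    switch (c) {
    case '"':  return '"';
    case '\\': return '\\';
    case '/':  return '/';
    case 'b':  return '\b';
    case 'f':  return '\f';
    case 'n':  return '\n';
    case 'r':  return '\r';
    case 't':  return '\t';
    default:   return 0;
    }
}

/* Unescape len bytes of src into dst and return the output length.
 * Assumes src and dst do not overlap and dst has room for len bytes. */
static size_t unescape_sketch(const char *src, size_t len, char *dst)
{
    assert(src != NULL && dst != NULL);
    size_t in = 0, out = 0;
    while (in < len) {
        size_t n = copy_until_backslash(src + in, len - in, dst + out);
        in += n;
        out += n;
        if (in + 1 >= len) break; /* done, or a lone trailing backslash */
        char c = simple_escape(src[in + 1]);
        dst[out++] = c ? c : src[in + 1]; /* unknown escapes pass through */
        in += 2;
    }
    return out;
}
```

Calling `unescape_sketch(in, in_len, out)` on the bytes `a\nb` (with a literal backslash) should write `a`, a newline, and `b`, and return 3. Storing each chunk before testing it keeps the hot loop free of a second pass over the same bytes, at the cost of writing up to 16 scratch bytes that are later overwritten.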

If this PR is accepted, I will follow up with an SSE2 implementation.

Benchmarks

Run on a MacBook Air M1.

twitterescaped.json is from simdjson-data.

== Parsing activitypub.json (58160 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +PRISM [arm64-darwin24]
Warming up --------------------------------------
               after     1.103k i/100ms
Calculating -------------------------------------
               after     11.143k (± 0.8%) i/s   (89.74 μs/i) -     56.253k in   5.048516s

Comparison:
              before:    10366.8 i/s
               after:    11143.2 i/s - 1.07x  faster

== Parsing twitterescaped.json (562408 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +PRISM [arm64-darwin24]
Warming up --------------------------------------
               after    73.000 i/100ms
Calculating -------------------------------------
               after    737.341 (± 0.9%) i/s    (1.36 ms/i) -      3.723k in   5.049667s

Comparison:
              before:      712.1 i/s
               after:      737.3 i/s - 1.04x  faster

I should note that the fast path for unescaping a single character accounts for about 1% of the speed increase in activitypub.json. It's pretty minor.

byroot · 2025-11-22T13:45:33Z

Sorry for the delay, I just started a new job and have been busy.

This PR is interesting, but reviewing it gave me another idea: #902

We already find each \ during parsing, so we could record their positions and pass them to the decoder. There is a space tradeoff, of course, but that's the idea.
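
As a rough illustration of that idea (not the code from #902), the scanner below records the offset of each backslash it passes, and the decoder then copies the spans between recorded offsets with `memcpy` instead of rescanning. The type `escape_record`, the cap `MAX_RECORDED_ESCAPES`, and both function names are assumptions made for this sketch. Only single-character escapes are decoded, and `\uXXXX` is not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed cap: the space tradeoff is bounded by a small fixed array. */
#define MAX_RECORDED_ESCAPES 8

typedef struct {
    size_t positions[MAX_RECORDED_ESCAPES]; /* offsets of each '\\' */
    size_t count; /* escapes seen; may exceed the cap */
} escape_record; /* hypothetical type */

/* Scan a string body (src points just past the opening quote) up to the
 * closing quote, recording backslash offsets. Returns the body length. */
static size_t scan_string(const char *src, size_t len, escape_record *rec)
{
    size_t i = 0;
    rec->count = 0;
    while (i < len && src[i] != '"') {
        if (src[i] == '\\') {
            if (rec->count < MAX_RECORDED_ESCAPES)
                rec->positions[rec->count] = i;
            rec->count++;
            i += 2; /* skip the backslash and the escaped byte */
        } else {
            i++;
        }
    }
    return i < len ? i : len;
}

/* Single-character escapes only; anything else is returned unchanged. */
static char decode_simple_escape(char c)
{
    switch (c) {
    case 'b': return '\b';
    case 'f': return '\f';
    case 'n': return '\n';
    case 'r': return '\r';
    case 't': return '\t';
    default:  return c; /* covers '"', '\\' and '/' */
    }
}

/* Decode the body using the recorded offsets. dst needs body_len bytes. */
static size_t decode_with_record(const char *src, size_t body_len,
                                 const escape_record *rec, char *dst)
{
    size_t in = 0, out = 0;
    size_t known = rec->count < MAX_RECORDED_ESCAPES
                       ? rec->count : MAX_RECORDED_ESCAPES;
    for (size_t k = 0; k < known; k++) {
        size_t pos = rec->positions[k];
        if (pos + 1 >= body_len) break; /* truncated escape at the end */
        memcpy(dst + out, src + in, pos - in); /* literal span, no scan */
        out += pos - in;
        dst[out++] = decode_simple_escape(src[pos + 1]);
        in = pos + 2;
    }
    if (rec->count <= MAX_RECORDED_ESCAPES) {
        /* Every escape was recorded, so the tail is purely literal. */
        memcpy(dst + out, src + in, body_len - in);
        out += body_len - in;
    } else {
        /* The record overflowed: fall back to scanning the remainder. */
        while (in < body_len) {
            if (src[in] == '\\' && in + 1 < body_len) {
                dst[out++] = decode_simple_escape(src[in + 1]);
                in += 2;
            } else {
                dst[out++] = src[in++];
            }
        }
    }
    return out;
}
```

The fixed-size array is one way to bound the space tradeoff mentioned above: strings with few escapes skip the second scan entirely, while strings with many escapes degrade to an ordinary scan for the unrecorded tail.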

With a handcrafted benchmark:

benchmark_parsing "some_unescape", JSON.dump([((" "*100) + "\n")*15])
benchmark_parsing "more_unescape", JSON.dump([((" "*100) + "\n")*30])

My PR performs significantly better:

== Parsing some_unescape (1534 bytes)
ruby 3.4.6 (2025-09-16 revision dbd83256b1) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
               after   252.233k i/100ms
Calculating -------------------------------------
               after      2.619M (± 0.6%) i/s  (381.76 ns/i) -     13.116M in   5.007434s

Comparison:
              before:  3159184.8 i/s
               after:  2619427.0 i/s - 1.21x  slower


== Parsing more_unescape (3064 bytes)
ruby 3.4.6 (2025-09-16 revision dbd83256b1) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
               after   153.779k i/100ms
Calculating -------------------------------------
               after      1.579M (± 0.7%) i/s  (633.23 ns/i) -      7.997M in   5.063882s

Comparison:
              before:  1796951.2 i/s
               after:  1579212.5 i/s - 1.14x  slower

(after is your branch, before is #902).

So I think I'll go with #902.

byroot force-pushed the sm/string-unescape-neon branch from 864ef5b to 9535e8a Compare November 22, 2025 08:46

Use ARM NEON instructions to accelerate unescaping strings.

b42e968

byroot force-pushed the sm/string-unescape-neon branch from 9535e8a to b42e968 Compare November 22, 2025 13:25

byroot mentioned this pull request Nov 22, 2025

parser.c: Record escape positions while parsing #902

Merged

byroot closed this in #902 Nov 22, 2025
