Use batch APIs to create Array and Hash objects #678

byroot · 2024-11-03T12:56:17Z

Naively appending elements into RArray or RHash is inefficient because it might cause multiple reallocations and rehashing.

So it's preferable to accumulate all the elements onto a stack, and then use batch APIs to directly create right-sized containers.
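
To make the cost difference concrete, here is a minimal sketch in plain C that counts buffer allocations for the two strategies. It does not use the Ruby C API: `demo_array` and every `demo_*` function are hypothetical stand-ins, and the doubling growth policy is an assumption for illustration, not necessarily what RArray does. In CRuby the batch side would be a call such as `rb_ary_new_from_values` or `rb_hash_bulk_insert`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a growable array; not Ruby's RArray. */
typedef struct {
    long *ptr;
    size_t len;
    size_t capa;
    size_t allocs; /* how many times the buffer was (re)allocated */
} demo_array;

/* Naive path: push one element at a time, doubling the buffer when full. */
static void demo_array_push(demo_array *ary, long value)
{
    if (ary->len == ary->capa) {
        size_t new_capa = ary->capa ? ary->capa * 2 : 4;
        long *new_ptr = realloc(ary->ptr, new_capa * sizeof(long));
        assert(new_ptr != NULL);
        ary->ptr = new_ptr;
        ary->capa = new_capa;
        ary->allocs++;
    }
    ary->ptr[ary->len++] = value;
}

/* Batch path: the size is known up front, so one exact allocation suffices. */
static demo_array demo_array_from_values(const long *values, size_t count)
{
    demo_array ary = {NULL, 0, 0, 0};
    if (count > 0) {
        ary.ptr = malloc(count * sizeof(long));
        assert(ary.ptr != NULL);
        memcpy(ary.ptr, values, count * sizeof(long));
        ary.allocs = 1;
    }
    ary.len = ary.capa = count;
    return ary;
}

/* Number of allocations needed to append `count` elements one by one. */
static size_t demo_naive_allocations(size_t count)
{
    demo_array ary = {NULL, 0, 0, 0};
    size_t allocs;
    for (size_t i = 0; i < count; i++) demo_array_push(&ary, (long)i);
    allocs = ary.allocs;
    free(ary.ptr);
    return allocs;
}

/* Number of allocations needed when `count` elements are staged first. */
static size_t demo_batch_allocations(size_t count)
{
    long *staged = malloc((count ? count : 1) * sizeof(long));
    demo_array ary;
    size_t allocs;
    assert(staged != NULL);
    for (size_t i = 0; i < count; i++) staged[i] = (long)i;
    ary = demo_array_from_values(staged, count);
    allocs = ary.allocs;
    free(ary.ptr);
    free(staged);
    return allocs;
}
```

With these assumptions, appending 100 elements grows the buffer six times (4, 8, 16, 32, 64, 128 slots), while the batch path allocates once. The staging buffer is not free either, but it can be reused across every container in the document.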

TODO:

- Figure out the GC bug: we're not properly marking the rvalue_stack yet.
- Restore support for create_additions; I just commented it out for now.

Before:

== Parsing activitypub.json (58160 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
                json   779.000 i/100ms
                  oj   799.000 i/100ms
          Oj::Parser   953.000 i/100ms
           rapidjson   630.000 i/100ms
Calculating -------------------------------------
                json      7.989k (± 0.7%) i/s  (125.17 μs/i) -     40.508k in   5.070571s
                  oj      7.931k (± 1.8%) i/s  (126.09 μs/i) -     39.950k in   5.039171s
          Oj::Parser      9.624k (± 0.7%) i/s  (103.91 μs/i) -     48.603k in   5.050694s
           rapidjson      6.287k (± 0.3%) i/s  (159.05 μs/i) -     31.500k in   5.010181s

Comparison:
                json:     7989.2 i/s
          Oj::Parser:     9623.6 i/s - 1.20x  faster
                  oj:     7930.8 i/s - same-ish: difference falls within error
           rapidjson:     6287.3 i/s - 1.27x  slower

== Parsing twitter.json (567916 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
                json    66.000 i/100ms
                  oj    62.000 i/100ms
          Oj::Parser    78.000 i/100ms
           rapidjson    55.000 i/100ms
Calculating -------------------------------------
                json    673.530 (± 0.7%) i/s    (1.48 ms/i) -      3.432k in   5.095837s
                  oj    620.473 (± 0.5%) i/s    (1.61 ms/i) -      3.162k in   5.096259s
          Oj::Parser    767.687 (± 0.9%) i/s    (1.30 ms/i) -      3.900k in   5.080601s
           rapidjson    553.048 (± 1.1%) i/s    (1.81 ms/i) -      2.805k in   5.072525s

Comparison:
                json:      673.5 i/s
          Oj::Parser:      767.7 i/s - 1.14x  faster
                  oj:      620.5 i/s - 1.09x  slower
           rapidjson:      553.0 i/s - 1.22x  slower

== Parsing citm_catalog.json (1727030 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
                json    38.000 i/100ms
                  oj    34.000 i/100ms
          Oj::Parser    47.000 i/100ms
           rapidjson    38.000 i/100ms
Calculating -------------------------------------
                json    381.312 (± 0.5%) i/s    (2.62 ms/i) -      1.938k in   5.082614s
                  oj    328.735 (± 2.1%) i/s    (3.04 ms/i) -      1.666k in   5.070407s
          Oj::Parser    458.938 (± 0.9%) i/s    (2.18 ms/i) -      2.303k in   5.018529s
           rapidjson    376.744 (± 1.3%) i/s    (2.65 ms/i) -      1.900k in   5.044113s

Comparison:
                json:      381.3 i/s
          Oj::Parser:      458.9 i/s - 1.20x  faster
           rapidjson:      376.7 i/s - same-ish: difference falls within error
                  oj:      328.7 i/s - 1.16x  slower

After:

== Parsing activitypub.json (58160 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
                json   960.000 i/100ms
                  oj   796.000 i/100ms
          Oj::Parser   969.000 i/100ms
           rapidjson   636.000 i/100ms
Calculating -------------------------------------
                json      8.957k (± 0.5%) i/s  (111.65 μs/i) -     45.120k in   5.037777s
                  oj      7.966k (± 0.5%) i/s  (125.53 μs/i) -     40.596k in   5.096207s
          Oj::Parser      9.579k (± 0.3%) i/s  (104.39 μs/i) -     48.450k in   5.057822s
           rapidjson      6.261k (± 8.9%) i/s  (159.73 μs/i) -     31.800k in   5.182342s

Comparison:
                json:     8956.5 i/s
          Oj::Parser:     9579.3 i/s - 1.07x  faster
                  oj:     7966.2 i/s - 1.12x  slower
           rapidjson:     6260.6 i/s - 1.43x  slower

== Parsing twitter.json (567916 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
                json    82.000 i/100ms
                  oj    62.000 i/100ms
          Oj::Parser    77.000 i/100ms
           rapidjson    55.000 i/100ms
Calculating -------------------------------------
                json    803.998 (± 0.6%) i/s    (1.24 ms/i) -      4.100k in   5.099692s
                  oj    608.292 (± 0.8%) i/s    (1.64 ms/i) -      3.100k in   5.096566s
          Oj::Parser    760.206 (± 0.5%) i/s    (1.32 ms/i) -      3.850k in   5.064529s
           rapidjson    549.562 (± 0.5%) i/s    (1.82 ms/i) -      2.750k in   5.004166s

Comparison:
                json:      804.0 i/s
          Oj::Parser:      760.2 i/s - 1.06x  slower
                  oj:      608.3 i/s - 1.32x  slower
           rapidjson:      549.6 i/s - 1.46x  slower

== Parsing citm_catalog.json (1727030 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
                json    43.000 i/100ms
                  oj    34.000 i/100ms
          Oj::Parser    47.000 i/100ms
           rapidjson    36.000 i/100ms
Calculating -------------------------------------
                json    447.336 (± 0.9%) i/s    (2.24 ms/i) -      2.279k in   5.094945s
                  oj    336.266 (± 2.4%) i/s    (2.97 ms/i) -      1.700k in   5.058625s
          Oj::Parser    466.559 (± 1.3%) i/s    (2.14 ms/i) -      2.350k in   5.037637s
           rapidjson    392.039 (± 0.8%) i/s    (2.55 ms/i) -      1.980k in   5.050826s

Comparison:
                json:      447.3 i/s
          Oj::Parser:      466.6 i/s - 1.04x  faster
           rapidjson:      392.0 i/s - 1.14x  slower
                  oj:      336.3 i/s - 1.33x  slower

byroot · 2024-11-03T18:35:48Z

ext/json/ext/parser/parser.rl

@@ -173,6 +200,100 @@ static VALUE rsymbol_cache_fetch(rvalue_cache *cache, const char *str, const lon
    return rsymbol;
 }

+/* rvalue stack */
+
+#define RVALUE_STACK_INITIAL_CAPA 128


I can squeeze some more perf by bumping the initial capacity, but the concern is that it might overflow the stack.

activitypub.json goes to 93 deep, citm_catalog.json to 390, and twitter.json to 239.

For most users it would be fine to allocate this much on the stack, given that most modern systems have an 8MB stack, but I fear people using alpine/musl might run into trouble, given that the default stack size there is only 128KB.

If we were to ROFLscale this to 512 entries, that would use 4KiB on the stack. Added to the 512B from the initial fbuffer and ~120B for the JSON_Parser struct, that may be a bit much.

Actually, the difference is really minimal, so I don't think it's really worth it.
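
For readers following the trade-off, here is a rough sketch of a value stack that starts in a fixed inline buffer and spills to the heap once that buffer is full. Only the capacity of 128 comes from the diff above; `demo_stack`, `demo_value` and the `demo_*` functions are hypothetical names, and the doubling policy is an assumption rather than the PR's actual code. With pointer-sized values on a 64-bit target, 128 entries take 1KiB of C stack and 512 entries take 512 × 8 = 4096 bytes, which matches the 4KiB figure.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Same number as RVALUE_STACK_INITIAL_CAPA in the diff; everything else
 * here is a hypothetical stand-in, not the PR's code. */
#define DEMO_STACK_INITIAL_CAPA 128

typedef uintptr_t demo_value; /* pointer-sized, like Ruby's VALUE */

typedef struct {
    demo_value *ptr; /* inline_buf until the first spill, then heap memory */
    size_t head;     /* number of values currently stored */
    size_t capa;
    int on_heap;
    demo_value inline_buf[DEMO_STACK_INITIAL_CAPA];
} demo_stack;

static void demo_stack_init(demo_stack *stack)
{
    stack->ptr = stack->inline_buf;
    stack->head = 0;
    stack->capa = DEMO_STACK_INITIAL_CAPA;
    stack->on_heap = 0;
}

static void demo_stack_push(demo_stack *stack, demo_value value)
{
    if (stack->head == stack->capa) {
        size_t new_capa = stack->capa * 2;
        demo_value *new_ptr;
        if (stack->on_heap) {
            new_ptr = realloc(stack->ptr, new_capa * sizeof(demo_value));
            assert(new_ptr != NULL);
        } else {
            /* First overflow: move the inline values to the heap. */
            new_ptr = malloc(new_capa * sizeof(demo_value));
            assert(new_ptr != NULL);
            memcpy(new_ptr, stack->ptr, stack->head * sizeof(demo_value));
            stack->on_heap = 1;
        }
        stack->ptr = new_ptr;
        stack->capa = new_capa;
    }
    stack->ptr[stack->head++] = value;
}

static void demo_stack_free(demo_stack *stack)
{
    if (stack->on_heap) free(stack->ptr);
}

/* Returns 1 if holding `depth` values at once forces a heap spill. */
static int demo_spills_at(size_t depth)
{
    demo_stack stack;
    int spilled;
    demo_stack_init(&stack);
    for (size_t i = 0; i < depth; i++) demo_stack_push(&stack, (demo_value)i);
    spilled = stack.on_heap;
    demo_stack_free(&stack);
    return spilled;
}
```

If the depths quoted above are peak numbers of stacked values, a 128-entry inline buffer would cover activitypub.json (93) without touching the heap, while twitter.json (239) and citm_catalog.json (390) would each spill at least once.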



Basically a port of ruby/json#678. Rather than allocating the container and pushing elements one by one, we accumulate them on a stack and then use the faster batch APIs to directly create the final container. The original action stack remains, as we need to keep track of what we're currently parsing and how big it is. But for the recursive cases, we no longer need to create a child stack.
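
One way to picture the interplay between a shared value stack and a bookkeeping stack is the toy parser below. It is an illustrative sketch only: it handles nested arrays of single-digit numbers, assumes well-formed input, and all `demo_*` names, the fixed limits and the `frames` array are hypothetical. `frames` records where each open container starts on the value stack, so closing a container knows its element count and can build it in one exactly-sized step, without a child stack per nesting level.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_MAX_VALUES 256
#define DEMO_MAX_DEPTH 64

typedef struct demo_node {
    int is_array;
    long number;              /* valid when is_array == 0 */
    size_t len;               /* valid when is_array == 1 */
    struct demo_node **items; /* exactly len slots, or NULL when empty */
} demo_node;

static demo_node *demo_number(long number)
{
    demo_node *node = malloc(sizeof(*node));
    assert(node != NULL);
    node->is_array = 0;
    node->number = number;
    node->len = 0;
    node->items = NULL;
    return node;
}

/* Batch constructor: one exactly-sized allocation for all children. */
static demo_node *demo_array_from(demo_node **values, size_t count)
{
    demo_node *node = malloc(sizeof(*node));
    assert(node != NULL);
    node->is_array = 1;
    node->number = 0;
    node->len = count;
    node->items = NULL;
    if (count > 0) {
        node->items = malloc(count * sizeof(*node->items));
        assert(node->items != NULL);
        memcpy(node->items, values, count * sizeof(*node->items));
    }
    return node;
}

static void demo_free(demo_node *node)
{
    if (node->is_array) {
        for (size_t i = 0; i < node->len; i++) demo_free(node->items[i]);
        free(node->items);
    }
    free(node);
}

/* Parses input such as "[1,[2,3],[]]"; commas and spaces are skipped. */
static demo_node *demo_parse(const char *src)
{
    demo_node *values[DEMO_MAX_VALUES]; /* shared by all nesting levels */
    size_t head = 0;
    size_t frames[DEMO_MAX_DEPTH];      /* start offset of each open array */
    size_t depth = 0;

    for (const char *p = src; *p != '\0'; p++) {
        if (*p == '[') {
            assert(depth < DEMO_MAX_DEPTH);
            frames[depth++] = head;
        } else if (*p == ']') {
            size_t start;
            demo_node *ary;
            assert(depth > 0);
            start = frames[--depth];
            /* Everything above `start` belongs to the array being closed. */
            ary = demo_array_from(values + start, head - start);
            head = start;
            assert(head < DEMO_MAX_VALUES);
            values[head++] = ary;
        } else if (*p >= '0' && *p <= '9') {
            assert(head < DEMO_MAX_VALUES);
            values[head++] = demo_number(*p - '0');
        }
    }
    assert(depth == 0 && head == 1);
    return values[0];
}
```

Parsing `"[1,[2,3],[]]"` with this sketch yields a three-element root whose second item is a two-element array; the inner array is built from the same value stack the outer one is still filling.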


byroot mentioned this pull request Nov 3, 2024

rb_hash_bulk_insert() is not exposed in ruby.h oracle/truffleruby#3705

Closed


byroot force-pushed the rvalue-stack branch from b2eca41 to d0d4c1d Compare November 3, 2024 22:11

byroot marked this pull request as ready for review November 3, 2024 22:12

byroot merged commit 2625e8c into ruby:master Nov 3, 2024
36 checks passed

byroot deleted the rvalue-stack branch November 3, 2024 22:18

casperisfine mentioned this pull request Nov 8, 2024

Use batch APIs to create arrays and hashes msgpack/msgpack-ruby#370

Closed
