Use batch APIs to create Array and Hash objects #678

Merged: byroot merged 1 commit into ruby:master from rvalue-stack on Nov 3, 2024

@byroot byroot commented Nov 3, 2024

Naively appending elements into RArray or RHash is inefficient because it might cause multiple reallocations and rehashing.

So it's preferable to accumulate all the elements onto a stack, and then use batch APIs to directly create right sized containers.
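A minimal sketch of the idea in plain C, not the code from this PR: `value_t`, `value_stack` and the function names are hypothetical stand-ins, and the "batch API" is modelled as a single exactly sized `malloc` plus `memcpy`. In the Ruby C API the equivalent batch calls would presumably be something like `rb_ary_new_from_values` and `rb_hash_bulk_insert`, but which calls the PR actually uses is not shown in this excerpt. A real implementation would also have to keep the stacked values visible to the GC, which is the first TODO item below.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for Ruby's VALUE, for illustration only. */
typedef unsigned long value_t;

#define VALUE_STACK_INITIAL_CAPA 128

typedef struct {
    value_t inline_buf[VALUE_STACK_INITIAL_CAPA]; /* no heap allocation until this is full */
    value_t *ptr;  /* points at inline_buf until the stack spills to the heap */
    size_t head;   /* number of values currently on the stack */
    size_t capa;   /* number of slots available behind ptr */
} value_stack;

void value_stack_init(value_stack *s) {
    s->ptr = s->inline_buf;
    s->head = 0;
    s->capa = VALUE_STACK_INITIAL_CAPA;
}

/* Push one parsed value, doubling the capacity when the stack is full. */
void value_stack_push(value_stack *s, value_t v) {
    if (s->head == s->capa) {
        size_t new_capa = s->capa * 2;
        value_t *grown = malloc(new_capa * sizeof(value_t));
        if (grown == NULL) abort();
        memcpy(grown, s->ptr, s->head * sizeof(value_t));
        if (s->ptr != s->inline_buf) free(s->ptr);
        s->ptr = grown;
        s->capa = new_capa;
    }
    s->ptr[s->head++] = v;
}

/* Pop the top `count` values into an exactly sized buffer: one allocation,
 * one copy, no incremental growth. Returns NULL for an empty container. */
value_t *value_stack_pop_array(value_stack *s, size_t count) {
    value_t *items = NULL;
    assert(count <= s->head);
    if (count > 0) {
        items = malloc(count * sizeof(value_t));
        if (items == NULL) abort();
        memcpy(items, s->ptr + (s->head - count), count * sizeof(value_t));
    }
    s->head -= count;
    return items;
}

void value_stack_free(value_stack *s) {
    if (s->ptr != s->inline_buf) free(s->ptr);
}
```

The parser would push every element as it is parsed, remember how many elements the current container holds, and call the pop function once when the container closes.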

TODO:

  • Figure out the GC bug. We're not properly marking the rvalue_stack yet.
  • Restore support for create_additions, I just commented it out for now.

Before:

== Parsing activitypub.json (58160 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
                json   779.000 i/100ms
                  oj   799.000 i/100ms
          Oj::Parser   953.000 i/100ms
           rapidjson   630.000 i/100ms
Calculating -------------------------------------
                json      7.989k (± 0.7%) i/s  (125.17 μs/i) -     40.508k in   5.070571s
                  oj      7.931k (± 1.8%) i/s  (126.09 μs/i) -     39.950k in   5.039171s
          Oj::Parser      9.624k (± 0.7%) i/s  (103.91 μs/i) -     48.603k in   5.050694s
           rapidjson      6.287k (± 0.3%) i/s  (159.05 μs/i) -     31.500k in   5.010181s

Comparison:
                json:     7989.2 i/s
          Oj::Parser:     9623.6 i/s - 1.20x  faster
                  oj:     7930.8 i/s - same-ish: difference falls within error
           rapidjson:     6287.3 i/s - 1.27x  slower

== Parsing twitter.json (567916 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
                json    66.000 i/100ms
                  oj    62.000 i/100ms
          Oj::Parser    78.000 i/100ms
           rapidjson    55.000 i/100ms
Calculating -------------------------------------
                json    673.530 (± 0.7%) i/s    (1.48 ms/i) -      3.432k in   5.095837s
                  oj    620.473 (± 0.5%) i/s    (1.61 ms/i) -      3.162k in   5.096259s
          Oj::Parser    767.687 (± 0.9%) i/s    (1.30 ms/i) -      3.900k in   5.080601s
           rapidjson    553.048 (± 1.1%) i/s    (1.81 ms/i) -      2.805k in   5.072525s

Comparison:
                json:      673.5 i/s
          Oj::Parser:      767.7 i/s - 1.14x  faster
                  oj:      620.5 i/s - 1.09x  slower
           rapidjson:      553.0 i/s - 1.22x  slower

== Parsing citm_catalog.json (1727030 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
                json    38.000 i/100ms
                  oj    34.000 i/100ms
          Oj::Parser    47.000 i/100ms
           rapidjson    38.000 i/100ms
Calculating -------------------------------------
                json    381.312 (± 0.5%) i/s    (2.62 ms/i) -      1.938k in   5.082614s
                  oj    328.735 (± 2.1%) i/s    (3.04 ms/i) -      1.666k in   5.070407s
          Oj::Parser    458.938 (± 0.9%) i/s    (2.18 ms/i) -      2.303k in   5.018529s
           rapidjson    376.744 (± 1.3%) i/s    (2.65 ms/i) -      1.900k in   5.044113s

Comparison:
                json:      381.3 i/s
          Oj::Parser:      458.9 i/s - 1.20x  faster
           rapidjson:      376.7 i/s - same-ish: difference falls within error
                  oj:      328.7 i/s - 1.16x  slower

After:

== Parsing activitypub.json (58160 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
                json   960.000 i/100ms
                  oj   796.000 i/100ms
          Oj::Parser   969.000 i/100ms
           rapidjson   636.000 i/100ms
Calculating -------------------------------------
                json      8.957k (± 0.5%) i/s  (111.65 μs/i) -     45.120k in   5.037777s
                  oj      7.966k (± 0.5%) i/s  (125.53 μs/i) -     40.596k in   5.096207s
          Oj::Parser      9.579k (± 0.3%) i/s  (104.39 μs/i) -     48.450k in   5.057822s
           rapidjson      6.261k (± 8.9%) i/s  (159.73 μs/i) -     31.800k in   5.182342s

Comparison:
                json:     8956.5 i/s
          Oj::Parser:     9579.3 i/s - 1.07x  faster
                  oj:     7966.2 i/s - 1.12x  slower
           rapidjson:     6260.6 i/s - 1.43x  slower

== Parsing twitter.json (567916 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
                json    82.000 i/100ms
                  oj    62.000 i/100ms
          Oj::Parser    77.000 i/100ms
           rapidjson    55.000 i/100ms
Calculating -------------------------------------
                json    803.998 (± 0.6%) i/s    (1.24 ms/i) -      4.100k in   5.099692s
                  oj    608.292 (± 0.8%) i/s    (1.64 ms/i) -      3.100k in   5.096566s
          Oj::Parser    760.206 (± 0.5%) i/s    (1.32 ms/i) -      3.850k in   5.064529s
           rapidjson    549.562 (± 0.5%) i/s    (1.82 ms/i) -      2.750k in   5.004166s

Comparison:
                json:      804.0 i/s
          Oj::Parser:      760.2 i/s - 1.06x  slower
                  oj:      608.3 i/s - 1.32x  slower
           rapidjson:      549.6 i/s - 1.46x  slower

== Parsing citm_catalog.json (1727030 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
                json    43.000 i/100ms
                  oj    34.000 i/100ms
          Oj::Parser    47.000 i/100ms
           rapidjson    36.000 i/100ms
Calculating -------------------------------------
                json    447.336 (± 0.9%) i/s    (2.24 ms/i) -      2.279k in   5.094945s
                  oj    336.266 (± 2.4%) i/s    (2.97 ms/i) -      1.700k in   5.058625s
          Oj::Parser    466.559 (± 1.3%) i/s    (2.14 ms/i) -      2.350k in   5.037637s
           rapidjson    392.039 (± 0.8%) i/s    (2.55 ms/i) -      1.980k in   5.050826s

Comparison:
                json:      447.3 i/s
          Oj::Parser:      466.6 i/s - 1.04x  faster
           rapidjson:      392.0 i/s - 1.14x  slower
                  oj:      336.3 i/s - 1.33x  slower


@@ -173,6 +200,100 @@ static VALUE rsymbol_cache_fetch(rvalue_cache *cache, const char *str, const lon
return rsymbol;
}

/* rvalue stack */

#define RVALUE_STACK_INITIAL_CAPA 128

I can squeeze some more perf by bumping the initial capacity, but the concern is that it might overflow the stack.

activitypub.json goes 93 entries deep, citm_catalog.json 390, and twitter.json 239.

For most users it would be fine to allocate this much on the stack, given that most modern systems have an 8MB stack, but I fear people using alpine/musl might run into trouble, given that the default stack size there is only 128KB.

If we were to ROFLscale this to 512 entries, that would use 4KiB on the stack. Added to the 512B from the initial fbuffer and ~120B for the JSON_Parser struct, that may be a bit much.
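For reference, the arithmetic behind those figures can be sketched as below. This assumes a 64-bit platform where a `VALUE` is 8 bytes; the function and macro names are made up for illustration, and the fbuffer and `JSON_Parser` sizes are simply the approximate numbers quoted in this comment.

```c
#include <stddef.h>

/* Assumption: 64-bit platform, where a Ruby VALUE is 8 bytes wide. */
#define VALUE_SIZE_BYTES 8u

/* Approximate figures quoted in the review comment, not measured here. */
#define FBUFFER_INITIAL_BYTES 512u
#define JSON_PARSER_STRUCT_BYTES 120u

/* Bytes taken by an inline value stack with `entries` slots. */
size_t rvalue_stack_bytes(size_t entries) {
    return entries * VALUE_SIZE_BYTES;
}

/* Rough C-stack footprint of one parse: value stack + fbuffer + parser struct. */
size_t parse_footprint_bytes(size_t entries) {
    return rvalue_stack_bytes(entries)
         + FBUFFER_INITIAL_BYTES
         + JSON_PARSER_STRUCT_BYTES;
}
```

With the current 128 entries the value stack takes 1024 bytes and the total is about 1656 bytes; with 512 entries the value stack takes 4096 bytes and the total is about 4728 bytes, roughly 3.6% of a 128KB stack.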


Actually, the difference is really minimal, so I don't think it's really worth it.

@byroot byroot marked this pull request as ready for review November 3, 2024 22:12
@byroot byroot merged commit 2625e8c into ruby:master Nov 3, 2024
36 checks passed
@byroot byroot deleted the rvalue-stack branch November 3, 2024 22:18
casperisfine pushed a commit to Shopify/msgpack-ruby that referenced this pull request Nov 8, 2024
Basically a port of ruby/json#678

Rather than allocating the container and pushing elements one by one,
we accumulate them on a stack and then use the faster batch APIs
to directly create the final container.
casperisfine pushed a commit to Shopify/msgpack-ruby that referenced this pull request Nov 8, 2024
Basically a port of ruby/json#678

Rather than allocating the container and pushing elements one by one,
we accumulate them on a stack and then use the faster batch APIs
to directly create the final container.

The original action stack remains, as we need to keep track of what
we're currently parsing and how big it is.

But for the recursive cases, we no longer need to create a child stack.
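
A rough sketch of how an action stack and a single shared value stack can work together, assuming a simplified token stream with only integers and length-prefixed arrays. The `token`, `node` and `frame` types and the fixed limits are hypothetical and are not taken from msgpack-ruby.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical token: an integer, or an array header announcing n elements. */
typedef struct { int is_array_header; long n; } token;

typedef struct node {
    int is_array;
    long num;             /* valid when !is_array */
    struct node **items;  /* valid when is_array */
    size_t len;
} node;

/* Action-stack entry: how big the open container is and how much is missing. */
typedef struct { size_t size; size_t remaining; } frame;

#define MAX_DEPTH 64
#define MAX_VALUES 256

static node *new_node(void) {
    node *nd = calloc(1, sizeof(node));
    if (nd == NULL) abort();
    return nd;
}

node *new_int(long n) {
    node *nd = new_node();
    nd->num = n;
    return nd;
}

/* Batch creation: one exactly sized allocation and one copy. */
node *new_array(node **src, size_t n) {
    node *nd = new_node();
    nd->is_array = 1;
    nd->len = n;
    if (n > 0) {
        nd->items = malloc(n * sizeof(node *));
        if (nd->items == NULL) abort();
        memcpy(nd->items, src, n * sizeof(node *));
    }
    return nd;
}

/* Decode without recursion: nested arrays share one value stack. */
node *decode(const token *toks, size_t ntoks) {
    frame frames[MAX_DEPTH];
    node *values[MAX_VALUES];
    size_t depth = 0, head = 0;

    for (size_t i = 0; i < ntoks; i++) {
        node *v;
        if (toks[i].is_array_header && toks[i].n > 0) {
            assert(depth < MAX_DEPTH);
            frames[depth++] = (frame){ (size_t)toks[i].n, (size_t)toks[i].n };
            continue; /* wait for the elements */
        }
        v = toks[i].is_array_header ? new_array(NULL, 0) : new_int(toks[i].n);
        for (;;) {
            assert(head < MAX_VALUES);
            values[head++] = v;
            if (depth == 0) break;                       /* top-level value */
            if (--frames[depth - 1].remaining > 0) break; /* container still open */
            /* Container complete: pop its elements and build it in one go. */
            size_t n = frames[--depth].size;
            head -= n;
            v = new_array(&values[head], n);
        }
    }
    /* Well-formed input leaves exactly one value and no open container. */
    return (head == 1 && depth == 0) ? values[0] : NULL;
}
```

Each closed container is pushed back onto the same value stack as an element of its parent, so nesting only grows the small frame array rather than creating a child stack.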