Use batch APIs to create Array and Hash objects #678
@@ -173,6 +200,100 @@ static VALUE rsymbol_cache_fetch(rvalue_cache *cache, const char *str, const lon
    return rsymbol;
}

/* rvalue stack */

#define RVALUE_STACK_INITIAL_CAPA 128
I can squeeze out some more perf by bumping the initial capacity, but the concern is that it might overflow the stack. `activitypub.json` goes to 93 deep, `citm_catalog.json` to 390, and `twitter.json` to 239.

For most users it would be fine to allocate this much on the stack, given that most modern systems have an 8MB stack, but I fear people using `alpine/musl` might run into trouble, given that the default stack size there is only `128KB`.

If we were to ROFLscale this to 512 entries, that would use 4KiB on the stack. Added to the `512B` from the initial `fbuffer` and ~120B for the `JSON_Parser` struct, that may be a bit much.
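
To make the arithmetic in this comment concrete, here is a rough sketch of the stack budget. It assumes each stack entry is 8 bytes (a pointer-sized `VALUE` on a 64-bit platform) and reuses the 512B and ~120B figures quoted in the comment; the names `entry_t` and `stack_budget_bytes` are made up for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: one stack entry is 8 bytes, like a VALUE on 64-bit platforms. */
typedef uint64_t entry_t;

enum {
    FBUFFER_INITIAL_BYTES = 512, /* initial fbuffer size quoted in the comment */
    PARSER_STRUCT_BYTES   = 120  /* approximate JSON_Parser size from the comment */
};

/* Approximate bytes kept on the C stack for a given initial capacity. */
static size_t stack_budget_bytes(size_t initial_capa)
{
    return initial_capa * sizeof(entry_t)
         + FBUFFER_INITIAL_BYTES
         + PARSER_STRUCT_BYTES;
}

/* Share of a thread stack (in percent) that the budget would consume. */
static double stack_budget_percent(size_t initial_capa, size_t stack_bytes)
{
    assert(stack_bytes > 0);
    return 100.0 * (double)stack_budget_bytes(initial_capa) / (double)stack_bytes;
}
```

Under those assumptions, a capacity of 128 comes to 1656 bytes and a capacity of 512 to 4728 bytes, which can then be compared against whatever stack size the target platform actually provides.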
Actually, the difference is really minimal, so I don't think it's really worth it.
Naively appending elements to an RArray or RHash is inefficient because it may trigger multiple reallocations and rehashing.

It is therefore preferable to accumulate all the elements on a stack, then use the batch APIs to directly create right-sized containers.

Before:

```
== Parsing activitypub.json (58160 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
                json   779.000 i/100ms
                  oj   799.000 i/100ms
          Oj::Parser   953.000 i/100ms
           rapidjson   630.000 i/100ms
Calculating -------------------------------------
                json      7.989k (± 0.7%) i/s  (125.17 μs/i) -     40.508k in   5.070571s
                  oj      7.931k (± 1.8%) i/s  (126.09 μs/i) -     39.950k in   5.039171s
          Oj::Parser      9.624k (± 0.7%) i/s  (103.91 μs/i) -     48.603k in   5.050694s
           rapidjson      6.287k (± 0.3%) i/s  (159.05 μs/i) -     31.500k in   5.010181s

Comparison:
                json:     7989.2 i/s
          Oj::Parser:     9623.6 i/s - 1.20x  faster
                  oj:     7930.8 i/s - same-ish: difference falls within error
           rapidjson:     6287.3 i/s - 1.27x  slower

== Parsing twitter.json (567916 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
                json    66.000 i/100ms
                  oj    62.000 i/100ms
          Oj::Parser    78.000 i/100ms
           rapidjson    55.000 i/100ms
Calculating -------------------------------------
                json    673.530 (± 0.7%) i/s    (1.48 ms/i) -      3.432k in   5.095837s
                  oj    620.473 (± 0.5%) i/s    (1.61 ms/i) -      3.162k in   5.096259s
          Oj::Parser    767.687 (± 0.9%) i/s    (1.30 ms/i) -      3.900k in   5.080601s
           rapidjson    553.048 (± 1.1%) i/s    (1.81 ms/i) -      2.805k in   5.072525s

Comparison:
                json:      673.5 i/s
          Oj::Parser:      767.7 i/s - 1.14x  faster
                  oj:      620.5 i/s - 1.09x  slower
           rapidjson:      553.0 i/s - 1.22x  slower

== Parsing citm_catalog.json (1727030 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
                json    38.000 i/100ms
                  oj    34.000 i/100ms
          Oj::Parser    47.000 i/100ms
           rapidjson    38.000 i/100ms
Calculating -------------------------------------
                json    381.312 (± 0.5%) i/s    (2.62 ms/i) -      1.938k in   5.082614s
                  oj    328.735 (± 2.1%) i/s    (3.04 ms/i) -      1.666k in   5.070407s
          Oj::Parser    458.938 (± 0.9%) i/s    (2.18 ms/i) -      2.303k in   5.018529s
           rapidjson    376.744 (± 1.3%) i/s    (2.65 ms/i) -      1.900k in   5.044113s

Comparison:
                json:      381.3 i/s
          Oj::Parser:      458.9 i/s - 1.20x  faster
           rapidjson:      376.7 i/s - same-ish: difference falls within error
                  oj:      328.7 i/s - 1.16x  slower
```

After:

```
== Parsing activitypub.json (58160 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
                json   960.000 i/100ms
                  oj   796.000 i/100ms
          Oj::Parser   969.000 i/100ms
           rapidjson   636.000 i/100ms
Calculating -------------------------------------
                json      8.957k (± 0.5%) i/s  (111.65 μs/i) -     45.120k in   5.037777s
                  oj      7.966k (± 0.5%) i/s  (125.53 μs/i) -     40.596k in   5.096207s
          Oj::Parser      9.579k (± 0.3%) i/s  (104.39 μs/i) -     48.450k in   5.057822s
           rapidjson      6.261k (± 8.9%) i/s  (159.73 μs/i) -     31.800k in   5.182342s

Comparison:
                json:     8956.5 i/s
          Oj::Parser:     9579.3 i/s - 1.07x  faster
                  oj:     7966.2 i/s - 1.12x  slower
           rapidjson:     6260.6 i/s - 1.43x  slower

== Parsing twitter.json (567916 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
                json    82.000 i/100ms
                  oj    62.000 i/100ms
          Oj::Parser    77.000 i/100ms
           rapidjson    55.000 i/100ms
Calculating -------------------------------------
                json    803.998 (± 0.6%) i/s    (1.24 ms/i) -      4.100k in   5.099692s
                  oj    608.292 (± 0.8%) i/s    (1.64 ms/i) -      3.100k in   5.096566s
          Oj::Parser    760.206 (± 0.5%) i/s    (1.32 ms/i) -      3.850k in   5.064529s
           rapidjson    549.562 (± 0.5%) i/s    (1.82 ms/i) -      2.750k in   5.004166s

Comparison:
                json:      804.0 i/s
          Oj::Parser:      760.2 i/s - 1.06x  slower
                  oj:      608.3 i/s - 1.32x  slower
           rapidjson:      549.6 i/s - 1.46x  slower

== Parsing citm_catalog.json (1727030 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
                json    43.000 i/100ms
                  oj    34.000 i/100ms
          Oj::Parser    47.000 i/100ms
           rapidjson    36.000 i/100ms
Calculating -------------------------------------
                json    447.336 (± 0.9%) i/s    (2.24 ms/i) -      2.279k in   5.094945s
                  oj    336.266 (± 2.4%) i/s    (2.97 ms/i) -      1.700k in   5.058625s
          Oj::Parser    466.559 (± 1.3%) i/s    (2.14 ms/i) -      2.350k in   5.037637s
           rapidjson    392.039 (± 0.8%) i/s    (2.55 ms/i) -      1.980k in   5.050826s

Comparison:
                json:      447.3 i/s
          Oj::Parser:      466.6 i/s - 1.04x  faster
           rapidjson:      392.0 i/s - 1.14x  slower
                  oj:      336.3 i/s - 1.33x  slower
```
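
For readers unfamiliar with the pattern, here is a standalone sketch of the idea, not the code from this PR. It uses a plain `long` as a stand-in for `VALUE` and a hypothetical `array_t` in place of a Ruby Array, so it compiles without Ruby headers; every name in it (`value_stack`, `stack_push`, `stack_pop_into_array`) is made up for illustration. In the real extension the final step would presumably go through Ruby's batch constructors such as `rb_ary_new_from_values` and `rb_hash_bulk_insert`, but that is an assumption about the implementation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef long value_t;            /* stand-in for Ruby's VALUE */

#define STACK_INITIAL_CAPA 128   /* mirrors the initial capacity in the diff */

typedef struct {
    value_t inline_buf[STACK_INITIAL_CAPA]; /* lives wherever the struct lives */
    value_t *ptr;                           /* inline_buf, or a heap buffer */
    size_t head;                            /* number of values currently held */
    size_t capa;
} value_stack;

static void stack_init(value_stack *s)
{
    s->ptr = s->inline_buf;
    s->head = 0;
    s->capa = STACK_INITIAL_CAPA;
}

/* Push one value, spilling from the inline buffer to the heap when full. */
static int stack_push(value_stack *s, value_t v)
{
    if (s->head == s->capa) {
        size_t new_capa = s->capa * 2;
        value_t *grown;
        if (s->ptr == s->inline_buf) {
            grown = malloc(new_capa * sizeof(value_t));
            if (!grown) return -1;
            memcpy(grown, s->inline_buf, s->head * sizeof(value_t));
        } else {
            grown = realloc(s->ptr, new_capa * sizeof(value_t));
            if (!grown) return -1;
        }
        s->ptr = grown;
        s->capa = new_capa;
    }
    s->ptr[s->head++] = v;
    return 0;
}

typedef struct {
    value_t *items;
    size_t len;
} array_t;                       /* hypothetical right-sized container */

/* Pop the top `count` values into a container allocated once, at exact size. */
static int stack_pop_into_array(value_stack *s, size_t count, array_t *out)
{
    assert(count <= s->head);
    out->len = count;
    out->items = NULL;
    if (count > 0) {
        out->items = malloc(count * sizeof(value_t));
        if (!out->items) return -1;
        memcpy(out->items, s->ptr + (s->head - count), count * sizeof(value_t));
    }
    s->head -= count;
    return 0;
}

static void stack_free(value_stack *s)
{
    if (s->ptr != s->inline_buf) free(s->ptr);
}
```

The container is allocated exactly once, when its final size is already known, instead of growing as each element is appended.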
Basically a port of ruby/json#678. Rather than allocating the container and pushing elements one by one, we accumulate them on a stack and then use the faster batch APIs to directly create the final container. The original action stack remains, as we need to keep track of what we're currently parsing and how big it is. But for the recursive cases, we no longer need to create a child stack.
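
As a rough illustration of how one shared value stack plus a stack of per-container counts can replace recursion, here is a toy sketch. It is not the code from either project: the input format (single digits and square brackets), the fixed limits, and the name `collapse_nested` are all invented for the example, and a closed container is represented simply by the sum of its elements.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPTH  64    /* hypothetical limits, for the sketch only */
#define MAX_VALUES 256

/*
 * Walks a string such as "[1[23]4]". Digits are leaf values. Each '[' opens
 * a frame that counts its elements. Each ']' pops that many values off the
 * shared value stack in one go and pushes a single stand-in for the finished
 * container. Returns 0 on success, -1 on malformed input or overflow.
 */
static int collapse_nested(const char *src, long *out)
{
    size_t frames[MAX_DEPTH];   /* element count of each open container */
    long values[MAX_VALUES];    /* one stack shared by every nesting level */
    size_t depth = 0, head = 0;

    for (const char *p = src; *p; p++) {
        long v;
        if (*p == '[') {
            if (depth == MAX_DEPTH) return -1;
            frames[depth++] = 0;
            continue;
        } else if (*p == ']') {
            if (depth == 0) return -1;
            size_t count = frames[--depth];
            assert(count <= head);
            v = 0;
            for (size_t i = head - count; i < head; i++) v += values[i];
            head -= count;      /* batch pop of the container's elements */
        } else if (*p >= '0' && *p <= '9') {
            v = *p - '0';
        } else {
            return -1;
        }
        if (head == MAX_VALUES) return -1;
        values[head++] = v;
        if (depth > 0) frames[depth - 1]++;  /* parent gained one element */
    }
    if (depth != 0 || head != 1) return -1;
    *out = values[0];
    return 0;
}
```

Nested containers need no child stack here: their elements sit on top of the parent's, and the frame count says how many to take when the container closes.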
TODO: `create_additions`, I just commented it out for now.