This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Refactored JSON writing (5-10x) #709

Merged
jorgecarleitao merged 4 commits into main from json_write
Dec 27, 2021

Conversation

jorgecarleitao
Owner

@jorgecarleitao jorgecarleitao commented Dec 24, 2021

This PR refactors JSON writing. It:

  1. allows writing one batch at a time, thereby allowing stream writing
  2. changes the number of allocations from O(N) to O(1) by bypassing serde_json::Value and using a design based on streaming-iterator
  3. decouples IO-bound work (write) from CPU-bound work (serialize), thereby allowing async and parallel serialization
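
To illustrate the three points above, here is a minimal sketch using only the standard library. It is not the API introduced by this PR: `BatchSerializer`, `next_block`, `write_all_blocks`, and the field name `a` are hypothetical names chosen for illustration, and the batches are plain `Vec<i32>` rather than Arrow arrays. The serializer reuses a single buffer across batches (so allocations do not grow with the number of values once the buffer has reached its working capacity) and hands out a borrowed slice, which is the idea behind a streaming iterator. Writing is a separate step that only sees bytes.

```rust
use std::io::Write;

/// Hypothetical serializer: owns one reusable buffer and serializes one
/// batch at a time. The returned slice borrows from `self`, so no new
/// buffer is allocated per batch or per value.
struct BatchSerializer<I: Iterator<Item = Vec<i32>>> {
    batches: I,
    buffer: Vec<u8>,
}

impl<I: Iterator<Item = Vec<i32>>> BatchSerializer<I> {
    fn new(batches: I) -> Self {
        Self { batches, buffer: Vec::new() }
    }

    /// CPU-bound step: serialize the next batch into the reused buffer
    /// as newline-delimited JSON records of the form {"a":<value>}.
    fn next_block(&mut self) -> Option<&[u8]> {
        let batch = self.batches.next()?;
        self.buffer.clear(); // drops the old bytes but keeps the capacity
        for value in batch {
            // Writing into a Vec<u8> cannot fail, so unwrap is safe here.
            writeln!(self.buffer, "{{\"a\":{}}}", value).unwrap();
        }
        Some(self.buffer.as_slice())
    }
}

/// IO-bound step: push each serialized block to any `Write` sink.
fn write_all_blocks<W: Write, I: Iterator<Item = Vec<i32>>>(
    writer: &mut W,
    serializer: &mut BatchSerializer<I>,
) -> std::io::Result<()> {
    while let Some(block) = serializer.next_block() {
        writer.write_all(block)?;
    }
    Ok(())
}
```

Because `next_block` never touches the sink, the serialization step could in principle run on another thread or feed an async writer, which is the kind of separation point 3 describes.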

This change is backward incompatible: the existing JSON-writing design was completely replaced by a new one that separates IO-bound from CPU-bound work to enable parallelism, async, and stream writing. Any code relying on the old API needs a full migration.

The benchmarks (see below) show a 5-10x speedup, and the new design does not require all data to be available up front, which imo is sufficient grounds for this backward-incompatible change.

@jorgecarleitao jorgecarleitao added the enhancement An improvement to an existing feature label Dec 24, 2021
@codecov

codecov bot commented Dec 24, 2021

Codecov Report

Merging #709 (fa6886e) into main (dfa6370) will increase coverage by 0.28%.
The diff coverage is 84.93%.

@@            Coverage Diff             @@
##             main     #709      +/-   ##
==========================================
+ Coverage   70.16%   70.44%   +0.28%     
==========================================
  Files         309      310       +1     
  Lines       16800    16759      -41     
==========================================
+ Hits        11787    11806      +19     
+ Misses       5013     4953      -60     
Impacted Files Coverage Δ
src/io/iterator.rs 71.42% <ø> (ø)
src/io/json/mod.rs 100.00% <ø> (ø)
src/util/bench_util.rs 0.00% <ø> (ø)
src/io/json/write/format.rs 66.66% <66.66%> (ø)
src/io/json/write/mod.rs 85.18% <85.18%> (ø)
src/io/json/write/serialize.rs 88.13% <88.13%> (+25.94%) ⬆️
src/compute/arithmetics/time.rs 26.60% <0.00%> (+0.91%) ⬆️
src/bitmap/utils/slice_iterator.rs 92.53% <0.00%> (+1.49%) ⬆️

@jorgecarleitao
Owner Author

The performance improvement is 5-10x for the primitive/utf8 types and likely much more for nested types, since the number of allocations previously scaled as O(N^depth) and now scales as O(depth).

Heading out for today. Happy festivities everyone!

json write i32 2^10     time:   [79.852 us 80.011 us 80.270 us]                                
                        change: [-77.258% -77.085% -76.792%] (p = 0.00 < 0.05)
json write utf8 2^10    time:   [82.197 us 82.369 us 82.596 us]                                 
                        change: [-84.881% -84.830% -84.773%] (p = 0.00 < 0.05)
json write f64 2^10     time:   [116.00 us 116.25 us 116.57 us]                                
                        change: [-73.020% -72.898% -72.756%] (p = 0.00 < 0.05)
json write i32 2^12     time:   [317.74 us 318.17 us 318.74 us]                                
                        change: [-81.809% -81.631% -81.500%] (p = 0.00 < 0.05)
json write utf8 2^12    time:   [335.34 us 335.90 us 336.63 us]                                 
                        change: [-85.578% -85.514% -85.437%] (p = 0.00 < 0.05)
json write f64 2^12     time:   [471.23 us 471.85 us 472.64 us]                                
                        change: [-73.818% -73.696% -73.581%] (p = 0.00 < 0.05)
json write i32 2^14     time:   [1.3015 ms 1.3049 ms 1.3097 ms]                                 
                        change: [-87.439% -87.375% -87.298%] (p = 0.00 < 0.05)
json write utf8 2^14    time:   [2.9488 ms 2.9542 ms 2.9608 ms]                                  
                        change: [-78.847% -78.771% -78.693%] (p = 0.00 < 0.05)
json write f64 2^14     time:   [1.8420 ms 1.8445 ms 1.8471 ms]                                 
                        change: [-82.130% -82.045% -81.963%] (p = 0.00 < 0.05)
json write i32 2^16     time:   [5.1455 ms 5.1558 ms 5.1717 ms]                                 
                        change: [-89.493% -89.457% -89.419%] (p = 0.00 < 0.05)
json write utf8 2^16    time:   [10.611 ms 10.734 ms 10.892 ms]                                 
                        change: [-84.254% -84.053% -83.858%] (p = 0.00 < 0.05)
json write f64 2^16     time:   [7.5628 ms 7.5775 ms 7.5981 ms]                                
                        change: [-84.848% -84.799% -84.746%] (p = 0.00 < 0.05)
json write i32 2^18     time:   [21.193 ms 21.256 ms 21.330 ms]                                
                        change: [-89.739% -89.706% -89.672%] (p = 0.00 < 0.05)
json write utf8 2^18    time:   [35.795 ms 35.853 ms 35.923 ms]                                 
                        change: [-87.280% -87.253% -87.228%] (p = 0.00 < 0.05)
json write f64 2^18     time:   [30.404 ms 30.501 ms 30.617 ms]                                
                        change: [-85.406% -85.363% -85.307%] (p = 0.00 < 0.05)
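
For reading the `change` lines above as speedup factors: Criterion reports the relative change in time, so the speedup is old time divided by new time, i.e. 1 / (1 + change). The helper below is a small illustrative sketch (the function name `speedup_from_change` is hypothetical, not part of this PR or of Criterion). With the midpoint estimates above, `json write i32 2^10` (-77.085%) works out to roughly 4.4x and `json write i32 2^18` (-89.706%) to roughly 9.7x.

```rust
/// Converts a relative change in run time (e.g. -0.77085 for "-77.085%")
/// into a speedup factor: old_time / new_time = 1 / (1 + change).
fn speedup_from_change(change: f64) -> f64 {
    1.0 / (1.0 + change)
}
```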

@jorgecarleitao jorgecarleitao changed the title Refactored JSON writing Refactored JSON writing (5-10x) Dec 24, 2021
@jorgecarleitao jorgecarleitao added backwards-incompatible and removed enhancement An improvement to an existing feature labels Dec 25, 2021
@jorgecarleitao jorgecarleitao merged commit f07cc2c into main Dec 27, 2021
@jorgecarleitao jorgecarleitao deleted the json_write branch December 27, 2021 16:46