Benchmark of JSON Formats in Memory and Parquet File

Purpose

This repo benchmarks several JSON encoding formats on two axes: encoding/decoding performance in memory, and storage/query efficiency in Parquet files. It covers:

JSON, the JSON text stored as a plain string
JSONB
JSONC, a variant of the tape representation
Variant, which dynamically parses JSON into an Arrow struct

Six kinds of metrics are evaluated for each format:

Encode time, the time to read JSON strings and encode them into the in-memory representation
Decode time, the time to decode the in-memory representation back into JSON strings
Flush time, the time to flush the in-memory representation into a Parquet file
Load time, the time to load JSON values from a Parquet file into the in-memory representation
Parquet file size and compression ratio
Query time
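The timing metrics above all come down to measuring wall-clock time around one operation, and the size metric to a ratio of two byte counts. A minimal sketch, using only the standard library: `time_it` and `compression_ratio` are hypothetical helper names, not functions from this repo, and the ratio is assumed to be raw JSON bytes divided by Parquet file bytes, which the text above does not define.

```rust
use std::time::{Duration, Instant};

/// Runs `f` once and returns its result together with the elapsed wall-clock time.
/// (Hypothetical helper, not part of this repo.)
fn time_it<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = f();
    (out, start.elapsed())
}

/// Assumed definition: uncompressed input bytes divided by bytes on disk.
/// A larger value means the file compresses better.
fn compression_ratio(raw_bytes: u64, file_bytes: u64) -> f64 {
    raw_bytes as f64 / file_bytes as f64
}
```

A single run is noisy, so a real benchmark would repeat each measurement many times and report an aggregate such as the median.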

The JSON payloads are characterized by the following parameters:

Data types
Max nested layer, the maximum nesting depth
NDV, the number of distinct values, which matters for the compression methods used in Parquet
Num of lines, the number of JSON documents encoded into one Parquet file, for example the rows in a row group
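To make the NDV and line-count parameters concrete, here is a small sketch that generates a column of string values with a controlled number of distinct values. It assumes NDV is given as a fraction of the number of lines (consistent with values such as 0.1 and 1.0 used for the payloads), which is an interpretation rather than something the text states. `distinct_count` and `generate_names` are hypothetical names, not functions from this repo.

```rust
/// Number of distinct values for `num_lines` rows at the given NDV fraction.
/// Assumes 0.0 < ndv_ratio <= 1.0; the result is clamped to at least 1.
fn distinct_count(num_lines: usize, ndv_ratio: f64) -> usize {
    let n = (num_lines as f64 * ndv_ratio).round() as usize;
    n.clamp(1, num_lines.max(1))
}

/// Produces `num_lines` strings that cycle through `distinct_count` distinct values.
fn generate_names(num_lines: usize, ndv_ratio: f64) -> Vec<String> {
    let distinct = distinct_count(num_lines, ndv_ratio);
    (0..num_lines)
        .map(|i| format!("name-{}", i % distinct))
        .collect()
}
```

With a fraction of 1.0 every value is unique, which is the hardest case for dictionary-style encodings; lower fractions repeat values and should compress better.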

Results in Different Payloads

Payload 1: Events of OpenTelemetry traces

Data Types: Float, String, Object

Max Nested Layer: 2

NDV: [0.1, 0.3, 0.6, 1.0]

Num of Lines: [1024, 8192, 100000]

Metric: [Parquet file size and compression ratio]

Example of a line (the timestamp is distinct for every line):

{
  "name": "jHl2oDDnPc1i2OzlP5Y",
  "timestamp": "2024-07-25T04:33:11.369386Z",
  "attributes": { "event_attributes": 48.28667666464348 }
}
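A line of this shape can be rendered with the standard library alone. The sketch below is illustrative only: `event_line` is a hypothetical helper, not the generator used by this repo, and it assumes the name and timestamp contain no characters that need JSON escaping and that the float is finite.

```rust
/// Renders one event as a single-line JSON object with the same three fields
/// as the example above. No JSON escaping is performed on the string inputs.
fn event_line(name: &str, timestamp: &str, event_attribute: f64) -> String {
    format!(
        r#"{{"name":"{}","timestamp":"{}","attributes":{{"event_attributes":{}}}}}"#,
        name, timestamp, event_attribute
    )
}
```

Writing one object per line keeps the number of lines equal to the number of JSON documents, which matches the "Num of Lines" parameter.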

Results:

Repository: CookiePieWw/json-format-in-parquet-benchmark
