Run dedicated attribute column benchmark on production data #2629

stoewer · 2023-07-10T04:36:28Z

Run a local benchmark (based on BenchmarkBackendBlockTraceQL). To get realistic results the benchmark uses large blocks from production instances and compares the TraceQL performance with and without dedicated attribute column assignments.

The benchmark results are added as comments to this issue.


stoewer · 2023-07-12T12:10:09Z

Benchmark results

Block size: The benchmark was performed on a production block with compaction level 5. The size of the block with vParquet2 encoding was 6.07 GB. After converting the block to the vParquet3 schema, the size increased to 6.26 GB (3.01%). The size increases because of the added columns, which contain only null values.
After moving 10 resource and 10 span attributes to dedicated columns** the block size dropped to 4.93 GB (-18.74% compared to vParquet2).

Footer size: During the conversion from vParquet2 to vParquet3 with configured dedicated columns, the footer size increased from 1090692 bytes to 1577277 bytes (44.61%).

Block meta size: Configuring dedicated columns increases the size of the meta.json from 401 bytes to 1595 bytes (298%). This will also impact the size of the index.json.gz, though probably less than the uncompressed meta.json, as the dedicated column definitions contain a lot of redundant information (type, scope).
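For anyone who wants to re-check the percentages, here is a minimal Go sketch of the relative-change arithmetic applied to the footer and meta.json byte counts above. The helper name `pctChange` is made up for this illustration and is not part of Tempo. The GB figures are rounded, so plugging them in will not reproduce the block size percentages exactly.

```go
package main

import "fmt"

// pctChange returns the relative change from before to after, in percent.
// Hypothetical helper for illustration only.
func pctChange(before, after float64) float64 {
	return (after - before) / before * 100
}

func main() {
	// Byte counts as reported in this comment.
	fmt.Printf("footer:    %+.2f%%\n", pctChange(1090692, 1577277))
	fmt.Printf("meta.json: %+.2f%%\n", pctChange(401, 1595))
}
```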

The following benchmark compares vParquet2 vs vParquet3 with dedicated columns.

cpu: AMD Ryzen 7 PRO 6850U with Radeon Graphics     
                                                      │ bench-traceql-no-dedicated-cols.txt │ bench-traceql-with-dedicated-cols.txt │
                                                      │               sec/op                │     sec/op      vs base               │
DedicatedBlockTraceQL/res_attr_match-16                                       214.91m ± 10%      87.09m ± 5%  -59.48% (p=0.000 n=8)
DedicatedBlockTraceQL/res_attr_no_match-16                                    181.83m ±  4%      44.97m ± 1%  -75.27% (p=0.000 n=8)
DedicatedBlockTraceQL/dedicated_res_attr_match-16                             159.81m ± 11%      46.95m ± 5%  -70.62% (p=0.000 n=8)
DedicatedBlockTraceQL/dedicated_res_attr_no_match-16                          180.67m ±  2%      37.31m ± 2%  -79.35% (p=0.000 n=8)
DedicatedBlockTraceQL/span_attr_match-16                                      181.70m ±  9%      90.07m ± 6%  -50.43% (p=0.000 n=8)
DedicatedBlockTraceQL/span_attr_no_match-16                                   185.44m ±  4%      64.40m ± 2%  -65.27% (p=0.000 n=8)
DedicatedBlockTraceQL/dedicated_span_attr_match-16                            131.70m ± 21%      52.84m ± 9%  -59.88% (p=0.000 n=8)
DedicatedBlockTraceQL/dedicated_span_attr_no_match-16                         184.85m ±  3%      41.94m ± 3%  -77.31% (p=0.000 n=8)
geomean                                                                        176.1m            55.36m       -68.57%
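The geomean row can be reproduced from the rows above it. The following is a small sketch that takes the geometric mean of the reported sec/op values (in milliseconds); the function name `geomean` is my own and this is not how benchstat is implemented internally, only the same arithmetic.

```go
package main

import (
	"fmt"
	"math"
)

// geomean returns the geometric mean of xs, computed via logarithms
// so that the intermediate product cannot overflow.
func geomean(xs []float64) float64 {
	sum := 0.0
	for _, x := range xs {
		sum += math.Log(x)
	}
	return math.Exp(sum / float64(len(xs)))
}

func main() {
	// sec/op values in milliseconds, copied from the table above.
	base := []float64{214.91, 181.83, 159.81, 180.67, 181.70, 185.44, 131.70, 184.85}
	dedicated := []float64{87.09, 44.97, 46.95, 37.31, 90.07, 64.40, 52.84, 41.94}

	b, d := geomean(base), geomean(dedicated)
	fmt.Printf("geomean: %.1fms -> %.2fms (%+.2f%%)\n", b, d, (d-b)/b*100)
}
```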

** The dedicated attributes were chosen by analyzing the block and selecting, for each scope, the 10 attributes whose uncompressed values take up the most space.
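The selection described in the footnote amounts to a top-N by size. Below is a rough Go sketch, assuming the per-attribute uncompressed sizes for one scope have already been collected into a map. The function `topAttributes`, the attribute names, and the byte counts are all invented for illustration; this is not the tooling that was used for the benchmark.

```go
package main

import (
	"fmt"
	"sort"
)

// topAttributes returns the n attribute names with the largest uncompressed
// value size, largest first. Ties are broken by name so the result is stable.
func topAttributes(sizes map[string]int64, n int) []string {
	names := make([]string, 0, len(sizes))
	for name := range sizes {
		names = append(names, name)
	}
	sort.Slice(names, func(i, j int) bool {
		if sizes[names[i]] != sizes[names[j]] {
			return sizes[names[i]] > sizes[names[j]]
		}
		return names[i] < names[j]
	})
	if n > len(names) {
		n = len(names)
	}
	return names[:n]
}

func main() {
	// Made-up attribute names and byte counts, for illustration only.
	spanAttrs := map[string]int64{
		"http.url":     900,
		"db.statement": 1500,
		"http.method":  40,
		"peer.service": 120,
	}
	fmt.Println(topAttributes(spanAttrs, 2))
}
```

The same function would be called once with the resource attribute sizes and once with the span attribute sizes, with n set to 10 in both cases.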

mdisibio · 2023-07-12T12:28:12Z

This is amazing, wins in both size and speed 🚀 I really like the improvement on the hardest query, span_attr_match, which takes 50% less time. This is searching the generic key/value columns, correct? It would be interesting to see the column sizes, if you have that information handy, and compare the size reduction to the speed improvement.

joe-elliott · 2023-07-12T12:33:38Z

Nice! A bit concerned about that footer increase, but 1.5MB for a 6GB block isn't bad.

Numbers look awesome overall.

stoewer · 2023-07-12T12:40:54Z

Yes, span_attr_match is the benchmark that searches through the generic attribute column and matches a result.

stoewer mentioned this issue Jul 10, 2023

vParquet3: Configurable dedicated columns for span/resource attributes #2527

Closed

stoewer self-assigned this Jul 10, 2023

stoewer added this to Tempo squad Jul 10, 2023

github-project-automation bot moved this to Todo in Tempo squad Jul 10, 2023

stoewer moved this from Todo to In Progress in Tempo squad Jul 10, 2023

stoewer changed the title ~~Run benchmark with large production data~~ Run dedicated attribute column benchmark on production data Jul 12, 2023

stoewer moved this from In Progress to In Review in Tempo squad Jul 12, 2023

stoewer closed this as completed Jul 14, 2023

github-project-automation bot moved this from In Review to Done in Tempo squad Jul 14, 2023
