Run dedicated attribute column benchmark on production data #2629

Closed
Tracked by #2527
stoewer opened this issue Jul 10, 2023 · 4 comments

@stoewer
Contributor

stoewer commented Jul 10, 2023

Run a local benchmark (based on BenchmarkBackendBlockTraceQL). To get realistic results, the benchmark uses large blocks from production instances and compares TraceQL performance with and without dedicated attribute column assignments.

The benchmark results are added as comments to this issue.

@stoewer stoewer self-assigned this Jul 10, 2023
@stoewer stoewer moved this from Todo to In Progress in Tempo squad Jul 10, 2023
@stoewer stoewer changed the title Run benchmark with large production data Run dedicated attribute column benchmark on production data Jul 12, 2023
@stoewer
Contributor Author

stoewer commented Jul 12, 2023

Benchmark results

Block size: The benchmark was performed on a production block with compaction level 5. The size of the block with vParquet2 encoding was 6.07 GB. After converting the block to the vParquet3 schema, the size increased to 6.26 GB (3.01%). The size increases because of the added columns, which contain only null values.
After moving 10 resource and 10 span attributes to dedicated columns**, the block size dropped to 4.93 GB (-18.74% compared to vParquet2).

Footer size: During the conversion from vParquet2 to vParquet3 with configured dedicated columns, the footer size increased from 1090692 bytes to 1577277 bytes (44.61%).

Block meta size: Configuring dedicated columns increases the size of the meta.json from 401 bytes to 1595 bytes (298%). This will also affect the size of the index.json.gz, though probably less than the uncompressed meta.json, because the dedicated column definitions contain a lot of redundant information (type, scope).
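For anyone who wants to re-derive the percentages, here is a minimal Go sketch of the relative-change calculation. The helper name `percentChange` is made up for illustration and is not part of Tempo. Only the footer and meta.json figures are used, since those are reported in exact bytes; the block sizes above are rounded to GB, so recomputing from them only approximates the reported percentages.

```go
package main

import "fmt"

// percentChange returns the relative change from before to after, in percent.
// Hypothetical helper for illustration; not taken from the Tempo code base.
func percentChange(before, after float64) float64 {
	return (after - before) / before * 100
}

func main() {
	// Footer size in bytes: vParquet2 vs vParquet3 with dedicated columns.
	fmt.Printf("footer:    %+.2f%%\n", percentChange(1090692, 1577277))
	// meta.json size in bytes: without vs with dedicated columns.
	fmt.Printf("meta.json: %+.2f%%\n", percentChange(401, 1595))
}
```

The meta.json change comes out slightly below 298%, which matches the rounded figure quoted above.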

The following benchmark compares vParquet2 vs vParquet3 with dedicated columns.

cpu: AMD Ryzen 7 PRO 6850U with Radeon Graphics     
                                                      │ bench-traceql-no-dedicated-cols.txt │ bench-traceql-with-dedicated-cols.txt │
                                                      │               sec/op                │     sec/op      vs base               │
DedicatedBlockTraceQL/res_attr_match-16                                       214.91m ± 10%      87.09m ± 5%  -59.48% (p=0.000 n=8)
DedicatedBlockTraceQL/res_attr_no_match-16                                    181.83m ±  4%      44.97m ± 1%  -75.27% (p=0.000 n=8)
DedicatedBlockTraceQL/dedicated_res_attr_match-16                             159.81m ± 11%      46.95m ± 5%  -70.62% (p=0.000 n=8)
DedicatedBlockTraceQL/dedicated_res_attr_no_match-16                          180.67m ±  2%      37.31m ± 2%  -79.35% (p=0.000 n=8)
DedicatedBlockTraceQL/span_attr_match-16                                      181.70m ±  9%      90.07m ± 6%  -50.43% (p=0.000 n=8)
DedicatedBlockTraceQL/span_attr_no_match-16                                   185.44m ±  4%      64.40m ± 2%  -65.27% (p=0.000 n=8)
DedicatedBlockTraceQL/dedicated_span_attr_match-16                            131.70m ± 21%      52.84m ± 9%  -59.88% (p=0.000 n=8)
DedicatedBlockTraceQL/dedicated_span_attr_no_match-16                         184.85m ±  3%      41.94m ± 3%  -77.31% (p=0.000 n=8)
geomean                                                                        176.1m            55.36m       -68.57%
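The geomean row can be cross-checked from the per-benchmark medians. The following Go sketch recomputes the geometric mean of each column from the rounded sec/op values in the table; the function name `geomean` is an assumption for this example and is not benchstat's own code. Because the inputs are rounded, the result should land close to the geomean row rather than match it digit for digit.

```go
package main

import (
	"fmt"
	"math"
)

// geomean returns the geometric mean of xs. It sums logarithms instead of
// multiplying the values directly, which avoids overflow for long inputs.
func geomean(xs []float64) float64 {
	sum := 0.0
	for _, x := range xs {
		sum += math.Log(x)
	}
	return math.Exp(sum / float64(len(xs)))
}

func main() {
	// sec/op in milliseconds, copied from the table (rounded values).
	base := []float64{214.91, 181.83, 159.81, 180.67, 181.70, 185.44, 131.70, 184.85}
	dedicated := []float64{87.09, 44.97, 46.95, 37.31, 90.07, 64.40, 52.84, 41.94}

	gb, gd := geomean(base), geomean(dedicated)
	fmt.Printf("geomean base=%.1fm dedicated=%.2fm delta=%+.2f%%\n", gb, gd, (gd/gb-1)*100)
}
```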

** The dedicated attributes were chosen by analyzing the block and selecting the 10 attributes whose uncompressed values take up the most space.
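The selection step described in this footnote could look roughly like the Go sketch below. It assumes the block analysis has already produced a map from attribute name to total uncompressed value size; the `attrSize` type, the `topN` function, and the sample sizes are all hypothetical and do not reflect the actual tooling used. In this benchmark the selection would run once for resource attributes and once for span attributes with n = 10.

```go
package main

import (
	"fmt"
	"sort"
)

// attrSize pairs an attribute name with the total uncompressed size of its values.
type attrSize struct {
	Name  string
	Bytes int64
}

// topN returns the n attributes with the largest uncompressed value size,
// largest first. If fewer than n attributes exist, all of them are returned.
func topN(sizes map[string]int64, n int) []attrSize {
	all := make([]attrSize, 0, len(sizes))
	for name, b := range sizes {
		all = append(all, attrSize{Name: name, Bytes: b})
	}
	sort.Slice(all, func(i, j int) bool {
		if all[i].Bytes != all[j].Bytes {
			return all[i].Bytes > all[j].Bytes
		}
		return all[i].Name < all[j].Name // deterministic tie-break
	})
	if n > len(all) {
		n = len(all)
	}
	return all[:n]
}

func main() {
	// Made-up sizes for illustration only; not measured from the benchmark block.
	sizes := map[string]int64{
		"http.url":      900,
		"db.statement":  1500,
		"http.method":   40,
		"net.peer.name": 300,
	}
	for _, a := range topN(sizes, 2) {
		fmt.Println(a.Name, a.Bytes)
	}
}
```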

@stoewer stoewer moved this from In Progress to In Review in Tempo squad Jul 12, 2023
@mdisibio
Contributor

This is amazing, wins in both size and speed 🚀 I really like the improvement on the hardest query, span_attr_match, which is 50% faster. This is searching the generic key/value columns, correct? It would be interesting to see the column sizes, if you have that information handy, and compare the size reduction to the speed improvement.

@joe-elliott
Member

Nice! A bit concerned about that footer increase, but 1.5MB for a 6GB block isn't bad.

Numbers look awesome overall.

@stoewer
Contributor Author

stoewer commented Jul 12, 2023

Yes, span_attr_match is the benchmark that searches through the generic attribute column and matches a result.

@stoewer stoewer closed this as completed Jul 14, 2023
@github-project-automation github-project-automation bot moved this from In Review to Done in Tempo squad Jul 14, 2023