feat: symdb custom binary format #3138
Do we need to update the docs for this feature?
@knylander-grafana, I'll update the https://grafana.com/docs/pyroscope/latest/reference-pyroscope-architecture/block-format page – just a couple of lines, nothing very important
Thank you for updating the reference architecture.
Looks great!
# Conflicts:
# pkg/phlaredb/schemas/v1/functions.go
# pkg/phlaredb/schemas/v1/locations.go
# pkg/phlaredb/schemas/v1/mappings.go
# pkg/phlaredb/schemas/v1/schema_test.go
# pkg/phlaredb/schemas/v1/strings.go
# pkg/phlaredb/symdb/block_reader.go
# pkg/phlaredb/symdb/block_writer.go
# pkg/phlaredb/symdb/partition_memory.go
# pkg/phlaredb/symdb/resolver_pprof.go
Resolves #2926
The change eliminates the use of parquet tables from symdb. This significantly improves read selectivity for symbolic information and, more importantly, enables fetching symbolic information directly from blocks in the object storage without the need to keep parquet files open in memory (in ingesters and store-gateways).
Note that the new format is not enabled by default. This is done for backward compatibility purposes (a feature flag, of sorts). Later, after more intensive internal testing, the format will be enabled by default.
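To illustrate why a flat binary layout helps with read selectivity, here is a minimal sketch of fetching a single section with ranged reads. It is not the actual symdb format: the layout (a section count, a table of offsets and sizes, then the payloads), the names tocEntry, encodeSections, readSection, and newMemoryObject are all hypothetical and introduced only for this example. An in-memory reader stands in for an object storage client that supports range requests.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// tocEntry is a hypothetical table-of-contents record: where a section
// starts in the file and how many bytes it spans.
type tocEntry struct {
	Offset uint64
	Size   uint64
}

const tocEntrySize = 16 // two uint64 fields

// encodeSections builds a file with an assumed, illustrative layout:
//
//	[uint32 count][count * tocEntry][section 0][section 1]...
func encodeSections(sections [][]byte) []byte {
	var buf bytes.Buffer
	// Writes to a bytes.Buffer with fixed-size values do not fail.
	_ = binary.Write(&buf, binary.LittleEndian, uint32(len(sections)))
	offset := uint64(4 + tocEntrySize*len(sections))
	for _, s := range sections {
		e := tocEntry{Offset: offset, Size: uint64(len(s))}
		_ = binary.Write(&buf, binary.LittleEndian, e)
		offset += uint64(len(s))
	}
	for _, s := range sections {
		buf.Write(s)
	}
	return buf.Bytes()
}

// newMemoryObject stands in for an object storage client that can serve
// byte ranges; here it simply wraps a byte slice.
func newMemoryObject(b []byte) io.ReaderAt {
	return bytes.NewReader(b)
}

// readRange reads exactly n bytes starting at off.
func readRange(r io.ReaderAt, off int64, n int) ([]byte, error) {
	buf := make([]byte, n)
	_, err := io.ReadFull(io.NewSectionReader(r, off, int64(n)), buf)
	return buf, err
}

// readSection fetches one section without loading the rest of the file:
// one read for the count, one for the table entry, one for the payload.
func readSection(r io.ReaderAt, index int) ([]byte, error) {
	head, err := readRange(r, 0, 4)
	if err != nil {
		return nil, err
	}
	count := binary.LittleEndian.Uint32(head)
	if index < 0 || uint32(index) >= count {
		return nil, fmt.Errorf("section %d out of range (have %d)", index, count)
	}
	raw, err := readRange(r, int64(4+tocEntrySize*index), tocEntrySize)
	if err != nil {
		return nil, err
	}
	e := tocEntry{
		Offset: binary.LittleEndian.Uint64(raw[0:8]),
		Size:   binary.LittleEndian.Uint64(raw[8:16]),
	}
	return readRange(r, int64(e.Offset), int(e.Size))
}

func main() {
	file := encodeSections([][]byte{
		[]byte("strings"), []byte("functions"), []byte("locations"),
	})
	section, err := readSection(newMemoryObject(file), 1)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(section))
}
```

Because every lookup goes through io.ReaderAt, the same reading code could in principle sit on top of a local file or a remote range reader, which is the property that removes the need to keep whole files open in memory.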
Compression
The new encoding reduces the size on disk by up to 30%.
Using the current encoding and block layout:
Encoded in the new format:
Performance
Even though the change was not aimed at directly optimizing performance in terms of query latencies, benchmarks show a ~10-20% reduction in the overall query duration for SelectMergeByStacktraces:
Single service:
The whole block ({}):
The test dataset comprises real-life data from one of the internal deployments: 1GB collected over one hour.