Background
Pyroscope offers users a query interface for basic filtering and aggregation, enabling analysis of profiling data. This includes the ability to query a profile and its associated time series for a specific execution context, such as an HTTP endpoint.
Profiling data can serve as a source for deriving metrics (time series) in various scenarios:
Measuring resources consumed by specific job classes or execution scopes in the application.
Implementing scheduling, pacing, throttling, and rate limiting based on resource consumption for account management, billing, and accounting.
Managing resource quotas.
Continuous benchmarking.
Another area of interest is alerting, behavior analysis, and anomaly detection.
Runtime environments rarely allow for the collection of such statistics in a convenient way; profiling data augmented with dynamic tags (sample labels) can fill this gap.
Problem
Pyroscope faces limitations hindering its use in the scenarios mentioned above:
Sensitivity to the curse of dimensionality in sampled profiling data, particularly with dynamic labels of high cardinality.
Limited data manipulation capabilities in the Pyroscope query engine compared to traditional time-series databases.
Specificity of the Pyroscope query interface, making it incompatible with existing query languages and dialects (except Prometheus label matchers).
These limitations will be mitigated in the future but won't be completely eliminated.
Proposal
Code Instrumentation
I propose an SDK that enables users to define metrics directly in code, similar to how regular metrics are defined. For example, in Go, a Prometheus-like interface could be implemented as follows:
```go
// The timer/watcher is to be initialized similarly to any other prometheus
// metrics. The difference is that it does not need/support registration,
// and is never exported at the source (although it can be implemented).
var jobCPUTimer = cputimer.NewCPUTimerVec(cputimer.Opts{
	Name: "my_job_cpu_time_total",
}, []string{"class", "account"})

var userCPUTimer = jobCPUTimer.WithLabelValues("ETL", "user-1")

func doJob(ctx context.Context) {
	ctx, stop := userCPUTimer.Start(ctx)
	defer stop()
	// CPU-intensive job
}
```
Through pprof labels, samples collected within the timer scope are attributed to a specific time series, such as my_job_cpu_time_total{class="ETL",account="user-1"}. Static profile labels (e.g., pod, service_name, region) also become part of the metric labels.
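To illustrate the mechanism, here is a minimal sketch of how the SDK's Start/stop pair could be built on top of runtime/pprof sample labels. The CPUTimer fields and the __name__ label key are assumptions for illustration, not a finalized design:

```go
package cputimer

import (
	"context"
	"runtime/pprof"
)

// CPUTimer is a hypothetical child of a CPUTimerVec, carrying the metric
// name and the resolved label pairs (e.g. class="ETL", account="user-1").
type CPUTimer struct {
	name        string   // e.g. "my_job_cpu_time_total"
	labelNames  []string // e.g. {"class", "account"}
	labelValues []string // e.g. {"ETL", "user-1"}
}

// Start attaches the metric identity to all samples collected in this
// goroutine (and in goroutines started from the returned context) as pprof
// sample labels, so the server side can attribute them to a time series.
func (t *CPUTimer) Start(ctx context.Context) (context.Context, func()) {
	// "__name__" as the metric-name label key mirrors the Prometheus
	// convention; it is an assumption here, not part of the proposal.
	kv := []string{"__name__", t.name}
	for i, n := range t.labelNames {
		kv = append(kv, n, t.labelValues[i])
	}
	parent := ctx // remember the parent label set so stop can restore it
	ctx = pprof.WithLabels(ctx, pprof.Labels(kv...))
	pprof.SetGoroutineLabels(ctx)
	return ctx, func() { pprof.SetGoroutineLabels(parent) }
}
```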
Recording rules
Additionally, users should be able to configure metric recording rules outside the code when instrumentation is not available or dynamic labels are not supported for a given profile type. These rules allow scoping metrics to a stack trace location or a function name, albeit with increased configuration complexity due to the volatile nature of stack traces.
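As a sketch of what such a rule could capture, the following struct is illustrative only; the field names and example values are assumptions, not an actual Pyroscope configuration schema:

```go
package rules

// RecordingRule is an illustrative shape for a server-side recording rule.
type RecordingRule struct {
	MetricName   string   // exported series name, e.g. "etl_job_cpu_time_total"
	ProfileType  string   // profile type the rule applies to, e.g. a CPU profile
	Matchers     []string // series selector, e.g. `{service_name="etl-worker"}`
	// Scope samples to stacks containing this function, e.g. "main.doJob".
	// Function-based scoping is fragile: renames and inlining can break it.
	FunctionName string
	// Profile labels to carry over as metric labels; everything else is dropped.
	KeepLabels []string // e.g. "pod", "region"
}
```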
Metrics export
On the server side, distributors extract the configured and annotated time series from sample values and proxy them to a time-series database in a commonly used format. Metric labels are then removed from the profiles to avoid the issues associated with high cardinality of sample dimensions.
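A rough sketch of the extraction step on the distributor side, assuming each profile sample arrives with its static and pprof labels already merged; the types, helper names, and the __name__ convention below are assumptions:

```go
package distributor

import (
	"fmt"
	"sort"
	"strings"
)

// sample is a simplified view of a profile sample after the static profile
// labels and pprof labels have been merged; illustrative, not the real model.
type sample struct {
	labels map[string]string // e.g. {"__name__": "my_job_cpu_time_total", "class": "ETL", ...}
	value  int64             // e.g. CPU nanoseconds attributed to this sample
}

// extractPoints sums sample values per label set, yielding one data point per
// annotated series for the profile's timestamp; unannotated samples are
// simply skipped and stay in the profile.
func extractPoints(samples []sample) map[string]int64 {
	points := make(map[string]int64)
	for _, s := range samples {
		if s.labels["__name__"] == "" {
			continue
		}
		points[seriesKey(s.labels)] += s.value
	}
	return points
}

// seriesKey builds a stable identity for a label set, e.g.
// my_job_cpu_time_total{account="user-1",class="ETL"}.
func seriesKey(labels map[string]string) string {
	pairs := make([]string, 0, len(labels))
	for k, v := range labels {
		if k == "__name__" {
			continue
		}
		pairs = append(pairs, fmt.Sprintf("%s=%q", k, v))
	}
	sort.Strings(pairs)
	return labels["__name__"] + "{" + strings.Join(pairs, ",") + "}"
}
```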
Prometheus remote write could be the default protocol, but it requires strict ordering of samples by timestamp within a single series (as does Mimir, although it can handle out-of-order writes). This may require us to introduce a dedicated component/service responsible for ordering, batching, sharding, and sending. A good example is Tempo's metrics-generator.
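For example, the ordering and batching step could look roughly like this, using the Prometheus prompb types; the surrounding component and its in-memory data model are assumptions:

```go
package exporter

import (
	"sort"

	"github.com/prometheus/prometheus/prompb"
)

// buildWriteRequest sorts each series' samples by timestamp, since remote
// write expects in-order appends within a series, and batches everything
// into a single request for a remote-write client to send. Sharding across
// multiple requests/targets is out of scope for this sketch.
func buildWriteRequest(series map[string][]prompb.Sample, labels map[string][]prompb.Label) *prompb.WriteRequest {
	req := &prompb.WriteRequest{}
	for key, samples := range series {
		sort.Slice(samples, func(i, j int) bool {
			return samples[i].Timestamp < samples[j].Timestamp
		})
		req.Timeseries = append(req.Timeseries, prompb.TimeSeries{
			Labels:  labels[key],
			Samples: samples,
		})
	}
	return req
}
```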
A very basic prototype (user code):
Related: