
Derive metrics from ingested profiles #2908

Open · kolesnikovae opened this issue Jan 10, 2024 · 0 comments
Labels: enhancement (New feature or request)

kolesnikovae (Collaborator) commented Jan 10, 2024

Background

Pyroscope offers users a query interface for basic filtering and aggregation, enabling profiling data analysis. This includes the ability to query a profile and its associated time-series for a specific execution context, such as an HTTP endpoint:

(screenshot: profile and its associated time series queried for a specific HTTP endpoint)

Profiling data can serve as a source for deriving metrics (time series) in various scenarios:

  • Measuring resources consumed by specific job classes or execution scopes in the application.
  • Implementing scheduling, pacing, throttling, and rate limiting based on resource consumption for account management, billing, and accounting.
  • Managing resource quotas.
  • Continuous benchmarking.

Another area of interest is alerting, behavior analysis, and anomaly detection.

Runtime environments rarely allow collecting such statistics in a convenient way; profiling data augmented with dynamic tags (sample labels) can fill this gap.

Problem

Pyroscope faces limitations that hinder its use in the scenarios mentioned above:

  • Sensitivity to the curse of dimensionality in sampled profiling data, particularly with dynamic labels of high cardinality.
  • Limited data manipulation capabilities in the Pyroscope query engine compared to traditional time-series databases.
  • Specificity of the Pyroscope query interface, making it incompatible with existing query languages and dialects (except Prometheus label matchers).

These limitations will be mitigated in the future but won't be completely eliminated.

Proposal

Code Instrumentation

I propose an SDK that enables users to define metrics directly in the code, similar to how regular metrics are defined. For example, in Go, a Prometheus-like interface could be implemented as follows:

// The timer/watcher is to be initialized similarly to any other prometheus
// metrics. The difference is that it does not need/support registration,
// and is never exported at the source (although it can be implemented).
var jobCPUTimer = cputimer.NewCPUTimerVec(cputimer.Opts{
	Name: "my_job_cpu_time_total",
}, []string{"class", "account"})

var userCPUTimer = jobCPUTimer.WithLabelValues("ETL", "user-1")

func doJob(ctx context.Context) {
	ctx, stop := userCPUTimer.Start(ctx)
	defer stop()
	// CPU-intensive job
}

Through pprof labels, samples of the timer scope are attributed to specific time series, such as my_job_cpu_time_total{class="ETL",account="user-1"}. Static profile labels (e.g., pod, service_name, region) also become part of the metric labels.
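To make the attribution mechanism concrete, here is a minimal sketch of how such a timer could be implemented on top of runtime/pprof labels. The cputimer package, its types, and the __metric_name__ label key are assumptions derived from the example above, not an existing API:

// Sketch: a CPU timer backed by pprof labels. The "__metric_name__" key is
// an illustrative convention; only runtime/pprof is a real dependency.
package cputimer

import (
	"context"
	"runtime/pprof"
)

type Opts struct {
	Name string
}

type CPUTimerVec struct {
	opts       Opts
	labelNames []string
}

type CPUTimer struct {
	vec         *CPUTimerVec
	labelValues []string
}

func NewCPUTimerVec(opts Opts, labelNames []string) *CPUTimerVec {
	return &CPUTimerVec{opts: opts, labelNames: labelNames}
}

func (v *CPUTimerVec) WithLabelValues(values ...string) *CPUTimer {
	return &CPUTimer{vec: v, labelValues: values}
}

// Start attaches the metric name and label values to the calling goroutine
// as pprof labels; CPU samples collected while the scope is active can then
// be attributed to the corresponding time series on the server side.
func (t *CPUTimer) Start(ctx context.Context) (context.Context, func()) {
	kv := []string{"__metric_name__", t.vec.opts.Name}
	for i, name := range t.vec.labelNames {
		kv = append(kv, name, t.labelValues[i])
	}
	prev := ctx
	ctx = pprof.WithLabels(ctx, pprof.Labels(kv...))
	pprof.SetGoroutineLabels(ctx)
	return ctx, func() {
		// Restore the labels carried by the previous context when the scope ends.
		pprof.SetGoroutineLabels(prev)
	}
}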

Recording rules

Additionally, users should be able to configure metric recording rules outside the code when instrumentation is not available or dynamic labels are not supported for a given profile type. These rules allow scoping metrics to a stack trace location or a function name, albeit with increased configuration complexity due to the volatile nature of stack traces.
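For illustration, a recording rule might carry fields along these lines; the schema below is a sketch under assumed names, not an existing Pyroscope configuration:

// Sketch of a server-side recording rule. All field names are illustrative
// assumptions, not an existing Pyroscope configuration schema.
type RecordingRule struct {
	// Name of the derived metric, e.g. "etl_job_cpu_time_total".
	MetricName string
	// Profile type the rule applies to, e.g. "process_cpu:cpu:nanoseconds".
	ProfileType string
	// Label matchers selecting the source profiles (service_name, namespace, ...).
	Matchers []string
	// Optional function name (or stack trace location) that scopes the metric
	// to samples whose stacks contain this frame; this is the part that is
	// fragile under refactoring, hence the extra configuration cost.
	FunctionName string
	// Profile/sample labels to carry over into the resulting series.
	GroupBy []string
}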

Metrics export

On the server side, distributors extract configured and annotated time-series from sample values and proxy them to a time-series database in commonly used formats. Metric labels are removed from profiles to avoid issues associated with high cardinality of sample dimensions.

Prometheus remote write could be the default protocol, but it requires strict ordering of samples by timestamp within a single series (so does Mimir, although it can handle out-of-order writes). This may require us to introduce a dedicated component/service responsible for ordering, batching, sharding, and sending. A good example is Tempo's metrics-generator.
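As a rough sketch of the aggregation such a component would perform, the snippet below sums labeled sample values per series and emits one data point per series at each flush, keeping samples within each series ordered by timestamp. The types and the flush mechanism are assumptions for illustration, not an existing Pyroscope component:

// Sketch: accumulate labeled sample values per series and emit one point per
// series at each flush. Batching, sharding, and the remote-write client are
// out of scope; all types here are illustrative.
package export

import (
	"sort"
	"sync"
	"time"
)

// LabeledSample is a profile sample value together with the metric labels
// derived from its pprof labels and static profile labels.
type LabeledSample struct {
	SeriesKey string // canonical encoding of the label set
	Value     int64  // e.g. CPU time in nanoseconds
}

type Aggregator struct {
	mu   sync.Mutex
	sums map[string]int64
}

func NewAggregator() *Aggregator {
	return &Aggregator{sums: make(map[string]int64)}
}

func (a *Aggregator) Observe(s LabeledSample) {
	a.mu.Lock()
	a.sums[s.SeriesKey] += s.Value
	a.mu.Unlock()
}

type Point struct {
	Series      string
	TimestampMs int64
	Value       float64
}

// Flush drains the accumulated sums into data points stamped with the flush
// time. Calling Flush at monotonically increasing times keeps samples within
// each series ordered by timestamp, as remote write requires.
func (a *Aggregator) Flush(now time.Time) []Point {
	a.mu.Lock()
	defer a.mu.Unlock()
	points := make([]Point, 0, len(a.sums))
	for key, v := range a.sums {
		points = append(points, Point{Series: key, TimestampMs: now.UnixMilli(), Value: float64(v)})
		delete(a.sums, key)
	}
	sort.Slice(points, func(i, j int) bool { return points[i].Series < points[j].Series })
	return points
}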

A very basic prototype (user code):

(screenshot: prototype user code)

Related:
