
measurement residual in SDK #2199

Open · zhchai opened this issue Jun 19, 2023 · 10 comments
Labels: bug (Something isn't working), do-not-stale

zhchai commented Jun 19, 2023

Describe your environment
opentelemetry-cpp 1.9.0 SDK

Steps to reproduce
Using one instrument to observe the memory usage of multiple processes; processes can appear and disappear dynamically.
Measurements for processes that have already disappeared are still exported to the OpenTelemetry Collector.
Code snippet:

  // Namespace aliases assumed by this snippet:
  //   namespace metric_sdk    = opentelemetry::sdk::metrics;
  //   namespace metrics_api   = opentelemetry::metrics;
  //   namespace otlp_exporter = opentelemetry::exporter::otlp;

  // Create the OTLP gRPC metric exporter.
  otlp_exporter::OtlpGrpcMetricExporterOptions exporter_options;  // OTLP gRPC exporter configuration
  auto exporter = otlp_exporter::OtlpGrpcMetricExporterFactory::Create(exporter_options);

  std::string version{"1.2.0"};
  std::string schema{"https://opentelemetry.io/schemas/1.2.0"};

  // Initialize and set the global MeterProvider.
  metric_sdk::PeriodicExportingMetricReaderOptions reader_options;
  reader_options.export_interval_millis = std::chrono::milliseconds(1000);
  reader_options.export_timeout_millis  = std::chrono::milliseconds(500);
  std::unique_ptr<metric_sdk::MetricReader> reader{
      new metric_sdk::PeriodicExportingMetricReader(std::move(exporter), reader_options)};
  auto provider = std::shared_ptr<metrics_api::MeterProvider>(new metric_sdk::MeterProvider());
  auto p        = std::static_pointer_cast<metric_sdk::MeterProvider>(provider);
  p->AddMetricReader(std::move(reader));

  metrics_api::Provider::SetMeterProvider(provider);

void foo_library::observable_process_example(const std::string &name)
{
  std::string observableProcess = name + "_process_metrics_v1";
  auto provider                 = metrics_api::Provider::GetMeterProvider();
  nostd::shared_ptr<metrics_api::Meter> meter = provider->GetMeter(name, "1.2.0");
  auto process_mem_counter = meter->CreateInt64ObservableGauge(observableProcess);
  process_mem_counter->AddCallback(MeasurementFetcher::process_callback, nullptr);
  // Keep the instrument (and its registered callback) alive so measurements
  // are collected on every export interval.
  while (true)
  {
    std::this_thread::sleep_for(std::chrono::seconds(1));
  }
}
  static void process_callback(opentelemetry::metrics::ObserverResult observer, void * /* state */)
  {
    auto observer_int =
        nostd::get<nostd::shared_ptr<opentelemetry::metrics::ObserverResultT<int64_t>>>(observer);
    // "processes" holds pid/name info for the currently running processes
    // (populated elsewhere; not shown in this snippet).
    for (const auto &process : processes)
    {
      int64_t process_memory = getProcessMemory(process.pid);  // memory usage for this pid (helper, not shown)
      observer_int->Observe(process_memory,
                            {{"process_name", process.name}, {"pid", process.pid}});
    }
  }

What is the expected behavior?
If a process exits, its metrics should no longer appear in the OpenTelemetry Collector.

What is the actual behavior?
Measurements for processes that have disappeared are still exported to the OpenTelemetry Collector.

Additional context
See the code snippets above.

@zhchai zhchai added the bug Something isn't working label Jun 19, 2023
@lalitb lalitb self-assigned this Jun 19, 2023

lalitb (Member) commented Jun 19, 2023

Duplicate of #2184. I am working on the fix, tentatively planned to be resolved this week.

lalitb (Member) commented Jun 20, 2023

@zhchai I see that you are using the default temporality for the OTLP exporter, which is Cumulative. With this temporality, even when there is no new measurement for a given instrument within a collection period, the previous metric value is still exported. With Delta temporality, if there is no new measurement within a collection period, no new metrics are generated for that instrument.
Could you configure the temporality to Delta and try again?
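
For reference, a minimal sketch of configuring delta temporality on the OTLP gRPC exporter; this assumes the aggregation_temporality option on OtlpGrpcMetricExporterOptions available in recent opentelemetry-cpp releases:

  // Sketch only: configure the OTLP gRPC exporter for delta temporality.
  // Assumption: OtlpGrpcMetricExporterOptions exposes aggregation_temporality
  // in the opentelemetry-cpp release in use.
  #include "opentelemetry/exporters/otlp/otlp_grpc_metric_exporter_factory.h"
  #include "opentelemetry/exporters/otlp/otlp_grpc_metric_exporter_options.h"

  namespace otlp_exporter = opentelemetry::exporter::otlp;

  otlp_exporter::OtlpGrpcMetricExporterOptions exporter_options;
  // With delta temporality, only measurements recorded during the current
  // collection interval are exported.
  exporter_options.aggregation_temporality =
      otlp_exporter::PreferredAggregationTemporality::kDelta;
  auto exporter = otlp_exporter::OtlpGrpcMetricExporterFactory::Create(exporter_options);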

EDIT - Please ignore my comment. I realized this is an Observable Gauge, which should only report the values observed within the collection interval. Will get this fixed.

gaochuang commented

@lalitb May I ask what is the cause of this issue? And when will it be fixed? Thanks.

lalitb (Member) commented Jun 21, 2023

@gaochuang the fix is in the temporal storage for the observable aggregation; I am targeting to have it fixed this week, in time for the next release.

gaochuang commented

@lalitb Thanks.

lalitb (Member) commented Jun 23, 2023

@gaochuang @zhchai - Sorry for all the confusion here. I had to revisit our implementation and also look at the behavior of other language implementations (JS, .NET). To summarize the findings:

For an Observable Gauge with last-value aggregation (irrespective of the configured temporality), the exporter will emit metrics for all unique attribute sets observed since the start, even if no new measurements were generated for a specific attribute set in subsequent reporting periods.

This is how it is implemented in C++, and also in JS and .NET, as I validated.

In the example shared here, if there are no new measurements for a given process, every collection will still contain metric points for it, carrying the last aggregated timestamp. Perhaps this timestamp can be used in the backend as the criterion for when the process was last active.

Not directly related to this issue, but I noticed that we don't have the required unit tests for Observable Gauge validation; I will add them as part of closing this issue.

gaochuang commented Jun 24, 2023

@lalitb Thanks for the clarification. I would like to ask a question: if I want to avoid this behavior, how can I do it? By the way, do you have a guide or examples?

zhchai (Author) commented Jun 25, 2023

if there are no new measurements for a given process, every collection will still contain metric points for it, carrying the last aggregated timestamp. Perhaps this timestamp can be used in the backend as the criterion for when the process was last active.

Actually, we find that the timestamps of these measurements for exited processes are always updated, which does not match "the last aggregated timestamp".

lalitb (Member) commented Jun 26, 2023

if I want to avoid this behavior, how can I do it

Ideally I would not advise using process_name/pid as keys in the measurement attributes. Please read about the cardinality limit: the set of possible values an attribute set can have should be finite, and the SDK recommends a limit of 2000. In this case, for the attribute set {{"process_name", process.name},{"pid", n}}, the possible values are unbounded. Processes keep being created and terminated over time in a given system, new attribute values keep being generated, and so do the aggregated data points for those attribute values. These aggregated data points are stored in SDK memory forever (in the case of last-value aggregation), which can lead to a cardinality explosion.
The recommendation would instead be to:

  • Use the System Metrics semantic conventions to collect the memory usage of the given system. Of course, this will not provide per-process memory usage.
  • Use the Process Metrics semantic conventions to collect the memory usage of a given process.

Unfortunately, I don't see an ideal way of building an application that sends the memory usage of all currently running processes as measurements. I would recommend creating a GitHub issue in the otel-specs repo asking for a recommendation.
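
To illustrate the cardinality point above, here is a hypothetical variant of the callback that keeps the attribute set bounded by observing only a fixed, known list of process names and dropping the pid attribute; kMonitoredProcesses, findPidByName, and getProcessMemory are placeholder names for application-specific lookups:

  // Hypothetical sketch: bound attribute cardinality by observing a fixed,
  // known set of process names and omitting the per-process pid attribute.
  static const char *const kMonitoredProcesses[] = {"nginx", "redis", "my_app"};  // placeholder list

  static void bounded_process_callback(opentelemetry::metrics::ObserverResult observer,
                                       void * /* state */)
  {
    auto observer_int =
        nostd::get<nostd::shared_ptr<opentelemetry::metrics::ObserverResultT<int64_t>>>(observer);
    for (const char *name : kMonitoredProcesses)
    {
      int pid = findPidByName(name);  // placeholder: resolve the pid for this process name
      if (pid < 0)
      {
        continue;  // process not running right now; nothing is observed for it
      }
      int64_t memory = getProcessMemory(pid);  // placeholder: memory usage for this pid
      // The attribute values come from a finite list, so cardinality stays bounded.
      observer_int->Observe(memory, {{"process_name", name}});
    }
  }

With last-value aggregation the exported stream will still contain a point for every name in the fixed list that has ever been observed, but the number of attribute sets can no longer grow without bound.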

Actually, we find that the timestamps of these measurements for exited processes are always updated

I earlier tested with the console exporter, and it shows the last measurement timestamp; the same is specified in the spec. Will check with the OTLP exporter too, in case it behaves differently.
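
For anyone who wants to reproduce that check locally, a minimal sketch of swapping in the console exporter, using the OStreamMetricExporter that ships with opentelemetry-cpp; the surrounding reader/provider wiring is the same as in the snippet at the top of this issue:

  // Sketch: print exported metrics (including timestamps) to stdout instead of
  // sending them over OTLP, so the per-attribute-set timestamps can be inspected.
  #include "opentelemetry/exporters/ostream/metric_exporter.h"
  #include "opentelemetry/sdk/metrics/export/periodic_exporting_metric_reader.h"

  namespace metric_sdk = opentelemetry::sdk::metrics;

  std::unique_ptr<metric_sdk::PushMetricExporter> console_exporter{
      new opentelemetry::exporter::metrics::OStreamMetricExporter()};

  metric_sdk::PeriodicExportingMetricReaderOptions reader_options;
  reader_options.export_interval_millis = std::chrono::milliseconds(1000);

  std::unique_ptr<metric_sdk::MetricReader> console_reader{
      new metric_sdk::PeriodicExportingMetricReader(std::move(console_exporter), reader_options)};
  // Attach console_reader to the MeterProvider exactly as the OTLP reader is
  // attached in the original snippet, then inspect the printed output.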

github-actions bot commented

This issue was marked as stale due to lack of activity.

@github-actions github-actions bot added the Stale label Aug 26, 2023
@marcalff marcalff removed this from the Metrics post GA release - 1 milestone Jan 29, 2024