
measurement residual in SDK #2199

Open · zhchai opened this issue Jun 19, 2023 · 10 comments
Labels: bug (Something isn't working), do-not-stale

zhchai commented Jun 19, 2023

Describe your environment
opentelemetry-cpp 1.9.0 SDK

Steps to reproduce
Using one instrument to observe the memory usage of multiple processes; processes can appear and disappear dynamically.
Measurements for processes that have already disappeared are still exported to the OpenTelemetry Collector.
Code snippet:

  // Namespace aliases assumed by this snippet:
  //   namespace metric_sdk    = opentelemetry::sdk::metrics;
  //   namespace metrics_api   = opentelemetry::metrics;
  //   namespace otlp_exporter = opentelemetry::exporter::otlp;

  // Create the OTLP gRPC metric exporter.
  otlp_exporter::OtlpGrpcMetricExporterOptions exporter_options;  // OTLP gRPC exporter configuration
  auto exporter = otlp_exporter::OtlpGrpcMetricExporterFactory::Create(exporter_options);

  std::string version{"1.2.0"};
  std::string schema{"https://opentelemetry.io/schemas/1.2.0"};

  // Initialize and set the global MeterProvider.
  metric_sdk::PeriodicExportingMetricReaderOptions reader_options;
  reader_options.export_interval_millis = std::chrono::milliseconds(1000);
  reader_options.export_timeout_millis  = std::chrono::milliseconds(500);
  std::unique_ptr<metric_sdk::MetricReader> reader{
      new metric_sdk::PeriodicExportingMetricReader(std::move(exporter), reader_options)};
  auto provider = std::shared_ptr<metrics_api::MeterProvider>(new metric_sdk::MeterProvider());
  auto p        = std::static_pointer_cast<metric_sdk::MeterProvider>(provider);
  p->AddMetricReader(std::move(reader));

  metrics_api::Provider::SetMeterProvider(provider);

void foo_library::observable_process_example(const std::string &name)
{
  std::string observableProcess = name + "_process_metrics_v1";
  auto provider                 = metrics_api::Provider::GetMeterProvider();
  nostd::shared_ptr<metrics_api::Meter> meter = provider->GetMeter(name, "1.2.0");
  auto process_mem_counter = meter->CreateInt64ObservableGauge(observableProcess);
  process_mem_counter->AddCallback(MeasurementFetcher::process_callback, nullptr);
  // Keep the instrument (and its registered callback) alive so measurements
  // are collected on every export interval.
  while (true)
  {
    std::this_thread::sleep_for(std::chrono::seconds(1));
  }
}
  static void process_callback(opentelemetry::metrics::ObserverResult observer, void * /* state */)
  {
    auto observer_int =
        nostd::get<nostd::shared_ptr<opentelemetry::metrics::ObserverResultT<int64_t>>>(observer);
    // "processes" holds pid/name info for the currently running processes
    // (populated elsewhere; not shown in this snippet).
    for (const auto &process : processes)
    {
      int64_t process_memory = getProcessMemory(process.pid);  // memory usage for this pid (helper, not shown)
      observer_int->Observe(process_memory,
                            {{"process_name", process.name}, {"pid", process.pid}});
    }
  }

What is the expected behavior?
If a process exits, its metrics should no longer appear in the OpenTelemetry Collector.

What is the actual behavior?
Measurements for processes that have disappeared are still exported to the OpenTelemetry Collector.

Additional context
See the code snippets above.

@zhchai zhchai added the bug Something isn't working label Jun 19, 2023
@lalitb lalitb self-assigned this Jun 19, 2023

lalitb (Member) commented Jun 19, 2023

Duplicate of #2184. I am working on the fix, tentatively planned to be resolved this week.

lalitb (Member) commented Jun 20, 2023

@zhchai I see that you are using the default temporality for the OTLP exporter, which is Cumulative. With this temporality, even when there is no new measurement for a given instrument within a collection period, the previous metric value is still exported. With Delta temporality, if there is no new measurement within a collection period, no new metrics are generated for that instrument.
Could you configure the temporality to Delta and try again?
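
For reference, a minimal sketch of configuring delta temporality on the OTLP gRPC exporter; this assumes the aggregation_temporality option on OtlpGrpcMetricExporterOptions available in recent opentelemetry-cpp releases:

  // Sketch only: configure the OTLP gRPC exporter for delta temporality.
  // Assumption: OtlpGrpcMetricExporterOptions exposes aggregation_temporality
  // in the opentelemetry-cpp release in use.
  #include "opentelemetry/exporters/otlp/otlp_grpc_metric_exporter_factory.h"
  #include "opentelemetry/exporters/otlp/otlp_grpc_metric_exporter_options.h"

  namespace otlp_exporter = opentelemetry::exporter::otlp;

  otlp_exporter::OtlpGrpcMetricExporterOptions exporter_options;
  // With delta temporality, only measurements recorded during the current
  // collection interval are exported.
  exporter_options.aggregation_temporality =
      otlp_exporter::PreferredAggregationTemporality::kDelta;
  auto exporter = otlp_exporter::OtlpGrpcMetricExporterFactory::Create(exporter_options);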

EDIT - Please ignore my comment. I realized this is an Observable Gauge, which should only report the values observed within the collection interval. Will get this fixed.

gaochuang commented

@lalitb May I ask what is the cause of this issue? And when will it be fixed? Thanks.

lalitb (Member) commented Jun 21, 2023

@gaochuang the fix is in the temporal storage for the observable aggregation; I am targeting to have it fixed this week, in time for the next release.

gaochuang commented

@lalitb Thanks.

lalitb (Member) commented Jun 23, 2023

@gaochuang @zhchai - Sorry for all the confusion here. I had to revisit our implementation and also look at the behavior of other language implementations (JS, .NET). To summarize the findings:

For an Observable Gauge with last-value aggregation (irrespective of the configured temporality), the exporter will emit metrics for all unique attribute sets observed since the start, even if no new measurements were generated for a specific attribute set in subsequent reporting periods.

This is how it is implemented in C++, and also in JS and .NET, as I validated.

In the example shared here, if there are no new measurements for a given process, every collection will still contain metric points for it, carrying the last aggregated timestamp. Perhaps this timestamp can be used in the backend as the criterion for when the process was last active.

Not directly related to this issue, but I noticed that we don't have the required unit tests for Observable Gauge validation; I will add them as part of closing this issue.

gaochuang commented Jun 24, 2023

@lalitb Thanks for the clarification. I would like to ask a question: if I want to avoid this behavior, how can I do it? By the way, do you have a guide or examples?

zhchai (Author) commented Jun 25, 2023

if there are no new measurements for a given process, every collection will still contain metric points for it, carrying the last aggregated timestamp. Perhaps this timestamp can be used in the backend as the criterion for when the process was last active.

Actually, we find that the timestamps of these measurements for exited processes are always updated, which does not match "the last aggregated timestamp".

lalitb (Member) commented Jun 26, 2023

if I want to avoid this behavior, how can I do it

Ideally I would not advise using process_name/pid as keys in the measurement attributes. Please read about the cardinality limit: the set of possible values an attribute set can have should be finite, and the SDK recommends a limit of 2000. In this case, for the attribute set {{"process_name", process.name},{"pid", n}}, the possible values are unbounded. Processes keep being created and terminated over time in a given system, new attribute values keep being generated, and so do the aggregated data points for those attribute values. These aggregated data points are stored in SDK memory forever (in the case of last-value aggregation), which can lead to a cardinality explosion.
The recommendation would instead be to:

  • Use the System Metrics semantic conventions to collect the memory usage of the given system. Of course, this will not provide per-process memory usage.
  • Use the Process Metrics semantic conventions to collect the memory usage of a given process.

Unfortunately, I don't see an ideal way of building an application that sends the memory usage of all currently running processes as measurements. I would recommend creating a GitHub issue in the otel-specs repo asking for a recommendation.
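
To illustrate the cardinality point above, here is a hypothetical variant of the callback that keeps the attribute set bounded by observing only a fixed, known list of process names and dropping the pid attribute; kMonitoredProcesses, findPidByName, and getProcessMemory are placeholder names for application-specific lookups:

  // Hypothetical sketch: bound attribute cardinality by observing a fixed,
  // known set of process names and omitting the per-process pid attribute.
  static const char *const kMonitoredProcesses[] = {"nginx", "redis", "my_app"};  // placeholder list

  static void bounded_process_callback(opentelemetry::metrics::ObserverResult observer,
                                       void * /* state */)
  {
    auto observer_int =
        nostd::get<nostd::shared_ptr<opentelemetry::metrics::ObserverResultT<int64_t>>>(observer);
    for (const char *name : kMonitoredProcesses)
    {
      int pid = findPidByName(name);  // placeholder: resolve the pid for this process name
      if (pid < 0)
      {
        continue;  // process not running right now; nothing is observed for it
      }
      int64_t memory = getProcessMemory(pid);  // placeholder: memory usage for this pid
      // The attribute values come from a finite list, so cardinality stays bounded.
      observer_int->Observe(memory, {{"process_name", name}});
    }
  }

With last-value aggregation the exported stream will still contain a point for every name in the fixed list that has ever been observed, but the number of attribute sets can no longer grow without bound.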

Actually, we find that the timestamps of these measurements for exited processes are always updated

I earlier tested with the console exporter, and it shows the last measurement timestamp; the same is specified in the spec. Will check with the OTLP exporter too, in case it behaves differently.
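
For anyone who wants to reproduce that check locally, a minimal sketch of swapping in the console exporter, using the OStreamMetricExporter that ships with opentelemetry-cpp; the surrounding reader/provider wiring is the same as in the snippet at the top of this issue:

  // Sketch: print exported metrics (including timestamps) to stdout instead of
  // sending them over OTLP, so the per-attribute-set timestamps can be inspected.
  #include "opentelemetry/exporters/ostream/metric_exporter.h"
  #include "opentelemetry/sdk/metrics/export/periodic_exporting_metric_reader.h"

  namespace metric_sdk = opentelemetry::sdk::metrics;

  std::unique_ptr<metric_sdk::PushMetricExporter> console_exporter{
      new opentelemetry::exporter::metrics::OStreamMetricExporter()};

  metric_sdk::PeriodicExportingMetricReaderOptions reader_options;
  reader_options.export_interval_millis = std::chrono::milliseconds(1000);

  std::unique_ptr<metric_sdk::MetricReader> console_reader{
      new metric_sdk::PeriodicExportingMetricReader(std::move(console_exporter), reader_options)};
  // Attach console_reader to the MeterProvider exactly as the OTLP reader is
  // attached in the original snippet, then inspect the printed output.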

github-actions bot commented

This issue was marked as stale due to lack of activity.

@github-actions github-actions bot added the Stale label Aug 26, 2023
@marcalff marcalff removed this from the Metrics post GA release - 1 milestone Jan 29, 2024