Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error during observe.The metric storage is invalid on removing callbacks #2373

Closed
TheOxFromOutOfTheBox opened this issue Oct 17, 2023 · 3 comments · Fixed by #2376
Closed
Assignees
Labels
bug Something isn't working

Comments

@TheOxFromOutOfTheBox
Copy link

Describe your environment Describe any aspect of your environment relevant to the problem, including your platform, build system, version numbers of installed dependencies, etc. If you're reporting a problem with a specific version of a library in this repo, please check whether the problem has been fixed on main branch.
OpenTelemetry v1.9.1 via vcpkg
OpenSUSE 15.5
Steps to reproduce

as per opentelemetry specifications,

For callback functions registered after an asynchronous instrument is created, the API is required to support a mechanism for unregistration. For example, the object returned from register_callback can support an unregister() method directly

for c++, there is AddCallback() and RemoveCallback()

here is how I've created a async gauge, this is done in a new thread whose only job is to check the addresses and update values based on testFetcher.

auto meter_provider = metrics_api::Provider::GetMeterProvider();
instrument_holder.emplace_back(meter->CreateDoubleObservableGauge(gauge_name, description, unit));
instrument_holder.back()->AddCallback(MeasurementFetcher::testFetcher, value);

When I want to end all metric callbacks, I'm facing issues with the handling of the lifecycle of threads and the instrument objects.
My process -
1 - remove callbacks from each ObservableCallbackPtr in instrument_holder
2 - erase all items instrument_holder
3 - join all threads created, which were stored in an object called threads

std::cout<<"end callback for size "<< instrument_holder.size()<<std::endl;
for(int i=0; i<instrument_holder.size();i++)
{
	std::cout<<"inside end callback for index "<< i<<std::endl;
	instrument_holder.back()->RemoveCallback(MeasurementFetcher::testFetcher,nullptr);
}
instrument_holder.erase(instrument_holder.begin(),instrument_holder.end());
for(auto &a: threads)
{
	a.join();
}
threads.erase(threads.begin(),threads.end());

What is the expected behavior?
Should not get any errors while exporting.

What is the actual behavior?
I get the following error

[Error] File: /home/kmh/vcpkg/buildtrees/opentelemetry-cpp/src/v1.9.1-711a32e9e2.clean/sdk/src/metrics/state/observable_registry.cc:56 [ObservableRegistry::Observe] - Error during observe.The metric storage is invalid

Additional context

from issue #2081, this seems to be an issue regarding the destructor.
https://github.com/open-telemetry/opentelemetry-cpp/blob/main/sdk/src/metrics/async_instruments.cc doesn't have any defined destructor.

What is the current behaviour on the object going out of scope, what does the destructor do? Does the backpointer still remain?

I'm not that well versed in C++, so please let me know if I'm missing something very obvious. How can I get rid of this error?

@TheOxFromOutOfTheBox TheOxFromOutOfTheBox added the bug Something isn't working label Oct 17, 2023
@github-actions github-actions bot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Oct 17, 2023
@perhapsmaple
Copy link
Contributor

@lalitb I would like to work on this issue. How about adding another function called CleanupCallbacks to ObservableRegistry that is called from the destructor of ObservableInstrument, which does something like this:

void ObservableRegistry::CleanupCallbacks(opentelemetry::metrics::ObservableInstrument *instrument)
{
  std::lock_guard<std::mutex> lock_guard{callbacks_m_};
  auto iter = std::remove_if(
      callbacks_.begin(), callbacks_.end(),
      [instrument](const std::unique_ptr<ObservableCallbackRecord> &record) {
        return record->instrument == instrument;
      });
  callbacks_.erase(iter, callbacks_.end());
}

Will this be enough to remove all of the back pointers from the registry and avoid crashes in cases like this.

@lalitb
Copy link
Member

lalitb commented Oct 18, 2023

Thanks @perhapsmaple for volunteering to fix this. The approach looks good. Have assigned this to you.

@marcalff
Copy link
Member

Thanks @perhapsmaple

This was discussed but never implemented, see #2081 (comment)

@marcalff marcalff removed the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Oct 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants