Specify the user-facing API for metric instruments #299
Conversation
First half of the document, will review the rest tonight after some meetings.
> required to delete handles when they are no longer in use.
>
> ```golang
> func (s *server) processStream(ctx context.Context) {
> ```
I think the Handle should mostly have the same lifetime as the Metric. Deleting a Handle should happen only when no more values will be recorded for that instrument/labelset.
I think `Handle.Delete` is confusing and should probably be removed for the moment. The intention of deleting is different from the one in this example.
If I can't delete a handle, it's only useful to me for objects that live indefinitely in my process. That seems like a significant drawback. Also, if handles can't be deleted, they can't be used to implement the direct-access calling convention. In my Go SDK I've implemented support for handle deletion; it is tricky, but it is the only way to make handles useful for definite-lifetime objects.
I would assume that a handle can be deleted and that the corresponding resources will be reclaimed once those updates are collected.
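To make the definite-lifetime case concrete, here is a minimal Go sketch; the `Handle`, `GetHandle`, and `Release` names are placeholders for discussion, not spec API:

```golang
package metricsketch

// Hypothetical interfaces, for illustration only.
type Handle interface {
	Add(delta int64) // record against the bound label set
	Release()        // let the SDK reclaim per-labelset state
}

type Counter interface {
	GetHandle(labels map[string]string) Handle
}

// processStream binds a handle for the lifetime of one stream (a
// definite-lifetime object) and releases it when the stream ends, so
// the SDK can free the handle's memory after the final collection.
func processStream(c Counter, streamID string, msgs <-chan []byte) {
	h := c.GetHandle(map[string]string{"stream_id": streamID})
	defer h.Release()

	for msg := range msgs {
		h.Add(int64(len(msg)))
	}
}
```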
Not sure I understand. I think Handles semantically mean a binding between an Instrument (which has a process lifetime) and a LabelSet (which, if I understand correctly, has the same lifetime), so a Handle does not extend any lifetime and has no lifetime concern of its own.
If LabelSet does not have a process lifetime, then we need a delete there as well. Personally I see no reason to have delete (I think we inherited it from Prometheus, but I've never seen a real use-case for it), so I am happy to just remove Delete.
In my thinking, `LabelSet` objects do not have a lifetime managed by the SDK. They are values the caller is allowed to refer to; they are garbage-collected when no longer used, or released when the object is finalized. Think of a `LabelSet` as a value that enables a fast lookup, not as something registered with the SDK under a managed lifetime.
On the other hand, there is a presumption that handles take resources and should support delete. Prometheus clients support delete as well, for example: https://godoc.org/github.com/prometheus/client_golang/prometheus#CounterVec.Delete
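For reference, the linked Prometheus API is used like this (a minimal sketch against the real `client_golang` library):

```golang
package main

import (
	"fmt"

	"github.com/prometheus/client_golang/prometheus"
)

func main() {
	requests := prometheus.NewCounterVec(
		prometheus.CounterOpts{
			Name: "requests_total",
			Help: "Requests served, by path.",
		},
		[]string{"path"},
	)
	prometheus.MustRegister(requests)

	requests.WithLabelValues("/api").Inc()

	// Delete removes the timeseries with these labels; it will not be
	// reported on subsequent scrapes, and its memory can be reclaimed.
	deleted := requests.Delete(prometheus.Labels{"path": "/api"})
	fmt.Println("deleted:", deleted)
}
```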
The Delete in Prometheus deletes the "Metric" so it is no longer reported, so it should be called when no more values will be recorded for that "Metric" (what the Delete documentation calls a metric is what OpenMetrics calls a timeseries).
Understanding that requires some internal details of the Prometheus implementation:

```
Metric {
  Map<Map<String, String> /* labels */, Value /* usually some atomic var */> timeseries
}
```

Delete removes the entry from the timeseries map, so it will never be reported again. So deleting the handle should not be used as in the examples you gave.
It suggests that you can only delete handles after you stop using them and force a collection, somehow. This doesn't look well addressed in Prometheus, but the documentation for Delete is pretty clear about its purpose, and I believe it's the same purpose.
I'm asking for more than Prometheus offers. I intend for the implementation of the direct calling convention to be: (1) acquire handle, (2) operate, (3) delete handle.
I'm not sure we are on the same page about Prometheus's use of Delete, which is different from what you propose. I think the Prometheus use has a lot of benefits. Also keep in mind that an implementation is free not to allocate anything in any map or anywhere, so you can still avoid holding memory if that is your goal.
It's not very clear exactly how Prometheus intends their Delete method to be used, but it does accomplish the same goal that I have, which is to free memory. I now understand that this discussion relates to Prometheus semantics, which require all memory to be pinned. The only way to release memory is to expunge it and reset the process start timestamp, i.e., force a reset.
It does appear that a Prometheus user can achieve the effect I want if they: (1) issue final metric updates on a handle, (2) force a collection, (3) Delete the timeseries to free resources. This allows the data to be collected and the memory to be released, but it breaks the semantics of Prometheus, because now a metric that once existed has actually disappeared, which is unexpected.
Now consider a statsd exporter that pushes data out periodically: it benefits from the same aggregation as the Prometheus client in this case, but it's able to free memory with correct semantics. After you send a gauge update, you're free to forget that gauge ever existed. This is a fundamental difference. If I have a process that's going to be up for days, I want it to free the bits of memory it's using to store unused metrics. I want all the performance benefits of handles, but cannot afford for the memory not to be reclaimed.
I see this as a major drawback of the Prometheus model, which can be fixed by pushing data into a server that keeps state. This is where I expect OpenTelemetry clients to head: pushing aggregated metrics to an OpenTelemetry service that can export Prometheus.
As you know, an implementation is expected to implement handles for efficiency. I've stated that my intention is to use handles as the underlying implementation for direct calls, i.e., `Counter.Add()` will be implemented as (1) get handle, (2) `Add()`, (3) release the handle (i.e., delete it). I'm willing to stay away from the word "Delete" because it implies deletion of a timeseries. I'm looking to "release" the handle, in order to forget it and allow memory to be reclaimed in situations where Prometheus semantics do not require the memory to be kept. I don't consider an SDK that does "not allocate anything" to be an option: we expect handles to be fast, I just don't expect them to tie up memory for the lifetime of the process.
This proposal requires that we implement a method to release handles, and I'm flexible on naming. How do you like `Forget()` or `Release()`?
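A minimal sketch of that implementation strategy, with placeholder names (`LabelSet`, `acquireHandle`, and `release` are illustrative, not spec API):

```golang
package metricsketch

// LabelSet is a placeholder for a canonicalized label set.
type LabelSet map[string]string

type handle struct {
	// SDK-internal aggregation state, e.g. an atomic counter.
}

func (h *handle) add(delta int64) { /* atomically accumulate delta */ }
func (h *handle) release()        { /* reclaimable after the next collection */ }

type counter struct{}

func (c *counter) acquireHandle(labels LabelSet) *handle {
	// Look up or create per-labelset state.
	return &handle{}
}

// Add implements the direct calling convention in terms of handles:
// (1) acquire, (2) operate, (3) release.
func (c *counter) Add(labels LabelSet, delta int64) {
	h := c.acquireHandle(labels)
	h.add(delta)
	h.release()
}
```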
> Re-usable `LabelSet` objects provide a potential optimization for
> scenarios where handles might not be effective. For example, if the
> label set will be re-used but only used once per metric, handles do
> not offer any optimization. It may be best to pre-compute a
If you can "pre-compute" the `LabelSet`, then you can "pre-compute" the `Handle`, which is a bit more optimal. So I think this should say that if there is an opportunity to "pre-compute" the `LabelSet`, then the user should use a `Handle`.
You can have both. You can compute one label set and then use it to construct N handles.
(As a user I am still not interested in using handles. I do not wish to allocate one field per metric that I use, regardless of the optimization win.)
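A sketch of "both", assuming placeholder `Meter`, `Labels`, `GetHandle`, and direct-call names:

```golang
package metricsketch

// Placeholder interfaces for this sketch only.
type LabelSet interface{}

type Handle interface{ Add(delta int64) }

type Counter interface {
	GetHandle(labels LabelSet) Handle // handle calling convention
	Add(delta int64, labels LabelSet) // direct calling convention
}

type Meter interface {
	Labels(keyvals ...string) LabelSet
	NewCounter(name string) Counter
}

// One pre-computed LabelSet backs N handles and direct calls alike.
func instrument(m Meter) {
	labels := m.Labels("service", "checkout")

	requests := m.NewCounter("requests").GetHandle(labels)
	bytesIn := m.NewCounter("request_bytes").GetHandle(labels)

	requests.Add(1)
	bytesIn.Add(512)

	// Same LabelSet, direct convention, no handle retained:
	m.NewCounter("errors").Add(1, labels)
}
```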
I've added a lot more language about the relative performance of handles and label sets. I also added a language-optional spec for ordered-LabelSet construction.
> "It may be best to pre-compute a canonicalized `LabelSet` once and re-use it with the direct calling convention."

Does this refer to process-level lifetime or request-level lifetime?
I do not intend for LabelSets to have a lifetime; they need not tie up any resources inside the SDK. Only when you associate a LabelSet with an instrument (i.e., bind a handle) are resources consumed, and those should potentially be deleted when the handle's lifetime ends.
Even then, I think it is up to the implementation. You can return a Bind/Handle that does not reference any internal memory, but it also allows us to reference a small amount of extra memory (which, to be honest, is far less than the LabelSet, because we are talking about an atomic number).
In the proposed Go SDK, I allocate these small amounts of extra memory the way you would expect. Handles are supported. These small amounts of memory have to be freed when they are no longer in use. Handles support a delete method so that after they've been fully collected, the memory can be freed. How about `Release()` or `Forget()`?
specification/api-metrics-user.md (outdated)

> `Add()`, `Set()`, and `Record()` method for submitting individual
> metric events.
>
> ### Observer metric
Look at this example in Java for runtime metrics, where I was able to use Handles and probably also win some performance:
https://github.com/open-telemetry/opentelemetry-java/blob/master/contrib/runtime_metrics/src/main/java/io/opentelemetry/contrib/metrics/runtime/GarbageCollector.java#L59
I was able to pre-compute all the Handles.
Also, I like (ok, don't blame me for liking my own design there) the way I passed an object called "Result" or "Update" to the callback, which is used to construct the result.
Your example was a bit surprising. Why not let each Observer handle have its own callback? That seems more natural. Not saying it's bad, but letting handles use their own callbacks could allow parallelization.
Instead of `Result.put`, I would expect `Result.observe`, fwiw.
Parallelization can be implemented in the callback if really needed. Most cases where I had to use Observers were cases where multiple values (for more than one Handle) are obtained with one syscall (or some other mechanism). For example, here https://github.com/open-telemetry/opentelemetry-java/blob/master/contrib/runtime_metrics/src/main/java/io/opentelemetry/contrib/metrics/runtime/MemoryPools.java#L94 you can see that I use the heapUsage result to update 3 handles; with one callback per Handle, I would need to get the heapUsage 3 times.
I also saw another interesting pattern, where one syscall (or whatever mechanism gets the metric) provides data that updates Handles in different metrics (this is a bit rarer than the example I gave you).
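A Go analogue of that pattern, assuming a hypothetical `ObserverResult` sink passed to the callback (mirroring the Java `Result` object):

```golang
package metricsketch

import "runtime"

// ObserverResult is a hypothetical sink passed to observer callbacks.
type ObserverResult interface {
	Observe(value int64, labels map[string]string)
}

// reportHeap performs one (relatively expensive) read of the runtime
// memory stats and feeds three series from it, instead of paying for
// three reads with one callback per handle.
func reportHeap(result ObserverResult) {
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms) // the single "syscall"

	result.Observe(int64(ms.HeapAlloc), map[string]string{"kind": "used"})
	result.Observe(int64(ms.HeapSys), map[string]string{"kind": "committed"})
	result.Observe(int64(ms.HeapIdle), map[string]string{"kind": "idle"})
}
```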
On put vs. observe, I have no problem changing that. put is Java-specific naming, so I'm happy to use observe there.
I propose to return to OTEP 0008 on metric observers and pin down a few more details. I'm removing it from this draft.
Sorry, I got trapped; I thought the whole PR was just that small fix.

I've run out of time to respond to these remarks today. Back at it next week.
specification/api-metrics-user.md (outdated)

> metric events, but offer varying degrees of performance and
> convenience.
>
> #### Metric handle calling convention
Consider the name "Bound Instrument", created via an `Instrument.Bind(labels)` method. A handle is essentially an instrument (a thing used to record measurements), so imo introducing a new concept of "handle" is more cognitive overhead than a "slightly different/optimized instrument".
@bogdandrutu Do you agree? I think it's a good idea.
If this is just a name change, I am usually very flexible with that; if it changes behavior, I would like to understand what changes.
Yes, it's just a name change. We would replace "Instrument handle" with "Bound instrument", "GetHandle" with "Bind", etc.
Shall I update the names in this PR?
specification/api-metrics-user.md (outdated)

> ### Observer metric
>
> Observer metrics do not support handles or direct calls. The
fwiw, I am very confused by the observers and what they are used for. Could use some examples / use cases.
In Java I created an example of some runtime metrics that use an observer: https://github.com/open-telemetry/opentelemetry-java/blob/master/contrib/runtime_metrics/src/main/java/io/opentelemetry/contrib/metrics/runtime/GarbageCollector.java#L59
I checked that example, it didn't help. How is that different from having a timer thread that would bulk-publish metrics?
In general you can see these two as equivalent, but there are some important differences:
- With Observer, the implementation can better control which metrics to read. For example, some metrics can be disabled, and then we don't have to read them at all (not possible with a timer, because the timer callback gets executed anyway).
- With the Observer pattern, in a pull-based model like Prometheus, we execute the read of the metrics only when an external event asks for them (in the case of Prometheus, when the server makes a request to the client).
- With Observer, implementations can give users a uniform way to control the read interval for these metrics.
- In some languages it is not that easy to start a timer (in Java, for example, that would be around 50 (estimated) lines of code), so having this logic in one place is better. Also, a timer is not free, and the possibility of implementing all these callbacks with fewer timers may be something some implementations consider.

These are some important advantages of having the logic for reading these kinds of metrics controlled by the implementation. A minimal sketch contrasting the timer and observer approaches follows below.
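The sketch below contrasts the two approaches; the `RegisterObserver` name is hypothetical:

```golang
package metricsketch

import "time"

// Timer approach: the read runs every interval, whether or not anyone
// is collecting and whether or not the metric is enabled.
func startTimer(read func()) (stop func()) {
	t := time.NewTicker(10 * time.Second)
	done := make(chan struct{})
	go func() {
		for {
			select {
			case <-t.C:
				read()
			case <-done:
				return
			}
		}
	}()
	return func() { t.Stop(); close(done) }
}

// Observer approach: the SDK invokes the callback only when a
// collection actually happens (e.g. on a Prometheus scrape) and can
// skip disabled instruments entirely.
type Meter interface {
	RegisterObserver(name string, callback func())
}

func register(m Meter, read func()) {
	m.RegisterObserver("process.memory", read)
}
```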
The text describing Observer says nothing about all these aspects, like timers, interval control, what triggers the callback, etc. Can we remove it from this PR and do another one dedicated just to the Observer, with more detailed description / specification?
I don't have a strong opinion about doing it in a different PR or in this one, but I do agree that things that I mentioned in the previous comment should be documented.
I will have to add more content. I'm happy to iterate in this PR, although there are many open questions that will delay it. The OTEP has details: https://github.com/open-telemetry/oteps/blob/master/text/0008-metric-observer.md. The argument is that it's most efficient when the exporter, or whoever is pulling metrics, determines the frequency at which metrics are captured. This is important in cases where the metrics are expensive to compute.
I propose to return to the OTEP 0008 on metric observers and pin down a few more details. I'm removing it from this draft.
As there are a lot of open questions here, I will keep this PR open and push one update at the end of today. Meanwhile, please consider the questions raised here by me and others.
Did you forget to push your change?
> | Unit | WithUnit(string) | Units specified according to the [UCUM](http://unitsofmeasure.org/ucum.html). |
> | Recommended label keys | WithRecommendedKeys(list) | Recommended grouping keys for this instrument. |
> | Monotonic | WithMonotonic(boolean) | Configure a counter or gauge that accepts only monotonic/non-monotonic updates. |
> | Absolute | WithAbsolute(boolean) | Configure a measure that does or does not accept negative updates. |
I made this change considering your feedback on the previous PR. We've got two options here, `WithMonotonic(boolean)` and `WithAbsolute(boolean)`. To construct a measure instrument for signed values, you would pass `WithAbsolute(false)` as the option. Sound good? I will update the other document accordingly in a parallel PR. A sketch of how these options might read appears below.
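The option names below come from the table above; the functional-options plumbing is illustrative, not spec API:

```golang
package metricsketch

// config collects instrument options; a placeholder for SDK state.
type config struct {
	unit      string
	keys      []string
	monotonic bool
	absolute  bool
}

type Option func(*config)

func WithUnit(u string) Option               { return func(c *config) { c.unit = u } }
func WithRecommendedKeys(k ...string) Option { return func(c *config) { c.keys = k } }
func WithMonotonic(m bool) Option            { return func(c *config) { c.monotonic = m } }
func WithAbsolute(a bool) Option             { return func(c *config) { c.absolute = a } }

// newMeasure applies options over defaults; measures default to
// absolute (non-negative) updates in this sketch.
func newMeasure(name string, opts ...Option) *config {
	c := &config{absolute: true}
	for _, opt := range opts {
		opt(c)
	}
	return c
}

// A measure accepting signed values passes WithAbsolute(false):
var queueDelta = newMeasure("queue.delta",
	WithUnit("1"),
	WithAbsolute(false),
)
```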
@AloisReitbauer ^^^
My understanding of this PR is that in principle it looks good, and now there are many questions about naming and behavior of specific methods. @jmacd @bogdandrutu @yurishkuro @rghetia @lzchen do you believe those issues may be filed as separate items and marked for v0.3 while this PR is merged? Or should the entire PR be postponed to the next milestone?
I prefer merging this PR and sorting out the remaining issues in v0.3 rather than moving this entirely to v0.3.
There appears to be a disagreement about the handle `Delete()`. I believe we could quickly modify "Instrument handle" to "Bound instrument" as a follow-on; I'd rather not continue modifying this PR. I also intend to supply a document detailing the `Observer`.
@jmacd can we ship without `Delete`?
I think we are seeing that it is not trivial to add `Delete`. I would like to hear a convincing plan that implements the direct calling convention without using handles underneath before we remove it.
@bogdandrutu @SergeyKanzhelev I am willing to take `Delete` out.
@jmacd I believe we will need a variant of `Delete` eventually. With this in mind, it's easier if we simply remove the paragraph about it for now.
Done @SergeyKanzhelev
@bogdandrutu @yurishkuro @rghetia @lzchen please file issues off every outstanding conversation. Merging the rest.
In order to make progress, let's submit the PR as it is and fix outstanding comments in a future PR.
Merging with the outstanding comments to be addressed.
* Edits, expansion
* Minor fix for AloisReitbauer's feedback on PR250
* Describe metrics event context; make use of global Meter recommended for allocating static metrics
* Feedback part 1
* Feedback part 2
* Feedback part 3
* Feedback part 4
* Revert api-metrics.md fix
* Comment on convenience overload to bypass LabelSet
* Change signed/non-negative to absolute
* Remove observer instrument from this spec for v0.2
* Remove Delete for 0.2
This is rewritten content that was removed from PR#250. There is only one substantial change here, but it is substantial. This specification now says that instruments are allocated by the `Meter`, meaning they are automatically registered and qualified by the named Meter's component namespace. This resolves questions about what kind of registry support is needed and how to export metrics that have not been updated. See the new caveats about instrument storage. These changes will encourage users not to store instruments in static scope; otherwise they will be forced to use the global Meter to allocate metric instruments.