[OTLP Registry] Timestamps are exported without seconds/milliseconds when temporality is DELTA #5041
Comments
This is expected and yes, it is by design. The OtlpMeterRegistry in delta mode uses the StepMeter implementation, which collects data for the step duration. (Gauge should be an exception to this, but I see that is not the case; I will file a follow-up issue for that.)
Are you saying that when your step is finer and still not getting
Yes. This is particularly helpful in a distributed environment where N different instances send data to the back end. When plotting charts, if all the data covers the same duration (i.e. interval), it becomes straightforward to make correlations. If the intervals were randomized instead, the data correlation could be off by up to 2 step lengths.
Isn't that always the case with every |
No. The Micrometer StepMeterRegistries are always aligned with the step.
I might be missing something very obvious. The reason the timestamps are truncated in the OTLP registry is so that different instances of Micrometer running in different places always align to the same "time buckets". This is not automatically the case, since the Micrometer instances start at different times. Then they export at the same interval, but these intervals are very unlikely to align, especially if you have multiple instances. This is what this calculation addresses, right? Wouldn't that problem be the same if you used any other StepMeterRegistry? Is there a reason this is worse for the OTLP registry? |
Maybe I was a bit unclear in my explanations. What I meant is that all the step registries exhibit the same behavior, i.e. they collect data for the same step window (let's say the step is 1 minute; every instance would collect data from the start of a minute to the end of that minute). Of course, the exceptions are the minutes in which the app starts and stops, where the window will be incomplete. The collected data is then exported at a random point over the course of the next step (a minute in this case). This avoids all the instances flooding the backend with requests at exactly the same moment (typically at the start of the minute). But the exported data is always for a full step, and the timestamps give you that information. OTLP has two timestamp fields, the start time and timeUnix, which are used to represent this window. Maybe the confusion arose because the data for 00:00:00-00:00:01 was not sent immediately at 00:00:01. Rather, it is sent between 00:00:01-00:00:02, but that only concerns when the data is exported.
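To make the alignment described above concrete, here is a minimal sketch in plain Java. It is not Micrometer's code: the class `StepWindowSketch` and its methods are hypothetical names, and the random publish offset only illustrates the idea of spreading exports across the next step. It shows that two instances exporting at different moments within the same minute still report the same completed window.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Random;

// Hypothetical illustration, not part of Micrometer.
final class StepWindowSketch {

    /** Start of the epoch-aligned step that contains the given instant. */
    static Instant stepStart(Instant t, Duration step) {
        long stepMs = step.toMillis();
        return Instant.ofEpochMilli(Math.floorDiv(t.toEpochMilli(), stepMs) * stepMs);
    }

    /** Start of the most recent step that is already complete at the given instant. */
    static Instant lastCompletedStepStart(Instant now, Duration step) {
        return stepStart(now, step).minus(step);
    }

    /** Assumed per-instance publish offset in [0, step), used to spread export load. */
    static Duration randomPublishOffset(Random random, Duration step) {
        return Duration.ofMillis((long) (random.nextDouble() * step.toMillis()));
    }

    public static void main(String[] args) {
        Duration step = Duration.ofMinutes(1);
        // Two instances export at different offsets inside the same minute.
        Instant exportA = Instant.parse("2024-01-01T00:01:30Z");
        Instant exportB = Instant.parse("2024-01-01T00:01:52Z");
        Instant startA = lastCompletedStepStart(exportA, step);
        Instant startB = lastCompletedStepStart(exportB, step);
        // Both report the window that began at 00:00:00 and ended at 00:01:00.
        System.out.println(startA + " .. " + startA.plus(step));
        System.out.println(startB + " .. " + startB.plus(step));
        System.out.println("offset: " + randomPublishOffset(new Random(), step));
    }
}
```

The export moment and the reported window are deliberately decoupled here: the offset only decides when the request is sent, never which window the data is booked into.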
I think there are a couple of things that confuse me here. Let me try to clarify how I understand it works and then correct what I misunderstand:
I think the confusion stems from the fact that to force the alignment to a certain time bucket in the export structure (i.e. the fact that the OTLP message timestamps are always truncated by using the step size) you must force the export to happen on that same schedule. (That would defeat the purpose of spacing them out to reduce load, because all exports for all instances would happen at 00:01:30, 00:02:30 and so on) What I would expect to happen is the following:
Now, the timestamps actually align with when the data was collected. This gives the backend a chance to determine how to book that data. Today, this decision is made on the client side, and there is no way for the backend to know that the data that arrives with certain timestamps doesn't actually come from that timeframe. I understand that this means that the start/end timestamps will differ for each running app instance. However, the timestamps created today don't actually represent when the data was collected unless I am missing the step where the data recorded in Micrometer is actually aligned to the bucket boundaries that are exported in the OTLP message. Recordings are put into the timestamp range, even if they were not recorded in that timestamp range (i.e. the data recorded at 00:02:10 will be put into the 00:01:00 to 00:02:00 bucket). |
You are partially right. The publishing part is more or less what you said. But the key differences are as follows (I am assuming that the initial calculated delay is 30 seconds as mentioned in your example),
The first export will only contain data for (00:00:00, 00:01:00] with timestamps startTime=00:00:00; time=00:01:00. (But the export starts at 00:01:30)
The second export will only contain data for (00:01:00, 00:02:00] and it will export with timestamp startTime=00:01:00; time=00:02:00. (But the export starts at 00:02:30) Hope the above example makes sense. Now this addresses all the problems discussed here,
I don't understand how it is ensured that the value that is polled from the Micrometer instrument at 00:01:30 only contains data for 00:00:00 to 00:01:00.
Got it. This is done by a polling service in StepMeterRegistry (for OTLP, histograms are handled slightly differently, but they follow the same concept). There might be an initial skew of a few milliseconds, depending on the number of meters in the registry, but this will be minimal, on the order of a few microseconds per meter.
Ah, there is a completely separate background thread and loop here; I missed that. So, at the beginning of each 1-minute bucket (e.g., at 00:01:00 + 1ms), the And the StepValue also separately calculates the beginning of the export interval when it is initialized. Thanks for explaining that; it wasn't straightforward for me. |
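The rollover mechanism discussed above can be sketched roughly as follows. This is a simplified, hypothetical model (the class `StepCounterSketch` and its methods are invented for illustration and are not Micrometer's StepValue or its polling service). It assumes a background poller calls `pollAt` once shortly after every step boundary, so that a later export only ever reads the value of the last completed step.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration of a step-based counter, not Micrometer's implementation.
final class StepCounterSketch {
    private final long stepMillis;
    private final AtomicLong current = new AtomicLong(); // recordings in the running step
    private volatile long previous;                      // value of the last completed step
    private long lastRolledStep;                         // index of the step last rolled over

    StepCounterSketch(long stepMillis, long nowMillis) {
        this.stepMillis = stepMillis;
        this.lastRolledStep = nowMillis / stepMillis;
    }

    void increment() {
        current.incrementAndGet();
    }

    /** Called by the (assumed) background poller right after a step boundary. */
    synchronized void pollAt(long nowMillis) {
        long stepIndex = nowMillis / stepMillis;
        if (stepIndex > lastRolledStep) {
            // Close the finished step and start accumulating into a fresh one.
            previous = current.getAndSet(0);
            lastRolledStep = stepIndex;
        }
    }

    /** What an export would read, regardless of when within the next step it runs. */
    long lastCompletedValue() {
        return previous;
    }

    public static void main(String[] args) {
        StepCounterSketch counter = new StepCounterSketch(60_000, 0);
        counter.increment();                  // recorded during 00:00:00-00:01:00
        counter.increment();
        counter.increment();
        counter.pollAt(60_001);               // rollover at 00:01:00 + 1 ms
        counter.increment();                  // recorded at about 00:01:10
        System.out.println(counter.lastCompletedValue()); // export at 00:01:30 sees 3
        counter.pollAt(120_001);              // rollover at 00:02:00 + 1 ms
        System.out.println(counter.lastCompletedValue()); // export at 00:02:30 sees 1
    }
}
```

In this model a recording made at 00:01:10 is not visible to the export at 00:01:30; it only appears after the next rollover, in the window that actually contains it.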
OK, it seems everything is working as expected then. Feel free to close this issue. Thanks!
Describe the bug
When configuring the OTLP registry with Delta temporality, the exported metrics have timestamps with "full minutes" - no seconds/milliseconds. I believe this is a bug because we are actually losing information when converting OTLP delta metrics.
Example with cumulative:
Produces this OTLP Data (Collector output)
Example with Delta:
Produces this OTLP Data (Collector output) - Seconds are all 0s
Environment
Micrometer version: 1.12.5
Micrometer registry: OTLP
Java version: openjdk 11.0.11 2021-04-20
To Reproduce
Example app code as above
Expected behavior
When configured with Delta temporality, the timestamps should also contain seconds and milliseconds.
Additional context
I looked at the code, and I noticed that in this line https://github.com/micrometer-metrics/micrometer/blob/main/implementations/micrometer-registry-otlp/src/main/java/io/micrometer/registry/otlp/OtlpMetricConverter.java#L67
It divides the wall time by the step (which is 1 minute by default, I believe), which is probably what causes the "truncation" of the timestamps. Is there a reason why we want the timestamp for delta to align with the export interval? I imagine we could just take the wall time and convert it to nanoseconds?
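The truncation described above boils down to integer division followed by multiplication. The sketch below paraphrases that arithmetic under my own naming (the class `DeltaTimestampSketch` and its methods are hypothetical and not copied from OtlpMetricConverter); it only shows why the seconds and milliseconds disappear when the step is one minute.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical paraphrase of the step-aligned timestamp arithmetic.
final class DeltaTimestampSketch {

    /** End of the last completed step, in epoch nanoseconds. */
    static long alignedEndNanos(long wallTimeMillis, long stepMillis) {
        // Integer division drops everything below the step size.
        long alignedMillis = (wallTimeMillis / stepMillis) * stepMillis;
        return TimeUnit.MILLISECONDS.toNanos(alignedMillis);
    }

    /** Start of that step, one step length before its end. */
    static long alignedStartNanos(long wallTimeMillis, long stepMillis) {
        return alignedEndNanos(wallTimeMillis, stepMillis)
                - TimeUnit.MILLISECONDS.toNanos(stepMillis);
    }

    public static void main(String[] args) {
        long stepMillis = TimeUnit.MINUTES.toMillis(1);
        long wallTimeMillis = 90_500; // 00:01:30.500 after the epoch
        System.out.println(alignedStartNanos(wallTimeMillis, stepMillis)); // 0 (00:00:00)
        System.out.println(alignedEndNanos(wallTimeMillis, stepMillis));   // 60000000000 (00:01:00)
    }
}
```

Using the raw wall time instead would keep the sub-minute precision, but the reported window would then no longer line up across instances.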