
Prometheus receiver: log error message when process_start_time_seconds gauge is missing #969

Closed
nilebox opened this issue May 14, 2020 · 2 comments · Fixed by #1921


nilebox commented May 14, 2020

Prometheus receiver supports the flag use_start_time_metric: true.
When this flag is enabled, every Prometheus endpoint must have the process_start_time_seconds gauge, e.g.

```
# TYPE process_start_time_seconds gauge
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
process_start_time_seconds 1508230997
```
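For context, a minimal receiver configuration enabling this flag might look like the following sketch (the job name and target are illustrative, not from the original report):

```yaml
receivers:
  prometheus:
    # When enabled, the start time of each scraped metric is taken from
    # the process_start_time_seconds gauge exposed by the target.
    use_start_time_metric: true
    config:
      scrape_configs:
        - job_name: "example-app"            # illustrative
          scrape_interval: 10s
          static_configs:
            - targets: ["localhost:8888"]    # illustrative
```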

When this metric is present, it sets the `startTime` in

```go
} else if b.useStartTimeMetric && metricName == startTimeMetricName {
	b.startTime = v
}
```

which is then used to adjust metric timestamps; this is the expected behavior.

But if the process_start_time_seconds gauge is missing, startTime will have the default zero value.

The problem is that during the transaction commit phase, we perform this check:

```go
// AdjustStartTime - startTime has to be non-zero in this case.
if tr.metricBuilder.startTime == 0.0 {
	metrics = []*metricspb.Metric{}
	droppedTimeseries = numTimeseries
}
```

which silently drops all metrics and updates the local variable `droppedTimeseries`, which is never read afterwards.

As a result, debugging this corner case is a nightmare: the only way to discover all of this is to attach a Go debugger (which is how I found this issue).

At the very least, we should log a message when metrics get dropped.
I would suggest using WARN level for this specific issue, but if this is considered normal in some situations, we should print an INFO or DEBUG message instead.
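A minimal sketch of the shape such a fix could take (the function name `adjustStartTime` and the sentinel error are illustrative, not the actual collector API; the change that eventually closed this issue, #1921, reports the error via `obsreport.EndMetricsReceiveOp` and returns it from the transaction `Commit()`):

```go
package main

import (
	"errors"
	"fmt"
)

// errNoStartTimeMetrics is a hypothetical sentinel error for the case
// where the process_start_time_seconds gauge was never observed.
var errNoStartTimeMetrics = errors.New("process_start_time_seconds metric is missing")

// adjustStartTime sketches the commit-phase check: if startTime is still
// the zero value, surface an error (and the drop count) to the caller
// instead of silently clearing the scraped metrics.
func adjustStartTime(startTime float64, numTimeseries int) (dropped int, err error) {
	if startTime == 0.0 {
		// Previously: metrics were cleared and droppedTimeseries was
		// assigned but never read. Here the condition is reported instead.
		return numTimeseries, errNoStartTimeMetrics
	}
	return 0, nil
}

func main() {
	// Missing start-time gauge: all timeseries are dropped, with an error.
	dropped, err := adjustStartTime(0.0, 42)
	fmt.Println(dropped, err)

	// Start-time gauge present: nothing is dropped.
	dropped, err = adjustStartTime(1508230997, 42)
	fmt.Println(dropped, err)
}
```

Returning an error from `Commit()` (rather than only logging) also lets the caller increment an exposed counter for the dropped timeseries.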


Also, is there an existing mechanism for reporting counters like droppedTimeseries?


nilebox commented May 14, 2020

@dinooliva, as the author of PR #394, which introduced this flag, you may have more context on this. Could you share your opinion?


rf232 commented Aug 3, 2020

I also got hit by this and second @nilebox's desire for at least some kind of logging in this situation, and perhaps exposing this as a metric as well.

@nilebox nilebox self-assigned this Aug 3, 2020
@bogdandrutu bogdandrutu added this to the Backlog milestone Aug 4, 2020
tigrannajaryan pushed a commit that referenced this issue Oct 14, 2020
…1921)

Report error via obsreport.EndMetricsReceiveOp and return error in transaction Commit()

Fixes #969
MovieStoreGuy pushed a commit to atlassian-forks/opentelemetry-collector that referenced this issue Nov 11, 2021
hughesjj pushed a commit to hughesjj/opentelemetry-collector that referenced this issue Apr 27, 2023
Troels51 pushed a commit to Troels51/opentelemetry-collector that referenced this issue Jul 5, 2024