[receiver/kafkametricsreceiver] do not crash collector on startup when kafka is unavailable #8817
Conversation
This PR was marked stale due to lack of activity. It will be closed in 14 days.
I'm wondering if we can consider reviewing this PR as-is. There is some work in progress around health reporting that we can take advantage of when it's finished, but this PR stands on its own as an improvement. With these changes, the collector will not crash if kafka is not available during startup, and it will try to reconnect on each scrape until successful.
Makes sense to me, this is still an improvement, even if it's not applicable to all receivers just yet. Please resolve the conflicts, I'll review shortly
I think this is an improvement to the current behaviour: scrapers failing to talk to their destinations shouldn't crash the collector. Please address the changelog comment. @dmitryax PTAL
```go
if err := s.setupClient(); err != nil {
    return pmetric.Metrics{}, err
}
```
This way of reusing code seems a bit confusing to me. It sounds like we set up a client on each scrape. If we remove the function, it becomes much cleaner IMO:
```diff
-if err := s.setupClient(); err != nil {
-    return pmetric.Metrics{}, err
-}
+if s.client == nil {
+    client, err := newSaramaClient(s.config.Brokers, s.saramaConfig)
+    if err != nil {
+        return pmetric.Metrics{}, fmt.Errorf("failed to create client in consumer scraper: %w", err)
+    }
+    s.client = client
+}
```
Another approach to avoid copying this code into all the scrapers would be to use a wrapper for the scrape function; each scraper creation would then be:
```go
scraperhelper.NewScraper(
    s.Name(),
    withClientSetup(s.scrape),
    scraperhelper.WithShutdown(s.shutdown),
)
```
I liked both of your suggestions, but went with the first (inlining) as it makes the code slightly easier to follow.
The change looks good to me. The suggestions proposed by @dmitryax make the PR cleaner as well. 👍
[receiver/kafkametricsreceiver] do not crash collector on startup when kafka is unavailable (open-telemetry#8817)

* kafkametricsreceiver: initialize client in scrape
* add changelog entry
* more descriptive changelog entry
* kafkametricsreceiver: guard against nil client in shutdown
* kafkametricsreceiver: inline client creation in scrape
Description:
This PR updates the kafkametricsreceiver so that it will not crash the collector during startup if kafka is not available. It pushes the sarama client setup into the `Scrape` method. If kafka is unavailable, `Scrape` will return an error and will try to set up the client on subsequent scrapes until it succeeds. This is part of a larger discussion (#8816) on how scrape-based receivers should behave when the service they monitor is not available. I'm open to any and all suggestions on how this should be handled, and I hope to use this as a starting point for best practices that can be applied to other scrape-based receivers.
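A minimal sketch of the resulting behaviour; `brokerScraper`, its fields, and the error message here are illustrative stand-ins, not the receiver's exact code:

```go
package kafkametricsreceiver

import (
    "context"
    "fmt"

    "github.com/Shopify/sarama"
    "go.opentelemetry.io/collector/pdata/pmetric"
)

// brokerScraper is an illustrative stand-in for one of the receiver's scrapers.
type brokerScraper struct {
    client       sarama.Client
    brokers      []string
    saramaConfig *sarama.Config
}

func (s *brokerScraper) scrape(context.Context) (pmetric.Metrics, error) {
    // Create the client lazily: if kafka is down, return a scrape error and
    // retry on the next scrape instead of failing collector startup.
    if s.client == nil {
        client, err := sarama.NewClient(s.brokers, s.saramaConfig)
        if err != nil {
            return pmetric.Metrics{}, fmt.Errorf("failed to create client: %w", err)
        }
        s.client = client
    }
    // ... collect broker metrics using s.client ...
    return pmetric.NewMetrics(), nil
}

func (s *brokerScraper) shutdown(context.Context) error {
    // Guard against a nil client: it only exists after a successful scrape.
    if s.client != nil && !s.client.Closed() {
        return s.client.Close()
    }
    return nil
}
```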
Link to tracking Issue:
#8816, #8349
Testing:
Unit tests were updated for the changed behavior. I also did an end-to-end test where I started the collector without a running kafka service. The collector starts and logs errors when the kafkametricsreceiver fails to scrape. I verified that when I start kafka, the receiver successfully connects and starts producing metrics. See the abbreviated log output below: