Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix supervisord collector #978

Merged
merged 5 commits into from
Aug 6, 2018
Merged

Fix supervisord collector #978

merged 5 commits into from
Aug 6, 2018

Conversation

SuperQ
Copy link
Member

@SuperQ SuperQ commented Jun 18, 2018

  • Use github.com/mattn/go-xmlrpc that doesn't leak goroutines.
  • Use Prometheus best practices for uptime metric.
    • Use "start time" rather than "uptime".
    • Don't emit a start time if the process is down.

Closes: #859

@SuperQ SuperQ requested a review from gouthamve June 18, 2018 09:15

res, err := xmlrpc.Call(*supervisordURL, "supervisor.getAllProcessInfo")
if err != nil {
return fmt.Errorf("unable to call supervisord: %s", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SuperQ Shouldn't it be log instead of fmt?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we're returning an error type up the stack to the top level collector logger.

@SuperQ
Copy link
Member Author

SuperQ commented Jun 18, 2018

/cc @zerkms I did some testing of supervisord locally, but I don't have a large deployment to test this on.

@@ -98,7 +93,7 @@ func (c *supervisordCollector) isRunning(state int) bool {
}

func (c *supervisordCollector) Update(ch chan<- prometheus.Metric) error {
var infos []struct {
var info struct {
Name string `xmlrpc:"name"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you still need the field annotations now that you are setting these fields manually?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need them, but it helps with understanding the code.


for _, p := range res.(xmlrpc.Array) {
for k, v := range p.(xmlrpc.Struct) {
switch k {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just slightly worried here that field names might change in the future or might not always be present, and then we won't even notice, struct field values will just be subtly wrong. Could we guard against that by somehow checking that all expected fields are present? Simplest would be to have a map[string]struct{} as a set, then explicitly set an entry for a field when it has been seen, then in the end count that len(myStruct) == 9, and if not, return an error?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I'm also not so happy about this way of mapping. The original library wrote directly to xml struct so we were sure the fields were mapped.

@juliusv
Copy link
Member

juliusv commented Jun 19, 2018

👍 besides comment

CHANGELOG.md Outdated
@@ -2,11 +2,15 @@

**Breaking changes**

* supvervisord collector reports "start_time_seconds" rather than "uptime"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also add this to the recording rules file for backward compatibility?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@discordianfish
Copy link
Member

Isn't there a working xmlrpc lib which supports type annotations to map the response directly to a struct as before? That would be better.. But LGTM either way.

In general though I feel that supervisord is actually a bit to specific to be part of the node-exporter.. So we might want to consider deprecating it eventually. But if this fixes the issues, let's go with it for now.

@SuperQ
Copy link
Member Author

SuperQ commented Jun 23, 2018

@discordianfish The original one does do correct XML to struct data mapping, but it leaks goroutines/memory.

@discordianfish
Copy link
Member

@SuperQ Bump, should we merge this now?

@juliusv
Copy link
Member

juliusv commented Aug 5, 2018

👍 though the sanity check in https://github.com/prometheus/node_exporter/pull/978/files#r196392075 could be added right now already

@discordianfish
Copy link
Member

Yeah I'm not happy to manually unmarshal this in the first place. But since this is a bug affecting users, I'd rather merge it as-is then waiting longer. Ideally we would replace this by somehing that supports struct annotations for the parsing.. Not sure why this isn't possible here though but no time to investigate right now.

@juliusv
Copy link
Member

juliusv commented Aug 6, 2018

Yeah, I'm ok with merging it, just saying nothing would prevent a len() sanity check right now.

@SuperQ
Copy link
Member Author

SuperQ commented Aug 6, 2018

Yes, I agree about the len() thing. I just haven't had time to work on it.

@juliusv
Copy link
Member

juliusv commented Aug 6, 2018

Well then let's just merge it for now - however, you need to fix a conflict in CHANGELOG.md.

SuperQ added 5 commits August 6, 2018 14:29
Use `github.com/mattn/go-xmlrpc` that doesn't leak goroutines.

Signed-off-by: Ben Kochie <superq@gmail.com>
Signed-off-by: Ben Kochie <superq@gmail.com>
Use Prometheus best practices for uptime metric.
* Use "start time" rather than "uptime".
* Don't emit a start time if the process is down.

Signed-off-by: Ben Kochie <superq@gmail.com>
Signed-off-by: Ben Kochie <superq@gmail.com>
Signed-off-by: Ben Kochie <superq@gmail.com>
@SuperQ SuperQ force-pushed the superq/supervisord branch from 6b35e2e to cee6cf9 Compare August 6, 2018 12:30
@SuperQ SuperQ merged commit 5d23ad0 into master Aug 6, 2018
@SuperQ SuperQ deleted the superq/supervisord branch August 6, 2018 14:54
oblitorum pushed a commit to shatteredsilicon/node_exporter that referenced this pull request Apr 9, 2024
* Replace supervisord xmlrpc library
* Use `github.com/mattn/go-xmlrpc` that doesn't leak goroutines.
* Fix uptime metric

* Use Prometheus best practices for uptime metric.
  * Use "start time" rather than "uptime".
  * Don't emit a start time if the process is down.
* Add changelog entry.
* Add example compatibility rules.

Signed-off-by: Ben Kochie <superq@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants