-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix supervisord collector #978
Conversation
|
||
res, err := xmlrpc.Call(*supervisordURL, "supervisor.getAllProcessInfo") | ||
if err != nil { | ||
return fmt.Errorf("unable to call supervisord: %s", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@SuperQ Shouldn't it be log instead of fmt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, we're returning an error
type up the stack to the top level collector logger.
/cc @zerkms I did some testing of supervisord locally, but I don't have a large deployment to test this on. |
@@ -98,7 +93,7 @@ func (c *supervisordCollector) isRunning(state int) bool { | |||
} | |||
|
|||
func (c *supervisordCollector) Update(ch chan<- prometheus.Metric) error { | |||
var infos []struct { | |||
var info struct { | |||
Name string `xmlrpc:"name"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you still need the field annotations now that you are setting these fields manually?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need them, but it helps with understanding the code.
|
||
for _, p := range res.(xmlrpc.Array) { | ||
for k, v := range p.(xmlrpc.Struct) { | ||
switch k { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just slightly worried here that field names might change in the future or might not always be present, and then we won't even notice, struct field values will just be subtly wrong. Could we guard against that by somehow checking that all expected fields are present? Simplest would be to have a map[string]struct{}
as a set, then explicitly set an entry for a field when it has been seen, then in the end count that len(myStruct) == 9
, and if not, return an error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, I'm also not so happy about this way of mapping. The original library wrote directly to xml struct so we were sure the fields were mapped.
👍 besides comment |
CHANGELOG.md
Outdated
@@ -2,11 +2,15 @@ | |||
|
|||
**Breaking changes** | |||
|
|||
* supvervisord collector reports "start_time_seconds" rather than "uptime" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also add this to the recording rules file for backward compatibility?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Isn't there a working xmlrpc lib which supports type annotations to map the response directly to a struct as before? That would be better.. But LGTM either way. In general though I feel that supervisord is actually a bit to specific to be part of the node-exporter.. So we might want to consider deprecating it eventually. But if this fixes the issues, let's go with it for now. |
@discordianfish The original one does do correct XML to struct data mapping, but it leaks goroutines/memory. |
71c9615
to
6b35e2e
Compare
@SuperQ Bump, should we merge this now? |
👍 though the sanity check in https://github.com/prometheus/node_exporter/pull/978/files#r196392075 could be added right now already |
Yeah I'm not happy to manually unmarshal this in the first place. But since this is a bug affecting users, I'd rather merge it as-is then waiting longer. Ideally we would replace this by somehing that supports struct annotations for the parsing.. Not sure why this isn't possible here though but no time to investigate right now. |
Yeah, I'm ok with merging it, just saying nothing would prevent a |
Yes, I agree about the |
Well then let's just merge it for now - however, you need to fix a conflict in CHANGELOG.md. |
Use `github.com/mattn/go-xmlrpc` that doesn't leak goroutines. Signed-off-by: Ben Kochie <superq@gmail.com>
Signed-off-by: Ben Kochie <superq@gmail.com>
Use Prometheus best practices for uptime metric. * Use "start time" rather than "uptime". * Don't emit a start time if the process is down. Signed-off-by: Ben Kochie <superq@gmail.com>
Signed-off-by: Ben Kochie <superq@gmail.com>
Signed-off-by: Ben Kochie <superq@gmail.com>
6b35e2e
to
cee6cf9
Compare
* Replace supervisord xmlrpc library * Use `github.com/mattn/go-xmlrpc` that doesn't leak goroutines. * Fix uptime metric * Use Prometheus best practices for uptime metric. * Use "start time" rather than "uptime". * Don't emit a start time if the process is down. * Add changelog entry. * Add example compatibility rules. Signed-off-by: Ben Kochie <superq@gmail.com>
github.com/mattn/go-xmlrpc
that doesn't leak goroutines.Closes: #859