-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(inputs.slurm): Add a SLURM input plugin #15700
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi,
Thanks for the issue and PR! Looks like tests failed as you will ned to run go mod tidy
. I have put some initial observations below, let us know if you have any questions.
Thanks again!
edit: and some linter issues:
plugins/inputs/slurm/slurm.go:42:68 unparam (*Slurm).createHTTPClient - result 1 (error) is always nil
plugins/inputs/slurm/slurm.go:84:10 perfsprint fmt.Errorf can be replaced with errors.New
plugins/inputs/slurm/slurm.go:140:2 gocritic rangeValCopy: each iteration copies 928 bytes (consider pointers or indexing)
plugins/inputs/slurm/slurm.go:145:20 perfsprint fmt.Sprintf can be replaced with faster strconv.Itoa
plugins/inputs/slurm/slurm.go:255:69 bodyclose response body must be closed
plugins/inputs/slurm/slurm.go:261:48 errorlint non-wrapping format verb for fmt.Errorf. Use `%w` to format errors
plugins/inputs/slurm/slurm.go:267:72 bodyclose response body must be closed
plugins/inputs/slurm/slurm.go:273:48 errorlint non-wrapping format verb for fmt.Errorf. Use `%w` to format errors
plugins/inputs/slurm/slurm.go:279:74 bodyclose response body must be closed
plugins/inputs/slurm/slurm.go:285:49 errorlint non-wrapping format verb for fmt.Errorf. Use `%w` to format errors
plugins/inputs/slurm/slurm.go:292:17 bodyclose response body must be closed
plugins/inputs/slurm/slurm.go:298:54 errorlint non-wrapping format verb for fmt.Errorf. Use `%w` to format errors
plugins/inputs/slurm/slurm.go:305:17 bodyclose response body must be closed
plugins/inputs/slurm/slurm.go:311:56 errorlint non-wrapping format verb for fmt.Errorf. Use `%w` to format errors
PS You can mark the conversations as resolved when you applied the suggestions (I cannot do this myself) |
Whops! I thought you could do that yourself... I went ahead and resolved all the conversations to which you reacted with 👍. It looks like the last one remaining is the one on the Thanks a ton for letting me know about having to resolve conversations myself! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your efforts!
We just tackled a single occurrence of the `bodyclose` error to try and see if we can fix it like we did... Judging by the implementation of the `Execute()` method the actual body of the request should be closed, but maybe the linter is not seeing it clearly...
TODO: handle `panic()`s within `Gather()`.
Instead of explicitly checking for `nil` pointers on each assignment we'd rather recover from `panic()`s caused by `nil` pointer dereferences to centralize the management of this type of errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor markup remarks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your update @pcolladosoto! I tried to clarify some of your questions, see my comments. However, the more I think about the auto-typing of the "tres" entries, the more I think it is a bad idea. Could we please either make really sure we always get the same datatype for a field or default to parsing as float
and explicitly ask the user which fields should be integers?
plugins/inputs/slurm/sample.conf
Outdated
## nodes, partitions, reservations | ||
## Please note incorrect endpoints will be silently ignore and | ||
## that endpoint names are case insensitive. | ||
# ignored_endpoints = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think having it swapped is better. Just set the corresponding endpoints in Init()
if the list is empty and be done. This makes it also easier to extend the list and it will show the user what is actually collected.
plugins/inputs/slurm/slurm.go
Outdated
if s.ResponseTimeout == 0 { | ||
s.ResponseTimeout = config.Duration(time.Second * 5) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me help you out here... Please set the ResponseTimeout: config.Duration(5 * time.Second)
in init()
(all lowercase) as this allows the user to overwrite the value in the config and set it to zero.
plugins/inputs/slurm/slurm.go
Outdated
parsedInt, err := strconv.ParseInt(val, 10, 64) | ||
if err == nil && factor >= 1 { | ||
parsedValues[tag] = parsedInt * int64(factor) | ||
continue | ||
} | ||
parsedFloat, err := strconv.ParseFloat(val, 64) | ||
if err == nil { | ||
parsedValues[tag] = parsedFloat * factor | ||
continue | ||
} | ||
parsedValues[tag] = val |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me give you an example. Assume we got the following string values for the same tag
(e.g. temperature
):
temperature=23.9
temperature=23.1
temperature=23
The first two ones will clearly produce an error when trying to parse them as an int
so you will get a field named temperature
with a float value for the first two examples. However, the third line perfectly parses as an integer and therefore you will get a resulting temperature
field being aninteger instead of a float. This will lead to type conflicts in e.g. InfluxDB as the same column cannot hold different datatypes.
Thinking more about it, you will have the same issue with
mem=10M
resulting in integer10
mem=1G
resulting in integer1024
mem=100k
resulting in float0.09765625
Thanks a ton for the input! Co-authored-by: Thomas Casteleyn <thomas.casteleyn@me.com>
Hi @srebhan! I'm waiting for the linters to finish up, but if everything passes how are we looking for a merge 🤔? I believe I have addressed all the outstanding concerns, but please do let me know if I left any of the out. Thanks a ton for your time, patience and helpful input 😄 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pcolladosoto thank you very much for the quick update and the kind words. I have some more minor comments, mostly on pre-allocation and checking the endpoint list but nothing really big. I think if those are done, we are ready to merge.
Co-authored-by: Sven Rebhan <36194019+srebhan@users.noreply.github.com>
A correction was committed before we pushed our already corrected version... Sorry for the noise!
Download PR build artifacts for linux_amd64.tar.gz, darwin_arm64.tar.gz, and windows_amd64.zip. 📦 Click here to get additional PR build artifactsArtifact URLs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome! Thanks @pcolladosoto for all your work!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working through this with us!
Summary
As set forth in the associated issue, this PR strives to add a new input plugin capable of recollecting data from a SLURM cluster. In our particular use case, this would make gathering and integrating job, node and other metadata into our regular monitoring infrastructure that much easier.
Also, given SLURM implements a REST API interface acquiring all this data doesn't call for parsing a ton of text and is instead a matter of implementing the appropriate clients. This effort has been undertaken in an accompanying repository so as to relocate some of the complexity outside of Telegraf. The repository in question is pcolladosoto/goslurm.
Thank you both for your time and for making Telegraf: it's an awesome tool!
If there are any errors or if there's anything missing I'd like first of all to apologise and second to make you aware that you shouldn't hesitate to contact me to follow up on any issue.
Thanks again!
Checklist
Related issues
resolves #15699