Add basic support for openbmc w/ httppower #33
Explicitly set post/get before performing a post/get, so that no assumptions are made about implicit settings.
This looks nice @chu11! I guess it's a question for @elgeoman whether he can live with the caveats:
If this is supposed to be used on a system like sierra, then probably not, but if it's a few dozen nodes, maybe? One thing that seems to be missing is a test. See @grondo had a good thought - openbmcpower in python? There are probably some good python modules for managing async REST APIs...
Yup, just wasn't sure if this was merge-able, so was going to add later.
If writing from scratch, I had thought of python as well. But would have to architect in the separate "ping monitor" either way. (Edit: Not suggesting it's a separate program, but rather it may not integrate with a REST API and has to be programmed off to the side.)
Maybe "ping" could just be an HTTP get or similar?
Oh, that's a good point. A response doesn't even have to be "valid", e.g. a 404 error is an acceptable response to a GET/"ping".
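A minimal sketch of that idea in python, assuming plain urllib is enough: any HTTP answer, including a 404, counts as "alive", and only a transport failure counts as "down". The function name `bmc_responds`, the `opener` parameter, and the example URL are hypothetical and are not part of httppower or powerman.

```python
import urllib.error
import urllib.request


def bmc_responds(url, timeout=2.0, opener=urllib.request.urlopen):
    """Return True if anything at `url` answers HTTP, whatever the status.

    `opener` is injectable (hypothetical parameter) so the logic can be
    exercised without a real BMC on the network.
    """
    try:
        with opener(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # A 4xx/5xx reply still proves that a web server answered.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, no route, DNS failure, or timeout: treat as down.
        return False
```

Usage would be something like `bmc_responds("https://bmc-node1.example/")`, called periodically from whatever ends up playing the "ping monitor" role.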
This is a good start for a couple dozen nodes, but the intention is to eventually go much larger than a couple dozen (say like 4K nodes or better). If openbmc happens to support IPMI v.2.0 in the near future, then developing something like openbmcpower may not be necessary. If openbmc decides to dump ipmi for RESTful interfaces for power control, then the effort to build openbmcpower may be worth it.
My vote would be to get this PR ready to merge, and open a new issue for an We can get the design sketched out in the new issue and come up with a rough estimate of the effort required to implement, and then have more of a basis for planning the work.
It's worth mentioning, one other consideration was to have every status / on / off create and destroy a new curl handle and do a full login, on/off, logout. i.e. the dev file might have something like this:
i.e. what's in the login/logout parts of the dev file goes into every status / on / off / etc. part. But due to:
we'd be doing more http transactions per node and we'd effectively be doing it serially. So I didn't consider this path worthwhile to do (not to mention a fair amount of hacking up of httppower).
re-pushed with some openbmc tests
Thanks! One more thing that we should do here is put a note at the top of the
and maybe add a reference to issue #34. By the way, one legit thing to do in the dev script, to work around that first limitation, say for the off script, is to add a delay for the amount of time it typically takes to power off a node plus some fudge factor, then add a status query and fail the command if the node is still on. Powerman should parallelize this delay over multiple devices. It means some repetition in the script, but it's preferable to running open loop IMHO.
@elgeoman do you have the specific name of the hardware and firmware version? I looked around but couldn't find out the exact motherboard name (everywhere just says power9) and the REST API doesn't seem to have a firmware version call (and for openbmctool I don't know the user/pw).
@chu11
Eek ... I've been seeing averages in the 25-ish second range. So adding a fudge ... 30 seconds? Too long? I would then have to increase the overall timeout from 30 seconds to something bigger as well.
I'm familiar with the "delay" command, but how would one send a "failure" back to the client safely? Nothing really jumps out at me.
That is a long time...is the command trying to do a clean OS shutdown or a hard power off? Is there a different command/flag that we should be using to get it done more directly? I'd probably start with the delay + timeout you proposed and then fine tune it until it is reliable.
Just "expect" the desired result (like "off"). If something else is returned (like "on") the expect will hang until the script times out, and the user will get a failure.
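For anyone following along, the "delay, then query, then fail if still on" pattern can be sketched outside of the dev script language like this. This is python for illustration only, not powerman dev script syntax; `power_off_and_verify`, `send_off`, `query_state`, and the 30 second default are all hypothetical names and values.

```python
import time


def power_off_and_verify(send_off, query_state, settle_seconds=30.0,
                         sleep=time.sleep):
    """Send "off", wait a fixed settle time, then confirm the result.

    `send_off` and `query_state` are caller-supplied callables (assumed
    here); `query_state` is expected to return "on" or "off".
    """
    send_off()
    # Typical power-off time plus a fudge factor, as discussed above.
    sleep(settle_seconds)
    state = query_state()
    if state != "off":
        # Do not run open loop: report failure instead of assuming success.
        raise RuntimeError(
            f"node still reports {state!r} after {settle_seconds}s")
    return state
```

In the dev script the failure is implicit (the expect never matches and the script times out), whereas this sketch raises explicitly, but the control flow is the same.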
On the upside, should take the same time to power off one node or the whole cluster, since scripts are run in parallel across devices, and we've just got one plug per device.
It is a "soft power" off, according to openbmc documentation. "hard power" off appears to take the BMC off standby power, so it's really not an option to use. But what's strange is the 20-25 seconds is both on & off. So it's unlikely it's a clean OS shutdown kind of thing.
re-pushed with extra language in the openbmc.dev file and some delays put in.
No longer true - we'll catch that now, so I'd drop it from the comment. Otherwise LGTM!
Hehe, I was speaking to the fact that is the "openbmc" behavior. I'll tweak the comment.
re-pushed with a minor text tweak
Great! Thanks for the work!
This is a very simple support of openbmc with httppower. It's supported like this:
While working on this, I realized that scalable and good support of openbmc will be a lot harder / time consuming than originally thought.
httppower was originally written with a single / "global" curl handle. Therefore httppower can't be configured to work with multiple hosts if cookies are involved (since they are per host and per handle). Thus the one line per host above.
To solve this, httppower would have to be written with some type of hashing algorithm to have a curl handle per host and would parse the host off the url used.
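A rough sketch of the "one handle per host, keyed by the host parsed off the url" idea, written in python rather than C/libcurl. It assumes a urllib opener with its own cookie jar is a fair stand-in for a curl handle; the class name `HostSessions` and the example hostnames are hypothetical.

```python
import http.cookiejar
import urllib.request
from urllib.parse import urlsplit


class HostSessions:
    """Keep one opener (with a private cookie jar) per host."""

    def __init__(self):
        self._openers = {}  # "host[:port]" -> OpenerDirector

    def opener_for(self, url):
        # Parse the host (and port, if any) off the URL, as proposed above.
        host = urlsplit(url).netloc.lower()
        if not host:
            raise ValueError(f"no host in URL: {url!r}")
        opener = self._openers.get(host)
        if opener is None:
            # First request to this host: give it its own cookie jar so
            # login cookies from different BMCs never mix.
            jar = http.cookiejar.CookieJar()
            opener = urllib.request.build_opener(
                urllib.request.HTTPCookieProcessor(jar))
            self._openers[host] = opener
        return opener
```

With something like this, a login to `https://bmc1.example/login` and a later status query to the same host reuse one opener, while `https://bmc2.example/...` gets a separate one.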
httppower uses the "easy" curl interface, which (AFAICT) is synchronous communication. Using the "multi" curl interface would allow for parallel network transactions.
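To illustrate what parallel transactions buy us, here is a small python sketch. Note it swaps in a thread pool rather than libcurl's multi interface (which multiplexes transfers in a single thread), so it shows the effect, not the mechanism. `run_on_all` and the `action` callable are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def run_on_all(hosts, action, max_workers=32):
    """Run action(host) concurrently and collect the outcome per host.

    Returns a dict mapping each host to either the return value of
    `action` or the exception it raised, so one dead node does not
    abort the rest.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit everything first so the requests overlap in time.
        futures = {host: pool.submit(action, host) for host in hosts}
        for host, future in futures.items():
            try:
                results[host] = future.result()
            except Exception as exc:
                results[host] = exc
    return results
```

With a synchronous handle the total time grows with the number of hosts; with overlapping requests it is closer to the time of the slowest single host, up to the worker limit.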
Similar to how ipmi works, an "on" or "off" request / response is independent of the actual action. So you can get circumstances like this:
The reason FOO is listed as on, is b/c the http response for "off" has been received, but the "off" action has not yet been done. openbmc can perform the "off" action at some point in the future.
In ipmipower, this is solved with options like --wait-until-on and --wait-until-off, which poll until an action is done.
httppower does not have any ability to determine if a network interface is down. If powerman were configured for a large number of nodes, we can expect a single node to always be removed for servicing once in a while. That means that powerman / httppower will constantly time out on pretty much any action such as powerman --query.
This was solved in ipmipower b/c ipmipower performs some ipmi "pings" in the background, to effectively monitor for nodes disappearing from the network.
It's worth noting that the last two points are more difficult than in ipmipower, b/c httppower has no concept of "power query" or "ping". Those are things defined in the .dev file for every device that uses http. Does this mean it has to be developed into powerman as a feature for all possible devices?
Possible option: develop an "openbmcpower" similar to "ipmipower"?
Possible option: redo "httppower" from scratch to do a lot more stuff. Future consideration for other REST APIs such as Redfish.