redfishpower: recognize hierarchies / pre-requisites #81
Had a mini-epiphany when chatting with @jf6b: this may be a difficult infrastructure change within powerman, but it would be far more doable in redfishpower if we wanted to support it there as a one-off. Configuring "pre-requisites" within a redfishpower conf would be non-optimal, but may be easier to do (we may need a conf file for another feature anyway, #94). |
Good idea! Do our cray systems use redfish for both blade and chassis power control? This might be pretty simple to implement even without a config file. For example, assume one
Or if that's too many Edit: or let free arguments each represent the blade hostlist for a chassis, so each chassis could be custom populated
Then how do we want it to work? Just the following obvious things?
|
I believe so. Although I gotta double check.
Yeah, that was my initial feeling, that it'd be too many redfishpower instances.
Oh, I think that could work. Although the command line would get long. But that should be trivial to add into a config file instead.
Good question. In my head I was only going to check the parent and, if the parent was "off", not perform the power operation and return "parent off" or something, but returning "off" instead is also reasonable. I'll ping the admins and see what they think. Edit: thinking about this a tad more, it probably depends on whether we treat this as a "hierarchy" or a "pre-req". If it's a "pre-req" and the pre-req is not met, it should be an error. But if it's a "hierarchy", then the parent status "carries" to the child status. Edit2: ohh, extending this ... if we think of this as a "hierarchy", should "pm --on rack8" turn on nodes N-M? |
IMHO configuring everything in the powerman.conf as opposed to in two places, would be better design.
Don't overthink it, especially if we're not solving the general case in powerman :-) If the chassis is off, the blade is truly off. |
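To make the two semantics discussed above concrete, here is a minimal sketch in Python. It is not redfishpower code: the function names, the `parent_of` mapping, and the status strings are assumptions made up for illustration. Under "hierarchy" semantics the parent's "off" carries down to the child; under "pre-req" semantics an unmet pre-req is reported as an error.

```python
from typing import Dict, Optional

# Hypothetical names throughout; this only illustrates the two semantics.

def effective_status(node: str,
                     raw_status: Dict[str, str],
                     parent_of: Dict[str, str]) -> str:
    """'Hierarchy' semantics: if the parent (chassis) is off, report the child as off."""
    parent = parent_of.get(node)
    if parent is not None and raw_status.get(parent) == "off":
        return "off"
    return raw_status.get(node, "unknown")

def check_prereq(node: str,
                 raw_status: Dict[str, str],
                 parent_of: Dict[str, str]) -> Optional[str]:
    """'Pre-req' semantics: return an error message if the parent is off, else None."""
    parent = parent_of.get(node)
    if parent is not None and raw_status.get(parent) == "off":
        return f"{node}: parent {parent} is off"
    return None
```

With a 2-level mapping like `{"blade0": "chassis0"}`, the first function answers a status query without ever contacting the blade, which is the "chassis is off, blade is truly off" behavior.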
We should get the actual chassis configuration and dummy this up in test. It'll be easier to think about a concrete case, and important if we're doing this as a one-off anyway. |
I would say no we don't do racks unless there is power control at the rack level. If the problem you're proposing to solve there is to have a way to conveniently name a group of nodes, there are two other ways to do that already
|
FWIW the concept was relatively simple to add to redfishpower: https://github.com/chu11/powerman/tree/redfishpower_prequery There's a dumb Configuring things so it knows to |
Cool! Except I think the command line would work out better than a separate config file (thus letting the powerman.conf be the config file) |
My one concern with this is that for huge systems, the command line will get super long and possibly annoying to maintain?? Maybe it's not a deal breaker. |
For pretend, think of a chassis with 8 slots, and 16 of those in a rack, and 64 racks for 8192 nodes: You could do one
A self respecting sys admin would write a script to generate that. Meh, it's not crazy? |
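As a rough sketch of the kind of generator script meant here, the following Python groups the pretend 8192-node system into one hostlist range per chassis. The `node` prefix and the zero-based contiguous numbering are assumptions for illustration, and the script deliberately stops at producing hostlist strings rather than guessing at any redfishpower option syntax.

```python
# Pretend system from the comment above: 8 slots x 16 chassis x 64 racks.
SLOTS_PER_CHASSIS = 8
CHASSIS_PER_RACK = 16
RACKS = 64

def chassis_hostlists(prefix="node"):
    """Yield one hostlist range per chassis, e.g. 'node[0-7]' (naming is hypothetical)."""
    total_chassis = CHASSIS_PER_RACK * RACKS
    for c in range(total_chassis):
        first = c * SLOTS_PER_CHASSIS
        last = first + SLOTS_PER_CHASSIS - 1
        yield f"{prefix}[{first}-{last}]"

hostlists = list(chassis_hostlists())
total_nodes = len(hostlists) * SLOTS_PER_CHASSIS

if __name__ == "__main__":
    print(f"{len(hostlists)} chassis, {total_nodes} nodes")
    print(hostlists[0], "...", hostlists[-1])
```

Each generated string could then be pasted into whatever per-chassis argument or device line is eventually settled on.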
Yeah, but we'd have 64 redfishpowers running. I guess that's not an insane number, but we'd prefer to keep it down. Using your example below, let's say we got it down to 16 co-processes.
ehhh I'll ping some people, see what they think |
Yes but powerman can only do one thing at a time per device, so more is generally better... |
Are you referring to the fact powerman only sends one node to the |
trying to prototype pre-reqs config on command line and it's a little annoying given the fact the
Note that I avoided using |
No I mean powerman sends one command to redfishpower and then can't send another one until that command is complete. If there are multiple instances, powerman processes all the expect/send scripts in parallel. |
I'd suggest we call this a "power control hierarchy" and say that redfishpower supports a 2-level hierarchy. Pre-req seems like it conveys less about what is going on (and is a more open ended concept). Regarding the options, I'd suggest that if
|
Hmmmm, I guess one of the issues is if the user does:
we'd have to keep track of the order of the "chunks" input by the user. I guess it's not too hard to do, although a tad annoying. Perhaps it'd just be easier to say it's an error and you can't specify |
Works for me! |
hmmmmm .... is it a fair assumption in a cluster that the parents would divide evenly into the hosts? i.e.
In the event there are a few oddballs, they could be handled by a separate redfishpower instance. |
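The "parents divide evenly into the hosts" assumption can be sketched as below. This is an illustrative Python helper, not the prototype's logic: `assign_parents` and the example host and chassis names are hypothetical. It assigns equal-sized consecutive chunks of the host list to each parent and rejects anything that does not divide evenly, which is where oddballs would be split off into a separate instance.

```python
from typing import Dict, List

def assign_parents(hosts: List[str], parents: List[str]) -> Dict[str, str]:
    """Map each host to a parent, assuming parents divide evenly into hosts.

    Consecutive hosts are grouped into len(hosts) // len(parents) sized chunks,
    one chunk per parent, in the order given.
    """
    if not parents or len(hosts) % len(parents) != 0:
        raise ValueError("hosts do not divide evenly among parents")
    per_parent = len(hosts) // len(parents)
    return {host: parents[i // per_parent] for i, host in enumerate(hosts)}
```

For example, 8 hosts and 2 parents gives the first 4 hosts to the first parent and the last 4 to the second; 7 hosts and 2 parents raises an error instead of guessing.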
Well I think we should ask what's easiest for the admins and then make it work. If not breaking existing options is tricky, then one idea would be to create another executable like Edit: |
As I continue to prototype, it occurred to me: knowing how these URIs work for redfishpower, there's a non-trivial chance that the admins would set up a different redfishpower for each blade "offset", so hypothetically in powerman.conf
b/c the URI for the first set of blades is This makes me wonder if we need some massive config file where users can configure the set of URIs for each node they configure. But for the time being, it made me realize the config
sort of works out. The config style |
I hadn't realized that the redfish "device" specification only defines one plug. Could we solve the above by defining a device spec for one chassis and then do plug substitution in the URIs? |
OH I just realized the hostnames are the plugs. Well, then the hostname's index in the hostlist for that chassis? |
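One way to picture the "hostname's index in the hostlist" idea is the Python sketch below. Everything in it is hypothetical: the helper names, the `{slot}` placeholder, and especially the URI template, which is a made-up stand-in and not a real Redfish path from this thread.

```python
from typing import List

def slot_index(host: str, chassis_hosts: List[str]) -> int:
    """Return the zero-based position of host within its chassis hostlist."""
    return chassis_hosts.index(host)  # raises ValueError if host is not in the chassis

def plug_uri(template: str, host: str, chassis_hosts: List[str]) -> str:
    """Substitute the host's slot index into a URI template (template is an assumption)."""
    return template.format(slot=slot_index(host, chassis_hosts))

# Made-up template purely for illustration; real URIs depend on the hardware.
HYPOTHETICAL_TEMPLATE = "/hypothetical/blade/{slot}/power"
```

With this shape, one device spec per chassis would be enough, since the per-blade difference is reduced to an index derived from the hostlist order.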
perhaps we should open a separate issue on this. I'm not even sure how something like this could work. |
Per discussion in #126 / #128, trying to power on a node whose chassis is "off" results in a "bad request" (400) error that returns immediately. In combination with #79, this may lessen the immediate need for a "check parent" option in redfishpower, as an error returned from powerman would make it clearer why there was a problem. But a "parent" check does have usefulness for future consideration:
|
Siphoning off discussion from #142 into here, as what I originally saw as different issues / features may be far more similar than I originally thought. There are really 3 possible scenarios to handle: A) B) C). A) is the critical one to support. B) is a nice-to-have and has some pros/cons per discussion in #142. It is far easier to support than I imagined, because effectively instead of waiting for a "stat" to finish, we wait for an "on" to finish. C) is the big debatable one. The ability to pass an option from the powerman client to redfishpower currently does not exist, so we would have to determine if we want to do this by default or not. Which if we do by default, conflicts with Also note that There are some "off" differences that I'm not going to detail here. |
Should have been closed by #164 (reopen if not!) |
@watson6282 mentioned in chat
it would be good / useful / necessary for powerman to recognize hierarchies or pre-requisites for turning a node on and off.
The most obvious example is a bladed system. Nodes within the bladed system are impossible to turn on/off if the chassis they are in is turned off. It will inevitably lead to unnecessary messages / timeouts / powerman slowness.