redfishpower: recognize hierarchies / pre-requisites #81

Closed
chu11 opened this issue Jan 23, 2024 · 27 comments
@chu11
Member

chu11 commented Jan 23, 2024

@watson6282 mentioned in chat

it would be good / useful / necessary for powerman to recognize hierarchies or pre-requisites for turning a node on and off.

The most obvious example is a bladed system. Nodes within the bladed system are impossible to turn on/off if the chassis they are in is turned off. It will inevitably lead to unnecessary messages / timeouts / powerman slowness.

@chu11
Member Author

chu11 commented Feb 3, 2024

Had a mini-epiphany when chatting with @jf6b: this may be a difficult infrastructure change within powerman, but it would be far more doable within redfishpower, if we wanted to support this functionality there as a "one off".

it would be non-optimal to configure "pre-requisites" within a redfishpower conf, but may be easier to do so (may need conf file for another feature anyways #94)

@garlick
Member

garlick commented Feb 3, 2024

Good idea! Do our cray systems use redfish for both blade and chassis power control?

This might be pretty simple to implement even without a config file. For example, assume one redfishpower device instance per chassis and add a command line option to set the chassis plugname, e.g.

device "foo-chassis0" "redfish" "/usr/bin/redfishpower --chassis foo-chassis0 -h foo[0-15] |&"
device "foo-chassis1" "redfish" "/usr/bin/redfishpower --chassis foo-chassis1 -h foo[16-31] |&"
...
node "foo-chassis0,foo[0-15]" "foo-chassis0" "foo-chassis0,foo[0-15]"
node "foo-chassis1,foo[16-31]" "foo-chassis1" "foo-chassis1,foo[16-31]"

Or if that's too many redfishpower instances maybe we could let --chassis accept a hostlist and add a --blades-per-chassis option?

Edit: or let free arguments each represent the blade hostlist for a chassis, so each chassis could be custom populated

--chassis=foo-chassis[0-1] foo[42,1-7] foo[43,9-15]
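To make the "one free argument per chassis" idea concrete, here is a minimal Python sketch of how the chassis hostlist and the per-chassis blade hostlists could be paired up. It is only an illustration: redfishpower itself is written in C, the `--chassis` option is the proposal above rather than an existing flag, and `expand_hostlist` and `map_blades_to_chassis` are hypothetical helpers that handle only a single bracketed group.

```python
import re

def expand_hostlist(spec):
    """Expand a simple hostlist such as 'foo[42,1-7]' into a list of names.

    Only one bracketed group of numbers and ranges is handled, which is far
    less than a real hostlist library supports.
    """
    m = re.fullmatch(r"([^\[\]]+)\[([0-9,\-]+)\]", spec)
    if m is None:
        return [spec]  # plain name with no bracket expression
    prefix, body = m.groups()
    hosts = []
    for part in body.split(","):
        lo, _, hi = part.partition("-")
        # A bare number like "42" has no upper bound, so reuse the lower one.
        for n in range(int(lo), int(hi or lo) + 1):
            hosts.append(f"{prefix}{n}")
    return hosts

def map_blades_to_chassis(chassis_spec, blade_specs):
    """Pair the i-th chassis with the i-th free-argument blade hostlist."""
    chassis = expand_hostlist(chassis_spec)
    if len(chassis) != len(blade_specs):
        raise ValueError("need exactly one blade hostlist per chassis")
    return {c: expand_hostlist(b) for c, b in zip(chassis, blade_specs)}

# Mirrors: --chassis=foo-chassis[0-1] foo[42,1-7] foo[43,9-15]
mapping = map_blades_to_chassis("foo-chassis[0-1]",
                                ["foo[42,1-7]", "foo[43,9-15]"])
```

With this shape, each chassis can be populated with an arbitrary set of blades, at the cost of a longer command line.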

Then how do we want it to work? Just the following obvious things?

  • when querying a blade status, check the chassis status first and if off, return off for all blades immediately
  • when powering on a blade, check the chassis status first, and if off, return an error immediately
  • when powering off a blade, check the chassis status first and if off, return success immediately
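The three rules above could be sketched roughly as follows. This is a Python illustration only, not redfishpower code; `handle_blade_request` and the `send_to_blade` callback are hypothetical names, and the chassis status is assumed to have been queried already.

```python
def handle_blade_request(op, chassis_status, send_to_blade):
    """Apply the chassis-first rules before touching the blade.

    op             -- "stat", "on", or "off"
    chassis_status -- "on" or "off", as already queried from the chassis
    send_to_blade  -- callable that performs the real blade operation
    """
    if op not in ("stat", "on", "off"):
        raise ValueError(f"unknown operation: {op}")
    if chassis_status == "off":
        if op == "stat":
            return "off"                    # blade is off if its chassis is off
        if op == "on":
            return "error: chassis is off"  # fail immediately, no timeout
        return "ok"                         # op == "off": already off
    # Chassis is on, so the blade has to be asked directly.
    return send_to_blade(op)
```

In every "chassis off" branch the blade is never contacted, which is where the savings in messages and timeouts would come from.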

@chu11
Member Author

chu11 commented Feb 3, 2024

Good idea! Do our cray systems use redfish for both blade and chassis power control?

I believe so. Although I gotta double check.

Or if that's too many redfishpower instances

Yeah, that was my initial feeling, that it'd be too many redfishpower instances.

Edit: or let free arguments each represent the blade hostlist for a chassis, so each chassis could be custom populated

Oh, I think that could work. Although the command line would get long. But that should be trivial to add into a config file instead.

Then how do we want it to work? Just the following obvious things?

good question, in my head I was only going to check the parent, and if the parent was "off", do not perform the power operation and return "parent off" or something, but returning "off" instead is also reasonable.

I'll ping the admins, see what they think.

Edit: thinking about this a tad more, it perhaps is going to be based on the semantics of "hierarchy" vs "pre-req" and what we think this is. If I think of it more like a "pre-req", if pre-req not met, it should be error. But if it's a "hierarchy", then the parent status "carries" to the child status.

Edit2: ohh, extending this ... if we think of this as a "hierarchy", should "pm --on rack8" turn on nodes N-M?

@garlick
Member

garlick commented Feb 3, 2024

Although the command line would get long. But that should be trivial to add into a config file instead.

IMHO configuring everything in the powerman.conf as opposed to in two places, would be better design.
Edit: auth tokens being a special case

it perhaps is going to be based on the semantics of "hierarchy" vs "pre-req" and what we think this is. If I think of it more like a "pre-req", if pre-req not met, it should be error. But if it's a "hierarchy", then the parent status "carries" to the child status.

Don't over think it, especially if we're not solving the general case in powerman :-) If the chassis is off, the blade is truly off.

@garlick
Member

garlick commented Feb 3, 2024

We should get the actual chassis configuration and dummy this up in test. It'll be easier to think about a concrete case, and important if we're doing this as a one-off anyway.

@garlick
Member

garlick commented Feb 3, 2024

If we think of this as a "hierarchy", should "pm --on rack8" turn on nodes N-M?

I would say no we don't do racks unless there is power control at the rack level.

If the problem you're proposing to solve there is to have a way to conveniently name a group of nodes, there are two other ways to do that already

  • plug aliases which allow multiple plugs to be grouped in an alias, added for the case when a "node" has redundant power supplies and you want to control them as a unit
  • genders, e.g. pm -g -1 rack4, where rack4 is a genders attribute that maps to a group of nodes

@chu11
Member Author

chu11 commented Feb 5, 2024

FWIW the concept was relatively simple to add to redfishpower: https://github.com/chu11/powerman/tree/redfishpower_prequery

There's a dumb test_status_first variable in there that will check the node status before doing stat/on/off and will error out if it is off. Which is of course dumb to test the status of the node before you do the operation, except for testing purposes.

Configuring things so it knows to test_status_first and to do it to the correct hostname is the TBD configuration part.

@garlick
Member

garlick commented Feb 5, 2024

Cool! Except I think command line would work out better than a separate config file (thus letting the powerman.conf be the config file)

@chu11
Member Author

chu11 commented Feb 5, 2024

Cool! Except I think command line would work out better than a separate config file (thus letting the powerman.conf be the config file)

My one concern with this is that for huge systems, the command line will get super long and possibly annoying to maintain?? Maybe it's not a deal breaker.

@garlick
Member

garlick commented Feb 5, 2024

For pretend, think of a chassis with 8 slots, and 16 of those in a rack, and 64 racks for 8192 nodes:

You could do one redfishpower instance per rack easily, and each command line would need to look something like

redfishpower --chassis[0-15] foo[0-7] foo[8-15] foo[16-23] foo[24-31] foo[32-39] foo[40-47] foo[48-55] foo[56-63]

A self respecting sys admin would write a script to generate that. Meh, it's not crazy?
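As a rough idea of what such a generator script might look like, here is a Python sketch for the pretend layout above (8 slots per chassis, 16 chassis per rack, 64 racks). The `--chassis` option and the `chassis`/`foo` naming are assumptions taken from this discussion, not an existing redfishpower interface, and unlike the abbreviated example above it emits all 16 blade groups per rack.

```python
SLOTS_PER_CHASSIS = 8
CHASSIS_PER_RACK = 16
RACKS = 64

def rack_command(rack):
    """Build the hypothetical redfishpower command line for one rack."""
    first_chassis = rack * CHASSIS_PER_RACK
    last_chassis = first_chassis + CHASSIS_PER_RACK - 1
    groups = []
    for chassis in range(first_chassis, last_chassis + 1):
        # Each chassis holds one contiguous block of node numbers.
        first_node = chassis * SLOTS_PER_CHASSIS
        last_node = first_node + SLOTS_PER_CHASSIS - 1
        groups.append(f"foo[{first_node}-{last_node}]")
    return (f"redfishpower --chassis chassis[{first_chassis}-{last_chassis}] "
            + " ".join(groups))

# One command line per rack, i.e. one redfishpower instance per rack.
commands = [rack_command(rack) for rack in range(RACKS)]
```

Changing `CHASSIS_PER_RACK` or grouping several racks per instance is then a one-line change, which is the main argument for generating the lines rather than maintaining them by hand.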

@chu11
Member Author

chu11 commented Feb 5, 2024

A self respecting sys admin would write a script to generate that. Meh, it's not crazy?

Yeah, but we'd have 64 redfishpowers running. I guess that's not an insane number, but we'd prefer to keep it down. Using your example below, let's say we got it down to 16 co-processes.

redfishpower --chassis[0-63] foo[0-7] foo[8-15] foo[16-23] foo[24-31] foo[32-39] foo[40-47] foo[48-55] foo[56-63] foo[64-71] foo[72-79] foo[80-87] foo[88-95] foo[96-103] foo[104-111] foo[112-119] foo[120-127] foo[128-135] foo[136-143] foo[144-151] foo[152-159] foo[160-167] foo[168-175] foo[176-183] foo[184-191] foo[192-199] foo[200-207] foo[208-215] foo[216-223] foo[224-231] foo[232-239] foo[240-247] foo[248-255]

ehhh I'll ping some people, see what they think

@garlick
Member

garlick commented Feb 5, 2024

Yes but powerman can only do one thing at a time per device, so more is generally better...

@chu11
Member Author

chu11 commented Feb 6, 2024

Yes but powerman can only do one thing at a time per device, so more is generally better...

Are you referring to the fact powerman only sends one node to the on device script? Recall that redfishpower is configured with on_ranged, so it gets sent the range of hosts to power stat/on/off and redfishpower handles the parallelism.

@chu11
Member Author

chu11 commented Feb 6, 2024

trying to prototype pre-reqs config on command line and it's a little annoying given the fact the --hostname option is already there. Perhaps something like

-h nodes[0-15]#chassis0 -h nodes[16-31]#chassis1 ...

Note that I avoided using :: as a separator b/c that can conflict w/ IPv6 addresses. I figured "//" would look like a URI. It could be any reasonable separator of course.
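A minimal Python sketch of splitting such an argument, assuming `#` as the separator proposed above. `split_host_arg` is a hypothetical helper, not part of redfishpower.

```python
def split_host_arg(arg, sep="#"):
    """Split 'hosts#parent' into (hosts, parent); parent is None if absent."""
    hosts, found, parent = arg.partition(sep)
    if not hosts:
        raise ValueError(f"missing hostlist in {arg!r}")
    if found and not parent:
        raise ValueError(f"separator given but no parent in {arg!r}")
    return hosts, (parent if found else None)

# Colons in the host part are left alone, so an IPv6-style host still works.
pairs = [split_host_arg(a) for a in
         ["nodes[0-15]#chassis0", "nodes[16-31]#chassis1", "fe80::1#chassis2"]]
```

Because only the first `#` is treated as the separator, an argument without one simply has no parent and would behave as before.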

@garlick
Member

garlick commented Feb 6, 2024

Are you referring to the fact powerman only sends one node to the on device script? Recall that redfishpower is configured with on_ranged, so it gets sent the range of hosts to power stat/on/off and redfishpower handles the parallelism.

No I mean powerman sends one command to redfishpower and then can't send another one until that command is complete. If there are multiple instances, powerman processes all the expect/send scripts in parallel.

@garlick
Member

garlick commented Feb 6, 2024

I'd suggest we call this a "power control hierarchy" and say that redfishpower supports a 2-level hierarchy. Pre-req seems like it conveys less about what is going on (and is a more open ended concept).

Regarding the options, I'd suggest that if -h hosts is present, it just be considered the first of the free arguments, so these pairs are equivalent:

redfishpower --hostname foo[0-16]
redfishpower foo[0-16]
redfishpower --chassis c[0-1] --hostname foo[0-16] foo[17-31]
redfishpower --chassis c[0-1] foo[0-16] foo[17-31]

@chu11
Member Author

chu11 commented Feb 6, 2024

Hmmmm, I guess one of the issues is if the user does:

redfishpower --hostname foo[0-16] --chassis c[0-1] foo[17-31]

we'd have to keep track of the order of the "chunks" input by the user. I guess it's not too hard to do, although a tad annoying.

Perhaps it'd just be easier to say it's an error and you can't specify --hostname and --chassis?
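For illustration, rejecting the combination is straightforward in an option parser. The sketch below uses Python's `argparse` purely to show the idea; redfishpower is a C program, `--chassis` is the option proposed in this thread, and the `blades` positional name is made up here.

```python
import argparse

def build_parser():
    # add_help=False frees up -h so it can mean --hostname, as in redfishpower.
    parser = argparse.ArgumentParser(prog="redfishpower-sketch", add_help=False)
    group = parser.add_mutually_exclusive_group()
    group.add_argument("-h", "--hostname", help="flat hostlist, no hierarchy")
    group.add_argument("--chassis", help="chassis hostlist (proposed option)")
    # Free arguments: one blade hostlist per chassis.
    parser.add_argument("blades", nargs="*")
    return parser

parser = build_parser()
args = parser.parse_args(["--chassis", "c[0-1]", "foo[0-16]", "foo[17-31]"])
# Passing both --hostname and --chassis makes argparse report an error and exit.
```

This sidesteps having to remember the order in which the "chunks" were given, since `--hostname` and free-argument blade lists can no longer be mixed.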

@garlick
Member

garlick commented Feb 6, 2024

Works for me!

@chu11
Member Author

chu11 commented Feb 6, 2024

hmmmmm .... is it a fair assumption in a cluster that the parents would divide evenly into the hosts? i.e.

redfishpower -h foo[0-15] --parent chassis[0-1] means foo[0-7]'s parent is chassis0 and foo[8-15] is chassis1? Would make things super easy.

In the event there are a few oddballs, could be started up by a separate redfishpower.
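Under the "parents divide evenly into the hosts" assumption, the assignment is a one-liner. A small Python sketch, with `assign_parents` as a hypothetical helper and the host lists already expanded:

```python
def assign_parents(hosts, parents):
    """Give each parent an equal, contiguous share of hosts, in order."""
    if not parents or len(hosts) % len(parents) != 0:
        raise ValueError("hosts do not divide evenly among parents")
    per_parent = len(hosts) // len(parents)
    return {host: parents[i // per_parent] for i, host in enumerate(hosts)}

# redfishpower -h foo[0-15] --parent chassis[0-1]
hosts = [f"foo{i}" for i in range(16)]
parent_of = assign_parents(hosts, ["chassis0", "chassis1"])
# foo0..foo7 map to chassis0, foo8..foo15 map to chassis1
```

Raising an error on an uneven split keeps the oddball case explicit, so those hosts would have to go into a separate redfishpower instance as suggested above.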

@garlick
Member

garlick commented Feb 6, 2024

Well I think we should ask what's easiest for the admins and then make it work. If not breaking existing options is tricky, then one idea would be to create another executable like bluefish that is just for blade servers. It could share lots of code with the original but take different options.

Edit: powerfish 😁

@chu11
Member Author

chu11 commented Feb 6, 2024

as I continue to prototype it occurred to me:

knowing how these URIs work for redfishpower, there's a non-trivial chance that the admins would set up a different redfishpower for each blade "offset", so hypothetically in powerman.conf

redfishpower -h foo[0,8,16,32] |&
redfishpower -h foo[1,9,17,33] |&
redfishpower -h foo[2,10,18,34] |&

b/c the URI for the first set of blades is <somepath>/1/<path>, and the second set is <somepath>/2/<path>, etc.

This makes me wonder if we need some massive config file where users can configure the set of URIs for each node they configure.

But for the time being, it made me realize the config

redfishpower -h foo[0,8,16,32] --parent=chassis[0-3] |&
redfishpower -h foo[1,9,17,33] --parent=chassis[0-3] |&
redfishpower -h foo[2,10,18,34] --parent=chassis[0-3] |&

sort of works out. The config style redfishpower --chassis c[0-1] foo[0-16] foo[17-31] probably wouldn't work out on average.

@garlick
Member

garlick commented Feb 7, 2024

I hadn't realized that the redfish "device" specification only defines one plug. Could we solve the above by defining a device spec for one chassis and then do plug substitution in the URIs?

@garlick
Member

garlick commented Feb 7, 2024

OH I just realized the hostnames are the plugs. Well, then the hostname's index in the hostlist for that chassis?
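A tiny Python sketch of that idea, deriving the slot number from the hostname's position in its chassis hostlist. The helper names and the `{slot}` placeholder are hypothetical, the URI pieces are left elided exactly as in the earlier comment, and slot numbering is assumed to start at 1 because the first set of blades was described as using `/1/`.

```python
def slot_for(host, chassis_hosts, first_slot=1):
    """Slot number of host, taken from its position in the chassis hostlist."""
    # list.index raises ValueError if the host is not in this chassis.
    return chassis_hosts.index(host) + first_slot

def uri_for(host, chassis_hosts, template):
    """Substitute the slot number into a URI template containing {slot}."""
    return template.format(slot=slot_for(host, chassis_hosts))

chassis0 = [f"foo{i}" for i in range(8)]
uri = uri_for("foo2", chassis0, "<somepath>/{slot}/<path>")
# uri is "<somepath>/3/<path>"
```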

@chu11
Member Author

chu11 commented Feb 7, 2024

I hadn't realized that the redfish "device" specification only defines one plug. Could we solve the above by defining a device spec for one chassis and then do plug substitution in the URIs?

perhaps we should open a separate issue on this. I'm not even sure how something like this could work.

@chu11
Member Author

chu11 commented Feb 7, 2024

Per discussion in #126 / #128, trying to power on a node whose chassis is "off" results in a "bad request" (400) error that returns immediately.

In combination with #79, this may lessen the immediate need for a "check parent" option in redfishpower, as an error returned from powerman would make it less unclear why there was a problem.

But a "parent" check does have usefulness for future consideration:

  • if "stat" of the chassis parent returns "off", we don't have to send a "stat" to every child to verify its status. That's a lot less messages.
  • similarly, if trying to power off/on blades and the "parent chassis" is off, we save a bunch of messages
  • we can return a nicer error message than "bad request"
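To put a rough number on the message savings for "stat", here is a Python sketch that queries each chassis once and only queries the blades of chassis that are on. `stat_blades` and the `query` callback are hypothetical stand-ins for the real redfish requests.

```python
def stat_blades(children_of, query):
    """Return {blade: status}, skipping blade queries under an 'off' chassis.

    children_of -- dict mapping chassis name to its list of blades
    query       -- callable(name) returning "on" or "off" (one message each)
    """
    result = {}
    for chassis, blades in children_of.items():
        if query(chassis) == "off":
            # One message answers for every blade in this chassis.
            for blade in blades:
                result[blade] = "off"
        else:
            for blade in blades:
                result[blade] = query(blade)
    return result
```

For two 8-blade chassis with one of them off, this sends 2 chassis queries plus 8 blade queries, instead of 16 blade queries of which 8 would fail or time out.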

@chu11
Member Author

chu11 commented Feb 24, 2024

siphoning off discussion from #142 into here, as what I originally saw as different issues / features may be far more similar than I originally thought.

There are really 3 possible scenarios to handle

A) pm --on node0 - should not work if parent is off.

B) pm --on blade0,node0 - turn on blade0 before node0 if blade0 is parent

C) pm --on --parents node0 - turn on parents of node0 before turning on node0

A) is the critical one to support

B) is a nice to have, has some pros/cons per discussion in #142. Far easier to support than I imagined, because effectively instead of waiting for a "stat" to finish, we wait for an "on" to finish.

C) is the big debatable one. The ability to pass an option from powerman client to redfishpower currently does not exist, so we would have to determine if we want to do this by default or not. Which if we do by default, conflicts with A. I think we hold off and do this later.

Also note that C is trivial if B is solved, just some simple "parent" lookups to build up what we should turn on.

There are some "off" differences which I'm not going to go into detail here.
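The "on" ordering for the three scenarios could be sketched like this, assuming the 2-level hierarchy discussed earlier. This is a Python illustration only; `plan_power_on` and its `include_parents` flag are hypothetical and do not correspond to an existing powerman or redfishpower option.

```python
def plan_power_on(targets, parent_of, include_parents=False):
    """Return the order in which to power things on, parents first.

    targets         -- names requested by the user
    parent_of       -- dict mapping each child to its parent (2 levels only)
    include_parents -- scenario C: also turn on parents that were not named
    """
    wanted = list(targets)
    if include_parents:
        for target in targets:
            parent = parent_of.get(target)
            if parent is not None and parent not in wanted:
                wanted.append(parent)
    # Names without a parent entry are top level and sort first; sorted()
    # is stable, so the user's order is kept within each level.
    return sorted(wanted, key=lambda name: name in parent_of)

parent_of = {"node0": "blade0"}
plan_a = plan_power_on(["node0"], parent_of)                        # A
plan_b = plan_power_on(["node0", "blade0"], parent_of)              # B
plan_c = plan_power_on(["node0"], parent_of, include_parents=True)  # C
```

In scenario A the plan contains only `node0`, so the request would still fail if `blade0` is off. B and C both produce `blade0` before `node0`, which shows why C reduces to B plus a parent lookup.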

@chu11 chu11 self-assigned this Feb 26, 2024
@chu11 chu11 changed the title powerman: recognize hierarchies / pre-requisites redfishpower: recognize hierarchies / pre-requisites Feb 29, 2024
@garlick
Member

garlick commented Apr 11, 2024

Should have been closed by #164 (reopen if not!)

@garlick garlick closed this as completed Apr 11, 2024