powerman/redfishpower: report information on hierarchy #153

Closed
chu11 opened this issue Feb 29, 2024 · 8 comments · Fixed by #174

@chu11
Member

chu11 commented Feb 29, 2024

Feel this is separate from #81.

Once parenting is supported in #81, it can be annoying when you do pm --on foobar and get an error of "ancestor off", and then are like "uhhh what is the parent of foobar"?

Completely throwing this out there, should we have a powerman --hierarchy-info foobar and redfishpower can output

foo
|-> bar
     |-> foobar

or whatever?

Just brainstorming for now.
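
A rough sketch of how that tree could be produced, assuming the hierarchy is available as a simple child-to-parent map. `PARENTS`, `ancestry`, and `render` are hypothetical names for illustration, not anything in powerman or redfishpower today:

```python
# Hypothetical parent map: child plug -> parent plug (None marks the root).
PARENTS = {"foo": None, "bar": "foo", "foobar": "bar"}


def ancestry(plug, parents):
    """Return the chain of names from the root down to `plug`."""
    chain = []
    seen = set()
    while plug is not None:
        if plug in seen:
            # Guard against a misconfigured hierarchy that loops back on itself.
            raise ValueError(f"cycle in hierarchy at {plug!r}")
        seen.add(plug)
        chain.append(plug)
        plug = parents[plug]
    chain.reverse()
    return chain


def render(chain):
    """Format the chain in the indented '|->' style shown above."""
    lines = []
    for depth, name in enumerate(chain):
        if depth == 0:
            lines.append(name)
        else:
            lines.append(" " * (5 * (depth - 1)) + "|-> " + name)
    return "\n".join(lines)


print(render(ancestry("foobar", PARENTS)))
```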

@chu11
Member Author

chu11 commented Mar 1, 2024

similarly it would be convenient to get power information only related to the "root". for example

on:      hetchy1001,hetchy-blade[1-2],hetchy-cmm[1-2],hetchy-jbod[201-202],hetchy-sssw[2-9],phetchy[201-202]
off:     hetchy[201-202,1002-1018],hetchy-blade[3-9]
unknown: 

which ones are related to chassis1 and chassis2? It'd be nice to (hypothetically) do powerman -q --relationship hetchy-cmm1 and have it only output blades/switches/nodes/jbods related to chassis1.
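
A minimal sketch of that filtering, assuming a child-to-parent map is available. The topology below is invented for illustration (the real pairing of nodes, blades, and chassis would come from the configuration), and `related` and `filter_status` are hypothetical helpers:

```python
from collections import defaultdict

# Hypothetical topology: child -> parent. These pairings are made up.
PARENTS = {
    "hetchy-cmm1": None,
    "hetchy-cmm2": None,
    "hetchy-blade1": "hetchy-cmm1",
    "hetchy-blade2": "hetchy-cmm2",
    "hetchy1001": "hetchy-blade1",
    "hetchy1002": "hetchy-blade2",
}


def related(root, parents):
    """Return `root` plus everything beneath it in the hierarchy."""
    children = defaultdict(list)
    for child, parent in parents.items():
        if parent is not None:
            children[parent].append(child)
    found, stack = set(), [root]
    while stack:
        node = stack.pop()
        found.add(node)
        stack.extend(children[node])
    return found


def filter_status(status, root, parents):
    """Keep only the hosts in each power state that hang off `root`."""
    keep = related(root, parents)
    return {state: sorted(h for h in hosts if h in keep)
            for state, hosts in status.items()}


status = {
    "on": ["hetchy1001", "hetchy-blade1", "hetchy-blade2",
           "hetchy-cmm1", "hetchy-cmm2"],
    "off": ["hetchy1002"],
    "unknown": [],
}
print(filter_status(status, "hetchy-cmm1", PARENTS))
```

With this made-up topology, only the chassis1 CMM, its blade, and the node on that blade remain in the output.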

@chu11
Member Author

chu11 commented Mar 11, 2024

Now that we have PR #162 that allows almost any diagnostic message to be returned, it would be trivial for an error of "ancestor off" to just be "ancestor off - NODE123" or whatever.

@chu11
Member Author

chu11 commented Mar 19, 2024

> Now that we have PR #162 that allows almost any diagnostic message to be returned, it would be trivial for an error of "ancestor off" to just be "ancestor off - NODE123" or whatever.

Hmmmm, I experimented with this idea but it didn't work out as nicely as I would have hoped.

A) If you output the ancestor plugname, that's not very informative to the user. i.e. "ancestor off (Enclosure)" ... you really want the host that is off.

B) if you output the ancestor hostname, that's not so informative either b/c all of the blades point to the same host. i.e. "t[0-15]: ancestor off (cmm0)", which isn't helpful, it's really the blade of each one you care about.

C) Could output both. "t[0-1]: ancestor off (cmm/blade0)", "t[2-3]: ancestor off (cmm/blade1)" .... But now we got a lot of extra output and the host + plugname isn't the actual thing to power control (i.e. it's really pblade-102 or whatever).
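
To illustrate the trade-off between B) and C), here is a small sketch that groups targets by their diagnostic string. The layout (four nodes, two per blade, one CMM) and the helper `group` are assumptions made up for this example:

```python
from collections import defaultdict

# Hypothetical layout: node -> (ancestor host, ancestor plug name).
NODES = {f"t{i}": ("cmm0", f"blade{i // 2}") for i in range(4)}


def group(label):
    """Group nodes by the diagnostic string that `label` builds for them."""
    groups = defaultdict(list)
    for node, (host, plug) in NODES.items():
        groups[label(host, plug)].append(node)
    return dict(groups)


# Option B: hostname only, so every node collapses into one message.
by_host = group(lambda host, plug: f"ancestor off ({host})")
# Option C: host plus plug name, so there is one message per blade.
by_both = group(lambda host, plug: f"ancestor off ({host}/{plug})")

for message, nodes in by_both.items():
    print(",".join(nodes) + ": " + message)
```

B) yields a single group that says nothing about which blade is off, while C) yields one line per blade, which is where the extra output comes from.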

This might be searching for a solution for a problem that doesn't exist. So gonna hold off on it for now.

@garlick
Member

garlick commented Apr 9, 2024

I think the sys admins already indicated that they are comfortable working around the hierarchy, which presumably means they know how chassis naming relates to blade naming and node naming, so this doesn't need to be too detailed IMHO.

But "ancestor" isn't as helpful as something like "cannot power on because cmm is off" or "...because blade1 is off".

@chu11
Member Author

chu11 commented Apr 10, 2024

thinking about it a bit this morning, i could just output ancestor powered off host=X plugname=X, the combo of those two is probably more than enough.

I didn't think of that before b/c the former diagnostic output collapsed common output, but since we're doing 1 per line now, then doing the above doesn't matter.

@garlick
Member

garlick commented Apr 10, 2024

Ah ok. Maybe s/ancestor powered off/cannot power on because enclosing hardware is off/ or similar.

Ancestor seems a bit abstract to me.

@chu11
Member Author

chu11 commented Apr 10, 2024

oh i remember now, it was this generic "ancestor off" b/c technically it could be "ancestor error" as well. I'll come up with something generic that covers all possibilities.

@adamdbertsch

I agree that the sysadmins should be aware that there is a hierarchy, although it's possible that someone on call might not be completely aware. But any sort of message that alludes to you needing to go turn on a piece of equipment upstream is likely a-ok for this.

@chu11 chu11 self-assigned this Apr 10, 2024
@chu11 chu11 changed the title powerman/redfishpower: get information on hierarchy powerman/redfishpower: report information on hierarchy Apr 10, 2024
chu11 added a commit to chu11/powerman that referenced this issue Apr 11, 2024
Problem: When a power operation cannot be done due to an
issue with the hierarchy, a very general "ancestor off" or
"ancestor error" is returned.  This does not provide a lot of
detail to users.

Update the error message to be of the form

cannot perform <op>, dependency <issue> (host=<host>, plug=<plug>)

for example

cannot perform on, dependency off (host=cmm plug=Enclosure)

Update tests for new error message.

Fixes chaos#153
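
For illustration only, the message form described in the commit can be sketched as a small formatting helper. `dependency_error` is a hypothetical name, not the function in redfishpower. The template in the commit message separates `host` and `plug` with a comma while its example does not; this sketch follows the example, so the separator is an assumption:

```python
def dependency_error(op, issue, host, plug):
    """Build the diagnostic for an operation blocked by a dependency.

    `op` is the requested operation (e.g. "on"), `issue` is the state of the
    dependency (e.g. "off" or "error"), and `host`/`plug` identify it.
    """
    return f"cannot perform {op}, dependency {issue} (host={host} plug={plug})"


print(dependency_error("on", "off", "cmm", "Enclosure"))
```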
@mergify mergify bot closed this as completed in #174 Apr 11, 2024