Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

powerman: when status and status_all are defined, use status_all only on full pluglist #156

Merged
merged 5 commits into from
Mar 3, 2024

Conversation

chu11
Copy link
Member

@chu11 chu11 commented Mar 1, 2024

Problem: When selecting a script to use for an action, the "all" version of a script is always used if the action is a query action (e.g. power status query, beacon query, etc.).

This can be problematic in situations where blades/nodes/etc. are not fully populated, such as in a chassis. We do not want to query "all" things in that case.

Solution: Only call the "all" script if it is a query and a singlet version of the script does not exist. For device files where unpopulated nodes may exist, it can define an all script and singlet script for each scenario.

Built on top of #155

@garlick
Copy link
Member

garlick commented Mar 2, 2024

Nice! This probably needs test coverage though.

@chu11
Copy link
Member Author

chu11 commented Mar 2, 2024

Nice! This probably needs test coverage though.

Ahh, yeah. I guess it was sort of there in the bigger PR, but could use some specific tests targeted just to this.

@chu11 chu11 force-pushed the redfishpower_status_selection branch from a97bd5e to 653f335 Compare March 2, 2024 06:12
@chu11
Copy link
Member Author

chu11 commented Mar 2, 2024

re-pushed, added some coverage in t0004-status-query.t.

Edit: oops bash is ok with == but not dash.

@chu11 chu11 force-pushed the redfishpower_status_selection branch from 653f335 to 6793494 Compare March 2, 2024 06:23
@garlick
Copy link
Member

garlick commented Mar 2, 2024

Thanks for the test!

Other things:

  • Title should be more descriptive like, "powerman: when status and status_all are defined, use status_all only on full pluglist"
  • Any existing dev specs that define both should be updated to only define status_all to avoid using formerly dead and probably untested code. See appro-gb2.dev, appro-greenblade.dev and swpdu.dev, all of which contain multiple specifications.
  • This behavior should be explained in powerman.dev(5).

BTW which script does powerman choose when the device spec contains both status and status_all and does not define a plug name list?

Copy link
Member

@garlick garlick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh just realized that this applies to status_beacon_all.
Hmm, looks like there is a status_temp_all but no status_temp?

@@ -145,6 +145,72 @@ test_expect_success 'stop powerman daemon and device server' '
wait &&
wait
'
test_expect_success 'create new powerman.conf with no status and status_all script' '
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/no/both/

@@ -683,8 +683,12 @@ static int _enqueue_targeted_actions(Device * dev, int com, hostlist_t hl,
pluglist_iterator_destroy(itr);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit message:

Solution: Only call the "all" script if it is a query and a singlet
version of the script does not exist.

Makes it sound like "all" is never called if the singlet exists. Maybe "only call the all query script if all plugs are requested OR the singlet version of the script is not defined."?

For device files where
unpopulated nodes may exist, it can define an all script and singlet
script for each scenario.

s/nodes/plugs/

@chu11 chu11 changed the title powerman: do not always call "all" query script powerman: when status and status_all are defined, use status_all only on full pluglist Mar 2, 2024
@chu11
Copy link
Member Author

chu11 commented Mar 2, 2024

BTW which script does powerman choose when the device spec contains both status and status_all and does not define a plug name list?

TBH I wasn't sure. Adding some printf debug into _enqueue_targeted_actions() it appears dev->plugs is simply all the plugs defined by node configuration if the plug name list isn't formally listed. And if the user doesn't input anything, the input hostlist hl parameter is everything specified in the config file.

Problem: The "temp" command was missing a "goto ok" statement after
completing the temperature request.  It lead to unexpected output.

Add the missing "goto ok".
@chu11 chu11 force-pushed the redfishpower_status_selection branch from 6793494 to 4ca5822 Compare March 2, 2024 23:37
@chu11
Copy link
Member Author

chu11 commented Mar 2, 2024

re-pushed ... tweaked nit things per comments above and ...

  • added beacon and temperature equivalent script tests, it ends up "status_temp" was never covered in vpcd and there was a corner case in there :P
  • add documentation in powerman.dev(5)
  • removed unusued/untested singleton scripts in device files

Copy link
Member

@garlick garlick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Just one commit message suggestion.

Comment on lines -47 to -49
script status {
send "pmnode %s\r"
expect "node([0-9]+): (on|off|n/a)"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the commit message for this, it would probably be good to mention that the current behavior is changing so that the untested singletons would start to get used, so this change just preserves the existing behavior.

Problem: Current powerman implementation will always
call the "all" status script instead of a "singleton"
status scripts for power status, beacon status, and temperature
status.  In the future this behavior will change.

Some devices specify both "all" and "singleton" status scripts.
This means the "singleton" status scripts are never used
and are untested, but with future changes they will begin to be
used.

Solution: Remove the singleton status scripts in appro-gb2.dev,
appro-greenblade.dev, and swpdu.dev.  This will preserve
current behavior of those device files.
Problem: When selecting a script to use for an action, the "all"
version of a script is always used if the action is a query action
(e.g. power status query, beacon query, etc.).

This can be problematic in situations where blades/nodes/etc. are
not fully populated, such as in a chassis.  We do not want to query
"all" things in that case.

Solution: Only call the all query script if all plugs are requested
OR the singlet version of the script is not defined."?  For device files
where unpopulated plugs may exist, it can define an all script and singlet
script for each scenario.
Problem: There are no tests to see which status script (status vs status_all)
is chosen when both are specified in a device file.

Add coverage in t0004-status-query.t, t0007-temperature.t, and t0008-beacon.t.
Problem: Recent updates altered when an "all" vs "singleton"
status script is called.  However this change is not documented.

Add information on why a user might want to specify both an
"all" and "singleton" status script for power status, beacon status,
and temperature status.
@chu11 chu11 force-pushed the redfishpower_status_selection branch from 4ca5822 to ac7726c Compare March 3, 2024 01:41
@chu11
Copy link
Member Author

chu11 commented Mar 3, 2024

thanks, pushed a commit message tweak and setting MWP

@mergify mergify bot merged commit 7d62cad into chaos:master Mar 3, 2024
8 checks passed
@chu11 chu11 deleted the redfishpower_status_selection branch March 3, 2024 05:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants