Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add redfishpower HPE Cray EX chassis device file #173

Merged
merged 2 commits into from
Apr 11, 2024

Conversation

chu11
Copy link
Member

@chu11 chu11 commented Apr 5, 2024

Built on top of #167 and #172, but those don't have to go in first. #167 is a "pretty nice to have" (would like to have before release), #172 can be debated before release if it should go in.

@chu11
Copy link
Member Author

chu11 commented Apr 8, 2024

re-pushed, its no longer based on #172, it was far less necessary for this PR compared to #167 nevermind, strike that ... i have tests that assume the --diag support :-) ... we can debate if those should be ripped out or not.

@chu11 chu11 force-pushed the redfishpower_hpe_cray_ex branch 2 times, most recently from e5ef2ba to 85d3660 Compare April 9, 2024 19:28
@chu11
Copy link
Member Author

chu11 commented Apr 9, 2024

re-pushed, rebasing on #172. mostly just updating the tests, "diagnostic" error messages are now by default (no --diag), to stderr, and one per line

@garlick
Copy link
Member

garlick commented Apr 11, 2024

The dependent PRs are approved so I'll wait to review this until it's rebased on master after those are merged if that's OK.

@chu11
Copy link
Member Author

chu11 commented Apr 11, 2024

re-pushed, rebasing on master and fixing up tests given new error message output possibilities

@garlick
Copy link
Member

garlick commented Apr 11, 2024

For a start, could we just propose the one spec in this PR, for the fully populated chassis, drop the inline documentation, and shorten the name to "cray-ex" or similar?

@chu11
Copy link
Member Author

chu11 commented Apr 11, 2024

ok, re-pushed per your comments above. We'll split out rabbit config into it's own PR later.

Copy link
Member

@garlick garlick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So nice to see this coming together!

Comment on lines 19 to 21
# include "/etc/powerman/redfishpower-cray-ex.dev"
# device "redfishpower" "redfishpower-CrayEX" "/usr/sbin/redfishpower -h -h cmm0,pmynode[0-15] |&"
# node "cmm0,myperif[0-7],myblade[0-7],mynode[0-15]" "redfishpower" "Enclosure,Perif[0-7],Blade[0-7],Node[0-15]"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that an extra -h on the device line?

Suggest you don't name the example device "redfishpower" but something more useful like "chassis0".

Finally, since these device lines are going to be repeated quite a lot in a large config, can we shorten the spec name further and drop the capital letters?

E.g. instead of

device "redfishpower" "redfishpower-CrayEX" "/usr/sbin/redfishpower -h -h cmm0,pmynode[0-15] |&"

go with

device "chassis0" "cray-ex" "/usr/sbin/redfishpower -h cmm0,pmynode[0-15] |&"

Comment on lines +76 to +80
send "setplugs Enclosure 0\n"
expect "redfishpower> "
send "setplugs Perif[0-7],Blade[0-7] 0 Enclosure\n"
expect "redfishpower> "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might improve readability a bit if on the first occurrence of setplugs there was a comment listing the positional arguments, e.g.

# setplugs pluglist hostindex [parent plug]

Comment on lines +96 to +100
send "setpath Enclosure,Perif[0-7],Blade[0-7] stat redfish/v1/Chassis/{{plug}}\n"
expect "redfishpower> "
send "setpath Enclosure,Perif[0-7],Blade[0-7] on redfish/v1/Chassis/{{plug}}/Actions/Chassis.Reset {\"ResetType\":\"On\"}\n"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same with setpath e.g.

# setpath pluglist {stat|on|off} path

Comment on lines +108 to +112
send "settimeout 75\n"
expect "redfishpower> "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This timeout deserves quick comment too like

# settimeout [seconds] - on/off operations fail on expiration

#

test_expect_success 'create powerman.conf for chassis w/ 16 redfish nodes (crayex)' '
cat >powerman_hpe_cray_supercomputing_ex_chassis.conf <<-EOT
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just a temp file so maybe we could shorten the name a bit :-)

Comment on lines 110 to 112
#
# redfishpower hpe cray supercomputing ex chassis test
#
Copy link
Member

@garlick garlick Apr 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be its own test script since its likely we'll want to add tests as things come up on the real system. Plus it allows for the tests to run in parallel.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good idea, and we can add your "big redfish test" in afterwards. (#127)

@chu11
Copy link
Member Author

chu11 commented Apr 11, 2024

re-pushed per comments above!

Copy link
Member

@garlick garlick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! Just one more thought.

Comment on lines 23 to 30
# - If your chassis is not fully populated, put placeholder hosts in
# the redfishpower hosts configuration. Adjust plugs when
# configuring specific targets. For example, let's say perif[4-7]
# and blades[4-7] are not populated (thus nodes[8-15] also do not exist).
#
# device "chassis0" "cray-ex" "/usr/sbin/redfishpower -h cmm0,pnode[0-7],unused[0-7] |&"
# node "cmm0,myperif[0-3],myblade[0-3],mynode[0-7]" "redfishpower" "Enclosure,Perif[0-3],Blade[0-3],Node[0-7]"
#
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be slightly clearer if the example focused on the unused elements of the hostlist. So just depopulate Blade[4-7] hence Node[8-15] and leave Perif[4-7] out of it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh one other quick thing. I assume auth USER:PASS is not the published vendor default password. Do we want to put the default in the dev script ? Then we wouldn't require people to immediatley copy the dev script.

It seems like this came up and we found the default on a public web document somewhere? (need to confirm)

Copy link
Member

@garlick garlick Apr 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have to check if this is accurate, but if so, it's out there searchable with a google so IMHO is fair game to include in a dev script.

https://support.hpe.com/hpesc/public/docDisplay?docId=a00117940en_us&docLocale=en_US&page=Manage_System_Passwords.html#redfish-credentials

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be slightly clearer if the example focused on the unused elements of the hostlist. So just depopulate Blade[4-7] hence Node[8-15] and leave Perif[4-7] out of it.

I guess in my example I wanted to show how to config if switches aren't populated too. Would it be clearer if I did two examples?

Oh one other quick thing. I assume auth USER:PASS is not the published vendor default password. Do we want to put the default in the dev script ? Then we wouldn't require people to immediatley copy the dev script.

Good point!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess in my example I wanted to show how to config if switches aren't populated too. Would it be clearer if I did two examples?

Eh, I guess if you want although that part is just standard powerman stuff.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm gonna punt on the default username/password to a follow up PR

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eh, I guess if you want although that part is just standard powerman stuff.

Ohhh, i see your point. It's normal powerman conf stuff. makes sense.

@chu11
Copy link
Member Author

chu11 commented Apr 11, 2024

re-pushed, tweaking the one thing commented on above. if we decide to change default username/password will put into a follow up PR.

# are not populated (thus nodes[8-15] also do not exist).
#
# device "chassis0" "cray-ex" "/usr/sbin/redfishpower -h cmm0,pnode[0-7],unused[0-7] |&"
# node "cmm0,myperif[0-7],myblade[0-3],mynode[0-7]" "redfishpower" "Enclosure,Perif[0-7],Blade[0-3],Node[0-7]"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

device name should be "chassis0" here


# Use port = 11000 + test number
# That way there won't be port conflicts with make -j
testaddr=localhost:11029
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be 11000 + 37 = 11037

Add device file for a HPE Cray Supercomputing EX Chassis.

Fixes chaos#128
Problem: There is no testing for the new HPE Cray Supercomputing
EX Chassis device file.

Add new tests in new t0037-cray-ex.t test file.
@chu11
Copy link
Member Author

chu11 commented Apr 11, 2024

doh! good catches ... re-pushed with those two tweaks above

@mergify mergify bot merged commit 5f76e98 into chaos:master Apr 11, 2024
8 checks passed
@chu11 chu11 deleted the redfishpower_hpe_cray_ex branch April 12, 2024 00:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants