Support power control to same target twice #205

Open
chu11 opened this issue Sep 12, 2024 · 4 comments

Comments

@chu11
Member

chu11 commented Sep 12, 2024

A user asked for an interesting feature. In the el cap environment, two compute nodes are attached to one blade, and it can be inconvenient to map a compute node to its blade (e.g. node1478 maps to blade???).

The idea is to map two fictitious node names to the same blade, i.e. blade1477 and blade1478 would both power control the same blade.

The primary issue is how to deal with a user submitting both nodes for power control at the same time (i.e. `pm -0 blade1477,blade1478`).

Pondering a bit, I think this may be doable if arglist_find() could return multiple args (i.e. a single power result could update multiple args). The ranged scripts could handle it by calling hostlist_uniq() before sending the power control command to the target.
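A minimal sketch of the "single power result updates multiple args" idea, for illustration only. `PlugArg`, `PowerState`, and `update_args_by_plug()` are hypothetical names, not powerman's real `Arg` or arglist API, and a plain array scan stands in for whatever lookup powerman would actually use.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for powerman's Arg (not the real struct). */
typedef enum { ST_UNKNOWN, ST_OFF, ST_ON } PowerState;

typedef struct {
    const char *node;   /* name the user typed, e.g. "blade1477" */
    const char *plug;   /* plug the device knows, e.g. "Blade3" */
    PowerState state;   /* last reported power state */
} PlugArg;

/* Apply one device result to every arg that shares the given plug.
 * Returns the number of args that were updated. */
static size_t update_args_by_plug(PlugArg *args, size_t n,
                                  const char *plug, PowerState state)
{
    size_t updated = 0;

    assert(args != NULL && plug != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(args[i].plug, plug) == 0) {
            args[i].state = state;
            updated++;
        }
    }
    return updated;
}
```

With blade1477 and blade1478 both mapped to Blade3, a single "Blade3: off" result would mark both node entries as off, so only one command needs to reach the device.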

But I bet there’s stuff I’m not thinking about right now.

Edit: at a minimum, what errors could occur with non-ranged scripts? But I suppose we aren't checking that multiple "targets" don't already point to the same plug. So perhaps it doesn't matter?

@garlick
Member

garlick commented Sep 12, 2024

We do have aliases so blade1477 and blade1478 could just map to the same blade name.

I thought the admins told us that they had other ways to do this sort of mapping and that we didn't need to provide support for such stuff in powerman?

@chu11
Member Author

chu11 commented Sep 12, 2024

We do have aliases so blade1477 and blade1478 could just map to the same blade name.

That was my first thought, but I didn't know what the fallout would be behind the scenes. For example, if someone did pm -0 blade1477,blade1478, I'm not sure whether that would send 1 or 2 power control requests to redfishpower (i.e. does the ranged script call hostlist_uniq() and pass off Blade3 or off Blade[3,3] to redfishpower?).

I thought the admins told us that they had other ways to do this sort of mapping and that we didn't need to provide support for such stuff in powerman?

I think they do, but this is more of a convenience, versus having to run pm -0 $(get-me-the-blade.sh node1478).

Comments, @watson6282?

@garlick
Member

garlick commented Sep 12, 2024

FWIW

$ pm -T -1 picl1,picl1
send(picl): 'on 1\n'
recv(picl): 'OK\n'
recv(picl): 'power> '
Command completed successfully

Two hosts named, one command issued.

@chu11
Member Author

chu11 commented Sep 16, 2024

As a quick test:

listen "localhost:11099"
include "/g/g0/achu/chaos/git/powerman/etc/devices/redfishpower-cray-ex-rabbit.dev"
device "d0" "cray-ex-rabbit" "/g/g0/achu/chaos/git/powerman/src/redfishpower/redfishpower -h cmm0,t[0-15],rabbit --test-mode|&"
node "cmm0,perif[0-4,7],blade[0-7],t[0-15],rabbit" "d0" "Enclosure,Perif[0-4,7],Blade[0-7],Node[0-16]"
alias "tblade0" "blade0"
alias "tblade1" "blade0"
alias "tblade2" "blade1"
alias "tblade3" "blade1"
...

With the exception that blade0 is listed twice in the hostrange output (easily solved with a hostlist_uniq() call), this actually works.

>src/powerman/powerman -h localhost:11099 -q tblade0,tblade1 -T
send(d0): 'stat Blade0\n'
recv(d0): 'Blade0: off\n'
recv(d0): 'redfishpower> '
on:      
off:     blade[0,0]
unknown: 
>src/powerman/powerman -h localhost:11099 -1 "tblade[0-3]" -T
send(d0): 'on Blade[0-1]\n'
recv(d0): 'Blade0: ok\n'
recv(d0): 'Blade1: ok\n'
recv(d0): 'redfishpower> '
Command completed successfully
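To make the behavior in the test above concrete, here is a rough sketch of alias resolution followed by duplicate removal. It only assumes the four alias lines from the quick-test config; `AliasEntry`, `alias_resolve()`, and `resolve_unique()` are hypothetical helpers, and the duplicate check merely stands in for the effect of a hostlist_uniq() call rather than reproducing powerman's hostlist code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical alias table mirroring the alias lines in the test config. */
typedef struct {
    const char *alias;
    const char *target;
} AliasEntry;

static const AliasEntry alias_table[] = {
    { "tblade0", "blade0" },
    { "tblade1", "blade0" },
    { "tblade2", "blade1" },
    { "tblade3", "blade1" },
};

/* Return the alias target, or the name itself when it is not an alias. */
static const char *alias_resolve(const char *name)
{
    size_t n = sizeof(alias_table) / sizeof(alias_table[0]);

    for (size_t i = 0; i < n; i++) {
        if (strcmp(alias_table[i].alias, name) == 0)
            return alias_table[i].target;
    }
    return name;
}

/* Resolve every name and keep only the first occurrence of each result.
 * 'out' must have room for n entries. Returns the number of unique names. */
static size_t resolve_unique(const char **names, size_t n, const char **out)
{
    size_t count = 0;

    assert(names != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        const char *real = alias_resolve(names[i]);
        int seen = 0;

        for (size_t j = 0; j < count; j++) {
            if (strcmp(out[j], real) == 0) {
                seen = 1;
                break;
            }
        }
        if (!seen)
            out[count++] = real;
    }
    return count;
}
```

Feeding in tblade0 through tblade3 yields just blade0 and blade1, which matches the single `on Blade[0-1]` command seen in the telemetry output.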

I was initially a little surprised it works.

  • Internally, arglist hashes plugs. So, for example, when iterating over the input hosts:
Arg *arglist_next(ArgListIterator itr)
{
    Arg *arg = NULL;
    char *node;

    node = hostlist_next(itr->itr);
    if (node != NULL) {
        arg = hash_find(itr->arglist->args, node);
        free(node); /* hostlist_next strdups returned string */
    }

    return arg;
}

Each blade0 returns the same result because hash_find() has only one response entry for it.

  • I think this only works with "ranged" power operations. For "singlet" scripts, I'm not sure it would work b/c duplicates are not handled.
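The lookup behavior described in the first bullet can be illustrated with a small stand-alone sketch. This is not powerman's hash implementation: `NodeEntry` and `node_find()` are hypothetical, and a linear scan replaces the real hash_find(). The point is only that a repeated name resolves to the same stored entry.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-node record; one entry exists per distinct node name. */
typedef struct {
    const char *node;
    int is_on;          /* 0 = off, 1 = on */
} NodeEntry;

/* Linear-scan stand-in for a hash lookup keyed by node name.
 * Returns a pointer to the stored entry, or NULL if the name is unknown. */
static NodeEntry *node_find(NodeEntry *table, size_t n, const char *node)
{
    assert(table != NULL && node != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(table[i].node, node) == 0)
            return &table[i];
    }
    return NULL;
}
```

Iterating over a host list such as blade0,blade0 therefore hands back the same entry twice, so a single result recorded for blade0 is visible on both iterations.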

The devil is in the details of course. I'm not sure what fallout there could be from calling hostlist_uniq() before returning the response, and from other shenanigans.

But at a very high level ... I think this just works.

@watson6282 shall I invest more effort in this? Last we chatted it was an idea. Not sure if you really want to pursue.
