v4.0.x: regx/naive: add regx/naive component #6915
This doesn't appear to do what you claim. For a "none" component, all you should be doing is assembling the names of all the nodes into a comma-delimited string, and the same for all the vpids. I therefore don't understand why this code is looking at ranges and stuff - none of that applies here.
As the |
I think you misunderstood. My concern is that this component contains code that tracks regex construction for node names as well as vpids. A |
I think I got what you meant. You mean the portion at the end of the generated string where typically we have something of form |
Yes - just remove the code that checks for ranges and replace it with something like this (excuse the lack of detail and syntax errors):
loop over orte_node_pool:
    opal_argv_append_nosize(&nodenames, nptr->name);
endloop
*regex = opal_argv_join(nodenames, ',');
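As a self-contained illustration of the suggestion above, here is a minimal sketch in plain C that assembles node names into one comma-delimited string without any range detection. It deliberately avoids the OPAL/ORTE helpers so it can be compiled on its own; `join_node_names` is a hypothetical name, and the plain array of strings stands in for the node pool.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Join `count` node names into one newly allocated, comma-delimited string.
 * Returns NULL on allocation failure; the caller frees the result.
 * NOTE: join_node_names is a hypothetical helper, not an ORTE/OPAL function. */
static char *join_node_names(const char *const *names, size_t count)
{
    size_t total = 1;                       /* room for the terminating NUL */
    for (size_t i = 0; i < count; i++) {
        assert(names[i] != NULL);
        total += strlen(names[i]) + (i > 0 ? 1 : 0);   /* name plus separator */
    }

    char *out = malloc(total);
    if (out == NULL) {
        return NULL;
    }

    char *p = out;
    for (size_t i = 0; i < count; i++) {
        if (i > 0) {
            *p++ = ',';                     /* separator between entries only */
        }
        size_t len = strlen(names[i]);
        memcpy(p, names[i], len);
        p += len;
    }
    *p = '\0';
    return out;
}
```

Because every name is copied verbatim, the output cannot be wrong for unusual hostnames. The cost is that the string grows linearly with the number of nodes.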
I think the code can be further simplified. It is not tracking the hostname regex, but does use the structure. We should combine the creation of the hostlist, which currently spans the first two loops, into just the first loop. We want to preserve the vpid regex tracking from the first loop to the third loop so that the integer range is preserved. The integer range regx portion is easier to verify/debug and hasn't shown any issues. @sam6258 One other feature that would be nice is to add a Per testing. as far as I know, master did not have a problem with this type of hostfile.
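To make the "integer range" idea concrete, here is a rough sketch of collapsing an ascending list of vpids into ranges. The `a-b,c` output format and the name `compress_vpids` are assumptions made for illustration; the actual encoding used by the regx components may differ.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Compress an ascending list of vpids into a string such as "0-3,5,7-9".
 * Returns a malloc'd string (caller frees) or NULL on failure.
 * NOTE: hypothetical helper and output format, for illustration only. */
static char *compress_vpids(const uint32_t *vpids, size_t count)
{
    /* A lone vpid needs at most 10 digits plus a comma; a range needs at
     * most 22 bytes but covers at least two entries, so 11 per entry. */
    size_t cap = count * 11 + 1;
    char *out = malloc(cap);
    if (out == NULL) {
        return NULL;
    }
    out[0] = '\0';

    size_t used = 0;
    size_t i = 0;
    while (i < count) {
        /* Extend j to the last vpid of the consecutive run starting at i. */
        size_t j = i;
        while (j + 1 < count && vpids[j + 1] == vpids[j] + 1) {
            j++;
        }

        const char *sep = (used > 0) ? "," : "";
        int n;
        if (j > i) {
            n = snprintf(out + used, cap - used, "%s%" PRIu32 "-%" PRIu32,
                         sep, vpids[i], vpids[j]);
        } else {
            n = snprintf(out + used, cap - used, "%s%" PRIu32, sep, vpids[i]);
        }
        if (n < 0 || (size_t)n >= cap - used) {
            free(out);                      /* should not happen given cap */
            return NULL;
        }
        used += (size_t)n;
        i = j + 1;
    }
    return out;
}
```

Only exact integer adjacency is tested here, which is what makes this portion easy to verify compared with pattern matching on hostnames.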
"vpidonly"?? I don't know, but "none" is clearly misleading. Note that you will have a scaling issue here. The cmd line length is limited, and this will now put the full length of every hostname in the job on that cmd line. |
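The scaling concern can be estimated ahead of time. The sketch below computes how many bytes an uncompressed, comma-delimited host list would need and compares that against a caller-supplied limit. Both function names and the `limit` parameter are hypothetical. The real limit depends on the system and launcher; on POSIX systems, `sysconf(_SC_ARG_MAX)` reports the combined argument and environment size limit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bytes needed for the comma-delimited list, excluding the terminating NUL.
 * NOTE: hypothetical helper, not part of ORTE/OPAL. */
static size_t hostlist_length(const char *const *names, size_t count)
{
    size_t total = 0;
    for (size_t i = 0; i < count; i++) {
        assert(names[i] != NULL);
        total += strlen(names[i]);
    }
    if (count > 1) {
        total += count - 1;                 /* one comma between neighbors */
    }
    return total;
}

/* Returns 1 if the uncompressed list fits within `limit` bytes, else 0.
 * `limit` is an assumed, caller-chosen budget. */
static int hostlist_fits(const char *const *names, size_t count, size_t limit)
{
    return hostlist_length(names, count) <= limit;
}
```

For example, three 7-character hostnames need 23 bytes. The total grows linearly with both the node count and the hostname length.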
How about this:
This has the advantage that the base functionality is no compression - thus serving its goal of a basic component useful for isolating regex issues and providing a functional path forward for those users. The user can optionally turn on the vpid compression to test if it is that portion of the regex that is being problematic (good for debugging). If it is not problematic for the user then they can leave the MCA parameter on and gain a little savings in the vpid compression. @rhc54 You are correct about the scaling issue, but not because of the command line. The I should note that we are debugging a different issue with the A couple of questions about this area of code though:
007b4ff to 9bfe061
We could call this component |
Updated with the following:
Sounds reasonable
Ah, indeed - I had forgotten about that param.
I glanced at the code real quick and the answer appears to be "no" - it only gets used when the orted's are launched. If it cannot go on the cmd line, then it gets sent out in the launch message.
Afraid not - we use it to generate the proc location info that gets stored in the key-value store. You may not be encountering problems only because the software using PMIx isn't asking it for location info. Note that we only have a "fwd" component in PMIx - not sure what you folks are doing in that area (i.e., if you have your own off to the side). We need to devise a new API or workaround to eliminate regex from PMIx. |
I think this is ready for re-review. |
9bfe061 to e6523aa
Once this PR is approved, we will likely want to cherry-pick it to the v3.1.x series. For the |
Looks like something funny going on with the CI: |
bot:retest |
I don't like the name "none" either. It sounds strange to say "use the regex none MCA component": you'd expect that no component would be loaded for that framework, when it is actually loading this (non-default) one.
My question is:
Is it possible to get similar behavior as this component (i.e. no compression) in master without this component?
Another option occurred to me. Why don't you create a component that (a) sets @gpaulsen There is no problem on master as we strictly compress, which means we don't have a regex problem there. In fact, there is no orte regex framework on OMPI master any more for just that reason. |
@rhc54 That is an interesting idea. We could make that a new component in this framework, and maybe advocate for it to be the default since it should work for all scenarios. Are the zlib compression changes in the v4.0.x branch or would we need to bring them over (it shouldn't be hard to bring them over, but I don't know if that is too 'big' of a change for this branch)? I think we still would like to see a @sam6258 mentioned maybe renaming the component to |
If folks want to rename the component (now is the time) then feel free to do so and we can all re-review.
I think there might have been a typo in our communication @jjhursey, I actually meant |
I added WIP in case @sam6258 wanted to rename before we merge.
Signed-off-by: Scott Miller <scott.miller1@ibm.com>
e6523aa to 1b0cfdf
This PR was discussed at our RM meeting, and we decided that the recommendation to add a zlib component that compresses/decompresses would be a better solution. See @rhc54's comments about this approach. The RMs also discussed changing the default regx component and concluded that it would be dangerous to switch the default regx component within a release stream.
reopening. |
Discussed on the 2019-08-27 webex: @jjhursey makes a good point that this is an excellent debugging / lowest-common-denominator plugin. There's also nothing that says that we can't do both this one and a zlib component (e.g., in a future PR). I.e., let's re-open / merge this PR. It can be an open feature request to have a regex/zlib PR for some future v4.0.x release.
@jsquyres Sounds good to me. I've started work on a |
@sam6258 Thanks! Mind if you call it Also, could you file a feature request issue for target:v4.0.x, just so that we can track this? It won't block the release of v4.0.2, but it'll get in the v4.0.x train whenever it gets in. |
@hppritcha okay to merge, when you are. |
This PR provides a new nidmap regx component that should never fail. We have seen instances where both the fwd and reverse regx components fail and result in launch failure when hostnames are randomly generated (regx/fwd failing hostfile: https://gist.github.com/sam6258/1089ff67eec9882a8029abf4c365f1da). Adding regx/none provides a new debug path for the other regx components as well as a functional solution when the others fail. This PR is only intended for v4.0.x and v3.1.x since the regx component does not exist in master (now uses zlib) or earlier versions.