Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v4.0.x: regx/naive: add regx/naive component #6915

Merged
merged 1 commit into from
Aug 30, 2019

Conversation

sam6258
Copy link
Contributor

@sam6258 sam6258 commented Aug 20, 2019

This PR provides a new nidmap regx component that should never fail. We have seen instances where both the fwd and reverse regx components fail and result in launch failure when hostnames are randomly generated (regx/fwd failing hostfile: https://gist.github.com/sam6258/1089ff67eec9882a8029abf4c365f1da).

Adding regx/none provides a new debug path for the other regx components as well as a functional solution when the others fail. This PR is only intended for v4.0.x and v3.1.x since the regx component does not exist in master (now uses zlib) or earlier versions.

Copy link
Contributor

@rhc54 rhc54 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't appear to do what you claim. For a "none" component, all you should be doing is assembling the names of all the nodes into a comma-delimited string, and the same for all the vpids. I therefore don't understand why this code is looking at ranges and stuff - none of that applies here.

@sam6258
Copy link
Contributor Author

sam6258 commented Aug 20, 2019

As the regx component stands today, the naming scheme seems to refer to the hostname portion of the regex (fwd/reverse) and hasn't really had an impact on the vpid compression. With the introduction of the none component, there is a bit different view now, so I think it is really up to how we want to deal with it. I also contemplated not compressing the vpid portion of the regex, but decided against it since it should almost always work. If we want to make that change, I'm happy to do it.

@rhc54
Copy link
Contributor

rhc54 commented Aug 20, 2019

I think you misunderstood. My concern is that this component contains code that tracks regex construction for node names as well as vpids. A none component should do neither. Over 50% of this code needs to be removed.

@sam6258
Copy link
Contributor Author

sam6258 commented Aug 20, 2019

I think I got what you meant. You mean the portion at the end of the generated string where typically we have something of form 0(4) should be of form 0,1,2,3, correct?

@rhc54
Copy link
Contributor

rhc54 commented Aug 20, 2019

Yes - just remove that code which checks for ranges and replace it with something like (excuse the lack of detail and syntax errors):

loop over orte_node_pool:
    opal_argv_append_nosize(&nodenames, nptr->name);
endloop

*regex = opal_argv_join(&nodenames, ',')

@jjhursey
Copy link
Member

I think the code can be further simplified. It is not tracking the hostname regex, but does use the structure. We should combine the creation of the hostlist from spaning the first two loops into just the first loop.

We want to preserve the vpid regex tracking from the first loop to the third loop so that that integer range is preserved. The integer range regx portion is easier to verify/debug and hasn't shown any issues.

@sam6258 One other feature that would be nice is to add a opal_output_verbose that would print the final output string upon request. I would add that to all of the regx component's _create command. That will help debugging this type of failure in the v4 and v3 series in the future.

Per testing. as far as I know, master did not have a problem with this type of hostfile.
Note also that this is a real hostfile for a test system, as odd as it may look. Instead of trying to patch the other regx components for these branches (which is a bit like whack-a-mole) if we had an option to just not compress the hostnames then we would be fine.

@sam6258
Copy link
Contributor Author

sam6258 commented Aug 20, 2019

@rhc54 Perhaps there is a better name for the component, rather than none? Like Josh said, we do want to preserve the vpid regex tracking.

@jjhursey I think I see what you mean with the first two loops. I'll make that change tomorrow morning.

@rhc54
Copy link
Contributor

rhc54 commented Aug 21, 2019

@rhc54 Perhaps there is a better name for the component, rather than none? Like Josh said, we do want to preserve the vpid regex tracking.

"vpidonly"?? I don't know, but "none" is clearly misleading.

Note that you will have a scaling issue here. The cmd line length is limited, and this will now put the full length of every hostname in the job on that cmd line.

@jjhursey
Copy link
Member

How about this:

  • Keep it the none component
  • Default to placing the vpids in a comma-separated list
  • Have an MCA parameter that optionally (if enabled) will place the vpids in the regex format.

This has the advantage that the base functionality is no compression - thus serving its goal of a basic component useful for isolating regex issues and providing a functional path forward for those users. The user can optionally turn on the vpid compression to test if it is that portion of the regex that is being problematic (good for debugging). If it is not problematic for the user then they can leave the MCA parameter on and gain a little savings in the vpid compression.

@rhc54 You are correct about the scaling issue, but not because of the command line. The plm_base_node_regex_threshold MCA parameter (799152e) is used to determine if the regex should be put on the command line or should be communicated. It's likely that the none regex will be longer than that threshold so will be communicated instead of placed on the command line. So the amount of data being communicated could be a problem if you used this at scale, but the goal of this component is to give a functional base path for those where the regex is failing them causing hangs.

I should note that we are debugging a different issue with the > plm_base_node_regex_threshold path. It seems to not be working correctly on the v4.x (and possibly 3.x) branch (regardless of this patch). On master we were able to reduce the value to 1 and it worked fine. On the v4.0.x branch I'm seeing it fail with "ORTE has lost communication with a remote daemon." when I set this value to 1 - even with 3 traditionally named hosts.

A couple of questions about this area of code though:

  • Is this function used to encode anything other than the daemon hostname/vpid for orted launch? Is it used to communicate anything about the job launch?
  • In PMIx we have a similar regex mechanism for the hostnames and process mappings (preg). So far we have not noticed any issue with that framework, but the code structure is similar so we will look at it next. The preg framework is used to encode/decode the hostlist and ppn mapping - so it probably only impacts the clients if they call PMIx_Resolve_peers or PMIx_Resolve_nodes. Is that correct?

@sam6258 sam6258 force-pushed the smiller_regx_none branch from 007b4ff to 9bfe061 Compare August 21, 2019 15:49
@jjhursey
Copy link
Member

We could call this component passthru if you all like that better than none.

@sam6258
Copy link
Contributor Author

sam6258 commented Aug 21, 2019

Updated with the following:

  • An mca variable to turn on/off vpid compression (default is off)
  • Simplified logic of first two loops into a single loop.
  • Added a verbose statement that prints the regex. This is useful because we can't get this normally without setting -mca plm_base_verbose, which set the daemons into a debug mode and causes failures after 128 nodes. Having this verbose print will allow us to print the regex at scale.

@rhc54
Copy link
Contributor

rhc54 commented Aug 21, 2019

How about this:

* Keep it the `none` component

* Default to placing the vpids in a comma-separated list

* Have an MCA parameter that optionally (if enabled) will place the vpids in the regex format

Sounds reasonable

You are correct about the scaling issue, but not because of the command line. The plm_base_node_regex_threshold MCA parameter (799152e) is used to determine if the regex should be put on the command line or should be communicated. It's likely that the none regex will be longer than that threshold so will be communicated instead of placed on the command line. So the amount of data being communicated could be a problem if you used this at scale, but the goal of this component is to give a functional base path for those where the regex is failing them causing hangs.

Ah, indeed - I had forgotten about that param.

Is this function used to encode anything other than the daemon hostname/vpid for orted launch? Is it used to communicate anything about the job launch?

I glanced at the code real quick and the answer appears to be "no" - it only gets used when the orted's are launched. If it cannot go on the cmd line, then it gets sent out in the launch message.

The preg framework is used to encode/decode the hostlist and ppn mapping - so it probably only impacts the clients if they call PMIx_Resolve_peers or PMIx_Resolve_nodes. Is that correct?

Afraid not - we use it to generate the proc location info that gets stored in the key-value store. You may not be encountering problems only because the software using PMIx isn't asking it for location info. Note that we only have a "fwd" component in PMIx - not sure what you folks are doing in that area (i.e., if you have your own off to the side). We need to devise a new API or workaround to eliminate regex from PMIx.

@sam6258
Copy link
Contributor Author

sam6258 commented Aug 22, 2019

I think this is ready for re-review.

orte/mca/regx/none/regx_none.c Outdated Show resolved Hide resolved
@sam6258 sam6258 force-pushed the smiller_regx_none branch from 9bfe061 to e6523aa Compare August 22, 2019 15:13
@jjhursey
Copy link
Member

Once this PR is approved, we will likely want to cherry-pick it to the v3.1.x series.

For the plm_base_node_regex_threshold issue I mentioned in the comments - we have tracked that to the routed/radix component. Hopefully, a fix will come shortly. That issue is only related to this PR in that it was found when doing scale testing. It is not helped/hurt by this PR.

@sam6258
Copy link
Contributor Author

sam6258 commented Aug 22, 2019

Looks like something funny going on with the CI: ‘EC2 (Open MPI EC2 Production Account) - Ubuntu 16.04 (i-0e0a452394d1f4c39)’ is offline

@sam6258
Copy link
Contributor Author

sam6258 commented Aug 22, 2019

bot:retest

Copy link
Member

@gpaulsen gpaulsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like the name "none" either, because it sounds strange when you say to use regex none mca component, you'd expect there wouldn't be any component loaded for that framework, but instead it's actually loading this (non-default) one.

My question is:
Is it possible to get similar behavior as this component (i.e. no compression) in master without this component?

@rhc54
Copy link
Contributor

rhc54 commented Aug 23, 2019

Another option occurred to me. Why don't you create a component that (a) sets plm_base_node_regex_threshold to one so we don't put anything on the cmd line and (b) compresses the input hostnames and vpids instead of generating regex's? This eliminates the regex problem while retaining scalability. Since the threshold ensures it doesn't go on the orted cmd line there is no problem with sending it to the backend where it can be "decompressed".

@gpaulsen There is no problem on master as we strictly compress, which means we don't have a regex problem there. In fact, there is no orte regex framework on OMPI master any more for just that reason.

@jjhursey
Copy link
Member

@rhc54 That is an interesting idea. We could make that a new component in this framework, and maybe advocate for it to be the default since it should work for all scenarios. Are the zlib compression changes in the v4.0.x branch or would we need to bring them over (it shouldn't be hard to bring them over, but I don't know if that is too 'big' of a change for this branch)?

I think we still would like to see a none component like this to help isolate the problem when debugging what might be a regex issue. So I think this PR is good.

@sam6258 mentioned maybe renaming the component to native. I think that accurately represents what's going on. That or keep it as none. I would be fine with either.

Copy link
Member

@jjhursey jjhursey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If folks want to rename the component (now is the time) then feel free to do so and we can all re-review.

@sam6258
Copy link
Contributor Author

sam6258 commented Aug 24, 2019

I think there might have be a typo in our communication @jjhursey , I actually meant naive. Since the simplest regex for a string of text is the actual string of text, the naive name seemed to fit.

@gpaulsen gpaulsen added this to the v4.0.2 milestone Aug 26, 2019
@gpaulsen
Copy link
Member

I Added WIP in case @sam6258 wanted to rename before we merge.

Signed-off-by: Scott Miller <scott.miller1@ibm.com>
@sam6258 sam6258 force-pushed the smiller_regx_none branch from e6523aa to 1b0cfdf Compare August 26, 2019 15:58
@sam6258 sam6258 changed the title v4.0.x: regx/none: add regx/none component v4.0.x: regx/naive: add regx/naive component Aug 26, 2019
@hppritcha
Copy link
Member

hppritcha commented Aug 26, 2019

This PR was discussed at our RM meeting and we decided that the recommendation to add a zlib component that does compress/decompress would be a better solution. See @rhc54's comments about this approach. RMs also discussed changing the default regx component to use and came to the conclusion that it would be dangerous to switch default regx component within a release stream.

@hppritcha hppritcha closed this Aug 26, 2019
@hppritcha
Copy link
Member

reopening.

@hppritcha hppritcha reopened this Aug 27, 2019
@jsquyres
Copy link
Member

Discussed on the 2019-08-27 webex: @jjhursey makes a good point that this is an excellent debugging / lowest-common-denominator plugin. There's also nothing that says that we can't do both this one and a zlib component (e.g., in a future PR).

I.e., let's re-open / merge this PR. It can be an open feature request to have a regex/zlib RP for some future v4.0.x release.

@sam6258
Copy link
Contributor Author

sam6258 commented Aug 27, 2019

@jsquyres Sounds good to me. I've started work on a compress component already, so I'll work to carry it to completion.

@jsquyres
Copy link
Member

@sam6258 Thanks! Mind if you call it zlib? It's a little more explicit that way.

Also, could you file a feature request issue for target:v4.0.x, just so that we can track this? It won't block the release of v4.0.2, but it'll get in the v4.0.x train whenever it gets in.

@gpaulsen
Copy link
Member

@hppritcha okay to merge, when you are.

@hppritcha hppritcha merged commit 989461f into open-mpi:v4.0.x Aug 30, 2019
@sam6258 sam6258 deleted the smiller_regx_none branch August 31, 2019 02:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants