mca_base_param_files option is no longer read and should be removed #11532
Comments
Thanks! I was just struggling with the same issue. The reason still eludes me. |
Please see the explanation in #11459 |
We should probably remove the code and have it not show up in ompi_info considering it's not being used from mpirun |
Please see #11459 (comment) |
We decided during the weekly telecon that our course of action here is to have a hard abort if the user attempts to use this parameter. |
Created an issue in PRRTE - openpmix/prrte#1731 |
https://github.com/open-mpi/ompi/wiki/WeeklyTelcon_20230425 Summary of the issues ^ |
Sure, who do you want me to assign it to? |
Thanks Jeff. Could you assign it to @qkoziol for now? |
After extensive exploration of the issues, PRs, wiki notes, and code for this issue, I believe that the final task at this point is this action item from the (above) wiki page notes:
I'm going to look at that piece of code and see what it's actually doing currently and how to effect the change desired. |
Update from @rhc54 about this not being the correct approach:
|
I've asked Ralph for more details (about what section of code he believes is involved) and am also investigating in parallel. |
Jeff chatted with Ralph earlier today. Ralph is looking at whether this is already addressed by changes in PRRTE. If it is, the fix may need to be backported to the branch of PRRTE that OMPI is using, and the submodule pointer updated. |
So this is pretty much complete, minus my comment above. Specifically, here is how this works: For the case of
This has all been implemented and is in the v4.2.4 tag. Note that I don't currently worry about PRRTE params - imagine this is something I will add at some point. One can specify multiple files in the "param_files" value, so you can combine a file of PMIx-specific values with one from OMPI. For direct launch (e.g., srun): If you care about that, then you would need to check each entry to see if it references a PMIx framework/component. You could copy the code in PRRTE's Feel free to test this and report problems. From my perspective, you can close this issue - or you can leave it open if someone wants to implement the direct launch support. |
Actually, I take that back about direct launch. In that scenario, both OPAL and PMIx are going to read the param files, each extracting what it needs. So there is nothing further to do here. Sadly, it means each proc is opening/reading these files twice - I suppose someone can optimize that if they care. |
Sorry for the noise, but returning to direct launch. The only way to pass param file options in this situation is by envar. This means that the user must set an appropriate Of course, the user could provide both OMPI and PMIx values - and we currently handle that just fine, whether they point to the same or different files. If someone wants to process the OMPI file and handle any PMIx-related values in it, feel free. Another option would be to replicate the OMPI envar with a PMIx prefix and push it into the environment for PMIx to pick up and process. I'm not sure it is worth it myself, but up to you folks. |
Thanks for the detailed analysis @rhc54 ! I'll bring it up at the next OMPI developer meeting to see if the PMIx value aspect should be addressed. |
We discussed this at last week's OMPI developer meeting. The outcome of that discussion was that OMPI should replicate the appropriate envvar(s) with a PMIx prefix and put those back into the environment for PMIx to process. |
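For anyone following along, here is a minimal sketch of what "replicate the envvar with a PMIx prefix" could look like. It is an illustration under stated assumptions, not the code that was merged: `replicate_param` is a hypothetical helper, and it assumes the PMIx side accepts the same parameter name under the `PMIX_MCA_` prefix. It leaves any value the user already set for PMIx untouched.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L /* for setenv() */
#endif

#include <stdio.h>
#include <stdlib.h>

/* Assumption: OMPI params arrive as OMPI_MCA_<name>, and PMIx would
 * accept the same <name> under PMIX_MCA_. */
#define SRC_PREFIX "OMPI_MCA_"
#define DST_PREFIX "PMIX_MCA_"

/*
 * Hypothetical helper: copy the value of OMPI_MCA_<param> into
 * PMIX_MCA_<param> so PMIx can pick it up.
 * Returns 1 if the value was replicated, 0 if there was nothing to do
 * (source unset, or the PMIx variable already set), -1 on error.
 */
static int replicate_param(const char *param)
{
    char src[256];
    char dst[256];
    const char *value;
    int n;

    n = snprintf(src, sizeof(src), "%s%s", SRC_PREFIX, param);
    if (n < 0 || (size_t) n >= sizeof(src)) {
        return -1; /* name too long for the buffer */
    }
    n = snprintf(dst, sizeof(dst), "%s%s", DST_PREFIX, param);
    if (n < 0 || (size_t) n >= sizeof(dst)) {
        return -1;
    }

    value = getenv(src);
    if (NULL == value) {
        return 0; /* user did not set the OMPI variable */
    }
    if (NULL != getenv(dst)) {
        return 0; /* respect an explicit PMIx setting */
    }
    if (0 != setenv(dst, value, 0)) {
        return -1;
    }
    return 1;
}
```

A caller would invoke something like `replicate_param("mca_base_param_files")` early in startup, before PMIx is initialized, so the replicated variable is already in the environment when PMIx reads it.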
I'm not sure I totally grok the comment, so let me just try to see if I can clarify a bit. My prior suggestion only dealt with the What will not be handled are any translations or overlaps of params in those files. For example, an ORTE oob param specifying the network to use for runtime TCP connections will be ignored as the PMIx param has a different name. This may (likely will?) lead to unexpected behaviors when coming from prior OMPI versions. Again, only pertains to direct launch. There are functions in PRRTE for detecting and dealing with these translations and overlaps if you want to copy them. Unfortunately, we cannot just move them to PMIx as it would have to be executed in the pmdl/ompi component, and that gets opened and called too late to affect many of the more common MCA params. |
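To make the translation problem concrete, here is a rough sketch of a lookup table that maps a legacy parameter name to its replacement. This is not the PRRTE code referred to above. `translate_param` and the table are hypothetical, and the right-hand names are placeholders I made up for illustration; the real mappings would have to come from the PRRTE schizo code.

```c
#include <stddef.h>
#include <string.h>

/* One legacy-to-current name mapping. */
struct param_map {
    const char *legacy;
    const char *current;
};

/* Placeholder entries: the "current" names are invented for this
 * sketch and are NOT real PMIx or PRRTE parameter names. */
static const struct param_map legacy_map[] = {
    { "oob_tcp_if_include", "example_runtime_if_include" },
    { "oob_tcp_if_exclude", "example_runtime_if_exclude" },
};

/*
 * Hypothetical helper: return the replacement name for a legacy
 * parameter, or NULL if the name has no entry in the table and
 * should be passed through unchanged.
 */
static const char *translate_param(const char *name)
{
    size_t i;
    size_t count = sizeof(legacy_map) / sizeof(legacy_map[0]);

    for (i = 0; i < count; i++) {
        if (0 == strcmp(name, legacy_map[i].legacy)) {
            return legacy_map[i].current;
        }
    }
    return NULL;
}
```

A linear scan is fine here because the table is small and the lookup only runs once per parameter at startup.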
I'm recording my own notes here, just because I only pop in and out of PRTE/PMIx issues periodically, and the information does not stay resident in my brain cache. This is what I discovered via code diving and talking to @rhc54.
|
…for direct launches Borrow code from the OMPI schizo module in PRRTE that translates legacy MCA parameters when an application is direct launched (PRRTE will translate legacy parameters when natively launched). Signed-off-by: Quincey Koziol <qkoziol@amazon.com>
PRs to OMPI, PRRTE, and OpenPMIX to address the direct launch case: |
Dropped the PMIX PR, and updated the other two, based on review feedback. Please re-review. |
Address Github issue #11532 by translating legacy parameters for direct launches
OMPI PR merged to main; it still needs to be merged to the 5.x branch. |
…for direct launches Borrow code from the OMPI schizo module in PRRTE that translates legacy MCA parameters when an application is direct launched (PRRTE will translate legacy parameters when natively launched). Signed-off-by: Quincey Koziol <qkoziol@amazon.com> (cherry picked from commit 5d236e9) Signed-off-by: Quincey Koziol <qkoziol@amazon.com>
PR for merge to 5.x branch: #11916 |
New new PR for bringing this to the v5.0.x branch: #11923 TIL: Clicking "sync branch" on Github will close any PRs open on that branch, likely due to the branch discarding those commits |
Addressed with #11923 |
ompi_info and the code show an mca parameter as:
However, when I actually provide this parameter, it doesn't seem to be getting picked up.