Keeping job info in the dstor #144
Any progress on this? As far as LANL is concerned this bug is a blocker on Open MPI 2.1.0. On KNL with 272 ranks per node the wasted space is ~272 * nodes * 0xaa0! I can't scale to even 1/8th of the machine without an OOM.
I sent @karasevb an email this morning asking for an update.
@jjhursey Thanks Josh! Hopefully this gets fixed soon. With Open MPI master I currently see a net increase in node memory usage with the dstore enabled. Will test again once the fix is ready.
@karasevb Once this is complete it might be worth looking at compressing strings stored in the dstore if they go over a certain length. The quoted value (the local cpus string enumerating all 272 local ranks) would compress very nicely even just using libz's deflate function.
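For a rough sense of the win, here is a minimal standalone sketch (not the PMIx buffer_ops code; the 272-rank string is just an illustrative stand-in for the value discussed above) showing how much zlib's deflate-family API shrinks such a value:

```c
/* Minimal sketch: compress a long comma-separated rank string with zlib
 * before storing it. Illustration only, not the actual PMIx code. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <zlib.h>

int main(void)
{
    /* build "0,1,2,...,271" as a stand-in for the long dstore value */
    char src[2048] = "";
    for (int r = 0; r < 272; r++) {
        char tmp[8];
        snprintf(tmp, sizeof(tmp), r ? ",%d" : "%d", r);
        strcat(src, tmp);
    }
    uLong srclen = (uLong)strlen(src) + 1;   /* include the NUL */

    /* compressBound() gives the worst-case output size for compress2() */
    uLongf dstlen = compressBound(srclen);
    Bytef *dst = malloc(dstlen);
    if (NULL == dst ||
        Z_OK != compress2(dst, &dstlen, (const Bytef *)src, srclen,
                          Z_BEST_SPEED)) {
        free(dst);
        return 1;
    }
    printf("original %lu bytes -> compressed %lu bytes\n",
           (unsigned long)srclen, (unsigned long)dstlen);
    free(dst);
    return 0;
}
```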
Hmm, I see you just pack the data. We could kill two birds with one stone (storage space AND network usage) by compressing the string in the buffer ops.
I think a regex might be more appropriate and actually use less space - in this case, the regex generator we already have would have made it as N:0-271, where N is the number of replications. I'd need to look in ORTE at how that is generated as that value doesn't look right to me. The local cpus should only be the local ranks on this node.
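As a simplified illustration of the range idea (this is not ORTE's actual regex generator, just the underlying technique), consecutive ranks collapse into a single first-last pair:

```c
/* Collapse runs of consecutive ranks into "first-last" instead of
 * listing every value. Illustration only. */
#include <stdio.h>

static void print_ranges(const int *ranks, int n)
{
    int i = 0;
    while (i < n) {
        int start = ranks[i];
        int end = start;
        /* extend the run while the next rank is consecutive */
        while (i + 1 < n && ranks[i + 1] == end + 1) {
            end = ranks[++i];
        }
        if (start == end) {
            printf("%d", start);
        } else {
            printf("%d-%d", start, end);
        }
        i++;
        if (i < n) printf(",");
    }
    printf("\n");
}

int main(void)
{
    int ranks[272];
    for (int r = 0; r < 272; r++) ranks[r] = r;
    print_ranges(ranks, 272);   /* prints "0-271" */
    return 0;
}
```

The 272-entry list shrinks to the five-character string 0-271, and the read side needs no decompression step.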
@rhc54 Yeah, that would work as well :)
BTW, lpeers probably needs to be fixed as well.
Yeah, no surprise at that.
Compression would still be helpful for strings that can't be fixed. It was trivial to add to buffer_ops.
Agreed - my concern is only that we look at launch time as well as footprint, as the two are often a tradeoff. Also, we need to be a little careful about what users expect to be handed, and how it is accessed - e.g., we may need to add a flag to indicate "this data has been compressed" so we uncompress it before handing it back.
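A hypothetical shape for that flag (not PMIx's actual buffer_ops layout; blob_to_string, BLOB_RAW, and BLOB_COMPRESSED are made-up names for illustration) could be a single leading byte that the unpack side checks before deciding whether to inflate:

```c
/* Hypothetical wire format: one flag byte in front of the stored payload
 * tells the unpack side whether it must inflate before handing the
 * string back to the user. */
#include <stdlib.h>
#include <string.h>
#include <zlib.h>

#define BLOB_RAW        0x00
#define BLOB_COMPRESSED 0x01

/* Returns a malloc'd NUL-terminated string, or NULL on error.
 * 'rawlen' is the original (uncompressed) length recorded at pack time. */
static char *blob_to_string(const unsigned char *blob, size_t bloblen,
                            size_t rawlen)
{
    char *out;

    if (bloblen < 1) {
        return NULL;
    }

    if (BLOB_RAW == blob[0]) {
        /* payload is stored verbatim, NUL terminator included */
        out = malloc(bloblen - 1);
        if (NULL != out) {
            memcpy(out, blob + 1, bloblen - 1);
        }
        return out;
    }

    /* compressed: inflate back to the original length recorded at pack time */
    uLongf outlen = (uLongf)rawlen;
    out = malloc(rawlen);
    if (NULL != out &&
        Z_OK != uncompress((Bytef *)out, &outlen, blob + 1,
                           (uLong)(bloblen - 1))) {
        free(out);
        out = NULL;
    }
    return out;
}
```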
I think compression is an orthogonal solution here; let's not mix them. Hopefully we will have this part ready for testing next week.
@artpol84 Agreed. Just throwing it out there as we need to get the memory footprint down as much as possible.
Maybe we can open another issue to track the compression of values? Then we can continue the conversation/development there.
@jjhursey Sure. Will open that now.
@jjhursey Final preparations for the PR are underway; it will be presented today.
@jjhursey Sorry, I still need to fix some of the problems; it will take some time.
I re-evaluated the memory footprint as a follow-up to #129.

(c) Before "keeping job info in the dstor"
(d) After "keeping job info in the dstor"

The environment and conditions of the evaluation are the same as in #129. The graph shows the memory footprint per node (orted + 16 clients + shared memory). The memory footprint of the PMIx client processes (between the red line and the blue line in the graph) is greatly improved. Thank you for your great work!
@karasevb Well done!!
According to recent investigation:
#129 (comment)
job info is not going into the dstore.
We need to make sure it ends up there.