Large suites: lightweight task proxies. #1689
Eliminate duplication of runtime settings? Default settings are now shared by task proxies, by means of intervening in the dict look-up mechanism: if a requested item doesn't exist, look it up in the shared defaults dict (#1500). Inherited settings could also be shared, not duplicated across the inheriting tasks (I think we're still doing that...). Could this be done like the defaults, or is storing references to the same data enough? However, the minimum size with all possible sharing of data is still potentially very large: it is the amount of information under |
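For illustration, the "look it up in the shared defaults dict" idea can be sketched in a few lines of Python. This is a minimal sketch only, not the code from #1500: the class name `SharedDefaultsDict` and the example setting names are hypothetical, and it handles a flat dict rather than nested sections.

```python
class SharedDefaultsDict(dict):
    """Dict that falls back to one shared defaults dict for missing keys.

    Hypothetical illustration; not the actual implementation from #1500.
    """

    def __init__(self, defaults, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Keep a reference to the shared defaults, never a copy.
        self._defaults = defaults

    def __missing__(self, key):
        # dict.__getitem__ calls this only when the key is absent locally.
        return self._defaults[key]


# One defaults dict shared by every task proxy (example values are made up).
shared_defaults = {"script": "true", "retry delays": []}

task_a = SharedDefaultsDict(shared_defaults, {"script": "run-model"})
task_b = SharedDefaultsDict(shared_defaults)

local_value = task_a["script"]      # found in the task's own settings
fallback_value = task_b["script"]   # falls through to the shared defaults
own_item_count = len(task_b)        # the task itself stores nothing
```

Note that `__missing__` only affects `d[key]` look-ups; `d.get(key)` and `key in d` still see just the task's own items, so a real implementation would have to decide how those should behave.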
Load runtime settings only at job submit time? This is clearly the right thing to do given that (a) the minimum size of all task runtimes is potentially very large; and (b) the runtime settings are not needed by the suite daemon for any purpose other than job submission. Choices:
Therefore, by a process of unassailable logic, I propose that we implement 1. above. 😬 Possible caveat in the next comment below. (Note that #1428, which in effect dropped all task runtime info immediately after job submission, demonstrated a factor-of-six reduction in memory use for a large suite with a lot of runahead; however, the implementation there had some negative consequences that this proposal does not have, e.g. on monitoring and the ability to re-trigger tasks that have already finished. It may be that optimal sharing of task runtime data could also have achieved a big reduction here.) |
Possible caveat: can the task runtime conf files be generated incrementally at start-up without loading the entire runtime configuration - without defaults - into memory first, for inheritance processing? If not, is brief high memory use at start-up that much better than ongoing high memory use? The disk-based solution would still be much simpler (no need to bother with the extra complexity of ensuring optimal sharing of all settings). |
(this is in fact rather an old idea: #108 (comment)) |
With your proposed solution 1, are we going to end up with a new small file per task/job? The only concern is that it increases inode usage on the file system (and some file systems are very unfriendly to lots of small files), but maybe it does not matter. (On a similar note, but unrelated to this issue, perhaps we should move the |
No, it'll just be one new small file for each task |
Sounds like a good compromise. |
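As a rough sketch of the one-small-file-per-task idea discussed above: write each task's runtime settings to disk once, then have a slim task proxy read them back only when it submits a job. Everything here is an assumption made for illustration (JSON as the file format, the `<task>.json` naming, the `LightTaskProxy` class and the helper functions); the thread does not specify any of these details.

```python
import json
import tempfile
from pathlib import Path


def write_task_runtime(conf_dir, task_name, runtime):
    """Write one small runtime file for a task (hypothetical layout)."""
    path = Path(conf_dir) / f"{task_name}.json"
    path.write_text(json.dumps(runtime))
    return path


def load_task_runtime(conf_dir, task_name):
    """Read a task's runtime settings back from disk."""
    return json.loads((Path(conf_dir) / f"{task_name}.json").read_text())


class LightTaskProxy:
    """Task proxy that holds no runtime settings between submissions."""

    # __slots__ avoids a per-instance __dict__, keeping each proxy small.
    __slots__ = ("name", "conf_dir")

    def __init__(self, name, conf_dir):
        self.name = name
        self.conf_dir = conf_dir

    def submit(self):
        # Load settings only for the duration of job submission.
        runtime = load_task_runtime(self.conf_dir, self.name)
        # The local 'runtime' goes out of scope when this method returns.
        return f"{self.name}: {runtime['script']}"


conf_dir = tempfile.mkdtemp()
write_task_runtime(conf_dir, "foo", {"script": "echo hello"})
proxy = LightTaskProxy("foo", conf_dir)
job_line = proxy.submit()
```

The proxy keeps only a name and a directory reference, so its size no longer depends on how much runtime configuration the task has.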
To get the full benefit of reduced task proxy size, we need to avoid the initial memory high water mark caused by parsing the suite in its entirety (which includes all the task proxy runtime info). Even when this data is garbage collected after writing the new task proxy runtime config files, the memory may not be returned to the OS by the Python interpreter (although it will be re-used internally), i.e. the external "resident memory size" of the suite daemon may not go down after suite parsing. So, we have agreed on the following (via email): low-memory suite parsing design (noting that when a process finishes, all of its memory is released to the OS).
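The agreed design itself is not reproduced in this thread, so the following is only a sketch of the principle noted in parentheses: do the memory-hungry parse in a short-lived child process, let it write the per-task files, and pass just a small summary back to the long-running daemon. The child script, the JSON file layout, and the function name `parse_in_child` are all hypothetical.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path

# Stand-in for the expensive full-suite parse, run in a separate interpreter.
CHILD_SCRIPT = r"""
import json, sys
from pathlib import Path

out_dir = Path(sys.argv[1])
# Hypothetical fully expanded runtime config (very large in a real suite).
runtime = {f"task_{i}": {"script": f"run {i}"} for i in range(3)}
for name, settings in runtime.items():
    (out_dir / f"{name}.json").write_text(json.dumps(settings))
# Report only the task names back to the parent process.
print(json.dumps(sorted(runtime)))
"""


def parse_in_child(out_dir):
    """Run the parse in a child process and return only the task names."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SCRIPT, str(out_dir)],
        capture_output=True,
        text=True,
        check=True,
    )
    # When the child exits, all of its memory goes back to the OS.
    return json.loads(result.stdout)


out_dir = tempfile.mkdtemp()
task_names = parse_in_child(out_dir)
```

The parent process never holds the full runtime configuration, so its resident memory size is not inflated by the parse.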
|
[meeting]
|
#3515 (spawn on demand) does not address this issue, but it will reduce the task pool size so much that this issue may no longer be relevant. Before closing it, though, should we consider the ideas above for reducing the memory footprint of parsing the suite at start-up? |
A major limiting factor for very large suites is memory use by task proxies. This issue is about minimizing the size of task proxies (the total number of them matters too, but less so if they become very small).
Task proxies have two distinct purposes:
Clearly we need to either eliminate all duplication of runtime settings that could be shared by task proxies or only load runtime settings when needed at job submit time, then immediately forget them again.