Experimental planner PR #2: twindow replacement #233
Conversation
Codecov Report
@@            Coverage Diff             @@
##           master     #233      +/-   ##
==========================================
+ Coverage   55.05%   56.65%    +1.59%
==========================================
  Files          25       30        +5
  Lines        5184     6019      +835
  Branches     1168     1389      +221
==========================================
+ Hits         2854     3410      +556
- Misses       1626     1811      +185
- Partials      704      798       +94

Continue to review full report at Codecov.
@lipari: I spent some time today to map out a few initial steps for scheduler-driven aggregate updates. Here is what I sketched -- so that I can get your feedback better if we have a chance to talk next week.
- resrc: Start simple and place the planner-based aggregate structure only on a small set of resource types.
- resrc_reqst: Instead of computing aggregate requirements of the resrc_reqst by walking its hierarchy at each match-check time, compute node/core aggregate requirements upfront and stick them into the resrc_reqst at certain levels (see the pruning sketch after this list).
- match_resource: Augment the insertion point at which the scheduler updates the aggregate planners right after a job is successfully scheduled. I still need to do some hacking to find the appropriate place; I initially envisioned doing this at job-tagging time, but I didn't have a chance to look at the code today.
- Performance testers and target
- Correctness: I will certainly utilize existing tests and make sure all tests pass. But test cases to validate/verify the correctness of our scheduling algorithms are still somewhat limited, and it feels like we need some solutions there.
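To make the pruning idea concrete, here is a minimal sketch under stated assumptions: the tracked composite vertex carries an aggregate planner for cores, the request carries a core total computed once upfront, and planner_avail_during returns 0 when a span can be satisfied. All names are hypothetical illustrations, not the actual resrc/resrc_reqst code.

/* Hypothetical pruning check for the match walk.  The vertex and
 * request structs, and the planner call semantics, are assumptions
 * modeled on the plan above. */
#include <stdbool.h>
#include <stdint.h>

#include "planner.h"                 /* assumed header name */

struct vertex_sketch {
    planner_t *core_aggregate;       /* aggregate planner at this composite vertex */
};

struct reqst_sketch {
    uint64_t total_cores;            /* computed once upfront, not per match check */
    int64_t starttime;
    uint64_t duration;
};

/* Return false if the subtree under v cannot possibly satisfy the
 * request, so the scheduler can skip descending into it. */
static bool worth_descending (struct vertex_sketch *v, struct reqst_sketch *r)
{
    return planner_avail_during (v->core_aggregate, r->starttime,
                                 r->duration, r->total_cores) == 0;
}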
@dongahn, the resource matching implementation that currently exists is a brute-force approach. Its advantage is that it is very versatile in the complex resource requests it can match. The drawback is the performance penalty of visiting each child resource individually.
@lipari: Thanks. Yes, this is a great idea for debugging and verification!
Pull in the Linux red-black tree and interval tree adapted for user-space use. The original repo is at https://github.com/markfasheh/interval-tree
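For context, the interval tree is presumably what lets planner look up recorded time spans that overlap a query window. A minimal sketch of such a lookup, assuming the user-space port keeps the kernel-style interval tree interface (struct interval_tree_node over struct rb_root, interval_tree_insert, interval_tree_iter_first/next) and header name:

/* Sketch only: interface and header name are assumptions carried over
 * from the kernel's interval tree; they may differ in the port. */
#include <stdio.h>

#include "interval_tree.h"   /* assumed header name */

/* Print every stored span that overlaps [start, last]. */
static void print_overlaps (struct rb_root *root,
                            unsigned long start, unsigned long last)
{
    struct interval_tree_node *n;

    for (n = interval_tree_iter_first (root, start, last); n;
         n = interval_tree_iter_next (n, start, last))
        printf ("overlap: [%lu, %lu]\n", n->start, n->last);
}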
Add the planner class: a simple API and efficient mechanisms to allow a Flux scheduler to keep track of the state of resource aggregates of a composite resource.

[From its header file] Planner provides a simple API and efficient mechanisms to allow a Flux scheduler to keep track of the state of resource aggregates of a composite resource. In a resource hierarchy used by flux-sched (e.g., a hardware hierarchy), a composite resource is represented as a tree graph in which a higher-level vertex essentially has pointers to its immediate child resources, each of which also has pointers to its immediate children, and so on. With such an organization, the scheduler must essentially walk "all" of the vertices below any composite resource in order to determine whether the "sub-resources" requirement can be met. When the scheduler performs such walks excessively, in particular on a large graph, this can quickly become a performance and scalability bottleneck.

Planner addresses this problem by allowing the scheduler to track the "sub-resources" summary information (i.e., aggregates) efficiently at each upper-level composite resource vertex and to use this aggregate information to prune unnecessary descents into the subtree. Planner offers update and query APIs to support these schemes. Through a planner API, the scheduler can ask a high-level composite a question: "given a request of x, y, z sub-resources in aggregate for d time units, when is the earliest time t at which this request can be satisfied?" Another example would be to answer, "from time t to t+d, does this composite resource vertex have y, z sub-resources available in aggregate?" By composing these queries at different levels in a resource hierarchy, the scheduler can significantly reduce the number of tree walks. Ultimately, planner will be integrated into our preorder tree-walk pruning filter in our future visitor-pattern-based resource matching scheme.
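As a rough illustration of the two queries described above, here is a small sketch. The calls (planner_new, planner_avail_time_first, planner_add_span, planner_avail_during, planner_destroy) and their signatures are assumptions modeled on this description, not necessarily the exact API added in this PR; assume a 0 return from planner_avail_during means the request can be satisfied.

/* Illustrative planner usage sketch; names and signatures are assumptions. */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#include "planner.h"   /* assumed header name */

int main (void)
{
    /* Track one aggregate ("core") under a cluster vertex: 1024 cores
     * over a one-year planning horizon starting at t = 0. */
    planner_t *p = planner_new (0, 3600LL * 24 * 365, 1024, "core");

    /* Query 1: earliest time at which 512 cores are free for 3600 s. */
    int64_t t = planner_avail_time_first (p, 0, 3600, 512);
    printf ("earliest start: %" PRId64 "\n", t);

    /* Record the reservation so later queries see the reduced aggregate. */
    planner_add_span (p, t, 3600, 512);

    /* Query 2: from t to t+3600, are 600 cores available in aggregate? */
    if (planner_avail_during (p, t, 3600, 600) == 0)
        printf ("600 cores available\n");
    else
        printf ("600 cores not available in [t, t+3600)\n");

    planner_destroy (&p);
    return 0;
}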
Added roughly 100 tests.
Replace resrc's twindow using the planner-based API. Two things to note: 1. Because of an API mismatch between resrc's "avail" functions and planner's, this commit changes the signature of the resrc "avail" functions a bit and adjusts the resrc test cases. 2. Because there is not yet a use case for resrc's copy constructor, this commit doesn't make a deep copy of the planner-based twindow member. The exact semantics of the copy constructor need to be discussed.
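A hypothetical sketch of the shape of this change: instead of scanning the twindow list linearly, an availability check can delegate to a per-resource planner. The struct and function names here are illustrative stand-ins, not the real resrc code, and the 0-means-satisfiable return of planner_avail_during is an assumption.

/* Hypothetical: "resrc_available_during" only illustrates delegating the
 * time-window check to a planner kept on each resource. */
#include <stdbool.h>
#include <stdint.h>

#include "planner.h"   /* assumed header name */

struct resrc_sketch {
    planner_t *twindow_plan;   /* planner replacing the twindow list */
};

static bool resrc_available_during (struct resrc_sketch *r,
                                    int64_t starttime, uint64_t duration)
{
    /* One unit of this resource requested over
     * [starttime, starttime + duration). */
    return planner_avail_during (r->twindow_plan, starttime, duration, 1) == 0;
}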
Add get_resrc_reqst and move the logic that fills resrc_reqst into this function. This enhances the readability of schedule_job, which has grown somewhat large. More importantly, it prepares us to encode resource aggregate info without having to modify the main schedule_job function.
This second experimental PR is strictly for @lipari. I replaced the linear twindow search with the planner API, which should be more scalable.
I confirmed that all of the unit and integration tests pass with this commit. But I will ultimately want to run a larger emulation (e.g., borrowing from @SteVwonder's work) and make sure the planner-based scheduler produces schedules identical to those of the original twindow implementation.
As you and I discussed this morning, I decided to continue with this branch and add an initial version of scheduler-driven updates on top of it before requesting an official PR review. This is a bit of a plan change, as I initially planned to fork off a new matcher/visitor branch and implement the first version of the aggregate work there.
In any case, since I will have to work on other projects/tasks for a while, this is to inform you of my progress.