Experimental planner PR #2: twindow replacement #233
Conversation
Codecov Report
@@            Coverage Diff             @@
##           master     #233      +/-   ##
==========================================
+ Coverage   55.05%   56.65%    +1.59%
==========================================
  Files          25       30        +5
  Lines        5184     6019      +835
  Branches     1168     1389      +221
==========================================
+ Hits         2854     3410      +556
- Misses       1626     1811      +185
- Partials      704      798       +94

Continue to review full report at Codecov.
@lipari: I spent some time today to map out a few initial steps for scheduler-driven aggregate updates. Here is what I sketched -- so that I can get your feedback better if we have a chance to talk next week.
- resrc: Start simple and place the planner-based aggregate structure only on a small set of resource types.
- resrc_reqst: Instead of computing aggregate requirements of the resrc_reqst by walking its hierarchy at each match-check time, compute node/core aggregate requirements upfront and stick them into the resrc_reqst at certain levels (see the pruning sketch after this list).
- match_resource: Augment the insertion point at which the scheduler updates the aggregate planners right after a job is successfully scheduled. I still need to do some hacking to find the appropriate place; I initially envisioned doing this at job-tagging time, but I didn't have a chance to look at the code today.
- Performance testers and target
- Correctness: I will certainly utilize existing tests and make sure all tests pass. But test cases to validate/verify the correctness of our scheduling algorithms are still somewhat limited, and it feels like we need some solutions there.
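To make the pruning idea concrete, here is a minimal sketch under stated assumptions: the tracked composite vertex carries an aggregate planner for cores, the request carries a core total computed once upfront, and planner_avail_during returns 0 when a span can be satisfied. All names are hypothetical illustrations, not the actual resrc/resrc_reqst code.

/* Hypothetical pruning check for the match walk.  The vertex and
 * request structs, and the planner call semantics, are assumptions
 * modeled on the plan above. */
#include <stdbool.h>
#include <stdint.h>

#include "planner.h"                 /* assumed header name */

struct vertex_sketch {
    planner_t *core_aggregate;       /* aggregate planner at this composite vertex */
};

struct reqst_sketch {
    uint64_t total_cores;            /* computed once upfront, not per match check */
    int64_t starttime;
    uint64_t duration;
};

/* Return false if the subtree under v cannot possibly satisfy the
 * request, so the scheduler can skip descending into it. */
static bool worth_descending (struct vertex_sketch *v, struct reqst_sketch *r)
{
    return planner_avail_during (v->core_aggregate, r->starttime,
                                 r->duration, r->total_cores) == 0;
}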
@dongahn, the resource matching implementation that currently exists is a brute-force approach. Its advantage is that it is very versatile in the complex resource requests it can match. The drawback is the performance penalty of visiting each child resource individually.
@lipari: Thanks. Yes, this is a great idea for debugging and verification!
Pull in the Linux red-black tree and interval tree adapted for user-space use. The original repo is at https://github.com/markfasheh/interval-tree
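For context, the interval tree is presumably what lets planner look up recorded time spans that overlap a query window. A minimal sketch of such a lookup, assuming the user-space port keeps the kernel-style interval tree interface (struct interval_tree_node over struct rb_root, interval_tree_insert, interval_tree_iter_first/next) and header name:

/* Sketch only: interface and header name are assumptions carried over
 * from the kernel's interval tree; they may differ in the port. */
#include <stdio.h>

#include "interval_tree.h"   /* assumed header name */

/* Print every stored span that overlaps [start, last]. */
static void print_overlaps (struct rb_root *root,
                            unsigned long start, unsigned long last)
{
    struct interval_tree_node *n;

    for (n = interval_tree_iter_first (root, start, last); n;
         n = interval_tree_iter_next (n, start, last))
        printf ("overlap: [%lu, %lu]\n", n->start, n->last);
}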
Add the planner class: a simple API and efficient mechanisms to allow a Flux scheduler to keep track of the state of resource aggregates of a composite resource.

[From its header file] Planner provides a simple API and efficient mechanisms to allow a Flux scheduler to keep track of the state of resource aggregates of a composite resource. In a resource hierarchy used by flux-sched (e.g., a hardware hierarchy), a composite resource is represented as a tree graph in which a higher-level vertex essentially has pointers to its immediate child resources, each of which also has pointers to its immediate children, and so on. With such an organization, the scheduler must essentially walk "all" of the vertices below any composite resource in order to determine whether the "sub-resources" requirement can be met. When the scheduler performs such walks excessively, in particular on a large graph, this can quickly become a performance and scalability bottleneck.

Planner addresses this problem by allowing the scheduler to track the "sub-resources" summary information (i.e., aggregates) efficiently at each upper-level composite resource vertex and to use this aggregate information to prune unnecessary descents into the subtree. Planner offers update and query APIs to support these schemes. Through a planner API, the scheduler can ask a high-level composite a question: "given a request of x, y, z sub-resources in aggregate for d time units, when is the earliest time t at which this request can be satisfied?" Another example would be to answer, "from time t to t+d, does this composite resource vertex have y, z sub-resources available in aggregate?" By composing these queries at different levels in a resource hierarchy, the scheduler can significantly reduce the number of tree walks. Ultimately, planner will be integrated into our preorder tree-walk pruning filter in our future visitor-pattern-based resource matching scheme.
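As a rough illustration of the two queries described above, here is a small sketch. The calls (planner_new, planner_avail_time_first, planner_add_span, planner_avail_during, planner_destroy) and their signatures are assumptions modeled on this description, not necessarily the exact API added in this PR; assume a 0 return from planner_avail_during means the request can be satisfied.

/* Illustrative planner usage sketch; names and signatures are assumptions. */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#include "planner.h"   /* assumed header name */

int main (void)
{
    /* Track one aggregate ("core") under a cluster vertex: 1024 cores
     * over a one-year planning horizon starting at t = 0. */
    planner_t *p = planner_new (0, 3600LL * 24 * 365, 1024, "core");

    /* Query 1: earliest time at which 512 cores are free for 3600 s. */
    int64_t t = planner_avail_time_first (p, 0, 3600, 512);
    printf ("earliest start: %" PRId64 "\n", t);

    /* Record the reservation so later queries see the reduced aggregate. */
    planner_add_span (p, t, 3600, 512);

    /* Query 2: from t to t+3600, are 600 cores available in aggregate? */
    if (planner_avail_during (p, t, 3600, 600) == 0)
        printf ("600 cores available\n");
    else
        printf ("600 cores not available in [t, t+3600)\n");

    planner_destroy (&p);
    return 0;
}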
Added roughly 100 tests.
Replace resrc's twindow using the planner-based API. Two things to note: 1. Because of an API mismatch between resrc's "avail" functions and planner's, this commit changes the signature of the resrc "avail" functions a bit and adjusts the resrc test cases. 2. Because there is not yet a use case for resrc's copy constructor, this commit doesn't make a deep copy of the planner-based twindow member. The exact semantics of the copy constructor need to be discussed.
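A hypothetical sketch of the shape of this change: instead of scanning the twindow list linearly, an availability check can delegate to a per-resource planner. The struct and function names here are illustrative stand-ins, not the real resrc code, and the 0-means-satisfiable return of planner_avail_during is an assumption.

/* Hypothetical: "resrc_available_during" only illustrates delegating the
 * time-window check to a planner kept on each resource. */
#include <stdbool.h>
#include <stdint.h>

#include "planner.h"   /* assumed header name */

struct resrc_sketch {
    planner_t *twindow_plan;   /* planner replacing the twindow list */
};

static bool resrc_available_during (struct resrc_sketch *r,
                                    int64_t starttime, uint64_t duration)
{
    /* One unit of this resource requested over
     * [starttime, starttime + duration). */
    return planner_avail_during (r->twindow_plan, starttime, duration, 1) == 0;
}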
Add get_resrc_reqst and move the logic that fills resrc_reqst into this function. This enhances the readability of schedule_job, which has grown somewhat large. More importantly, it prepares us to encode resource aggregate info without having to modify the main schedule_job function.
This second experimental PR is strictly for @lipari. I replaced the linear twindow search with the planner API, which should be more scalable.
I confirmed that all of the unit and integration tests pass with this commit. But I will ultimately want to run a larger emulation (e.g., borrowing from @SteVwonder's work) and make sure the planner-based scheduler produces schedules identical to those of the original twindow implementation.
As you and I discussed this morning, I decided to continue with this branch and add an initial version of scheduler-driven updates on top of it before requesting an official PR review. This is a bit of a plan change, as I initially planned to fork off a new matcher/visitor branch and implement the first version of the aggregate work there.
In any case, since I will have to work on other projects/tasks for a while, this is to inform you of my progress.