582 Add ability to model load values used by LB strategies #897

PhilMiller · 2020-06-26T19:33:50Z

Fixes: #582

PhilMiller · 2020-06-26T19:35:48Z

I think the key thing I want to push through with this is a relatively straightforward way for application code to interpose a different model, so that we can do as much experimentation as possible without crossing the streams of modifying and recompiling both vt and application code.

PhilMiller · 2020-06-26T19:39:22Z

A couple of models that it may make sense to bake into vt as generic facilities are ℓp norms with various powers p, and the max, each applicable over a configurable subset of subphases.
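To make the two candidates concrete, here is a minimal C++ sketch, assuming each object's load is available as a vector indexed by subphase. The function names `lpNormLoad` and `maxLoad` are hypothetical, not vt API. With p = 1 the norm is just the total over the selected subphases; larger p weights the heaviest subphase more, approaching the max.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: combine one object's per-subphase loads into a single
// value using the l_p norm, (sum_i |x_i|^p)^(1/p), over the selected subphases.
double lpNormLoad(std::vector<double> const& loads,
                  std::vector<std::size_t> const& subphases, double p) {
  assert(p >= 1.0);  // below 1 this is no longer a norm
  double sum = 0.0;
  for (std::size_t s : subphases) {
    sum += std::pow(std::abs(loads.at(s)), p);
  }
  return std::pow(sum, 1.0 / p);
}

// Hypothetical helper: the max over the selected subphases, i.e. the limit
// of the l_p norm as p grows without bound.
double maxLoad(std::vector<double> const& loads,
               std::vector<std::size_t> const& subphases) {
  double result = 0.0;
  for (std::size_t s : subphases) {
    result = std::max(result, std::abs(loads.at(s)));
  }
  return result;
}
```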

codecov · 2020-06-26T20:14:16Z

Codecov Report

Merging #897 into develop will increase coverage by 0.10%.
The diff coverage is 94.79%.

```diff
@@             Coverage Diff             @@
##           develop     #897      +/-   ##
===========================================
+ Coverage    82.79%   82.90%   +0.10%     
===========================================
  Files          356      361       +5     
  Lines        11232    11333     +101     
===========================================
+ Hits          9300     9396      +96     
- Misses        1932     1937       +5
```

| Impacted Files | Coverage Δ | |
|---|---|---|
| src/vt/vrt/collection/balance/elm_stats.impl.h | `92.68% <ø> (ø)` | |
| src/vt/vrt/collection/balance/lb_comm.h | `9.09% <0.00%> (-2.03%)` | ⬇️ |
| src/vt/vrt/collection/manager.h | `100.00% <ø> (ø)` | |
| src/vt/vrt/collection/manager.impl.h | `87.26% <ø> (ø)` | |
| src/vt/utils/stats/linear_regression.h | `95.83% <95.83%> (ø)` | |
| ...t/collection/test_model_per_collection.extended.cc | `96.07% <96.07%> (ø)` | |
| ...c/vt/vrt/collection/balance/lb_invoke/lb_manager.h | `60.00% <100.00%> (+2.85%)` | ⬆️ |
| ...c/vt/vrt/collection/balance/model/composed_model.h | `100.00% <100.00%> (ø)` | |
| src/vt/vrt/collection/balance/model/load_model.h | `100.00% <100.00%> (ø)` | |
| tests/unit/utils/test_linear_regression.cc | `100.00% <100.00%> (ø)` | |

... and 7 more

PhilMiller · 2020-07-06T16:26:31Z

I've overhauled the API to, I think, rationalize the division of responsibilities between user code and the runtime system, and to leave models and strategies relatively testable in isolation, without crazy inter-dependencies.

One thing that's niggling at me is the local vs. collective nature of the models. I think we need to define them collectively, and possibly ensure that at least some calls to them are made collectively as well.

PhilMiller · 2020-07-06T16:32:31Z

On the hypothesis that we may want to implement models that communicate to set up their predictions (e.g. to compute global covariance or other such statistics), I've put the update step in a separate global epoch, before LBManager calls the strategy that may depend on the model's output.
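A minimal sketch of why the update step is kept separate, assuming a hypothetical two-method interface (`update()` followed by `predict()`); neither name is taken from the PR. The model shrinks each object's measured load toward the mean over all objects. Here the mean is computed locally; in a distributed run it would need a reduction across ranks, so every rank has to finish `update()` before any strategy calls `predict()`.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

using ObjID = std::size_t;  // hypothetical stand-in for an element ID

// Hypothetical model interface: update() runs in its own step, before any
// strategy queries predict().
struct Model {
  virtual ~Model() = default;
  virtual void update(std::unordered_map<ObjID, double> const& measured) = 0;
  virtual double predict(ObjID obj) const = 0;
};

// Shrinks each object's load toward the mean over all objects. The mean is
// local here; a distributed version would obtain it from a reduction.
struct ShrinkToMean : Model {
  explicit ShrinkToMean(double weight) : weight_(weight) {}

  void update(std::unordered_map<ObjID, double> const& measured) override {
    loads_ = measured;
    double sum = 0.0;
    for (auto const& kv : measured) {
      sum += kv.second;
    }
    mean_ = measured.empty() ? 0.0 : sum / measured.size();
  }

  double predict(ObjID obj) const override {
    auto it = loads_.find(obj);
    assert(it != loads_.end());  // only objects seen in update() are known
    return (1.0 - weight_) * it->second + weight_ * mean_;
  }

private:
  double weight_;
  double mean_ = 0.0;
  std::unordered_map<ObjID, double> loads_;
};
```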

PhilMiller · 2020-07-06T18:35:34Z

One thing that's not really considered in the present design is composition across multiple collections, and collections coming from multiple parts of an application. In some respects, load modeling is a global concern that should be configured at the whole-application/job level. In others, part of the expected benefit of load modeling is being able to import knowledge specific to individual collections into the otherwise oblivious LB system. I don't have a clear idea of how to resolve that tension yet.

One possibility that I've been turning over in my head is whether load models should be compositional rather than one-off. For instance, de-noising might consider some history and provide an adjusted retrospective load, which would then make a more solid base for a prediction.
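A minimal sketch of that layering, assuming a hypothetical single-object interface where `load(ago)` returns the load for the phase `ago` steps back (0 is the most recent completed phase). The names `Raw`, `WindowMean`, and `Persistence` are illustrative, not the PR's classes; a windowed mean stands in for the de-noising step, and the predictor only ever sees the adjusted history.

```cpp
#include <cassert>
#include <deque>
#include <memory>
#include <utility>

// Hypothetical retrospective interface for a single object:
// load(0) is the most recent completed phase, load(1) the one before, ...
struct LoadModel {
  virtual ~LoadModel() = default;
  virtual double load(int ago) const = 0;
};

// Bottom layer: measured history, newest first.
struct Raw : LoadModel {
  std::deque<double> history;
  double load(int ago) const override { return history.at(ago); }
};

// De-noising layer: each retrospective value becomes the mean of `window`
// consecutive values from the layer below, starting at that phase.
struct WindowMean : LoadModel {
  WindowMean(std::shared_ptr<LoadModel> base, int window)
    : base_(std::move(base)), window_(window) {
    assert(window_ > 0);
  }
  double load(int ago) const override {
    double sum = 0.0;
    for (int i = 0; i < window_; ++i) {
      sum += base_->load(ago + i);
    }
    return sum / window_;
  }
private:
  std::shared_ptr<LoadModel> base_;
  int window_;
};

// Prediction layer: naive persistence, i.e. the next phase is assumed to
// look like the most recent one as reported by whatever sits underneath.
struct Persistence {
  explicit Persistence(std::shared_ptr<LoadModel> base) : base_(std::move(base)) {}
  double predictNext() const { return base_->load(0); }
private:
  std::shared_ptr<LoadModel> base_;
};
```

Because the predictor depends only on the interface, swapping the de-noiser or stacking another layer underneath requires no change to it.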

PhilMiller · 2020-07-07T18:26:16Z

Per discussion, possibly replace setFocusedSubphase() with a model that does the same.
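One way to read that suggestion, sketched under assumptions: rather than a global switch, a wrapper model reports only subphase `k` as an object's work, so any strategy asking for work sees that value. `SubphaseModel`, `Table`, and `FocusedSubphase` are hypothetical names for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical interface: per-object loads broken down by subphase, plus the
// scalar "work" a strategy would consume (by default the sum over subphases).
struct SubphaseModel {
  virtual ~SubphaseModel() = default;
  virtual std::vector<double> subphaseLoads(std::size_t obj) const = 0;
  virtual double work(std::size_t obj) const {
    auto loads = subphaseLoads(obj);
    return std::accumulate(loads.begin(), loads.end(), 0.0);
  }
};

// Test data: rows[obj][subphase].
struct Table : SubphaseModel {
  std::vector<std::vector<double>> rows;
  std::vector<double> subphaseLoads(std::size_t obj) const override {
    return rows.at(obj);
  }
};

// Wrapper playing the role of setFocusedSubphase(k): the breakdown passes
// through unchanged, but work() reports only subphase k.
struct FocusedSubphase : SubphaseModel {
  FocusedSubphase(std::shared_ptr<SubphaseModel> base, std::size_t k)
    : base_(std::move(base)), k_(k) {
    assert(base_ != nullptr);
  }
  std::vector<double> subphaseLoads(std::size_t obj) const override {
    return base_->subphaseLoads(obj);
  }
  double work(std::size_t obj) const override {
    return base_->subphaseLoads(obj).at(k_);
  }
private:
  std::shared_ptr<SubphaseModel> base_;
  std::size_t k_;
};
```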

PhilMiller · 2020-07-17T17:03:41Z

I should write up some test cases for the various models provided.

src/vt/vrt/collection/balance/model/naive_persistence.cc

src/vt/vrt/collection/balance/model/load_model.h

src/vt/vrt/collection/balance/model/naive_persistence.h

src/vt/vrt/collection/balance/model/raw_data.h

src/vt/vrt/collection/balance/model/select_subphases.cc

lifflander · 2020-07-20T22:56:22Z

Overall, this looks like a good direction to me. There is some missing documentation on what these actually do. Also, I'm not sure how the user can control them. Does this solve the problem of multiple collections with different models?

PhilMiller · 2020-07-21T15:23:30Z

> Also, I'm not sure how the user can control them.

Users can control them by constructing their own instances, either of a provided type or one they've defined, and passing it to LBManager::setLoadModel(). I've exposed LBManager::getLoadModel() and LBManager::getBaseLoadModel() so that user instances can stack themselves on top of whatever the system would do by default, e.g. if it's configured with noise filtering.
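The stacking pattern described here, sketched with a toy stand-in for the manager: `ToyLBManager` mimics the three calls by name only, and its signatures are assumptions rather than vt's declarations. `BoostObject` is a hypothetical user model that defers to whatever model was installed before it, adjusting one object it has application-specific knowledge about.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct LoadModel {
  virtual ~LoadModel() = default;
  virtual double work(int obj) const = 0;
};

// Toy stand-in for the manager: remembers the system's base model and the
// model currently installed for strategies to query.
class ToyLBManager {
public:
  explicit ToyLBManager(std::shared_ptr<LoadModel> base)
    : base_(base), current_(std::move(base)) {}
  std::shared_ptr<LoadModel> getBaseLoadModel() const { return base_; }
  std::shared_ptr<LoadModel> getLoadModel() const { return current_; }
  void setLoadModel(std::shared_ptr<LoadModel> model) {
    assert(model != nullptr);
    current_ = std::move(model);
  }
private:
  std::shared_ptr<LoadModel> base_;
  std::shared_ptr<LoadModel> current_;
};

// Stand-in for measured data: object 0 took 10 units, everything else 2.
struct Measured : LoadModel {
  double work(int obj) const override { return obj == 0 ? 10.0 : 2.0; }
};

// Hypothetical user model stacked on top of whatever was installed before it.
struct BoostObject : LoadModel {
  BoostObject(std::shared_ptr<LoadModel> under, int obj, double factor)
    : under_(std::move(under)), obj_(obj), factor_(factor) {}
  double work(int obj) const override {
    double w = under_->work(obj);
    return obj == obj_ ? w * factor_ : w;
  }
private:
  std::shared_ptr<LoadModel> under_;
  int obj_;
  double factor_;
};
```

Stacking on `getLoadModel()` rather than `getBaseLoadModel()` keeps any system-level layer (such as noise filtering) underneath the user's adjustment.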

> Does this solve the problem of multiple collections with different models?

I just started implementing a PerCollection model that would switch on what collection an element is part of to call collection-specific models for each. However, I ran into the hurdle that there doesn't seem to be any way to get from ElementIDType to the collection's VirtualProxyType or whatever else I would use to identify/distinguish them. As far as the data in ProcStats goes, all elements of all collections are just part of one big conglomerated sequence.

PhilMiller · 2020-07-21T15:34:07Z

Have a look at the commented out bits in per_collection.{h,cc}. If I had an implementation for those, we'd be set.
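For reference, a minimal sketch of the dispatch that per_collection.{h,cc} needs, assuming some element-to-collection lookup exists (here a plain map filled through a hypothetical `setOwner()`, standing in for whatever the runtime would provide). `ElementID` and `CollectionProxy` are simple stand-ins for `ElementIDType` and `VirtualProxyType`; elements whose collection has no registered model fall back to a default model.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <unordered_map>
#include <utility>

using ElementID = std::uint64_t;        // stand-in for ElementIDType
using CollectionProxy = std::uint64_t;  // stand-in for VirtualProxyType

struct LoadModel {
  virtual ~LoadModel() = default;
  virtual double work(ElementID elm) const = 0;
};

// Dispatches each element to the model registered for its collection.
class PerCollectionSketch : public LoadModel {
public:
  explicit PerCollectionSketch(std::shared_ptr<LoadModel> fallback)
    : fallback_(std::move(fallback)) {
    assert(fallback_ != nullptr);
  }
  void addModel(CollectionProxy coll, std::shared_ptr<LoadModel> model) {
    models_[coll] = std::move(model);
  }
  // Hypothetical: the element -> collection lookup described as missing.
  void setOwner(ElementID elm, CollectionProxy coll) { owner_[elm] = coll; }

  double work(ElementID elm) const override {
    auto owner = owner_.find(elm);
    if (owner != owner_.end()) {
      auto model = models_.find(owner->second);
      if (model != models_.end()) {
        return model->second->work(elm);
      }
    }
    return fallback_->work(elm);
  }
private:
  std::shared_ptr<LoadModel> fallback_;
  std::unordered_map<CollectionProxy, std::shared_ptr<LoadModel>> models_;
  std::unordered_map<ElementID, CollectionProxy> owner_;
};

// Trivial model for illustration: every element has the same load.
struct Constant : LoadModel {
  explicit Constant(double value) : value_(value) {}
  double work(ElementID) const override { return value_; }
private:
  double value_;
};
```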

src/vt/vrt/collection/balance/model/per_collection.h

docs/md/lb-manager.md

lifflander · 2020-07-22T02:18:52Z

@PhilMiller Take a look at the test I just finished

A few things:

- We should probably provide aliases for all the models. Writing out the namespace is a little annoying.
- The ObjectIterator segfaults if you don't have any objects. Before I called nextPhase/startPhaseCollective, I tried to iterate through with a for loop. There should have been no objects, but instead it just segfaulted:

```cpp
auto model = theLBManager()->getLoadModel();
for (auto&& obj : *model) { }
```

```
==4728== Process terminating with default action of signal 11 (SIGSEGV)
==4728==  Access not within mapped region at address 0xFFFFFFFFFFFFFFD8
==4728==    at 0x4E3C0A: std::_Hashtable<unsigned long, std::pair<unsigned long const, double>, std::allocator<std::pair<unsigned long const, double> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_begin() const (hashtable.h:377)
==4728==    by 0x4E3D72: std::_Hashtable<unsigned long, std::pair<unsigned long const, double>, std::allocator<std::pair<unsigned long const, double> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::begin() const (hashtable.h:496)
==4728==    by 0x4E2689: std::unordered_map<unsigned long, double, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<std::pair<unsigned long const, double> > >::begin() const (unordered_map.h:333)
==4728==    by 0x568DF0: vt::vrt::collection::balance::RawData::begin() (raw_data.h:69)
==4728==    by 0x5652CA: vt::vrt::collection::balance::ComposedModel::begin() (composed_model.cc:64)
==4728==    by 0x154071: vt::tests::unit::TestModelPerCollection_test_model_per_collection_1_Test::TestBody() (test_model_per_collection.extended.cc:111)
==4728==    by 0x2EA52E: void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (gtest.cc:2433)
==4728==    by 0x2E4208: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (gtest.cc:2469)
==4728==    by 0x2C1689: testing::Test::Run() (gtest.cc:2508)
==4728==    by 0x2C200E: testing::TestInfo::Run() (gtest.cc:2684)
==4728==    by 0x2C2704: testing::TestSuite::Run() (gtest.cc:2816)
==4728==    by 0x2CE116: testing::internal::UnitTestImpl::RunAllTests() (gtest.cc:5338)
```
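The trace shows `RawData::begin()` calling `begin()` on an `unordered_map` at an address outside any mapped region, which is consistent with reading the current phase's load map before any phase has been recorded. A minimal sketch of an empty-safe iterator, assuming (hypothetically) that the model reads a history vector of per-phase maps owned elsewhere: before the first phase completes, it iterates over a static empty map instead of calling `back()` on an empty vector, which is undefined behavior. `RawDataSketch` and `PhaseLoads` are illustrative names, not the PR's types.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

using ElementID = std::size_t;
using PhaseLoads = std::unordered_map<ElementID, double>;

// Hypothetical raw-data model reading per-phase load maps owned elsewhere.
class RawDataSketch {
public:
  explicit RawDataSketch(std::vector<PhaseLoads> const* history)
    : history_(history) {
    assert(history_ != nullptr);
  }

  // Iterate over objects of the most recent completed phase. begin() and
  // end() always come from the same map, so an empty history yields an
  // empty range rather than a dangling one.
  PhaseLoads::const_iterator begin() const { return current().begin(); }
  PhaseLoads::const_iterator end() const { return current().end(); }

private:
  PhaseLoads const& current() const {
    static PhaseLoads const empty;
    return history_->empty() ? empty : history_->back();
  }

  std::vector<PhaseLoads> const* history_;
};
```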


lifflander · 2020-07-23T01:17:34Z

Here is an overview of what got changed by this pull request:

Clones added
============
- src/vt/vrt/collection/balance/proc_stats.cc  2
- src/vt/vrt/collection/balance/model/per_collection.h  1
- src/vt/vrt/collection/balance/model/composed_model.h  1

See the complete overview on Codacy

PhilMiller requested review from lifflander, nlslatt and ppebay June 26, 2020 19:33

PhilMiller changed the title ~~582 model loads~~ 582 Add ability to model load values used by LB strategies Jun 26, 2020

PhilMiller force-pushed the 582-model-loads branch from 2eb887a to 491026e June 26, 2020 21:02

PhilMiller marked this pull request as ready for review July 6, 2020 19:11

lifflander mentioned this pull request Jul 14, 2020

Meeting Agenda [do not close] #925

Open

lifflander approved these changes Jul 20, 2020

PhilMiller marked this pull request as draft July 20, 2020 17:29

PhilMiller force-pushed the 582-model-loads branch from afda0ed to 17c9c6b July 20, 2020 17:59

lifflander reviewed Jul 20, 2020

PhilMiller force-pushed the 582-model-loads branch 4 times, most recently from b7241e8 to 7f086ed July 21, 2020 16:36

lifflander reviewed Jul 21, 2020

src/vt/vrt/collection/balance/model/per_collection.h (outdated, resolved)

lifflander reviewed Jul 21, 2020

docs/md/lb-manager.md (outdated, resolved)

lifflander reviewed Jul 21, 2020

docs/md/lb-manager.md (outdated, resolved)

PhilMiller and others added 26 commits July 22, 2020 21:15

#582: Add high-level documentation of load models

ea8e005

#582: Add explicit annotations on single-argument constructors not meant for conversions

f5fc311

#582: Format docs a bit better

0fb3b02

#582: Fix typo in docs

fb2922b

#582: Implement collection proxy lookup for element IDs during LB

aafbc34

#582: Add the lookup for collection-based models in PerCollection

7715758

#582: Write a test for PerCollection models

7d2f8c6

#582: Update docs that PerCollection is implemented

63f53e1

#582: Remove some dead code from test (copy-paste error)

d40d9fd

#582: Implement a basic comm overhead model

39a4196

#582: Implement a utility for linear regression

07a7302

#582: Implement a linear model based on past window

623bf33

#582: Fix some tabs copied from other code

ea18691

#582: Remove unimplemented note from doc

bf84807

#582: Make CommOverhead subphase attribution proportional to work

b7594fe

#582: Factor out completed phase enumeration

ea448e6

#582: Clear out cruft from before subphase enumeration was factored out

26a7d28

#582: Clarify comment

d367a96

#582: Clean up some header dependencies

eda40b3

#582: Disable part of test that's sensitive to LB being compile-time disabled

3a78054

#582: tests: Implement test for linear regression

f8cc004

#582: Docs: Add missing classes

a0eb5b4

#582: Docs: Organize and annotate classes

af3b28b

#582: Docs: Describe the categories

8608e26

#582: Docs: Finish elaborating on missing bits

76f814e

#582: Add MultiplePhases blocking predictor

24f93c7

PhilMiller force-pushed the 582-model-loads branch from dc4b3f9 to 24f93c7 July 23, 2020 01:15

lifflander merged commit e96ec02 into develop Jul 23, 2020

cz4rs mentioned this pull request Jun 15, 2021

move vt version number into a separate file #1467

Closed
