
Balance workloads within synchronized subphases (vector loads rather than scalar) #708

Closed · 4 of 5 tasks

PhilMiller opened this issue Feb 25, 2020 · 14 comments · Fixed by #826
Comments

PhilMiller (Member) commented Feb 25, 2020

What Needs to be Done?

  1. Add an API for objects to indicate which subphase their work is part of (a rough sketch follows this list)
  2. Extend LB instrumentation to record a subphase-indexed vector of loads per object per phase, rather than the current scalar(s)
  3. Add an API for load balancing strategies to query vector loads
  4. Modify the LB strategy migration criterion to disallow candidates that would increase vector imbalance
  5. Output load vectors to the LB stats file
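As a rough illustration of what the API in item 1 might look like (every name here is a hypothetical placeholder, not an existing vt interface):

using SubphaseType = int;

struct Collection {
  // Hypothetical hook: attribute subsequent instrumented time on this
  // object to the given subphase
  void setSubphase(SubphaseType subphase);
};

// Hypothetical usage inside an object's per-step work:
//   this->setSubphase(0);  // e.g., field solve
//   doFieldSolve();
//   this->setSubphase(1);  // e.g., particle push
//   doParticlePush();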

Is your feature request related to a problem? Please describe.

In EMPIRE, each step consists of a number of subphases separated by global synchronization events. Object loads across subphases are not directly proportional, so balancing by the objects' aggregate load over the entire step leaves substantial imbalances within each subphase. This limits performance and scalability.


PhilMiller (Member Author) commented:

@lifflander @ppebay will want to track this

PhilMiller (Member Author) commented:

@ppebay What do you want for the format of subphase object times in the stats file?

lifflander (Collaborator) commented:

I think it makes sense for the sub-phases to be sorted in the same order as the phases appear in the output. We could put them at the end to maintain some positional compatibility, or just add them after the phase, which probably makes more sense: i.e., in the 2nd position of each line, for both comp and comm.
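Purely as an illustration of that layout (the field set here is an assumption, not a settled format), a per-object line might read:

// <phase>, <subphase>, <object-id>, <load>
0, 1, 42, 0.0375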

PhilMiller (Member Author) commented Mar 10, 2020

Here are a few improvement criteria that could be used in a vector strategy.

Suppose we have

TimeType getObjLoad(ObjectID, Phase, Subphase);
TimeType getProcLoad(ProcID, Phase, Subphase);

vector<TimeType> subphase_maximums(num_subphases, 0);
vector<TimeType> subphase_averages(num_subphases, 0);
vector<TimeType> subphase_long_poles(num_subphases, 0);
vector<TimeType> subphase_targets(num_subphases, 0);

for (int j = 0; j < num_subphases; ++j) {
  // Maximum and average processor load in each subphase
  for (int i = 0; i < num_procs; ++i) {
    subphase_maximums[j] = max(subphase_maximums[j], getProcLoad(i, phase, j));
    subphase_averages[j] += getProcLoad(i, phase, j);
  }
  subphase_averages[j] /= num_procs;

  // 'Long pole': the single heaviest object in each subphase
  for (int i = 0; i < num_objs; ++i) {
    subphase_long_poles[j] = max(subphase_long_poles[j], getObjLoad(i, phase, j));
  }

  // Best attainable target: no subphase can be balanced below its average
  // processor load, nor below its single heaviest object
  subphase_targets[j] = max(subphase_averages[j], subphase_long_poles[j]);
}
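As a hypothetical worked example of the target computation: with 4 processors whose loads in some subphase are 2, 2, 3, and 5, the average is 3; if the heaviest single object in that subphase has load 4, then the target for that subphase is max(3, 4) = 4, since no assignment can finish the subphase before its heaviest object does.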

The strongest criterion would be that a proposed migration not create any overload in any subphase:

bool MigrationCreatesNoOverload(ObjectID obj, ProcID src_proc, ProcID dst_proc) {
  // Reject the migration if dst_proc would exceed the target in any subphase
  for (int i = 0; i < num_subphases; ++i) {
    if (getObjLoad(obj, phase, i) + getProcLoad(dst_proc, phase, i) > subphase_targets[i])
      return false;
  }
  return true;
}
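For context, a minimal sketch of how a strategy might use such a criterion as a filter in a greedy pass (selectHeaviestObject and migrate are hypothetical helpers; num_subphases, phase, and subphase_targets are as above):

void GreedyVectorPass(vector<ProcID> const& procs) {
  for (ProcID src : procs) {
    // Only consider sources that are overloaded in at least one subphase
    bool overloaded = false;
    for (int i = 0; i < num_subphases; ++i)
      overloaded = overloaded || getProcLoad(src, phase, i) > subphase_targets[i];
    if (!overloaded) continue;

    for (ProcID dst : procs) {
      if (dst == src) continue;
      ObjectID obj = selectHeaviestObject(src);  // hypothetical helper
      if (MigrationCreatesNoOverload(obj, src, dst)) {
        migrate(obj, src, dst);  // hypothetical helper
        break;
      }
    }
  }
}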

To be continued...

PhilMiller (Member Author) commented Mar 10, 2020

Another criterion would be that the relieved overload on src_proc is greater than any created overload on dst_proc:

bool MigrationRelievesNetOverload(ObjectID obj, ProcID src_proc, ProcID dst_proc) {
  TimeType relieved_overload = 0;
  for (int i = 0; i < num_subphases; ++i) {
    // The object's load varies per subphase, so query it inside the loop
    TimeType obj_load = getObjLoad(obj, phase, i);
    TimeType load_before = getProcLoad(src_proc, phase, i);
    TimeType load_after = load_before - obj_load;

    if (load_after >= subphase_targets[i])
      relieved_overload += obj_load;
    else if (load_before >= subphase_targets[i]) // Corrected to *else* if
      relieved_overload += load_before - subphase_targets[i];
  }

  TimeType created_overload = 0;
  for (int i = 0; i < num_subphases; ++i) {
    TimeType obj_load = getObjLoad(obj, phase, i);
    TimeType load_before = getProcLoad(dst_proc, phase, i);
    TimeType load_after = load_before + obj_load;

    if (load_before >= subphase_targets[i])
      created_overload += obj_load;
    else if (load_after >= subphase_targets[i])
      created_overload += load_after - subphase_targets[i];
  }

  return relieved_overload > created_overload;
}
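As a hypothetical worked example with two subphases and targets of 10 each: if moving an object with loads (3, 2) takes src_proc from (14, 9) to (11, 7), the relieved overload is 3 + 0 = 3; if it takes dst_proc from (8, 9) to (11, 11), the created overload is 1 + 1 = 2, so the migration passes this criterion.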

PhilMiller (Member Author) commented Mar 10, 2020

A slightly trickier one to implement, requiring updated global load knowledge (or, equivalently, a centralized strategy that collects the stats in one place), would be whether the destination processor becomes a 'long pole' in any subphase.

A modification of that considers whether it would become a long pole assuming the current maxima are fixed. This is a valid criterion for global improvement, but probably not a very good one: it would need many iterative rounds of migration to smooth everything down.
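A minimal sketch of the fixed-maxima variant (the function name is hypothetical, and it assumes the subphase_maximums vector computed above is kept current):

bool MigrationAvoidsNewLongPole(ObjectID obj, ProcID src_proc, ProcID dst_proc) {
  for (int i = 0; i < num_subphases; ++i) {
    TimeType dst_load_after = getProcLoad(dst_proc, phase, i) + getObjLoad(obj, phase, i);
    // Reject if dst_proc would exceed the current per-subphase maximum,
    // treating the maxima as fixed rather than recomputing them
    if (dst_load_after > subphase_maximums[i])
      return false;
  }
  return true;
}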

PhilMiller (Member Author) commented:

It's very likely I'm duplicating at least some work of @rbuch here

rbuch commented Mar 12, 2020

Indeed, I am looking at similar things in Charm++. I'm currently focused on pushing through some of the internal LB infrastructure changes, so a lot of the strategy design and validation is still future work, but I have done some preliminary work on it. In any case, we should coordinate to avoid making the same mistakes and to avoid duplicating work where we can.

PhilMiller (Member Author) commented Mar 12, 2020

Another one, that any created overload in any subphase on the destination processor must be less than the overload remaining on the source processor:

bool MigrationUniformlyMitigatesOverload(ObjectID obj, ProcID src_proc, ProcID dst_proc) {
  for (int i = 0; i < num_subphases; ++i) {
    // The object's load varies per subphase, so query it inside the loop
    TimeType obj_load = getObjLoad(obj, phase, i);

    TimeType src_load_after = getProcLoad(src_proc, phase, i) - obj_load;
    TimeType dst_load_after = getProcLoad(dst_proc, phase, i) + obj_load;

    // Reject if dst_proc would be overloaded in this subphase and left
    // worse off than src_proc is after the migration
    if (dst_load_after >= subphase_targets[i] &&
        dst_load_after > src_load_after)
      return false;
  }

  return true;
}

I believe this is, again, an example of a 'strict improvement criterion', while still being more lenient than MigrationCreatesNoOverload.

PhilMiller (Member Author) commented Mar 14, 2020

I just amended the preliminaries and all of the criteria definitions to clarify that they should consider the best attainable target in each subphase, which is the larger of the heaviest object's load and the average processor load. If there's a long pole, there's no sense trying to offload a processor that's already shorter than it.

PhilMiller (Member Author) commented:

A slight variation on MigrationUniformlyMitigatesOverload would make the last comparison dst_load_after > src_load_before. This would allow migrations that improve some subphases, as long as the others are left no worse off than the source was before the migration.
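A minimal sketch of that variant, changing only the comparison (the function name is hypothetical):

bool MigrationLeavesNoSubphaseWorse(ObjectID obj, ProcID src_proc, ProcID dst_proc) {
  for (int i = 0; i < num_subphases; ++i) {
    TimeType obj_load = getObjLoad(obj, phase, i);
    TimeType src_load_before = getProcLoad(src_proc, phase, i);
    TimeType dst_load_after = getProcLoad(dst_proc, phase, i) + obj_load;

    // Compare against the source's load *before* the migration, so each
    // subphase ends up no worse than it already was on the source
    if (dst_load_after >= subphase_targets[i] &&
        dst_load_after > src_load_before)
      return false;
  }
  return true;
}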

lifflander pushed a commit that referenced this issue May 6, 2020
lifflander pushed a commit that referenced this issue May 6, 2020
lifflander added a commit that referenced this issue Jun 10, 2020
#708: Pass subphase timings through to ProcStats, and write them out
lifflander pushed a commit that referenced this issue Jul 15, 2020