-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Balance workloads within synchronized subphases (vector loads rather than scalar) #708
Comments
@lifflander @ppebay will want to track this |
@ppebay What do you want for the format of subphase object times in the stats file? |
I think it make sense for the sub-phases to be sorted in order as the phases are in the output. We could put it at the end to maintain some sort of positional compatibility or just add it after the phase, which probably makes more sense: i.e., in the 2nd position for each line, for both comp and comm. |
Here are a few improvement criteria that could be used in a vector strategy. Suppose we have TimeType getObjLoad(ObjectID, Phase, Subphase);
TimeType getProcLoad(ProcID, Phase, Subphase);
vector<TimeType> subphase_maximums(num_subphases, 0);
vector<TimeType> subphase_averages(num_subphases, 0);
vector<TimeType> subphase_long_poles(num_subphases, 0);
vector<TimeType> subphase_targets(num_subphases, 0);
for (int j = 0; j < num_subphases; j++) {
for (int i = 0; i < num_procs; ++i) {
subphase_maximum[j] = max(subphase_maximum[j], getProcLoad(i, phase, j));
subphase_averages[j] += getProcLoad(i, phase, j);
}
subphase_averages[j] /= num_procs;
for (int i = 0; i < num_objs; ++i) {
subphase_long_poles[j] = max(subphase_long_poles[j], getObjLoad(i, phase, j));
}
subphase_targets[j] = max(subphase_averages[j], subphase_long_poles[j]);
} The strongest criterion would be that a proposed migration not create any overload in any subphase: bool MigrationCreatesNoOverload(ObjectID obj, ProcID src_proc, ProcID dst_proc) {
for (int i = 0; i < num_subphases; ++i) {
if (getObjLoad(obj, phase, i) + getProcLoad(dst_proc, phase, i) > subphase_targets[i])
return false;
}
return true;
} To be continued... |
Another criterion would be that the relieved overload on bool MigrationRelievesNetOverload(ObjectID obj, ProcID src_proc, ProcID dst_proc) {
TimeType obj_load = getObjLoad(obj, phase, i);
TimeType relieved_overload = 0;
for (int i = 0; i < num_subphases; ++i) {
TimeType load_before = getProcLoad(src_proc, phase, i);
TimeType load_after = load_before - obj_load;
if (load_after >= subphase_targets[i])
relieved_overload += obj_load;
else if (load_before >= subphase_targets[i]) // Corrected to *else* if
relieved_overload += load_before - subphase_targets[i];
}
TimeType created_overload = 0;
for (int i = 0; i < num_subphases; ++i) {
TimeType load_before = getProcLoad(dst_proc, phase, i);
TimeType load_after = load_before + obj_load;
if (load_before >= subphase_targets[i])
created_overload += obj_load;
else if (load_after >= subphase_targets[i])
created_overload += load_after - subphase_targets[i];
}
return relieved_overload > created_overload;
} |
A slightly trickier one to implement, requiring updated global load knowledge (equivalently, a centralized strategy that collects the stats in one place), would be whether the destination processor becomes a 'long pole' in any subphase. A modification of that considers whether it would be a long pole assuming the maxes are fixed. This is a valid criterion for global improvement, but probably not a very good one. It would need a lot of iterative migration to smooth everything down. |
It's very likely I'm duplicating at least some work of @rbuch here |
Indeed, I am looking at similar things in Charm++. I'm currently focused on pushing through some of the internal LB infrastructure changes so a lot of the strategy design and validation is still future work, but I have done some preliminary work on that. In any case, we should coordinate to avoid making the same mistakes and avoiding duplication of work where we can. |
Another one, that any created overload in any phase on the destination processor is less than any remaining overload on the source processor: bool MigrationUniformlyMitigatesOverload(ObjectID obj, ProcID src_proc, ProcID dst_proc) {
TimeType obj_load = getObjLoad(obj, phase, i);
TimeType relieved_overload = 0;
for (int i = 0; i < num_subphases; ++i) {
TimeType src_load_before = getProcLoad(src_proc, phase, i);
TimeType src_load_after = src_load_before - obj_load;
TimeType dst_load_before = getProcLoad(dst_proc, phase, i);
TimeType dst_load_after = dst_load_before + obj_load;
if (dst_load_after >= subphase_targets[i] &&
dst_load_after > src_load_after)
return false;
}
return true;
} I believe this is, again, an example of a 'strict improvement criterion', while still being more lenient than |
I just amended the preliminary and all of the criterion definitions to clarify that they should consider the best attainable target in each subphase, which is the larger of the heaviest object or the average processor load. If there's a long pole, there's no sense trying to offload a processor that's already shorter than it. |
A slight variation on |
#708: Pass subphase timings through to ProcStats, and write them out
What Needs to be Done?
Modify LB strategy migration criterion to disallow candidates that would increase vector imbalanceIs your feature request related to a problem? Please describe.
In EMPIRE, each step consists of a number of subphases. Those subphases are separated by global synchronization events. Objects loads across subphases are not directly proportional. Balancing by the objects' aggregate load during each entire step leaves substantial imbalances within each subphase. This limits performance and scalability.
Describe potential solution outcome
Describe alternatives you've considered
Additional context
The text was updated successfully, but these errors were encountered: