support binpack policy #380
Conversation
@hzxuzhonghu @TommyLike @k82cn Now, we have another XL PR. An e2e test case is included. Please have a look at it.
Hey @lminzhw, TravisCI finished with status.
TravisBuddy Request Identifier: 52b4ca80-ad5b-11e9-aa77-bf7f12915990
/approve
/cc @hex108 @kinglion811
```go
// All resources with a weight should be calculated, because weightSum needs
// them, even if the node does not have the resource.
for name, weight := range weight.BinPackingResources {
	weightSum += weight
```
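To make the question below concrete, here is a minimal runnable sketch of the pattern being discussed; the function and parameter names are invented for illustration and are not the PR's actual code.

```go
package main

import "fmt"

// binPackingScore is a hypothetical reduction of the loop under review:
// every configured resource adds its weight to the denominator, even when
// its score is zero.
func binPackingScore(scores map[string]float64, weights map[string]float64) float64 {
	var total, weightSum float64
	for name, w := range weights {
		// scores[name] is 0 when the node lacks the resource entirely,
		// yet w still inflates the denominator.
		total += scores[name] * w
		weightSum += w
	}
	if weightSum == 0 {
		return 0
	}
	return total / weightSum
}

func main() {
	weights := map[string]float64{"cpu": 5, "memory": 1, "nvidia.com/gpu": 2}
	// A node with no GPU: the GPU score is 0, but its weight of 2 still counts.
	fmt.Println(binPackingScore(map[string]float64{"cpu": 0.8, "memory": 0.5}, weights))
}
```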
If the score calculated at L202 is zero, do we need to add `weight` to `weightSum`?
Zero is also a valid value for a score, so I think adding it to the sum is also valid.
Take GPU as an example:
Node A has no GPU, so its GPU score will be 0. Node B has a GPU capacity of 6 and uses 3, so it has a nonzero score for the GPU resource. Suppose Node A's and Node B's scores for the other resources are the same: if we add `weight` in this case, Node B's total score will be higher than Node A's, so Node B will be chosen. Is that expected? Node A might be a better choice.
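To put numbers on this (assuming, purely for illustration, equal weights of 1 for all three resources and a used/capacity scoring rule): Node B with 3 of 6 GPUs used gets a GPU score of 0.5, so its total is (0.5 + 0.5 + 0.5)/3 = 0.5, while Node A's is (0.5 + 0.5 + 0)/3 ≈ 0.33. Node B wins, even though placing a non-GPU pod on Node A would keep Node B's remaining GPUs free for GPU workloads.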
If the resource does not exist on the node, we should also skip its weight.
In order to choose NodeA, maybe we should make the GPU score on NodeB negative in this case?
For example, the request needs resources Ra/Rb; NodeA has resources Ra/Rb, and NodeB has resources Ra/Rb/Rc.
In the table below, policy "current" is the score as computed now. Policy "skip" would ignore Sc/Wc on NodeA but could not guarantee that NodeA gets a higher score than NodeB. Policy "negative" would treat the Sc on NodeB as a negative score.
| policy | NodeA | NodeB |
| --- | --- | --- |
| current | (Sa+Sb+0)/(Wa+Wb+Wc) | (Sa+Sb+Sc)/(Wa+Wb+Wc) |
| skip | (Sa+Sb)/(Wa+Wb) | (Sa+Sb+Sc)/(Wa+Wb+Wc) |
| negative | (Sa+Sb+0)/(Wa+Wb+Wc) | (Sa+Sb-Sc)/(Wa+Wb+Wc) |
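For readers comparing the three options, the following runnable sketch reproduces the table's arithmetic for the two-node example. All names here are invented for illustration; this is not code from the PR.

```go
package main

import "fmt"

type policy string

// resource bundles the per-resource score S, weight W, and the two flags
// the policies branch on.
type resource struct {
	score     float64 // S on this node
	weight    float64 // configured W
	onNode    bool    // node actually has the resource
	requested bool    // pod requests the resource
}

func nodeScore(resources []resource, p policy) float64 {
	var sum, weightSum float64
	for _, r := range resources {
		switch {
		case !r.onNode && p == "skip":
			continue // skip: contributes neither score nor weight
		case r.onNode && !r.requested && p == "negative":
			sum -= r.score // negative: unrequested Sc becomes a penalty
			weightSum += r.weight
		default: // current behavior
			if r.onNode {
				sum += r.score
			} // an absent resource contributes 0 score but full weight
			weightSum += r.weight
		}
	}
	if weightSum == 0 {
		return 0
	}
	return sum / weightSum
}

func main() {
	// Request needs Ra/Rb; NodeA has Ra/Rb, NodeB has Ra/Rb/Rc
	// (assume Sa = Sb = Sc = 0.5 and Wa = Wb = Wc = 1).
	nodeA := []resource{
		{0.5, 1, true, true}, {0.5, 1, true, true}, {0, 1, false, false},
	}
	nodeB := []resource{
		{0.5, 1, true, true}, {0.5, 1, true, true}, {0.5, 1, true, false},
	}
	for _, p := range []policy{"current", "skip", "negative"} {
		fmt.Printf("%-8s NodeA=%.2f NodeB=%.2f\n", p, nodeScore(nodeA, p), nodeScore(nodeB, p))
	}
}
```

Running this prints 0.33/0.50 for "current", 0.50/0.50 for "skip", and 0.33/0.17 for "negative", matching the table: only the "negative" policy actually prefers NodeA.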
I don't have any better idea for this now. Just some thoughts:
- If the request does not need resource C, then we do not calculate the score for that resource; it means we compare `(Sa+Sb)/(Wa+Wb)` (for NodeA) with `(Sa+Sb)/(Wa+Wb)` (for NodeB).
or
- If NodeA does not have the resource, we regard its RequestedToCapacityRatio as 1 instead of 0; 1 means it is full, while 0 suggests it is empty.
or
- We could solve it in a later PR after finding a better solution.
Option 1 looks better to me :)
I refactored the code in `BinPackingScore`. Now, only resources that the pod requests affect the binpack score.
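A simplified sketch of what this refactor might look like (hypothetical signature and names; the PR's actual `BinPackingScore` operates on the scheduler's task and node types): iterating over the requested resources means an unrequested resource can neither add to the score nor inflate the weight sum.

```go
package main

import "fmt"

// binPackingScore walks only the resources the pod requests, so a resource
// absent from the request (or from the node) cannot affect either the
// numerator or weightSum.
func binPackingScore(request, used, allocatable, weights map[string]float64) float64 {
	var score, weightSum float64
	for name, req := range request {
		if req <= 0 {
			continue
		}
		w, ok := weights[name]
		if !ok {
			continue // no configured weight for this resource
		}
		capacity := allocatable[name]
		if capacity <= 0 || used[name]+req > capacity {
			continue // node lacks the resource or cannot fit the request
		}
		// Fuller nodes score higher: (used + request) / capacity is in (0, 1].
		score += w * (used[name] + req) / capacity
		weightSum += w
	}
	if weightSum == 0 {
		return 0
	}
	return score / weightSum
}

func main() {
	weights := map[string]float64{"cpu": 5, "memory": 1, "nvidia.com/gpu": 2}
	// A pod requesting only cpu/memory: the GPU weight no longer dilutes its score.
	req := map[string]float64{"cpu": 2, "memory": 4}
	used := map[string]float64{"cpu": 6, "memory": 8}
	alloc := map[string]float64{"cpu": 16, "memory": 32, "nvidia.com/gpu": 6}
	fmt.Printf("%.3f\n", binPackingScore(req, used, alloc, weights))
}
```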
```yaml
binpack.cpu: 5
binpack.memory: 1
binpack.resources: nvidia.com/gpu, example.com/foo
binpack.resources.nvidia.com/gpu: 2
```
s/`binpack.resources.nvidia.com/gpu`/`binpack.nvidia.com/gpu`/? Then it would be the same as cpu/memory. It is just a suggestion.
In my opinion, all additional resources are listed in `binpack.resources`, so the key for a resource's weight should carry that prefix.
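As a sketch of the naming convention being defended here (invented function names; this is not the PR's parsing code), resources declared in `binpack.resources` take their weights from keys that keep the `binpack.resources.` prefix, with a default weight of 1 assumed for illustration:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseBinpackWeights reads the argument keys shown above: "binpack.cpu"
// and "binpack.memory" are fixed keys, while each extended resource listed
// under "binpack.resources" takes its weight from a key carrying the
// "binpack.resources." prefix, e.g. "binpack.resources.nvidia.com/gpu".
func parseBinpackWeights(args map[string]string) map[string]int {
	weights := map[string]int{}
	if v, ok := args["binpack.cpu"]; ok {
		weights["cpu"] = atoiOr(v, 1)
	}
	if v, ok := args["binpack.memory"]; ok {
		weights["memory"] = atoiOr(v, 1)
	}
	for _, r := range strings.Split(args["binpack.resources"], ",") {
		r = strings.TrimSpace(r)
		if r == "" {
			continue
		}
		// The weight key is prefixed because the resource itself is
		// declared in binpack.resources, matching the comment above.
		weights[r] = atoiOr(args["binpack.resources."+r], 1)
	}
	return weights
}

// atoiOr parses s as an integer, falling back to def on any error.
func atoiOr(s string, def int) int {
	if n, err := strconv.Atoi(strings.TrimSpace(s)); err == nil {
		return n
	}
	return def
}

func main() {
	args := map[string]string{
		"binpack.cpu":                      "5",
		"binpack.memory":                   "1",
		"binpack.resources":                "nvidia.com/gpu, example.com/foo",
		"binpack.resources.nvidia.com/gpu": "2",
	}
	// example.com/foo has no explicit weight key, so it gets the assumed default of 1.
	fmt.Println(parseBinpackWeights(args))
}
```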
LGTM, thanks :)
Thanks! LGTM
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: k82cn, lminzhw. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
Motivation
While running workloads on Kubernetes that use accelerator devices, the default scheduler spreads pods across nodes, resulting in fragmentation of extended resources. This fragmentation prevents the scheduling of pods with larger device resource requirements, and they remain in the Pending state.
Goals