Optimize and complete the document of QOS
kaiyuechen committed Oct 14, 2022
1 parent cd75bf2 · commit dc22027
Showing 13 changed files with 905 additions and 760 deletions.
51 changes: 51 additions & 0 deletions
docs/tutorials/qos-accurately-perform-avoidance-actions.md
@@ -0,0 +1,51 @@
### Accurately Perform Avoidance Actions

The two mechanisms below avoid over-throttling low-priority pods while closing the gap between the observed metrics and the configured watermark as quickly as possible, so that high-priority services are not affected.

1. Sort pods

Crane implements several general sorting methods (more will be added over time):

classAndPriority: compares the QOSClass and class value of two pods; QOSClass is compared first, then the class value. Pods that compare higher are placed later in the order and are treated as higher priority.

runningTime: compares the running time of two pods; the pod with the longer running time is placed later and has higher priority.

If these two strategies are sufficient, the default sorting method can be used: it first compares the priority of the pods, then the pods' usage of the metric in question, and finally their running time; as soon as one dimension yields a difference, that result is the ordering of the pods.

Taking the CPU usage metric as an example, metric-specific strategies are also provided. The CPU usage sorter compares the priority of two pods first; if the priorities are equal it compares CPU usage, if CPU usage is also equal it compares extended CPU resource usage, and finally it compares running time. As soon as one dimension differs, the comparison result is returned: `orderedBy(classAndPriority, cpuUsage, extCpuUsage, runningTime).Sort(pods)`. A minimal sketch of this multi-keyed comparison pattern is given at the end of this section.
2. Refer to the watermark and pod usage to perform avoidance actions
```go
// Divide all metrics that triggered the watermark into two groups according to their Quantified attribute
metricsQuantified, metricsNotQuantified := ThrottleDownWaterLine.DivideMetricsByQuantified()
// If any metric cannot be quantified, pick the throttleable metric with the highest ActionPriority and act on all selected pods
if len(metricsNotQuantified) != 0 {
    highestPriorityMetric := ThrottleDownWaterLine.GetHighestPriorityThrottleAbleMetric()
    errPodKeys = throttlePods(ctx, &totalReleased, highestPriorityMetric)
} else {
    // Get the latest usage and build the gap to the watermark
    ThrottleDownGapToWaterLines = buildGapToWaterLine(ctx.getStateFunc())
    // If the real-time usage of a metric that triggered the watermark cannot be obtained,
    // choose the throttleable metric with the highest ActionPriority to throttle all selected pods
    if ThrottleDownGapToWaterLines.HasUsageMissedMetric() {
        highestPriorityMetric := ThrottleDownWaterLine.GetHighestPriorityThrottleAbleMetric()
        errPodKeys = throttlePods(ctx, &totalReleased, highestPriorityMetric)
    } else {
        var released ReleaseResource
        // Traverse the quantifiable metrics that triggered the watermark: if a metric has its own sorting method,
        // sort the pods with its SortFunc, otherwise sort them with GeneralSorter; then execute the metric's
        // throttle action on the pods and subtract the released amount until the gap to the watermark is closed
        for _, m := range metricsQuantified {
            if m.SortAble {
                m.SortFunc(ThrottleDownPods)
            } else {
                GeneralSorter(ThrottleDownPods)
            }

            for !ThrottleDownGapToWaterLines.TargetGapsRemoved(m) {
                for index := range ThrottleDownPods {
                    released = m.ThrottleFunc(ctx, index, ThrottleDownPods, &totalReleased)
                    ThrottleDownGapToWaterLines[m] -= released[m]
                }
            }
        }
    }
}
```
Extending user-defined metrics and sorting is described in "User-defined metrics interference detection avoidance and user-defined sorting".
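For reference, the `orderedBy(...).Sort(pods)` expression used in step 1 follows Go's classic multi-keyed sort pattern. The sketch below is a minimal, self-contained illustration of that pattern only: the `pod` type and the `priority`/`cpuUsage` comparators are hypothetical stand-ins, not Crane's actual types or comparators.

```go
package main

import (
	"fmt"
	"sort"
)

// pod is a stand-in for Crane's podinfo.PodContext; only the fields needed
// for this illustration are included.
type pod struct {
	Name     string
	Priority int32
	CPUUsage float64
}

// cmpFunc returns a negative value if p1 sorts before p2, positive if after, 0 if equal.
type cmpFunc func(p1, p2 pod) int32

// multiSorter applies a chain of comparators; later comparators break ties left by earlier ones.
type multiSorter struct {
	pods []pod
	cmps []cmpFunc
}

func orderedBy(cmps ...cmpFunc) *multiSorter { return &multiSorter{cmps: cmps} }

func (ms *multiSorter) Sort(pods []pod) {
	ms.pods = pods
	sort.Sort(ms)
}

func (ms *multiSorter) Len() int      { return len(ms.pods) }
func (ms *multiSorter) Swap(i, j int) { ms.pods[i], ms.pods[j] = ms.pods[j], ms.pods[i] }
func (ms *multiSorter) Less(i, j int) bool {
	for _, cmp := range ms.cmps {
		if r := cmp(ms.pods[i], ms.pods[j]); r != 0 {
			return r < 0
		}
	}
	return false
}

// priority: lower-priority pods sort earlier, so avoidance actions reach them first.
func priority(p1, p2 pod) int32 { return p1.Priority - p2.Priority }

// cpuUsage: when priority is equal, the pod using more CPU sorts earlier in this sketch.
func cpuUsage(p1, p2 pod) int32 {
	switch {
	case p1.CPUUsage > p2.CPUUsage:
		return -1
	case p1.CPUUsage < p2.CPUUsage:
		return 1
	default:
		return 0
	}
}

func main() {
	pods := []pod{{"a", 2, 0.5}, {"b", 1, 0.9}, {"c", 1, 0.3}}
	orderedBy(priority, cpuUsage).Sort(pods)
	fmt.Println(pods) // [{b 1 0.9} {c 1 0.3} {a 2 0.5}]
}
```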
53 changes: 53 additions & 0 deletions
docs/tutorials/qos-accurately-perform-avoidance-actions.zh.md
@@ -0,0 +1,53 @@
### Accurately Perform Avoidance Actions

The two mechanisms below avoid over-throttling low-priority pods while closing the gap between the observed metrics and the configured watermark as quickly as possible, so that high-priority services are not affected.

1. Sort pods

crane implements several general sorting methods (more will be added over time):

classAndPriority: compares the QOSClass and class value of two pods; QOSClass is compared first, then the class value. Pods that compare higher are placed later in the order and are treated as higher priority.

runningTime: compares the running time of two pods; the pod with the longer running time is placed later and has higher priority.

If these two strategies are sufficient, the default sorting method can be used: it first compares the priority of the pods, then the pods' usage of the metric in question, and finally their running time; as soon as one dimension yields a difference, that result is the ordering of the pods.

Taking the cpu usage metric as an example, metric-specific strategies are also provided. The cpu usage sorter compares the priority of two pods first; if the priorities are equal it compares CPU usage, if CPU usage is also equal it compares extended CPU resource usage, and finally it compares running time. As soon as one dimension differs, the comparison result is returned: `orderedBy(classAndPriority, cpuUsage, extCpuUsage, runningTime).Sort(pods)`

2. Refer to the watermark and pod usage to perform avoidance actions
```go
// Divide all metrics that triggered the watermark into two groups according to their Quantified attribute
metricsQuantified, metricsNotQuantified := ThrottleDownWaterLine.DivideMetricsByQuantified()
// If any metric cannot be quantified, pick the throttleable metric with the highest ActionPriority and act on all selected pods
if len(metricsNotQuantified) != 0 {
    highestPriorityMetric := ThrottleDownWaterLine.GetHighestPriorityThrottleAbleMetric()
    errPodKeys = throttlePods(ctx, &totalReleased, highestPriorityMetric)
} else {
    // Get the latest usage of the node and workloads, and build the gap to the watermark
    ThrottleDownGapToWaterLines = buildGapToWaterLine(ctx.getStateFunc())
    // If the real-time usage of a metric that triggered the watermark cannot be obtained,
    // choose the throttleable metric with the highest ActionPriority to throttle all selected pods
    if ThrottleDownGapToWaterLines.HasUsageMissedMetric() {
        highestPriorityMetric := ThrottleDownWaterLine.GetHighestPriorityThrottleAbleMetric()
        errPodKeys = throttlePods(ctx, &totalReleased, highestPriorityMetric)
    } else {
        var released ReleaseResource
        // Traverse the quantifiable metrics that triggered the watermark: if a metric has its own sorting method,
        // sort the pods with its SortFunc, otherwise sort them with GeneralSorter; then execute the metric's
        // throttle action on the pods and subtract the released amount until the gap to the watermark is closed
        for _, m := range metricsQuantified {
            if m.SortAble {
                m.SortFunc(ThrottleDownPods)
            } else {
                GeneralSorter(ThrottleDownPods)
            }

            for !ThrottleDownGapToWaterLines.TargetGapsRemoved(m) {
                for index := range ThrottleDownPods {
                    released = m.ThrottleFunc(ctx, index, ThrottleDownPods, &totalReleased)
                    ThrottleDownGapToWaterLines[m] -= released[m]
                }
            }
        }
    }
}
```
For extending user-defined metrics and sorting, see the "User-defined metrics interference detection avoidance and user-defined sorting" section.
49 changes: 49 additions & 0 deletions
...utorials/qos-customized-metrics-interference-detection-avoidance-and-sorting.md
@@ -0,0 +1,49 @@
## User-defined metrics interference detection avoidance and user-defined sorting

User-defined metrics follow the same flow described in "Accurately Perform Avoidance Actions". This section explains how to make your own metrics take part in the interference detection and avoidance process.

To sort pods and control avoidance precisely for the metrics configured in a NodeQoSEnsurancePolicy, the concept of attributes is attached to each metric.

A metric has the following attributes; a customized metric only needs to implement these fields:

1. Name: the name of the metric, which must match the metric name collected by the collector module.
2. ActionPriority: the priority of the metric; 0 is the lowest and 10 is the highest.
3. SortAble: whether pods can be sorted by this metric; if true, the corresponding SortFunc must be implemented.
4. SortFunc: the corresponding sorting method. It can combine and reuse the general comparison methods together with a metric-specific comparison, as described in detail below.
5. ThrottleAble: whether pods can be throttled for this metric. For example, CPU usage has corresponding throttling means, while for memory usage pods can only be evicted, since no effective throttling is available.
6. ThrottleQuantified: whether the amount of the metric's resource released by throttling (or restoring) a pod can be accurately calculated. Metrics that can be accurately quantified are called quantifiable, otherwise they are not quantifiable.
   For example, CPU usage can be throttled by limiting the cgroup quota, and the released CPU can be computed from the value before and after throttling; memory usage is not a throttle-quantifiable metric, because there is no throttle implementation for memory, so the amount of memory released by throttling a pod cannot be measured accurately.
7. ThrottleFunc: the concrete method that executes the throttle action; if the metric cannot be throttled, the returned released value is empty.
8. RestoreFunc: the concrete method that restores a pod after it has been throttled; if the metric cannot be restored, the returned released value is empty.
9. EvictAble, EvictQuantified and EvictFunc: the corresponding definitions for the evict action, analogous to those of the throttle action.

```go
type metric struct {
	Name WaterLineMetric

	ActionPriority int

	SortAble bool
	SortFunc func(pods []podinfo.PodContext)

	ThrottleAble       bool
	ThrottleQuantified bool
	ThrottleFunc       func(ctx *ExecuteContext, index int, ThrottleDownPods ThrottlePods, totalReleasedResource *ReleaseResource) (errPodKeys []string, released ReleaseResource)
	RestoreFunc        func(ctx *ExecuteContext, index int, ThrottleUpPods ThrottlePods, totalReleasedResource *ReleaseResource) (errPodKeys []string, released ReleaseResource)

	EvictAble       bool
	EvictQuantified bool
	EvictFunc       func(wg *sync.WaitGroup, ctx *ExecuteContext, index int, totalReleasedResource *ReleaseResource, EvictPods EvictPods) (errPodKeys []string, released ReleaseResource)
}
```
After constructing such a metric, register it through registerMetricMap().
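A minimal sketch of constructing and registering a customized metric could look like the following. It relies on the `metric` struct shown above; the `MemUsage` name, the `memUsageSorter` and `evictMemUsagePods` helpers, and the exact call shape of registerMetricMap are assumptions for illustration only — consult the crane-agent source for the real registration API.

```go
// Sketch only: MemUsage, memUsageSorter, evictMemUsagePods and the shape of
// registerMetricMap are hypothetical; they are not part of Crane's public API.
const MemUsage WaterLineMetric = "mem-usage"

var memUsageMetric = metric{
	Name:           MemUsage,
	ActionPriority: 5,

	SortAble: true,
	SortFunc: memUsageSorter, // see the comparator example below

	ThrottleAble:       false, // memory usage cannot be effectively throttled
	ThrottleQuantified: false,

	EvictAble:       true,
	EvictQuantified: true,
	EvictFunc:       evictMemUsagePods, // hypothetical evict implementation
}

func init() {
	// Assumed registration call; the real function lives in the crane-agent executor package.
	registerMetricMap(memUsageMetric)
}
```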
For a customized metric, flexible pod sorting can be implemented by combining the general comparison methods with a metric-specific one. In the template below, `<metric>` stands for the customized metric and `<metric-sort-func>` for its customized comparison strategy:

```go
func <metric>Sorter(pods []podinfo.PodContext) {
	orderedBy(classAndPriority, <metric-sort-func>, runningTime).Sort(pods)
}
```
Only the comparison method `<metric-sort-func>` needs to be implemented, with the signature
`func (p1, p2 podinfo.PodContext) int32`.
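As a concrete illustration, a hypothetical memory-usage comparator and sorter might look as follows. `memUsage`, `memUsageOf` and `memUsageSorter` are illustrative names, not Crane's actual code; the sketch assumes the pod's current memory usage can be looked up from the agent's collected state.

```go
// memUsageOf is a hypothetical helper: in a real implementation it would look up the
// pod's current memory usage from the collector's state; the lookup is omitted here.
func memUsageOf(p podinfo.PodContext) float64 {
	var usage float64
	// ... fetch usage for p from the agent's state store ...
	return usage
}

// memUsage compares two pods by memory usage. In this sketch the heavier consumer
// sorts earlier, so it is acted on first; a negative result means p1 sorts before p2.
func memUsage(p1, p2 podinfo.PodContext) int32 {
	switch {
	case memUsageOf(p1) > memUsageOf(p2):
		return -1
	case memUsageOf(p1) < memUsageOf(p2):
		return 1
	default:
		return 0
	}
}

// memUsageSorter plugs the comparator into the general ordering chain.
func memUsageSorter(pods []podinfo.PodContext) {
	orderedBy(classAndPriority, memUsage, runningTime).Sort(pods)
}
```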
48 changes: 48 additions & 0 deletions
...rials/qos-customized-metrics-interference-detection-avoidance-and-sorting.zh.md
@@ -0,0 +1,48 @@
## User-defined metrics interference detection avoidance and user-defined sorting

User-defined metrics follow the same flow described in the "Accurately Perform Avoidance Actions" section. This section explains how to make your own metrics take part in the interference detection and avoidance process.

To sort pods and control avoidance precisely for the metrics configured in a NodeQOSEnsurancePolicy, the concept of attributes is attached to each metric.

A metric has the following attributes; a customized metric only needs to implement these fields:

1. Name: the name of the metric, which must match the metric name collected by the collector module.
2. ActionPriority: the priority of the metric; 0 is the lowest and 10 is the highest.
3. SortAble: whether pods can be sorted by this metric; if true, the corresponding SortFunc must be implemented.
4. SortFunc: the corresponding sorting method. It can combine and reuse the general comparison methods together with the metric's own comparison, as described in detail below.
5. ThrottleAble: whether pods can be throttled for this metric. For example, CPU usage has corresponding throttling means, while for memory usage pods can only be evicted, since no effective throttling is available.
6. ThrottleQuantified: whether the amount of the metric's resource released by throttling (or restoring) a pod can be accurately calculated. Metrics that can be accurately quantified are called quantifiable, otherwise they are not quantifiable.
   For example, CPU usage can be throttled by limiting the cgroup quota, and the released CPU can be computed from the value before and after throttling; memory usage is not a throttle-quantifiable metric, because there is no throttle implementation for memory, so the amount of memory released by throttling a pod cannot be measured accurately.
7. ThrottleFunc: the concrete method that executes the throttle action; if the metric cannot be throttled, the returned released value is empty.
8. RestoreFunc: the concrete method that restores a pod after it has been throttled; if the metric cannot be restored, the returned released value is empty.
9. EvictAble, EvictQuantified, EvictFunc: the corresponding definitions for the evict action, analogous to those of the throttle action.

```go
type metric struct {
	Name WaterLineMetric

	ActionPriority int

	SortAble bool
	SortFunc func(pods []podinfo.PodContext)

	ThrottleAble       bool
	ThrottleQuantified bool
	ThrottleFunc       func(ctx *ExecuteContext, index int, ThrottleDownPods ThrottlePods, totalReleasedResource *ReleaseResource) (errPodKeys []string, released ReleaseResource)
	RestoreFunc        func(ctx *ExecuteContext, index int, ThrottleUpPods ThrottlePods, totalReleasedResource *ReleaseResource) (errPodKeys []string, released ReleaseResource)

	EvictAble       bool
	EvictQuantified bool
	EvictFunc       func(wg *sync.WaitGroup, ctx *ExecuteContext, index int, totalReleasedResource *ReleaseResource, EvictPods EvictPods) (errPodKeys []string, released ReleaseResource)
}
```
Users can define their own metrics; after construction, register them through registerMetricMap().

For a customized metric, flexible pod sorting can be implemented by combining the general comparison methods with a metric-specific one. In the template below, `<metric>` stands for the customized metric and `<metric-sort-func>` for its customized comparison strategy:
```go
func <metric>Sorter(pods []podinfo.PodContext) {
	orderedBy(classAndPriority, <metric-sort-func>, runningTime).Sort(pods)
}
```
The comparison method `<metric-sort-func>` must implement the following signature:
`func (p1, p2 podinfo.PodContext) int32`
@@ -0,0 +1,74 @@
## Dynamic resource oversold enhanced by prediction algorithm

To improve stability, users usually set the request value higher than the actual usage when deploying applications, which wastes resources. To raise node resource utilization, users co-locate besteffort applications that exploit the idle resources, i.e. they oversell the node.
However, because such applications lack resource limits, requests and the related information, the scheduler may still place these pods on nodes that are already heavily loaded, which defeats the purpose; it is therefore better to schedule them against the node's truly free resources.

Crane collects a node's idle resources in the following two ways and combines them into a single idle resource figure for the node, which makes the resource evaluation more accurate.

CPU is used as the example below; crane also supports reclaiming idle memory resources in the same way.

1. CPU usage information collected locally

`nodeCpuCannotBeReclaimed := nodeCpuUsageTotal + exclusiveCPUIdle - extResContainerCpuUsageTotal`

exclusiveCPUIdle refers to the idle portion of the CPUs occupied by pods whose CPU manager policy is exclusive. Although this part is idle, it cannot be reused because of the exclusive assignment, so it is counted as used.

extResContainerCpuUsageTotal refers to the CPU already consumed via the dynamic (extended) resource, which must be subtracted to avoid double counting.

For example, if total node CPU usage is 6 cores, exclusive pods hold 2 idle cores, and containers running on the extended resource consume 1 core, then 6 + 2 - 1 = 7 cores are treated as non-reclaimable.

2. A TSP of node CPU usage, created automatically by default, predicts node CPU usage based on history:
```yaml
apiVersion: v1
data:
  spec: |
    predictionMetrics:
    - algorithm:
        algorithmType: dsp
        dsp:
          estimators:
            fft:
            - highFrequencyThreshold: "0.05"
              lowAmplitudeThreshold: "1.0"
              marginFraction: "0.2"
              maxNumOfSpectrumItems: 20
              minNumOfSpectrumItems: 10
          historyLength: 3d
          sampleInterval: 60s
      resourceIdentifier: cpu
      type: ExpressionQuery
      expressionQuery:
        expression: 'sum(count(node_cpu_seconds_total{mode="idle",instance=~"({{.metadata.name}})(:\\d+)?"}) by (mode, cpu)) - sum(irate(node_cpu_seconds_total{mode="idle",instance=~"({{.metadata.name}})(:\\d+)?"}[5m]))'
    predictionWindowSeconds: 3600
kind: ConfigMap
metadata:
  name: noderesource-tsp-template
  namespace: default
```
The prediction is combined with the current actual usage to calculate the node's remaining available resources, which are then attached to the node as extended resources. Pods can declare these extended resources to run as offline jobs on the idle capacity, improving the node's resource utilization.
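The following is a conceptual sketch of how the two signals could be combined; the `max()` combination and the variable names are assumptions based on the description above, not Crane's exact formula.

```go
package main

import (
	"fmt"
	"math"
)

// reclaimableCPU derives a conservative elastic-CPU figure that could be exposed as the
// gocrane.io/cpu extended resource. Sketch only: the combination rule is an assumption.
func reclaimableCPU(nodeAllocatable, cpuCannotBeReclaimed, predictedCPUUsage float64) float64 {
	// Use the larger of the locally observed non-reclaimable CPU and the predicted
	// usage, so the oversold amount stays conservative.
	used := math.Max(cpuCannotBeReclaimed, predictedCPUUsage)
	ext := nodeAllocatable - used
	if ext < 0 {
		ext = 0
	}
	return ext
}

func main() {
	// 16-core node, 7 cores non-reclaimable right now, 9 cores predicted at peak.
	fmt.Println(reclaimableCPU(16, 7, 9)) // 7
}
```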
How to use:
When deploying a pod, specify `gocrane.io/<$resourcename>: <$value>` in its limits and requests, as follows:
```yaml
spec:
  containers:
  - image: nginx
    imagePullPolicy: Always
    name: extended-resource-demo-ctr
    resources:
      limits:
        gocrane.io/cpu: "2"
        gocrane.io/memory: "2000Mi"
      requests:
        gocrane.io/cpu: "2"
        gocrane.io/memory: "2000Mi"
```

## Elastic resource restriction function

Native besteffort pods lack a fair guarantee of resource usage. Crane ensures that the CPU usage of a besteffort pod using dynamic resources stays within the bounds it is allowed to use: the agent guarantees that the actual usage of a pod using extended resources does not exceed its declared limit, and when CPU contention occurs the pods compete fairly according to their declared amounts. Pods using elastic resources are also managed by the watermark function.

How to use:
When deploying a pod, specify `gocrane.io/<$resourcename>: <$value>` in its limits and requests.

## Suitable scenarios

To raise node load, offline jobs or less important jobs can be scheduled onto the cluster using dynamic resources; such jobs consume the idle elastic resources.
Combined with the QOS watermark guarantee, these pods are evicted and throttled first when the node load is high, so node utilization is improved while the stability of high-priority services is preserved.
See the section "Used with dynamic resources" in qos-interference-detection-and-active-avoidance.md.