Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Add qos doc to site,readme and introduction
kaiyuechen
committed
Oct 14, 2022
1 parent
dc22027
commit 710c296
Showing 24 changed files with 1,012 additions and 791 deletions.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
|
||
--- | ||
title: "QOS" | ||
weight: 9 | ||
description: > | ||
Introduction to QOS related capabilities. | ||
--- |
57 changes: 57 additions & 0 deletions
57
site/content/en/docs/Tutorials/QOS/qos-accurately-perform-avoidance-actions.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,57 @@ | ||
--- | ||
title: "QoS: Accurately Perform Avoidance Actions" | ||
description: "Accurately Perform Avoidance Actions" | ||
weight: 21 | ||
--- | ||
|
||
## Accurately Perform Avoidance Actions | ||
The following two mechanisms avoid over-operating on low-priority pods and close the gap between a metric and its specified watermark faster, so that high-priority services are not affected: | ||
1. Sort pods | ||
|
||
Crane implements some general sorting methods (which will be improved later): | ||
|
||
ClassAndPriority: compares the QOSClass and the class value of two pods, QOSClass first and then the class value. Pods with higher priority are ranked later | ||
|
||
runningTime: compares the running time of two pods. The pod that has been running longer is ranked later | ||
|
||
If these two strategies are all you need, use the default sorting method. It first compares the priority of the pods, then their usage of the corresponding metric, and then their running time. The first dimension in which the two pods differ decides their order | ||
|
||
Taking CPU usage as an example, a metric can also extend the ordering with strategies of its own. The CPU usage sorter first compares the priority of two pods; if the priority is the same, it compares CPU usage; if CPU usage is also the same, it compares extended-resource CPU usage; finally it compares running time. The comparison returns as soon as one criterion differs: `orderedBy(classAndPriority, cpuUsage, extCpuUsage, runningTime).Sort(pods)` | ||
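The chained comparison described above can be sketched as a small multi-key sorter. This is a minimal illustration, not Crane's implementation: `podInfo` is a hypothetical stand-in for `podinfo.PodContext`, the field names are invented for the example, and the direction of the usage comparison (heavier consumer first) is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// podInfo is a hypothetical stand-in for podinfo.PodContext.
type podInfo struct {
	Name        string
	Priority    int32   // larger value = more important pod
	CPUUsage    float64 // cores used by ordinary containers
	ExtCPUUsage float64 // cores used through extended resources
	RunningSecs int64   // how long the pod has been running
}

// cmpFunc returns a negative value if p1 ranks first, a positive value
// if p2 ranks first, and 0 if this criterion cannot tell them apart.
type cmpFunc func(p1, p2 podInfo) int32

type multiSorter struct{ cmps []cmpFunc }

func orderedBy(cmps ...cmpFunc) *multiSorter { return &multiSorter{cmps: cmps} }

// Sort applies the criteria in order; the first non-zero result decides.
func (ms *multiSorter) Sort(pods []podInfo) {
	sort.SliceStable(pods, func(i, j int) bool {
		for _, cmp := range ms.cmps {
			if r := cmp(pods[i], pods[j]); r != 0 {
				return r < 0
			}
		}
		return false
	})
}

func cmpFloat(a, b float64) int32 {
	switch {
	case a < b:
		return -1
	case a > b:
		return 1
	}
	return 0
}

// Lower priority ranks first, so high-priority pods end up later.
func classAndPriority(p1, p2 podInfo) int32 {
	return cmpFloat(float64(p1.Priority), float64(p2.Priority))
}

// Assumption: the heavier consumer ranks first.
func cpuUsage(p1, p2 podInfo) int32    { return cmpFloat(p2.CPUUsage, p1.CPUUsage) }
func extCpuUsage(p1, p2 podInfo) int32 { return cmpFloat(p2.ExtCPUUsage, p1.ExtCPUUsage) }

// Shorter running time ranks first, so long-running pods end up later.
func runningTime(p1, p2 podInfo) int32 {
	return cmpFloat(float64(p1.RunningSecs), float64(p2.RunningSecs))
}

func demoPods() []podInfo {
	return []podInfo{
		{Name: "a", Priority: 10, CPUUsage: 2.0, RunningSecs: 100},
		{Name: "b", Priority: 0, CPUUsage: 1.0, RunningSecs: 50},
		{Name: "c", Priority: 0, CPUUsage: 3.0, RunningSecs: 10},
		{Name: "d", Priority: 0, CPUUsage: 1.0, RunningSecs: 20},
	}
}

func main() {
	pods := demoPods()
	orderedBy(classAndPriority, cpuUsage, extCpuUsage, runningTime).Sort(pods)
	for _, p := range pods {
		fmt.Println(p.Name)
	}
}
```

In this data, pods b and d tie on priority and on both usage criteria, so only the last criterion, running time, separates them.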
|
||
2. Perform avoidance actions based on the watermark and pod usage | ||
```go | ||
// Split the metrics that crossed the watermark threshold into two groups by their quantified attribute | ||
metricsQuantified, metricsNotQuantified := ThrottleDownWaterLine.DivideMetricsByQuantified() | ||
// If any metric cannot be quantified, pick the throttleable metric with the highest ActionPriority and operate on all selected pods | ||
if len(metricsNotQuantified) != 0 { | ||
	highestPriorityMetric := ThrottleDownWaterLine.GetHighestPriorityThrottleAbleMetric() | ||
	errPodKeys = t.throttlePods(ctx, &totalReleased, highestPriorityMetric) | ||
} else { | ||
	// Get the latest usage and compute the gap to the watermark | ||
	ThrottleDownGapToWaterLines = buildGapToWaterLine(ctx.getStateFunc()) | ||
	// If the real-time usage of a metric that crossed the watermark cannot be obtained, pick the throttleable metric with the highest ActionPriority and throttle all selected pods | ||
	if ThrottleDownGapToWaterLines.HasUsageMissedMetric() { | ||
		highestPriorityMetric := ThrottleDownWaterLine.GetHighestPriorityThrottleAbleMetric() | ||
		errPodKeys = t.throttlePods(ctx, &totalReleased, highestPriorityMetric) | ||
	} else { | ||
		var errKeys []string | ||
		var released ReleaseResource | ||
		// Traverse the quantifiable metrics that crossed the watermark: if a metric has its own sorting method, sort the pods with its SortFunc, | ||
		// otherwise sort them with GeneralSorter. Then operate on the pods with the metric's own method and subtract the released amount until the gap between the metric and the watermark is gone | ||
		for _, m := range metricsQuantified { | ||
			if m.SortAble { | ||
				m.SortFunc(ThrottleDownPods) | ||
			} else { | ||
				GeneralSorter(ThrottleDownPods) | ||
			} | ||
|
||
			for !ThrottleDownGapToWaterLines.TargetGapsRemoved(m) { | ||
				for index := range ThrottleDownPods { | ||
					errKeys, released = m.ThrottleFunc(ctx, index, ThrottleDownPods, &totalReleased) | ||
					errPodKeys = append(errPodKeys, errKeys...) | ||
					ThrottleDownGapToWaterLines[m] -= released[m] | ||
				} | ||
			} | ||
		} | ||
	} | ||
} | ||
``` | ||
Extending user-defined metrics and sorting is described in "User-defined metrics interference detection avoidance and user-defined sorting". |
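The quantified branch can be illustrated with a much smaller sketch: walk the pods in sorted order, subtract what each throttle action releases, and stop once the gap to the watermark is covered. The `pod` type, the `releaseRatio` parameter and the idea that throttling releases a fixed fraction of a pod's usage are assumptions made for this example only; unlike the excerpt above, this version also stops as soon as the gap is closed.

```go
package main

import "fmt"

// pod is a hypothetical, already sorted throttle candidate.
type pod struct {
	Name     string
	CPUUsage float64 // cores currently used
}

// throttleUntilGapClosed throttles pods in order until the gap
// (current usage minus watermark) is covered. It returns the names of
// the throttled pods and the remaining gap (<= 0 means covered).
func throttleUntilGapClosed(pods []pod, gap, releaseRatio float64) ([]string, float64) {
	var throttled []string
	for _, p := range pods {
		if gap <= 0 {
			break // the watermark is satisfied, leave the remaining pods alone
		}
		released := p.CPUUsage * releaseRatio // assumed effect of one throttle action
		gap -= released
		throttled = append(throttled, p.Name)
	}
	return throttled, gap
}

func main() {
	pods := []pod{{"a", 2}, {"b", 1}, {"c", 4}}
	names, remaining := throttleUntilGapClosed(pods, 1.2, 0.5)
	fmt.Println(names, remaining <= 0)
}
```

Because the pods are visited in sorted order, the pods ranked last (the higher-priority ones) are only touched when the earlier ones do not release enough.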
55 changes: 55 additions & 0 deletions
55
...ials/QOS/qos-customized-metrics-interference-detection-avoidance-and-sorting.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,55 @@ | ||
--- | ||
title: "QoS: Define your watermark" | ||
description: "How to customize your watermark" | ||
weight: 22 | ||
--- | ||
|
||
## User-defined metrics interference detection avoidance and user-defined sorting | ||
User-defined metrics go through the same interference detection, avoidance and sorting process that is described in "Accurately Perform Avoidance Actions". This page explains how to define your own metrics so that they take part in that process | ||
|
||
To sort pods better and to control the metrics configured through NodeQoSEnsurancePolicy precisely, metrics carry a set of attributes. | ||
|
||
A metric has the following attributes; a custom metric implements these fields: | ||
|
||
1. Name: the name of the metric, which must match the metric name collected by the collector module | ||
2. ActionPriority: the priority of the metric. 0 is the lowest and 10 is the highest | ||
3. SortAble: whether the metric can be sorted. If it is true, the corresponding SortFunc must be implemented | ||
4. SortFunc: the corresponding sorting method. It can combine some of the general sorting methods with the metric's own ordering, as described in detail below | ||
5. ThrottleAble: whether pods can be throttled for this metric. For example, CPU usage has corresponding throttling methods, whereas for memory usage pods can only be evicted and cannot be throttled effectively | ||
6. ThrottleQuantified: whether the amount of the metric's resource released by throttling (or restoring) a pod can be calculated accurately. A metric for which this is possible is called quantifiable; otherwise it is not quantifiable. | ||
For example, CPU usage can be throttled by limiting the cgroup, and the CPU released is the difference between the current running value and the value after throttling. Memory usage is not throttle-quantifiable, because memory has no throttle implementation, so the amount of memory released by throttling a pod cannot be measured accurately. | ||
7. ThrottleFunc: the method that executes the throttle action. If throttling is not available, the returned released value is nil | ||
8. RestoreFunc: the method that executes the restore action after a pod has been throttled. If restoring is not allowed, the returned released value is nil | ||
9. EvictAble, EvictQuantified and EvictFunc: the definitions for the evict action are analogous to those for the throttle action | ||
|
||
```go | ||
type metric struct { | ||
	// Name must match the metric name collected by the collector module | ||
	Name WaterLineMetric | ||
|
||
	// ActionPriority ranges from 0 (lowest) to 10 (highest) | ||
	ActionPriority int | ||
|
||
	// SortFunc must be set when SortAble is true | ||
	SortAble bool | ||
	SortFunc func(pods []podinfo.PodContext) | ||
|
||
	ThrottleAble       bool | ||
	ThrottleQuantified bool | ||
	ThrottleFunc       func(ctx *ExecuteContext, index int, ThrottleDownPods ThrottlePods, totalReleasedResource *ReleaseResource) (errPodKeys []string, released ReleaseResource) | ||
	RestoreFunc        func(ctx *ExecuteContext, index int, ThrottleUpPods ThrottlePods, totalReleasedResource *ReleaseResource) (errPodKeys []string, released ReleaseResource) | ||
|
||
	EvictAble       bool | ||
	EvictQuantified bool | ||
	EvictFunc       func(wg *sync.WaitGroup, ctx *ExecuteContext, index int, totalReleasedResource *ReleaseResource, EvictPods EvictPods) (errPodKeys []string, released ReleaseResource) | ||
} | ||
``` | ||
|
||
After constructing the metric, register it with registerMetricMap() | ||
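As a rough sketch of what construction and registration could look like, the example below uses a trimmed-down copy of the struct and a hypothetical `registerMetricMap` that stores metrics in a map keyed by name. The source only names `registerMetricMap()`; its signature, the `metricMap` variable, the helper `highestPriorityThrottleAble` and the metric names are assumptions for illustration.

```go
package main

import "fmt"

type WaterLineMetric string

// podContext is a hypothetical stand-in for podinfo.PodContext.
type podContext struct{ Name string }

// metric is a reduced copy of the struct above (throttle and evict
// functions omitted to keep the sketch short).
type metric struct {
	Name               WaterLineMetric
	ActionPriority     int
	SortAble           bool
	SortFunc           func(pods []podContext)
	ThrottleAble       bool
	ThrottleQuantified bool
}

// metricMap and this registerMetricMap signature are assumptions.
var metricMap = map[WaterLineMetric]metric{}

func registerMetricMap(m metric) { metricMap[m.Name] = m }

// highestPriorityThrottleAble returns the throttleable metric with the
// highest ActionPriority, if any is registered.
func highestPriorityThrottleAble() (metric, bool) {
	var best metric
	found := false
	for _, m := range metricMap {
		if !m.ThrottleAble {
			continue
		}
		if !found || m.ActionPriority > best.ActionPriority {
			best, found = m, true
		}
	}
	return best, found
}

func registerDemoMetrics() {
	// Hypothetical custom metrics: one throttleable, one evict-only.
	registerMetricMap(metric{Name: "custom_net_tx", ActionPriority: 7, ThrottleAble: true, ThrottleQuantified: true})
	registerMetricMap(metric{Name: "custom_mem", ActionPriority: 9})
}

func main() {
	registerDemoMetrics()
	if m, ok := highestPriorityThrottleAble(); ok {
		fmt.Println(m.Name, m.ActionPriority)
	}
}
```

The evict-only metric has the higher ActionPriority but is skipped, because only throttleable metrics are candidates for a throttle action.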
|
||
For a custom metric, you can build a flexible custom pod ordering by combining the metric's own comparison with the general sorting methods, as shown below. `<metric>` stands for the custom metric and `<metric-sort-func>` for its custom sorting strategy | ||
|
||
```go | ||
func <metric>Sorter(pods []podinfo.PodContext) { | ||
	orderedBy(classAndPriority, <metric-sort-func>, runningTime).Sort(pods) | ||
} | ||
``` | ||
The sorting method `<metric-sort-func>` must be implemented with the following signature:
`func(p1, p2 podinfo.PodContext) int32` |
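A comparison function of that shape might look like the following sketch. The `podContext` type, its `NetTxRate` field and the metric itself are hypothetical, and the sign convention (negative means p1 ranks first) is an assumption, since the source does not spell it out.

```go
package main

import (
	"fmt"
	"sort"
)

// podContext is a hypothetical stand-in for podinfo.PodContext.
type podContext struct {
	Name      string
	NetTxRate float64 // invented per-pod value of the custom metric
}

// netTxCmp is an example <metric-sort-func>. Assumed convention:
// negative if p1 ranks first, positive if p2 ranks first, 0 on a tie.
// Here the pod with the larger value ranks first.
func netTxCmp(p1, p2 podContext) int32 {
	switch {
	case p1.NetTxRate > p2.NetTxRate:
		return -1
	case p1.NetTxRate < p2.NetTxRate:
		return 1
	default:
		return 0
	}
}

func main() {
	pods := []podContext{{"a", 5}, {"b", 20}, {"c", 5}}
	// Standalone use; in Crane the function would be passed to orderedBy.
	sort.SliceStable(pods, func(i, j int) bool { return netTxCmp(pods[i], pods[j]) < 0 })
	for _, p := range pods {
		fmt.Println(p.Name)
	}
}
```

Returning 0 on a tie matters: it lets the next criterion in the chain (for example running time) decide the order.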
81 changes: 81 additions & 0 deletions
81
site/content/en/docs/Tutorials/QOS/qos-dynamic-resource-oversold-and-limit.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,81 @@ | ||
--- | ||
title: "QoS: Dynamic resource oversold and limit" | ||
description: "How offline jobs use Crane" | ||
weight: 20 | ||
--- | ||
|
||
|
||
## Dynamic resource oversold enhanced by prediction algorithm | ||
To improve stability, users usually set requests higher than the actual usage when deploying applications, which wastes resources. To raise the resource utilization of nodes, users co-locate some BestEffort applications that run on the idle resources, which oversells the node. | ||
However, because these applications carry no resource limits or requests, the scheduler may still place their pods on heavily loaded nodes, which defeats the purpose. It is therefore better to schedule them based on the idle resources of each node. | ||
|
||
Crane estimates the idle resources of a node in the following two ways and combines the results, which makes the resource estimate more accurate: | ||
|
||
The following uses CPU as an example; Crane also supports reclaiming idle memory. | ||
|
||
1. CPU usage information collected locally | ||
|
||
`nodeCpuCannotBeReclaimed := nodeCpuUsageTotal + exclusiveCPUIdle - extResContainerCpuUsageTotal` | ||
|
||
`exclusiveCPUIdle` is the idle CPU held by pods whose CPU manager policy is exclusive. Although these CPUs are idle, they are bound exclusively and cannot be reused, so they are counted as used | ||
|
||
`extResContainerCpuUsageTotal` is the CPU consumed by containers that use the dynamic resources; it is subtracted to avoid counting it twice | ||
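A worked instance of the formula above, with made-up numbers in millicores. The function name and the sample values are illustrative assumptions; only the arithmetic comes from the formula.

```go
package main

import "fmt"

// nodeCPUCannotBeReclaimed applies the formula above. All values are
// in millicores so that the arithmetic stays in integers.
func nodeCPUCannotBeReclaimed(nodeCpuUsageTotal, exclusiveCPUIdle, extResContainerCpuUsageTotal int64) int64 {
	return nodeCpuUsageTotal + exclusiveCPUIdle - extResContainerCpuUsageTotal
}

func main() {
	// Hypothetical node: 6 cores in use overall, 1.5 idle cores pinned
	// to exclusive pods, 2 cores consumed by pods on dynamic resources.
	fmt.Println(nodeCPUCannotBeReclaimed(6000, 1500, 2000))
}
```

With these numbers, 5500 millicores cannot be reclaimed: the idle exclusive CPUs are added to the usage, and the CPU already consumed through dynamic resources is removed so that it is not counted twice.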
|
||
2. A TimeSeriesPrediction (TSP) for node CPU usage, which is created automatically by default and predicts node CPU usage from historical data | ||
```yaml | ||
apiVersion: v1 | ||
data: | ||
  spec: | | ||
    predictionMetrics: | ||
      - algorithm: | ||
          algorithmType: dsp | ||
          dsp: | ||
            estimators: | ||
              fft: | ||
                - highFrequencyThreshold: "0.05" | ||
                  lowAmplitudeThreshold: "1.0" | ||
                  marginFraction: "0.2" | ||
                  maxNumOfSpectrumItems: 20 | ||
                  minNumOfSpectrumItems: 10 | ||
            historyLength: 3d | ||
            sampleInterval: 60s | ||
        resourceIdentifier: cpu | ||
        type: ExpressionQuery | ||
        expressionQuery: | ||
          expression: 'sum(count(node_cpu_seconds_total{mode="idle",instance=~"({{.metadata.name}})(:\\d+)?"}) by (mode, cpu)) - sum(irate(node_cpu_seconds_total{mode="idle",instance=~"({{.metadata.name}})(:\\d+)?"}[5m]))' | ||
    predictionWindowSeconds: 3600 | ||
kind: ConfigMap | ||
metadata: | ||
  name: noderesource-tsp-template | ||
  namespace: default | ||
``` | ||
Crane combines the prediction with the current actual usage to calculate the remaining available resources of the node, and reports them on the node as an extended resource. A pod that requests this extended resource runs as an offline job on the idle resources, which raises the resource utilization of the node. | ||
How to use:
When deploying a pod, set its limits and requests with `gocrane.io/<$resourcename>: <$value>`, as follows
```yaml | ||
spec:
  containers:
  - image: nginx
    imagePullPolicy: Always
    name: extended-resource-demo-ctr
    resources:
      limits:
        gocrane.io/cpu: "2"
        gocrane.io/memory: "2000Mi"
      requests:
        gocrane.io/cpu: "2"
        gocrane.io/memory: "2000Mi"
``` | ||
|
||
## Elastic resource restriction function | ||
Native BestEffort applications have no fair guarantee on resource usage. Crane keeps the CPU usage of BestEffort pods that use dynamic resources within the range they are allowed to use: the agent ensures that the actual usage of a pod using extended resources does not exceed its declared limit, and under CPU contention such pods compete fairly according to their declared amounts. Pods that use elastic resources are also managed by the watermark function. | ||
|
||
How to use:
When deploying a pod, set its limits and requests with `gocrane.io/<$resourcename>: <$value>`
|
||
## Applicable scenarios | ||
To raise the load of nodes, some offline jobs or less important jobs can be scheduled onto the cluster with dynamic resources. Such jobs use the idle elastic resources.
Together with the QoS watermark guarantee, these jobs are the first to be throttled and evicted when a node is under high load, so node utilization improves while the stability of high-priority services is preserved.
See the section "Used with dynamic resources" in qos-interference-detection-and-active-avoidance.md. |
38 changes: 38 additions & 0 deletions
38
site/content/en/docs/Tutorials/QOS/qos-enhanced-bypass-cpuset-management.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,38 @@ | ||
--- | ||
title: "QoS: Enhanced bypass cpuset management capability" | ||
description: "Enhanced bypass cpuset management capability" | ||
weight: 23 | ||
--- | ||
|
||
## Enhanced bypass cpuset management capability | ||
Kubelet supports the static CPU manager policy. When a Guaranteed pod runs on a node, kubelet allocates dedicated CPUs to the pod, and other processes cannot use them. This guarantees exclusive CPUs for the Guaranteed pod, but it also lowers CPU and node utilization and wastes resources. | ||
Crane agent provides a new cpuset management policy that lets a pod share its CPUs with other pods. A pod that binds to specific cores keeps the benefits of core binding, namely fewer context switches and better cache affinity, while other workloads can still be deployed on those cores, which improves resource utilization.
|
||
1. Three types of pod cpuset are provided: | ||
|
||
- Exclusive: after the cores are bound, no other container can use those CPUs; the pod has them exclusively | ||
- Share: after the cores are bound, other containers can still use those CPUs | ||
- None: the pod uses the CPUs that are not occupied by containers of exclusive pods, including the cores bound by share pods | ||
|
||
The share binding strategy keeps the benefits of fewer context switches and better cache affinity, while the bound cores remain available to other workloads, which improves resource utilization | ||
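The relationship between the three types can be sketched as a set computation: CPUs bound by exclusive pods are removed from the pool, while CPUs bound by share pods stay in it. The types and function below are hypothetical and only model the rule stated in the list; they are not Crane's cpuset manager.

```go
package main

import "fmt"

type cpusetPolicy string

const (
	policyExclusive cpusetPolicy = "exclusive"
	policyShare     cpusetPolicy = "share"
	policyNone      cpusetPolicy = "none"
)

// binding records which CPUs a pod has bound (hypothetical type).
type binding struct {
	Pod    string
	Policy cpusetPolicy
	CPUs   []int
}

// cpusForNonePods returns the CPUs a pod of type none may run on:
// every CPU except those bound by exclusive pods.
func cpusForNonePods(allCPUs []int, bindings []binding) []int {
	taken := map[int]bool{}
	for _, b := range bindings {
		if b.Policy == policyExclusive {
			for _, c := range b.CPUs {
				taken[c] = true
			}
		}
	}
	var free []int
	for _, c := range allCPUs {
		if !taken[c] {
			free = append(free, c)
		}
	}
	return free
}

func main() {
	all := []int{0, 1, 2, 3, 4, 5, 6, 7}
	bindings := []binding{
		{Pod: "pod-a", Policy: policyExclusive, CPUs: []int{0, 1}},
		{Pod: "pod-b", Policy: policyShare, CPUs: []int{2, 3}},
	}
	fmt.Println(cpusForNonePods(all, bindings))
}
```

In this example CPUs 2 and 3 remain usable by none pods even though pod-b is bound to them, which is the difference between share and exclusive.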
|
||
2. Relax the kubelet restrictions on core binding | ||
|
||
Kubelet originally requires the CPU limit of every container to equal its CPU request. Crane only requires that the CPU limit of a container be greater than or equal to 1 and equal to its CPU request in order to bind cores for that container | ||
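The relaxed rule is a per-container check and can be written down directly. The function name is hypothetical, and expressing CPU in millicores is a choice made for this sketch.

```go
package main

import "fmt"

// canBindCores reports whether a single container satisfies the relaxed
// rule: CPU limit >= 1 core and CPU limit == CPU request (millicores).
func canBindCores(requestMilli, limitMilli int64) bool {
	return limitMilli >= 1000 && limitMilli == requestMilli
}

func main() {
	fmt.Println(canBindCores(2000, 2000)) // limit equals request, at least one core
	fmt.Println(canBindCores(500, 500))   // less than one core
	fmt.Println(canBindCores(1000, 2000)) // limit differs from request
}
```

The check looks at one container at a time, so a pod can contain other containers that do not satisfy it.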
|
||
|
||
3. Support changing the cpuset policy of a pod while the pod is running; the change takes effect immediately | ||
|
||
The CPU manager policy of a pod can be changed from none to share and from exclusive to share without a restart | ||
|
||
How to use: | ||
1. Set the CPU manager policy of kubelet to `none`
2. Set the CPU manager policy of the pod through a pod annotation
`qos.gocrane.io/cpu-manager: none/exclusive/share` | ||
```yaml | ||
apiVersion: v1
kind: Pod
metadata:
  annotations:
    qos.gocrane.io/cpu-manager: none/exclusive/share
``` |