Add qos doc to site,readme and introduction
kaiyuechen committed Oct 14, 2022
1 parent dc22027 commit 710c296
Showing 24 changed files with 1,012 additions and 791 deletions.
3 changes: 2 additions & 1 deletion README.md
@@ -46,8 +46,9 @@ EffectiveHorizontalPodAutoscaler supports prediction-driven autoscaling. With th

Provide a simple but efficient scheduler that schedules pods based on actual node utilization data and filters out nodes with high load to balance the cluster. [learn more](docs/tutorials/scheduling-pods-based-on-actual-node-load.md).

**Colocation with Enhanced QoS**
**Colocation with Enhanced QOS**

QOS-related capabilities ensure the stability of pods running on Kubernetes. Crane detects interference from multi-dimensional metrics and actively avoids it, supporting precise operations and custom metric integration; it oversells elastic resources with the help of prediction algorithms, reusing and limiting the idle resources in the cluster; and it provides enhanced bypass cpuset management, improving resource utilization while binding cores. [learn more](docs/tutorials/using-qos-ensurance.md).

## Architecture

3 changes: 2 additions & 1 deletion README_zh.md
@@ -46,8 +46,9 @@ EffectiveHorizontalPodAutoscaler supports prediction-driven autoscaling. It is based on…

The dynamic scheduler builds a simple but efficient model from actual node utilization and filters out heavily loaded nodes to balance the cluster. [learn more](docs/tutorials/scheduling-pods-based-on-actual-node-load.zh.md)

**Colocation based on QoS**
**Colocation based on QOS**

QOS-related capabilities ensure the stability of pods running on Kubernetes: interference detection and active avoidance based on multi-dimensional metrics, with support for precise operations and custom metric integration; elastic resource overselling enhanced by prediction algorithms, reusing and limiting the idle resources in the cluster; and enhanced bypass cpuset management, improving resource utilization while binding cores. [learn more](docs/tutorials/using-qos-ensurance.zh.md)

## Architecture

@@ -83,9 +83,6 @@ metadata:
spec:
  allowedActions:
  - disablescheduling
  resourceQOS:
    cpuQOS:
      cpuPriority: 7
  labelSelector:
    matchLabels:
      preemptible_job: "true"
3 changes: 2 additions & 1 deletion site/content/en/docs/Getting started/introduction.md
@@ -39,8 +39,9 @@ EffectiveHorizontalPodAutoscaler supports prediction-driven autoscaling. With th

Provide a simple but efficient scheduler that schedules pods based on actual node utilization data and filters out nodes with high load to balance the cluster. [learn more](/docs/tutorials/scheduling-pods-based-on-actual-node-load).

**Colocation with Enhanced QoS**
**Colocation with Enhanced QOS**

QOS-related capabilities ensure the stability of pods running on Kubernetes. Crane detects interference from multi-dimensional metrics and actively avoids it, supporting precise operations and custom metric integration; it oversells elastic resources with the help of prediction algorithms, reusing and limiting the idle resources in the cluster; and it provides enhanced bypass cpuset management, improving resource utilization while binding cores. [learn more](/docs/tutorials/using-qos-ensurance.md).

## Architecture

7 changes: 7 additions & 0 deletions site/content/en/docs/Tutorials/QOS/_index.md
@@ -0,0 +1,7 @@

---
title: "QOS"
weight: 9
description: >
  Introduction to QOS related capabilities.
---
@@ -0,0 +1,57 @@
---
title: "QoS: Accurately Perform Avoidance Actions"
description: "Accurately Perform Avoidance Actions"
weight: 21
---

## Accurately Perform Avoidance Actions
The following two mechanisms avoid over-operating on low-priority pods and close the gap between the metrics and the specified watermark faster, ensuring that high-priority services are not affected.
1. Sort pods

Crane implements some general sorting methods (more will be added later):

classAndPriority: compares the QOSClass and class value of two pods; QOSClass is compared first, then the class value. Pods with higher priority are ranked later and are operated on later.

runningTime: compares the running time of two pods; the one that has run longer is ranked later and has higher priority.

If you only need these two sorting strategies, you can use the default sorting method: it first compares the priority of the pods, then their consumption of the metric in question, and then their running time; as soon as one dimension yields a difference, the ordering of the pods is decided.

Taking the CPU usage metric as an example, metric-specific strategies are also extended on top of the general ones: the CPU usage sorter compares the priority of two pods first; if they are equal, it compares their CPU usage; if that is also equal, it compares their extended CPU resource usage, and finally their running time. As soon as one dimension differs, the result is returned: `orderedBy(classAndPriority, cpuUsage, extCpuUsage, runningTime).Sort(pods)`
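To picture the cascade, here is a minimal, self-contained sketch of such a multi-level sorter; the type and comparator names below are simplified stand-ins for Crane's implementation, not its actual code:

```go
package sorter

import "sort"

// pod is a simplified stand-in for Crane's podinfo.PodContext, used only in this sketch.
type pod struct {
	Priority    int32   // higher value = more important, so it is throttled/evicted later
	CPUUsage    float64 // current CPU consumption
	RunningTime int64   // seconds the pod has been running
}

// cmpFunc compares two pods: a negative result ranks p1 earlier (operated on first),
// a positive result ranks it later, and zero leaves the decision to the next comparator.
type cmpFunc func(p1, p2 *pod) int32

// multiSorter chains comparators; the first one returning a non-zero result decides the order.
type multiSorter struct {
	pods []pod
	cmps []cmpFunc
}

func orderedBy(cmps ...cmpFunc) *multiSorter { return &multiSorter{cmps: cmps} }

func (ms *multiSorter) Sort(pods []pod) { ms.pods = pods; sort.Sort(ms) }

func (ms *multiSorter) Len() int      { return len(ms.pods) }
func (ms *multiSorter) Swap(i, j int) { ms.pods[i], ms.pods[j] = ms.pods[j], ms.pods[i] }
func (ms *multiSorter) Less(i, j int) bool {
	p1, p2 := &ms.pods[i], &ms.pods[j]
	for _, cmp := range ms.cmps {
		if r := cmp(p1, p2); r != 0 {
			return r < 0
		}
	}
	return false
}

// Example comparators: lower priority, higher CPU usage (an assumption for illustration)
// and shorter running time rank a pod earlier.
func classAndPriority(p1, p2 *pod) int32 { return p1.Priority - p2.Priority }

func cpuUsage(p1, p2 *pod) int32 {
	switch {
	case p1.CPUUsage > p2.CPUUsage:
		return -1
	case p1.CPUUsage < p2.CPUUsage:
		return 1
	default:
		return 0
	}
}

func runningTime(p1, p2 *pod) int32 {
	switch {
	case p1.RunningTime < p2.RunningTime:
		return -1
	case p1.RunningTime > p2.RunningTime:
		return 1
	default:
		return 0
	}
}
```

With these pieces, the default ordering described above corresponds to a call such as `orderedBy(classAndPriority, cpuUsage, runningTime).Sort(pods)`.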

2. Perform avoidance actions based on the watermark and pod usage
```go
// Divide all metrics that trigger the watermark threshold into two parts according to whether they can be quantified
metricsQuantified, metricsNotQuantified := ThrottleDownWaterLine.DivideMetricsByQuantified()
// If any metric cannot be quantified, get the throttleable metric with the highest ActionPriority and operate on all selected pods
if len(metricsNotQuantified) != 0 {
	highestPriorityMetric := ThrottleDownWaterLine.GetHighestPriorityThrottleAbleMetric()
	errPodKeys = throttlePods(ctx, &totalReleased, highestPriorityMetric)
} else {
	// Get the latest usage and the gap to the watermark
	ThrottleDownGapToWaterLines = buildGapToWaterLine(ctx.getStateFunc())
	// If the real-time usage of a metric that triggered the watermark threshold cannot be obtained,
	// choose the throttleable metric with the highest ActionPriority to suppress all selected pods
	if ThrottleDownGapToWaterLines.HasUsageMissedMetric() {
		highestPriorityMetric := ThrottleDownWaterLine.GetHighestPriorityThrottleAbleMetric()
		errPodKeys = throttlePods(ctx, &totalReleased, highestPriorityMetric)
	} else {
		// Traverse the quantifiable metrics that triggered the watermark: if the metric has its own sorting
		// method, use its SortFunc to sort the pods, otherwise sort them with GeneralSorter; then call its
		// ThrottleFunc on the pods and deduct the released amount of the metric until the gap to the watermark
		// no longer exists
		for _, m := range metricsQuantified {
			if m.SortAble {
				m.SortFunc(ThrottleDownPods)
			} else {
				GeneralSorter(ThrottleDownPods)
			}

			for !ThrottleDownGapToWaterLines.TargetGapsRemoved(m) {
				for index := range ThrottleDownPods {
					errKeys, released := m.ThrottleFunc(ctx, index, ThrottleDownPods, &totalReleased)
					errPodKeys = append(errPodKeys, errKeys...)
					ThrottleDownGapToWaterLines[m] -= released[m]
				}
			}
		}
	}
}
```
Extending user-defined metrics and sorting is introduced in "User-defined metrics interference detection, avoidance and user-defined sorting".
@@ -0,0 +1,55 @@
---
title: "QoS: Define your watermark"
description: "How to customized your watermark"
weight: 22
---

## User-defined metrics interference detection, avoidance and user-defined sorting
The use of user-defined metrics for interference detection and avoidance, and of user-defined sorting, follows the same process described in "Accurately Perform Avoidance Actions". This page describes how to customize your own metrics so they participate in the interference detection and avoidance process.

To better sort and precisely control metrics configured through NodeQoSEnsurancePolicy, the concept of attributes is introduced for metrics.

The attributes of a metric include the following, and custom metrics can implement these fields:

1. Name: the name of the metric; it should be consistent with the metric name collected in the collector module
2. ActionPriority: the priority of the metric; 0 is the lowest and 10 the highest
3. SortAble: whether pods can be sorted by this metric; if true, the corresponding SortFunc must be implemented
4. SortFunc: the corresponding sorting method; it can be composed from the general comparison methods combined with the metric's own comparison, as introduced in detail below
5. ThrottleAble: whether pods can be throttled for this metric. For the CPU usage metric there are corresponding throttling methods, but for memory usage a pod can only be evicted; effective throttling is not possible
6. ThrottleQuantified: whether the amount of the metric's resource released by throttling (or restoring) a pod can be accurately calculated; a metric that can be accurately quantified is called quantifiable, otherwise it is not.
For example, CPU usage can be throttled by limiting the cgroup quota, and the released CPU can be computed from the value before and after throttling; memory usage is not throttle-quantifiable, because memory has no throttle implementation, so the amount of memory released by throttling a pod cannot be measured accurately
7. ThrottleFunc: the concrete method that executes the throttle action; if throttling is not possible, the returned released amount is empty
8. RestoreFunc: the concrete method that restores a throttled pod; if restoring is not allowed, the returned released amount is empty
9. EvictAble, EvictQuantified and EvictFunc: the definitions for the evict action, analogous to those of the throttle action

```go
type metric struct {
	Name WaterLineMetric

	ActionPriority int

	SortAble bool
	SortFunc func(pods []podinfo.PodContext)

	ThrottleAble       bool
	ThrottleQuantified bool
	ThrottleFunc       func(ctx *ExecuteContext, index int, ThrottleDownPods ThrottlePods, totalReleasedResource *ReleaseResource) (errPodKeys []string, released ReleaseResource)
	RestoreFunc        func(ctx *ExecuteContext, index int, ThrottleUpPods ThrottlePods, totalReleasedResource *ReleaseResource) (errPodKeys []string, released ReleaseResource)

	EvictAble       bool
	EvictQuantified bool
	EvictFunc       func(wg *sync.WaitGroup, ctx *ExecuteContext, index int, totalReleasedResource *ReleaseResource, EvictPods EvictPods) (errPodKeys []string, released ReleaseResource)
}
```

After the metric is constructed, register it through registerMetricMap().

For metrics that need to be customized, flexible customized pod sorting can be achieved by combining the general sorting methods with the metric's own comparison, following the template below, where `<metric>` represents the customized metric and `<metric-sort-func>` represents its customized sorting strategy:

```go
func <metric>Sorter(pods []podinfo.PodContext) {
	orderedBy(classAndPriority, <metric-sort-func>, runningTime).Sort(pods)
}
```
Among them, the sorting method `<metric-sort-func>` with the following signature needs to be implemented:
`func (p1, p2 podinfo.PodContext) int32`
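As an illustration, a hypothetical `memBandwidth` metric could plug into this template as follows; the `MemBandwidth` field on `podinfo.PodContext` is assumed for the example and is not part of Crane, and the sketch assumes it lives in the same package as `orderedBy`, `classAndPriority` and `runningTime`:

```go
// memBandwidthCompare is the <metric-sort-func> for the hypothetical memBandwidth metric:
// the pod consuming more memory bandwidth ranks earlier and is operated on first.
func memBandwidthCompare(p1, p2 podinfo.PodContext) int32 {
	switch {
	case p1.MemBandwidth > p2.MemBandwidth: // MemBandwidth is an assumed field for this sketch
		return -1
	case p1.MemBandwidth < p2.MemBandwidth:
		return 1
	default:
		return 0
	}
}

// memBandwidthSorter follows the <metric>Sorter template above, combining the general
// comparators with the metric-specific one.
func memBandwidthSorter(pods []podinfo.PodContext) {
	orderedBy(classAndPriority, memBandwidthCompare, runningTime).Sort(pods)
}
```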
@@ -0,0 +1,81 @@
---
title: "QoS: Dynamic resource oversold and limit"
description: "How offline jobs use Crane"
weight: 20
---


## Dynamic resource overselling enhanced by prediction algorithms
To improve stability, users usually set the request value higher than the actual usage when deploying applications, which wastes resources. To raise node utilization, users colocate some besteffort applications that use the idle resources, realizing overselling.
However, because these applications lack resource limits, requests and related constraints, the scheduler may still place such pods on nodes that are already heavily loaded, which defeats the purpose; it is therefore better to schedule them based on the free resources of nodes.

Crane collects the idle resources of a node in the following two ways and combines them into the node's idle resource amount, improving the accuracy of the evaluation:

CPU is used as the example below; Crane also supports reclaiming idle memory resources.

1. CPU usage information collected locally

`nodeCpuCannotBeReclaimed := nodeCpuUsageTotal + exclusiveCPUIdle - extResContainerCpuUsageTotal`

exclusiveCPUIdle refers to the idle CPU on cores occupied by pods whose CPU manager policy is exclusive. Although this portion is idle, it cannot be reused because it is monopolized, so it is counted as used.

extResContainerCpuUsageTotal refers to the CPU consumed as the dynamic (extended) resource, which must be subtracted to avoid double counting.

2. A TSP (TimeSeriesPrediction) of node CPU usage, which is created automatically by default and predicts node CPU usage based on history:
```yaml
apiVersion: v1
data:
  spec: |
    predictionMetrics:
    - algorithm:
        algorithmType: dsp
        dsp:
          estimators:
            fft:
            - highFrequencyThreshold: "0.05"
              lowAmplitudeThreshold: "1.0"
              marginFraction: "0.2"
              maxNumOfSpectrumItems: 20
              minNumOfSpectrumItems: 10
          historyLength: 3d
          sampleInterval: 60s
      resourceIdentifier: cpu
      type: ExpressionQuery
      expressionQuery:
        expression: 'sum(count(node_cpu_seconds_total{mode="idle",instance=~"({{.metadata.name}})(:\\d+)?"}) by (mode, cpu)) - sum(irate(node_cpu_seconds_total{mode="idle",instance=~"({{.metadata.name}})(:\\d+)?"}[5m]))'
      predictionWindowSeconds: 3600
kind: ConfigMap
metadata:
  name: noderesource-tsp-template
  namespace: default
```
Crane combines the prediction with the current actual consumption to calculate the remaining available resources of the node and exposes them on the node as an extended resource. A pod can declare this extended resource so that, as an offline job, it uses the idle resources, improving the node's resource utilization.
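To make the combination concrete, here is a minimal, hypothetical sketch of how the elastic CPU amount could be derived from the two signals above; the function name, parameters and the exact combining formula are illustrative only, not Crane's verbatim implementation:

```go
package main

import "math"

// elasticCPU sketches how many CPU cores could be exposed as the extended resource
// gocrane.io/cpu. All inputs are in cores; predictedUsage comes from the node CPU TSP.
func elasticCPU(nodeCapacity, nodeCpuUsageTotal, exclusiveCPUIdle, extResContainerCpuUsageTotal, predictedUsage float64) float64 {
	// CPU that cannot be reclaimed: observed usage, plus idle CPU monopolized by exclusive pods,
	// minus usage already running on the elastic resource (to avoid double counting).
	nodeCpuCannotBeReclaimed := nodeCpuUsageTotal + exclusiveCPUIdle - extResContainerCpuUsageTotal

	// Be conservative: reserve whichever of the prediction and the current observation is larger.
	reserved := math.Max(predictedUsage, nodeCpuCannotBeReclaimed)

	if free := nodeCapacity - reserved; free > 0 {
		return free
	}
	return 0
}
```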
How to use:
When deploying a pod, set the limit and request using `gocrane.io/<$resourcename>: <$value>`, as follows:
```yaml
spec:
  containers:
  - image: nginx
    imagePullPolicy: Always
    name: extended-resource-demo-ctr
    resources:
      limits:
        gocrane.io/cpu: "2"
        gocrane.io/memory: "2000Mi"
      requests:
        gocrane.io/cpu: "2"
        gocrane.io/memory: "2000Mi"
```

## Elastic resource restriction function
Native besteffort applications lack any fair guarantee of resource usage. Crane limits the CPU usage of besteffort pods that use dynamic resources to the range they are allowed to use: the agent ensures that the actual consumption of a pod using extended resources does not exceed its declared limit and that, under CPU contention, it competes fairly according to the declared amount. Pods using elastic resources are also managed by the watermark function.
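As a rough illustration of how a declared elastic CPU limit can be enforced through the CPU cgroup, the sketch below converts a `gocrane.io/cpu` limit expressed in millicores into a CFS quota; this is a generic cgroup calculation assumed for illustration, not code copied from the crane-agent:

```go
// cfsQuotaMicroseconds converts an elastic CPU limit expressed in millicores into a
// cpu.cfs_quota_us value, assuming the default 100ms cpu.cfs_period_us.
func cfsQuotaMicroseconds(cpuLimitMilli int64) int64 {
	const cfsPeriodUs = 100000 // 100ms default CFS period
	if cpuLimitMilli <= 0 {
		return -1 // -1 means "no limit" in the cpu cgroup
	}
	// Example: gocrane.io/cpu: "2" -> 2000 millicores -> a quota of 200000us per 100000us period.
	return cpuLimitMilli * cfsPeriodUs / 1000
}
```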

How to use:
When deploying a pod, set the limit and request using `gocrane.io/<$resourcename>: <$value>`.

## Suitable scenarios
To increase node load, some offline or less important jobs can be scheduled into the cluster using dynamic resources; such jobs use the idle elastic resources.
Combined with the QOS watermark guarantee, these jobs are evicted or throttled first when the node load is high, improving node utilization while keeping high-priority services stable.
See the section "Used with dynamic resources" in qos-interference-detection-and-active-avoidance.md.
@@ -0,0 +1,38 @@
---
title: "QoS: Enhanced bypass cpuset management capability"
description: "Enhanced bypass cpuset management capability"
weight: 23
---

## Enhanced bypass cpuset management capability
Kubelet supports the static CPU manager policy: when a Guaranteed pod runs on a node, kubelet allocates dedicated CPUs for it that cannot be occupied by other processes. This guarantees exclusive CPUs for the Guaranteed pod, but it also lowers CPU and node utilization, causing some waste.
The crane-agent provides a new cpuset management policy that allows a pod to share its CPUs with other pods: the pod that binds cores still benefits from fewer context switches and higher cache affinity, while other workloads can also be deployed on the same CPUs, improving resource utilization.

1. Three cpuset types are provided for pods:

- exclusive: after core binding, other containers can no longer use those CPUs; the pod monopolizes them
- share: other containers can still use those CPUs after core binding
- none: the pod uses CPUs not occupied by containers of exclusive pods, and it can use the bound cores of share-type pods

The share binding policy keeps the advantages of fewer context switches and higher cache affinity, while still allowing other workloads to be deployed on the same CPUs, improving resource utilization.

2. Relaxed restrictions on core binding compared with kubelet

Originally, the CPU limit of every container had to equal its CPU request. Now a container only needs a CPU limit that is greater than or equal to 1 and equal to its CPU request to have cores bound for it (see the sketch after this list).


3. The cpuset policy of a pod can be modified while the pod is running, taking effect immediately

The pod's CPU manager policy can be converted from none to share and from exclusive to share without a restart.
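Below is a minimal sketch of the relaxed eligibility rule from item 2, assuming CPU quantities expressed in millicores; the helper name is hypothetical and only illustrates the stated rule, not the crane-agent source:

```go
// canBindCore reports whether a container qualifies for cpuset binding under the relaxed rule:
// its CPU limit must be at least 1 core and equal to its CPU request.
func canBindCore(cpuRequestMilli, cpuLimitMilli int64) bool {
	return cpuLimitMilli >= 1000 && cpuLimitMilli == cpuRequestMilli
}
```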

How to use:
1. Set the CPU manager policy of kubelet to "none"
2. Set the pod's CPU manager policy through a pod annotation:
`qos.gocrane.io/cpu-manager: none/exclusive/share`
```yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    qos.gocrane.io/cpu-manager: none/exclusive/share
```