diff --git a/docs/proposals/Pod-Sorting-And-Precise-Execution-For-Crane-Agent-CN.md b/docs/proposals/Pod-Sorting-And-Precise-Execution-For-Crane-Agent-CN.md new file mode 100644 index 000000000..92814f26f --- /dev/null +++ b/docs/proposals/Pod-Sorting-And-Precise-Execution-For-Crane-Agent-CN.md @@ -0,0 +1,272 @@ +# Pod Sorting And Precise Execution For Crane Agent +该proposal丰富了crane-agent的排序策略,完善了通用排序。并且实现了一套精准操作(压制/驱逐)的框架,在执行压制/驱逐等操作时,操作到用户指定的水位线即停止的精确操作逻辑,避免了对于低优pod的过度操作; + +具体来说: +- 丰富了crane-agent的排序策略,完善了通用排序和cpu usage为主要参考的cpu维度排序; + +- 针对cpu usage,实现了执行压制/驱逐等操作时,操作到用户指定的水位线即停止的精确操作逻辑,避免了对于低优pod的过度操作; + +- 实现了一套精确操作(压制/驱逐)的框架,通过完善自定义指标的一些列属性和实现,即可在无需关心具体细节的情况下,同样具有同cpu usage一样的精确操作能力,具有一定的普适性和扩展性。 + +## Table of Contents + + + +- [Pod Sorting And Precise Execution For Crane Agent](#Pod Sorting And Precise Execution For Crane Agent) + - [Table of Contents](#table-of-contents) + - [Motivation](#motivation) + - [Goals](#goals) + - [Proposal](#proposal) + - [丰富pod的排序策略](#丰富pod的排序策略) + - [metric属性的定义](#metric属性的定义) + - [如何根据水位线进行精准控制](#如何根据水位线进行精准控制) + - [以水位线为基准进行pod的精确操作](#以水位线为基准进行pod的精确操作) + - [analyzer阶段](#analyzer阶段) + - [executor阶段](#executor阶段) + - [Non-Goals/Future Work](#non-goalsfuture-work) + - [User Stories](#user-stories) + + +## Motivation +当前在crane-agent中,当超过NodeQOSEnsurancePolicy中指定的水位线后,执行evict,throttle等操作时先对低优先级的pod进行排序,当前排序的依据是pod的ProrityClass,然后在排序的pod进行throttle或者evict操作; + +目前存在的问题有: + +1. 排序只参考ProrityClass,无法满足基于其他特性的排序;同时也无法满足按照水位线精确操作对灵活排序的需求,无法满足尽快让节点达到指定的水位线的要求。例如我们希望尽快降低低优先级业务的cpu使用量时,应该选出cpu使用量较多的pod,这样能够更快地降低cpu用量,保障高优业务不受影响。 + +2. 
在触发NodeQOSEnsurancePolicy中指定的水位线后,会对于节点上的所有低于指定ProrityClass的pod进行操作;例如,当前节点上有10个pod低于指定ProrityClass,在触发水位线后,会对这10个pod都进行操作,但是实际上可能在操作完成对第一个pod的操作后就可以低于NodeQOSEnsurancePolicy中的指标值了,对剩下的pod的操作,属于过度操作,是可以避免的。如果能以NodeQOSEnsurancePolicy中的指标值作为水位线对pod进行精确的操作,操作到刚好低于水位线是更为合适的,就能避免对低优先级服务的过度影响。 + +### Goals + +- 丰富了crane-agent的排序策略,包括以pod cpu用量为主要参照的排序,以pod内存用量为主要参照的排序,基于运行时间的排序,基于扩展资源使用率的排序。 +- 实现一套包含排序和精确操作的框架,支持对不同的指标丰富排序规则,并且实现精确操作。 +- 实现针对cpu usage和memmory usage的精确操作,当整机负载超过NodeQOSEnsurancePolicy中指定的水位线后,会先对低优先级的pod进行排序,然后按照顺序操作到刚好低于水位线为止。 + +## Proposal + +### 丰富pod的排序策略 + +- 该proposal实现了一些通用的排序方法(之后会更多地完善): + + classAndPriority: 比较两个pod的QOSClass和class value,优先比较QOSClass,再比较class value;priority高的排在后面优先级更高 + + runningTime:比较两个pod的运行时间,运行时间长的排在后面优先级更高 + + 如果仅需使用这两个排序策略,使用默认的排序方法即可:会首先比较pod的优先级,之后比较pod对应指标的用量,之后比较pod的运行时长,有一个维度可以比较出结果即为pod的排序结果 + ```go + func GeneralSorter(pods []podinfo.PodContext) { + orderedBy(classAndPriority, runningTime).Sort(pods) + } + ``` + +- cpu usage 使用量的排序 + + 会依次比较两个pod的优先级,如果优先级相同的情况下,再比较cpu用量,如果cpu用量也相同的情况下继续比较ext cpu资源用量(这个是cpu属性较为特殊的一点), 最后比较pod的运行时长,当某一个指标存在差异时即可返回比较结果 + + ```go + func CpuUsageSorter(pods []podinfo.PodContext) { + orderedBy(classAndPriority, cpuUsage, extCpuUsage, runningTime).Sort(pods) + } + ``` + +- ext cpu usage 使用量的排序 + + 会首先比较两个pod是否使用了扩展的cpu资源,在都使用了的情况下,比较 扩展cpu资源使用量/ 扩展cpu资源limit的比值 + + +- 针对需要自定义的指标,可以通过实现如下的方法,并且随意搭配通用的排序方法即可方便地实现pod的灵活自定义排序,以代表自定义metric指标,代表自定义的针对的排序策略 + ```go + func Sorter(pods []podinfo.PodContext) { + orderedBy(classAndPriority, , runningTime).Sort(pods) + } + ``` + 其中只需要实现如下的排序方法即可 + ```go + func (p1, p2 podinfo.PodContext) int32 + ``` + + +### metric属性的定义 + +为了更好的基于NodeQOSEnsurancePolicy配置的metric进行排序和精准控制,对metric引入属性的概念。 + +metric的属性包含如下几个: +1. Name 表明了metric的名称,需要同collector模块中收集到的指标名称一致 +2. ActionPriority 表示指标的优先级,0为最低,10为最高 +3. SortAble 表明该指标是否可以排序 +4. SortFunc 对应的排序方法,排序方法可以排列组合一些通用方法,再结合指标自身的排序,将在下文详细介绍 +5. 
ThrottleAble 表明针对该指标,是否可以对pod进行压制,例如针对cpu使用量这个metric,就有相对应的压制手段,但是对于memory使用量这种指标,就只能进行pod的驱逐,无法进行有效的压制 +6. ThrottleQuantified 表明压制(restore)一个pod后,能否准确计算出经过压制后释放出的对应metric的资源量,我们将可以准确量化的指标称为可Quantified,否则为不可Quantified; + 比如cpu用量,可以通过限制cgroup用量进行压制,同时可以通过当前运行值和压制后的值计算压制后释放的cpu使用量;而比如memory usage就不属于压制可量化metric,因为memory没有对应的throttle实现,也就无法准确衡量压制一个pod后释放出来的memory资源具体用量; +7. ThrottleFunc,执行Throttle动作的具体方法,如果不可Throttle,返回的released为空 +8. RestoreFunc,被Throttle后,执行恢复动作的具体方法,如果不可Throttle,返回的released为空 +9. EvictAble,EvictQuantified,EvictFunc 对evict动作的相关定义,具体内容和Throttle动作类似 + + +```go +type metric struct { + Name WaterLineMetric + + ActionPriority int + + SortAble bool + SortFunc func(pods []podinfo.PodContext) + + ThrottleAble bool + ThrottleQuantified bool + ThrottleFunc func(ctx *ExecuteContext, index int, ThrottleDownPods ThrottlePods, totalReleasedResource *ReleaseResource) (errPodKeys []string, released ReleaseResource) + RestoreFunc func(ctx *ExecuteContext, index int, ThrottleUpPods ThrottlePods, totalReleasedResource *ReleaseResource) (errPodKeys []string, released ReleaseResource) + + EvictAble bool + EvictQuantified bool + EvictFunc func(wg *sync.WaitGroup, ctx *ExecuteContext, index int, totalReleasedResource *ReleaseResource, EvictPods EvictPods) (errPodKeys []string, released ReleaseResource) +} +``` + +用户可以自行定义自己的metric,在构造完成后,通过registerMetricMap()进行注册即可 + +### 如何根据水位线进行精准控制 + +- 根据多个NodeQOSEnsurancePolicy及其中的objectiveEnsurances构建多条水位线: + 1. 按照objectiveEnsurances对应的action进行分类,目前crane-agent有3个针对节点Qos进行保障的操作,分别是Evict,ThtottleDown(当前用量高于objectiveEnsurances中的值时对pod进行用量压制)和ThrottleUp(当前用量低于objectiveEnsurances中的值时对pod的用量进行放宽恢复),因此会有三个水位线集合,分别是 + ThrottleDownWaterLine,ThrottleUpWaterLine和EvictWaterLine + + 2. 
再对同一操作种类中的水位线按照其metric rule(图中以metric A,metric Z作为示意)进行分类,并记录每个objectiveEnsurances水位线的值,记为waterLine; + + ThrottleDownWaterLine,ThrottleUpWaterLine和EvictWaterLine的结构是这样的: + `type WaterLines map[WaterLineMetric]*WaterLine` + + 其中WaterLineMetric就是上面的metric的Name字段,value的WaterLine就是资源数值 + `type WaterLine resource.Quantity` + + 最终形成一个类似下图的数据存储: + ![](waterline-construct.png) + +- 构造实时用量到水位线的差值: +结合当前节点的指标实时用量与WaterLines中该指标对应的水位线中最小值的差值构造如下的数据结构,代表到当前用量到水位线的差值 + `type GapToWaterLines map[WaterLineMetric]float64` + + 其中key值为metric的Name字段,value为用量到水位线的差值; + + 需要注意对于ThrottleUp,需要用水位线最小值-当前用量作为gap值,对于其他两者,使用当前用量-水位线最小值作为gap值,即始终保持gap值为正 + + 下面三个数据分别代表了需要执行evict,ThtottleDown和ThrottleUp操作的指标及其对应的到最低水位线的差值 + ```go + EvictGapToWaterLines[metrics] + ThrottoleDownGapToWaterLines[metrics] + ThrottleUpGapWaterLine[metrics] + ``` + +- 以CpuUsage这个metric为例,构造节点cpu用量相关的waterline的流程和相关数据结构如下: + ![](cpu-usage-water-line.png) + +### 以水位线为基准进行pod的精确操作 +该proposal为了实现以水位线为基准进行pod的精确操作,将对analyzer部分和executor部分做一定的修改,大体流程是: + +在analyzer阶段构造针对不同操作(驱逐,压制等)和不同metric的水位线,将原先的排序逻辑删除,后移到需要进行正式操作的executor阶段,并且可能会需要进行多轮排序; + +在executor阶段,根据水位线中的涉及的指标进行其相应的排序,获取最新用量,构造GapToWaterLines,并进行精确操作 + +#### analyzer阶段 +在该阶段进行NodeQOSEnsurancePolicy到WaterLines的转换,并对相同actionName和metricrule的规则进行合并,具体内容上文已经介绍过了 + +#### executor阶段 +压制过程: + +1. 首先分析ThrottoleDownGapToWaterLines中涉及的metrics,将这些metrics根据其Quantified属性区分为两部分,如果存在不可Quantified的metric,则通过GetHighestPriorityThrottleAbleMetric获取具有最高ActionPriority的一个throttleAble(具有throttleFunc)的metric对所选择的所有pod进行压制操作,因为但凡存在一个不可Quantified的metric,就无法进行精确的操作 + +2. 通过getStateFunc()获取当前节点和workload的最新用量,依据ThrottoleDownGapToWaterLines和实时用量构造GapToWaterLine(需要注意的是,在构造GapToWaterLine时,会以注册过的metric进行遍历,所以最终构造出来的GapToWaterLine中的metrics,会是ThrottoleDownGapToWaterLines + 中注册过的metric,避免了在NodeQOSEnsurancePolicy中配置错误不存在或未注册metric的情况) + +3. 
如果GapToWaterLine中有metric的实时用量无法获取(HasUsageMissedMetric),则通过GetHighestPriorityThrottleAbleMetric获取具有最高ActionPriority的一个throttleAble(具有throttleFunc)的metric对所选择的所有pod进行压制操作,因为如果存在metric实时用量无法获取,就无法获知和水位线的gap,也就无法进行精确的操作 + +4. 如果不存在3中的情况,则遍历ThrottoleDownGapToWaterLines中可以量化的metric:如果metric具有排序方法则直接使用其SortFunc对pod进行排序,如果没有就使用GeneralSorter进行排序,之后使用其对应的ThrottleFunc对pod进行压制,并计算释放出来的对应metric的资源量,直到ThrottoleDownGapToWaterLines中该metric对应的gap已不存在 +```go +metricsQuantified, MetricsNotQuantified := ThrottleDownWaterLine.DivideMetricsByQuantified() +if len(MetricsNotThrottleQuantified) != 0 { + highestPrioriyMetric := GetHighestPriorityThrottleAbleMetric() + if highestPrioriyMetric != "" { + errPodKeys = t.throttlePods(ctx, &totalReleased, highestPrioriyMetric) + } +} else { + ThrottoleDownGapToWaterLines = buildGapToWaterLine(ctx.getStateFunc()) + if ThrottoleDownGapToWaterLines.HasUsageMissedMetric() { + highestPrioriyMetric := ThrottleDownWaterLine.GetHighestPriorityThrottleAbleMetric() + if highestPrioriyMetric != "" { + errPodKeys = throttlePods(ctx, &totalReleased, highestPrioriyMetric) + } + } else { + var released ReleaseResource + for _, m := range metricsQuantified { + if m.SortAble { + m.SortFunc(ThrottleDownPods) + } else { + GeneralSorter(ThrottleDownPods) + } + + for !ThrottoleDownGapToWaterLines.TargetGapsRemoved(m) { + for index, _ := range ThrottleDownPods { + errKeys, released = m.ThrottleFunc(ctx, index, ThrottleDownPods, &totalReleased) + errPodKeys = append(errPodKeys, errKeys...) 
+ ThrottoleDownGapToWaterLines[m] -= released[m] + } + } + } + } +} +``` + +驱逐过程: + +驱逐和压制的流程是一样的,除了在对pod进行操作的时候需要额外判断一下pod是否已经被驱逐了;取出一个没有执行过的pod,执行驱逐操作,并计算释放出的各metric资源量,同时在对应水位线中减去释放的值,直到满足当前metric水位线要求 +```go +metricsEvictQuantified, MetricsNotEvcitQuantified := EvictWaterLine.DivideMetricsByEvictQuantified() + +if len(MetricsNotEvcitQuantified) != 0 { + highestPrioriyMetric := e.EvictWaterLine.GetHighestPriorityEvictAbleMetric() + if highestPrioriyMetric != "" { + errPodKeys = e.evictPods(ctx, &totalReleased, highestPrioriyMetric) + } +} else { + EvictGapToWaterLines = buildGapToWaterLine(ctx.getStateFunc(), ThrottleExecutor{}, *e) + if EvictGapToWaterLines.HasUsageMissedMetric() { + highestPrioriyMetric := EvictWaterLine.GetHighestPriorityEvictAbleMetric() + if highestPrioriyMetric != "" { + errPodKeys = e.evictPods(ctx, &totalReleased, highestPrioriyMetric) + } + } else { + wg := sync.WaitGroup{} + var released ReleaseResource + for _, m := range metricsEvictQuantified { + if MetricMap[m].SortAble { + MetricMap[m].SortFunc(e.EvictPods) + } else { + execsort.GeneralSorter(e.EvictPods) + } + + for !EvictGapToWaterLines.TargetGapsRemoved(m) { + if podinfo.HasNoExecutedPod(e.EvictPods) { + index := podinfo.GetFirstNoExecutedPod(e.EvictPods) + errKeys, released = MetricMap[m].EvictFunc(&wg, ctx, index, &totalReleased, e.EvictPods) + errPodKeys = append(errPodKeys, errKeys...) 
+
+                    e.EvictPods[index].HasBeenActioned = true
+                    ctx.EvictGapToWaterLines[m] -= released[m]
+                }
+            }
+        }
+        wg.Wait()
+    }
+
+}
+```
+
+### Non-Goals/Future Work
+
+- 当前只支持cpu usage的精确操作,但是框架可以复用,后续可以基于精准控制的框架,实现更多维度指标的精准控制。
+- 在做精准控制时,目前只考虑metric本身释放量,未考虑不同metric之间的相互影响。比如压制cpu usage时,memory usage也会受到影响。如果指标非常多,不同指标之间的关系会非常复杂,所以暂时不考虑不同metric之间的相互影响。
+
+### User Stories
+
+- 用户可以使用crane-agent进行更好的QoS保障。支持更快速的降低节点负载,以保障高优先级业务不受影响。同时对低优先级业务的压制/驱逐动作,进行精确控制,避免过度操作。
+- 用户可以借助实现的精准操作(压制/驱逐)的框架,在无需关心细节的情况下,通过实现自定义metric相关的属性和方法,即可方便地实现以自定义metric为核心的具有精确操作和排序能力的QoS功能。
\ No newline at end of file
diff --git a/docs/proposals/Pod-Sorting-And-Precise-Execution-For-Crane-Agent.md b/docs/proposals/Pod-Sorting-And-Precise-Execution-For-Crane-Agent.md
new file mode 100644
index 000000000..85a575e39
--- /dev/null
+++ b/docs/proposals/Pod-Sorting-And-Precise-Execution-For-Crane-Agent.md
@@ -0,0 +1,275 @@
+# Pod Sorting And Precise Execution For Crane Agent
+This proposal enriches the sorting strategies of crane-agent and improves the general sorting. It also implements a framework for precise operation (throttle/eviction): when throttling or evicting, the operation stops as soon as the water level specified by the user is reached, which avoids excessive operation on low-priority pods;
+
+Specifically:
+
+- Enriches the sorting strategies of crane-agent, improving the general sorting and adding a CPU-dimension sorting that uses CPU usage as the main reference;
+
+- For CPU usage, implements precise operation logic that stops throttling/evicting once the water level specified by the user is reached, which avoids excessive operation on low-priority pods;
+
+- A framework of precise operation (throttle/eviction) is implemented. 
By completing a set of attributes and implementations for a user-defined metric, that metric gains the same precise operation capability as CPU usage without the user having to care about the underlying details, which makes the framework general and extensible.
+
+## Table of Contents
+
+
+
+- [Pod Sorting And Precise Execution For Crane Agent](#pod-sorting-and-precise-execution-for-crane-agent)
+    - [Table of Contents](#table-of-contents)
+    - [Motivation](#motivation)
+        - [Goals](#goals)
+    - [Proposal](#proposal)
+        - [Enrich the sorting strategy of pod](#enrich-the-sorting-strategy-of-pod)
+        - [Definition of metric attribute](#definition-of-metric-attribute)
+        - [How to control accurately according to the water level](#how-to-control-accurately-according-to-the-water-level)
+        - [Precise operation of pod based on water level](#precise-operation-of-pod-based-on-water-level)
+            - [Analyzer phase](#analyzer-phase)
+            - [Executor phase](#executor-phase)
+        - [Non-Goals/Future Work](#non-goalsfuture-work)
+        - [User Stories](#user-stories)
+
+
+## Motivation
+Currently, when the water level specified in a NodeQOSEnsurancePolicy is exceeded, crane-agent first sorts the low-priority pods and then throttles or evicts the sorted pods. The sorting is based only on the PriorityClass of each pod;
+
+The existing problems are:
+
+1. Sorting considers only PriorityClass, so pods cannot be ordered by other characteristics. It also cannot provide the flexible ordering that precise, water-level-based operation needs, and it cannot bring the node down to the specified water level as quickly as possible. For example, to reduce the CPU usage of low-priority workloads quickly, the pods with the highest CPU usage should be selected first. This lowers CPU usage faster and keeps high-priority workloads unaffected.
+
+2. Once the water level specified in a NodeQOSEnsurancePolicy is triggered, every pod on the node whose priority is lower than the specified PriorityClass is operated on. For example, if 10 pods on the node are below the specified PriorityClass, all 10 are operated on after the water level is triggered. In practice, usage may already be below the target value in the NodeQOSEnsurancePolicy after the first pod has been handled, so operating on the remaining pods is excessive and avoidable. If the target value in the NodeQOSEnsurancePolicy is treated as a water level and pods are operated on only until usage falls just below it, the excessive impact on low-priority services is avoided.
+
+### Goals
+
+- Enrich the sorting strategies of crane-agent, including sorting with pod CPU usage as the main reference, sorting with pod memory usage as the main reference, sorting based on running time, and sorting based on extended resource utilization.
+
+- Implement a framework that covers both sorting and precise operation, supports richer sorting rules for different metrics, and enables precise operation.
+
+- Implement precise operation for CPU usage and memory usage: when the node load exceeds the water level specified in a NodeQOSEnsurancePolicy, the low-priority pods are sorted first and then operated on in order until usage is just below the water level.
+
+## Proposal
+
+### Enrich the sorting strategy of pod
+
+- The proposal implements some general sorting methods (more will be added later):
+
+    classAndPriority: compares the QOSClass and class value of two pods, QOSClass first and then class value. Pods with a higher priority are ranked later
+
+    runningTime: compares the running time of two pods. 
The pod that has been running longer is ranked later, i.e. it has a higher priority
+
+    If these two strategies are all you need, use the default sorting method: it first compares pod priority and then pod running time, and the first dimension that differs determines the order
+    ```go
+    func GeneralSorter(pods []podinfo.PodContext) {
+        orderedBy(classAndPriority, runningTime).Sort(pods)
+    }
+    ```
+
+- Sorting of CPU usage
+
+    The priority of the two pods is compared first. If the priorities are equal, CPU usage is compared; if CPU usage is also equal, extended CPU resource usage is compared (this step is specific to the CPU metric); finally, the running time is compared. The comparison returns as soon as one dimension differs
+
+    ```go
+    func CpuUsageSorter(pods []podinfo.PodContext) {
+        orderedBy(classAndPriority, cpuUsage, extCpuUsage, runningTime).Sort(pods)
+    }
+    ```
+
+- Sorting of ext CPU usage
+
+    It first checks whether both pods use extended CPU resources. If both do, it compares the ratio of extended CPU resource usage to extended CPU resource limit
+
+
+- For a custom metric, implement the following method and combine it freely with the general sorting methods to get flexible, customized pod sorting. Here `<metric>` stands for the custom metric and `<metric-sort-func>` for the custom sorting strategy for `<metric>`
+    ```go
+    func <metric>Sorter(pods []podinfo.PodContext) {
+        orderedBy(classAndPriority, <metric-sort-func>, runningTime).Sort(pods)
+    }
+    ```
+    `<metric-sort-func>` only needs to implement the following comparison function
+    ```go
+    func (p1, p2 podinfo.PodContext) int32
+    ```
+
+
+### Definition of metric attribute
+
+To make sorting and precise control based on the metrics configured in a NodeQOSEnsurancePolicy easier, the concept of attributes is introduced for metrics.
+
+A metric has the following attributes:
+1. Name: the name of the metric, which must match the metric name collected by the collector module
+2. ActionPriority: the priority of the metric. 0 is the lowest and 10 is the highest
+3. SortAble: whether pods can be sorted by this metric
+4. SortFunc: the corresponding sorting method. It can combine some of the general methods with the metric's own sorting, as described in the sorting strategy section
+5. ThrottleAble: whether pods can be throttled for this metric. For example, CPU usage has corresponding throttling means, whereas for memory usage pods can only be evicted and cannot be throttled effectively
+6. ThrottleQuantified: whether the amount of the metric's resource released by throttling (or restoring) a pod can be calculated accurately. 
We call metrics whose released amount can be calculated accurately Quantified, and the others not Quantified;
+   For example, CPU usage can be throttled by limiting the cgroup quota, and the CPU released can be calculated from the current running value and the value after throttling. Memory usage, in contrast, is not a throttle-quantifiable metric: memory has no throttle implementation, so the amount of memory released by throttling a pod cannot be measured accurately;
+7. ThrottleFunc: the method that performs the throttle action. If the metric is not throttleable, the returned released is empty
+8. RestoreFunc: the method that performs the restore action after a pod has been throttled. If the metric is not throttleable, the returned released is empty
+9. EvictAble, EvictQuantified, EvictFunc: the corresponding definitions for the evict action, analogous to those for the throttle action
+
+
+```go
+type metric struct {
+    Name WaterLineMetric
+
+    ActionPriority int
+
+    SortAble bool
+    SortFunc func(pods []podinfo.PodContext)
+
+    ThrottleAble       bool
+    ThrottleQuantified bool
+    ThrottleFunc       func(ctx *ExecuteContext, index int, ThrottleDownPods ThrottlePods, totalReleasedResource *ReleaseResource) (errPodKeys []string, released ReleaseResource)
+    RestoreFunc        func(ctx *ExecuteContext, index int, ThrottleUpPods ThrottlePods, totalReleasedResource *ReleaseResource) (errPodKeys []string, released ReleaseResource)
+
+    EvictAble       bool
+    EvictQuantified bool
+    EvictFunc       func(wg *sync.WaitGroup, ctx *ExecuteContext, index int, totalReleasedResource *ReleaseResource, EvictPods EvictPods) (errPodKeys []string, released ReleaseResource)
+}
+```
+
+You can define your own metric. After it is constructed, register it through registerMetricMap()
+
+### How to control accurately according to the water level
+
+- Build multiple waterlines from the NodeQOSEnsurancePolicy objects and their objectiveEnsurances:
+    1. Classify by the action of each objectiveEnsurances entry. crane-agent currently has three operations that guarantee node QoS: Evict, ThrottleDown (throttle pod usage when the current usage is higher than the value in objectiveEnsurances) and ThrottleUp (relax and restore pod usage when the current usage is lower than the value in objectiveEnsurances). There are therefore three waterline sets: ThrottleDownWaterLine, ThrottleUpWaterLine and EvictWaterLine
+
+    2. Within each action category, classify the waterlines by their metric rule (metric A and metric Z are used for illustration in the figure) and record the value of each objectiveEnsurances waterline as waterLine;
+
+    ThrottleDownWaterLine, ThrottleUpWaterLine and EvictWaterLine have the following structure:
+    `type WaterLines map[WaterLineMetric]*WaterLine`
+
+    WaterLineMetric is the Name field of the metric described above, and the WaterLine value is the resource quantity
+    `type WaterLine resource.Quantity`
+
+    The result is a data store similar to the following figure:
+    ![](waterline-construct.png)
+
+- Construct the gap between real-time usage and the waterline:
+    The following data structure is built from the difference between the real-time usage of a metric on the node and the minimum waterline for that metric in WaterLines. It represents the gap between the current usage and the waterline
+    `type GapToWaterLines map[WaterLineMetric]float64`
+
+    The key is the Name field of the metric, and the value is the gap between the usage and the waterline;
+
+    Note that for ThrottleUp the gap is the minimum waterline minus the current usage, whereas for the other two it is the current usage minus the minimum waterline, so the gap is always positive
+
+    The following three structures hold the metrics that require Evict, ThrottleDown and ThrottleUp operations, together with their gaps to the lowest waterline
+    ```go
+    EvictGapToWaterLines[metrics]
+    ThrottoleDownGapToWaterLines[metrics]
+    ThrottleUpGapWaterLine[metrics]
+    ```
+
+- Taking the CpuUsage metric as an example, the process and data structures for constructing the waterlines related to node CPU usage are as follows:
+    ![](cpu-usage-water-line.png)
+
+### Precise operation of pod based on water level
+To operate on pods precisely based on the water level, this proposal modifies the analyzer and the executor. The general process is as follows:
+
+In the analyzer phase, waterlines are constructed for the different operations (eviction, throttle, etc.) and metrics. The original sorting logic is removed from this phase and moved to the executor phase, where the actual operations take place and where several rounds of sorting may be needed;
+
+In the executor phase, pods are sorted according to the metrics involved in the waterlines, the latest usage is obtained, GapToWaterLines is constructed, and the precise operations are carried out
+
+#### Analyzer phase
+In this phase, the NodeQOSEnsurancePolicy objects are converted to WaterLines, and rules with the same actionName and metric rule are merged. The details have been described above
+
+#### Executor phase
+
+Throttle:
+
+1. First, analyze the metrics involved in ThrottoleDownGapToWaterLines and divide them into two groups according to their Quantified attribute. 
If any metric is not Quantified, use GetHighestPriorityThrottleAbleMetric to get the throttleable metric (one that has a ThrottleFunc) with the highest ActionPriority and throttle all the selected pods with it, because a single non-Quantified metric makes precise operation impossible
+
+2. Get the latest usage of the node and workloads through getStateFunc(), and construct GapToWaterLine from ThrottoleDownGapToWaterLines and the real-time usage. (Note that GapToWaterLine is built by iterating over the registered metrics, so it only contains the metrics in ThrottoleDownGapToWaterLines that are registered. This guards against metrics in the NodeQOSEnsurancePolicy that are misconfigured, do not exist, or are not registered)
+
+3. If the real-time usage of any metric in GapToWaterLine cannot be obtained (HasUsageMissedMetric), use GetHighestPriorityThrottleAbleMetric to get the throttleable metric (one that has a ThrottleFunc) with the highest ActionPriority and throttle all the selected pods with it, because without the real-time usage the gap to the waterline is unknown and precise operation is impossible
+
+4. If the situation in 3 does not apply, iterate over the quantifiable metrics in ThrottoleDownGapToWaterLines: if a metric has a sorting method, use its SortFunc to sort the pods. 
If it does not, use GeneralSorter to sort the pods. Then throttle the pods with the metric's ThrottleFunc and calculate the amount of the metric's resource released, until the gap for this metric in ThrottoleDownGapToWaterLines no longer exists
+
+```go
+metricsQuantified, MetricsNotThrottleQuantified := ThrottleDownWaterLine.DivideMetricsByQuantified()
+if len(MetricsNotThrottleQuantified) != 0 { // a non-quantified metric rules out precise operation
+    highestPrioriyMetric := ThrottleDownWaterLine.GetHighestPriorityThrottleAbleMetric()
+    if highestPrioriyMetric != "" {
+        errPodKeys = t.throttlePods(ctx, &totalReleased, highestPrioriyMetric)
+    }
+} else {
+    ThrottoleDownGapToWaterLines = buildGapToWaterLine(ctx.getStateFunc())
+    if ThrottoleDownGapToWaterLines.HasUsageMissedMetric() { // gap unknown: fall back to throttling all pods
+        highestPrioriyMetric := ThrottleDownWaterLine.GetHighestPriorityThrottleAbleMetric()
+        if highestPrioriyMetric != "" {
+            errPodKeys = t.throttlePods(ctx, &totalReleased, highestPrioriyMetric)
+        }
+    } else {
+        var released ReleaseResource
+        for _, m := range metricsQuantified {
+            if MetricMap[m].SortAble {
+                MetricMap[m].SortFunc(ThrottleDownPods)
+            } else {
+                GeneralSorter(ThrottleDownPods)
+            }
+
+            for !ThrottoleDownGapToWaterLines.TargetGapsRemoved(m) {
+                for index := range ThrottleDownPods {
+                    errKeys, released = MetricMap[m].ThrottleFunc(ctx, index, ThrottleDownPods, &totalReleased)
+                    errPodKeys = append(errPodKeys, errKeys...)
+                    ThrottoleDownGapToWaterLines[m] -= released[m] // shrink the gap by what was released
+                }
+            }
+        }
+    }
+}
+```
+
+Eviction:
+
+Eviction follows the same process as throttling, except that each pod must additionally be checked for whether it has already been evicted. Take a pod that has not been operated on yet, evict it, calculate the amount of each metric's resource released, and subtract it from the corresponding gap, until the waterline requirement of the current metric is met
+```go
+metricsEvictQuantified, MetricsNotEvcitQuantified := EvictWaterLine.DivideMetricsByEvictQuantified()
+
+if len(MetricsNotEvcitQuantified) != 0 { // a non-quantified metric rules out precise operation
+    highestPrioriyMetric := e.EvictWaterLine.GetHighestPriorityEvictAbleMetric()
+    if highestPrioriyMetric != "" {
+        errPodKeys = e.evictPods(ctx, &totalReleased, highestPrioriyMetric)
+    }
+} else {
+    EvictGapToWaterLines = buildGapToWaterLine(ctx.getStateFunc(), ThrottleExecutor{}, *e)
+    if EvictGapToWaterLines.HasUsageMissedMetric() { // gap unknown: fall back to evicting all pods
+        highestPrioriyMetric := e.EvictWaterLine.GetHighestPriorityEvictAbleMetric()
+        if highestPrioriyMetric != "" {
+            errPodKeys = e.evictPods(ctx, &totalReleased, highestPrioriyMetric)
+        }
+    } else {
+        wg := sync.WaitGroup{}
+        var released ReleaseResource
+        for _, m := range metricsEvictQuantified {
+            if MetricMap[m].SortAble {
+                MetricMap[m].SortFunc(e.EvictPods)
+            } else {
+                execsort.GeneralSorter(e.EvictPods)
+            }
+
+            for !EvictGapToWaterLines.TargetGapsRemoved(m) {
+                if podinfo.HasNoExecutedPod(e.EvictPods) {
+                    index := podinfo.GetFirstNoExecutedPod(e.EvictPods)
+                    errKeys, released = MetricMap[m].EvictFunc(&wg, ctx, index, &totalReleased, e.EvictPods)
+                    errPodKeys = append(errPodKeys, errKeys...)
+
+                    e.EvictPods[index].HasBeenActioned = true // never evict the same pod twice
+                    EvictGapToWaterLines[m] -= released[m]
+                }
+            }
+        }
+        wg.Wait()
+    }
+
+}
+```
+
+### Non-Goals/Future Work
+
+- Currently, only precise operation on CPU usage is supported, but the framework is reusable. 
In the future, precise control of more metric dimensions can be built on this framework.
+- Precise control currently considers only the amount released for the metric itself, not the interaction between metrics. For example, throttling CPU usage also affects memory usage. With many metrics, the relationships between them become very complex, so the interaction between different metrics is not considered for now.
+
+### User Stories
+
+- Users can use crane-agent for better QoS guarantees. Node load can be reduced faster so that high-priority workloads are not affected, while throttle/eviction actions on low-priority workloads are controlled precisely to avoid excessive operation.
+- With the precise operation (throttle/eviction) framework, users can implement the attributes and methods of a custom metric and, without caring about the details, easily get QoS functionality with precise operation and sorting capabilities built around that metric.
\ No newline at end of file
diff --git a/docs/proposals/cpu-usage-water-line.png b/docs/proposals/cpu-usage-water-line.png
new file mode 100644
index 000000000..5f9b68eef
Binary files /dev/null and b/docs/proposals/cpu-usage-water-line.png differ
diff --git a/docs/proposals/waterline-construct.png b/docs/proposals/waterline-construct.png
new file mode 100644
index 000000000..70cdb65a1
Binary files /dev/null and b/docs/proposals/waterline-construct.png differ