Merge pull request #634 from qmhu/recommender-docs

Recommender docs
gocrane · Nov 30, 2022 · e4c27d7
2 parents 474e405 + 359d690
commit e4c27d7

Showing 5 changed files with 290 additions and 38 deletions.
diff --git a/site/content/en/docs/Tutorials/Recommendation/idlenode-recommendation.md b/site/content/en/docs/Tutorials/Recommendation/idlenode-recommendation.md
@@ -10,7 +10,7 @@ By scanning the status and utilization of nodes, the idle node recommendation he
 
 In a Kubernetes cluster, some nodes often sit idle because of factors such as node taints, label selectors, a low packing rate, and low utilization, which wastes a lot of money. IdleNode recommendation helps users find these nodes to reduce cost.
 
-## Example
+## Sample
 
 ```yaml
 kind: Recommendation
@@ -48,15 +48,17 @@ status:
   lastUpdateTime: '2022-11-30T07:46:57Z'
 ```
 
-In this example:
+In this sample:
 
 - The Recommendation's TargetRef points to the Node worker-node-1
 - The Recommendation type is IdleNode
 - The action is Delete, but because taking a node offline is a complicated operation, we only give advice.
 
+To create an IdleNode recommendation, refer to [**Recommendation Framework**](/zh-cn/docs/tutorials/recommendation/recommendation-framework)
+
 ## Implement
 
 Perform the following steps to complete a recommendation process for idle nodes:
 
 1. Scan all nodes and pods in the cluster
-2. If all Pods on a node are DaemonSet, the node is considered to be idle
+2. If all Pods on a node are DaemonSet pods, the node is considered to be idle
diff --git a/site/content/en/docs/Tutorials/Recommendation/replicas-recommendation.md b/site/content/en/docs/Tutorials/Recommendation/replicas-recommendation.md
@@ -6,44 +6,136 @@ weight: 13
 
 Kubernetes users often set replicas based on empirical values when creating application resources. The replicas recommendation analyzes actual application usage and recommends a more suitable replicas configuration, which you can use to improve the resource utilization of the cluster.
 
-## Implement
+## Motivation
+
+The replicas setting of a Kubernetes workload lets you control the number of Pods for quick scaling. However, choosing a reasonable replica count has always been a problem for application administrators: too many replicas waste resources, while too few may cause stability problems.
+
+The community HPA provides a dynamic autoscaling mechanism based on real-time metrics, and Crane's EffectiveHPA builds on HPA to support prediction-driven autoscaling. In the real world, however, only some workloads can scale horizontally all the time; many workloads require a fixed number of Pods.
+
+The figure below shows a workload with low utilization: 30% of its resources are wasted between the Pod's peak historical usage and its Request.
+
+![Resource Waste](/images/resource-waste.jpg)
+
+Replicas recommendation attempts to make it easier to choose a workload's replicas by analyzing its historical usage.
+
+## Sample
+
+A sample Replicas recommendation YAML looks like this:
+
+```yaml
+kind: Recommendation
+apiVersion: analysis.crane.io/v1alpha1
+metadata:
+  name: workloads-rule-replicas-p84jv
+  namespace: kube-system
+  labels:
+    addonmanager.kubernetes.io/mode: Reconcile
+    analysis.crane.io/recommendation-rule-name: workloads-rule
+    analysis.crane.io/recommendation-rule-recommender: Replicas
+    analysis.crane.io/recommendation-rule-uid: 18588495-f325-4873-b45a-7acfe9f1ba94
+    k8s-app: kube-dns
+    kubernetes.io/cluster-service: 'true'
+    kubernetes.io/name: CoreDNS
+  ownerReferences:
+    - apiVersion: analysis.crane.io/v1alpha1
+      kind: RecommendationRule
+      name: workloads-rule
+      uid: 18588495-f325-4873-b45a-7acfe9f1ba94
+      controller: false
+      blockOwnerDeletion: false
+spec:
+  targetRef:
+    kind: Deployment
+    namespace: kube-system
+    name: coredns
+    apiVersion: apps/v1
+  type: Replicas
+  completionStrategy:
+    completionStrategyType: Once
+  adoptionType: StatusAndAnnotation
+status:
+  recommendedValue:
+    replicasRecommendation:
+      replicas: 1
+  targetRef: { }
+  recommendedInfo: '{"spec":{"replicas":1}}'
+  currentInfo: '{"spec":{"replicas":2}}'
+  action: Patch
+  conditions:
+    - type: Ready
+      status: 'True'
+      lastTransitionTime: '2022-11-28T08:07:36Z'
+      reason: RecommendationReady
+      message: Recommendation is ready
+  lastUpdateTime: '2022-11-29T11:07:45Z'
+```
 
-Based on the historical Workload CPU loads, find the workload's lowest CPU usage per hour in the past seven days, and calculate the replicas with 50% (configurable) cpu usage that should be configured
+In this sample:
 
-### Filter Phase
+- The Recommendation's TargetRef points to a Deployment in the kube-system namespace: coredns
+- The Recommendation type is Replicas
+- adoptionType is StatusAndAnnotation, which means the recommendation result is written to recommendation.status and to the Deployment's annotations
+- recommendedInfo shows the recommended replicas (recommendedValue is deprecated) and currentInfo shows the current replicas. Both are JSON that can be applied to the
+  TargetRef with `kubectl patch`
 
-1. workload with low replicas: If the replicas is too low, it may not have high recommendation demand. Associated configuration: 'workload-min-replicas'
-2. There is a certain percentage of the not running pods for workload: if the Pod of workload mostly can't run normally, may not be suitable for recommendation, associated configuration: `pod-min-ready-seconds` | `pod-available-ratio`
+To create a Replicas recommendation, refer to [**Recommendation Framework**](/docs/tutorials/recommendation/recommendation-framework)
 
-### Prepare Phase
+## Implement
 
-Query the workload cpu usage in the past week.
+The process for one Replicas recommendation:
 
-### Recommend Phase
+1. Query the workload's historical CPU and memory usage for the past week from the monitoring system.
+2. Use the DSP algorithm to predict future CPU usage.
+3. Calculate the replicas for both CPU and memory, then choose the larger one.
 
-1. Calculate the lowest value of the median workload usage per hour in the past seven days (to prevent the impact of the minimum value): workload_cpu_usage_medium_min
-2. The number of replicas corresponding to the target utilization:
+### Algorithm 
+
+Take CPU usage as an example. Assume that the P99 of the workload's historical CPU usage is 10 cores, the Pod CPU Request is 5 cores, and the target peak utilization is 50%. Then 4 Pods (10 / 50% / 5 = 4) are enough to meet the target peak utilization.
 
 ```go
-   	replicas := int32(math.Ceil(workload_cpu_usage_medium_min / (rr.TargetUtilization * float64(requestTotal) / 1000.)))
+    replicas := int32(math.Ceil(workloadUsage / (TargetUtilization * float64(requestTotal))))
 ```
 
-3. In order to prevent too low replicas，replicas should be larger than or equal to default-min-replicas
+### Abnormal workloads
+
+No recommendation is made for the following types of abnormal workloads:
+
+1. Workloads with few replicas: if the replica count is too low, there is little demand for a recommendation. Associated configuration: `workload-min-replicas`
+2. Workloads with a certain percentage of Pods that are not running: if most of a workload's Pods cannot run normally, the workload may not be suitable for recommendation. Associated configuration: `pod-min-ready-seconds` | `pod-available-ratio`
 
-### Observe Phase
+### Prometheus Metrics
 
 The recommended replicas are recorded in the metric crane_analytics_replicas_recommendation
 
+## How to verify the accuracy of recommendation results
+
+You can get the workload's resource usage with the following Prometheus queries, then put the result into the algorithm above.
+
+The queries use the craned Deployment in the crane-system namespace as an example; substitute your own namespace and workload name.
+
+```shell
+sum(irate(container_cpu_usage_seconds_total{namespace="crane-system",pod=~"^craned-.*$",container!=""}[3m]))  # cpu usage
+```
+
+```shell
+sum(container_memory_working_set_bytes{namespace="crane-system",pod=~"^craned-.*$",container!=""})  # memory usage
+```
+
 ## Accepted resources
 
 StatefulSet and Deployment are supported by default, and any workload that supports `Scale SubResource` is supported as well.
 
 ## Configuration
 
-| Configuration items    | Default | Description                                                         |
-|------------------------|---------|---------------------------------------------------------------------|
-| workload-min-replicas  | 1       | Workload replicas than less than this value are not recommended     |
-| pod-min-ready-seconds  | 30      | Defines the min seconds to identify Pod is ready                    |
+| Configuration items    | Default | Description                                                               |
+|------------------------|---------|---------------------------------------------------------------------------|
+| workload-min-replicas  | 1       | Workloads with fewer replicas than this value get no recommendation       |
+| pod-min-ready-seconds  | 30      | Minimum number of seconds before a Pod is considered ready                |
 | pod-available-ratio    | 0.5     | Workloads whose ready Pod ratio is below this value get no recommendation |
-| default-min-replicas   | 1       | default minReplicas                                                 |
-| cpu-target-utilization | 0.5     | Calculate the minimum replicas based on this cpu utilization      |
+| default-min-replicas   | 1       | Default minimum replicas                                                  |
+| cpu-percentile         | 0.95    | Percentile used for historical CPU usage                                  |
+| mem-percentile         | 0.95    | Percentile used for historical memory usage                               |
+| cpu-target-utilization | 0.5     | Target utilization for peak historical CPU usage                          |
+| mem-target-utilization | 0.5     | Target utilization for peak historical memory usage                       |
+
+To update the recommendation configuration, refer to [**Recommendation Framework**](/docs/tutorials/recommendation/recommendation-framework)
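
Note on the Algorithm section of replicas-recommendation.md: the calculation described there can be sketched as a standalone Go program. This is a minimal sketch under stated assumptions, not Crane's actual implementation. The function names `replicasFor` and `recommendReplicas` and all their parameters are hypothetical. The sketch assumes that usage and per-Pod request are expressed in the same unit, takes the larger of the CPU and memory results as step 3 of the process describes, and applies a floor that stands in for `default-min-replicas`. The DSP prediction step is not modeled.

```go
package main

import (
	"fmt"
	"math"
)

// replicasFor returns how many Pods are needed so that peakUsage stays at or
// below targetUtilization of the total requested resource.
// Hypothetical helper: peakUsage and perPodRequest must share one unit.
func replicasFor(peakUsage, perPodRequest, targetUtilization float64) int32 {
	if perPodRequest <= 0 || targetUtilization <= 0 {
		return 0 // invalid input: no meaningful result
	}
	return int32(math.Ceil(peakUsage / (targetUtilization * perPodRequest)))
}

// recommendReplicas computes replicas for CPU and memory separately, keeps the
// larger one, and never goes below minReplicas (assumed to play the role of
// default-min-replicas).
func recommendReplicas(cpuPeak, cpuRequest, cpuTarget, memPeak, memRequest, memTarget float64, minReplicas int32) int32 {
	replicas := replicasFor(cpuPeak, cpuRequest, cpuTarget)
	if byMem := replicasFor(memPeak, memRequest, memTarget); byMem > replicas {
		replicas = byMem
	}
	if replicas < minReplicas {
		replicas = minReplicas
	}
	return replicas
}

func main() {
	// Numbers from the Algorithm section: 10 cores peak, 5 cores per Pod, 50% target.
	fmt.Println(replicasFor(10, 5, 0.5)) // prints 4
}
```

With the numbers from the Algorithm section (10 cores of peak usage, a 5-core Pod Request, and a 50% target), `replicasFor` returns 4, which matches the worked example in the text.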