Recommender docs #634

Merged
merged 1 commit into from
Nov 30, 2022
@@ -10,7 +10,7 @@ By scanning the status and utilization of nodes, the idle node recommendation he

In a Kubernetes cluster, some nodes often sit idle because of factors such as node taints, label selectors, a low packing rate, and low utilization, which wastes a lot of cost. The IdleNode recommendation tries to help users find these nodes to reduce cost.

## Example
## Sample

```yaml
kind: Recommendation
@@ -48,15 +48,17 @@ status:
lastUpdateTime: '2022-11-30T07:46:57Z'
```

In this example
In this sample

- The Recommendation's TargetRef points to Node: worker-node-1
- The Recommendation type is IdleNode
- The action is Delete, but taking a node offline is a complicated operation, so we only provide a recommendation.

To learn how to create an IdleNode recommendation, refer to [**Recommendation Framework**](/zh-cn/docs/tutorials/recommendation/recommendation-framework).

## Implement

Perform the following steps to complete a recommendation process for idle nodes:

1. Scan all nodes and pods in the cluster
2. If all Pods on a node are DaemonSet, the node is considered to be idle
2. If all Pods on a node are DaemonSet pods, the node is considered to be idle
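
The two steps above can be sketched as follows. This is a minimal illustration, not Crane's actual implementation: the `Pod` struct and the `IdleNodes` function are hypothetical stand-ins for the real Kubernetes types, and treating a node with no pods at all as idle is an assumption.

```go
package main

import "fmt"

// Pod is a simplified, hypothetical stand-in for a Kubernetes Pod.
type Pod struct {
	Name      string
	NodeName  string
	OwnerKind string // kind of the owning controller, e.g. "DaemonSet"
}

// IdleNodes returns the nodes on which every pod is owned by a DaemonSet.
// Assumption: a node that runs no pods at all is also reported as idle.
func IdleNodes(nodes []string, pods []Pod) []string {
	busy := make(map[string]bool)
	for _, p := range pods {
		if p.OwnerKind != "DaemonSet" {
			busy[p.NodeName] = true
		}
	}
	var idle []string
	for _, n := range nodes {
		if !busy[n] {
			idle = append(idle, n)
		}
	}
	return idle
}

func main() {
	nodes := []string{"worker-node-1", "worker-node-2"}
	pods := []Pod{
		{Name: "node-exporter-abc", NodeName: "worker-node-1", OwnerKind: "DaemonSet"},
		{Name: "node-exporter-def", NodeName: "worker-node-2", OwnerKind: "DaemonSet"},
		{Name: "web-0", NodeName: "worker-node-2", OwnerKind: "StatefulSet"},
	}
	fmt.Println(IdleNodes(nodes, pods)) // prints [worker-node-1]
}
```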
130 changes: 111 additions & 19 deletions site/content/en/docs/Tutorials/Recommendation/replicas-recommendation.md
@@ -6,44 +6,136 @@ weight: 13

Kubernetes users often set replica counts based on empirical values when creating application resources. The replicas recommendation analyzes actual application usage and recommends a more suitable replicas configuration, which you can use to improve the resource utilization of the cluster.

## Implement
## Motivation

Kubernetes workload replicas let you control the number of Pods for quick scaling. However, choosing a reasonable replica count has always been a problem for application administrators. Too many replicas waste resources, while too few may cause stability problems.

The community HPA provides a dynamic autoscaling mechanism based on real-time metrics, and Crane's EffectiveHPA builds on HPA to support prediction-driven autoscaling. In the real world, however, only some workloads can scale horizontally all the time; many workloads require a fixed number of pods.

The figure below shows a workload with low utilization: 30% of its resources are wasted between the Pod's peak historical usage and its Request.

![Resource Waste](/images/resource-waste.jpg)

Replicas recommendation attempts to make it easier to choose a workload's replica count by analyzing its historical usage.

## Sample

A sample Replicas recommendation in YAML looks like this:

```yaml
kind: Recommendation
apiVersion: analysis.crane.io/v1alpha1
metadata:
  name: workloads-rule-replicas-p84jv
  namespace: kube-system
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
    analysis.crane.io/recommendation-rule-name: workloads-rule
    analysis.crane.io/recommendation-rule-recommender: Replicas
    analysis.crane.io/recommendation-rule-uid: 18588495-f325-4873-b45a-7acfe9f1ba94
    k8s-app: kube-dns
    kubernetes.io/cluster-service: 'true'
    kubernetes.io/name: CoreDNS
  ownerReferences:
    - apiVersion: analysis.crane.io/v1alpha1
      kind: RecommendationRule
      name: workloads-rule
      uid: 18588495-f325-4873-b45a-7acfe9f1ba94
      controller: false
      blockOwnerDeletion: false
spec:
  targetRef:
    kind: Deployment
    namespace: kube-system
    name: coredns
    apiVersion: apps/v1
  type: Replicas
  completionStrategy:
    completionStrategyType: Once
  adoptionType: StatusAndAnnotation
status:
  recommendedValue:
    replicasRecommendation:
      replicas: 1
  targetRef: { }
  recommendedInfo: '{"spec":{"replicas":1}}'
  currentInfo: '{"spec":{"replicas":2}}'
  action: Patch
  conditions:
    - type: Ready
      status: 'True'
      lastTransitionTime: '2022-11-28T08:07:36Z'
      reason: RecommendationReady
      message: Recommendation is ready
  lastUpdateTime: '2022-11-29T11:07:45Z'
```

Based on the workload's historical CPU load, find its lowest hourly CPU usage over the past seven days, and calculate the replicas that should be configured for 50% (configurable) CPU utilization.
In this sample:

### Filter Phase
- The Recommendation's TargetRef points to a Deployment in the kube-system namespace: coredns
- The Recommendation type is Replicas
- adoptionType is StatusAndAnnotation, which means the recommendation result is written to both recommendation.status and the Deployment's annotations
- recommendedInfo shows the recommended replicas (recommendedValue is deprecated), and currentInfo shows the current replicas. Both are JSON and can be applied to the TargetRef with `kubectl patch`
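
Because `recommendedInfo` and `currentInfo` are plain JSON strings, they can be decoded with the standard library. The sketch below is illustrative only: the `scalePatch` type and the `ReplicasFromInfo` helper are hypothetical names, and the input strings are taken from the sample above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// scalePatch mirrors the shape of the JSON in recommendedInfo and currentInfo,
// e.g. {"spec":{"replicas":1}}. The type name is hypothetical.
type scalePatch struct {
	Spec struct {
		Replicas int32 `json:"replicas"`
	} `json:"spec"`
}

// ReplicasFromInfo extracts spec.replicas from an info string.
func ReplicasFromInfo(info string) (int32, error) {
	var p scalePatch
	if err := json.Unmarshal([]byte(info), &p); err != nil {
		return 0, err
	}
	return p.Spec.Replicas, nil
}

func main() {
	recommended, err := ReplicasFromInfo(`{"spec":{"replicas":1}}`)
	if err != nil {
		panic(err)
	}
	current, err := ReplicasFromInfo(`{"spec":{"replicas":2}}`)
	if err != nil {
		panic(err)
	}
	fmt.Printf("current=%d recommended=%d\n", current, recommended)
}
```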

1. Workloads with few replicas: if the replica count is too low, there is little demand for a recommendation. Associated configuration: `workload-min-replicas`
2. Workloads with a certain percentage of Pods not running: if most of a workload's Pods cannot run normally, the workload may not be suitable for recommendation. Associated configuration: `pod-min-ready-seconds` | `pod-available-ratio`
To learn how to create a Replicas recommendation, refer to [**Recommendation Framework**](/docs/tutorials/recommendation/recommendation-framework).

### Prepare Phase
## Implement

Query the workload cpu usage in the past week.
The process for one Replicas recommendation:

### Recommend Phase
1. Query the workload's historical CPU and memory usage for the past week from the monitoring system.
2. Use the DSP algorithm to predict future CPU usage.
3. Calculate the replicas for both CPU and memory, then choose the larger one.

1. Calculate the lowest value of the median workload usage per hour in the past seven days (to prevent the impact of the minimum value): workload_cpu_usage_medium_min
2. The number of replicas corresponding to the target utilization:
### Algorithm

Take CPU usage as an example. Assume that the P99 of the workload's historical CPU usage is 10 cores, the Pod CPU Request is 5 cores, and the target peak utilization is 50%. Then 4 pods (10 / 50% / 5) meet the target peak utilization.

```go
replicas := int32(math.Ceil(workload_cpu_usage_medium_min / (rr.TargetUtilization * float64(requestTotal) / 1000.)))
replicas := int32(math.Ceil(workloadUsage / (TargetUtilization * float64(requestTotal))))
```
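
The worked example can be checked with a small self-contained program. This is a sketch under stated assumptions rather than Crane's code: `ReplicasFor` and `Recommend` are hypothetical helpers, the request is taken per pod as in the example (10 / 50% / 5), the memory figures are made up for illustration, and clamping to a minimum follows the `default-min-replicas` setting.

```go
package main

import (
	"fmt"
	"math"
)

// ReplicasFor returns how many pods are needed so that usage stays at or
// below targetUtilization of the summed pod requests.
// It returns 0 for non-positive inputs instead of dividing by zero.
func ReplicasFor(usage, requestPerPod, targetUtilization float64) int32 {
	if requestPerPod <= 0 || targetUtilization <= 0 {
		return 0
	}
	return int32(math.Ceil(usage / (targetUtilization * requestPerPod)))
}

// Recommend picks the larger of the CPU and memory results and never
// goes below minReplicas (assumed to map to default-min-replicas).
func Recommend(cpuReplicas, memReplicas, minReplicas int32) int32 {
	r := cpuReplicas
	if memReplicas > r {
		r = memReplicas
	}
	if r < minReplicas {
		r = minReplicas
	}
	return r
}

func main() {
	cpu := ReplicasFor(10, 5, 0.5) // 10 cores used, 5 cores per pod: 10 / 0.5 / 5 = 4
	mem := ReplicasFor(6, 4, 0.5)  // hypothetical: 6 GiB used, 4 GiB per pod: 6 / 0.5 / 4 = 3
	fmt.Println(Recommend(cpu, mem, 1)) // prints 4
}
```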

3. To prevent the replica count from being too low, replicas should be greater than or equal to default-min-replicas
### Abnormal workloads

No recommendation is made for the following types of abnormal workloads:

1. Workloads with few replicas: if the replica count is too low, there is little demand for a recommendation. Associated configuration: `workload-min-replicas`
2. Workloads with a certain percentage of Pods not running: if most of a workload's Pods cannot run normally, the workload may not be suitable for recommendation. Associated configuration: `pod-min-ready-seconds` | `pod-available-ratio`
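
The two checks can be sketched as a single predicate. The `Workload` struct and `Recommendable` function below are hypothetical; the sketch assumes that `ReadyPods` already counts only Pods that have been ready for at least `pod-min-ready-seconds`, and that the thresholds come from `workload-min-replicas` and `pod-available-ratio`.

```go
package main

import "fmt"

// Workload is a simplified, hypothetical view of a workload's state.
type Workload struct {
	Replicas  int32
	ReadyPods int32 // assumed: pods ready for at least pod-min-ready-seconds
	TotalPods int32
}

// Recommendable reports whether a workload passes both checks, and if not, why.
func Recommendable(w Workload, minReplicas int32, availableRatio float64) (bool, string) {
	if w.Replicas < minReplicas {
		return false, "replicas below workload-min-replicas"
	}
	if w.TotalPods > 0 {
		ratio := float64(w.ReadyPods) / float64(w.TotalPods)
		if ratio < availableRatio {
			return false, "ready pod ratio below pod-available-ratio"
		}
	}
	return true, ""
}

func main() {
	healthy := Workload{Replicas: 2, ReadyPods: 2, TotalPods: 2}
	failing := Workload{Replicas: 4, ReadyPods: 1, TotalPods: 4}
	fmt.Println(Recommendable(healthy, 1, 0.5))
	fmt.Println(Recommendable(failing, 1, 0.5))
}
```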

### Observe Phase
### Prometheus Metrics

The recommended replicas are recorded in the metric `crane_analytics_replicas_recommendation`.

## How to verify the accuracy of recommendation results

You can get the workload's resource usage with the following PromQL queries. Once you have the usage, plug it into the algorithm above.

The queries below take the Deployment craned in crane-system as an example; substitute your own container and namespace.

```shell
sum(irate(container_cpu_usage_seconds_total{namespace="crane-system",pod=~"^craned-.*$",container!=""}[3m])) # cpu usage
```

```shell
sum(container_memory_working_set_bytes{namespace="crane-system",pod=~"^craned-.*$",container!=""}) # memory usage
```
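
The queries return a series of usage samples, which has to be reduced to a single value before it can be used in the algorithm. The sketch below uses the nearest-rank percentile method, which is an assumption; the source does not say how Crane computes its percentiles. `Percentile` is a hypothetical helper, and the sample values are made up.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Percentile returns the nearest-rank percentile p (0 < p <= 1) of samples.
// The boolean is false when the input is empty or p is out of range.
func Percentile(samples []float64, p float64) (float64, bool) {
	if len(samples) == 0 || p <= 0 || p > 1 {
		return 0, false
	}
	// Sort a copy so the caller's slice is left untouched.
	sorted := append([]float64(nil), samples...)
	sort.Float64s(sorted)
	rank := int(math.Ceil(p * float64(len(sorted))))
	if rank < 1 {
		rank = 1
	}
	return sorted[rank-1], true
}

func main() {
	// Hypothetical CPU usage samples, in cores.
	cpuSamples := []float64{0.8, 1.2, 0.9, 2.5, 1.1}
	if v, ok := Percentile(cpuSamples, 0.95); ok {
		fmt.Printf("P95 CPU usage: %.2f cores\n", v)
	}
}
```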

## Accepted resources

StatefulSet and Deployment are supported by default, and any workload that supports the `Scale SubResource` can also be used.

## Configuration

| Configuration items | Default | Description |
|------------------------|---------|------------------------------------------------------------------------|
| workload-min-replicas  | 1       | Workloads with fewer replicas than this value are not recommended      |
| pod-min-ready-seconds  | 30      | Minimum number of seconds before a Pod is considered ready             |
| pod-available-ratio    | 0.5     | Workloads with a ready Pod ratio below this value are not recommended  |
| default-min-replicas | 1 | default minReplicas |
| cpu-target-utilization | 0.5 | Calculate the minimum replicas based on this cpu utilization |
| default-min-replicas | 1 | default minReplicas |
| cpu-percentile | 0.95 | Percentile for historical cpu usage |
| mem-percentile | 0.95 | Percentile for historical memory usage |
| cpu-target-utilization | 0.5 | Target of CPU peak historical usage |
| mem-target-utilization | 0.5 | Target of Memory peak historical usage |

To learn how to update the recommendation configuration, refer to [**Recommendation Framework**](/docs/tutorials/recommendation/recommendation-framework).