IP reclaim: differentiate stateless workload under deleting-timeout state on ready node and not-ready node #3002

Merged
merged 1 commit into from
Jan 25, 2024

Conversation

Icarus9913
Collaborator

Refer to #2967

We should add switches to choose whether to GC the IP in the following situations:

- a pod that has exceeded its terminating timeout on a not-ready node
- a pod that has exceeded its terminating timeout on a ready node

Signed-off-by: Icarus9913 icaruswu66@qq.com

What this PR does / why we need it:
new feature
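
As a rough illustration of the behaviour described above, the two switches could gate IP reclaim as in the following Go sketch. The type `gcSwitches` and the function `shouldReclaimIP` are hypothetical names made up for this example and are not taken from the Spiderpool code base; the stateless-workload restriction from the PR title is left out for brevity.

```go
package main

import "fmt"

// gcSwitches is a hypothetical holder for the two switches proposed in this PR.
type gcSwitches struct {
	onReadyNode    bool // reclaim IPs of terminating-timeout pods on ready nodes
	onNotReadyNode bool // reclaim IPs of terminating-timeout pods on not-ready nodes
}

// shouldReclaimIP reports whether the IP of a pod may be reclaimed.
// Only pods that have exceeded their terminating timeout are candidates;
// the node's readiness then selects which switch applies.
func shouldReclaimIP(sw gcSwitches, terminatingTimedOut, nodeReady bool) bool {
	if !terminatingTimedOut {
		return false
	}
	if nodeReady {
		return sw.onReadyNode
	}
	return sw.onNotReadyNode
}

func main() {
	// Example: only reclaim on not-ready nodes.
	sw := gcSwitches{onReadyNode: false, onNotReadyNode: true}
	fmt.Println(shouldReclaimIP(sw, true, true))  // prints false
	fmt.Println(shouldReclaimIP(sw, true, false)) // prints true
}
```

Keeping the two cases behind separate switches lets an operator reclaim IPs only after a node outage while leaving slow-terminating pods on healthy nodes untouched.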

@Icarus9913 Icarus9913 added pr/not-ready not ready for merging release/feature-new release note for new feature labels Dec 25, 2023

codecov bot commented Dec 25, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (ca8606e) 81.10% compared to head (65e406f) 81.07%.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3002      +/-   ##
==========================================
- Coverage   81.10%   81.07%   -0.04%     
==========================================
  Files          49       50       +1     
  Lines        5351     5358       +7     
==========================================
+ Hits         4340     4344       +4     
- Misses        854      856       +2     
- Partials      157      158       +1     
Flag Coverage Δ
unittests 81.07% <100.00%> (-0.04%) ⬇️

Files Coverage Δ
pkg/nodemanager/utils.go 100.00% <100.00%> (ø)

... and 1 file with indirect coverage changes

@weizhoublue
Collaborator

weizhoublue commented Dec 27, 2023

For the stateful case, it barely makes sense to release the IP after detecting an IP conflict, so this feature should only work for stateless workloads.

@weizhoublue
Collaborator

Any update on this?

@Icarus9913 Icarus9913 added pr/ready-review This pull is ready for review and removed pr/not-ready not ready for merging labels Jan 9, 2024

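After a node goes down unexpectedly, Pods in the cluster stay permanently in the `deleting` state, and the IP addresses they occupy cannot be released.

- For Pods in the `Terminating` state, Spiderpool automatically releases their IP addresses once the Pod's `spec.terminationGracePeriodSeconds` has elapsed. This feature is controlled by the environment variable `SPIDERPOOL_GC_TERMINATING_POD_IP_ENABLED`. It can be used to handle the failure scenario of an unexpected node outage.
- For Pods in the `Terminating` state, Spiderpool automatically releases their IP addresses once the Pod's `spec.terminationGracePeriodSeconds` has elapsed. This feature is controlled by the environment variable `SPIDERPOOL_GC_TERMINATING_NODE_NOT_READY_POD_IP_ENABLED`. It can be used to handle the failure scenario of an unexpected node outage.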
Collaborator

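Am I misreading this sentence, or is the description wrong:

"After a node goes down unexpectedly, Pods in the cluster stay permanently in the deleting state"

"For Pods in the deleting state"?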

Collaborator Author

Changed it to `Terminating`.

docs/reference/spiderpool-controller.md (outdated, resolved)
docs/reference/spiderpool-controller.md (outdated, resolved)
docs/concepts/ipam-des-zh_CN.md (outdated, resolved)
@@ -161,10 +161,12 @@ spec:
value: {{ .Values.spiderpoolController.httpPort | quote }}
- name: SPIDERPOOL_GC_IP_ENABLED
value: {{ .Values.ipam.gc.enabled | quote }}
- name: SPIDERPOOL_GC_TERMINATING_POD_IP_ENABLED
value: {{ .Values.ipam.gc.GcDeletingTimeOutPod.enabled | quote }}
- name: SPIDERPOOL_GC_TERMINATING_NODE_READY_POD_IP_ENABLED
Collaborator

SPIDERPOOL_GC_STATELESS_TERMINATING_POD_ON_READY_NODE_ENABLE

value: {{ .Values.ipam.gc.GcDeletingTimeOutPod.enabled | quote }}
- name: SPIDERPOOL_GC_TERMINATING_NODE_READY_POD_IP_ENABLED
value: {{ .Values.ipam.gc.enableGcDeletingTimeOutPodWithNodeReady | quote }}
- name: SPIDERPOOL_GC_TERMINATING_NODE_NOT_READY_POD_IP_ENABLED
Collaborator

SPIDERPOOL_GC_STATELESS_TERMINATING_POD_ON_NOT_READY_NODE_ENABLE

## @param ipam.gc.GcDeletingTimeOutPod.enabled enable retrieve IP for the pod who times out of deleting graceful period
enabled: true
## @param ipam.gc.enableGcDeletingTimeOutPodWithNodeReady enable reclaim IP for the pod who times out of deleting graceful period with its node ready
enableGcDeletingTimeOutPodWithNodeReady: true
Collaborator

enableGcStatelesTerminatingPodOnReadyNode

@@ -75,6 +75,8 @@ spec:
detectIPConflict: true # Enable detectIPConflict
```

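> If the IP conflict check finds that an IP is already occupied by another Pod in the `Terminating` state in the cluster, refer to the configuration described in [IP reclaim mechanism](./ipam-des-zh_CN.md#ip-回收机制).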
Collaborator

@weizhoublue weizhoublue Jan 10, 2024

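This is ambiguous. The IP conflict detection feature cannot tell whether the conflicting IP belongs to a Pod in the `Terminating` state; that is only one of the possible causes.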

@Icarus9913 Icarus9913 force-pushed the feat/wk/gc-env branch 2 times, most recently from e014491 to 94d9827 Compare January 10, 2024 03:13
cyclinder
cyclinder previously approved these changes Jan 10, 2024
@weizhoublue weizhoublue changed the title supplement IP GC ENV for various scenarios IP reclaim: differentiate stateless workload under deleting-timeout state on ready node and not-ready node Jan 15, 2024
@weizhoublue
Collaborator

Now that there are two scenarios, are there separate test cases that verify each one end to end?

## @param ipam.gc.GcDeletingTimeOutPod.enabled enable retrieve IP for the pod who times out of deleting graceful period
enabled: true
## @param ipam.gc.enableGcStatelesTerminatingPodOnReadyNode enable reclaim IP for the stateless pod who is over deleting graceful period on a ready node
enableGcStatelesTerminatingPodOnReadyNode: true
Collaborator

enableGcStatelesTerminatingPodOnReadyNode
enableGcStatelesTerminatingPodOnNotReadyNode

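Would it be cleaner to restructure these as follows? There may be other levels in the future, such as stateful. Here a zombie pod is defined as a pod whose deletion has timed out.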

StatelesPod:
    zombieOnReadyNode: true
    zombieOnNotReadyNode: true

@Icarus9913 Icarus9913 force-pushed the feat/wk/gc-env branch 4 times, most recently from 92d1f4e to 63213d4 Compare January 24, 2024 09:57
Signed-off-by: Icarus9913 <icaruswu66@qq.com>
@Icarus9913 Icarus9913 merged commit af4632d into spidernet-io:main Jan 25, 2024
44 checks passed
@Icarus9913 Icarus9913 deleted the feat/wk/gc-env branch January 25, 2024 02:37
Labels
pr/ready-review This pull is ready for review release/feature-new release note for new feature

4 participants