[deployments-k8s#1174] Add NSMgr heal tests #1640

Closed: Bolodya1997 wants to merge 1 commit into networkservicemesh:main from Bolodya1997:deployments-k8s#1174/nsmgr-healing

@@ -1,11 +1,13 @@
# Heal examples

This document contains links for heal examples of NSM.

## Requires

To run any heal example, follow the steps for [Basic NSM setup](../basic).

## Includes

- [Local Forwarder restart](./local-forwarder-healing)
- [Local NSMgr restart](./local-nsmgr-restart)
- [Remote NSMgr restart](./remote-nsmgr-restart)

@@ -0,0 +1,156 @@
# Local NSMgr restart

This example shows that NSM keeps working after the local NSMgr restarts.

NSC and NSE each use the `kernel` mechanism to connect to their local forwarder.
The forwarders use the `vxlan` mechanism to connect with each other.

## Requires

Make sure that you have completed the steps from the [basic](../../basic) or [memory](../../memory) setup.

## Run

Create the test namespace:
```bash
# `kubectl create` prints "namespace/<name> created": keep the first word,
# then strip the 10-character "namespace/" prefix.
NAMESPACE=($(kubectl create -f ../namespace.yaml))
NAMESPACE=${NAMESPACE:10}
```
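
The `${NAMESPACE:10}` expansion relies on the exact length of the `namespace/` prefix. A minimal sketch of a more explicit alternative, assuming `kubectl create` prints a single line of the form `namespace/<name> created`; `parse_ns_name` is a hypothetical helper, not part of NSM:

```bash
# parse_ns_name (hypothetical helper): extract the namespace name from the
# "namespace/<name> created" line printed by `kubectl create`.
parse_ns_name() {
  local first_word=${1%% *}                  # drop everything after the first space
  printf '%s\n' "${first_word#namespace/}"   # drop the "namespace/" prefix
}
```

For example: `NAMESPACE=$(parse_ns_name "$(kubectl create -f ../namespace.yaml)")`. Stripping the prefix by pattern rather than by offset keeps working even if the prefix changes length.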

Register the namespace in the `spire` server:
```bash
kubectl exec -n spire spire-server-0 -- \
/opt/spire/bin/spire-server entry create \
-spiffeID spiffe://example.org/ns/${NAMESPACE}/sa/default \
-parentID spiffe://example.org/ns/spire/sa/spire-agent \
-selector k8s:ns:${NAMESPACE} \
-selector k8s:sa:default
```
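
The entry above gives workloads that the SPIRE agent (the `-parentID`) attests with both selectors, namespace `${NAMESPACE}` and service account `default`, a SPIFFE ID built from those two values. A small sketch of how that ID string is composed, assuming the `example.org` trust domain used above; `spiffe_id_for` is a hypothetical helper:

```bash
# spiffe_id_for (hypothetical helper): build the SPIFFE ID that the entry
# above assigns to workloads in namespace $1 running as service account $2.
spiffe_id_for() {
  local trust_domain=example.org   # trust domain used in the command above
  printf 'spiffe://%s/ns/%s/sa/%s\n' "$trust_domain" "$1" "$2"
}
```

For example, `spiffe_id_for "${NAMESPACE}" default` prints the same value that is passed to `-spiffeID`.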

Get the worker nodes (nodes without taints, which excludes the control-plane node):
```bash
NODES=($(kubectl get nodes -o go-template='{{range .items}}{{ if not .spec.taints }}{{index .metadata.labels "kubernetes.io/hostname"}} {{end}}{{end}}'))
```
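
The NSC and NSE patches use `${NODES[0]}` and `${NODES[1]}`, so the example needs at least two untainted nodes; with fewer, `${NODES[1]}` expands to an empty string. A minimal guard, sketched as a hypothetical `require_nodes` function:

```bash
# require_nodes (hypothetical helper): fail unless at least $1 node names
# are passed in the remaining arguments.
require_nodes() {
  local want=$1
  shift
  if (( $# < want )); then
    echo "need at least ${want} untainted nodes, found $#" >&2
    return 1
  fi
}
```

Usage: `require_nodes 2 "${NODES[@]}"` right after `NODES` is set.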

Create the kustomization file:
```bash
cat > kustomization.yaml <<EOF
---
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

namespace: ${NAMESPACE}

bases:
- ../../../apps/nsc-kernel
- ../../../apps/nse-kernel

patchesStrategicMerge:
- patch-nsc.yaml
- patch-nse.yaml
EOF
```

Create the NSC patch:
```bash
cat > patch-nsc.yaml <<EOF
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nsc-kernel
spec:
  template:
    spec:
      containers:
        - name: nsc
          env:
            - name: NSM_NETWORK_SERVICES
              value: kernel://icmp-responder/nsm-1
      nodeSelector:
        kubernetes.io/hostname: ${NODES[0]}
EOF
```
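
Because the heredoc delimiter `EOF` is unquoted, the shell expands `${NODES[0]}` (and `${NAMESPACE}` in the kustomization file) before writing the file; with a quoted delimiter (`<<'EOF'`) the literal text would end up in the YAML. A self-contained sketch of the difference, using made-up node names in a separate `demo_nodes` array so that `NODES` is not overwritten:

```bash
demo_nodes=(node-a node-b)   # made-up node names; does not touch NODES

# Unquoted delimiter: ${demo_nodes[0]} is expanded before the text is used.
expanded=$(cat <<EOF
kubernetes.io/hostname: ${demo_nodes[0]}
EOF
)

# Quoted delimiter: the text is kept literally, without any expansion.
literal=$(cat <<'EOF'
kubernetes.io/hostname: ${demo_nodes[0]}
EOF
)

echo "$expanded"   # kubernetes.io/hostname: node-a
echo "$literal"    # kubernetes.io/hostname: ${demo_nodes[0]}
```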

Create the NSE patch:
```bash
cat > patch-nse.yaml <<EOF
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nse-kernel
spec:
  template:
    spec:
      containers:
        - name: nse
          env:
            - name: NSE_CIDR_PREFIX
              value: 172.16.1.100/31
      nodeSelector:
        kubernetes.io/hostname: ${NODES[1]}
EOF
```
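
`NSE_CIDR_PREFIX` is a /31, so it contains exactly two addresses; judging from the ping commands below, the NSE side gets `172.16.1.100` and the NSC side `172.16.1.101`. A bash-only sketch that lists both addresses of an IPv4 /31 (`cidr31_addrs` is a hypothetical helper):

```bash
# cidr31_addrs (hypothetical helper): print both addresses of an IPv4 /31.
cidr31_addrs() {
  local ip=${1%/*} prefix=${1#*/}
  [[ $prefix == 31 ]] || { echo "expected a /31 prefix, got /$prefix" >&2; return 1; }
  local a b c d
  IFS=. read -r a b c d <<< "$ip"
  local base=$(( d & ~1 ))   # a /31 only varies in the last bit of the last octet
  echo "$a.$b.$c.$base"
  echo "$a.$b.$c.$(( base + 1 ))"
}
```

Running `cidr31_addrs 172.16.1.100/31` prints the two addresses used by the ping steps.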

Deploy NSC and NSE:
```bash
kubectl apply -k .
```

Wait for the applications to be ready:
```bash
kubectl wait --for=condition=ready --timeout=1m pod -l app=nsc-kernel -n ${NAMESPACE}
```
```bash
kubectl wait --for=condition=ready --timeout=1m pod -l app=nse-kernel -n ${NAMESPACE}
```

Find the NSC and NSE pods by label:
```bash
NSC=$(kubectl get pods -l app=nsc-kernel -n ${NAMESPACE} --template '{{range .items}}{{.metadata.name}}{{"\n"}}{{end}}')
```
```bash
NSE=$(kubectl get pods -l app=nse-kernel -n ${NAMESPACE} --template '{{range .items}}{{.metadata.name}}{{"\n"}}{{end}}')
```
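
The `--template` above prints one pod name per line, so if more than one pod matches the label (for example while an old pod is still terminating), `NSC` or `NSE` ends up holding several names and the `kubectl exec` calls below fail. A small guard, sketched as a hypothetical `single_pod` helper:

```bash
# single_pod (hypothetical helper): succeed only if $1 holds exactly one
# non-empty pod name (one name per line, as printed by the --template above).
single_pod() {
  local count
  count=$(printf '%s\n' "$1" | grep -c . || true)   # count non-empty lines
  if [[ $count -ne 1 ]]; then
    echo "expected exactly 1 pod, found ${count}" >&2
    return 1
  fi
}
```

Usage: `single_pod "$NSC" && single_pod "$NSE"`.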

Ping from NSC to NSE:
```bash
kubectl exec ${NSC} -n ${NAMESPACE} -- ping -c 4 172.16.1.100
```

Ping from NSE to NSC:
```bash
kubectl exec ${NSE} -n ${NAMESPACE} -- ping -c 4 172.16.1.101
```
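
On common implementations such as iputils, `ping -c 4` exits with status 0 as long as at least one reply arrives, so partial packet loss does not make these steps fail. For a stricter check, a sketch like the hypothetical `no_packet_loss` helper below inspects the summary line; it assumes the `0% packet loss` wording printed by the iputils and BusyBox versions of `ping`:

```bash
# no_packet_loss (hypothetical helper): read ping output on stdin and succeed
# only if the summary reports 0% loss (so "50%" or "100%" do not match).
no_packet_loss() {
  grep -Eq '(^|[ ,])0% packet loss'
}
```

Usage: `kubectl exec ${NSC} -n ${NAMESPACE} -- ping -c 4 172.16.1.100 | no_packet_loss`.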

Find the local NSMgr pod (the one running on the NSC node):
```bash
NSMGR=$(kubectl get pods -l app=nsmgr --field-selector spec.nodeName==${NODES[0]} -n nsm-system --template '{{range .items}}{{.metadata.name}}{{"\n"}}{{end}}')
```

Restart the local NSMgr by deleting its pod:
```bash
kubectl delete pod ${NSMGR} -n nsm-system
```

Wait for the local NSMgr to be restored, then ping from NSC to NSE again:
```bash
sleep 70
```
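
According to the review discussion on this PR, heal takes 60 seconds to start, so `sleep 70` presumably covers that delay plus a safety margin; the 10-second margin is an inference, not something the PR states. Writing the wait with named values makes that intent explicit; the variable names are hypothetical:

```bash
# Hypothetical names; the PR itself only uses the literal `sleep 70`.
HEAL_START_DELAY=60   # heal start delay quoted in the review discussion
SAFETY_MARGIN=10      # assumed margin, chosen so that 60 + 10 = 70
HEAL_WAIT=$(( HEAL_START_DELAY + SAFETY_MARGIN ))
echo "${HEAL_WAIT}"   # 70
```

The wait itself then becomes `sleep "${HEAL_WAIT}"`.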
```bash
kubectl exec ${NSC} -n ${NAMESPACE} -- ping -c 4 172.16.1.100
```

Ping from NSE to NSC:
```bash
kubectl exec ${NSE} -n ${NAMESPACE} -- ping -c 4 172.16.1.101
```

## Cleanup

Delete the namespace:
```bash
kubectl delete ns ${NAMESPACE}
```
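
Deleting the namespace removes the NSC and NSE deployments, but the `kustomization.yaml`, `patch-nsc.yaml`, and `patch-nse.yaml` files created earlier stay in the working directory. A hedged sketch for removing them as well; `remove_generated_files` is a hypothetical helper that deletes only those three names:

```bash
# remove_generated_files (hypothetical helper): delete the files this example
# generated in directory $1 (defaults to the current directory).
remove_generated_files() {
  local dir=${1:-.}
  rm -f -- "$dir/kustomization.yaml" "$dir/patch-nsc.yaml" "$dir/patch-nse.yaml"
}
```

Run `remove_generated_files` from the example directory after deleting the namespace.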

@@ -0,0 +1,156 @@
# Remote NSMgr restart

This example shows that NSM keeps working after the remote NSMgr restarts.

NSC and NSE each use the `kernel` mechanism to connect to their local forwarder.
The forwarders use the `vxlan` mechanism to connect with each other.

## Requires

Make sure that you have completed the steps from the [basic](../../basic) or [memory](../../memory) setup.

## Run

Create the test namespace:
```bash
# `kubectl create` prints "namespace/<name> created": keep the first word,
# then strip the 10-character "namespace/" prefix.
NAMESPACE=($(kubectl create -f ../namespace.yaml))
NAMESPACE=${NAMESPACE:10}
```

Register the namespace in the `spire` server:
```bash
kubectl exec -n spire spire-server-0 -- \
/opt/spire/bin/spire-server entry create \
-spiffeID spiffe://example.org/ns/${NAMESPACE}/sa/default \
-parentID spiffe://example.org/ns/spire/sa/spire-agent \
-selector k8s:ns:${NAMESPACE} \
-selector k8s:sa:default
```

Get the worker nodes (nodes without taints, which excludes the control-plane node):
```bash
NODES=($(kubectl get nodes -o go-template='{{range .items}}{{ if not .spec.taints }}{{index .metadata.labels "kubernetes.io/hostname"}} {{end}}{{end}}'))
```

Create the kustomization file:
```bash
cat > kustomization.yaml <<EOF
---
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

namespace: ${NAMESPACE}

bases:
- ../../../apps/nsc-kernel
- ../../../apps/nse-kernel

patchesStrategicMerge:
- patch-nsc.yaml
- patch-nse.yaml
EOF
```

Create the NSC patch:
```bash
cat > patch-nsc.yaml <<EOF
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nsc-kernel
spec:
  template:
    spec:
      containers:
        - name: nsc
          env:
            - name: NSM_NETWORK_SERVICES
              value: kernel://icmp-responder/nsm-1
      nodeSelector:
        kubernetes.io/hostname: ${NODES[0]}
EOF
```

Create the NSE patch:
```bash
cat > patch-nse.yaml <<EOF
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nse-kernel
spec:
  template:
    spec:
      containers:
        - name: nse
          env:
            - name: NSE_CIDR_PREFIX
              value: 172.16.1.100/31
      nodeSelector:
        kubernetes.io/hostname: ${NODES[1]}
EOF
```

Deploy NSC and NSE:
```bash
kubectl apply -k .
```

Wait for the applications to be ready:
```bash
kubectl wait --for=condition=ready --timeout=1m pod -l app=nsc-kernel -n ${NAMESPACE}
```
```bash
kubectl wait --for=condition=ready --timeout=1m pod -l app=nse-kernel -n ${NAMESPACE}
```

Find the NSC and NSE pods by label:
```bash
NSC=$(kubectl get pods -l app=nsc-kernel -n ${NAMESPACE} --template '{{range .items}}{{.metadata.name}}{{"\n"}}{{end}}')
```
```bash
NSE=$(kubectl get pods -l app=nse-kernel -n ${NAMESPACE} --template '{{range .items}}{{.metadata.name}}{{"\n"}}{{end}}')
```

Ping from NSC to NSE:
```bash
kubectl exec ${NSC} -n ${NAMESPACE} -- ping -c 4 172.16.1.100
```

Ping from NSE to NSC:
```bash
kubectl exec ${NSE} -n ${NAMESPACE} -- ping -c 4 172.16.1.101
```

Find the remote NSMgr pod (the one running on the NSE node):
```bash
NSMGR=$(kubectl get pods -l app=nsmgr --field-selector spec.nodeName==${NODES[1]} -n nsm-system --template '{{range .items}}{{.metadata.name}}{{"\n"}}{{end}}')
```

Restart the remote NSMgr by deleting its pod:
```bash
kubectl delete pod ${NSMGR} -n nsm-system
```

Wait for the remote NSMgr to be restored, then ping from NSC to NSE again:
```bash
sleep 70
```
```bash
kubectl exec ${NSC} -n ${NAMESPACE} -- ping -c 4 172.16.1.100
```
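
The review thread explains that the `sleep` is there because heal takes 60 seconds to start, so it should stay. If you also want the ping step to tolerate a transient failure after the remote NSMgr comes back, a generic retry wrapper is one option; `retry` below is a hypothetical helper, and because retrying can hide a real failure, the attempt count should stay small:

```bash
# retry (hypothetical helper): run a command up to $1 times, sleeping $2
# seconds between attempts; returns the status of the last attempt.
retry() {
  local attempts=$1 delay=$2
  shift 2
  local i status=0
  for (( i = 1; i <= attempts; i++ )); do
    "$@" && return 0
    status=$?
    (( i < attempts )) && sleep "$delay"
  done
  return "$status"
}
```

Usage: `retry 3 5 kubectl exec ${NSC} -n ${NAMESPACE} -- ping -c 4 172.16.1.100`.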

Ping from NSE to NSC:
```bash
kubectl exec ${NSE} -n ${NAMESPACE} -- ping -c 4 172.16.1.101
```

## Cleanup

Delete the namespace:
```bash
kubectl delete ns ${NAMESPACE}
```

Why do we need to use sleep?

Heal takes 60 seconds to start, so even though these restore tests don't perform any heal actions, we still need this wait to make sure that heal doesn't lead us to some unexpected behavior. Example:

ok, got it

@edwarnicke Do you mind if we use sleep in the heal tests? As far as I can see, Vlad has created an issue to consider the problem with sleep after the release: networkservicemesh/sdk#978

This sounds like a real bug we should fix.
Generally sleep is primarily a means of papering over bugs (not always, but often)... let's track down the bug instead.

@edwarnicke The problem is tracked in networkservicemesh/sdk#978

Should we address it first, or can we go ahead and merge this PR?

Sorry for confusing you: these are not real 1..6 steps with a real bug; it is just an example of how we could miss something if we removed this sleep from the tests.