-
Notifications
You must be signed in to change notification settings - Fork 370
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update Jenkins Scripts for Testbed Migration #6751
Conversation
XinShuYang
commented
Oct 17, 2024
- Upgraded Windows images to restore CI on the latest kubernetes cluster
- Updated external node names for testbed migration.
/test-windows-all |
3ca1e4d
to
3bf64fd
Compare
/test-windows-e2e |
3bf64fd
to
e37605b
Compare
ci/jenkins/test.sh
Outdated
antrea_images=("e2eteam/agnhost:2.13" "us.gcr.io/k8s-artifacts-prod/e2e-test-images/agnhost:2.13" "e2eteam/jessie-dnsutils:1.0" "e2eteam/pause:3.2") | ||
k8s_images=("registry.k8s.io/e2e-test-images/agnhost:2.52" "registry.k8s.io/e2e-test-images/jessie-dnsutils:1.5" "registry.k8s.io/e2e-test-images/nginx:1.14-2" "registry.k8s.io/pause:3.8" "registry.k8s.io/pause:3.10") | ||
conformance_images=("k8sprow.azurecr.io/kubernetes-e2e-test-images/agnhost:2.52" "k8sprow.azurecr.io/kubernetes-e2e-test-images/jessie-dnsutils:1.5" "k8sprow.azurecr.io/kubernetes-e2e-test-images/nginx:1.14-2" "k8sprow.azurecr.io/kubernetes-e2e-test-images/pause:3.8" "registry.k8s.io/e2e-test-images/pause:3.10") | ||
e2e_images=("${DOCKER_REGISTRY}/antrea/toolbox:1.3-0" "registry.k8s.io/e2e-test-images/agnhost:2.40") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The latest version of toolbox is toolbox:1.4-0, you can update it.
ci/jenkins/test.sh
Outdated
conformance_images=("k8sprow.azurecr.io/kubernetes-e2e-test-images/agnhost:2.45" "k8sprow.azurecr.io/kubernetes-e2e-test-images/jessie-dnsutils:1.5" "k8sprow.azurecr.io/kubernetes-e2e-test-images/nginx:1.14-2" "k8sprow.azurecr.io/kubernetes-e2e-test-images/pause:3.8") | ||
e2e_images=("toolbox:1.3-0") | ||
antrea_images=("e2eteam/agnhost:2.13" "us.gcr.io/k8s-artifacts-prod/e2e-test-images/agnhost:2.13" "e2eteam/jessie-dnsutils:1.0" "e2eteam/pause:3.2") | ||
k8s_images=("registry.k8s.io/e2e-test-images/agnhost:2.52" "registry.k8s.io/e2e-test-images/jessie-dnsutils:1.5" "registry.k8s.io/e2e-test-images/nginx:1.14-2" "registry.k8s.io/pause:3.8" "registry.k8s.io/pause:3.10") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are two pause images needed? same questions for the agnhost image
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The agnhost image used by the Antrea e2e test is version 2.40, while the one used by the latest Kubernetes conformance test is 2.52. As for the pause image, I think we can use the latest version for testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose we can update the e2e test codes to use the latest agnhost:2.52 as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either works for me, but I would prefer updating it in a separate PR, as I haven’t tested the Windows e2e with this image before the testbed was shut down. Also, we need to update related code in all CI scripts, but the goal of this PR is just for the migration.
e37605b
to
bd2c31d
Compare
/test-vm-e2e |
b4b2a46
to
1c57f9d
Compare
ci/jenkins/test-vm.sh
Outdated
kubectl get nodes --selector=kubernetes.io/os=linux --no-headers=true -o custom-columns=IP:.status.addresses[0].address | while read -r IP; do | ||
rsync -avr --progress --inplace -e "ssh -o StrictHostKeyChecking=no" ${WORKDIR}/*.yml jenkins@${IP}:${WORKDIR}/ | ||
done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to copy the manifest to all Nodes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is only used by the control-plane node. I have updated this part, thanks for the suggestion.
kubectl get nodes --selector=kubernetes.io/os=linux --no-headers=true -o custom-columns=IP:.status.addresses[0].address | while read -r IP; do | ||
rsync -avr --progress --inplace -e "ssh -o StrictHostKeyChecking=no" $TEMP_ANTREA_TAR jenkins@${IP}:${WORKDIR}/$TEMP_ANTREA_TAR | ||
ssh -o StrictHostKeyChecking=no -n jenkins@${IP} "crictl rmi --prune; crictl ps --state Exited; ctr -n=k8s.io images import ${WORKDIR}/$TEMP_ANTREA_TAR" || true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
out of curiosity, why didn't we have that step (loading the image to the Nodes) before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously, both Docker and containerd were installed on the same node. To avoid potential compatibility issues after kubernetes removed docker runtime, we have implemented the new topology for all jenkins testbeds: one node (the jumper node) has a Jenkins agent,Docker and kubeconfig installed, while another node has containerd and Kubernetes installed. When running CI, we need to copy the image from the jumper node to the Kubernetes node.
@@ -525,7 +525,7 @@ func testANPOnVMs(t *testing.T, data *TestData, vmList []vmInfo, osType string) | |||
}) | |||
// Test FQDN rules in ANP | |||
t.Run("testANPOnExternalNodeWithFQDN", func(t *testing.T) { | |||
testANPWithFQDN(t, data, "anp-vmagent-fqdn", namespace, *appliedToVM, []string{"www.facebook.com"}, []string{"docs.google.com"}, []string{"github.com"}) | |||
testANPWithFQDN(t, data, "anp-vmagent-fqdn", namespace, *appliedToVM, []string{"docs.amazon.com"}, []string{"docs.google.com"}, []string{"github.com"}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be done in a separate PR? I don't feel very strongly about it, so if you want to keep it in this PR, it is fine by me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just to ensure that the VM e2e test can pass. I found that Facebook's URL is blocked in our new infrastructure...
1c57f9d
to
080d3bd
Compare
ci/jenkins/test-vm.sh
Outdated
ctr -n k8s.io image import $TEMP_ANTREA_TAR | ||
IP=$(kubectl get nodes --selector=node-role.kubernetes.io/control-plane= --no-headers=true -o custom-columns=IP:.status.addresses[0].address) | ||
rsync -avr --progress --inplace -e "ssh -o StrictHostKeyChecking=no" $TEMP_ANTREA_TAR jenkins@${IP}:${WORKDIR}/$TEMP_ANTREA_TAR | ||
ssh -o StrictHostKeyChecking=no -n jenkins@${IP} "crictl rmi --prune; crictl ps --state Exited; ctr -n=k8s.io images import ${WORKDIR}/$TEMP_ANTREA_TAR" || true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we only copy / import the image on the control plane Node, which I think is consistent with what we were doing before.
But in this case, how does the image become available to the other (worker) Nodes? Is there a different step where worker Nodes are added and we load the required images?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@antoninbas The vm-e2e testbed includes only one controller node in the kubernetes cluster, other nodes are non-Kubernetes rather than Kubernetes worker nodes(Their image is copied and installed using copy_antrea_agent_files_on_linux/windows
, which differs from the one used in the Kubernetes cluster).
Previously, we built Antrea image directly on the controller node, so there was no need to copy or import it again. However, we now need to build the image on the jumper node and then deliver it to the k8s controller node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation, I didn't realize we didn't have any separate K8s worker Nodes for this test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(BTW, it seems that we could still have iterated over all Nodes to load the images, not just the ones with the node-role.kubernetes.io/control-plane
label, to be on the safe side, but the current version is fine if we don't expect to add regular K8s worker Nodes for this test in the future).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(BTW, it seems that we could still have iterated over all Nodes to load the images, not just the ones with the
node-role.kubernetes.io/control-plane
label, to be on the safe side, but the current version is fine if we don't expect to add regular K8s worker Nodes for this test in the future).
Yes, this will be better for future maintenance. I've updated the related code, thanks for the suggestion.
/test-vm-e2e |
* Upgraded Windows images to restore CI on the latest kubernetes cluster * Updated external node e2e test and jenkins script for testbed migration. Signed-off-by: Shuyang Xin <gavinx@vmware.com>
080d3bd
to
e5d797e
Compare
/test-vm-e2e |
1 similar comment
/test-vm-e2e |
@antoninbas The code has passed the vm-e2e test at least once, but the testbed has flaky failures, likely due to infrastructure issues. We can proceed with merging this PR, and I will continue investigating the testbed problems. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for making the latest changes