Update Jenkins Scripts for Testbed Migration #6751

XinShuYang · 2024-10-17T07:12:53Z

Upgraded Windows images to restore CI on the latest Kubernetes cluster.
Updated external node names for testbed migration.

XinShuYang · 2024-10-17T08:10:09Z

/test-windows-all

XinShuYang · 2024-10-18T03:31:30Z

/test-windows-e2e

luolanzone · 2024-10-18T07:01:56Z

ci/jenkins/test.sh

+        antrea_images=("e2eteam/agnhost:2.13" "us.gcr.io/k8s-artifacts-prod/e2e-test-images/agnhost:2.13" "e2eteam/jessie-dnsutils:1.0" "e2eteam/pause:3.2")
+        k8s_images=("registry.k8s.io/e2e-test-images/agnhost:2.52" "registry.k8s.io/e2e-test-images/jessie-dnsutils:1.5" "registry.k8s.io/e2e-test-images/nginx:1.14-2" "registry.k8s.io/pause:3.8" "registry.k8s.io/pause:3.10")
+        conformance_images=("k8sprow.azurecr.io/kubernetes-e2e-test-images/agnhost:2.52" "k8sprow.azurecr.io/kubernetes-e2e-test-images/jessie-dnsutils:1.5" "k8sprow.azurecr.io/kubernetes-e2e-test-images/nginx:1.14-2" "k8sprow.azurecr.io/kubernetes-e2e-test-images/pause:3.8" "registry.k8s.io/e2e-test-images/pause:3.10")
+        e2e_images=("${DOCKER_REGISTRY}/antrea/toolbox:1.3-0" "registry.k8s.io/e2e-test-images/agnhost:2.40")


The latest version of toolbox is toolbox:1.4-0; you can update to it.

luolanzone · 2024-10-18T07:03:48Z

ci/jenkins/test.sh

-        conformance_images=("k8sprow.azurecr.io/kubernetes-e2e-test-images/agnhost:2.45" "k8sprow.azurecr.io/kubernetes-e2e-test-images/jessie-dnsutils:1.5" "k8sprow.azurecr.io/kubernetes-e2e-test-images/nginx:1.14-2" "k8sprow.azurecr.io/kubernetes-e2e-test-images/pause:3.8")
-        e2e_images=("toolbox:1.3-0")
+        antrea_images=("e2eteam/agnhost:2.13" "us.gcr.io/k8s-artifacts-prod/e2e-test-images/agnhost:2.13" "e2eteam/jessie-dnsutils:1.0" "e2eteam/pause:3.2")
+        k8s_images=("registry.k8s.io/e2e-test-images/agnhost:2.52" "registry.k8s.io/e2e-test-images/jessie-dnsutils:1.5" "registry.k8s.io/e2e-test-images/nginx:1.14-2" "registry.k8s.io/pause:3.8" "registry.k8s.io/pause:3.10")


Why are two pause images needed? Same question for the agnhost image.

The agnhost image used by the Antrea e2e test is version 2.40, while the one used by the latest Kubernetes conformance test is 2.52. As for the pause image, I think we can use the latest version for testing.

I suppose we can update the e2e test code to use the latest agnhost:2.52 as well?

Either works for me, but I would prefer updating it in a separate PR, as I haven’t tested the Windows e2e with this image before the testbed was shut down. Also, we would need to update the related code in all CI scripts, and the goal of this PR is just the migration.
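To make the duplicated versions discussed above easy to spot, here is a minimal sketch that lists repositories appearing with more than one tag. The function name `find_multi_tag_images` is hypothetical and not part of `ci/jenkins/test.sh`; it assumes every reference ends in `:tag` (no digests, no untagged references).

```shell
# Hypothetical helper (not in the CI scripts): print each image repository
# that is referenced with more than one tag, followed by its tags.
# Assumes every argument has the form "repo:tag".
find_multi_tag_images() {
    local ref
    for ref in "$@"; do
        # Split at the last colon and emit "repo tag".
        printf '%s %s\n' "${ref%:*}" "${ref##*:}"
    done | LC_ALL=C sort -u | awk '
        # Collect the distinct tags seen for each repository.
        { if (count[$1]++) tags[$1] = tags[$1] "," $2; else tags[$1] = $2 }
        END { for (r in count) if (count[r] > 1) print r ": " tags[r] }
    ' | LC_ALL=C sort
}

# Example with a subset of the references quoted in this thread.
find_multi_tag_images \
    "registry.k8s.io/e2e-test-images/agnhost:2.52" \
    "registry.k8s.io/e2e-test-images/jessie-dnsutils:1.5" \
    "registry.k8s.io/pause:3.8" \
    "registry.k8s.io/pause:3.10" \
    "registry.k8s.io/e2e-test-images/agnhost:2.40"
```

For the example arguments this reports agnhost (2.40 and 2.52) and pause (3.8 and 3.10), which are the two cases raised in the review.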

XinShuYang · 2024-10-18T09:01:32Z

/test-vm-e2e

antoninbas · 2024-10-22T20:37:04Z

ci/jenkins/test-vm.sh

+    kubectl get nodes --selector=kubernetes.io/os=linux --no-headers=true -o custom-columns=IP:.status.addresses[0].address | while read -r IP; do
+        rsync -avr --progress --inplace -e "ssh -o StrictHostKeyChecking=no" ${WORKDIR}/*.yml jenkins@${IP}:${WORKDIR}/
+    done


do we need to copy the manifest to all Nodes?

It is only used by the control-plane node. I have updated this part, thanks for the suggestion.

antoninbas · 2024-10-22T20:37:53Z

ci/jenkins/test-vm.sh

+    kubectl get nodes --selector=kubernetes.io/os=linux --no-headers=true -o custom-columns=IP:.status.addresses[0].address | while read -r IP; do
+        rsync -avr --progress --inplace -e "ssh -o StrictHostKeyChecking=no" $TEMP_ANTREA_TAR jenkins@${IP}:${WORKDIR}/$TEMP_ANTREA_TAR
+        ssh -o StrictHostKeyChecking=no -n jenkins@${IP} "crictl rmi --prune; crictl ps --state Exited; ctr -n=k8s.io images import ${WORKDIR}/$TEMP_ANTREA_TAR" || true


out of curiosity, why didn't we have that step (loading the image to the Nodes) before?

Previously, both Docker and containerd were installed on the same node. To avoid potential compatibility issues after Kubernetes removed the Docker runtime, we have implemented a new topology for all Jenkins testbeds: one node (the jumper node) has a Jenkins agent, Docker, and kubeconfig installed, while another node has containerd and Kubernetes installed. When running CI, we need to copy the image from the jumper node to the Kubernetes node.
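The jumper-to-node flow described above can be sketched roughly as follows. This is an illustration rather than the actual script: the function name `deliver_image`, its parameters, and the `RUN` dry-run variable are assumptions, while the `rsync` and `ctr` invocations mirror the ones in the diff.

```shell
# Sketch of the flow: export the image from Docker on the jumper node, copy the
# tarball to the Kubernetes node, then import it into containerd's k8s.io namespace.
# RUN is a hypothetical dry-run hook: with RUN=echo the commands are only printed.
RUN=${RUN:-}

deliver_image() {
    local image=$1 tar_file=$2 node_ip=$3 workdir=$4
    # 1. Save the locally built image to a tarball (runs on the jumper node).
    $RUN docker save -o "$tar_file" "$image"
    # 2. Copy the tarball to the Kubernetes node.
    $RUN rsync -avr --progress --inplace -e "ssh -o StrictHostKeyChecking=no" \
        "$tar_file" "jenkins@${node_ip}:${workdir}/${tar_file}"
    # 3. Import it into containerd; -n stops ssh from reading stdin.
    $RUN ssh -o StrictHostKeyChecking=no -n "jenkins@${node_ip}" \
        "ctr -n=k8s.io images import ${workdir}/${tar_file}"
}
```

Setting `RUN=echo` before calling `deliver_image` prints the three commands without touching Docker or the remote node, which is a cheap way to review what a CI run would execute.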

antoninbas · 2024-10-22T20:39:51Z

test/e2e/vmagent_test.go

@@ -525,7 +525,7 @@ func testANPOnVMs(t *testing.T, data *TestData, vmList []vmInfo, osType string)
 	})
 	// Test FQDN rules in ANP
 	t.Run("testANPOnExternalNodeWithFQDN", func(t *testing.T) {
-		testANPWithFQDN(t, data, "anp-vmagent-fqdn", namespace, *appliedToVM, []string{"www.facebook.com"}, []string{"docs.google.com"}, []string{"github.com"})
+		testANPWithFQDN(t, data, "anp-vmagent-fqdn", namespace, *appliedToVM, []string{"docs.amazon.com"}, []string{"docs.google.com"}, []string{"github.com"})


should this be done in a separate PR? I don't feel very strongly about it, so if you want to keep it in this PR, it is fine by me.

This is just to ensure that the VM e2e test can pass. I found that Facebook's URL is blocked in our new infrastructure...

antoninbas · 2024-10-23T21:51:40Z

ci/jenkins/test-vm.sh

-    ctr -n k8s.io image import $TEMP_ANTREA_TAR
+    IP=$(kubectl get nodes --selector=node-role.kubernetes.io/control-plane= --no-headers=true -o custom-columns=IP:.status.addresses[0].address)
+    rsync -avr --progress --inplace -e "ssh -o StrictHostKeyChecking=no" $TEMP_ANTREA_TAR jenkins@${IP}:${WORKDIR}/$TEMP_ANTREA_TAR
+    ssh -o StrictHostKeyChecking=no -n jenkins@${IP} "crictl rmi --prune; crictl ps --state Exited; ctr -n=k8s.io images import ${WORKDIR}/$TEMP_ANTREA_TAR" || true


So we only copy / import the image on the control plane Node, which I think is consistent with what we were doing before.
But in this case, how does the image become available to the other (worker) Nodes? Is there a different step where worker Nodes are added and we load the required images?

@antoninbas The vm-e2e testbed includes only one controller node in the Kubernetes cluster; the other nodes are non-Kubernetes nodes rather than Kubernetes worker nodes. (Their image is copied and installed using copy_antrea_agent_files_on_linux/windows, which differs from the one used in the Kubernetes cluster.)

Previously, we built the Antrea image directly on the controller node, so there was no need to copy or import it again. However, we now need to build the image on the jumper node and then deliver it to the K8s controller node.

Thanks for the explanation, I didn't realize we didn't have any separate K8s worker Nodes for this test

(BTW, it seems that we could still have iterated over all Nodes to load the images, not just the ones with the node-role.kubernetes.io/control-plane label, to be on the safe side, but the current version is fine if we don't expect to add regular K8s worker Nodes for this test in the future).

Yes, this will be better for future maintenance. I've updated the related code, thanks for the suggestion.
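A minimal sketch of iterating over every Node instead of only the control plane, assuming node IPs arrive one per line on stdin. `for_each_node` and the `load_on_node` callback named in the comments are hypothetical; in CI the callback would do the rsync and `ctr -n=k8s.io images import` steps shown in the diff.

```shell
# Read node IPs from stdin, one per line, and run the given action for each.
# The action is any command or function taking the IP as its only argument.
for_each_node() {
    local action=$1 ip
    while read -r ip; do
        [ -n "$ip" ] || continue    # skip blank lines
        # Redirect stdin so the action (e.g. ssh) cannot swallow the remaining
        # IPs; this is the same reason the CI script passes -n to ssh.
        "$action" "$ip" </dev/null || echo "warning: $action failed on $ip" >&2
    done
}

# Possible use from the jumper node (assumes kubectl access and a
# load_on_node function defined elsewhere):
# kubectl get nodes --selector=kubernetes.io/os=linux --no-headers=true \
#     -o custom-columns=IP:.status.addresses[0].address | for_each_node load_on_node
```

A failure on one Node is logged and the loop continues, similar in spirit to the `|| true` in the original script.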

antoninbas · 2024-10-25T17:37:57Z

/test-vm-e2e

* Upgraded Windows images to restore CI on the latest kubernetes cluster
* Updated external node e2e test and jenkins script for testbed migration.

Signed-off-by: Shuyang Xin <gavinx@vmware.com>

XinShuYang · 2024-10-28T15:17:27Z

/test-vm-e2e

XinShuYang · 2024-10-28T15:19:03Z

/test-vm-e2e

XinShuYang · 2024-10-28T15:44:22Z

@antoninbas The code has passed the vm-e2e test at least once, but the testbed has flaky failures, likely due to infrastructure issues. We can proceed with merging this PR, and I will continue investigating the testbed problems.

antoninbas

LGTM, thanks for making the latest changes

XinShuYang force-pushed the migrationimage branch 3 times, most recently from 3ca1e4d to 3bf64fd Compare October 17, 2024 12:10

XinShuYang force-pushed the migrationimage branch from 3bf64fd to e37605b Compare October 18, 2024 06:03

luolanzone reviewed Oct 18, 2024

XinShuYang force-pushed the migrationimage branch from e37605b to bd2c31d Compare October 18, 2024 08:30

XinShuYang force-pushed the migrationimage branch 2 times, most recently from b4b2a46 to 1c57f9d Compare October 21, 2024 08:42

XinShuYang requested review from antoninbas and luolanzone October 21, 2024 08:51

antoninbas reviewed Oct 22, 2024

XinShuYang force-pushed the migrationimage branch from 1c57f9d to 080d3bd Compare October 23, 2024 07:11

XinShuYang requested a review from antoninbas October 23, 2024 08:55

antoninbas previously approved these changes Oct 23, 2024

Update Jenkins Scripts for Testbed Migration

e5d797e

XinShuYang dismissed antoninbas’s stale review via e5d797e October 28, 2024 09:43

XinShuYang force-pushed the migrationimage branch from 080d3bd to e5d797e Compare October 28, 2024 09:43

antoninbas approved these changes Oct 28, 2024

antoninbas merged commit 47ce51e into antrea-io:main Oct 28, 2024
52 of 61 checks passed
