Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update Jenkins Scripts for Testbed Migration #6751

Merged
merged 1 commit into from
Oct 28, 2024

Conversation

XinShuYang
Copy link
Contributor

  • Upgraded Windows images to restore CI on the latest kubernetes cluster
  • Updated external node names for testbed migration.

@XinShuYang
Copy link
Contributor Author

/test-windows-all

@XinShuYang XinShuYang force-pushed the migrationimage branch 3 times, most recently from 3ca1e4d to 3bf64fd Compare October 17, 2024 12:10
@XinShuYang
Copy link
Contributor Author

/test-windows-e2e

antrea_images=("e2eteam/agnhost:2.13" "us.gcr.io/k8s-artifacts-prod/e2e-test-images/agnhost:2.13" "e2eteam/jessie-dnsutils:1.0" "e2eteam/pause:3.2")
k8s_images=("registry.k8s.io/e2e-test-images/agnhost:2.52" "registry.k8s.io/e2e-test-images/jessie-dnsutils:1.5" "registry.k8s.io/e2e-test-images/nginx:1.14-2" "registry.k8s.io/pause:3.8" "registry.k8s.io/pause:3.10")
conformance_images=("k8sprow.azurecr.io/kubernetes-e2e-test-images/agnhost:2.52" "k8sprow.azurecr.io/kubernetes-e2e-test-images/jessie-dnsutils:1.5" "k8sprow.azurecr.io/kubernetes-e2e-test-images/nginx:1.14-2" "k8sprow.azurecr.io/kubernetes-e2e-test-images/pause:3.8" "registry.k8s.io/e2e-test-images/pause:3.10")
e2e_images=("${DOCKER_REGISTRY}/antrea/toolbox:1.3-0" "registry.k8s.io/e2e-test-images/agnhost:2.40")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The latest version of toolbox is toolbox:1.4-0, you can update it.

conformance_images=("k8sprow.azurecr.io/kubernetes-e2e-test-images/agnhost:2.45" "k8sprow.azurecr.io/kubernetes-e2e-test-images/jessie-dnsutils:1.5" "k8sprow.azurecr.io/kubernetes-e2e-test-images/nginx:1.14-2" "k8sprow.azurecr.io/kubernetes-e2e-test-images/pause:3.8")
e2e_images=("toolbox:1.3-0")
antrea_images=("e2eteam/agnhost:2.13" "us.gcr.io/k8s-artifacts-prod/e2e-test-images/agnhost:2.13" "e2eteam/jessie-dnsutils:1.0" "e2eteam/pause:3.2")
k8s_images=("registry.k8s.io/e2e-test-images/agnhost:2.52" "registry.k8s.io/e2e-test-images/jessie-dnsutils:1.5" "registry.k8s.io/e2e-test-images/nginx:1.14-2" "registry.k8s.io/pause:3.8" "registry.k8s.io/pause:3.10")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are two pause images needed? same questions for the agnhost image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The agnhost image used by the Antrea e2e test is version 2.40, while the one used by the latest Kubernetes conformance test is 2.52. As for the pause image, I think we can use the latest version for testing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose we can update the e2e test codes to use the latest agnhost:2.52 as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either works for me, but I would prefer updating it in a separate PR, as I haven’t tested the Windows e2e with this image before the testbed was shut down. Also, we need to update related code in all CI scripts, but the goal of this PR is just for the migration.

@XinShuYang
Copy link
Contributor Author

/test-vm-e2e

Comment on lines 153 to 155
kubectl get nodes --selector=kubernetes.io/os=linux --no-headers=true -o custom-columns=IP:.status.addresses[0].address | while read -r IP; do
rsync -avr --progress --inplace -e "ssh -o StrictHostKeyChecking=no" ${WORKDIR}/*.yml jenkins@${IP}:${WORKDIR}/
done
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to copy the manifest to all Nodes?

Copy link
Contributor Author

@XinShuYang XinShuYang Oct 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is only used by the control-plane node. I have updated this part, thanks for the suggestion.

Comment on lines +146 to +148
kubectl get nodes --selector=kubernetes.io/os=linux --no-headers=true -o custom-columns=IP:.status.addresses[0].address | while read -r IP; do
rsync -avr --progress --inplace -e "ssh -o StrictHostKeyChecking=no" $TEMP_ANTREA_TAR jenkins@${IP}:${WORKDIR}/$TEMP_ANTREA_TAR
ssh -o StrictHostKeyChecking=no -n jenkins@${IP} "crictl rmi --prune; crictl ps --state Exited; ctr -n=k8s.io images import ${WORKDIR}/$TEMP_ANTREA_TAR" || true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

out of curiosity, why didn't we have that step (loading the image to the Nodes) before?

Copy link
Contributor Author

@XinShuYang XinShuYang Oct 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, both Docker and containerd were installed on the same node. To avoid potential compatibility issues after kubernetes removed docker runtime, we have implemented the new topology for all jenkins testbeds: one node (the jumper node) has a Jenkins agent,Docker and kubeconfig installed, while another node has containerd and Kubernetes installed. When running CI, we need to copy the image from the jumper node to the Kubernetes node.

@@ -525,7 +525,7 @@ func testANPOnVMs(t *testing.T, data *TestData, vmList []vmInfo, osType string)
})
// Test FQDN rules in ANP
t.Run("testANPOnExternalNodeWithFQDN", func(t *testing.T) {
testANPWithFQDN(t, data, "anp-vmagent-fqdn", namespace, *appliedToVM, []string{"www.facebook.com"}, []string{"docs.google.com"}, []string{"github.com"})
testANPWithFQDN(t, data, "anp-vmagent-fqdn", namespace, *appliedToVM, []string{"docs.amazon.com"}, []string{"docs.google.com"}, []string{"github.com"})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be done in a separate PR? I don't feel very strongly about it, so if you want to keep it in this PR, it is fine by me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just to ensure that the VM e2e test can pass. I found that Facebook's URL is blocked in our new infrastructure...

antoninbas
antoninbas previously approved these changes Oct 23, 2024
ctr -n k8s.io image import $TEMP_ANTREA_TAR
IP=$(kubectl get nodes --selector=node-role.kubernetes.io/control-plane= --no-headers=true -o custom-columns=IP:.status.addresses[0].address)
rsync -avr --progress --inplace -e "ssh -o StrictHostKeyChecking=no" $TEMP_ANTREA_TAR jenkins@${IP}:${WORKDIR}/$TEMP_ANTREA_TAR
ssh -o StrictHostKeyChecking=no -n jenkins@${IP} "crictl rmi --prune; crictl ps --state Exited; ctr -n=k8s.io images import ${WORKDIR}/$TEMP_ANTREA_TAR" || true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we only copy / import the image on the control plane Node, which I think is consistent with what we were doing before.
But in this case, how does the image become available to the other (worker) Nodes? Is there a different step where worker Nodes are added and we load the required images?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@antoninbas The vm-e2e testbed includes only one controller node in the kubernetes cluster, other nodes are non-Kubernetes rather than Kubernetes worker nodes(Their image is copied and installed using copy_antrea_agent_files_on_linux/windows, which differs from the one used in the Kubernetes cluster).

Previously, we built Antrea image directly on the controller node, so there was no need to copy or import it again. However, we now need to build the image on the jumper node and then deliver it to the k8s controller node.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation, I didn't realize we didn't have any separate K8s worker Nodes for this test

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(BTW, it seems that we could still have iterated over all Nodes to load the images, not just the ones with the node-role.kubernetes.io/control-plane label, to be on the safe side, but the current version is fine if we don't expect to add regular K8s worker Nodes for this test in the future).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(BTW, it seems that we could still have iterated over all Nodes to load the images, not just the ones with the node-role.kubernetes.io/control-plane label, to be on the safe side, but the current version is fine if we don't expect to add regular K8s worker Nodes for this test in the future).

Yes, this will be better for future maintenance. I've updated the related code, thanks for the suggestion.

@antoninbas
Copy link
Contributor

/test-vm-e2e

* Upgraded Windows images to restore CI on the latest kubernetes cluster
* Updated external node e2e test and jenkins script for testbed migration.

Signed-off-by: Shuyang Xin <gavinx@vmware.com>
@XinShuYang
Copy link
Contributor Author

/test-vm-e2e

1 similar comment
@XinShuYang
Copy link
Contributor Author

/test-vm-e2e

@XinShuYang
Copy link
Contributor Author

@antoninbas The code has passed the vm-e2e test at least once, but the testbed has flaky failures, likely due to infrastructure issues. We can proceed with merging this PR, and I will continue investigating the testbed problems.

Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for making the latest changes

@antoninbas antoninbas merged commit 47ce51e into antrea-io:main Oct 28, 2024
52 of 61 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants