antctl check cluster tries to pull wrong tag #6563

Closed
mkaring opened this issue Jul 28, 2024 · 4 comments · Fixed by #6565
Labels
  • area/component/antctl: Issues or PRs related to the command line interface component.
  • kind/bug: Categorizes issue or PR as related to a bug.
  • reported-by/end-user: Issues reported by end users.

Comments

mkaring commented Jul 28, 2024

Describe the bug

The command antctl check cluster, using the antctl version shipped with Antrea v2.1.0, fails to start the cluster-checker. The reason is that the image antrea/antrea-agent-ubuntu is requested with the tag "2.1.0" instead of "v2.1.0", so the image pull fails.

  Normal   Pulling    52s (x4 over 2m24s)  kubelet            Pulling image "antrea/antrea-agent-ubuntu:2.1.0"
  Warning  Failed     51s (x4 over 2m23s)  kubelet            Failed to pull image "antrea/antrea-agent-ubuntu:2.1.0": rpc error: code = NotFound desc = failed to pull and unpack image "docker.io/antrea/antrea-agent-ubuntu:2.1.0": failed to resolve reference "docker.io/antrea/antrea-agent-ubuntu:2.1.0": docker.io/antrea/antrea-agent-ubuntu:2.1.0: not found
  Warning  Failed     51s (x4 over 2m23s)  kubelet            Error: ErrImagePull
  Warning  Failed     41s (x6 over 2m23s)  kubelet            Error: ImagePullBackOff
  Normal   BackOff    30s (x7 over 2m23s)  kubelet            Back-off pulling image "antrea/antrea-agent-ubuntu:2.1.0"

According to DockerHub, the correct image has the tag "v2.1.0": antrea/antrea-agent-ubuntu:v2.1.0
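
To make the mismatch concrete, here is a minimal Go sketch of building an image reference from a version string. The helper names imageTag and imageRef are hypothetical; this is not the actual antctl code, nor the fix in #6565. It only assumes, as the DockerHub listing above suggests, that release images are tagged with a leading "v".

```go
package main

import (
	"fmt"
	"strings"
)

// imageTag returns the version with a leading "v", matching how the
// release image is tagged on DockerHub according to the report above.
// Hypothetical helper, not taken from the Antrea source.
func imageTag(version string) string {
	if strings.HasPrefix(version, "v") {
		return version
	}
	return "v" + version
}

// imageRef joins a repository and a version into a pullable reference.
func imageRef(repo, version string) string {
	return repo + ":" + imageTag(version)
}

func main() {
	// Both spellings of the version resolve to the same, existing tag.
	fmt.Println(imageRef("antrea/antrea-agent-ubuntu", "2.1.0"))
	fmt.Println(imageRef("antrea/antrea-agent-ubuntu", "v2.1.0"))
}
```

With a normalization step like this, a version string that lost its prefix somewhere in the build would still resolve to antrea/antrea-agent-ubuntu:v2.1.0.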

As a result, the command itself also fails:

[kubernetes] Creating Namespace antrea-test-5uat0 for pre installation tests...
[kubernetes] Creating Deployment
[kubernetes] Waiting for Deployment to become ready
[kubernetes] Waiting for Deployment cluster-checker to become ready...
Error: error while waiting for Deployment to become ready: waiting for Deployment cluster-checker to become ready has been interrupted: context deadline exceeded

To Reproduce

  1. Get the antctl executable from the v2.1.0 release.
  2. Run antctl check cluster
  3. Wait for the timeout error

After that you can easily inspect the pod created and check the events.

Expected

The command starts its checker pod correctly.

Versions:

  • Antrea: v2.1.0
  • Kubernetes:
    • Client Version: v1.29.2
    • Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
    • Server Version: v1.29.7
  • Container Runtime: containerd 1.7.19
  • Control Plane OS: Ubuntu 22.04.4 LTS

mkaring added the kind/bug label Jul 28, 2024

luolanzone (Contributor) commented

Hi @mkaring, may I ask how you deployed the cluster with Antrea? The antctl check cluster tool is meant to be run before Antrea is installed, but from your description it seems that Antrea was already installed, although the installation did not succeed.

tnqn (Member) commented Jul 29, 2024

Thanks @mkaring for reporting it.

@luolanzone I can reproduce the issue, and it is valid. There is a problem with how the image tag is determined when antctl is built for a release, as @mkaring has pointed out.

tnqn added the area/component/antctl and reported-by/end-user labels Jul 29, 2024

mkaring (Author) commented Jul 29, 2024

@luolanzone The installation of Antrea did indeed fail: the OVS on my Windows node did not seem to be able to connect to the one on the control plane.

The check of the cluster installation also showed timeout problems because the containers did not start up, which in turn was because the Antrea agent did not start on my Windows node.

As I was stuck at this point, I had the idea to tear down the entire cluster and start again from scratch, to see what the checker has to say about it.

luolanzone (Contributor) commented

Hi @mkaring, thanks for the info. We confirmed that it is indeed a problem in antctl check cluster, and @tnqn has submitted a bugfix via PR #6565.

tnqn closed this as completed in #6565 Aug 2, 2024