tenant fails creation with RELEASE.2021-09-03T03-56-13Z #803

Closed
jr0dd opened this issue Sep 3, 2021 · 24 comments

@jr0dd
jr0dd commented Sep 3, 2021

The operator and operator console load up fine on 4.2.4. But there is an issue when creating the tenant after updating to RELEASE.2021-09-03T03-56-13Z. Rolling back to RELEASE.2021-08-31T05-46-54Z got everything back up and running again.

ERROR Unable to validate passed arguments in MINIO_ARGS:env+tls://Mg8Cc4bzEdqMOddlnSL7:o1vjIbF6RCHySn3cM5CVAllejppL7K1BHRPK0PCm@operator.minio.svc.cluster.local:4222/webhook/v1/getenv/minio/tenant0: Get "https://operator.minio.svc.cluster.local:4222/webhook/v1/getenv/minio/tenant0?key=MINIO_ARGS": dial tcp 172.17.100.186:4222: connect: connection refused
@dshuvar
dshuvar commented Sep 3, 2021

Same here. With image minio/minio:RELEASE.2021-08-31T05-46-54Z the tenant was created fine.

@harshavardhana
Member

> The operator and operator console load up fine on 4.2.4. But there is an issue when creating the tenant after updating to RELEASE.2021-09-03T03-56-13Z. Rolling back to RELEASE.2021-08-31T05-46-54Z got everything back up and running again.
>
> ERROR Unable to validate passed arguments in MINIO_ARGS:env+tls://Mg8Cc4bzEdqMOddlnSL7:o1vjIbF6RCHySn3cM5CVAllejppL7K1BHRPK0PCm@operator.minio.svc.cluster.local:4222/webhook/v1/getenv/minio/tenant0: Get "https://operator.minio.svc.cluster.local:4222/webhook/v1/getenv/minio/tenant0?key=MINIO_ARGS": dial tcp 172.17.100.186:4222: connect: connection refused

This error is correct: it shows that the operator should have its webhook running, and it is not.
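
For readers following along, the endpoint the tenant is trying to reach can be read straight out of the `MINIO_ARGS` value in the error. The sketch below is only an illustration using Python's standard `urllib.parse`; the function name `webhook_endpoint` and the placeholder credentials are made up for this example and are not part of MinIO or the operator.

```python
from urllib.parse import urlsplit


def webhook_endpoint(minio_args: str) -> dict:
    """Split an env+tls:// style MINIO_ARGS value into its connection parts."""
    parts = urlsplit(minio_args)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "path": parts.path,
        # Only report whether credentials are embedded; do not echo them.
        "has_credentials": parts.username is not None,
    }


# Placeholder credentials (hypothetical); the real value embeds generated ones.
example = (
    "env+tls://USER:PASS@operator.minio.svc.cluster.local:4222"
    "/webhook/v1/getenv/minio/tenant0"
)
info = webhook_endpoint(example)
print(info["host"], info["port"], info["path"])
```

The host and port it reports are the ones that appear in the `dial tcp ... connection refused` part of the error, so the failure is at that endpoint rather than in the tenant itself.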

@harshavardhana
Member

Do you have network policies that restrict MinIO tenant pods from talking to the operator namespace? @jr0dd @dshuvar

@jr0dd
Author

jr0dd commented Sep 3, 2021

> Do you have network policies that restrict MinIO tenant pods from talking to the operator namespace? @jr0dd @dshuvar

No, I do not. Rolling back to RELEASE.2021-08-31T05-46-54Z fixed it for now. Any attempt to update to today's build gives that error.

@harshavardhana
Member

> No, I do not. Rolling back to RELEASE.2021-08-31T05-46-54Z fixed it for now. Any attempt to update to today's build gives that error.

That rollback is not the right fix, @jr0dd. The issue is very clear: the operator webhook is not reachable.

The upgrade exposes an existing problem in our operator; it was previously masked by a fallback hack that has been removed in the latest releases.

@harshavardhana
Member

Please share the operator logs @jr0dd

@jr0dd
Author

jr0dd commented Sep 3, 2021

I0903 17:48:34.764102       1 main-controller.go:700] minio/tenant0 Detected we are adding a new pool
I0903 17:48:34.771123       1 main-controller.go:841] 'minio/tenant0' Error waiting for pool to be ready: Get "https://tenant0-ss-0-0.tenant0-hl.minio.svc.cluster.local:9000/minio/admin/v3/info": dial tcp 172.16.0.180:9000: connect: connection refused
E0903 17:48:34.771214       1 main-controller.go:461] error syncing 'minio/tenant0': Waiting for all pools to initialize
I0903 17:48:34.959925       1 monitoring.go:106] 'minio/tenant0' no pool is initialized
I0903 17:48:35.750272       1 monitoring.go:106] 'minio/tenant0' no pool is initialized

@harshavardhana
Member

> I0903 17:48:34.764102 1 main-controller.go:700] minio/tenant0 Detected we are adding a new pool
> I0903 17:48:34.771123 1 main-controller.go:841] 'minio/tenant0' Error waiting for pool to be ready: Get "https://tenant0-ss-0-0.tenant0-hl.minio.svc.cluster.local:9000/minio/admin/v3/info": dial tcp 172.16.0.180:9000: connect: connection refused
> E0903 17:48:34.771214 1 main-controller.go:461] error syncing 'minio/tenant0': Waiting for all pools to initialize
> I0903 17:48:34.959925 1 monitoring.go:106] 'minio/tenant0' no pool is initialized
> I0903 17:48:35.750272 1 monitoring.go:106] 'minio/tenant0' no pool is initialized

Can you start an Ubuntu container and try to reach the operator from it? Run: curl https://operator.minio.svc.cluster.local:4222 --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
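
If curl is not available in the debug container, the same question (is anything listening on the operator's port 4222?) can be answered with a plain TCP connect. This is a minimal sketch using only Python's standard `socket` module; the helper name `can_connect` is hypothetical, and it checks reachability only, not the TLS handshake that curl would also perform.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, DNS resolution failure, and timeouts.
        return False
```

Called from a pod inside the cluster as `can_connect("operator.minio.svc.cluster.local", 4222)`, a `False` result corresponds to the "connection refused" seen in the tenant error.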

@harshavardhana
Member

@jr0dd, can you also list the exact steps you took here?

@jr0dd
Author

jr0dd commented Sep 3, 2021

> > I0903 17:48:34.764102 1 main-controller.go:700] minio/tenant0 Detected we are adding a new pool
> > I0903 17:48:34.771123 1 main-controller.go:841] 'minio/tenant0' Error waiting for pool to be ready: Get "https://tenant0-ss-0-0.tenant0-hl.minio.svc.cluster.local:9000/minio/admin/v3/info": dial tcp 172.16.0.180:9000: connect: connection refused
> > E0903 17:48:34.771214 1 main-controller.go:461] error syncing 'minio/tenant0': Waiting for all pools to initialize
> > I0903 17:48:34.959925 1 monitoring.go:106] 'minio/tenant0' no pool is initialized
> > I0903 17:48:35.750272 1 monitoring.go:106] 'minio/tenant0' no pool is initialized
>
> Can you start an Ubuntu container and try to reach the operator from it? Run: curl https://operator.minio.svc.cluster.local:4222 --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt

*   Trying 172.17.173.3:4222...
* TCP_NODELAY set
* connect to 172.17.173.3 port 4222 failed: Connection refused
* Failed to connect to operator.minio.svc.cluster.local port 4222: Connection refused
* Closing connection 0
curl: (7) Failed to connect to operator.minio.svc.cluster.local port 4222: Connection refused

@jr0dd
Author

jr0dd commented Sep 3, 2021

> @jr0dd, can you also list the exact steps you took here?

All I did was change the image to the new release. Nothing else.

@jr0dd
Author

jr0dd commented Sep 3, 2021

@harshavardhana

operator:
  image:
    repository: quay.io/minio/operator
    tag: v4.2.4
console:
  image:
    repository: quay.io/minio/console
    tag: v0.9.6
  ingress:
    enabled: true
    ingressClass: traefik
    annotations:
      cert-manager.io/cluster-issuer: letsencrypt-production
      traefik.ingress.kubernetes.io/router.entrypoints: websecure
    host: "minio.${SECRET_DOMAIN}"
    path: /
    tls:
      - secretName: minio-tls
        hosts:
          - "minio.${SECRET_DOMAIN}"
tenants:
  - name: tenant0
    image:
      repository: quay.io/minio/minio
      tag: RELEASE.2021-09-03T03-56-13Z
    namespace: minio
    pools:
      - servers: 4
        volumesPerServer: 1
        size: 20Gi
        storageClassName: openebs-zfspv
    secrets:
      enabled: true
      name: minio-creds
      accessKey: "${MINIO_ACCESS_KEY}"
      secretKey: "${MINIO_SECRET_KEY}"
    metrics:
      enabled: true
      port: 9000
    certificate:
      requestAutoCert: true
    env:
      - name: MINIO_BROWSER_REDIRECT_URL
        value: "https://minio.${SECRET_DOMAIN}"

@harshavardhana
Member

> *   Trying 172.17.173.3:4222...
> * TCP_NODELAY set
> * connect to 172.17.173.3 port 4222 failed: Connection refused
> * Failed to connect to operator.minio.svc.cluster.local port 4222: Connection refused
> * Closing connection 0
> curl: (7) Failed to connect to operator.minio.svc.cluster.local port 4222: Connection refused

Yeah, this is the problem, @jr0dd. Can you restart the operator pod and try the curl again?

@harshavardhana
Member

> > *   Trying 172.17.173.3:4222...
> > * TCP_NODELAY set
> > * connect to 172.17.173.3 port 4222 failed: Connection refused
> > * Failed to connect to operator.minio.svc.cluster.local port 4222: Connection refused
> > * Closing connection 0
> > curl: (7) Failed to connect to operator.minio.svc.cluster.local port 4222: Connection refused
>
> Yeah, this is the problem, @jr0dd. Can you restart the operator pod and try the curl again?

Wait, did you install the operator to a custom namespace? operator.minio.svc.cluster.local instead of minio-operator?

@jr0dd
Author

jr0dd commented Sep 3, 2021

> > *   Trying 172.17.173.3:4222...
> > * TCP_NODELAY set
> > * connect to 172.17.173.3 port 4222 failed: Connection refused
> > * Failed to connect to operator.minio.svc.cluster.local port 4222: Connection refused
> > * Closing connection 0
> > curl: (7) Failed to connect to operator.minio.svc.cluster.local port 4222: Connection refused
>
> Yeah, this is the problem, @jr0dd. Can you restart the operator pod and try the curl again?
>
> Wait, did you install the operator to a custom namespace? operator.minio.svc.cluster.local instead of minio-operator?

Correct, I installed it to just the minio namespace.
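
For context on why the hostname gives the namespace away: Kubernetes Service DNS names follow the pattern `<service>.<namespace>.svc.<cluster-domain>`, so the second label of `operator.minio.svc.cluster.local` is the namespace. A small sketch of that split (the function name `split_service_fqdn` is just illustrative):

```python
def split_service_fqdn(fqdn: str) -> tuple[str, str]:
    """Return (service, namespace) from <service>.<namespace>.svc.<cluster-domain>."""
    labels = fqdn.rstrip(".").split(".")
    # The third label must be the literal "svc" for a Service name.
    if len(labels) < 3 or labels[2] != "svc":
        raise ValueError(f"not a Service DNS name: {fqdn!r}")
    return labels[0], labels[1]


print(split_service_fqdn("operator.minio.svc.cluster.local"))
```

Per-pod names behind a headless Service, such as `tenant0-ss-0-0.tenant0-hl.minio.svc.cluster.local` from the operator log, carry an extra leading label and are deliberately rejected by this check.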

@harshavardhana
Member

Which k8s deployment is this? Bare metal or cloud k8s? @jr0dd

@jr0dd
Author

jr0dd commented Sep 3, 2021

baremetal

Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.4+k3s1", GitCommit:"3e250fdbab72d88f7e6aae57446023a0567ffc97", GitTreeState:"clean", BuildDate:"2021-08-19T19:09:53Z", GoVersion:"go1.16.6", Compiler:"gc", Platform:"linux/amd64"}

@harshavardhana
Member

> I0903 17:48:34.764102 1 main-controller.go:700] minio/tenant0 Detected we are adding a new pool
> I0903 17:48:34.771123 1 main-controller.go:841] 'minio/tenant0' Error waiting for pool to be ready: Get "https://tenant0-ss-0-0.tenant0-hl.minio.svc.cluster.local:9000/minio/admin/v3/info": dial tcp 172.16.0.180:9000: connect: connection refused
> E0903 17:48:34.771214 1 main-controller.go:461] error syncing 'minio/tenant0': Waiting for all pools to initialize
> I0903 17:48:34.959925 1 monitoring.go:106] 'minio/tenant0' no pool is initialized
> I0903 17:48:35.750272 1 monitoring.go:106] 'minio/tenant0' no pool is initialized

Would you mind sharing the entire log here? Please do not truncate it, @jr0dd.

@dvaldivia
Collaborator

We are interested to see whether the API started, @jr0dd. #805 will make it so Tenants won't sync if the Operator API hasn't started.

@jr0dd
Author

jr0dd commented Sep 3, 2021

@dvaldivia @harshavardhana

I0903 18:34:12.676264       1 main.go:74] Starting MinIO Operator
I0903 18:34:12.998910       1 main.go:141] caBundle on CRD updated
I0903 18:34:13.000732       1 main-controller.go:242] Setting up event handlers
I0903 18:34:13.001298       1 main-controller.go:359] Starting Tenant controller
I0903 18:34:13.001320       1 main-controller.go:362] Waiting for informer caches to sync
I0903 18:34:13.041964       1 main-controller.go:342] Starting HTTPS api server
I0903 18:34:13.042600       1 main-controller.go:345] HTTPS server ListenAndServeTLS: tls: private key does not match public key
I0903 18:34:13.101620       1 main-controller.go:367] Starting workers
I0903 18:34:13.200552       1 monitoring.go:106] 'minio/tenant0' no pool is initialized
I0903 18:34:18.070442       1 upgrades.go:94] tenant0 has no log secret
E0903 18:34:18.073145       1 upgrades.go:121] Error deleting operator webhook secret, manual deletion is needed: secrets "operator-webhook-secret" not found
I0903 18:34:18.085824       1 status.go:195] Hit conflict issue, getting latest version of tenant to update version
I0903 18:34:18.098340       1 status.go:155] Hit conflict issue, getting latest version of tenant
I0903 18:34:18.147545       1 status.go:54] Hit conflict issue, getting latest version of tenant
I0903 18:34:18.645267       1 status.go:54] Hit conflict issue, getting latest version of tenant
I0903 18:34:19.647178       1 main-controller.go:725] 'minio/tenant0': Deploying pool ss-0
I0903 18:34:21.042201       1 monitoring.go:106] 'minio/tenant0' no pool is initialized
I0903 18:34:21.442595       1 monitoring.go:106] 'minio/tenant0' no pool is initialized
I0903 18:34:21.842948       1 monitoring.go:106] 'minio/tenant0' no pool is initialized
I0903 18:34:22.251258       1 monitoring.go:106] 'minio/tenant0' no pool is initialized
I0903 18:34:22.642908       1 monitoring.go:106] 'minio/tenant0' no pool is initialized
I0903 18:34:23.044592       1 monitoring.go:106] 'minio/tenant0' no pool is initialized
I0903 18:34:23.447333       1 monitoring.go:106] 'minio/tenant0' no pool is initialized
I0903 18:34:23.842738       1 monitoring.go:106] 'minio/tenant0' no pool is initialized
I0903 18:34:24.242659       1 monitoring.go:106] 'minio/tenant0' no pool is initialized
I0903 18:34:24.642329       1 monitoring.go:106] 'minio/tenant0' no pool is initialized
I0903 18:34:25.050920       1 monitoring.go:106] 'minio/tenant0' no pool is initialized
I0903 18:34:25.243141       1 monitoring.go:106] 'minio/tenant0' no pool is initialized
I0903 18:36:19.723473       1 main-controller.go:841] 'minio/tenant0' Error waiting for pool to be ready: Get "https://tenant0-ss-0-0.tenant0-hl.minio.svc.cluster.local:9000/minio/admin/v3/info": dial tcp: lookup tenant0-ss-0-0.tenant0-hl.minio.svc.cluster.local on 172.17.0.10:53: no such host
E0903 18:36:19.723962       1 main-controller.go:461] error syncing 'minio/tenant0': Waiting for all pools to initialize
I0903 18:36:19.740255       1 minio.go:136] Generating private key
I0903 18:36:19.740589       1 minio.go:149] Generating CSR with CN=*.tenant0-hl.minio.svc.cluster.local
I0903 18:36:19.769930       1 csr.go:145] Start polling for certificate of csr/tenant0-minio-csr, every 5s, timeout after 20m0s
I0903 18:36:24.794769       1 csr.go:170] Certificate successfully fetched, creating secret with Private key and Certificate
E0903 18:36:24.822772       1 main-controller.go:461] error syncing 'minio/tenant0': waiting for minio cert
I0903 18:36:36.863260       1 monitoring.go:106] 'minio/tenant0' no pool is initialized
I0903 18:36:36.933362       1 monitoring.go:106] 'minio/tenant0' no pool is initialized
I0903 18:36:39.970858       1 monitoring.go:106] 'minio/tenant0' no pool is initialized
I0903 18:36:40.102480       1 monitoring.go:106] 'minio/tenant0' no pool is initialized
I0903 18:36:40.181698       1 monitoring.go:106] 'minio/tenant0' no pool is initialized
I0903 18:36:40.296388       1 monitoring.go:106] 'minio/tenant0' no pool is initialized
I0903 18:36:41.160553       1 monitoring.go:106] 'minio/tenant0' no pool is initialized
I0903 18:36:43.346482       1 monitoring.go:106] 'minio/tenant0' no pool is initialized
I0903 18:36:44.364975       1 monitoring.go:106] 'minio/tenant0' no pool is initialized
I0903 18:36:44.398780       1 monitoring.go:106] 'minio/tenant0' no pool is initialized
I0903 18:36:44.417462       1 monitoring.go:106] 'minio/tenant0' no pool is initialized
I0903 18:36:45.346203       1 monitoring.go:106] 'minio/tenant0' no pool is initialized
I0903 18:36:45.942881       1 monitoring.go:106] 'minio/tenant0' no pool is initialized
I0903 18:36:46.741479       1 monitoring.go:106] 'minio/tenant0' no pool is initialized
I0903 18:36:47.742504       1 monitoring.go:106] 'minio/tenant0' no pool is initialized
I0903 18:36:48.589806       1 monitoring.go:106] 'minio/tenant0' no pool is initialized
I0903 18:36:49.340979       1 monitoring.go:106] 'minio/tenant0' no pool is initialized
I0903 18:36:50.145439       1 monitoring.go:106] 'minio/tenant0' no pool is initialized
I0903 18:37:05.595947       1 monitoring.go:106] 'minio/tenant0' no pool is initialized
I0903 18:37:06.865211       1 monitoring.go:106] 'minio/tenant0' no pool is initialized
I0903 18:37:07.009201       1 monitoring.go:106] 'minio/tenant0' no pool is initialized
I0903 18:37:07.459554       1 monitoring.go:106] 'minio/tenant0' no pool is initialized
I0903 18:37:08.657402       1 monitoring.go:106] 'minio/tenant0' no pool is initialized
I0903 18:37:09.457718       1 monitoring.go:106] 'minio/tenant0' no pool is initialized
I0903 18:37:10.257456       1 monitoring.go:106] 'minio/tenant0' no pool is initialized
I0903 18:37:13.207131       1 monitoring.go:106] 'minio/tenant0' no pool is initialized

@harshavardhana
Member

> I0903 18:34:13.042600 1 main-controller.go:345] HTTPS server ListenAndServeTLS: tls: private key does not match public key

Here is the issue, @jr0dd.
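
Note that this line was logged with the `I` (info) prefix, so filtering the operator log for `E` lines alone would have missed it. As a rough sketch of how one might pull such lines out of a full log dump, the snippet below matches the klog-style header seen in the logs above and keeps any message that mentions `ListenAndServeTLS`. The regular expression and the function name `find_tls_startup_failures` are assumptions written for this thread, not part of the operator.

```python
import re

# klog-style header: severity letter, MMDD, time, thread id, "file:line]", message.
KLOG_LINE = re.compile(
    r"^([IWEF])(\d{4}) (\d{2}:\d{2}:\d{2}\.\d+)\s+\d+ ([\w.-]+:\d+)\] (.*)$"
)


def find_tls_startup_failures(log_text: str) -> list[tuple[str, str, str]]:
    """Return (severity, source, message) for lines mentioning ListenAndServeTLS."""
    hits = []
    for line in log_text.splitlines():
        m = KLOG_LINE.match(line.strip())
        # Match on the message text, not the severity, since the failure
        # in this thread was reported at info level.
        if m and "ListenAndServeTLS" in m.group(5):
            hits.append((m.group(1), m.group(4), m.group(5)))
    return hits
```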

@jr0dd
Author

jr0dd commented Sep 3, 2021

> > I0903 18:34:13.042600 1 main-controller.go:345] HTTPS server ListenAndServeTLS: tls: private key does not match public key
>
> Here is the issue, @jr0dd.

Ugh...
I deleted the old operator-tls secret and the tenant loads now.
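
For anyone who wants to confirm a stale pair before deleting the secret, a certificate and key saved to local PEM files can be checked with Python's standard `ssl` module, whose `SSLContext.load_cert_chain` raises `ssl.SSLError` when the private key does not match the certificate. This is only a sketch: the function name and the file paths are hypothetical, and how the two files are extracted from the `operator-tls` secret is not shown here.

```python
import ssl


def cert_matches_key(cert_path: str, key_path: str) -> bool:
    """Return True if the PEM certificate and private key load as a pair.

    Returns False when the pair cannot be loaded, which covers both a
    key/certificate mismatch and unparseable PEM data.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    try:
        ctx.load_cert_chain(certfile=cert_path, keyfile=key_path)
        return True
    except ssl.SSLError:
        return False
```

For example, `cert_matches_key("operator.crt", "operator.key")` (hypothetical file names) returning `False` would be consistent with the `private key does not match public key` line in the operator log.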

@harshavardhana
Member

> I deleted the old operator-tls secret and the tenant loads now.

👍🏽 @dshuvar please verify this as well on your end.

@harshavardhana
Member

To avoid this mistake, we added #807.
