Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

After upgrading to 2.13.2 (1.5.1 velero) backup location not found error #168

Closed
sagor999 opened this issue Oct 5, 2020 · 29 comments
Closed

Comments

@sagor999
Copy link
Contributor

sagor999 commented Oct 5, 2020

What steps did you take and what happened:
Upgraded from helm chart version 2.12.17 to 2.13.2.
Velero pod is now spitting out this into log:

time="2020-10-05T20:33:20Z" level=info msg="Checking for existing backup locations ready to be verified; there needs to be at least 1 backup location available" controller=backupstoragelocation logSource="pkg/controller/backupstoragelocation_controller.go:58"
time="2020-10-05T20:33:20Z" level=info msg="No backup locations were ready to be verified" controller=backupstoragelocation logSource="pkg/controller/backupstoragelocation_controller.go:120"                                                                     

Checked that backuplocations do exist in the cluster (and they were working just fine in previous release)

Tried to remove the chart, removed CRDs, re-install the chart. Checked that backuplocations were added back. And same error.
For some reason it no longer sees back up locations.

Environment:

  • helm version (use helm version):
    Version:"v3.3.1"
  • helm chart version and app version:
    • helm3 (use helm list -n <YOUR NAMESPACE>):
NAME  	NAMESPACE	REVISION	UPDATED                               	STATUS  	CHART        	APP VERSION
velero	velero   	1       	2020-10-05 20:31:07.85468752 +0000 UTC	deployed	velero-2.13.2	1.5.1  
  • Kubernetes version (use kubectl version):
    1.17.3
  • Kubernetes installer & version:
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):
    ubuntu 18.04
@zubron
Copy link
Contributor

zubron commented Oct 5, 2020

This is definitely a confusing message, but does not indicate that there is an error with your setup. The messages comes from the BackupStorageLocation controller when it is verifying each of the backup locations. Each backup location has a frequency at which it should be verified and if enough time has passed since the last time it was verified, it will be verified again. While iterating over all the locations, if none of the locations needed to be verified, then the controller will print the message that no backup locations were ready to be verified. Although this looks like an error, the state is OK and the backup location will be verified at a later time.

This frequency and wording of this log message is confusing and is something that we are tracking to fix in the velero project (vmware-tanzu/velero#2942). If you have any further comments or questions about this, it would be really helpful if you could direct them to the issue in the velero project.

@zubron
Copy link
Contributor

zubron commented Oct 5, 2020

Given that the issue with the wording is being tracked in that project, I am going to close this issue.

@zubron zubron closed this as completed Oct 5, 2020
@Slutzky
Copy link

Slutzky commented Nov 14, 2020

hi @zubron

All backups stuck with progress status after removing helm chart 1.4.2 + crd and installing 1.5.2

velero-backup-application-20201114125750 InProgress 2020-11-14 13:57:50 +0100 CET 29d gcp <none> velero-backup-application-20201113125712 InProgress 2020-11-13 13:57:28 +0100 CET 28d gcp <none> velero-backup-kubedb-20201114125750 InProgress 2020-11-14 13:58:07 +0100 CET 29d gcp <none>

log from velero pod

time="2020-11-14T20:12:21Z" level=error msg="Error patching backup location's last-synced time" backupLocation=gcp controller=backup-sync error="backupstoragelocations.velero.io \"gcp\" not found" error.file="/go/src/github.com/vmware-tanzu/velero/pkg/controller/backup_sync_controller.go:311" error.function="github.com/vmware-tanzu/velero/pkg/controller.(*backupSyncController).run" logSource="pkg/controller/backup_sync_controller.go:311" time="2020-11-14T20:12:51Z" level=error msg="Error patching backup location's last-synced time" backupLocation=gcp controller=backup-sync error="backupstoragelocations.velero.io \"gcp\" not found" error.file="/go/src/github.com/vmware-tanzu/velero/pkg/controller/backup_sync_controller.go:311" error.function="github.com/vmware-tanzu/velero/pkg/controller.(*backupSyncController).run" logSource="pkg/controller/backup_sync_controller.go:311"

helm ls

velero velero 2 2020-11-14 19:57:51.32498833 +0000 UTC deployed velero-2.12.17 1.4.2

No problem with version 1.4.2.

@jenting
Copy link
Collaborator

jenting commented Nov 16, 2020

hi @zubron

All backups stuck with progress status after removing helm chart 1.4.2 + crd and installing 1.5.2

velero-backup-application-20201114125750 InProgress 2020-11-14 13:57:50 +0100 CET 29d gcp <none> velero-backup-application-20201113125712 InProgress 2020-11-13 13:57:28 +0100 CET 28d gcp <none> velero-backup-kubedb-20201114125750 InProgress 2020-11-14 13:58:07 +0100 CET 29d gcp <none>

log from velero pod

time="2020-11-14T20:12:21Z" level=error msg="Error patching backup location's last-synced time" backupLocation=gcp controller=backup-sync error="backupstoragelocations.velero.io \"gcp\" not found" error.file="/go/src/github.com/vmware-tanzu/velero/pkg/controller/backup_sync_controller.go:311" error.function="github.com/vmware-tanzu/velero/pkg/controller.(*backupSyncController).run" logSource="pkg/controller/backup_sync_controller.go:311" time="2020-11-14T20:12:51Z" level=error msg="Error patching backup location's last-synced time" backupLocation=gcp controller=backup-sync error="backupstoragelocations.velero.io \"gcp\" not found" error.file="/go/src/github.com/vmware-tanzu/velero/pkg/controller/backup_sync_controller.go:311" error.function="github.com/vmware-tanzu/velero/pkg/controller.(*backupSyncController).run" logSource="pkg/controller/backup_sync_controller.go:311"

helm ls

velero velero 2 2020-11-14 19:57:51.32498833 +0000 UTC deployed velero-2.12.17 1.4.2

No problem with version 1.4.2.

@Slutzky could you please provides the helm install commands and the related values.yaml? Thank you.

@Slutzky
Copy link

Slutzky commented Nov 16, 2020

I'm using flux. Flux installs the velero by applying the following "HelmRelease"

---
apiVersion: helm.fluxcd.io/v1
kind: HelmRelease
metadata:
  name: velero
  namespace: velero
  annotations:
    fluxcd.io/automated: "false"
spec:
  releaseName: velero
  chart:
    repository: https://vmware-tanzu.github.io/helm-charts
    name: velero
    version: 2.12.17
  values:
    image:
      repository: velero/velero
      tag: v1.5.2    
    configuration:
      backupStorageLocation:
        bucket: bucket
        name: gcp
        prefix: null
      provider: gcp
      resticTimout: 4h
      volumeSnapshotLocation:
        name: gcp
    credentials:
      existingSecret: cloud-credentials
      useSecret: true
    deployRestic: false
    initContainers:
    - image: velero/velero-plugin-for-gcp:v1.0.0
      imagePullPolicy: IfNotPresent
      name: velero-plugin-for-gcp
      volumeMounts:
      - mountPath: /target
        name: plugins
    metrics:
      enabled: true
      scrapeInterval: 30s
      serviceMonitor:
        additionalLabels:
          prometheus: kube-prometheus
        enabled: true
    schedules:
      backup-volumes:
        schedule: '@every 24h'
        template:
          ttl: 720h
          includeClusterResources: false
          includedNamespaces:
            - dev
          includedResources:
            - persistentvolumes
            - persistentvolumeclaims
          snapshotVolumes: true
          storageLocation: gcp
    serviceAccount:
      server:
        name: velero

@jenting
Copy link
Collaborator

jenting commented Nov 19, 2020

@Slutzky,
Could you please check the output of velero backup-location get and kubectl get backupstoragelocations.velero.io?
And I'd suggest bumping the velero-plugin-for-gcp plugin tag to v1.1.0.

@Slutzky
Copy link

Slutzky commented Nov 19, 2020

hi @jenting .

It seems problem is in new CRD they added a check from velero pod so velero pod checks if it is up then it marks it as available but because of flux and the lifecycle hook this gcp backupstoragelocation was always synced from flux , this means it would set the flag available as empty when it synced next run and available is only set at beginning in velero so velero cannot recover.

Issue has been solved.

Thank you for your support.

Best

@jenting
Copy link
Collaborator

jenting commented Nov 19, 2020

hi @jenting .

It seems problem is in new CRD they added a check from velero pod so velero pod checks if it is up then it marks it as available but because of flux and the lifecycle hook this gcp backupstoragelocation was always synced from flux , this means it would set the flag available as empty when it synced next run and available is only set at beginning in velero so velero cannot recover.

Issue has been solved.

Thank you for your support.

Best

@Slutzky good to know the main root cause, although I did not know how you resolve the issue. If possible, could you please elaborate more on how you fix the issue? Thank you.

@Slutzky
Copy link

Slutzky commented Nov 19, 2020

sure, I'm using Flux to install velero, I added
enableHelmHooks: false into the velero.yaml file.

chart:
    repository: https://vmware-tanzu.github.io/helm-charts
    name: velero
    version:  2.12.13
  values:
    image:
      repository: velero/velero
      tag: v1.5.2    
    configuration:
      backupStorageLocation:
        bucket: prod
      useSecret: true
    deployRestic: false
    initContainers:
    - image: velero/velero-plugin-for-gcp:v1.0.0
      imagePullPolicy: IfNotPresent
      name: velero-plugin-for-gcp
      volumeMounts:
      - mountPath: /target
        name: plugins
    enableHelmHooks: false

https://docs.fluxcd.io/projects/helm-operator/en/1.0.0-rc9/references/helmrelease-custom-resource.html#configuration

@jenting
Copy link
Collaborator

jenting commented Nov 19, 2020

Thank you, so this is related to #171

sure, I'm using Flux to install velero, I added
enableHelmHooks: false into the velero.yaml file.

chart:
    repository: https://vmware-tanzu.github.io/helm-charts
    name: velero
    version:  2.12.13
  values:
    image:
      repository: velero/velero
      tag: v1.5.2    
    configuration:
      backupStorageLocation:
        bucket: prod
      useSecret: true
    deployRestic: false
    initContainers:
    - image: velero/velero-plugin-for-gcp:v1.0.0
      imagePullPolicy: IfNotPresent
      name: velero-plugin-for-gcp
      volumeMounts:
      - mountPath: /target
        name: plugins
    enableHelmHooks: false

https://docs.fluxcd.io/projects/helm-operator/en/1.0.0-rc9/references/helmrelease-custom-resource.html#configuration

Thank you, so this is related to #171.

@talha0324
Copy link

Hey, I'm using latest helm release v2.14.12 and still seeing this error,
velero v1.5.3
restic enabled

@jenting
Copy link
Collaborator

jenting commented Mar 2, 2021

Hey, I'm using latest helm release v2.14.12 and still seeing this error,
velero v1.5.3
restic enabled

Are you install Velero helm chart fresh install or perform the upgrade?
If you perform the upgrade, remember that helm upgrade won't upgrade the CRD manifest, you have to install the velero v1.5.3 client and run velero install --crds-only --dry-run -o yaml | kubectl apply -f -.

@talha0324
Copy link

I did the fresh install.
I also tried to make template out of helm charts and individually apply everything. But I still got the same error

@jenting
Copy link
Collaborator

jenting commented Mar 3, 2021

I did the fresh install.
I also tried to make template out of helm charts and individually apply everything. But I still got the same error

Could you please provide more information? Like velero backup-location get and also the log of Velero server Pod, thank u.

@talha0324
Copy link

velero backup-location get -n openshift-velero

NAME      PROVIDER   BUCKET/PREFIX   PHASE       LAST VALIDATED                  ACCESS MODE
default   aws        velero_backup   Available   2021-03-03 11:44:33 +0500 PKT   ReadWrite

Velero Server logs

INFO[0021] Checking for existing backup locations ready to be verified; there needs to be at least 1 backup location available  controller=backupstoragelocation logSource="pkg/controller/backupstoragelocation_controller.go:58"
INFO[0021] No backup locations were ready to be verified  controller=backupstoragelocation logSource="pkg/controller/backupstoragelocation_controller.go:120"
ERRO[0021] Error getting backup store for this location  backupLocation=default controller=backup-sync error="unable to locate ObjectStore plugin named velero.io/aws" logSource="pkg/controller/backup_sync_controller.go:168"
ERRO[0021] Error checking repository for stale locks     controller=restic-repository error="error running command=restic unlock --repo=s3:https://object-eu-se-1a.binero.cloud/velero_backup/restic/hanzala-12 --password-file=/tmp/velero-restic-credentials-hanzala-12848765126, stdout=, stderr=: exec: \"restic\": executable file not found in $PATH" error.file="/github.com/vmware-tanzu/velero/pkg/restic/repository_manager.go:286" error.function="github.com/vmware-tanzu/velero/pkg/restic.(*repositoryManager).exec" logSource="pkg/controller/restic_repository_controller.go:144" name=hanzala-12-default-xtbsx namespace=openshift-velero
ERRO[0021] Error checking repository for stale locks     controller=restic-repository error="error running command=restic unlock --repo=s3:https://object-eu-se-1a.binero.cloud/velero_backup/restic/hanzala-12 --password-file=/tmp/velero-restic-credentials-hanzala-12947945837, stdout=, stderr=: exec: \"restic\": executable file not found in $PATH" error.file="/github.com/vmware-tanzu/velero/pkg/restic/repository_manager.go:286" error.function="github.com/vmware-tanzu/velero/pkg/restic.(*repositoryManager).exec" logSource="pkg/controller/restic_repository_controller.go:144" name=hanzala-12-default-xtbsx namespace=openshift-velero
INFO[0021] Checking for existing backup locations ready to be verified; there needs to be at least 1 backup location available  controller=backupstoragelocation logSource="pkg/controller/backupstoragelocation_controller.go:58"
INFO[0021] No backup locations were ready to be verified  controller=backupstoragelocation logSource="pkg/controller/backupstoragelocation_controller.go:120"
INFO[0021] Checking for existing backup locations ready to be verified; there needs to be at least 1 backup location available  controller=backupstoragelocation logSource="pkg/controller/backupstoragelocation_controller.go:58"

@patrickheinzelmann
Copy link

I've installed Velero for the first time and my logs are also filled with the following ouputs.

time="2021-03-08T21:32:27Z" level=info msg="Checking for existing backup locations ready to be verified; there needs to be at least 1 backup location available" controller=backupstoragelocation logSource="pkg/controller/backupstoragelocation_controller.go:58"
time="2021-03-08T21:32:28Z" level=info msg="Checking for existing backup locations ready to be verified; there needs to be at least 1 backup location available" controller=backupstoragelocation logSource="pkg/controller/backupstoragelocation_controller.go:58"
time="2021-03-08T21:32:28Z" level=info msg="No backup locations were ready to be verified" controller=backupstoragelocation logSource="pkg/controller/backupstoragelocation_controller.go:120"
time="2021-03-08T21:33:28Z" level=info msg="Checking for existing backup locations ready to be verified; there needs to be at least 1 backup location available" controller=backupstoragelocation logSource="pkg/controller/backupstoragelocation_controller.go:58"
time="2021-03-08T21:33:28Z" level=info msg="Checking for existing backup locations ready to be verified; there needs to be at least 1 backup location available" controller=backupstoragelocation logSource="pkg/controller/backupstoragelocation_controller.go:58"
time="2021-03-08T21:33:28Z" level=info msg="No backup locations were ready to be verified" controller=backupstoragelocation logSource="pkg/controller/backupstoragelocation_controller.go:120"

Here is the info from velero backup-location get

NAME      PROVIDER   BUCKET/PREFIX   PHASE       LAST VALIDATED                  ACCESS MODE
default   azure      velero          Available   2021-03-08 22:36:29 +0100 CET   ReadWrite

@talha0324
Copy link

Hey @patrickheinzelmann , I figured that this log message doesn't prevent velero from creating a backup. I'm having this log but my backups are being created successfully.

@patrickheinzelmann
Copy link

My problem is that Velero is creating the the folder and the log file, but isn't creating the data of the backup.

It worked only once, all other attempts to backup a namespace failed.

velero

@talha0324
Copy link

As it pushed logs to the backupstorageLocation, so I'm sure this is not related to backupStorageLocation. I might be wrong.
You can look into these logs for some error

@patrickheinzelmann
Copy link

I looked into the error log of the backup creation and the Velero POD.

Name:         default-202103091015
Namespace:    velero
Labels:       velero.io/storage-location=default
Annotations:  velero.io/source-cluster-k8s-gitversion=v1.17.13
              velero.io/source-cluster-k8s-major-version=1
              velero.io/source-cluster-k8s-minor-version=17

Phase:  Failed (run `velero backup logs default-202103091015` for more information)

Errors:    0
Warnings:  1

Namespaces:
  Included:  default
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        <none>
  Cluster-scoped:  auto

Label selector:  <none>

Storage Location:  default

Velero-Native Snapshot PVs:  auto

TTL:  720h0m0s

Hooks:  <none>

Backup Format Version:  1.1.0

Started:    2021-03-09 11:16:13 +0100 CET
Completed:  2021-03-09 11:16:23 +0100 CET

Expiration:  2021-04-08 12:16:13 +0200 CEST

Total items to be backed up:  76
Items backed up:              76

Velero-Native Snapshots: <none included>

But I can't find any error message beside the backup location messages.

velero-default-202109031015.log

@patrickheinzelmann
Copy link

@talha0324 I agree that the problems seems not related to the backup location, but this is the only message I can see in the log file.

@talha0324
Copy link

as your backup says:
Phase: Failed (run velero backup logs default-202103091015 for more information)

@patrickheinzelmann
Copy link

patrickheinzelmann commented Mar 9, 2021

@talha0324 I know that the backup fails but I can't see any error when I execute "velero backup logs default-202103091015". I have attached the output in the file velero-default-202109031015.log.

@talha0324
Copy link

I did went through that file and I'm not sure about this.

@dsu-igeek
Copy link

It looks like you're using the Azure plug-in. There was an issue where the Azure plug-in needed more memory than we were giving to the Velero containers by default. That was corrected in 1.5.3 and the amount of memory was increased to 512M but I don't see anything in the Helm charts that sets the amount of memory.

There is some info in the PR about the error messages you would see. vmware-tanzu/velero#3234 You will see the messages in the Velero pod's log. They happen after the backup has finished but while uploading the data to the blob store, so they are not getting tracked properly.

Check the amount of memory on your Velero pod with kubectl describe. It should be set to 512Mi for use with the Azure plug-in.

@patrickheinzelmann
Copy link

@dsu-igeek The memory is limited to 256Mi.

    Limits:
      cpu:     1
      memory:  256Mi
    Requests:
      cpu:     500m
      memory:  128Mi

I followed the installation guide for the Azure Plugin. https://github.com/vmware-tanzu/velero-plugin-for-microsoft-azure#setup. However the increasing of the memory wasn't mentioned.

@dsu-igeek
Copy link

It's fixed in the main Velero install in rev 1.5.3. The Helm chart is not setting the resource limit properly. I added an issue for the Helm chart - #231

@jenting
Copy link
Collaborator

jenting commented Mar 10, 2021

@patrickheinzelmann, is your issue been addressed after increase the memory limit?

@patrickheinzelmann
Copy link

@jenting Yes, after increasing the memory the backup data will be uploaded to the Azure Storage.

ndegory pushed a commit to ndegory/helm-charts that referenced this issue Jul 12, 2021
* Resync rules and dashboards

Signed-off-by: Jesse Bye <jbye@riskalyze.com>

* Bump chart version

Signed-off-by: Jesse Bye <jbye@riskalyze.com>

* resync again

Signed-off-by: Jesse Bye <jbye@riskalyze.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants