Stash does not work with Flux #1334
OK, after some investigation I found that the stash deployment does not exist after the latest helm upgrade. Then I discovered that the stash helm version |
@Legion2 In version You can learn more about the changes here: https://blog.byte.builders/post/stash-v2021.03.17/ Please follow the setup guide from here: https://stash.run/docs/v2021.03.17/setup/ |
Ok, I see. Is there any easy way to disable the webhooks registered by stash? I currently cannot deploy to my cluster, because kubernetes tries to call the validation webhook but the stash deployment does not exist. |
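For readers hitting the same block: the admission webhook configurations that the operator registered can be listed and, as a last resort, deleted so that other deployments go through again until the operator is reinstalled. This is only a sketch; the grep pattern is an assumption, so check the actual configuration names in your cluster first.

  kubectl get validatingwebhookconfigurations,mutatingwebhookconfigurations | grep -i stash
  # delete the matching configurations only if you plan to reinstall the operator afterwards
  kubectl delete validatingwebhookconfiguration <name>
  kubectl delete mutatingwebhookconfiguration <name>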
Can you show the output of the following commands?
|
I used the |
Can you describe the respective |
Name: mongodb-backup
Namespace: app
Labels: kustomize.toolkit.fluxcd.io/checksum=db8a28a33d3fa7350b82a59d5a69e42d6a6cdddf
kustomize.toolkit.fluxcd.io/name=flux-system
kustomize.toolkit.fluxcd.io/namespace=flux-system
Annotations: <none>
API Version: stash.appscode.com/v1beta1
Kind: BackupConfiguration
Metadata:
Creation Timestamp: 2021-03-25T22:09:11Z
Finalizers:
stash.appscode.com
Generation: 1
Managed Fields:
API Version: stash.appscode.com/v1beta1
Fields Type: FieldsV1
fieldsV1:
f:metadata:
f:finalizers:
f:spec:
f:runtimeSettings:
f:task:
.:
f:name:
f:tempDir:
f:status:
.:
f:conditions:
f:observedGeneration:
Manager: stash
Operation: Update
Time: 2021-03-25T22:09:11Z
API Version: stash.appscode.com/v1beta1
Fields Type: FieldsV1
fieldsV1:
f:metadata:
f:annotations:
.:
f:kubectl.kubernetes.io/last-applied-configuration:
f:labels:
.:
f:kustomize.toolkit.fluxcd.io/checksum:
f:kustomize.toolkit.fluxcd.io/name:
f:kustomize.toolkit.fluxcd.io/namespace:
f:spec:
.:
f:driver:
f:hooks:
.:
f:postBackup:
.:
f:containerName:
f:exec:
.:
f:command:
f:preBackup:
.:
f:containerName:
f:exec:
.:
f:command:
f:repository:
.:
f:name:
f:retentionPolicy:
.:
f:keepDaily:
f:keepLast:
f:name:
f:prune:
f:schedule:
f:target:
.:
f:paths:
f:ref:
.:
f:apiVersion:
f:kind:
f:name:
f:volumeMounts:
Manager: kustomize-controller
Operation: Update
Time: 2021-03-28T15:56:30Z
Resource Version: 109847583
Self Link: /apis/stash.appscode.com/v1beta1/namespaces/voize-ml-controller/backupconfigurations/mongodb-backup
UID: 5017f3e8-ceea-4877-8e2d-9d865c14638b
Spec:
Driver: Restic
Hooks:
Post Backup:
Container Name: mongodb
Exec:
Command:
/bin/sh
-c
rm /data/db/backup/mongodb.tar.gz
Pre Backup:
Container Name: mongodb
Exec:
Command:
/bin/sh
-c
mkdir -p /data/db/backup && mongodump -u="`cat $MONGO_INITDB_ROOT_USERNAME_FILE`" -p="`cat $MONGO_INITDB_ROOT_PASSWORD_FILE`" --authenticationDatabase=admin --gzip --archive=/data/db/backup/mongodb.tar.gz
Repository:
Name: mongodb-s3-backup-repo
Retention Policy:
Keep Daily: 7
Keep Last: 5
Name: keep-last-5
Prune: true
Schedule: */60 * * * *
Target:
Paths:
/data/db/backup
Ref:
API Version: apps/v1
Kind: StatefulSet
Name: mongodb
Volume Mounts:
Mount Path: /data/db
Name: mongodb-data
Status:
Conditions:
Last Transition Time: 2021-03-25T22:09:11Z
Message: Repository voize-ml-controller/mongodb-s3-backup-repo exist.
Reason: RepositoryAvailable
Status: True
Type: RepositoryFound
Last Transition Time: 2021-03-25T22:09:11Z
Message: Backend Secret voize-ml-controller/mongodb-s3-backup-secret-c65b5tk9bf exist.
Reason: BackendSecretAvailable
Status: True
Type: BackendSecretFound
Last Transition Time: 2021-03-25T22:09:11Z
Message: Backup target apps/v1 statefulset/mongodb found.
Reason: TargetAvailable
Status: True
Type: BackupTargetFound
Last Transition Time: 2021-03-25T22:09:11Z
Message: Successfully injected stash sidecar into apps/v1 statefulset/mongodb
Reason: SidecarInjectionSucceeded
Status: True
Type: StashSidecarInjected
Last Transition Time: 2021-03-25T22:09:11Z
Message: Successfully created backup triggering CronJob.
Reason: CronJobCreationSucceeded
Status: True
Type: CronJobCreated
Observed Generation: 1
Events: |
Interesting. After you re-install the operator, it should re-sync everything. I am wondering why it is not working for you. Can you share the log from the Stash operator pod? |
Also, all the other BackupConfigurations cannot push their metrics, because the name of the service changed with the helm chart upgrade (I use the community edition) and the sidecars were not updated:

- lastTransitionTime: "2021-03-28T17:00:16Z"
  message: 'Failed to push repository metrics. Reason: Post "http://stash.stash.svc:56789/metrics/job/backupconfiguration-monitoring-grafana-backup":
    dial tcp: lookup stash.stash.svc on 10.100.0.10:53: no such host'
  reason: FailedToPushRepositoryMetrics
  status: "False"
  type: RepositoryMetricsPushed

I removed namespace and name from the logs:
|
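For reference, the service name the sidecars try to push metrics to can be compared against what actually exists after the chart upgrade. A rough check (the namespace is taken from the error message above; adjust to your install):

  kubectl get svc --all-namespaces | grep -i stash
  # the failing sidecars resolve http://stash.stash.svc:56789, i.e. service "stash" in namespace "stash"
  kubectl get svc -n stash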
How did you upgrade the operator? |
I use the helm-controller (https://toolkit.fluxcd.io/components/helm/controller/); it applies the helm charts to the cluster. After I uninstalled the stash helm chart manually, the helm-controller automatically reinstalled it. |
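For context, the state of a Flux-managed release can be inspected directly; the release name and namespace below are placeholders:

  kubectl get helmreleases --all-namespaces
  kubectl describe helmrelease stash -n flux-system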
We have made some major changes in the installation process, so a simple helm upgrade won't work. You should follow this upgrade guide: https://stash.run/docs/v2021.03.17/setup/upgrade/ |
You can just follow the uninstallation guide for
We are really sorry that this is happening again and again. Some issues are happening because of how Helm handles CRDs. |
The guide for |
Hmm. I see the issue. I don't think we can get rid of the finalizer. We use it for various reasons. We are probably going to provide a |
I installed stash again (while the crds were marked for deletion) and then stash cleaned up the resources and removed the finalizers, so the crds could be garbage collected. After crd deletion I uninstalled the helm chart. After reinstallation, only 3 of 5 backups were successful. The other two were skipped, but all backups should run every hour. BackupSession:
But I can't find the BackupSession CronJob:
Log of sidecar:
What does that mean? |
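For anyone debugging the same skipped backups: the sessions and the triggering CronJob can be listed like this (the namespace and names are placeholders):

  kubectl get backupsessions -n <namespace>
  kubectl describe backupsession <session-name> -n <namespace>
  # the trigger CronJob should appear in the same namespace as the BackupConfiguration
  # (assumption based on the describe output earlier in this thread)
  kubectl get cronjobs -n <namespace>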
Are they backups of the same target or different targets? It seems the other backup has got stuck in |
They are backups in different namespaces and both back up a mongodb (there is a third namespace where the same BackupConfiguration successfully created a backup of a mongodb). And there is no other backup in the same namespace. So I don't know where this backup in |
One hour later, 4 of 5 backups were successful; the 5th was skipped again. I think there is some kind of race condition where multiple BackupSessions are created and then deleted again. |
@hossainemruz I found out why the stash backup service accounts are deleted in my cluster. It is because stash copies the labels of the BackupConfiguration as-is to the service account, including the labels generated by the flux kustomize-controller for the BackupConfiguration:

labels:
  kustomize.toolkit.fluxcd.io/checksum=ce1d06af37d4ce7096a58a2d48656d02b5721332
  kustomize.toolkit.fluxcd.io/name=flux-system
  kustomize.toolkit.fluxcd.io/namespace=flux-system

These labels indicate that a resource is managed by the flux kustomize-controller, but the service account is not (and should not be) managed by the kustomize-controller. Therefore the kustomize-controller deletes the service account, because it is not part of any Kustomization managed by the controller. Stash should not copy the labels of the BackupConfiguration to the ServiceAccount. |
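A quick way to check whether the Flux labels were indeed copied onto the Stash-created ServiceAccount (namespace is a placeholder):

  kubectl get serviceaccounts -n <namespace> --show-labels
  # or list only objects carrying the Flux ownership label
  kubectl get serviceaccounts -n <namespace> -l kustomize.toolkit.fluxcd.io/name=flux-system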
I looked through the code and found that labels of owner objects are reused many times for resources created by the reconciliation logic, even for global resources such as
stash/pkg/controller/backup_configuration.go line 375 in 35ec31c
lines 36 to 38 in 35ec31c
This indiscriminate copying of labels causes trouble in clusters where these labels are used for management purposes. |
I changed the title and created an issue in the flux repo: fluxcd/kustomize-controller#315. However, the issue must be fixed in Stash. The problem is a race condition: Flux uses a checksum label on all resources to manage garbage collection. Flux updates the checksum on all resources, including the To fix this, Stash must not copy the labels of other resources. What I also found weird is that Stash does not recreate the service accounts after they were deleted. This means some part of the reconciliation logic of the |
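Roughly speaking, the garbage-collection side can be seen from the labels alone: the kustomize-controller considers objects that carry its ownership labels but are not part of the applied Kustomization (or have a stale checksum) as prune candidates. Listing such ServiceAccounts (label values taken from this issue) shows which objects are at risk:

  kubectl get serviceaccounts --all-namespaces \
    -l kustomize.toolkit.fluxcd.io/name=flux-system,kustomize.toolkit.fluxcd.io/namespace=flux-system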
Yes, that's right. We should only pass the Stash labels. However, what will happen when a user wants to pass some custom labels to the resources created by Stash?
That's weird. We have to investigate it. This is not the desired behavior. |
I think it should be made explicit which custom labels stash uses when creating resources. Maybe |
Flux 0.15.0 moved the checksum into an annotation, so it is not copied by stash anymore. |
I manually deleted the CronJob of a BackupConfiguration, because for some reason the ServiceAccount for the Job did not exist and the Job failed to start pods. It is best practice for Kubernetes operators to periodically reconcile the actual state in the cluster to remove inconsistencies, but the CronJob was not recreated by the Stash operator even after 30 minutes. The status of the BackupConfiguration was also not updated: it still indicated that the CronJob exists, but it did not.
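A minimal way to observe this drift between the reported status and the actual cluster state, with placeholder names:

  kubectl get cronjobs -n <namespace>
  # compare against the conditions Stash reports on the BackupConfiguration
  kubectl get backupconfiguration <name> -n <namespace> -o jsonpath='{.status.conditions}'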