
clickhouse-backup server --watch stops when initial connection fails #857

Closed
itssimon opened this issue Mar 2, 2024 · 2 comments · Fixed by #843
Comments

itssimon commented Mar 2, 2024

I'm running clickhouse-backup as a sidecar container alongside clickhouse-server on Kubernetes (using the Altinity ClickHouse Operator). When the pod with both containers starts, about half the time clickhouse-server is not yet ready to accept connections from clickhouse-backup, and this appears to freeze clickhouse-backup.

Restarting the pod a couple of times usually makes it work eventually, so it looks like a timing issue.

Below is the log output of the clickhouse-backup container from a failed start. No backups are created and there is no further log output, even after hours.

2024/03/02 02:00:57.407705  info clickhouse connection prepared: tcp://localhost:9000 run ping logger=clickhouse
2024/03/02 02:00:57.411904 error clickhouse connection ping: tcp://localhost:9000 return error: dial tcp [::1]:9000: connect: connection refused logger=clickhouse
2024/03/02 02:00:57.411951 error dial tcp [::1]:9000: connect: connection refused logger=server.Run
2024/03/02 02:01:02.437006  info clickhouse connection prepared: tcp://localhost:9000 run ping logger=clickhouse
2024/03/02 02:01:02.463061  info clickhouse connection open: tcp://localhost:9000 logger=clickhouse
2024/03/02 02:01:02.463366  info Create integration tables logger=server
2024/03/02 02:01:02.463585  info clickhouse connection prepared: tcp://localhost:9000 run ping logger=clickhouse
2024/03/02 02:01:02.520820 error clickhouse connection ping: tcp://localhost:9000 return error: read: EOF logger=clickhouse
2024/03/02 02:01:02.520878 error can't connect to clickhouse: read: EOF logger=server.Run
2024/03/02 02:01:02.552043  info Starting API server on 0.0.0.0:7171 logger=server.Run
2024/03/02 02:01:02.588894  info Starting API Server in watch mode logger=server
2024/03/02 02:01:02.589005  info clickhouse connection prepared: tcp://localhost:9000 run ping logger=clickhouse
2024/03/02 02:01:02.589438  info Update backup metrics start (onlyLocal=false) logger=server
2024/03/02 02:01:02.590896  info clickhouse connection prepared: tcp://localhost:9000 run ping logger=clickhouse
2024/03/02 02:01:02.593702 error clickhouse connection ping: tcp://localhost:9000 return error: dial tcp [::1]:9000: connect: connection refused logger=clickhouse
2024/03/02 02:01:02.594849 error UpdateBackupMetrics return error: dial tcp [::1]:9000: connect: connection refused logger=server.Run
2024/03/02 02:01:02.595358 error clickhouse connection ping: tcp://localhost:9000 return error: read: EOF logger=clickhouse
2024/03/02 02:01:02.595640 error ResumeOperationsAfterRestart return error: read: EOF logger=server.Run
2024/03/02 02:01:02.596038  info clickhouse connection prepared: tcp://localhost:9000 run ping logger=clickhouse
2024/03/02 02:01:02.601335 error clickhouse connection ping: tcp://localhost:9000 return error: dial tcp [::1]:9000: connect: connection refused logger=clickhouse

Below is my pod template:

        podTemplates:
          - name: default
            spec:
              containers:
                - name: clickhouse-server
                  image: clickhouse/clickhouse-server:23.9
                  resources:
                    requests:
                      cpu: 500m
                      memory: 1Gi
                  volumeMounts:
                    - name: bootstrap-scripts
                      mountPath: /docker-entrypoint-initdb.d
                  ports:
                    - name: metrics
                      containerPort: 9363
                - name: clickhouse-backup
                  image: altinity/clickhouse-backup:2.4.33
                  command: ["/bin/clickhouse-backup", "server", "--watch"]
                  env:
                    - name: ALLOW_EMPTY_BACKUPS
                      value: "true"
                    - name: API_LISTEN
                      value: "0.0.0.0:7171"
                    - name: API_CREATE_INTEGRATION_TABLES
                      value: "true"
                    - name: BACKUPS_TO_KEEP_LOCAL
                      value: "3"
                    - name: BACKUPS_TO_KEEP_REMOTE
                      value: "336"
                    - name: REMOTE_STORAGE
                      value: s3
                    - name: S3_ENDPOINT
                      value: https://sfo3.digitaloceanspaces.com
                    - name: S3_BUCKET
                      value: xxx
                    - name: S3_PATH
                      value: cluster-{cluster}/shard-{shard}
                    - name: S3_ACCESS_KEY
                      value: xxx
                    - name: S3_SECRET_KEY
                      value: {{ "ref+sops://secrets.yaml#/clickhouseBackupSpacesSecretKey" | fetchSecretValue | quote }}
                    - name: S3_FORCE_PATH_STYLE
                      value: "true"
                    - name: WATCH_INTERVAL
                      value: "1h"
                    - name: FULL_INTERVAL
                      value: "24h"
                    - name: WATCH_BACKUP_NAME_TEMPLATE
                      value: "{time:20060102150405}-{type}"
                  ports:
                    - name: backup-api
                      containerPort: 7171
              volumes:
                - name: bootstrap-scripts
                  configMap:
                    name: bootstrap-scripts
Slach self-assigned this Mar 2, 2024
Slach added this to the 2.5.0 milestone Mar 2, 2024
Slach (Collaborator) commented Mar 2, 2024

Thanks for reporting, will fix in the next release.

Slach (Collaborator) commented Mar 2, 2024

As a workaround, I would propose using a Kubernetes CronJob instead of `--watch`; see the examples:
https://github.com/Altinity/clickhouse-backup/blob/master/Examples.md#how-to-use-clickhouse-backup-in-kubernetes

Slach mentioned this issue Apr 5, 2024
Slach closed this as completed in ab3dd36 and #843 Apr 8, 2024