Thanos, Prometheus and Golang version used:
Thanos: upgraded from v0.29.0 to v0.30.0-rc.0
Redis: redis:6.2.6
Object Storage Provider: S3
What happened:
The Thanos store gateway goes into CrashLoopBackOff after upgrading from v0.29.0 to v0.30.0-rc.0.
$ kubectl get po -n thanos-prod
NAME                                               READY   STATUS             RESTARTS   AGE
redis-master-0                                     1/1     Running            0          2d16h
thanos-prod-bucketweb-7754c688d5-zd89f             1/1     Running            0          2d18h
thanos-prod-compactor-67964d97f-prbwt              1/1     Running            0          2d18h
thanos-prod-query-7cf87f67d5-2t2cn                 1/1     Running            0          42h
thanos-prod-query-7cf87f67d5-92tnh                 1/1     Running            0          42h
thanos-prod-query-7cf87f67d5-t5nq4                 1/1     Running            0          42h
thanos-prod-query-frontend-69d9796878-bwvd8        1/1     Running            0          2d18h
thanos-prod-receive-0                              1/1     Running            0          2d18h
thanos-prod-receive-1                              1/1     Running            0          2d17h
thanos-prod-receive-2                              1/1     Running            0          2d17h
thanos-prod-receive-distributor-7ff99cdc64-654f9   1/1     Running            0          2d17h
thanos-prod-receive-distributor-7ff99cdc64-fsq6l   1/1     Running            0          2d17h
thanos-prod-receive-distributor-7ff99cdc64-rbzzz   1/1     Running            0          2d18h
thanos-prod-ruler-0                                2/2     Running            0          2d17h
thanos-prod-ruler-1                                2/2     Running            0          2d18h
thanos-prod-storegateway-0                         1/1     Running            0          43h
thanos-prod-storegateway-1                         0/1     CrashLoopBackOff   502        42h
error log:
$ kubectl logs thanos-prod-storegateway-1 -n thanos-prod
level=info ts=2022-12-29T02:37:44.296395835Z caller=factory.go:52 msg="loading bucket configuration"
level=info ts=2022-12-29T02:37:44.296812118Z caller=caching_bucket_factory.go:76 msg="loading caching bucket configuration"
level=info ts=2022-12-29T02:37:44.301647033Z caller=redis.go:48 msg="created redis cache"
level=info ts=2022-12-29T02:37:44.301824396Z caller=factory.go:35 msg="loading index cache configuration"
panic: duplicate metrics collector registration attempted

goroutine 1 [running]:
github.com/prometheus/client_golang/prometheus.(*wrappingRegisterer).MustRegister(0xc00065e6c0, {0xc00021c950?, 0x1, 0x0?})
    /go/pkg/mod/github.com/prometheus/client_golang@v1.14.0/prometheus/wrap.go:106 +0x151
github.com/prometheus/client_golang/prometheus/promauto.Factory.NewGauge({{0x2beb8d0?, 0xc00065e6c0?}}, {{0x0, 0x0}, {0x0, 0x0}, {0x26a5ded, 0x10}, {0x26e5307, 0x25}, ...})
    /go/pkg/mod/github.com/prometheus/client_golang@v1.14.0/prometheus/promauto/auto.go:297 +0xfd
github.com/thanos-io/thanos/pkg/gate.New({0x2beb8d0, 0xc00065e6c0}, 0x64)
    /app/pkg/gate/gate.go:86 +0x7e
github.com/thanos-io/thanos/pkg/cacheutil.NewRedisClientWithConfig({0x2bd9640?, _}, {_, _}, {{0xc000055920, 0x2f}, {0x0, 0x0}, {0xc000449b10, 0x10}, ...}, ...)
    /app/pkg/cacheutil/redis_client.go:217 +0x33f
github.com/thanos-io/thanos/pkg/cacheutil.NewRedisClient({0x2bd9640, 0xc0002473b0}, {0x269a989, 0xb}, {0xc0001ac600?, 0x5f3f01?, 0xc0001ac200?}, {0x2beb8a0, 0xc000247e50})
    /app/pkg/cacheutil/redis_client.go:167 +0x191
github.com/thanos-io/thanos/pkg/store/cache.NewIndexCache({0x2bd9640, 0xc0002473b0}, {0xc00057a000, 0x172, 0x180}, {0x2beb8a0, 0xc000247e50})
    /app/pkg/store/cache/factory.go:58 +0x229
main.runStore(_, {_, _}, _, {_, _}, {_, _, _}, {0xc0004840a0, ...}, ...)
    /app/cmd/thanos/store.go:304 +0x945
main.registerStore.func1(0x237eb40?, {0x2bd9640, 0xc0002473b0}, 0x6?, {0x2beb7b0, 0x417d8a0}, 0x414d2e0?, 0x0)
    /app/cmd/thanos/store.go:210 +0x2ae
main.main()
    /app/cmd/thanos/main.go:133 +0x1235
What you expected to happen:
The Thanos store gateway starts and runs correctly.
relevant yaml:
thanos store
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: thanos-prod-storegateway
  namespace: "thanos-prod"
  labels:
    app.kubernetes.io/name: thanos
    app.kubernetes.io/instance: thanos-prod
    app.kubernetes.io/component: storegateway
spec:
  replicas: 2
  podManagementPolicy: OrderedReady
  serviceName: thanos-prod-storegateway-headless
  updateStrategy:
    type: RollingUpdate
  selector:
    matchLabels:
      app.kubernetes.io/name: thanos
      app.kubernetes.io/instance: thanos-prod
      app.kubernetes.io/component: storegateway
  template:
    metadata:
      labels:
        app.kubernetes.io/name: thanos
        app.kubernetes.io/instance: thanos-prod
        app.kubernetes.io/component: storegateway
    spec:
      hostAliases:
        - ip: "10.12.32.100"
          hostnames:
            - "s3-qos.iot-st-armtest.qiniu-solutions.com"
      serviceAccount: thanos-prod-storegateway
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app.kubernetes.io/name: thanos
                    app.kubernetes.io/instance: thanos-prod
                    app.kubernetes.io/component: storegateway
                namespaces:
                  - "thanos-prod"
                topologyKey: kubernetes.io/hostname
              weight: 1
      securityContext:
        fsGroup: 1001
      containers:
        - name: storegateway
          image: thanosio/thanos:v0.30.0-rc.0
          imagePullPolicy: "IfNotPresent"
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: false
            runAsNonRoot: true
            runAsUser: 1001
          args:
            - store
            - --log.level=info
            - --log.format=logfmt
            - --grpc-address=0.0.0.0:10901
            - --http-address=0.0.0.0:10902
            - --data-dir=/data
            - --objstore.config-file=/conf/objstore.yml
            - --index-cache.config-file=/cache_conf/index-cache.yml
            - --store.caching-bucket.config-file=/cache_conf/bucket-cache.yml
          ports:
            - name: http
              containerPort: 10902
              protocol: TCP
            - name: grpc
              containerPort: 10901
              protocol: TCP
          env:
            - name: NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
          livenessProbe:
            failureThreshold: 6
            initialDelaySeconds: 30
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 30
            httpGet:
              path: /-/healthy
              port: http
          readinessProbe:
            failureThreshold: 6
            initialDelaySeconds: 30
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 30
            httpGet:
              path: /-/ready
              port: http
          resources:
            limits: {}
            requests: {}
          volumeMounts:
            - name: objstore-config
              mountPath: /conf
            - name: cache-config
              mountPath: /cache_conf
            - name: data
              mountPath: /data
      volumes:
        - name: objstore-config
          configMap:
            name: thanos-prod-objstore-configmap
        - name: cache-config
          configMap:
            name: thanos-prod-storegateway-cache-configmap
        - name: data
          emptyDir: {}
object store conf
apiVersion: v1
kind: ConfigMap
metadata:
  name: thanos-prod-objstore-configmap
  namespace: thanos-prod
  labels:
    app.kubernetes.io/name: thanos
    app.kubernetes.io/instance: thanos-prod
data:
  objstore.yml: |-
    type: s3
    config:
      bucket: *
      endpoint: *
      access_key: *
      secret_key: *
      insecure: true
      list_objects_version: "v1"
cache conf
apiVersion: v1
kind: ConfigMap
metadata:
  name: thanos-prod-storegateway-cache-configmap
  namespace: thanos-prod
  labels:
    app.kubernetes.io/name: thanos
    app.kubernetes.io/instance: thanos-prod
data:
  index-cache.yml: |-
    type: REDIS
    config:
      addr: redis-master.thanos-prod.svc.cluster.local:6379
      password: *
      db: 0
      dial_timeout: 5s
      read_timeout: 3s
      write_timeout: 3s
      pool_size: 100
      min_idle_conns: 10
      idle_timeout: 5m0s
      max_conn_age: 0s
      max_get_multi_concurrency: 100
      get_multi_batch_size: 100
      max_set_multi_concurrency: 100
      set_multi_batch_size: 100
  bucket-cache.yml: |-
    type: REDIS
    config:
      addr: redis-master.thanos-prod.svc.cluster.local:6379
      password: *
      db: 1
      dial_timeout: 5s
      read_timeout: 3s
      write_timeout: 3s
      pool_size: 100
      min_idle_conns: 10
      idle_timeout: 5m0s
      max_conn_age: 0s
      max_get_multi_concurrency: 100
      get_multi_batch_size: 100
      max_set_multi_concurrency: 100
      set_multi_batch_size: 100
    chunk_subrange_size: 16000
    max_chunks_get_range_requests: 3
    chunk_object_attrs_ttl: 24h
    chunk_subrange_ttl: 24h
    blocks_iter_ttl: 5m
    metafile_exists_ttl: 2h
    metafile_doesnt_exist_ttl: 15m
    metafile_content_ttl: 24h
    metafile_max_size: 1MiB
remark:
I tested all images in https://hub.docker.com/r/thanosio/thanos/tags and found that this issue occurs in thanos:main-2022-12-20-e85bc1f and every later image. It appears to be related to commit e85bc1f. @GiedriusS @bwplotka
The bug seems to be that we register the same metrics twice when creating the Redis cache client. To avoid the duplicate registration, we can wrap the metrics registry with a constant label, similar to what the memcached client does: https://github.com/thanos-io/thanos/blob/main/pkg/cacheutil/memcached_client.go#L248.
Hello, may I take this issue please?
@kama910 It is yours!
Let's not forget to include this fix in the v0.30.0 release.