Thanos receive component recovered from panic #6047
We are seeing the same issue on Thanos v0.29.0. Thanos Receive seems to restart out of nowhere. When the TSDB is loading, we also get the same stack trace. |
Looks like the panic is happening here: https://github.com/thanos-io/thanos/blob/main/pkg/store/tsdb.go#L104. I wonder if the |
Hi @fpetkovski |
As @fpetkovski said, according to the stack trace it seems like |
@matej-g Thanks for taking care of this. |
I've raised #6067 which I believe should fix this.
Hi @fpetkovski
Prometheus version: v2.40.5
The Prometheus external labels are the same in every instance.
k get po -n thanos
NAME READY STATUS RESTARTS AGE
thanos-compact-0 1/1 Running 0 100m
thanos-query-7bd9bfb7cf-c5s2l 1/1 Running 0 102m
thanos-query-7bd9bfb7cf-svjlp 1/1 Running 0 97m
thanos-query-7bd9bfb7cf-z284k 1/1 Running 0 102m
thanos-receive-0 1/1 Running 0 90m
thanos-receive-1 0/1 Running 6 (4m51s ago) 96m
thanos-receive-2 1/1 Running 0 102m
thanos-rule-0 1/1 Running 0 101m
thanos-rule-1 1/1 Running 0 96m
thanos-store-0 1/1 Running 0 99m
thanos-store-1 1/1 Running 0 96m

apiVersion: v1
kind: ConfigMap
metadata:
name: thanos-receive-hashrings
namespace: thanos
data:
thanos-receive-hashrings.json: |
[
{
"hashring": "soft-tenants",
"endpoints":
[
"thanos-receive-0.thanos-receive.thanos.svc.cluster.local:10901",
"thanos-receive-1.thanos-receive.thanos.svc.cluster.local:10901",
"thanos-receive-2.thanos-receive.thanos.svc.cluster.local:10901"
]
}
]
|
I don't think out-of-order samples should lead to restarts. Can you post the logs of:
|
Hi @fpetkovski, I ran the command of
|
Could you also post the output of the describe command? |
Sure. However, I had already increased the CPU/memory requests earlier. After the previous error I updated the CPU limit from 8 to 16, as well as the memory limit from 48 to 56.
|
@fpetkovski Not OP, but we are facing the same problem (the panic due to a nil store LabelSet), and it doesn't go away even with the changes in #6067. This is happening in one particular environment (which also happens to have the data folder in a PV, like the original poster). We are on 0.30.2 with the changes in #6067 cherry-picked. I'm trying to isolate a minimal reproducible example, but wanted to check with you in case you can think of a reason why this bug would still occur. |
You might be hitting #6190 |
@fpetkovski The problem seems to have been fixed with the changes from #6203. FWIW, I was able to reliably reproduce the panic (and confirm the fix with #6203) by simulating a slow filesystem using FUSE (nothing more than the FS shown in this blog post https://www.stavros.io/posts/python-fuse-filesystem/ with some time.sleep(1) added to key functions; a sketch follows below). My initial experiments were with some data from the original Ingestor pod that had crashed, but I just confirmed that this panic occurs even when starting out with an empty data folder (on the slow FS). Just FYI, in case it helps in testing/debugging other problems.
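A minimal sketch of such a slow passthrough filesystem, assuming the fusepy library (`pip install fusepy`) used in the blog post above. The `SlowFS` name, the one-second `DELAY`, and the choice of which operations to delay are illustrative assumptions, not the exact script I used:

```python
#!/usr/bin/env python3
"""slowfs.py: FUSE passthrough filesystem with artificial latency.

Mirrors a backing directory at a mount point, sleeping DELAY seconds
in key I/O operations to simulate a slow disk (hypothetical sketch).
"""
import os
import sys
import time

from fuse import FUSE, Operations  # pip install fusepy

DELAY = 1  # seconds of latency per delayed operation (assumption; tune as needed)


class SlowFS(Operations):
    def __init__(self, root):
        self.root = root

    def _full(self, path):
        # Map a path inside the mount onto the backing directory.
        return os.path.join(self.root, path.lstrip("/"))

    def getattr(self, path, fh=None):
        st = os.lstat(self._full(path))
        return {key: getattr(st, key) for key in (
            "st_atime", "st_ctime", "st_gid", "st_mode",
            "st_mtime", "st_nlink", "st_size", "st_uid")}

    def readdir(self, path, fh):
        yield "."
        yield ".."
        yield from os.listdir(self._full(path))

    # Delayed operations: every open/read/write/fsync pays the penalty.
    def open(self, path, flags):
        time.sleep(DELAY)
        return os.open(self._full(path), flags)

    def create(self, path, mode, fi=None):
        time.sleep(DELAY)
        return os.open(self._full(path), os.O_WRONLY | os.O_CREAT, mode)

    def read(self, path, size, offset, fh):
        time.sleep(DELAY)
        os.lseek(fh, offset, os.SEEK_SET)
        return os.read(fh, size)

    def write(self, path, data, offset, fh):
        time.sleep(DELAY)
        os.lseek(fh, offset, os.SEEK_SET)
        return os.write(fh, data)

    def fsync(self, path, datasync, fh):
        time.sleep(DELAY)
        return os.fsync(fh)

    def truncate(self, path, length, fh=None):
        with open(self._full(path), "r+b") as f:
            f.truncate(length)

    def release(self, path, fh):
        return os.close(fh)

    # Directory/file management so the TSDB can create and rotate blocks.
    def mkdir(self, path, mode):
        return os.mkdir(self._full(path), mode)

    def rmdir(self, path):
        return os.rmdir(self._full(path))

    def unlink(self, path):
        return os.unlink(self._full(path))

    def rename(self, old, new):
        return os.rename(self._full(old), self._full(new))


if __name__ == "__main__":
    # usage: python3 slowfs.py <backing-dir> <mount-point>
    FUSE(SlowFS(sys.argv[1]), sys.argv[2], nothreads=True, foreground=True)
```

Mount it with e.g. `python3 slowfs.py /data/backing /data/receive` (hypothetical paths) and point Receive's --tsdb.path at the mount point; the added latency stretches the TSDB-loading window enough to hit the panic reliably. |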
Hi Thanos developers
I got an error message from a pod of the receive component. We adopted the receive component to query Prometheus metrics across multiple Kubernetes clusters; so far we send metrics from about 30 clusters' Prometheus instances to Thanos Receive. At first, all three receive replicas restarted several times, which I guessed was due to hitting resource limits, so I scaled up the CPU and memory requests, and for a while things were back to fine (no further restarts). But after a few hours, one of the receive pods got the following error. Has anyone met the same issue? Thanks.
Prometheus version: v2.40.5
Thanos version: thanosio/thanos:v0.30.1
Receive component arguments: