Tracing using Elastic APM #4386

Open
MrYueQ opened this issue Jun 30, 2021 · 18 comments
Labels
bug, dont-go-stale, help wanted, needs-investigation

Comments


MrYueQ commented Jun 30, 2021

Thanos, Prometheus and Golang version used:

thanos, version 0.21.0 (branch: HEAD, revision: d19aea21b178a6c08838c6a05b834dd44ef14273)
  build user:       root@30b80273f41d
  build date:       20210603-13:49:01
  go version:       go1.16.4
  platform:         linux/amd64

prometheus, version 2.25.0 (branch: HEAD, revision: a6be548dbc17780d562a39c0e4bd0bd4c00ad6e2)
  build user:       root@615f028225c9
  build date:       20210217-14:17:24
  go version:       go1.15.8
  platform:         linux/amd64

apm-server version
apm-server version 7.8.1 (amd64), libbeat 7.8.1 [f1a08a22e814d19ee01d5d714430a851552d4937 built 2020-07-21 14:00:32 +0000 UTC]

Object Storage Provider:

COS

What happened:

Thanos has tracing enabled and reports trace data to Elastic APM.

Elastic APM config (trace.elastic.apm.yml):

type: ELASTIC_APM
config:
  service_name: "thanos"
  service_version: "v1.0.0"
  service_environment: "maxwell"
  sample_rate: 0

systemd service environment:

Environment="ELASTIC_APM_SERVER_URL=http://apm-server:8200"
Environment="ELASTIC_APM_LOG_LEVEL=debug"
Environment="ELASTIC_APM_LOG_FILE=stdout"

thanos query --store.response-timeout=2m \
                  --query.auto-downsampling  \
                  --query.partial-response  \
                  --query.max-concurrent=200 \
                  --query.max-concurrent-select=40 \
                  --query.metadata.default-time-range=30s \
                  --log.format=json \
                  --log.level=warn \
                  --store.unhealthy-timeout=3m  \
                  --store.sd-interval=3m  \
                  --store.sd-files=/data/app/thanos/conf.d/store.yml \
                  --request.logging-config-file=/data/app/thanos/conf.d/logger.yml \
                  --tracing.config-file=/data/app/thanos/conf.d/trace.elastic.apm.yml


Full logs to relevant components:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x142274f]
goroutine 713 [running]:
go.elastic.co/apm/module/apmot.(*otSpan).setSpanContext(0xc00125a630)
/go/pkg/mod/go.elastic.co/apm/module/apmot@v1.11.0/span.go:182 +0x2ef
go.elastic.co/apm/module/apmot.(*otSpan).FinishWithOptions(0xc00125a630, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
/go/pkg/mod/go.elastic.co/apm/module/apmot@v1.11.0/span.go:99 +0x22e
go.elastic.co/apm/module/apmot.(*otSpan).Finish(0xc00125a630)
/go/pkg/mod/go.elastic.co/apm/module/apmot@v1.11.0/span.go:75 +0x4b
github.com/grpc-ecosystem/go-grpc-middleware/v2/interceptors/tracing.(*opentracingClientReporter).PostCall(0xc000372e00, 0x1fc3220, 0xc000076120, 0xe9591)
/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware/v2@v2.0.0-rc.2.0.20201207153454-9f6bf00c00a7/interceptors/tracing/client.go:34 +0x90
github.com/grpc-ecosystem/go-grpc-middleware/v2/interceptors.(*monitoredClientStream).RecvMsg(0xc000373140, 0x1c46d40, 0xc000cd41e0, 0x10, 0x1a936c0)
/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware/v2@v2.0.0-rc.2.0.20201207153454-9f6bf00c00a7/interceptors/client.go:78 +0x155
github.com/grpc-ecosystem/go-grpc-prometheus.(*monitoredClientStream).RecvMsg(0xc00000db90, 0x1c46d40, 0xc000cd41e0, 0x2, 0x2)
/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-prometheus@v1.2.0/client_metrics.go:160 +0x4b
github.com/thanos-io/thanos/pkg/store/storepb.(*storeSeriesClient).Recv(0xc000a42810, 0xc0014caf70, 0x0, 0x1)
/app/pkg/store/storepb/rpc.pb.go:756 +0x62
github.com/thanos-io/thanos/pkg/store.startStreamSeriesSet.func1.2(0xc00125a7e0, 0xc0009d60c0, 0xc0009d6060)
/app/pkg/store/proxy.go:394 +0x4b
created by github.com/thanos-io/thanos/pkg/store.startStreamSeriesSet.func1
/app/pkg/store/proxy.go:392 +0x1af
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x142274f]
goroutine 837 [running]:
go.elastic.co/apm/module/apmot.(*otSpan).setSpanContext(0xc000e90fc0)
/go/pkg/mod/go.elastic.co/apm/module/apmot@v1.11.0/span.go:182 +0x2ef
go.elastic.co/apm/module/apmot.(*otSpan).FinishWithOptions(0xc000e90fc0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
/go/pkg/mod/go.elastic.co/apm/module/apmot@v1.11.0/span.go:99 +0x22e
go.elastic.co/apm/module/apmot.(*otSpan).Finish(0xc000e90fc0)
/go/pkg/mod/go.elastic.co/apm/module/apmot@v1.11.0/span.go:75 +0x4b
github.com/grpc-ecosystem/go-grpc-middleware/v2/interceptors/tracing.(*opentracingClientReporter).PostCall(0xc0004fd8c0, 0x1fc3220, 0xc000076120, 0x10be2a)
/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware/v2@v2.0.0-rc.2.0.20201207153454-9f6bf00c00a7/interceptors/tracing/client.go:34 +0x90
github.com/grpc-ecosystem/go-grpc-middleware/v2/interceptors.(*monitoredClientStream).RecvMsg(0xc0004fd980, 0x1c46d40, 0xc000e7c4d0, 0x10, 0x1a936c0)
/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware/v2@v2.0.0-rc.2.0.20201207153454-9f6bf00c00a7/interceptors/client.go:78 +0x155
github.com/grpc-ecosystem/go-grpc-prometheus.(*monitoredClientStream).RecvMsg(0xc00000d320, 0x1c46d40, 0xc000e7c4d0, 0x2, 0x2)
/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-prometheus@v1.2.0/client_metrics.go:160 +0x4b
github.com/thanos-io/thanos/pkg/store/storepb.(*storeSeriesClient).Recv(0xc000a42360, 0xc001185f70, 0x0, 0x1)
/app/pkg/store/storepb/rpc.pb.go:756 +0x62
github.com/thanos-io/thanos/pkg/store.startStreamSeriesSet.func1.2(0xc000e91170, 0xc00027c660, 0xc00027c600)
/app/pkg/store/proxy.go:394 +0x4b
created by github.com/thanos-io/thanos/pkg/store.startStreamSeriesSet.func1
/app/pkg/store/proxy.go:392 +0x1af

MrYueQ commented Jul 14, 2021

[screenshot]


MrYueQ commented Aug 6, 2021

Help me.

@BoringCat

Same panic in thanos@0.22.0

Thanos, Golang version used:

thanos, version 0.22.0 (branch: HEAD, revision: 3656e290c8eb034e2e09e0afc79f087be88de481)
  build user:       circleci@d99a55512da0
  build date:       20210721-15:34:46
  go version:       go1.16.6
  platform:         linux/amd64

panic log:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x143818f]

goroutine 559 [running]:
go.elastic.co/apm/module/apmot.(*otSpan).setSpanContext(0xc000677950)
	/home/circleci/go/pkg/mod/go.elastic.co/apm/module/apmot@v1.11.0/span.go:182 +0x2ef
go.elastic.co/apm/module/apmot.(*otSpan).FinishWithOptions(0xc000677950, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
	/home/circleci/go/pkg/mod/go.elastic.co/apm/module/apmot@v1.11.0/span.go:99 +0x22e
go.elastic.co/apm/module/apmot.(*otSpan).Finish(0xc000677950)
	/home/circleci/go/pkg/mod/go.elastic.co/apm/module/apmot@v1.11.0/span.go:75 +0x4b
github.com/grpc-ecosystem/go-grpc-middleware/v2/interceptors/tracing.(*opentracingClientReporter).PostCall(0xc00056ab00, 0x1fed600, 0xc00009a110, 0x2a9ac6)
	/home/circleci/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware/v2@v2.0.0-rc.2.0.20201207153454-9f6bf00c00a7/interceptors/tracing/client.go:34 +0x90
github.com/grpc-ecosystem/go-grpc-middleware/v2/interceptors.(*monitoredClientStream).RecvMsg(0xc00056adc0, 0x1c6b420, 0xc00040a1a0, 0x10, 0x1ab4920)
	/home/circleci/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware/v2@v2.0.0-rc.2.0.20201207153454-9f6bf00c00a7/interceptors/client.go:78 +0x155
github.com/grpc-ecosystem/go-grpc-prometheus.(*monitoredClientStream).RecvMsg(0xc00098a828, 0x1c6b420, 0xc00040a1a0, 0x1b08220, 0xc00009a101)
	/home/circleci/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-prometheus@v1.2.0/client_metrics.go:160 +0x4b
github.com/thanos-io/thanos/pkg/store/storepb.(*storeSeriesClient).Recv(0xc0002092a0, 0xc000a75f70, 0x0, 0x1)
	/home/circleci/project/pkg/store/storepb/rpc.pb.go:756 +0x62
github.com/thanos-io/thanos/pkg/store.startStreamSeriesSet.func1.2(0xc000677b00, 0xc000675380, 0xc000675320)
	/home/circleci/project/pkg/store/proxy.go:394 +0x4b
created by github.com/thanos-io/thanos/pkg/store.startStreamSeriesSet.func1
	/home/circleci/project/pkg/store/proxy.go:392 +0x1af

Full logs to relevant components:

Full logs with --log.level=debug
{"caller":"main.go:64","level":"debug","msg":"maxprocs: Updating GOMAXPROCS=[2]: determined from CPU quota","ts":"2021-08-17T09:31:10.272249221Z"}
{"caller":"factory.go:38","level":"info","msg":"loading tracing configuration","ts":"2021-08-17T09:31:10.272428136Z"}
{"caller":"engine.go:292","level":"debug","msg":"Lookback delta is zero, setting to default value","ts":"2021-08-17T09:31:10.27636971Z","value":"5m0s"}
{"caller":"options.go:24","level":"info","msg":"disabled TLS, key and cert must be set to enable","protocol":"gRPC","ts":"2021-08-17T09:31:10.277965106Z"}
{"caller":"query.go:640","level":"info","msg":"starting query node","ts":"2021-08-17T09:31:10.278338688Z"}
{"caller":"intrumentation.go:60","level":"info","msg":"changing probe status","status":"healthy","ts":"2021-08-17T09:31:10.278503006Z"}
{"address":"0.0.0.0:9090","caller":"http.go:63","component":"query","level":"info","msg":"listening for requests and metrics","service":"http/server","ts":"2021-08-17T09:31:10.278536659Z"}
{"caller":"tls_config.go:191","component":"query","http2":false,"level":"info","msg":"TLS is disabled.","service":"http/server","ts":"2021-08-17T09:31:10.278649952Z"}
{"caller":"intrumentation.go:48","level":"info","msg":"changing probe status","status":"ready","ts":"2021-08-17T09:31:10.278713161Z"}
{"address":"0.0.0.0:10901","caller":"grpc.go:123","component":"query","level":"info","msg":"listening for serving gRPC","service":"gRPC/server","ts":"2021-08-17T09:31:10.278743287Z"}
{"cachedStores":0,"caller":"storeset.go:409","component":"storeset","level":"debug","msg":"starting updating storeAPIs","ts":"2021-08-17T09:31:10.278776269Z"}
{"activeStores":0,"cachedStores":0,"caller":"storeset.go:412","component":"storeset","level":"debug","msg":"checked requested storeAPIs","ts":"2021-08-17T09:31:10.278829168Z"}
{"cachedStores":0,"caller":"storeset.go:409","component":"storeset","level":"debug","msg":"starting updating storeAPIs","ts":"2021-08-17T09:31:15.279829519Z"}
{"activeStores":9,"cachedStores":0,"caller":"storeset.go:412","component":"storeset","level":"debug","msg":"checked requested storeAPIs","ts":"2021-08-17T09:31:15.325354902Z"}
{"address":"172.16.0.24:10901","caller":"storeset.go:463","component":"storeset","extLset":"{receive=\"true\", receive_replica=\"thanos-receive-3\", tenant_id=\"default-tenant\"}","level":"info","msg":"adding new storeAPI to query storeset","ts":"2021-08-17T09:31:15.325406739Z"}
{"address":"172.16.0.250:10901","caller":"storeset.go:463","component":"storeset","extLset":"{receive=\"true\", receive_replica=\"thanos-receive-2\", tenant_id=\"default-tenant\"}","level":"info","msg":"adding new storeAPI to query storeset","ts":"2021-08-17T09:31:15.32542843Z"}
{"address":"172.16.0.113:10901","caller":"storeset.go:463","component":"storeset","extLset":"{receive=\"true\", receive_replica=\"thanos-receive-0\", tenant_id=\"default-tenant\"},{receive=\"true\", receive_replica=\"thanos-receive-1\", tenant_id=\"default-tenant\"},{receive=\"true\", receive_replica=\"thanos-receive-2\", tenant_id=\"default-tenant\"},{receive=\"true\", receive_replica=\"thanos-receive-3\", tenant_id=\"default-tenant\"},{receive=\"true\", receive_replica=\"thanos-receive-4\", tenant_id=\"default-tenant\"}","level":"info","msg":"adding new storeAPI to query storeset","ts":"2021-08-17T09:31:15.325471391Z"}
{"address":"172.16.1.7:10901","caller":"storeset.go:463","component":"storeset","extLset":"{receive=\"true\", receive_replica=\"thanos-receive-0\", tenant_id=\"default-tenant\"},{receive=\"true\", receive_replica=\"thanos-receive-1\", tenant_id=\"default-tenant\"},{receive=\"true\", receive_replica=\"thanos-receive-2\", tenant_id=\"default-tenant\"},{receive=\"true\", receive_replica=\"thanos-receive-3\", tenant_id=\"default-tenant\"},{receive=\"true\", receive_replica=\"thanos-receive-4\", tenant_id=\"default-tenant\"}","level":"info","msg":"adding new storeAPI to query storeset","ts":"2021-08-17T09:31:15.325507488Z"}
{"address":"172.16.1.3:10901","caller":"storeset.go:463","component":"storeset","extLset":"{receive=\"true\", receive_replica=\"thanos-receive-0\", tenant_id=\"default-tenant\"}","level":"info","msg":"adding new storeAPI to query storeset","ts":"2021-08-17T09:31:15.325530953Z"}
{"address":"172.16.0.110:10901","caller":"storeset.go:463","component":"storeset","extLset":"{receive=\"true\", receive_replica=\"thanos-receive-4\", tenant_id=\"default-tenant\"}","level":"info","msg":"adding new storeAPI to query storeset","ts":"2021-08-17T09:31:15.325552383Z"}
{"address":"172.16.0.184:10901","caller":"storeset.go:463","component":"storeset","extLset":"{receive=\"true\", receive_replica=\"thanos-receive-1\", tenant_id=\"default-tenant\"}","level":"info","msg":"adding new storeAPI to query storeset","ts":"2021-08-17T09:31:15.325574765Z"}
{"address":"172.16.0.32:10901","caller":"storeset.go:463","component":"storeset","extLset":"{rule_replica=\"thanos-rule-0\"}","level":"info","msg":"adding new storeAPI to query storeset","ts":"2021-08-17T09:31:15.325597248Z"}
{"address":"172.16.0.116:10901","caller":"storeset.go:463","component":"storeset","extLset":"{rule_replica=\"thanos-rule-1\"}","level":"info","msg":"adding new storeAPI to query storeset","ts":"2021-08-17T09:31:15.325620601Z"}
{"cachedStores":9,"caller":"storeset.go:409","component":"storeset","level":"debug","msg":"starting updating storeAPIs","ts":"2021-08-17T09:31:20.282361055Z"}
{"activeStores":9,"cachedStores":9,"caller":"storeset.go:412","component":"storeset","level":"debug","msg":"checked requested storeAPIs","ts":"2021-08-17T09:31:20.284025571Z"}
{"cachedStores":9,"caller":"storeset.go:409","component":"storeset","level":"debug","msg":"starting updating storeAPIs","ts":"2021-08-17T09:31:25.282178565Z"}
{"activeStores":9,"cachedStores":9,"caller":"storeset.go:412","component":"storeset","level":"debug","msg":"checked requested storeAPIs","ts":"2021-08-17T09:31:25.283644257Z"}
{"cachedStores":9,"caller":"storeset.go:409","component":"storeset","level":"debug","msg":"starting updating storeAPIs","ts":"2021-08-17T09:31:30.282237366Z"}
{"activeStores":9,"cachedStores":9,"caller":"storeset.go:412","component":"storeset","level":"debug","msg":"checked requested storeAPIs","ts":"2021-08-17T09:31:30.284005367Z"}
{"cachedStores":9,"caller":"storeset.go:409","component":"storeset","level":"debug","msg":"starting updating storeAPIs","ts":"2021-08-17T09:31:35.279837842Z"}
{"activeStores":9,"cachedStores":9,"caller":"storeset.go:412","component":"storeset","level":"debug","msg":"checked requested storeAPIs","ts":"2021-08-17T09:31:35.281350331Z"}
{"caller":"proxy.go:659","level":"debug","msg":"Store Addr: 172.16.0.110:10901 LabelSets: {receive=\"true\", receive_replica=\"thanos-receive-4\", tenant_id=\"default-tenant\"} Mint: 1628582400000 Maxt: 9223372036854775807 queried;Store Addr: 172.16.0.250:10901 LabelSets: {receive=\"true\", receive_replica=\"thanos-receive-2\", tenant_id=\"default-tenant\"} Mint: 1628582400000 Maxt: 9223372036854775807 queried;Store Addr: 172.16.0.113:10901 LabelSets: {receive=\"true\", receive_replica=\"thanos-receive-0\", tenant_id=\"default-tenant\"},{receive=\"true\", receive_replica=\"thanos-receive-1\", tenant_id=\"default-tenant\"},{receive=\"true\", receive_replica=\"thanos-receive-2\", tenant_id=\"default-tenant\"},{receive=\"true\", receive_replica=\"thanos-receive-3\", tenant_id=\"default-tenant\"},{receive=\"true\", receive_replica=\"thanos-receive-4\", tenant_id=\"default-tenant\"} Mint: 1624439036465 Maxt: 1629188507217 queried;Store Addr: 172.16.1.7:10901 LabelSets: {receive=\"true\", receive_replica=\"thanos-receive-0\", tenant_id=\"default-tenant\"},{receive=\"true\", receive_replica=\"thanos-receive-1\", tenant_id=\"default-tenant\"},{receive=\"true\", receive_replica=\"thanos-receive-2\", tenant_id=\"default-tenant\"},{receive=\"true\", receive_replica=\"thanos-receive-3\", tenant_id=\"default-tenant\"},{receive=\"true\", receive_replica=\"thanos-receive-4\", tenant_id=\"default-tenant\"} Mint: 1624439036465 Maxt: 1629188507217 queried;Store Addr: 172.16.1.3:10901 LabelSets: {receive=\"true\", receive_replica=\"thanos-receive-0\", tenant_id=\"default-tenant\"} Mint: 1628582400000 Maxt: 9223372036854775807 queried;Store Addr: 172.16.0.116:10901 LabelSets: {rule_replica=\"thanos-rule-1\"} Mint: 9223372036854775807 Maxt: 9223372036854775807 filtered out due to does not have data within this time period: [1629149483143,1629192683143]. Store time ranges: [9223372036854775807,9223372036854775807];Store Addr: 172.16.0.24:10901 LabelSets: {receive=\"true\", receive_replica=\"thanos-receive-3\", tenant_id=\"default-tenant\"} Mint: 1628582400000 Maxt: 9223372036854775807 queried;Store Addr: 172.16.0.184:10901 LabelSets: {receive=\"true\", receive_replica=\"thanos-receive-1\", tenant_id=\"default-tenant\"} Mint: 1628582400000 Maxt: 9223372036854775807 queried;Store Addr: 172.16.0.32:10901 LabelSets: {rule_replica=\"thanos-rule-0\"} Mint: 9223372036854775807 Maxt: 9223372036854775807 filtered out due to does not have data within this time period: [1629149483143,1629192683143]. Store time ranges: [9223372036854775807,9223372036854775807]","ts":"2021-08-17T09:31:38.854410793Z"}
{"caller":"proxy.go:274","component":"proxy","level":"debug","msg":"Series: started fanout streams","request":"min_time:1629192392224 max_time:1629192692224 matchers: aggregates:COUNT aggregates:SUM partial_response_disabled:true ","status":"Store Addr: 172.16.0.24:10901 LabelSets: {receive=\"true\", receive_replica=\"thanos-receive-3\", tenant_id=\"default-tenant\"} Mint: 1628582400000 Maxt: 9223372036854775807 queried;Store Addr: 172.16.0.184:10901 LabelSets: {receive=\"true\", receive_replica=\"thanos-receive-1\", tenant_id=\"default-tenant\"} Mint: 1628582400000 Maxt: 9223372036854775807 queried;store Addr: 172.16.0.32:10901 LabelSets: {rule_replica=\"thanos-rule-0\"} Mint: 9223372036854775807 Maxt: 9223372036854775807 filtered out: does not have data within this time period: [1629192392224,1629192692224]. Store time ranges: [9223372036854775807,9223372036854775807];Store Addr: 172.16.1.3:10901 LabelSets: {receive=\"true\", receive_replica=\"thanos-receive-0\", tenant_id=\"default-tenant\"} Mint: 1628582400000 Maxt: 9223372036854775807 queried;store Addr: 172.16.0.116:10901 LabelSets: {rule_replica=\"thanos-rule-1\"} Mint: 9223372036854775807 Maxt: 9223372036854775807 filtered out: does not have data within this time period: [1629192392224,1629192692224]. Store time ranges: [9223372036854775807,9223372036854775807];Store Addr: 172.16.0.110:10901 LabelSets: {receive=\"true\", receive_replica=\"thanos-receive-4\", tenant_id=\"default-tenant\"} Mint: 1628582400000 Maxt: 9223372036854775807 queried;Store Addr: 172.16.0.250:10901 LabelSets: {receive=\"true\", receive_replica=\"thanos-receive-2\", tenant_id=\"default-tenant\"} Mint: 1628582400000 Maxt: 9223372036854775807 queried;store Addr: 172.16.0.113:10901 LabelSets: {receive=\"true\", receive_replica=\"thanos-receive-0\", tenant_id=\"default-tenant\"},{receive=\"true\", receive_replica=\"thanos-receive-1\", tenant_id=\"default-tenant\"},{receive=\"true\", receive_replica=\"thanos-receive-2\", tenant_id=\"default-tenant\"},{receive=\"true\", receive_replica=\"thanos-receive-3\", tenant_id=\"default-tenant\"},{receive=\"true\", receive_replica=\"thanos-receive-4\", tenant_id=\"default-tenant\"} Mint: 1624439036465 Maxt: 1629188507217 filtered out: does not have data within this time period: [1629192392224,1629192692224]. Store time ranges: [1624439036465,1629188507217];store Addr: 172.16.1.7:10901 LabelSets: {receive=\"true\", receive_replica=\"thanos-receive-0\", tenant_id=\"default-tenant\"},{receive=\"true\", receive_replica=\"thanos-receive-1\", tenant_id=\"default-tenant\"},{receive=\"true\", receive_replica=\"thanos-receive-2\", tenant_id=\"default-tenant\"},{receive=\"true\", receive_replica=\"thanos-receive-3\", tenant_id=\"default-tenant\"},{receive=\"true\", receive_replica=\"thanos-receive-4\", tenant_id=\"default-tenant\"} Mint: 1624439036465 Maxt: 1629188507217 filtered out: does not have data within this time period: [1629192392224,1629192692224]. Store time ranges: [1624439036465,1629188507217]","ts":"2021-08-17T09:31:38.924669277Z"}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x143818f]

goroutine 559 [running]:
go.elastic.co/apm/module/apmot.(*otSpan).setSpanContext(0xc000677950)
/home/circleci/go/pkg/mod/go.elastic.co/apm/module/apmot@v1.11.0/span.go:182 +0x2ef
go.elastic.co/apm/module/apmot.(*otSpan).FinishWithOptions(0xc000677950, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
/home/circleci/go/pkg/mod/go.elastic.co/apm/module/apmot@v1.11.0/span.go:99 +0x22e
go.elastic.co/apm/module/apmot.(*otSpan).Finish(0xc000677950)
/home/circleci/go/pkg/mod/go.elastic.co/apm/module/apmot@v1.11.0/span.go:75 +0x4b
github.com/grpc-ecosystem/go-grpc-middleware/v2/interceptors/tracing.(*opentracingClientReporter).PostCall(0xc00056ab00, 0x1fed600, 0xc00009a110, 0x2a9ac6)
/home/circleci/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware/v2@v2.0.0-rc.2.0.20201207153454-9f6bf00c00a7/interceptors/tracing/client.go:34 +0x90
github.com/grpc-ecosystem/go-grpc-middleware/v2/interceptors.(*monitoredClientStream).RecvMsg(0xc00056adc0, 0x1c6b420, 0xc00040a1a0, 0x10, 0x1ab4920)
/home/circleci/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware/v2@v2.0.0-rc.2.0.20201207153454-9f6bf00c00a7/interceptors/client.go:78 +0x155
github.com/grpc-ecosystem/go-grpc-prometheus.(*monitoredClientStream).RecvMsg(0xc00098a828, 0x1c6b420, 0xc00040a1a0, 0x1b08220, 0xc00009a101)
/home/circleci/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-prometheus@v1.2.0/client_metrics.go:160 +0x4b
github.com/thanos-io/thanos/pkg/store/storepb.(*storeSeriesClient).Recv(0xc0002092a0, 0xc000a75f70, 0x0, 0x1)
/home/circleci/project/pkg/store/storepb/rpc.pb.go:756 +0x62
github.com/thanos-io/thanos/pkg/store.startStreamSeriesSet.func1.2(0xc000677b00, 0xc000675380, 0xc000675320)
/home/circleci/project/pkg/store/proxy.go:394 +0x4b
created by github.com/thanos-io/thanos/pkg/store.startStreamSeriesSet.func1
/home/circleci/project/pkg/store/proxy.go:392 +0x1af


stale bot commented Oct 30, 2021

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

stale bot added the stale label Oct 30, 2021

omron93 commented Nov 1, 2021

I'm getting the same issue even after upgrading to 0.23.1

stale bot removed the stale label Nov 1, 2021
matej-g (Collaborator) commented Nov 8, 2021

Looks like this started to appear after #4619 but hasn't been addressed yet. I'm not sure what the exact issue is; to me it looks like something might be off in APM's implementation. This needs further investigation, but it might be worth opening an issue upstream as well.

@dbasden
Copy link

dbasden commented Nov 9, 2021

The problem happens at pkg/mod/go.elastic.co/apm/module/apmot@v1.11.0/span.go:182, in the s.span.Context.SetLabel call.

Everything is fine for spans from inbound HTTP UI requests, where s.span.Context is set (likely directly in thanos/pkg/tracing/http.go).

However, when an intercepted gRPC span is handled by that same APM span code (e.g. when Thanos Query connects to a Store API over gRPC to service a query), s.span.Context is nil, and the nil pointer dereference in SetLabel causes the SIGSEGV.

Looking at thanos/pkg/tracing/client/factory.go, Elastic APM appears to be the only tracing library that is not passed a context when setting up a new tracer. I'm guessing that could be why the problem only happens with APM, and only with traces going through the gRPC interceptor middleware.

That's as far as I got with a bunch of time and dlv. I'm not that experienced with Go or the ecosystem, so I can't fix it myself, but hopefully this gives more direction (assuming I'm right about the root cause).
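
To make the failure mode concrete: in Go, calling a method through a nil embedded pointer panics as soon as the method touches a field of the receiver. The snippet below is a minimal, self-contained illustration of that mechanism only; the type and method names are invented, and this is not Thanos or apmot source code.

package main

// Illustration of the crash mechanism described above, with invented names.
// When the embedded *spanData is nil (e.g. because the span was never fully
// initialised or has already ended), any label write dereferences nil and the
// process dies with SIGSEGV, just like the stack traces in this issue.

type spanData struct {
	labels map[string]string
}

// setLabel dereferences the receiver; with a nil *spanData this is an
// invalid memory address / nil pointer dereference.
func (d *spanData) setLabel(key, value string) {
	d.labels[key] = value
}

type span struct {
	*spanData // nil for the problematic gRPC client spans in this hypothesis
}

func main() {
	s := span{}                             // embedded *spanData left nil
	s.setLabel("peer.service", "store-api") // panic: nil pointer dereference
}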

matej-g (Collaborator) commented Nov 10, 2021

Thanks for the investigation @dbasden! I had a bit of a closer look as well. It seems that, for some reason, in the APM implementation the SpanData pointer is nil at the point where the program panics. SpanData has a Context field, which setSpanContext() reaches through the embedding in s.span.

According to the docs, SpanData should only be nil if the span has been ended. This could mean we are calling setSpanContext() on an already-ended span, which would explain the nil pointer. But I haven't been able to confirm whether that is really the case yet.
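
If that theory holds, the OpenTracing bridge would need to tolerate Finish being called on a span whose data has already been released. Below is a rough sketch of that idea using invented types (otSpan, spanData); it is not the real go.elastic.co/apm API, just an illustration of a "finish once, then no-op" guard under the assumption that SpanData becomes nil once the span ends.

package main

import (
	"fmt"
	"sync"
)

// Hypothetical sketch only. The point: once a span has ended, its data pointer
// is released (nil), so a second Finish — or any later label write — must be
// a no-op rather than a dereference.

type spanData struct{ labels map[string]string }

type otSpan struct {
	mu    sync.Mutex
	ended bool
	data  *spanData // released (nil) once the span has ended
}

func (s *otSpan) Finish() {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.ended || s.data == nil {
		return // guard: finishing an already-ended span does nothing
	}
	s.data.labels["finished"] = "true" // safe: data still present here
	s.ended = true
	s.data = nil // mimic SpanData being released after the span ends
}

func main() {
	s := &otSpan{data: &spanData{labels: map[string]string{}}}
	s.Finish()
	s.Finish() // without the guard, this second call would dereference nil
	fmt.Println("no panic:", s.ended)
}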


stale bot commented Mar 2, 2022

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

stale bot added the stale label Mar 2, 2022

omron93 commented Mar 3, 2022

This is still needed and would be great to have it fixed :-)


stale bot commented May 2, 2022

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

stale bot added the stale label May 2, 2022

omron93 commented May 3, 2022

This is still needed and would be great to have it fixed :-)

stale bot removed the stale label May 3, 2022

stale bot commented Jul 31, 2022

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

stale bot added the stale label Jul 31, 2022

omron93 commented Aug 3, 2022

This is still needed and would be great to have it fixed :-)

stale bot removed the stale label Aug 3, 2022

pjaak commented Sep 6, 2022

Can confirm still happening and still would be great to have fixed!

matej-g (Collaborator) commented Sep 8, 2022

Yes, unfortunately this still needs more investigation. It would be great if someone who's affected had the time and resources to look at this 😞


stale bot commented Nov 13, 2022

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

stale bot added the stale label Nov 13, 2022
matej-g removed the stale label Nov 14, 2022
matej-g added the dont-go-stale label Nov 14, 2022

@bck01215

Same error on 0.28.
