Object Storage Provider:
S3
What happened:
An error occurred while downsampling, but Thanos Compactor got stuck.
What you expected to happen:
For Thanos Compactor to get into a "halt" state
How to reproduce it (as minimally and precisely as possible):
It seems it should be enough to have multiple blocks waiting to be downsampled and for an error to occur while downsampling one of them.
Full logs to relevant components:
Goroutine dump:

    goroutine profile: total 24

    4 @ 0x4380f6 0x4309fe 0x463049 0x4d5532 0x4d689a 0x4d6888 0x56c9c9 0x57e605 0x76fd2e 0x5519c3 0x551b1d 0x770aec 0x468981
    # 0x463048 internal/poll.runtime_pollWait+0x88 /usr/lib/go-1.17/src/runtime/netpoll.go:229
    # 0x4d5531 internal/poll.(*pollDesc).wait+0x31 /usr/lib/go-1.17/src/internal/poll/fd_poll_runtime.go:84
    # 0x4d6899 internal/poll.(*pollDesc).waitRead+0x259 /usr/lib/go-1.17/src/internal/poll/fd_poll_runtime.go:89
    # 0x4d6887 internal/poll.(*FD).Read+0x247 /usr/lib/go-1.17/src/internal/poll/fd_unix.go:167
    # 0x56c9c8 net.(*netFD).Read+0x28 /usr/lib/go-1.17/src/net/fd_posix.go:56
    # 0x57e604 net.(*conn).Read+0x44 /usr/lib/go-1.17/src/net/net.go:183
    # 0x76fd2d net/http.(*persistConn).Read+0x4d /usr/lib/go-1.17/src/net/http/transport.go:1926
    # 0x5519c2 bufio.(*Reader).fill+0x102 /usr/lib/go-1.17/src/bufio/bufio.go:101
    # 0x551b1c bufio.(*Reader).Peek+0x5c /usr/lib/go-1.17/src/bufio/bufio.go:139
    # 0x770aeb net/http.(*persistConn).readLoop+0x1ab /usr/lib/go-1.17/src/net/http/transport.go:2087

    4 @ 0x4380f6 0x447ed2 0x7727bb 0x468981
    # 0x7727ba net/http.(*persistConn).writeLoop+0xfa /usr/lib/go-1.17/src/net/http/transport.go:2386

    2 @ 0x4380f6 0x4309fe 0x463049 0x4d5532 0x4d689a 0x4d6888 0x56c9c9 0x57e605 0x750e4d 0x5519c3 0x55258f 0x5527e7 0x6df1b9 0x74c359 0x74c35a 0x752205 0x756545 0x468981
    # 0x463048 internal/poll.runtime_pollWait+0x88 /usr/lib/go-1.17/src/runtime/netpoll.go:229
    # 0x4d5531 internal/poll.(*pollDesc).wait+0x31 /usr/lib/go-1.17/src/internal/poll/fd_poll_runtime.go:84
    # 0x4d6899 internal/poll.(*pollDesc).waitRead+0x259 /usr/lib/go-1.17/src/internal/poll/fd_poll_runtime.go:89
    # 0x4d6887 internal/poll.(*FD).Read+0x247 /usr/lib/go-1.17/src/internal/poll/fd_unix.go:167
    # 0x56c9c8 net.(*netFD).Read+0x28 /usr/lib/go-1.17/src/net/fd_posix.go:56
    # 0x57e604 net.(*conn).Read+0x44 /usr/lib/go-1.17/src/net/net.go:183
    # 0x750e4c net/http.(*connReader).Read+0x16c /usr/lib/go-1.17/src/net/http/server.go:780
    # 0x5519c2 bufio.(*Reader).fill+0x102 /usr/lib/go-1.17/src/bufio/bufio.go:101
    # 0x55258e bufio.(*Reader).ReadSlice+0x2e /usr/lib/go-1.17/src/bufio/bufio.go:360
    # 0x5527e6 bufio.(*Reader).ReadLine+0x26 /usr/lib/go-1.17/src/bufio/bufio.go:389
    # 0x6df1b8 net/textproto.(*Reader).readLineSlice+0x98 /usr/lib/go-1.17/src/net/textproto/reader.go:57
    # 0x74c358 net/textproto.(*Reader).ReadLine+0x78 /usr/lib/go-1.17/src/net/textproto/reader.go:38
    # 0x74c359 net/http.readRequest+0x79 /usr/lib/go-1.17/src/net/http/request.go:1029
    # 0x752204 net/http.(*conn).readRequest+0x224 /usr/lib/go-1.17/src/net/http/server.go:966
    # 0x756544 net/http.(*conn).serve+0x864 /usr/lib/go-1.17/src/net/http/server.go:1855

    1 @ 0x40b8f4 0x464f18 0x5fb759 0x468981
    # 0x464f17 os/signal.signal_recv+0x97 /usr/lib/go-1.17/src/runtime/sigqueue.go:169
    # 0x5fb758 os/signal.loop+0x18 /usr/lib/go-1.17/src/os/signal/signal_unix.go:24

    1 @ 0x4380f6 0x40640c 0x405e38 0x1694cb3 0x5fbc2f 0x468981
    # 0x1694cb2 main.main.func2+0x32 /home/giedrius/dev/thanos/cmd/thanos/main.go:115
    # 0x5fbc2e github.com/oklog/run.(*Group).Run.func1+0x2e /home/giedrius/go/pkg/mod/github.com/oklog/run@v1.1.0/group.go:38

    1 @ 0x4380f6 0x40640c 0x405e38 0x5fb99c 0x16946ba 0x437d27 0x468981
    # 0x5fb99b github.com/oklog/run.(*Group).Run+0x7b /home/giedrius/go/pkg/mod/github.com/oklog/run@v1.1.0/group.go:43
    # 0x16946b9 main.main+0x15b9 /home/giedrius/dev/thanos/cmd/thanos/main.go:155
    # 0x437d26 runtime.main+0x226 /usr/lib/go-1.17/src/runtime/proc.go:255

    1 @ 0x4380f6 0x40640c 0x405e78 0xe2cd25 0x468981
    # 0xe2cd24 github.com/baidubce/bce-sdk-go/util/log.NewLogger.func1+0x64 /home/giedrius/go/pkg/mod/github.com/baidubce/bce-sdk-go@v0.9.81/util/log/logger.go:362

    1 @ 0x4380f6 0x4309fe 0x463049 0x4d5532 0x4d888c 0x4d8879 0x56e175 0x587e28 0x586ffd 0x75b1f4 0xbc78df 0xbc7679 0xcb1fe5 0x168a3b5 0x5fbc2f 0x468981
    # 0x463048 internal/poll.runtime_pollWait+0x88 /usr/lib/go-1.17/src/runtime/netpoll.go:229
    # 0x4d5531 internal/poll.(*pollDesc).wait+0x31 /usr/lib/go-1.17/src/internal/poll/fd_poll_runtime.go:84
    # 0x4d888b internal/poll.(*pollDesc).waitRead+0x22b /usr/lib/go-1.17/src/internal/poll/fd_poll_runtime.go:89
    # 0x4d8878 internal/poll.(*FD).Accept+0x218 /usr/lib/go-1.17/src/internal/poll/fd_unix.go:402
    # 0x56e174 net.(*netFD).accept+0x34 /usr/lib/go-1.17/src/net/fd_unix.go:173
    # 0x587e27 net.(*TCPListener).accept+0x27 /usr/lib/go-1.17/src/net/tcpsock_posix.go:140
    # 0x586ffc net.(*TCPListener).Accept+0x3c /usr/lib/go-1.17/src/net/tcpsock.go:262
    # 0x75b1f3 net/http.(*Server).Serve+0x393 /usr/lib/go-1.17/src/net/http/server.go:3001
    # 0xbc78de github.com/prometheus/exporter-toolkit/web.Serve+0x17e /home/giedrius/go/pkg/mod/github.com/prometheus/exporter-toolkit@v0.6.1/web/tls_config.go:192
    # 0xbc7678 github.com/prometheus/exporter-toolkit/web.ListenAndServe+0xf8 /home/giedrius/go/pkg/mod/github.com/prometheus/exporter-toolkit@v0.6.1/web/tls_config.go:184
    # 0xcb1fe4 github.com/thanos-io/thanos/pkg/server/http.(*Server).ListenAndServe+0x1a4 /home/giedrius/dev/thanos/pkg/server/http/http.go:68
    # 0x168a3b4 main.runCompact.func1+0x34 /home/giedrius/dev/thanos/cmd/thanos/compact.go:190
    # 0x5fbc2e github.com/oklog/run.(*Group).Run.func1+0x2e /home/giedrius/go/pkg/mod/github.com/oklog/run@v1.1.0/group.go:38

    1 @ 0x4380f6 0x447ed2 0x1690fbb 0x8d4387 0x468981
    # 0x1690fba main.downsampleBucket.func4+0x25a /home/giedrius/dev/thanos/cmd/thanos/downsample.go:303
    # 0x8d4386 golang.org/x/sync/errgroup.(*Group).Go.func1+0x66 /home/giedrius/go/pkg/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57

    1 @ 0x4380f6 0x447ed2 0x169506e 0x1694ac5 0x5fbc2f 0x468981
    # 0x169506d main.interrupt+0x10d /home/giedrius/dev/thanos/cmd/thanos/main.go:166
    # 0x1694ac4 main.main.func4+0x24 /home/giedrius/dev/thanos/cmd/thanos/main.go:139
    # 0x5fbc2e github.com/oklog/run.(*Group).Run.func1+0x2e /home/giedrius/go/pkg/mod/github.com/oklog/run@v1.1.0/group.go:38

    1 @ 0x4380f6 0x447ed2 0x1695365 0x1694a49 0x5fbc2f 0x468981
    # 0x1695364 main.reload+0x104 /home/giedrius/dev/thanos/cmd/thanos/main.go:179
    # 0x1694a48 main.main.func6+0x28 /home/giedrius/dev/thanos/cmd/thanos/main.go:149
    # 0x5fbc2e github.com/oklog/run.(*Group).Run.func1+0x2e /home/giedrius/go/pkg/mod/github.com/oklog/run@v1.1.0/group.go:38

    1 @ 0x4380f6 0x447ed2 0xcc1405 0x1688805 0x5fbc2f 0x468981
    # 0xcc1404 github.com/thanos-io/thanos/pkg/runutil.Repeat+0xe4 /home/giedrius/dev/thanos/pkg/runutil/runutil.go:78
    # 0x1688804 main.runCompact.func14+0x124 /home/giedrius/dev/thanos/cmd/thanos/compact.go:545
    # 0x5fbc2e github.com/oklog/run.(*Group).Run.func1+0x2e /home/giedrius/go/pkg/mod/github.com/oklog/run@v1.1.0/group.go:38

    1 @ 0x4380f6 0x447ed2 0xcc1405 0x1688a70 0x5fbc2f 0x468981
    # 0xcc1404 github.com/thanos-io/thanos/pkg/runutil.Repeat+0xe4 /home/giedrius/dev/thanos/pkg/runutil/runutil.go:78
    # 0x1688a6f main.runCompact.func12+0x4f /home/giedrius/dev/thanos/cmd/thanos/compact.go:533
    # 0x5fbc2e github.com/oklog/run.(*Group).Run.func1+0x2e /home/giedrius/go/pkg/mod/github.com/oklog/run@v1.1.0/group.go:38

    1 @ 0x4380f6 0x447ed2 0xebbef9 0x468981
    # 0xebbef8 go.opencensus.io/stats/view.(*worker).start+0xb8 /home/giedrius/go/pkg/mod/go.opencensus.io@v0.23.0/stats/view/worker.go:276

    1 @ 0x4380f6 0x448fcc 0x448fa6 0x464745 0x474151 0x8d4207 0x1690b18 0x16899a8 0x1688ef3 0xcc13b0 0x1688ded 0x5fbc2f 0x468981
    # 0x464744 sync.runtime_Semacquire+0x24 /usr/lib/go-1.17/src/runtime/sema.go:56
    # 0x474150 sync.(*WaitGroup).Wait+0x70 /usr/lib/go-1.17/src/sync/waitgroup.go:130
    # 0x8d4206 golang.org/x/sync/errgroup.(*Group).Wait+0x26 /home/giedrius/go/pkg/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:40
    # 0x1690b17 main.downsampleBucket+0xc37 /home/giedrius/dev/thanos/cmd/thanos/downsample.go:312
    # 0x16899a7 main.runCompact.func7+0x707 /home/giedrius/dev/thanos/cmd/thanos/compact.go:441
    # 0x1688ef2 main.runCompact.func8.1+0x52 /home/giedrius/dev/thanos/cmd/thanos/compact.go:470
    # 0xcc13af github.com/thanos-io/thanos/pkg/runutil.Repeat+0x8f /home/giedrius/dev/thanos/pkg/runutil/runutil.go:75
    # 0x1688dec main.runCompact.func8+0x1cc /home/giedrius/dev/thanos/cmd/thanos/compact.go:469
    # 0x5fbc2e github.com/oklog/run.(*Group).Run.func1+0x2e /home/giedrius/go/pkg/mod/github.com/oklog/run@v1.1.0/group.go:38

    1 @ 0x462b65 0xb9bef5 0xb9bd0d 0xb98e8b 0xba7bfa 0xba87ae 0x75770f 0x759009 0x75ac7b 0x7567e8 0x468981
    # 0x462b64 runtime/pprof.runtime_goroutineProfileWithLabels+0x24 /usr/lib/go-1.17/src/runtime/mprof.go:746
    # 0xb9bef4 runtime/pprof.writeRuntimeProfile+0xb4 /usr/lib/go-1.17/src/runtime/pprof/pprof.go:724
    # 0xb9bd0c runtime/pprof.writeGoroutine+0x4c /usr/lib/go-1.17/src/runtime/pprof/pprof.go:684
    # 0xb98e8a runtime/pprof.(*Profile).WriteTo+0x14a /usr/lib/go-1.17/src/runtime/pprof/pprof.go:331
    # 0xba7bf9 net/http/pprof.handler.ServeHTTP+0x499 /usr/lib/go-1.17/src/net/http/pprof/pprof.go:253
    # 0xba87ad net/http/pprof.Index+0x12d /usr/lib/go-1.17/src/net/http/pprof/pprof.go:371
    # 0x75770e net/http.HandlerFunc.ServeHTTP+0x2e /usr/lib/go-1.17/src/net/http/server.go:2046
    # 0x759008 net/http.(*ServeMux).ServeHTTP+0x148 /usr/lib/go-1.17/src/net/http/server.go:2424
    # 0x75ac7a net/http.serverHandler.ServeHTTP+0x43a /usr/lib/go-1.17/src/net/http/server.go:2878
    # 0x7567e7 net/http.(*conn).serve+0xb07 /usr/lib/go-1.17/src/net/http/server.go:1929

    1 @ 0x468981
The metrics show the following:
thanos_compact_downsample_failures_total{group="0@18435695797974204449"} 1
Anything else we need to know:
Reproduced on 0.23.1.
If an error happens here:

    if err := processDownsampling(ctx, logger, bkt, m, dir, resolution, hashFunc, metrics); err != nil {
        metrics.downsampleFailures.WithLabelValues(compact.DefaultGroupKey(m.Thanos)).Inc()
        return errors.Wrap(err, errMsg)
    }
Then:

    select {
    case <-ctx.Done():
        return ctx.Err()
    case ch <- m: // <---- this send
    }
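The send (case ch <- m) will never execute. Presumably this is because nothing receives from ch once the block has failed. And ctx will never be done, because we pass context.Background. The select therefore blocks forever.

This is consistent with the goroutine dump above. main.downsampleBucket.func4 is parked at downsample.go:303, while main.downsampleBucket waits in errgroup.(*Group).Wait at downsample.go:312.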