Retry on network failures (e.g. uploads) #318
Comments
Especially funny is that a single error during syncing metas causes the compactor to retry the WHOLE sync.
I can see from @TimSimmons' logs that it happens quite often. We should retry just the problematic thing.
This happens both for downloads and uploads (of compacted blocks) for me. I also see the same timeouts when uploading from the sidecars, so I think this issue applies to all components that communicate with the block store. With a large enough number of Thanos sidecars this issue can be quite bad: once you fall behind, the number of files uploaded/downloaded gets large, which means a higher chance of hitting the issue, which may put you even further behind, and so on.
Yup, exactly.
Add backoff retry for a single object storage request, except the Range and Iter methods. The error handler splits errors into net/http and others, and retries the request against the object storage for the former. Fixes thanos-io#318
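For context, a minimal sketch of a backoff retry that only re-issues requests on net/http timeouts (the helper name, attempt count, and backoff values are illustrative assumptions, not the actual patch):

```go
package objstore

import (
	"context"
	"net"
	"time"
)

// retryOnNetError runs op and retries it with exponential backoff, but only
// when the returned error looks like a transient network timeout. Any other
// error is returned immediately. Helper name, attempt count, and backoff
// values are illustrative assumptions, not the actual Thanos patch.
func retryOnNetError(ctx context.Context, op func(ctx context.Context) error) error {
	backoff := 100 * time.Millisecond
	const maxAttempts = 5

	var err error
	for attempt := 0; attempt < maxAttempts; attempt++ {
		if err = op(ctx); err == nil {
			return nil
		}
		// Only network-level timeouts are considered retriable here.
		netErr, ok := err.(net.Error)
		if !ok || !netErr.Timeout() {
			return err
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(backoff):
			backoff *= 2
		}
	}
	return err
}
```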
OK, this is interesting, as the S3 client really does have retries: https://sourcegraph.com/github.com/minio/minio-go@master/-/blob/api.go#L524:17 Maybe it's worth reaching out to them?
We double-checked, and retries are already implemented in the minio and GCS clients. For each client we need to double-check and add them if missing (per client).
@bwplotka this still seems to happen in v0.3.1. The behavior I see is that the timeout occurs (not exactly sure whether the retry is triggered within minio or not), but the compactor exits and restarts. I'd assume that on restart it cleans the compaction directory and effectively starts from 0 again.
@bwplotka we are observing Compactor is running without Setup:
Logs
thanos-compactor-1552487171-vwdk5
thanos-compactor-1552487171-8t5ff
thanos-compactor-1552487171-qg22k
thanos-compactor-1552487171-p8xvm
thanos-compactor-1552487171-gxbx5
thanos-compactor-1552487171-w4xdp
So this is essentially connected to the minio library. If you are getting timeouts, it seems like we should look at the reasons why. Are blocks too big? Is there any way we can adjust the minio library (https://github.com/minio/minio-go) to improve that? Retry is already in place; minio should handle retries. But if you are getting timeouts even for the retries... Not sure if masking your issue with another retry is a good solution here (:
One way is to actually grab a single bigger block that fails constantly and try uploading it on your own using
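For anyone wanting to try that manual upload, a rough sketch using the minio-go v6 client (endpoint, credentials, bucket name, and the local block path are placeholders, not a Thanos command):

```go
package main

import (
	"log"

	minio "github.com/minio/minio-go/v6"
)

func main() {
	// Placeholder endpoint and credentials - replace with your own.
	client, err := minio.New("s3.example.com", "ACCESS_KEY", "SECRET_KEY", true)
	if err != nil {
		log.Fatal(err)
	}

	// Upload one big file from the failing block directly, to see whether the
	// raw client hits the same timeout that Thanos reports. The block ID and
	// local path below are placeholders.
	n, err := client.FPutObject(
		"thanos-bucket",
		"01D84YH6M4MPG0JZ6M9C411B72/chunks/000001",
		"/var/thanos/compact/01D84YH6M4MPG0JZ6M9C411B72/chunks/000001",
		minio.PutObjectOptions{},
	)
	if err != nil {
		log.Fatalf("upload failed: %v", err)
	}
	log.Printf("uploaded %d bytes", n)
}
```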
I'll give that a try. I believe some of the directories from the compaction are large, > 100 GB. I'll have to do some digging.
@bwplotka in our case these are all different blocks each time. It does succeed, but at times after a handful of cronjob restarts caused by
See this issue here, but what's the point of retrying if the underlying client provider lib retries for us?
The only problem is when the library we use has this logic broken; in that case I think we should propagate the issue to them. Double retrying is not a solution.
Oh, sorry, hadn't seen this since it was closed. Could we rename the title to be a bit more generic, because this affects not only the compactor but the sidecar as well? :P Yes, I agree that this should be delegated to the underlying libraries that we use, but perhaps we could think of some even smarter solution, like double-checking what (if any) files were uploaded to remote storage and retrying only those files if they are still present on disk.
I would follow up on every issue with the underlying provider and make them better. If we are really hit by this we can still evaluate that bit, but in a perfect world (the open source world) we should not do it unless the provider states that. E.g. how can we even tell if the error is retriable? It does not make sense to always retry (500, 403, 404, etc.).
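To illustrate the retriability point, a hedged sketch of such a classification check (purely hypothetical; neither Thanos nor the provider libraries are claimed to implement it this way):

```go
package objstore

import (
	"net"
	"net/http"
)

// isRetriable reports whether a failed object storage request is worth
// retrying. Network timeouts and 5xx responses usually are; client errors
// such as 403 or 404 are not, since retrying them only masks a real problem.
// The exact status-code split is an illustrative assumption.
func isRetriable(err error, statusCode int) bool {
	if netErr, ok := err.(net.Error); ok && netErr.Timeout() {
		return true
	}
	switch statusCode {
	case http.StatusInternalServerError, http.StatusBadGateway,
		http.StatusServiceUnavailable, http.StatusGatewayTimeout:
		return true
	case http.StatusForbidden, http.StatusNotFound:
		return false
	}
	return false
}
```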
Related: #934
Just rolled back to the build with v0.20, will see how it works. For some reason I see uploaded blocks in a corrupted state (missing files; could be the index file or chunk files). Example of one block with the index file absent:
Example of logs from radosgw: [11/Apr/2019:01:03:40 +0000] "HEAD /thanos/01D84YH6M4MPG0JZ6M9C411B72/meta.json HTTP/1.1" 404 0 - Minio (linux; amd64) minio-go/v6.0.16 thanos-sidecar/0.3.2 (go1.12)
Also seeing a similar issue with the sidecar and compact, release v0.3.2, S3 provider.
Same issue with the RC release and a 400+ GB block upload. Compactor fails with "net/http: timeout await
Guys, can you make sure to mention:
Otherwise it is not very helpful ): Ideally we would like to focus on each provider separately.
This is also happening constantly to me, on S3, with version 0.3.1. I've spent some time today debugging the issue and I believe it might have been caused by #323: likely the 15-second timeout set in that PR is not enough for large blocks. I'm testing a custom version in which I've increased the timeout to 2 minutes (🤷♂️ 😄) and so far I haven't seen any issues in a couple of hours, where it used to fail every 5-10 minutes. I'll leave a few compacting processes running over the night and will report back tomorrow with the results.
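The change being tested presumably boils down to raising the response header timeout on the HTTP transport handed to the S3 client; a sketch under that assumption (how it is wired into the Thanos S3 provider is not shown, and the 2-minute value is just this experiment, not a recommendation):

```go
package s3

import (
	"net/http"
	"time"
)

// newTransport builds an HTTP transport with a longer ResponseHeaderTimeout
// than the 15s mentioned above, so large block uploads get more headroom
// before the request is aborted. How this is wired into the S3 client is an
// assumption for illustration.
func newTransport() *http.Transport {
	return &http.Transport{
		Proxy:                 http.ProxyFromEnvironment,
		IdleConnTimeout:       90 * time.Second,
		ResponseHeaderTimeout: 2 * time.Minute, // was 15s
	}
}
```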
All the processes are still working correctly after 12 hours.
I haven't experienced a single error in the last 4 days. @bwplotka I'd be happy to contribute a patch for the
The headers are the first thing sent. Admittedly the 10s we currently use is a bit low, but if you don't get them within two whole minutes I'd say you can safely assume they won't come later.
I wouldn't mind trying your approach, as I'm facing exactly the same issue. Could you share your changes? Thanks.
I see the S3 header timeout issue with 0.3.2 and 0.4.0. One thing I noticed when I was trying 0.4.0 is that when the timeout happens, the Thanos compactor process exits and needs to be restarted. With 0.3.2 it does not exit, and just loops and tries again. Is this an expected change in behaviour in 0.4.0?
#1094, which was merged before the 0.5.0 release, doesn't seem to fix it for us.
@Allex1 hi, would you like to try with the v0.10 RC?
@daixiang0 I haven't seen this error since upgrading to v0.8.1.
@bwplotka it seems we can close it safely.
Not critical, since the compactor just restarted and continued just fine, but it can be annoying.