Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

thanos compactor crashes with "write compaction: chunk 8 not found: reference sequence 0 out of range" #1300

Closed
bjakubski opened this issue Jul 3, 2019 · 21 comments · Fixed by #1937

Comments

@bjakubski
Copy link

Thanos, Prometheus and Golang version used

thanos, version 0.5.0 (branch: HEAD, revision: 72820b3)
build user: circleci@eeac5eb36061
build date: 20190606-10:53:12
go version: go1.12.5
What happened

thanos compactor crashes with ""write compaction: chunk 8 not found: reference sequence 0 out of range"

What you expected to happen

Should work fine :-)

How to reproduce it (as minimally and precisely as possible):

Not sure :-/

Full logs to relevant components

Out of the list of objects dumped along with error message I've found one without chunks

$  gsutil ls -r gs://REDACTED/01DBZNNTM2557YW8T35RBM676P
gs://REDACTED/01DBZNNTM2557YW8T35RBM676P/:
gs://REDACTED/01DBZNNTM2557YW8T35RBM676P/index
gs://REDACTED/01DBZNNTM2557YW8T35RBM676P/meta.json

meta.json contents:

{
	"version": 1,
	"ulid": "01DBZNNTM2557YW8T35RBM676P",
	"minTime": 1557993600000,
	"maxTime": 1558022400000,
	"stats": {
		"numSamples": 35591778,
		"numSeries": 39167,
		"numChunks": 299446
	},
	"compaction": {
		"level": 2,
		"sources": [
			"01DB04S079GCEBKMTWZBH8HQA3",
			"01DB0BMQG3W7M12M8DE3V9QW5C",
			"01DB0JGEQD5RCZ50JAS2NENHQ6",
			"01DB0SC6071QW08JWQVG000AKF"
		],
		"parents": [
			{
				"ulid": "01DB04S079GCEBKMTWZBH8HQA3",
				"minTime": 1557993600000,
				"maxTime": 1558000800000
			},
			{
				"ulid": "01DB0BMQG3W7M12M8DE3V9QW5C",
				"minTime": 1558000800000,
				"maxTime": 1558008000000
			},
			{
				"ulid": "01DB0JGEQD5RCZ50JAS2NENHQ6",
				"minTime": 1558008000000,
				"maxTime": 1558015200000
			},
			{
				"ulid": "01DB0SC6071QW08JWQVG000AKF",
				"minTime": 1558015200000,
				"maxTime": 1558022400000
			}
		]
	},
	"thanos": {
		"labels": {
			"environment": "devint",
			"instance_number": "1",
			"location": "REDACTED",
			"prometheus": "monitoring/prometheus-operator-prometheus",
			"prometheus_replica": "prometheus-prometheus-operator-prometheus-1",
			"stack": "data"
		},
		"downsample": {
			"resolution": 0
		},
		"source": "compactor"
	}
}

It is an object created by compactor apparently.

We've been running compactor for some time, but after a while (due to lack of local disk storage) it was crashing constantly. After an extended period of crashing storage has been added and compactor was able to go further until it encountered problem described here.

I guess important part is how such object ended up in bucket, although I wonder if it is possible for thanos to ignore such objects and keep processing data rest of data? (exposing data about bad objects in metrics)

I'd guess that it was somehow created during constant crashes we've had earlier, but have nothing to support that.

Anything else we need to know

#688 describes similar issue, although it is about much older thanos than we use here.
We've been running 0.5 and 0.4 before that. I'm not sure, but it is possible that 0.3.2 (compactor) was used at the beginning.

@GiedriusS
Copy link
Member

Sounds like Thanos Compact has crashed midway during the upload and you got inconsistent data. Nowadays Thanos Compact removes old, malformed data that is older than the maximum of consistency delay and the minimum age for removal (30 minutes). Does that not work for you for some reason, @bjakubski?

@GiedriusS
Copy link
Member

Also, Thanos Compact now has metrics about failed compactions (in 0.6.0 which is soon to be released) so you can use that :P

@bjakubski
Copy link
Author

I've just encountered the same problem on another object in bucket, but this time it was created by sidecar. I do not see any message in logs about sidecar ever uploading it, but this object is older (> week old) but according to GCP storage it was just created yesterday. I see messages in prometheus logs
2019-06-29 "write block"
2019-07-02 "found healthy block" (probably after prom restart)
2019-07-10 21:31 "found healthy block) (after prom restart)

object in bucket seems to have been created at 2019-07-10 21:25 - which is just after one of thanos sidecars successfully connected to prometheus (after pod restart). In logs I can see it uploading various objects, but nothing about this particular ulid

Now: I'm running unusual setup with thanos-sidecar: prometheus compaction is disabled, but retention is quite long: 15d IIRC It generally seems to work fine, although I wonder if it might cause problems like I have here?

What is the consistency delay? Should I've been expecting compactor to ignore/remove this object from bucket?

@zhaopf1112
Copy link

Also I met the same question about this , the error log is write compaction: chunk 8 not found: reference sequence 0 out of range, i found the compactor pods always crashoff , and then restart pod again, below is the pod information. and i don't know how to fix this question, could anyone know the issue?

prometheus-operator-57fc4686dc-gw8st   1/1     Running   2          17d
thanos-compactor-0                     1/1     Running   162        24h
thanos-query-596fd87fcc-2tmx8          1/1     Running   0          24h
thanos-store-0                         1/1     Running   0          24h

@kklimonda
Copy link

I'm seeing the same error for blocks uploaded with thanos-compactor 0.6.0 (and then processed by 0.6.0) myself. Backend storage is Ceph cluster via Swift API.

thanos, version 0.6.0 (branch: HEAD, revision: c70b80eb83e52f5013ed5ffab72464989c132883)
  build user:       root@fd15c8276e5c
  build date:       20190718-11:11:58
  go version:       go1.12.5

thanos-compactor has uploaded a compacted block yesterday:

Jul 29 14:03:47 thanos01 thanos[32452]: level=info ts=2019-07-29T14:03:47.456113025Z caller=compact.go:444 msg="compact blocks" count=4 mint=1561622400000 maxt=1561651200000 ulid=01DGZ0PDS2MFX25P9CKG1C1TA3 sources="[01DEC9F6B0BVQHFG7BZDS75N0V 01DECGAXK0CM9HC2QPZ7MMH8M0 01DECQ6MV0K0FR00HHY9906178 01DECY2C2ZX4F6N9GKHKM2PC61]" duration=9.524377767s
Jul 29 14:03:47 thanos01 thanos[32452]: level=warn ts=2019-07-29T14:03:47.752332697Z caller=runutil.go:108 compactionGroup="0@{datacenter=\"p24\",environment=\"internal\",replica=\"02\"}" msg="detected close error" err="close file /var/lib/thanos/compactor/compact/0@{datacenter=\"p24\",environment=\"internal\",replica=\"02\"}/01DGZ0PDS2MFX25P9CKG1C1TA3/meta.json: close /var/lib/thanos/compactor/compact/0@{datacenter=\"p24\",environment=\"internal\",replica=\"02\"}/01DGZ0PDS2MFX25P9CKG1C1TA3/meta.json: file already closed"
Jul 29 14:03:55 thanos01 thanos[32452]: level=warn ts=2019-07-29T14:03:55.115560765Z caller=runutil.go:108 compactionGroup="0@{datacenter=\"p24\",environment=\"internal\",replica=\"02\"}" msg="detected close error" err="close file /var/lib/thanos/compactor/compact/0@{datacenter=\"p24\",environment=\"internal\",replica=\"02\"}/01DGZ0PDS2MFX25P9CKG1C1TA3/chunks/000001: close /var/lib/thanos/compactor/compact/0@{datacenter=\"p24\",environment=\"internal\",replica=\"02\"}/01DGZ0PDS2MFX25P9CKG1C1TA3/chunks/000001: file already closed"
Jul 29 14:03:56 thanos01 thanos[32452]: level=warn ts=2019-07-29T14:03:56.472928348Z caller=runutil.go:108 compactionGroup="0@{datacenter=\"p24\",environment=\"internal\",replica=\"02\"}" msg="detected close error" err="close file /var/lib/thanos/compactor/compact/0@{datacenter=\"p24\",environment=\"internal\",replica=\"02\"}/01DGZ0PDS2MFX25P9CKG1C1TA3/index: close /var/lib/thanos/compactor/compact/0@{datacenter=\"p24\",environment=\"internal\",replica=\"02\"}/01DGZ0PDS2MFX25P9CKG1C1TA3/index: file already closed"
Jul 29 14:03:56 thanos01 thanos[32452]: level=warn ts=2019-07-29T14:03:56.571187127Z caller=runutil.go:108 compactionGroup="0@{datacenter=\"p24\",environment=\"internal\",replica=\"02\"}" msg="detected close error" err="close file /var/lib/thanos/compactor/compact/0@{datacenter=\"p24\",environment=\"internal\",replica=\"02\"}/01DGZ0PDS2MFX25P9CKG1C1TA3/index.cache.json: close /var/lib/thanos/compactor/compact/0@{datacenter=\"p24\",environment=\"internal\",replica=\"02\"}/01DGZ0PDS2MFX25P9CKG1C1TA3/index.cache.json: file already closed"
Jul 29 14:03:56 thanos01 thanos[32452]: level=warn ts=2019-07-29T14:03:56.72380147Z caller=runutil.go:108 compactionGroup="0@{datacenter=\"p24\",environment=\"internal\",replica=\"02\"}" msg="detected close error" err="close file /var/lib/thanos/compactor/compact/0@{datacenter=\"p24\",environment=\"internal\",replica=\"02\"}/01DGZ0PDS2MFX25P9CKG1C1TA3/meta.json: close /var/lib/thanos/compactor/compact/0@{datacenter=\"p24\",environment=\"internal\",replica=\"02\"}/01DGZ0PDS2MFX25P9CKG1C1TA3/meta.json: file already closed"
Jul 29 14:03:56 thanos01 thanos[32452]: level=debug ts=2019-07-29T14:03:56.723861375Z caller=compact.go:906 compactionGroup="0@{datacenter=\"p24\",environment=\"internal\",replica=\"02\"}" msg="uploaded block" result_block=01DGZ0PDS2MFX25P9CKG1C1TA3 duration=8.994729984s

and now it's choking on that block:

Jul 30 11:25:38 thanos01 thanos[22465]: level=debug ts=2019-07-30T11:25:38.310116816Z caller=compact.go:252 msg="download meta" block=01DGZ0PDS2MFX25P9CKG1C1TA3
Jul 30 11:25:47 thanos01 thanos[22465]: level=debug ts=2019-07-30T11:25:47.039193509Z caller=compact.go:840 compactionGroup="0@{datacenter=\"p24\",environment=\"internal\",replica=\"02\"}" msg="downloaded and verified blocks" blocks="[/var/lib/thanos/compactor/compact/0@{datacenter=\"p24\",environment=\"internal\",replica=\"02\"}/01DEBE09AZXVCNWKYCWP43A7G4 /var/lib/thanos/compactor/compact/0@{datacenter=\"p24\",environment=\"internal\",replica=\"02\"}/01DGZ0PDS2MFX25P9CKG1C1TA3]" duration=7.277184353s
Jul 30 11:25:47 thanos01 thanos[22465]: level=error ts=2019-07-30T11:25:47.214649027Z caller=main.go:199 msg="running command failed" err="error executing compaction: compaction failed: compaction failed for group 0@{datacenter=\"p24\",environment=\"internal\",replica=\"02\"}: compact blocks [/var/lib/thanos/compactor/compact/0@{datacenter=\"p24\",environment=\"internal\",replica=\"02\"}/01DEBE09AZXVCNWKYCWP43A7G4 /var/lib/thanos/compactor/compact/0@{datacenter=\"p24\",environment=\"internal\",replica=\"02\"}/01DGZ0PDS2MFX25P9CKG1C1TA3]: write compaction: chunk 8 not found: reference sequence 0 out of range"

First, I'd expect it to survive broken blocks, but what's more concerning is that the block has been uploaded successfully before (unless those warning messages are not there just for show, and there is indeed something wrong).

What's uploaded:

$ swift list prometheus-storage | grep 01DGZ0PDS2MFX25P9CKG1C1TA3
01DGZ0PDS2MFX25P9CKG1C1TA3/chunks/000001
01DGZ0PDS2MFX25P9CKG1C1TA3/index
01DGZ0PDS2MFX25P9CKG1C1TA3/index.cache.json
01DGZ0PDS2MFX25P9CKG1C1TA3/meta.json
debug/metas/01DGZ0PDS2MFX25P9CKG1C1TA3.json

and meta.json:

{
	"ulid": "01DGZ0PDS2MFX25P9CKG1C1TA3",
	"minTime": 1561622400000,
	"maxTime": 1561651200000,
	"stats": {
		"numSamples": 271799367,
		"numSeries": 141915,
		"numChunks": 2265057
	},
	"compaction": {
		"level": 2,
		"sources": [
			"01DEC9F6B0BVQHFG7BZDS75N0V",
			"01DECGAXK0CM9HC2QPZ7MMH8M0",
			"01DECQ6MV0K0FR00HHY9906178",
			"01DECY2C2ZX4F6N9GKHKM2PC61"
		],
		"parents": [
			{
				"ulid": "01DEC9F6B0BVQHFG7BZDS75N0V",
				"minTime": 1561622400000,
				"maxTime": 1561629600000
			},
			{
				"ulid": "01DECGAXK0CM9HC2QPZ7MMH8M0",
				"minTime": 1561629600000,
				"maxTime": 1561636800000
			},
			{
				"ulid": "01DECQ6MV0K0FR00HHY9906178",
				"minTime": 1561636800000,
				"maxTime": 1561644000000
			},
			{
				"ulid": "01DECY2C2ZX4F6N9GKHKM2PC61",
				"minTime": 1561644000000,
				"maxTime": 1561651200000
			}
		]
	},
	"version": 1,
	"thanos": {
		"labels": {
			"datacenter": "p24",
			"environment": "internal",
			"replica": "02"
		},
		"downsample": {
			"resolution": 0
		},
		"source": "compactor"
	}
}

@caarlos0
Copy link

caarlos0 commented Aug 8, 2019

This also breaks when running with --wait.

I was expecting it to fail, but not quit... but it did crashloop with that error.

After some time I deleted the pod and it seems to be running fine now... maybe that 30m mentioned above?

@caarlos0
Copy link

caarlos0 commented Aug 8, 2019

OK that was quick, just crashlooped again

@bwplotka
Copy link
Member

bwplotka commented Aug 9, 2019

Are you hitting this maybe? #1331

Essentially partial upload and empty or missing chunk files in <block ulid>/chunks folder?

@bwplotka
Copy link
Member

bwplotka commented Aug 9, 2019

You have 01DGZ0PDS2MFX25P9CKG1C1TA3/chunks/000001 but it might be partiall uploaded. I think we should focus on that issue then.

Thanks all for reporting let's investigate this upload path with minio client. Essentially this might be some assumption: #1331 (comment)

@bwplotka
Copy link
Member

bwplotka commented Aug 9, 2019

Let's continue on one of the issues. This looks like a duplicate of #1331

EDIT: Actually this one was first but I think it does not matter much - let's continue on one (: Hope this is ok with you guys!

@bwplotka bwplotka closed this as completed Aug 9, 2019
@caarlos0
Copy link

caarlos0 commented Aug 9, 2019

Are you hitting this maybe? #1331

not sure, I'm using GCS... that issue seems to be specific to S3...

@bwplotka
Copy link
Member

bwplotka commented Aug 9, 2019

hm, maybe we are wrong and GCS is affected as well. What about other question about chunk folder?

@caarlos0
Copy link

caarlos0 commented Aug 9, 2019

sorry, didn't see that question...

So, the log says:

level=error ts=2019-08-09T14:09:55.107857264Z caller=main.go:199 msg="running command failed" err="error executing compaction: compaction failed: compaction failed for group 0@{server=\"central\"}: compact blocks [/var/thanos/compact/compact/0@{server=\"central\"}/01DB9JXXFE1HE1CYQM171PRMFK /var/thanos/compact/compact/0@{server=\"central\"}/01DBFCAMBRRMRT0SJAT516AVF4 /var/thanos/compact/compact/0@{server=\"central\"}/01DBN5QC3G1ZJXQZJ6MXZNPMV2 /var/thanos/compact/compact/0@{server=\"central\"}/01DBTZ3WZY9VR0T3MDZRTP5Z95 /var/thanos/compact/compact/0@{server=\"central\"}/01DC0RGG6EK3BMK96YVRBDQT6E]: write compaction: chunk 8 not found: reference sequence 0 out of range"
gs://REDACTED/01DB9JXXFE1HE1CYQM171PRMFK/:
 217104068  2019-06-24T17:25:14Z  gs://REDACTED/01DB9JXXFE1HE1CYQM171PRMFK/index
      1650  2019-06-24T17:25:14Z  gs://REDACTED/01DB9JXXFE1HE1CYQM171PRMFK/meta.json

gs://REDACTED/01DB9JXXFE1HE1CYQM171PRMFK/chunks/:
 536861830  2019-06-24T17:24:35Z  gs://REDACTED/01DB9JXXFE1HE1CYQM171PRMFK/chunks/000001
 536862360  2019-06-24T17:24:47Z  gs://REDACTED/01DB9JXXFE1HE1CYQM171PRMFK/chunks/000002
 536867046  2019-06-24T17:25:00Z  gs://REDACTED/01DB9JXXFE1HE1CYQM171PRMFK/chunks/000003
 404281529  2019-06-24T17:25:09Z  gs://REDACTED/01DB9JXXFE1HE1CYQM171PRMFK/chunks/000004
TOTAL: 6 objects, 2231978483 bytes (2.08 GiB)
gs://REDACTED/01DBFCAMBRRMRT0SJAT516AVF4/:
 239464138  2019-06-24T17:26:09Z  gs://REDACTED/01DBFCAMBRRMRT0SJAT516AVF4/index
      1650  2019-06-24T17:26:09Z  gs://REDACTED/01DBFCAMBRRMRT0SJAT516AVF4/meta.json

gs://REDACTED/01DBFCAMBRRMRT0SJAT516AVF4/chunks/:
 536868373  2019-06-24T17:25:26Z  gs://REDACTED/01DBFCAMBRRMRT0SJAT516AVF4/chunks/000001
 536863290  2019-06-24T17:25:39Z  gs://REDACTED/01DBFCAMBRRMRT0SJAT516AVF4/chunks/000002
 536870423  2019-06-24T17:25:52Z  gs://REDACTED/01DBFCAMBRRMRT0SJAT516AVF4/chunks/000003
 498277142  2019-06-24T17:26:03Z  gs://REDACTED/01DBFCAMBRRMRT0SJAT516AVF4/chunks/000004
TOTAL: 6 objects, 2348345016 bytes (2.19 GiB)
gs://REDACTED/01DBN5QC3G1ZJXQZJ6MXZNPMV2/:
 264888644  2019-06-24T17:27:05Z  gs://REDACTED/01DBN5QC3G1ZJXQZJ6MXZNPMV2/index
      1650  2019-06-24T17:27:06Z  gs://REDACTED/01DBN5QC3G1ZJXQZJ6MXZNPMV2/meta.json

gs://REDACTED/01DBN5QC3G1ZJXQZJ6MXZNPMV2/chunks/:
 536870054  2019-06-24T17:26:21Z  gs://REDACTED/01DBN5QC3G1ZJXQZJ6MXZNPMV2/chunks/000001
 536866656  2019-06-24T17:26:34Z  gs://REDACTED/01DBN5QC3G1ZJXQZJ6MXZNPMV2/chunks/000002
 536869259  2019-06-24T17:26:47Z  gs://REDACTED/01DBN5QC3G1ZJXQZJ6MXZNPMV2/chunks/000003
 536843539  2019-06-24T17:26:59Z  gs://REDACTED/01DBN5QC3G1ZJXQZJ6MXZNPMV2/chunks/000004
  23772607  2019-06-24T17:27:00Z  gs://REDACTED/01DBN5QC3G1ZJXQZJ6MXZNPMV2/chunks/000005
TOTAL: 7 objects, 2436112409 bytes (2.27 GiB)
gs://REDACTED/01DBTZ3WZY9VR0T3MDZRTP5Z95/:
 232606362  2019-06-24T17:27:59Z  gs://REDACTED/01DBTZ3WZY9VR0T3MDZRTP5Z95/index
      1650  2019-06-24T17:27:59Z  gs://REDACTED/01DBTZ3WZY9VR0T3MDZRTP5Z95/meta.json
TOTAL: 2 objects, 232608012 bytes (221.83 MiB)
gs://REDACTED/01DC0RGG6EK3BMK96YVRBDQT6E/:
 235607510  2019-06-24T17:29:12Z  gs://REDACTED/01DC0RGG6EK3BMK96YVRBDQT6E/index
      1650  2019-06-24T17:29:15Z  gs://REDACTED/01DC0RGG6EK3BMK96YVRBDQT6E/meta.json

gs://REDACTED/01DC0RGG6EK3BMK96YVRBDQT6E/chunks/:
 536856293  2019-06-24T17:28:15Z  gs://REDACTED/01DC0RGG6EK3BMK96YVRBDQT6E/chunks/000001
 536849008  2019-06-24T17:28:30Z  gs://REDACTED/01DC0RGG6EK3BMK96YVRBDQT6E/chunks/000002
 536865860  2019-06-24T17:28:43Z  gs://REDACTED/01DC0RGG6EK3BMK96YVRBDQT6E/chunks/000003
 532574433  2019-06-24T17:28:55Z  gs://REDACTED/01DC0RGG6EK3BMK96YVRBDQT6E/chunks/000004
TOTAL: 6 objects, 2378754754 bytes (2.22 GiB)

It seems to be the problem is 01DBTZ3WZY9VR0T3MDZRTP5Z95? Should I just delete the entire folder?

@caarlos0
Copy link

anything I can do as a workaround so compact works again?

@GiedriusS
Copy link
Member

You should delete the blocks which have duplicated data and only leave one copy. It's up to you to decide which one it is (: It sounds like you need to delete the one you've mentioned but please double check.

@caarlos0
Copy link

You should delete the blocks which have duplicated data and only leave one copy. It's up to you to decide which one it is (: It sounds like you need to delete the one you've mentioned but please double check.

yeah, did that and it seems to be running now

@aaron-trout
Copy link

Hi, I am having this issue using thanos 0.8.1. I have tried moving the directories for blocks it complains about out of the bucket but then every time I run the compactor it just finds some more to be sad about :-(

This is crashing the compactor with 0.8.1 even with --wait flag.

Any help to further debug this would be appreciated! (@bwplotka maybe?)

@bwplotka
Copy link
Member

bwplotka commented Dec 17, 2019

Hi, we started some doc where we discussed another potential cause of this, which is the false assumption of partial block upload: https://docs.google.com/document/d/1QvHt9NXRvmdzy51s4_G21UW00QwfL9BfqBDegcqNvg0/edit#

@bwplotka
Copy link
Member

bwplotka commented Jan 3, 2020

Reopening as it still occurs. The mentioned doc above states potential issue (however quite unlikely) of compactor removing a block when it is being still uploaded for some reason (long upload / old block upload). We are working on fix for that particular case, but more info would be useful why chunk file is missing from the block.

@bwplotka bwplotka reopened this Jan 3, 2020
@ivan-kiselev
Copy link

ivan-kiselev commented Jan 3, 2020

To compliment the last comment:

thanosio/thanos:v0.9.0

Compactor logs:

level=warn ts=2020-01-03T10:30:29.887625771Z caller=prober.go:117 msg="changing probe status" status=not-ready reason="error executing compaction: compaction failed: compaction failed for group 0@3883330991078906199: gather index issues for block /thanos-compact-data/compact/0@3883330991078906199/01DXKWN08YHZQMCHHJ3VQVVDSP: open index file: try lock file: open /thanos-compact-data/compact/0@3883330991078906199/01DXKWN08YHZQMCHHJ3VQVVDSP/index: no such file or directory"
level=info ts=2020-01-03T10:30:29.888132722Z caller=http.go:78 service=http/server component=compact msg="internal server shutdown" err="error executing compaction: compaction failed: compaction failed for group 0@3883330991078906199: gather index issues for block /thanos-compact-data/compact/0@3883330991078906199/01DXKWN08YHZQMCHHJ3VQVVDSP: open index file: try lock file: open /thanos-compact-data/compact/0@3883330991078906199/01DXKWN08YHZQMCHHJ3VQVVDSP/index: no such file or directory"

Meantime in the bucket (there is the only file meta.json in the bucket indeed):

{
	"ulid": "01DXKWN08YHZQMCHHJ3VQVVDSP",
	"minTime": 1577952000000,
	"maxTime": 1577980800000,
	"stats": {
		"numSamples": 954058569,
		"numSeries": 906289,
		"numChunks": 8144375
	},
	"compaction": {
		"level": 2,
		"sources": [
			"01DXJYK0WNNX8WKK3KXSK2JR6S",
			"01DXK5ER4N46ST170D3D8JX1FS",
			"01DXKCAFCYHS2Z94XP5QYBFJHA",
			"01DXKK66MG06PDAQAQD1YNS8AX"
		],
		"parents": [
			{
				"ulid": "01DXJYK0WNNX8WKK3KXSK2JR6S",
				"minTime": 1577952000000,
				"maxTime": 1577959200000
			},
			{
				"ulid": "01DXK5ER4N46ST170D3D8JX1FS",
				"minTime": 1577959200000,
				"maxTime": 1577966400000
			},
			{
				"ulid": "01DXKCAFCYHS2Z94XP5QYBFJHA",
				"minTime": 1577966400000,
				"maxTime": 1577973600000
			},
			{
				"ulid": "01DXKK66MG06PDAQAQD1YNS8AX",
				"minTime": 1577973600000,
				"maxTime": 1577980800000
			}
		]
	},
	"version": 1,
	"thanos": {
		"labels": {
                             ....
		},
		"downsample": {
			"resolution": 0
		},
		"source": "compactor"
	}
}

So it seems compactor failed to upload itself and not crashing against it.

bwplotka added a commit that referenced this issue Jan 3, 2020
This replaces man 4 inconsistent meta.json syncs places in other components.

Fixes: #1335
Fixes: #1919
Fixes: #1300

* One place for sync logic for both compactor and store
* Corrupted disk cache for meta.json is handled gracefully.
* Blocks without meta.json are handled properly for all compactor phases.
* Synchronize was not taking into account deletion by removing meta.json.
* Prepare for future implementation of https://thanos.io/proposals/201901-read-write-operations-bucket.md/
* Better observability for syncronize process.
* More logs for store startup process.
* Remove Compactor Syncer.
* Added metric for partialUploadAttempt deletions.
* More tests.

TODO in separate PR:
* More observability for index-cache loading / adding time.

Signed-off-by: Bartek Plotka <bwplotka@gmail.com>
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
bwplotka added a commit that referenced this issue Jan 3, 2020
This replaces man 4 inconsistent meta.json syncs places in other components.

Fixes: #1335
Fixes: #1919
Fixes: #1300

* One place for sync logic for both compactor and store
* Corrupted disk cache for meta.json is handled gracefully.
* Blocks without meta.json are handled properly for all compactor phases.
* Synchronize was not taking into account deletion by removing meta.json.
* Prepare for future implementation of https://thanos.io/proposals/201901-read-write-operations-bucket.md/
* Better observability for syncronize process.
* More logs for store startup process.
* Remove Compactor Syncer.
* Added metric for partialUploadAttempt deletions.
* More tests.

TODO in separate PR:
* More observability for index-cache loading / adding time.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
bwplotka added a commit that referenced this issue Jan 3, 2020
Fixes: #1335
Fixes: #1919
Fixes: #1300

* Clean up of meta files are now started only if block which is being uploaded is older than 2 days (only a mitigation).
* Blocks without meta.json are handled properly for all compactor phases.
* Prepare for future implementation of https://thanos.io/proposals/201901-read-write-operations-bucket.md/
* Added metric for partialUploadAttempt deletions and delayed it.
* More tests.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
bwplotka added a commit that referenced this issue Jan 3, 2020
Fixes: #1335
Fixes: #1919
Fixes: #1300

* Clean up of meta files are now started only if block which is being uploaded is older than 2 days (only a mitigation).
* Blocks without meta.json are handled properly for all compactor phases.
* Prepare for future implementation of https://thanos.io/proposals/201901-read-write-operations-bucket.md/
* Added metric for partialUploadAttempt deletions and delayed it.
* More tests.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
bwplotka added a commit that referenced this issue Jan 3, 2020
Fixes: #1335
Fixes: #1919
Fixes: #1300

* Clean up of meta files are now started only if block which is being uploaded is older than 2 days (only a mitigation).
* Blocks without meta.json are handled properly for all compactor phases.
* Prepare for future implementation of https://thanos.io/proposals/201901-read-write-operations-bucket.md/
* Added metric for partialUploadAttempt deletions and delayed it.
* More tests.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
bwplotka added a commit that referenced this issue Jan 3, 2020
Fixes: #1335
Fixes: #1919
Fixes: #1300

* Clean up of meta files are now started only if block which is being uploaded is older than 2 days (only a mitigation).
* Blocks without meta.json are handled properly for all compactor phases.
* Prepare for future implementation of https://thanos.io/proposals/201901-read-write-operations-bucket.md/
* Added metric for partialUploadAttempt deletions and delayed it.
* More tests.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
bwplotka added a commit that referenced this issue Jan 6, 2020
Fixes: #1335
Fixes: #1919
Fixes: #1300

* Clean up of meta files are now started only if block which is being uploaded is older than 2 days (only a mitigation).
* Blocks without meta.json are handled properly for all compactor phases.
* Prepare for future implementation of https://thanos.io/proposals/201901-read-write-operations-bucket.md/
* Added metric for partialUploadAttempt deletions and delayed it.
* More tests.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
bwplotka added a commit that referenced this issue Jan 6, 2020
Fixes: #1335
Fixes: #1919
Fixes: #1300

* Clean up of meta files are now started only if block which is being uploaded is older than 2 days (only a mitigation).
* Blocks without meta.json are handled properly for all compactor phases.
* Prepare for future implementation of https://thanos.io/proposals/201901-read-write-operations-bucket.md/
* Added metric for partialUploadAttempt deletions and delayed it.
* More tests.


Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
bwplotka added a commit that referenced this issue Jan 6, 2020
Fixes: #1335
Fixes: #1919
Fixes: #1300

* Clean up of meta files are now started only if block which is being uploaded is older than 2 days (only a mitigation).
* Blocks without meta.json are handled properly for all compactor phases.
* Prepare for future implementation of https://thanos.io/proposals/201901-read-write-operations-bucket.md/
* Added metric for partialUploadAttempt deletions and delayed it.
* More tests.


Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
@bwplotka
Copy link
Member

bwplotka commented Jan 6, 2020

We merged fix for the case of: #1919 which potentially might cause such chunk file to be missing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants