Invalid chunk cksum #1346

Closed · divanikus opened this issue Dec 2, 2019 · 4 comments · Fixed by #1713
Labels: component/loki, keepalive, type/enhancement

Comments

@divanikus

Hello. We are trying to run Loki in our private k8s environment, but unfortunately we have had several occasions when it was simply unusable because of this problem. It usually happens on large intervals (24h, for example) and with "grep" requests.
I don't know whether it has something to do with server stability or something else, but it would be nice to just skip the broken chunk and proceed, emitting a warning or error to Loki's log. Right now I see nothing in its logs; the request just returns "Invalid chunk cksum".
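For illustration, here is a minimal sketch of the skip-and-warn behaviour being asked for. The Chunk, decodeChunk and collectLines names are hypothetical and do not reflect Loki's actual internals; the sketch only shows the idea of logging the bad chunk and continuing instead of failing the whole query.

package main

import (
	"errors"
	"fmt"
	"log"
)

// ErrInvalidChecksum stands in for the checksum error a chunk decoder might
// return. This type and everything below are hypothetical, not Loki's API.
var ErrInvalidChecksum = errors.New("invalid chunk cksum")

type Chunk struct {
	ID   string
	Data []byte
}

// decodeChunk is a placeholder for whatever verifies and decompresses a chunk.
func decodeChunk(c Chunk) ([]string, error) {
	if len(c.Data) == 0 {
		return nil, ErrInvalidChecksum
	}
	return []string{string(c.Data)}, nil
}

// collectLines walks the chunks for a query. Instead of aborting the whole
// request on the first corrupted chunk, it logs the bad chunk ID and moves on,
// which is the behaviour requested in this issue.
func collectLines(chunks []Chunk) []string {
	var lines []string
	for _, c := range chunks {
		decoded, err := decodeChunk(c)
		if err != nil {
			if errors.Is(err, ErrInvalidChecksum) {
				log.Printf("level=warn msg=\"skipping corrupted chunk\" chunk=%s err=%v", c.ID, err)
				continue
			}
			log.Printf("level=error msg=\"failed to decode chunk\" chunk=%s err=%v", c.ID, err)
			continue
		}
		lines = append(lines, decoded...)
	}
	return lines
}

func main() {
	chunks := []Chunk{
		{ID: "chunk-1", Data: []byte("good line")},
		{ID: "chunk-2"}, // simulates a corrupted chunk
		{ID: "chunk-3", Data: []byte("another good line")},
	}
	fmt.Println(collectLines(chunks))
}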

@cyriltovena added the keepalive label Dec 2, 2019
@cyriltovena
Contributor

A 24h grep will likely time out unless you have increased the Loki and Grafana timeouts.
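For reference, a minimal sketch of a Loki config fragment that raises the HTTP server timeouts for long-range queries; the 5m values are illustrative rather than recommendations, and Grafana's own data source/proxy timeout may need a similar bump.

# Sketch of a Loki server block with raised HTTP timeouts (illustrative values).
server:
  http_listen_port: 3100
  http_server_read_timeout: 5m
  http_server_write_timeout: 5m
  http_server_idle_timeout: 5m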

Do you have more logs about the request? I'm not sure how you ended up with a corrupted chunk, but I guess it's legitimate to just warn and keep going. (We might actually want to log an error.)

@divanikus
Author

divanikus commented Dec 2, 2019

It does show some logs without the grep query, though. Unfortunately there are no logs from Loki itself, or at least nothing special in them. The request was something like this: {app_kubernetes_io_name="mb-worker"} |= "822c68db13"

I have a backup of the chunks directory, but it's around 50 GiB, so I'm not sure whether I can (or need to) send it to you somehow.

@cyriltovena
Contributor

No worries, we will make the change you're asking for; it's a fair one. Then you should be able to find those bad chunks and we can work from there.

@cyriltovena added the component/loki and type/enhancement labels Dec 2, 2019
@divanikus
Author

Just got another one. It seems like it might happen when the server running Loki is rebooted unexpectedly. I made a fresh Loki installation, fed it with logs for several hours, and rebooted the server by pressing the reboot button. I got an invalid chunk checksum even on a 1h interval, even without grep.

Loki's logs look like this:

level=info ts=2019-12-02T18:40:02.353103922Z caller=table_manager.go:363 msg="creating table" table=index_2535
level=info ts=2019-12-02T18:40:02.353100675Z caller=table_manager.go:363 msg="creating table" table=index_2526
level=info ts=2019-12-02T18:40:02.353097433Z caller=table_manager.go:363 msg="creating table" table=index_2524
level=info ts=2019-12-02T18:40:02.353094039Z caller=table_manager.go:363 msg="creating table" table=index_2588
level=info ts=2019-12-02T18:40:02.353090767Z caller=table_manager.go:363 msg="creating table" table=index_2529
level=info ts=2019-12-02T18:40:02.353087492Z caller=table_manager.go:363 msg="creating table" table=index_2592
level=info ts=2019-12-02T18:40:02.35308424Z caller=table_manager.go:363 msg="creating table" table=index_2586
level=info ts=2019-12-02T18:40:02.353080946Z caller=table_manager.go:363 msg="creating table" table=index_2563
level=info ts=2019-12-02T18:40:02.35307746Z caller=table_manager.go:363 msg="creating table" table=index_2562
level=info ts=2019-12-02T18:40:02.353073507Z caller=table_manager.go:363 msg="creating table" table=index_2552
level=info ts=2019-12-02T18:40:02.353064623Z caller=table_manager.go:363 msg="creating table" table=index_2527
level=info ts=2019-12-02T18:40:02.352985184Z caller=table_manager.go:220 msg="synching tables" expected_tables=86
level=error ts=2019-12-02T18:39:40.691974418Z caller=http.go:212 msg="Error writing close message to websocket" err="write tcp 10.244.0.29:3100->10.244.1.30:58094: write: broken pipe"
level=error ts=2019-12-02T18:39:40.691939422Z caller=http.go:210 msg="Error writing ping message to websocket" err="write tcp 10.244.0.29:3100->10.244.1.30:58094: write: broken pipe"
level=info ts=2019-12-02T18:38:02.353425276Z caller=table_manager.go:363 msg="creating table" table=index_2567
level=info ts=2019-12-02T18:38:02.353422147Z caller=table_manager.go:363 msg="creating table" table=index_2551
level=info ts=2019-12-02T18:38:02.353419056Z caller=table_manager.go:363 msg="creating table" table=index_2539
level=info ts=2019-12-02T18:38:02.353415962Z caller=table_manager.go:363 msg="creating table" table=index_2538
level=info ts=2019-12-02T18:38:02.353412521Z caller=table_manager.go:363 msg="creating table" table=index_2588
level=info ts=2019-12-02T18:38:02.353409464Z caller=table_manager.go:363 msg="creating table" table=index_2556
level=info ts=2019-12-02T18:38:02.353406259Z caller=table_manager.go:363 msg="creating table" table=index_2554
level=info ts=2019-12-02T18:38:02.353403137Z caller=table_manager.go:363 msg="creating table" table=index_2541
level=info ts=2019-12-02T18:38:02.353400013Z caller=table_manager.go:363 msg="creating table" table=index_2530
level=info ts=2019-12-02T18:38:02.353396779Z caller=table_manager.go:363 msg="creating table" table=index_2601
level=info ts=2019-12-02T18:38:02.353393198Z caller=table_manager.go:363 msg="creating table" table=index_2580
level=info ts=2019-12-02T18:38:02.353390037Z caller=table_manager.go:363 msg="creating table" table=index_2557

Absolutely nothing about the problem.

cyriltovena added a commit to cyriltovena/loki that referenced this issue Jun 11, 2021
* adds LabelNamesForMetricName for the chunk and series store

Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>