Channel needs to be re-established if task is not ack'ed #342
Comments
At the very least, the PR for this issue should log all the relevant info about the troublesome block (and perhaps some general system metrics, in case the root cause is external to the block data, dunno, some pesky locking issue or something).
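A minimal sketch of the logging suggested above, assuming the executor can see the block's number and hash. `BlockInfo` and `execute_and_report` are hypothetical names, not Archive's API. It only reports blocks that eventually finish (like the one that took more than 30 minutes), so a block that hangs outright would need the same details logged from a timeout path instead.

```rust
use std::time::{Duration, Instant};

/// Hypothetical identifying data for the block being executed.
struct BlockInfo {
    number: u64,
    hash: String,
}

/// Runs `execute` and, if it took longer than `warn_after`, also returns a
/// diagnostic line naming the block.
fn execute_and_report<T>(
    block: &BlockInfo,
    warn_after: Duration,
    execute: impl FnOnce() -> T,
) -> (T, Option<String>) {
    let start = Instant::now();
    let result = execute();
    let elapsed = start.elapsed();
    // Only build the message when the threshold was exceeded.
    let report = (elapsed > warn_after).then(|| {
        format!(
            "slow block #{} ({}): took {:?}, threshold {:?}",
            block.number, block.hash, elapsed, warn_after
        )
    });
    (result, report)
}
```

The returned line can then go to whatever logger the worker already uses, alongside any system metrics worth capturing.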
I revisited this a little last night after looking over the Wasm issues in the substrate repo related to PoV blocks, and I suspect it might actually be a locking issue. Archive spawns a bunch of threads to execute blocks in order to speed up storage processing, so it would make sense that some kind of conflict is happening here, and this lock might be worth examining. It might be worth running archive with a modified substrate that increases that limit to see if that fixes anything.
I also came across this issue; the reason is
A minor bug I discovered while running Archive overnight to test stability: if the work queue encounters a task (in this case, a block being executed) that runs longer than RabbitMQ's default acknowledgement timeout, RabbitMQ drops the channel to avoid filling up disk space indefinitely.
This behavior is described here: https://www.rabbitmq.com/consumers.html#acknowledgement-timeout
One way to protect against this on the work-queue side is to wrap each task in a timeout that matches the RabbitMQ timeout. If the task times out, we check whether the channel has been dropped; if it has, we drop the task and re-establish the channel.
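A minimal sketch of that approach using only the Rust standard library. `WorkChannel`, `is_open`, `reopen`, `TaskOutcome`, `run_with_deadline`, and `FakeChannel` are all hypothetical names, not Archive's or any AMQP crate's API; in a real consumer `deadline` would be set to the broker's acknowledgement timeout. Since std threads can't be cancelled, "dropping" a timed-out task here only means we stop waiting on it and never ack it; the thread itself keeps running.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Hypothetical abstraction over the consumer's AMQP channel.
trait WorkChannel {
    /// Whether the broker still considers the channel open.
    fn is_open(&self) -> bool;
    /// Re-establish the channel after the broker has closed it.
    fn reopen(&mut self);
}

/// What happened to one task run under the deadline.
#[derive(Debug, PartialEq)]
enum TaskOutcome<T> {
    /// Finished in time: the caller can ack the delivery.
    Completed(T),
    /// Deadline passed but the channel is still open: the caller decides what to do.
    TimedOutChannelOpen,
    /// Deadline passed and the channel was gone: task abandoned, channel reopened.
    DroppedAndReconnected,
    /// The worker thread panicked before producing a result.
    Failed,
}

/// Runs `task` on a worker thread and waits at most `deadline` for its result.
fn run_with_deadline<T, F, C>(task: F, deadline: Duration, channel: &mut C) -> TaskOutcome<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
    C: WorkChannel,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // If we already stopped waiting, the receiver is gone and the send error is ignored.
        let _ = tx.send(task());
    });
    match rx.recv_timeout(deadline) {
        Ok(value) => TaskOutcome::Completed(value),
        // The sender was dropped without sending, i.e. the task panicked.
        Err(RecvTimeoutError::Disconnected) => TaskOutcome::Failed,
        Err(RecvTimeoutError::Timeout) => {
            if channel.is_open() {
                TaskOutcome::TimedOutChannelOpen
            } else {
                // Stop waiting on the detached task (it is never acked) and reconnect.
                channel.reopen();
                TaskOutcome::DroppedAndReconnected
            }
        }
    }
}

/// In-memory stand-in used to exercise the logic without a broker.
struct FakeChannel {
    open: bool,
    reopen_count: u32,
}

impl WorkChannel for FakeChannel {
    fn is_open(&self) -> bool {
        self.open
    }
    fn reopen(&mut self) {
        self.open = true;
        self.reopen_count += 1;
    }
}
```

The caller would ack on `Completed`; on either timeout variant it's also a natural place to log the details of the block that was running.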
As for why a task is taking more than 30 minutes, that needs further investigation, but I don't believe it's a serious issue, since it has only happened to me once so far while syncing multiple chains. The problem occurred for me on Kusama. Solving it would require finding the problematic block and reproducing the issue.