Channel needs to be re-established if task is not ack'ed #342

Closed
insipx opened this issue Sep 2, 2021 · 3 comments · Fixed by #472

insipx commented Sep 2, 2021

Minor bug I discovered while running Archive overnight to test stability. If the work queue encounters a task (in this case a block being executed) that exceeds RabbitMQ's default acknowledgement timeout, RabbitMQ drops the channel in order to avoid filling up disk space indefinitely.

This behavior is described here: https://www.rabbitmq.com/consumers.html#acknowledgement-timeout

The solution to protect against this from the work-queue side is to wrap tasks in a callback/timeout that matches the RabbitMQ timeout. If a task times out, we need to check whether the channel has been dropped and, if so, drop the task and re-establish the channel.
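A minimal sketch of that approach, assuming a tokio runtime and the lapin client the work queue already uses; `run_task`, `ACK_TIMEOUT`, and the function signature are illustrative rather than the actual sa-work-queue API:

```rust
use std::time::Duration;

use lapin::{options::BasicAckOptions, Channel, Connection};

// RabbitMQ's default delivery acknowledgement timeout (1 800 000 ms = 30 min).
const ACK_TIMEOUT: Duration = Duration::from_millis(1_800_000);

/// Run one task under the broker's ack timeout. If the task finishes in time,
/// ack it as usual; if it overruns, drop it, check whether RabbitMQ closed the
/// channel, and open a fresh channel so the queue keeps working.
async fn run_with_timeout(
    conn: &Connection,
    channel: &mut Channel,
    delivery_tag: u64,
) -> Result<(), lapin::Error> {
    match tokio::time::timeout(ACK_TIMEOUT, run_task()).await {
        Ok(()) => {
            // Finished within the timeout: acknowledge the delivery.
            channel
                .basic_ack(delivery_tag, BasicAckOptions::default())
                .await?;
        }
        Err(_elapsed) => {
            // The task overran the ack timeout, so the broker has most likely
            // closed the channel with PRECONDITION_FAILED. Drop the task and
            // re-establish the channel if it is no longer connected.
            if !channel.status().connected() {
                *channel = conn.create_channel().await?;
            }
        }
    }
    Ok(())
}

// Placeholder for the real work, e.g. executing a block.
async fn run_task() {}
```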

As for why a task is taking >30 min, that would need further investigation, but I don't believe it to be a serious issue, since this has only happened to me once so far while syncing multiple chains. The problem occurred for me on Kusama. Finding the problematic block and reproducing the issue would be needed to solve that.

25394824:[2021-09-02][07:58:52] [lapin::channel][ERROR] Channel closed on channel 15 by 0:0 => AMQPError { kind: Soft(PRECONDITIONFAILED), message: ShortString("PRECONDITION_FAILED - delivery acknowledgement on channel 15 timed out. Timeout value used: 1800000 ms. This timeout value can be configured, see consumers doc guide to learn more") } => PRECONDITION_FAILED - delivery acknowledgement on channel 15 timed out. Timeout value used: 1800000 ms. This timeout value can be configured, see consumers doc guide to learn more::/home/insipx/.cargo/registry/src/github.com-1ecc6299db9ec823/lapin-1.8.0/src/channel.rs;855
25394825:[2021-09-02][07:58:52] [sa_work_queue::threadpool][ERROR] protocol error: AMQP soft error: PRECONDITION-FAILED: PRECONDITION_FAILED - delivery acknowledgement on channel 15 timed out. Timeout value used: 1800000 ms. This timeout value can be configured, see consumers doc guide to learn more::/home/insipx/projects/parity/substrate-archive/master/work-queue/sa-work-queue/src/threadpool.rs;128
25395027:[2021-09-02][07:58:53] [sa_work_queue::threadpool][ERROR] protocol error: AMQP soft error: PRECONDITION-FAILED: PRECONDITION_FAILED - delivery acknowledgement on channel 15 timed out. Timeout value used: 1800000 ms. This timeout value can be configured, see consumers doc guide to learn more::/home/insipx/projects/parity/substrate-archive/master/work-queue/sa-work-queue/src/threadpool.rs;128
25395183:[2021-09-02][07:58:53] [sa_work_queue::threadpool][ERROR] protocol error: AMQP soft error: PRECONDITION-FAILED: PRECONDITION_FAILED - delivery acknowledgement on channel 15 timed out. Timeout value used: 1800000 ms. This timeout value can be configured, see consumers doc guide to learn more::/home/insipx/projects/parity/substrate-archive/master/work-queue/sa-work-queue/src/threadpool.rs;128
25395326:[2021-09-02][07:58:53] [sa_work_queue::threadpool][ERROR] protocol error: AMQP soft error: PRECONDITION-FAILED: PRECONDITION_FAILED - delivery acknowledgement on channel 15 timed out. Timeout value used: 1800000 ms. This timeout value can be configured, see consumers doc guide to learn more::/home/insipx/projects/parity/substrate-archive/master/work-queue/sa-work-queue/src/threadpool.rs;128
25395474:[2021-09-02][07:58:53] [sa_work_queue::threadpool][ERROR] protocol error: AMQP soft error: PRECONDITION-FAILED: PRECONDITION_FAILED - delivery acknowledgement on channel 15 timed out. Timeout value used: 1800000 ms. This timeout value can be configured, see consumers doc guide to learn more::/home/insipx/projects/parity/substrate-archive/master/work-queue/sa-work-queue/src/threadpool.rs;128
25395625:[2021-09-02][07:58:53] [sa_work_queue::threadpool][ERROR] protocol error: AMQP soft error: PRECONDITION-FAILED: PRECONDITION_FAILED - delivery acknowledgement on channel 15 timed out. Timeout value used: 1800000 ms. This timeout value can be configured, see consumers doc guide to learn more::/home/insipx/projects/parity/substrate-archive/master/work-queue/sa-work-queue/src/threadpool.rs;128

dvdplm commented Sep 3, 2021

As for why a task is taking >30min, that would need further investigation

At the very least, the PR for this issue should log all the relevant info about the troublesome block (and perhaps some general system metrics, in case the root cause is external to the block data, e.g. some pesky locking issue or similar).


insipx commented Oct 8, 2021

I revisited this a little bit last night after looking over the Wasm issues in the Substrate repo related to PoV blocks, and I have a suspicion it might actually be a locking issue related to the RuntimeCache in the Substrate WasmExecutor, which has a hard-coded limit specified a few lines above the lock.

Archive spawns a bunch of threads to execute blocks in order to speed up storage processing, so it would make sense that there is some kind of conflict happening around this lock that is worth examining. It might also be worth running Archive with a modified Substrate that increases that limit, to see if that fixes anything.
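Purely as an illustration of the suspected failure mode (not the actual Substrate code): if a small, shared cache is guarded by a Mutex and a cache miss does expensive work while holding the lock, every other executor thread stalls behind it.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

fn main() {
    // Toy stand-in for a runtime cache shared by all executor threads.
    let cache: Arc<Mutex<HashMap<u32, String>>> = Arc::new(Mutex::new(HashMap::new()));

    let handles: Vec<_> = (0..8u32)
        .map(|block| {
            let cache = Arc::clone(&cache);
            thread::spawn(move || {
                // Every thread needs the lock before it can execute its block.
                let mut guard = cache.lock().unwrap();
                guard.entry(block % 2).or_insert_with(|| {
                    // Stand-in for instantiating a Wasm runtime on a cache miss:
                    // the expensive work happens while the lock is held, so all
                    // other threads are blocked on `cache.lock()` until it ends.
                    thread::sleep(Duration::from_secs(2));
                    format!("runtime for spec {}", block % 2)
                });
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
}
```

With enough threads and a slow enough miss, the wait behind the lock can add up to far more than the task itself takes, which is the kind of stall that could trip the 30-minute ack timeout.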


xcaptain commented May 7, 2022

I also came across this issue. The reason is that prefetch_count is too large: its current value is 5000, and execute_block takes about 2 s, so a consumer would need nearly three hours to work through its locally cached messages. Messages therefore sit unacknowledged for longer than the 1800 s timeout, which causes this error.
After changing the value to 100, the error is gone.
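For reference, a minimal sketch of lowering the prefetch window at the lapin level (in substrate-archive the value presumably comes from the work-queue configuration; the connection setup here is illustrative):

```rust
use lapin::{options::BasicQosOptions, Channel, Connection, ConnectionProperties};

/// Open a channel with a small prefetch window. With execute_block taking
/// roughly 2 s, a prefetch of 100 keeps the local backlog around 200 s,
/// comfortably inside the 1800 s acknowledgement timeout, whereas 5000
/// buffers hours of work on the consumer.
async fn setup_channel(addr: &str) -> Result<Channel, lapin::Error> {
    let conn = Connection::connect(addr, ConnectionProperties::default()).await?;
    let channel = conn.create_channel().await?;
    channel.basic_qos(100, BasicQosOptions::default()).await?;
    Ok(channel)
}
```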
