"producer double-confirming known range" error when testing failover #3442
Comments
Thanks for the report, I'll bring this to the devs' attention. |
I will create a more specific issue for this and reference this issue in
it.
One of the concepts not ready for failover is the watermark calculated by
the producer_plugin. This watermark is used to determine how many blocks
to confirm with each produced block.
For the particular scenario you describe, we can fix this by scanning
incoming blocks for producers who are also local producers. That way your
backup node will have accurate watermarks.
Inaccurate watermarks trick a node into double-committing past blocks,
which is considered a Byzantine failure and is why you cannot easily
recover.
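To make the watermark problem concrete, here is a toy model of the failover round trip described in this issue. It is only a sketch under stated assumptions, not the producer_plugin or block_header_state code: the class names, the block numbers, and the rule "blocks to confirm = head block number minus watermark" are all illustrative assumptions. The `scan_incoming` flag stands in for the fix proposed above.

```python
class ToyChain:
    """Toy stand-in for the chain-side check (assumption, not the real code)."""

    def __init__(self):
        self.last_produced = {}  # producer -> last block number seen from it

    def accept(self, block_num, producer, confirmed):
        # The block confirms the range [block_num - confirmed, block_num - 1].
        first_confirmed = block_num - confirmed
        if self.last_produced.get(producer, 0) >= first_confirmed:
            raise ValueError(f"producer {producer} double-confirming known range")
        self.last_produced[producer] = block_num


class ToyNode:
    """One node holding the producer key, with its own local watermark."""

    def __init__(self, producer, scan_incoming):
        self.producer = producer
        self.scan_incoming = scan_incoming  # the proposed fix, as a toggle
        self.watermark = None               # unknown until this node learns one

    def on_incoming(self, block_num, producer):
        # Proposed fix: a block signed elsewhere by our own producer
        # still advances the watermark on this node.
        if self.scan_incoming and producer == self.producer:
            self.watermark = max(self.watermark or 0, block_num)

    def produce(self, chain, block_num):
        head = block_num - 1
        confirmed = 0  # with no known watermark, confirm nothing
        if self.watermark is not None and self.watermark < head:
            confirmed = head - self.watermark
        chain.accept(block_num, self.producer, confirmed)
        self.watermark = block_num
        return confirmed


def failover_round_trip(scan_incoming):
    """Node 1 produces, node 2 takes over, then node 1 takes over again."""
    chain = ToyChain()
    node1 = ToyNode("prod", scan_incoming)
    node2 = ToyNode("prod", scan_incoming)
    node1.produce(chain, 10)
    node2.on_incoming(10, "prod")
    node2.produce(chain, 20)
    node1.on_incoming(20, "prod")
    return node1.produce(chain, 30)
```

In this model, `failover_round_trip(scan_incoming=False)` raises on the switch back to node 1, because node 1 still holds the stale watermark 10 and tries to confirm blocks that node 2 already covered. With `scan_incoming=True` both nodes keep the watermark current and the round trip succeeds.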
On Sat, May 26, 2018, 2:46 PM Greg Lee wrote:
> Thanks for the report, I'll bring this to the devs attention.
|
Look forward to the fix. |
We moved this to 1.1 for a few reasons:
This is not to say it isn't something we need to fix. This will get better in short order. However, I wanted to enumerate some of the reasons why we have slipped it to version 1.1 |
Bart, I have a suggestion. The Ex:
Can you imagine something like this? This would be to lower the risk of double-signing blocks. Some special payload could unconditionally resume a chain, if you didn't stop production from a node previously. That would be explicit, and you'd need to check you didn't have a node previously. What do you think? |
sort of an out-of-band sync'ing of node production :) |
Also, I'm seeing that many nodes with the same keys loaded will all "counter-sign" all blocks. It doesn't fork or anything, and it's probably signing the same digest everywhere.. but I would have expected all signing to stop if you |
When I test on 1.0.6, I hit the same bug after calling the resume and pause APIs a second time to switch between master and slave. The error message: |
Any update on this issue? |
Was it solved? |
Tag: dawn-v4.2.0.
We have two nodes, same producer key and config, and one of them starts with production paused. They both run on the same machine.
Block producing node 1 starts producing when the nodes start. Block producing node 2 syncs fine. We run
curl -sL http://127.0.0.1:<node-1-port>/v1/producer/pause
and
curl -sL http://127.0.0.1:<node-2-port>/v1/producer/resume
Block producing node 2 begins producing with no problems.
Then, we run
curl -sL http://127.0.0.1:<node-2-port>/v1/producer/pause
and
curl -sL http://127.0.0.1:<node-1-port>/v1/producer/resume
This causes the
producer double-confirming known range
assertion exception. We checked that this is where the assertion is defined in EOS.IO:
eos/libraries/chain/block_header_state.cpp
Line 158 in 3b70b57
We also noticed that block producing node 2 can't resync with the chain even when we run it with the
--hard-replay-blockchain
option.
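The switch-over in the reproduction steps can be scripted. This is a minimal sketch that calls the same /v1/producer/pause and /v1/producer/resume endpoints as the curl commands in the report. The function names and the injectable `call` parameter are hypothetical helpers introduced for illustration. It only automates the steps and does not avoid the assertion described in this issue.

```python
import urllib.request


def call_producer_api(base_url, action):
    """GET <base_url>/v1/producer/<action>, like the curl commands in the report."""
    url = f"{base_url}/v1/producer/{action}"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode()


def switch_producer(active_url, standby_url, call=call_producer_api):
    """Pause the active node first, then resume the standby node.

    Pausing first means the two nodes, which share one producer key,
    are never asked to produce at the same time. If the pause call
    raises, the standby is not resumed.
    """
    paused = call(active_url, "pause")
    resumed = call(standby_url, "resume")
    return paused, resumed
```

A call would look like `switch_producer("http://127.0.0.1:<node-1-port>", "http://127.0.0.1:<node-2-port>")`, with the placeholders replaced by the real ports. Swapping the two arguments performs the switch back, which is the step that triggers the error in the report.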