Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

0002-validators-warp-sync test failing #2568

Open
michalkucharczyk opened this issue Nov 30, 2023 · 1 comment
Open

0002-validators-warp-sync test failing #2568

michalkucharczyk opened this issue Nov 30, 2023 · 1 comment
Assignees

Comments

@michalkucharczyk
Copy link
Contributor

michalkucharczyk commented Nov 30, 2023

The warp sync in following configuration is sometimes failing.

Failure

Sometimes bob is trying do download the state for some old block. And since it cannot download it, it cannot accomplish warp-sync. Here is what it says:

...
2023-11-29 13:36:05.204  INFO tokio-runtime-worker substrate: ⏩ Warping, Downloading state, 0.11 Mib (0 peers), best: #0 (0xedb3…0fa9), finalized #0 (0xedb3…0fa9), ⬇ 24.3kiB/s ⬆ 1.8kiB/s
2023-11-29 13:36:10.205  INFO tokio-runtime-worker substrate: ⏩ Warping, Downloading state, 0.11 Mib (0 peers), best: #0 (0xedb3…0fa9), finalized #0 (0xedb3…0fa9), ⬇ 2.7kiB/s ⬆ 2.9kiB/s
2023-11-29 13:36:15.206  INFO tokio-runtime-worker substrate: ⏩ Warping, Downloading state, 0.11 Mib (0 peers), best: #0 (0xedb3…0fa9), finalized #0 (0xedb3…0fa9), ⬇ 2.1kiB/s ⬆ 2.5kiB/s
2023-11-29 13:36:20.207  INFO tokio-runtime-worker substrate: ⏩ Warping, Downloading state, 0.11 Mib (0 peers), best: #0 (0xedb3…0fa9), finalized #0 (0xedb3…0fa9), ⬇ 2.3kiB/s ⬆ 2.5kiB/s
2023-11-29 13:36:25.208  INFO tokio-runtime-worker substrate: ⏩ Warping, Downloading state, 0.11 Mib (0 peers), best: #0 (0xedb3…0fa9), finalized #0 (0xedb3…0fa9), ⬇ 2.1kiB/s ⬆ 2.5kiB/s
2023-11-29 13:36:30.209  INFO tokio-runtime-worker substrate: ⏩ Warping, Downloading state, 0.11 Mib (0 peers), best: #0 (0xedb3…0fa9), finalized #0 (0xedb3…0fa9), ⬇ 2.3kiB/s ⬆ 2.5kiB/s

The other nodes bob is connecting to and requesting the block are displaying:

2023-11-29 13:42:05.281 DEBUG tokio-runtime-worker sync: Failed to handle state request from 12D3KooWRkZhiRhsqmrQ28rt73K7V3aCBpqKrLGSXmZ99PTcTZby: UnknownBlock: State already discarded for 0x878b918c14b3aaca7cbbd9e91a30e4406a6305222f7ba37fad2b4b382f0b1776
2023-11-29 13:42:06.281 DEBUG tokio-runtime-worker sync: Failed to handle state request from 12D3KooWRkZhiRhsqmrQ28rt73K7V3aCBpqKrLGSXmZ99PTcTZby: UnknownBlock: State already discarded for 0x878b918c14b3aaca7cbbd9e91a30e4406a6305222f7ba37fad2b4b382f0b1776
2023-11-29 13:42:07.280 DEBUG tokio-runtime-worker sync: Failed to handle state request from 12D3KooWRkZhiRhsqmrQ28rt73K7V3aCBpqKrLGSXmZ99PTcTZby: UnknownBlock: State already discarded for 0x878b918c14b3aaca7cbbd9e91a30e4406a6305222f7ba37fad2b4b382f0b1776
2023-11-29 13:42:08.279 DEBUG tokio-runtime-worker sync: Failed to handle state request from 12D3KooWRkZhiRhsqmrQ28rt73K7V3aCBpqKrLGSXmZ99PTcTZby: UnknownBlock: State already discarded for 0x878b918c14b3aaca7cbbd9e91a30e4406a6305222f7ba37fad2b4b382f0b1776

Analysis

The problem starts with the warp-proof that bob receives from alice:

bob.log
1511:2023-11-29 13:36:02.712 DEBUG tokio-runtime-worker sync: Importing warp proof data from 12D3KooWQCkBm1BYtkHpocxCwMgR8yjitEeHGx8spzcDLGt2gkBm, 115013 bytes.
1512:2023-11-29 13:36:02.738 DEBUG tokio-runtime-worker sync: Verified complete proof, set_id=156

The grandpa set_id and size of proof is different from the warp-proof that alice got from charlie:

alice.log
1550:2023-11-29 13:36:01.735 DEBUG tokio-runtime-worker sync: Importing warp proof data from 12D3KooWPKzmmE2uYgF3z13xjpbFTp63g9dZFag8pG6MgnpSLF4S, 149692 bytes.
1551:2023-11-29 13:36:01.768 DEBUG tokio-runtime-worker sync: Verified complete proof, set_id=202

Why is that? Some debugs added, and here is the exciting story of alice syncing and handling the warp-proof request from bob at the same time:

2023-11-30 08:46:39.782 DEBUG main db: DB Commit 0xedb3d4018996f82242d0fc929c4a78a6e60031d19bc51aea33a56c69150b0fa9 (0), best=true, state=false, existing=false, finalized=true
2023-11-30 08:47:10.154 DEBUG tokio-runtime-worker db: DB Commit 0xaea8b6dae1f008987ee4b6245fce4fa52fc7a3311a69d380514edc5598f9ba77 (12131), best=true, state=true, existing=false, finalized=true
2023-11-30 08:47:10.221  INFO tokio-runtime-worker sync: Warp sync is complete (1 MiB), restarting block sync.
2023-11-30 08:47:10.221 DEBUG tokio-runtime-worker sync: Starting gap sync #1 - #12130
2023-11-30 08:47:10.227 DEBUG tokio-runtime-worker db: DB Commit 0x9e04054c45989549dfcd515e1b5a9a802b6a4f17707023d28bdf5975d0f254fa (12132), best=true, state=true, existing=false, finalized=false
2023-11-30 08:47:10.249 DEBUG tokio-runtime-worker db: DB Commit 0x5448eda4b7963ee85d28ea523149317dc39ce41173e198126b7ba67b7e123854 (12133), best=true, state=true, existing=false, finalized=false
2023-11-30 08:47:10.250 DEBUG tokio-runtime-worker db: DB Commit 0xb0e570ae04898300caa5fa0b5a255d755b11900a8870d5153854d6f612e0f1fd (1), best=false, state=false, existing=false, finalized=false
2023-11-30 08:47:10.250 DEBUG tokio-runtime-worker db: DB Commit 0x047ac13d4a6be9d4a31cc72e29227a828fba1b579c5d3501eae899551c3f5c0f (2), best=false, state=false, existing=false, finalized=false
2023-11-30 08:47:10.250 DEBUG tokio-runtime-worker db: DB Commit 0x35aa95eee607d9b853cfe513f6a2fb88ed9270d8eb8942ea488a0d0b9e5d602a (3), best=false, state=false, existing=false, finalized=false
...
...
2023-11-30 08:47:10.707 DEBUG tokio-runtime-worker db: DB Commit 0x4c57512a2ce0373dec4c5882f7770c694aa2852047a2fa70aca18667de873ba6 (3836), best=false, state=false, existing=false, finalized=false
2023-11-30 08:47:10.707 DEBUG tokio-runtime-worker db: DB Commit 0xc11df720b6c163e66d6a1ea84a5da5d5f3b709c8dba0026bbb21845021bfdfc5 (3837), best=false, state=false, existing=false, finalized=false
2023-11-30 08:47:10.710  INFO tokio-runtime-worker grandpa: Generating warp sync proof from (0, 0xedb3…0fa9) at: (12131, 0xaea8…ba77)
2023-11-30 08:47:10.711  INFO tokio-runtime-worker grandpa: Generated warp sync proof: size: 46375
2023-11-30 08:47:10.713 DEBUG tokio-runtime-worker db: DB Commit 0x68dc1fd20a8924326387d0b6fbd7a45b59601f1ce82a03f271cb7616e04f537c (3838), best=false, state=false, existing=false, finalized=false
2023-11-30 08:47:10.713 DEBUG tokio-runtime-worker db: DB Commit 0x91d6b4b3e466b134095ae8ab8d6d7a2176c1e0b152213d2e022820bb6c58e1de (3839), best=false, state=false, existing=false, finalized=false
...
2023-11-30 08:47:11.603 DEBUG tokio-runtime-worker db: DB Commit 0x2e3bb33e168946c8ca090c9606b7dc033dcb5ae72bbe537f9113affc576fb939 (12130), best=false, state=false, existing=false, finalized=false
2023-11-30 08:47:11.603 DEBUG tokio-runtime-worker db: Removed block gap.
...
2023-11-30 08:47:57.004 DEBUG tokio-runtime-worker db: DB Commit 0x1989dd25a560cf21fd54eab95e063b801a4deeff161b2b0d6680912d5bea0ebb (12146), best=true, state=true, existing=false, finalized=false
2023-11-30 08:48:06.009 DEBUG tokio-runtime-worker db: DB Commit 0x214801783ca47791dd88a0f602bff82fd1301e85245be797029f7abade0bdf5e (12147), best=true, state=true, existing=false, finalized=false

Debugs added are here for reference.

alice just builds the warp-sync proof for the highest finalized block in db while filling the block-gap at the same time.

Random observation: Interestingly latest_justification is not set in WarpSyncProof::generate. The grandpa finalize_block was not set during import of target block from warp-sync.

Local node identity cheatsheet:
eve:     12D3KooWFdSt4JnBmxWXwD5yp7wq2WzPaMA1nscSf6qt8pfge8Tk
dave:    12D3KooWKM1HeSv61ryZwAiBTZnqmass5pYM48k9Z7obzhTbnphm
charlie: 12D3KooWPKzmmE2uYgF3z13xjpbFTp63g9dZFag8pG6MgnpSLF4S
bob:     12D3KooWRkZhiRhsqmrQ28rt73K7V3aCBpqKrLGSXmZ99PTcTZby
alice:   12D3KooWQCkBm1BYtkHpocxCwMgR8yjitEeHGx8spzcDLGt2gkBm

Proposed solution

I'd propose that warp-sync requestor should check if received proof is actually for the requested block (which he knows should be finalized). If not, some additional measures should be performed:

  • waiting and resending request,
  • asking another peer,

I think that honest nodes should also do not respond to warp-sync request if the gap-syncing is still in progress.

@serban300
Copy link
Contributor

serban300 commented Jan 11, 2024

Leaving a comment in order to make sure we don't forget about this. After this is fixed we should remove the zombienet tests workarounds:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: backlog
Development

No branches or pull requests

2 participants