-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Conversation
This reverts commit 5f38aa8.
ethcore/sync/src/block_sync.rs
Outdated
@@ -515,6 +528,7 @@ impl BlockDownloader { | |||
}, | |||
Err(BlockImportError(BlockImportErrorKind::Queue(QueueErrorKind::Full(limit)), _)) => { | |||
debug!(target: "sync", "Block import queue full ({}), restarting sync", limit); | |||
bad = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is done to trigger resetting the sync right? Do you think we should return a different error type other than BlockDownloaderImportError::Invalid
? (And also handle it in ChainSync::collect_blocks
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I agree and I had previously done that. However my idea with this initial PR was to do the minimum amount to fix this bug.
However looks like this fix doesn't work entirely so I will add that back while waiting for another mainnet sync...
Unfortunately this does not appear to fix the issue when syncing mainnet. I'm currently investigating. |
This is a problem on mainnet where multiple stale peer requests will force many rounds to complete quickly, forcing the retraction.
@jimpo have implemented your suggestions with the useless counter and the reset logic. Have run it 3 times against mainnet and it manages to get past the corrupt blocks. Also ran a couple of full syncs against POA network and it works okay. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@ascjones Are the changes to the tests submodule expected? (maybe you updated the submodule unknowingly?) |
Yes just noticed that too, I must've done that by accident. Fixing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, minor grumble for typos in a comment. Good job!
@@ -491,8 +513,9 @@ impl BlockDownloader { | |||
} | |||
|
|||
/// Checks if there are blocks fully downloaded that can be imported into the blockchain and does the import. | |||
pub fn collect_blocks(&mut self, io: &mut SyncIo, allow_out_of_order: bool) -> Result<(), BlockDownloaderImportError> { | |||
let mut bad = false; | |||
/// Returns DownloadAction::Reset if it is imported all the the blocks it can and all downloading peers should be reset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typos in the sentence.
* Log block set in block_sync for easier debugging * logging macros * Match no args in sync logging macros * Add QueueFull error * Only allow importing headers if the first matches requested * WIP * Test for chain head gaps and log * Calc distance even with 2 heads * Revert previous commits, preparing simple fix This reverts commit 5f38aa8. * Reject headers with no gaps when ChainHead * Reset block sync download when queue full * Simplify check for subchain heads * Add comment to explain subchain heads filter * Fix is_subchain_heads check and comment * Prevent premature round completion after restart This is a problem on mainnet where multiple stale peer requests will force many rounds to complete quickly, forcing the retraction. * Reset stale old blocks request after queue full * Revert "Reject headers with no gaps when ChainHead" This reverts commit 0eb8655. * Add BlockSet to BlockDownloader logging Currently it is difficult to debug this because there are two instances, one for OldBlocks and one for NewBlocks. This adds the BlockSet to all log messages for easy log filtering. * Reset OldBlocks download from last enqueued Previously when the ancient block queue was full it would restart the download from the last imported block, so the ones still in the queue would be redownloaded. Keeping the existing downloader instance and just resetting it will start again from the last enqueued block.:wq * Ignore expired Body and Receipt requests * Log when ancient block download being restarted * Only request old blocks from peers with >= difficulty #9226 might be too permissive and causing the behaviour of the retraction soon after the fork block. With this change the peer difficulty has to be greater than or euqal to our syncing difficulty, so should still fix #9225 * Some logging and clear stalled blocks head * Revert "Some logging and clear stalled blocks head" This reverts commit 757641d. * Reset stalled header if useless more than once * Store useless headers in HashSet * Add sync target to logging macro * Don't disable useless peer and fix log macro * Clear useless headers on reset and comments * Use custom error for collecting blocks Previously we resued BlockImportError, however only the Invalid case and this made little sense with the QueueFull error. * Remove blank line * Test for reset sync after consecutive useless headers * Don't reset after consecutive headers when chain head * Delete commented out imports * Return DownloadAction from collect_blocks instead of error * Don't reset after round complete, was causing test hangs * Add comment explaining reset after useless * Replace HashSet with counter for useless headers * Refactor sync reset on bad block/queue full * Add missing target for log message * Fix compiler errors and test after merge * ethcore: revert ethereum tests submodule update
* Log block set in block_sync for easier debugging * logging macros * Match no args in sync logging macros * Add QueueFull error * Only allow importing headers if the first matches requested * WIP * Test for chain head gaps and log * Calc distance even with 2 heads * Revert previous commits, preparing simple fix This reverts commit 5f38aa8. * Reject headers with no gaps when ChainHead * Reset block sync download when queue full * Simplify check for subchain heads * Add comment to explain subchain heads filter * Fix is_subchain_heads check and comment * Prevent premature round completion after restart This is a problem on mainnet where multiple stale peer requests will force many rounds to complete quickly, forcing the retraction. * Reset stale old blocks request after queue full * Revert "Reject headers with no gaps when ChainHead" This reverts commit 0eb8655. * Add BlockSet to BlockDownloader logging Currently it is difficult to debug this because there are two instances, one for OldBlocks and one for NewBlocks. This adds the BlockSet to all log messages for easy log filtering. * Reset OldBlocks download from last enqueued Previously when the ancient block queue was full it would restart the download from the last imported block, so the ones still in the queue would be redownloaded. Keeping the existing downloader instance and just resetting it will start again from the last enqueued block.:wq * Ignore expired Body and Receipt requests * Log when ancient block download being restarted * Only request old blocks from peers with >= difficulty #9226 might be too permissive and causing the behaviour of the retraction soon after the fork block. With this change the peer difficulty has to be greater than or euqal to our syncing difficulty, so should still fix #9225 * Some logging and clear stalled blocks head * Revert "Some logging and clear stalled blocks head" This reverts commit 757641d. * Reset stalled header if useless more than once * Store useless headers in HashSet * Add sync target to logging macro * Don't disable useless peer and fix log macro * Clear useless headers on reset and comments * Use custom error for collecting blocks Previously we resued BlockImportError, however only the Invalid case and this made little sense with the QueueFull error. * Remove blank line * Test for reset sync after consecutive useless headers * Don't reset after consecutive headers when chain head * Delete commented out imports * Return DownloadAction from collect_blocks instead of error * Don't reset after round complete, was causing test hangs * Add comment explaining reset after useless * Replace HashSet with counter for useless headers * Refactor sync reset on bad block/queue full * Add missing target for log message * Fix compiler errors and test after merge * ethcore: revert ethereum tests submodule update
* produce portable binaries (#9725) * HF in POA Core (2018-10-22) (#9724) poanetwork/poa-chain-spec#87 * Use static call and apparent value transfer for block reward contract code (#9603) * Verify block syncing responses against requests (#9670) * sync: Validate received BlockHeaders packets against stored request. * sync: Validate received BlockBodies and BlockReceipts. * sync: Fix broken tests. * sync: Unit tests for BlockDownloader::import_headers. * sync: Unit tests for import_{bodies,receipts}. * tests: Add missing method doc. * Fix ancient blocks sync (#9531) * Log block set in block_sync for easier debugging * logging macros * Match no args in sync logging macros * Add QueueFull error * Only allow importing headers if the first matches requested * WIP * Test for chain head gaps and log * Calc distance even with 2 heads * Revert previous commits, preparing simple fix This reverts commit 5f38aa8. * Reject headers with no gaps when ChainHead * Reset block sync download when queue full * Simplify check for subchain heads * Add comment to explain subchain heads filter * Fix is_subchain_heads check and comment * Prevent premature round completion after restart This is a problem on mainnet where multiple stale peer requests will force many rounds to complete quickly, forcing the retraction. * Reset stale old blocks request after queue full * Revert "Reject headers with no gaps when ChainHead" This reverts commit 0eb8655. * Add BlockSet to BlockDownloader logging Currently it is difficult to debug this because there are two instances, one for OldBlocks and one for NewBlocks. This adds the BlockSet to all log messages for easy log filtering. * Reset OldBlocks download from last enqueued Previously when the ancient block queue was full it would restart the download from the last imported block, so the ones still in the queue would be redownloaded. Keeping the existing downloader instance and just resetting it will start again from the last enqueued block.:wq * Ignore expired Body and Receipt requests * Log when ancient block download being restarted * Only request old blocks from peers with >= difficulty #9226 might be too permissive and causing the behaviour of the retraction soon after the fork block. With this change the peer difficulty has to be greater than or euqal to our syncing difficulty, so should still fix #9225 * Some logging and clear stalled blocks head * Revert "Some logging and clear stalled blocks head" This reverts commit 757641d. * Reset stalled header if useless more than once * Store useless headers in HashSet * Add sync target to logging macro * Don't disable useless peer and fix log macro * Clear useless headers on reset and comments * Use custom error for collecting blocks Previously we resued BlockImportError, however only the Invalid case and this made little sense with the QueueFull error. * Remove blank line * Test for reset sync after consecutive useless headers * Don't reset after consecutive headers when chain head * Delete commented out imports * Return DownloadAction from collect_blocks instead of error * Don't reset after round complete, was causing test hangs * Add comment explaining reset after useless * Replace HashSet with counter for useless headers * Refactor sync reset on bad block/queue full * Add missing target for log message * Fix compiler errors and test after merge * ethcore: revert ethereum tests submodule update * Add hardcoded headers (#9730) * add foundation hardcoded header #6486017 * add ropsten hardcoded headers #4202497 * add kovan hardcoded headers #9023489 * gitlab ci: releasable_branches: change variables condition to schedule (#9729)
Fixes #9300, fixes #7008 and fixes #9306. Rel #9407.
Edit: there were several issues here
TLDR;
#0
#0
, can download a non canonical set of blocksRetraction after queue full
BlockDownloader
inState::Blocks
requesting headers for subchains:BlockDownloader
inState::ChainHead
. In flight header response from abandoned round arrives, and resets subchain headers to consecutive headers (see the todo comment for a clue!). These headers may be orphaned from their parents which were discarded after the queue became full above.UnknownParent
after, causing the sync to quickly retract to 0. This was also being caused by stale body responses.The fix is to expire all pending requests when resetting the sync.
Getting stuck on a non canonical block
#1..#57
, probably propogated by the above bug. Also around the fork block#1920000
it was getting stuck on some non canon blocks.The fix is to reset the sync if we receive consecutive header responses for a requesetd hash with no useful headers that advance the subchain.
Further improvements
The ancient blocks queue still fills up fairly frequently, resulting in many downloaded blocks being discarded. I will address this in a future PR, by pausing the download of old blocks similar to the normal sync. For now though this no longer triggers the retraction, and the ancient blocks sync now completes.
I also have some further improvements/refactorings for this code stashed away for future PRs. But trying to keep this PR to the minimum that will actually get the ancient block sync working again.