Backward sync stuck in a loop #6749

Closed
pinges opened this issue Mar 18, 2024 · 0 comments
Labels
bug Something isn't working P2 High (ex: Degrading performance issues, unexpected behavior of core features (DevP2P, syncing, etc))

pinges commented Mar 18, 2024

Description

The backward sync (BWS) is getting stuck in a loop with these recurring log messages:

{"@timestamp":"2024-03-05T13:57:59,199","level":"INFO","thread":"EthScheduler-Timer-0","class":"BackwardSyncContext","message":"Current backward sync session failed, it will be restarted","throwable":""}
{"@timestamp":"2024-03-05T13:58:01,114","level":"INFO","thread":"vert.x-worker-thread-0","class":"BackwardSyncContext","message":"Starting a new backward sync session","throwable":""}

Enough peers are present.

Restarting the node fixes the problem.

Reason

We receive an fcu (forkchoiceUpdated) containing the block hash of a head block. This hash is added to the hashesToAppend queue. The block then gets reorged, and when the BWS tries to retrieve it from our peers, none of them can provide it. This causes the BWS to fail. When we receive the next fcu, a new hash might be added to the queue, but the newly started BWS again tries to retrieve the same block that it already failed to retrieve.
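The failure mode described above can be reproduced in a small standalone model. This is only a sketch and not Besu's code: the class `BackwardSyncLoopSketch`, its methods, and the `dropUnretrievableHead` flag are hypothetical names made up for illustration, and dropping the unretrievable hash is just one assumed way to break the loop, not necessarily what the actual fix does.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Set;

// Hypothetical model of the loop; none of these names are Besu APIs.
final class BackwardSyncLoopSketch {
    // Stand-in for the hashesToAppend queue mentioned in the issue.
    private final Deque<String> hashesToAppend = new ArrayDeque<>();
    // Block hashes that at least one peer can serve.
    private final Set<String> blocksKnownToPeers;
    // Assumed mitigation: forget a head hash that no peer can provide.
    private final boolean dropUnretrievableHead;

    BackwardSyncLoopSketch(Set<String> blocksKnownToPeers, boolean dropUnretrievableHead) {
        this.blocksKnownToPeers = blocksKnownToPeers;
        this.dropUnretrievableHead = dropUnretrievableHead;
    }

    // Each fcu appends its head hash to the end of the queue.
    void onForkchoiceUpdated(String headHash) {
        if (!hashesToAppend.contains(headHash)) {
            hashesToAppend.addLast(headHash);
        }
    }

    // Runs one session; returns true if every queued hash was retrieved.
    boolean runSession() {
        while (!hashesToAppend.isEmpty()) {
            String next = hashesToAppend.peekFirst();
            if (!blocksKnownToPeers.contains(next)) {
                if (dropUnretrievableHead) {
                    hashesToAppend.pollFirst();
                }
                return false; // session failed; it will be restarted on the next fcu
            }
            hashesToAppend.pollFirst();
        }
        return true;
    }

    // Feeds one reorged head followed by three retrievable heads and
    // counts how many sessions fail.
    static int failedSessions(boolean dropUnretrievableHead) {
        Set<String> peers = Set.of("0xa1", "0xa2", "0xa3");
        BackwardSyncLoopSketch sync = new BackwardSyncLoopSketch(peers, dropUnretrievableHead);
        int failures = 0;
        for (String head : List.of("0xreorged", "0xa1", "0xa2", "0xa3")) {
            sync.onForkchoiceUpdated(head);
            if (!sync.runSession()) {
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        System.out.println("failed sessions, hash kept:    " + failedSessions(false));
        System.out.println("failed sessions, hash dropped: " + failedSessions(true));
    }
}
```

With the reorged hash left at the front of the queue, every later session fails on it even though the newer heads are retrievable. Once the unretrievable hash is dropped, only the first session fails and the following ones succeed.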

This happened on 7 out of 8 nodes I started based on 24.2.0-RC4:
dev-elc-besu-teku-mainnet-dev-stefan-rc4-(1,2,4)
dev-elc-besu-teku-mainnet-dev-stefan-ss-(1,2,3,4)

The reorged block had the hash 0x4550b82492bf1738af79efb6140770c5443d368b9512ae8551583909554a040f.

The Kibana link below should work for about another 3 weeks:
Kibana: https://kibana.dev.protocols.consensys.net/app/r?l=DISCOVER_APP_LOCATOR&v=8.11.0&lz=N4IgjgrgpgTgniAXKSsGJANwLYH0B2AhtlIgDogAmUmAtFADYDGtARlAM4S0AuUA1t2yEAlvnxQetanQ58AZoXy0KAAiWVVJDh0IBzUhQAMADwAsAVgtHWADgBMZgJz3W8gIwB2AMy3C8zycoeVYANnczI09PIyYLMzNvSm9Q21YnC3d7QihbK3cLW28nIwz4wiNI%2BTUQABoQBiU9CH0oJBBBNBAAX3qOAHsYHiQAbRGQAAEeEW0eYgAHOqpOJhAAXTX6pn6GCGx8DlGsPCISJe1dA3X6sWoTdvsCs3t7eSdaIqLaS1DKWicnN4WEF5IlvEZKJ43GYlmI%2BDBMIQGO1CBAeP0lvIRAx4YdECNNlRCHMAGoiKAAdwAkpQHk8Xm8Pr5vN8LL9%2FoDgcEwRCoaCltMSAAlJptZAgeQwfrYdr4foU2jgygAelp9XRsvlPXqMGCuo4AAsqfh4YjkeKzdAkKEjLajPV5qiOGKeDBoN1ukA%3D%3D

Node rc4-1 has been restarted and finished syncing successfully.

macfarla added the labels "bug: Something isn't working" and "P2: High (ex: Degrading performance issues, unexpected behavior of core features (DevP2P, syncing, etc))" on Mar 19, 2024
pinges added a commit that referenced this issue Mar 22, 2024
* minimal change to fix BWS

Signed-off-by: stefan.pingel@consensys.net <stefan.pingel@consensys.net>
Signed-off-by: Simon Dudley <simon.dudley@consensys.net>
Co-authored-by: Simon Dudley <simon.dudley@consensys.net>
jframe closed this as completed Mar 26, 2024
jflo pushed a commit to jflo/besu that referenced this issue Mar 26, 2024
* minimal change to fix BWS

Signed-off-by: stefan.pingel@consensys.net <stefan.pingel@consensys.net>
Signed-off-by: Simon Dudley <simon.dudley@consensys.net>
Co-authored-by: Simon Dudley <simon.dudley@consensys.net>
Signed-off-by: Justin Florentine <justin+github@florentine.us>
amsmota pushed a commit to Citi/besu that referenced this issue Apr 16, 2024
* minimal change to fix BWS

Signed-off-by: stefan.pingel@consensys.net <stefan.pingel@consensys.net>
Signed-off-by: Simon Dudley <simon.dudley@consensys.net>
Co-authored-by: Simon Dudley <simon.dudley@consensys.net>
Signed-off-by: amsmota <antonio.mota@citi.com>
amsmota pushed a commit to Citi/besu that referenced this issue Apr 16, 2024
* minimal change to fix BWS

Signed-off-by: stefan.pingel@consensys.net <stefan.pingel@consensys.net>
Signed-off-by: Simon Dudley <simon.dudley@consensys.net>
Co-authored-by: Simon Dudley <simon.dudley@consensys.net>
Signed-off-by: amsmota <antonio.mota@citi.com>
matthew1001 pushed a commit to kaleido-io/besu that referenced this issue Jun 7, 2024
* minimal change to fix BWS

Signed-off-by: stefan.pingel@consensys.net <stefan.pingel@consensys.net>
Signed-off-by: Simon Dudley <simon.dudley@consensys.net>
Co-authored-by: Simon Dudley <simon.dudley@consensys.net>