fix: fix task stuck and not reassign bug in concurrent-fetch logic #50
Description
This PR aims to address the issue where the engine-sync synchronization process gets stuck in certain scenarios and cannot recover.
Rationale
The concurrent-fetch logic can currently get stuck. The trigger is a task whose requested data does not exist on any current peer. This typically happens when TPS on L2 is high and an op-node has broadcast a block before the geth instance on the same node has finished writing it. Such a task is not retried or reassigned until an external event occurs (a peer joining or leaving, or a cancel signal); until then it stays stuck. This PR retries such tasks. If a task exceeds the maximum number of retries, it is proactively cancelled so that the scheduler can reschedule it. This PR also fixes a bug in the existing code where the progressed state was not reset after unreserve.
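The retry-then-cancel behaviour described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: `task`, `fetchFunc`, `runTask`, `unreserve`, `errNotFound`, and the `maxRetries` value are all hypothetical names and numbers chosen for the example.

```go
package main

import "errors"

// maxRetries is a hypothetical limit; the PR does not state the actual value.
const maxRetries = 3

// errNotFound is a hypothetical error meaning "no current peer has the data yet".
var errNotFound = errors.New("data not available on any peer")

// task is a hypothetical stand-in for a pending fetch task.
type task struct {
	retries    int
	progressed bool
	cancelled  bool
}

// fetchFunc is a hypothetical callback that performs one fetch attempt.
type fetchFunc func() error

// unreserve returns the task to the pool and resets its progressed flag,
// mirroring the reset that the description says was previously missing.
func unreserve(t *task) {
	t.progressed = false
}

// runTask retries a fetch while the data is missing on all peers. Once
// maxRetries is exceeded, it unreserves and cancels the task so that a
// scheduler can hand it out again instead of waiting for a peer event.
func runTask(t *task, fetch fetchFunc) error {
	for {
		err := fetch()
		if err == nil {
			t.progressed = true
			return nil
		}
		if !errors.Is(err, errNotFound) {
			return err // unrelated failure: do not retry here
		}
		t.retries++
		if t.retries > maxRetries {
			unreserve(t)
			t.cancelled = true
			return err
		}
	}
}
```

A real implementation would presumably wait between attempts rather than retry in a tight loop. The sketch omits the delay to keep the control flow visible.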
Example
N/A
Changes
Notable changes: