Interrupted branch exploration prevents any future syncing of that branch #23
Labels: kind/bug (a bug in existing code, including security flaws)
hsanjuan changed the title from "Interrupted branch exploration prevents any future syncing" to "Interrupted branch exploration prevents any future syncing of that branch" on Feb 18, 2020.
hsanjuan added a commit that referenced this issue on Feb 1, 2022:
Per issue #23, any error while processing a branch of the DAG results in that branch never being processed again, because we have the blocks in the blockstore and we assume that every block we have has already been processed.

This PR adds a repair goroutine that runs whenever the crdt-store is marked as dirty. The crdt-store gets marked as dirty when errors occur while "processing nodes" or "sending new jobs" (usually while fetching nodes from the network). The repair task, which runs every RepairInterval, walks down the DAG from the heads and re-triggers processing for every node that was not processed, and it does not stop until the bottom has been reached.

In order to know which DAG nodes have been processed and which have not, we need to explicitly store the list of processed nodes (which is different from the list of nodes that we have). A node becomes processed as soon as its delta has been merged.

Repair can be an expensive operation, so we should avoid marking the datastore as dirty too readily, which may happen, for instance, when we shut down while processing. Once repair reaches the bottom, the dirty bit is cleared. Datastores created by previous versions will reprocess the store on first run, as they need to mark all the nodes in the DAG as processed.
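A minimal sketch of the repair pass described in this commit, assuming hypothetical names (`store`, `heads`, `children`, `processed`, `reprocess`) that stand in for the real go-ds-crdt internals: a ticker fires every RepairInterval, and when the store is dirty, the DAG is walked from the heads down, re-triggering processing for any node whose delta was never merged and clearing the dirty bit once the bottom is reached.

```go
// Illustrative sketch only; not the actual go-ds-crdt implementation.
package sketch

import "time"

type node string

type store struct {
	dirty     bool
	heads     []node
	children  map[node][]node // DAG edges from newer nodes to older ones
	processed map[node]bool   // nodes whose deltas were already merged
}

// repairLoop fires every repairInterval and runs a repair pass
// whenever the store has been marked dirty by an earlier error.
func (s *store) repairLoop(repairInterval time.Duration, stop <-chan struct{}) {
	ticker := time.NewTicker(repairInterval)
	defer ticker.Stop()
	for {
		select {
		case <-stop:
			return
		case <-ticker.C:
			if s.dirty {
				s.repair()
			}
		}
	}
}

// repair walks from the heads down to the bottom of the DAG and
// re-triggers processing for every node that was never processed.
func (s *store) repair() {
	visited := map[node]bool{} // full set of visited nodes, kept in memory
	queue := append([]node{}, s.heads...)
	for len(queue) > 0 {
		n := queue[0]
		queue = queue[1:]
		if visited[n] {
			continue
		}
		visited[n] = true
		if !s.processed[n] {
			s.reprocess(n)
		}
		queue = append(queue, s.children[n]...)
	}
	s.dirty = false // the bottom was reached, so the dirty bit is cleared
}

func (s *store) reprocess(n node) {
	// In the real store this would merge the node's delta again;
	// here we only record that the node is now processed.
	s.processed[n] = true
}
```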
hsanjuan added another commit that referenced this issue on Feb 1, 2022:
* Allow self-repair of the store. Per issue #23, any error while processing a branch of the DAG results in that branch never being processed again, as we assume that every block we have has already been processed (see the full description in the commit referenced above). A repair goroutine now runs every RepairInterval whenever the crdt-store is marked as dirty, walks down the DAG from the heads, and re-triggers processing for every node whose delta was never merged; the dirty bit is cleared once the bottom is reached.
* DAGSyncer can become a DAGService. It is no longer necessary to check whether a block is in the blockstore, as we track processed nodes separately.
* Allow disabling auto-repair when the interval is set to 0.
* Add a public Repair method and improve the repair logic and logging.
* Improve node-processing logic and queue caching for deduplication. I believe nodes could be removed from the queue by sendJobs and then re-added before they were processed, potentially resulting in several workers reprocessing the same nodes.
* Update the globaldb example.
* Enable processing of multiple heads in parallel. Now that the improved processing logic keeps branches from constantly stepping on each other's feet, we can re-enable processing multiple heads in parallel and take advantage of the workers that normally sit idle while a single head is processed.
* Keep improving the repair process and the syncing: ensure that re-processed branches set the right heads, and tighten the logic to avoid duplicating processed nodes.
* Do not rebroadcast anything if there are no heads to rebroadcast.
* Fix repair notifications.
* Add helper functions to the processedBlock namespace.
* Repair: keep a full set of visited nodes while walking (in memory, unfortunately).
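The key bookkeeping change is sketched below under assumed names (`processedSet`, `markProcessed`, `processNode`, `mergeDelta`; none of these are the actual go-ds-crdt API): a node counts as processed only once its delta has been merged and recorded, so having a block locally no longer implies it was processed, and a crash between fetching and merging leaves the node eligible for repair. The same set also serves as a deduplication check when several workers could otherwise pick up the same node.

```go
// Illustrative sketch only; names and types are assumptions.
package sketch

import "sync"

type cid string

// processedSet records which DAG nodes have had their delta merged.
// A real implementation would persist this in a datastore namespace
// so that the information survives restarts.
type processedSet struct {
	mu   sync.Mutex
	seen map[cid]bool
}

func newProcessedSet() *processedSet {
	return &processedSet{seen: map[cid]bool{}}
}

func (p *processedSet) isProcessed(c cid) bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	return p.seen[c]
}

// markProcessed is called only after the node's delta was merged,
// which is what distinguishes "processed" from merely "stored".
func (p *processedSet) markProcessed(c cid) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.seen[c] = true
}

// processNode merges a node's delta and only then records it as
// processed, so an interruption between fetching and merging leaves
// the node unprocessed and a later repair pass will retry it.
func processNode(p *processedSet, c cid, mergeDelta func(cid) error) error {
	if p.isProcessed(c) {
		return nil // deduplicate: another worker already handled this node
	}
	if err := mergeDelta(c); err != nil {
		return err // node stays unprocessed; repair will pick it up
	}
	p.markProcessed(c)
	return nil
}
```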
This was fixed by #144.
When a new head is received, syncing for that branch starts and continues until the DAGSyncer returns HasBlock() = true for the current block. This means the block we want to process is already local, so we assume it was already processed in the past (along with everything below it). However, it may be that we never finished fully processing the branch: we have some of the newer blocks, but we never fetched the older ones.
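A sketch of the problematic walk, using simplified stand-in names (a string-based `cid`, a one-method `dagSyncer`, and `parents`/`fetchAndProcess` callbacks) rather than the actual go-ds-crdt code: the walk treats "block is local" as "branch already processed", so an interruption during an earlier sync permanently hides the older part of the branch.

```go
// Illustrative sketch only; the real DAGSyncer interface and walk differ.
package sketch

type cid string

type dagSyncer interface {
	HasBlock(c cid) (bool, error)
}

// syncBranch walks from a new head towards the root and stops as soon
// as a block is already present locally. If a previous sync of this
// branch was interrupted after storing this block but before fetching
// its ancestors, those older nodes are never requested again, which is
// the bug reported in this issue.
func syncBranch(ds dagSyncer, head cid, parents func(cid) []cid,
	fetchAndProcess func(cid) error) error {
	queue := []cid{head}
	for len(queue) > 0 {
		c := queue[0]
		queue = queue[1:]
		have, err := ds.HasBlock(c)
		if err != nil {
			return err
		}
		if have {
			continue // assumed already processed: the branch below is skipped
		}
		if err := fetchAndProcess(c); err != nil {
			return err
		}
		queue = append(queue, parents(c)...)
	}
	return nil
}
```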