P2P data sync may deadlock on error #470

Closed
jsimnz opened this issue May 24, 2022 · 2 comments · Fixed by #1056
Comments

@jsimnz
Member

jsimnz commented May 24, 2022

When peers synchronize a document graph, a peer initially receives an update log L that is n >= 1 updates ahead of the current head H of its document graph.

The data sync protocol then traverses backwards from the newly received log L until it reaches our current head H.

If there is an error during sync, the Document will be left in an unstable state, and may not be able to recover.
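
The failure mode described above can be illustrated with a minimal Go sketch. It is not DefraDB's actual sync code: the `Block` type, the `Fetcher` function type, `syncNaive`, `missingAncestors`, and the in-memory `map` store are all hypothetical names made up for illustration. It assumes each update links to a single parent and that blocks are persisted as soon as they arrive.

```go
package main

import (
	"errors"
	"fmt"
)

// Block is a hypothetical update-log entry; Parent is the CID of the
// previous update in the document graph ("" for the first update).
type Block struct {
	CID    string
	Parent string
}

// Fetcher retrieves a block from a remote peer and may fail.
type Fetcher func(cid string) (Block, error)

// syncNaive walks backwards from newLog until it reaches head, persisting
// each block as soon as it arrives. An error mid-walk leaves the earlier
// writes in place.
func syncNaive(store map[string]Block, head, newLog string, fetch Fetcher) error {
	for cid := newLog; cid != head; {
		blk, err := fetch(cid)
		if err != nil {
			return fmt.Errorf("fetch %s: %w", cid, err)
		}
		store[blk.CID] = blk // written immediately, no rollback
		cid = blk.Parent
	}
	return nil
}

// missingAncestors lists parents that stored blocks point to but that are
// absent from the store, i.e. gaps in the graph.
func missingAncestors(store map[string]Block) []string {
	var missing []string
	for _, blk := range store {
		if blk.Parent == "" {
			continue
		}
		if _, ok := store[blk.Parent]; !ok {
			missing = append(missing, blk.Parent)
		}
	}
	return missing
}

// flakyRemote serves the chain H <- A <- B <- L and fails on failOn.
func flakyRemote(failOn string) Fetcher {
	remote := map[string]Block{
		"A": {CID: "A", Parent: "H"},
		"B": {CID: "B", Parent: "A"},
		"L": {CID: "L", Parent: "B"},
	}
	return func(cid string) (Block, error) {
		if cid == failOn {
			return Block{}, errors.New("network error")
		}
		blk, ok := remote[cid]
		if !ok {
			return Block{}, errors.New("not found")
		}
		return blk, nil
	}
}

func main() {
	store := map[string]Block{"H": {CID: "H"}}
	err := syncNaive(store, "H", "L", flakyRemote("A"))
	fmt.Println("error:", err)
	fmt.Println("missing ancestors:", missingAncestors(store))
}
```

In this sketch the store ends up holding L and B but not A, so the graph has a gap between the new log and the old head. Nothing in the store records that the walk was interrupted.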

This is related to this issue from IPFS: ipfs/go-ds-crdt#23

@jsimnz jsimnz added bug Something isn't working area/p2p Related to the p2p networking system labels May 24, 2022
@fredcarle
Collaborator

@jsimnz is this something we need to fix ourselves or something that ipfs should take care of?

@jsimnz
Member Author

jsimnz commented May 26, 2022

> @jsimnz is this something we need to fix ourselves or something that ipfs should take care of?

Unfortunately it's on us :'(.

Technically, there's a protocol called GraphSync that might help. It's being worked on by Protocol Labs, but it's still a ways from being ready for prime time.

@jsimnz jsimnz added this to the DefraDB v0.3 milestone May 31, 2022
@jsimnz jsimnz modified the milestones: DefraDB v0.3, DefraDB v0.4 Jul 12, 2022
@fredcarle fredcarle modified the milestones: DefraDB v0.4, DefraDB v0.5 Dec 19, 2022
shahzadlone pushed a commit that referenced this issue Apr 13, 2023
Relevant issue(s)
Resolves #1028
Resolves #470
Resolves #1053

Description
The main purpose of this PR is to resolve the potential deadlock. The deadlock can happen if an error occurs before a PushLog cycle has received the whole missing block history. This leaves the DAG incomplete, and at the moment there is no clear way for Defra to become aware of the gap and try to fill it in later.

To solve the deadlock situation, John and I discussed implementing a transaction that covers the whole PushLog cycle. This means that any error occurring during the cycle discards the transaction and thus leaves the DAG unaffected.

As side effects, we add thread safety to the badger transactions and we manage DAG workers on a per PushLog cycle basis.
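
The approach described above can be sketched roughly as follows. This is an illustrative sketch, not the code from the PR: `Txn` is a hypothetical in-memory stand-in for a datastore transaction (it is not the badger API), and `Block`, `Fetcher`, `syncInTxn`, and `chainRemote` are assumed names. The mutex only guards the staged writes of a single transaction.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Block is a hypothetical update-log entry linked to its parent by CID.
type Block struct{ CID, Parent string }

// Fetcher retrieves a block from a remote peer and may fail.
type Fetcher func(cid string) (Block, error)

// Txn is a hypothetical in-memory transaction: writes are staged and only
// reach the store on Commit.
type Txn struct {
	mu     sync.Mutex
	store  map[string]Block
	staged map[string]Block
}

func NewTxn(store map[string]Block) *Txn {
	return &Txn{store: store, staged: make(map[string]Block)}
}

// Set stages a block; the mutex lets several workers share one transaction.
func (t *Txn) Set(blk Block) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.staged[blk.CID] = blk
}

// Commit copies every staged block into the store.
func (t *Txn) Commit() {
	t.mu.Lock()
	defer t.mu.Unlock()
	for cid, blk := range t.staged {
		t.store[cid] = blk
	}
	t.staged = make(map[string]Block)
}

// Discard drops all staged blocks without touching the store.
func (t *Txn) Discard() {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.staged = make(map[string]Block)
}

// syncInTxn runs one whole sync cycle inside a single transaction, so the
// store sees either the full missing history or nothing at all.
func syncInTxn(store map[string]Block, head, newLog string, fetch Fetcher) error {
	txn := NewTxn(store)
	defer txn.Discard() // no-op after a successful Commit
	for cid := newLog; cid != head; {
		blk, err := fetch(cid)
		if err != nil {
			return fmt.Errorf("fetch %s: %w", cid, err)
		}
		txn.Set(blk)
		cid = blk.Parent
	}
	txn.Commit()
	return nil
}

// chainRemote serves the chain H <- A <- B <- L and fails on failOn.
func chainRemote(failOn string) Fetcher {
	remote := map[string]Block{
		"A": {CID: "A", Parent: "H"},
		"B": {CID: "B", Parent: "A"},
		"L": {CID: "L", Parent: "B"},
	}
	return func(cid string) (Block, error) {
		blk, ok := remote[cid]
		if cid == failOn || !ok {
			return Block{}, errors.New("fetch failed")
		}
		return blk, nil
	}
}

func main() {
	store := map[string]Block{"H": {CID: "H"}}
	err := syncInTxn(store, "H", "L", chainRemote("A"))
	fmt.Println("first attempt:", err, "blocks stored:", len(store))
	err = syncInTxn(store, "H", "L", chainRemote(""))
	fmt.Println("retry:", err, "blocks stored:", len(store))
}
```

Because the failed attempt leaves the store exactly as it was, a later retry starts from the same head and can fetch the full history again.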
shahzadlone pushed a commit to shahzadlone/defradb that referenced this issue Feb 23, 2024