Found an interesting bug related to synchronisation of threads #530
Comments
Does this sound familiar, given your recent tests using testground, @merlinran? |
IIUC what triggers this bug is that Device A gets the log head via a 3rd party channel and calls What if there's a new API call so Device A can explicitly tell go-threads that this is the log head so the data structure gets set properly? Far from an elegant solution but it introduces no performance penalty. |
Can you elaborate please? As far as I understand, SetHead preserves the invariant that there are no gaps in the records before the head. Therefore we can introduce a counter just for the head. Because we always set the new head right after the previous one, we can maintain the counter simply by incrementing it every time we set a new head. If every log has a counter for its head, we can easily compare them (at the time we get records) to see whether we need to accept more records and how many records should lie between the heads. Such a solution seems easy to implement, if I am not mistaken. |
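To make the head counter idea concrete, here is a minimal sketch in Go. It is not the go-threads API: the Head and Log types, the Counter field and the MissingRecords helper are all hypothetical names made up for illustration, and this SetHead is a toy stand-in for the real one. It only assumes what is said above, namely that a new head always directly follows the previous one.

```go
package main

import "fmt"

// Head is a hypothetical log head: the ID of the latest record plus a
// counter holding how many records the log contains up to that head.
type Head struct {
	ID      string
	Counter uint64
}

// Log is a hypothetical in-memory stand-in for a single-writer log.
type Log struct {
	head Head
}

// SetHead moves the head to the next record. Because the new head always
// directly follows the previous one, incrementing keeps the counter exact.
func (l *Log) SetHead(id string) {
	l.head = Head{ID: id, Counter: l.head.Counter + 1}
}

// MissingRecords reports how many records the local log lacks relative to
// the remote head, or 0 if the local log is level with or ahead of it.
func MissingRecords(local, remote Head) uint64 {
	if remote.Counter <= local.Counter {
		return 0
	}
	return remote.Counter - local.Counter
}

func main() {
	var local, remote Log
	for i := 0; i < 3; i++ {
		local.SetHead(fmt.Sprintf("rec-%d", i))
	}
	for i := 0; i < 200; i++ {
		remote.SetHead(fmt.Sprintf("rec-%d", i))
	}
	fmt.Println(MissingRecords(local.head, remote.head)) // prints 197
}
```

With such a counter a peer would not need to look into the blockstore to decide whether records are missing; comparing two integers would be enough.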
Sorry my understanding of the codebase is still far from complete, and my proposal simply breaks the invariant of calling Having the counter for log depth sounds like a brilliant idea! Naive question: how can a peer get the expected value of the counter for a log? Does it mean that each record brings the counter? |
Yes, the log depth counter is what we had in mind for our vector clock proposal exactly. I'm a big fan of this simple feature, because it actually buys us a lot of features in terms of peer synchronization beyond just this bug here. It would require an update to the head data structure, which, while minimal, would likely require some thought. Now is a good time to do this though, as we already have some breaking changes in the pipeline. |
Since only one peer is able to write to a given log, if you get a head from a peer, you know that that head counter value is correct (as far as that head record is concerned). This also means you can compare your head counters with another peer's head counters to get all the semantics that come with vector clocks. |
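The vector clock semantics mentioned here can be sketched roughly as follows. This is an illustration only: Clock, Ordering and Compare are hypothetical names, not part of go-threads, and the sketch assumes each peer keeps one head counter per log ID and treats a log it has never seen as counter 0.

```go
package main

import "fmt"

// Clock maps a log ID to the head counter a peer holds for that log.
// A missing entry is treated as counter 0 (assumption of this sketch).
type Clock map[string]uint64

// Ordering is the result of comparing two clocks.
type Ordering int

const (
	Equal      Ordering = iota // same counters for every log
	Behind                     // a has nothing b lacks, but b has more
	Ahead                      // b has nothing a lacks, but a has more
	Concurrent                 // each side has records the other lacks
)

// Compare applies the usual vector clock comparison to head counters.
func Compare(a, b Clock) Ordering {
	aLess, bLess := false, false
	for id, av := range a {
		bv := b[id] // 0 if b does not know the log
		if av < bv {
			aLess = true
		}
		if av > bv {
			bLess = true
		}
	}
	for id, bv := range b {
		if _, ok := a[id]; !ok && bv > 0 {
			aLess = true // b knows a log that a has never seen
		}
	}
	switch {
	case aLess && bLess:
		return Concurrent
	case aLess:
		return Behind
	case bLess:
		return Ahead
	default:
		return Equal
	}
}

func main() {
	mine := Clock{"logA": 5, "logB": 3}
	theirs := Clock{"logA": 5, "logB": 200}
	fmt.Println(Compare(mine, theirs) == Behind) // prints true
}
```

Because only one peer writes to a given log, a counter received together with a head can be taken at face value for that head, which is what makes this per-log comparison meaningful.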
Do you think it makes sense for me to draft a proposal so we can look at it on Thursday? Maybe you could already give me some comments. |
That would be fantastic! |
I am referencing the PR here (#531). It is not finished and a lot of the stuff there is not final. The only thing implemented is the get-records logic, which fixes the bug. I haven't had time to check other things yet (including proper testing, except for the testground test, which works for me); I will do that after our discussion on Thursday. |
So here are some things to note:
|
Lately we have been experiencing an interesting bug in go-threads.
The problem occurs when we have a lot of records and we are trying to synchronize them.
In our application it happens as follows:
putRecords. So it checks them using isKnown, which looks into the blockstore and sees that we already have the head of Log B (which we got through bitswap), so it decides not to proceed with saving the records.
Undef as we haven't updated it.
ExchangeEdges will see that we have different heads and will try to get records again. Thus we will have an infinite loop of unsuccessful record updates.
I tried to reproduce this isKnown problem on testground:
The test is not of super good quality, but at least it sometimes works :-) Feel free to ping me in case I missed something.
https://github.com/mcrakhman/go-threads/tree/sync-bug-demonstration
To run the test, type:
testground run composition --wait -f sync-bug-exec.toml
When the test stops running you will see some prints in the console like:
got log 12D3KooWAhDczANHAuX8JgqDofkmhZsdftzfS9Q7GTt3kvqDECbB with head b
(see the GetHeads method).
You can observe different behaviour if there are 3 records instead of 200. In this case the log of the First instance will be in sync with the log of the Second instance.
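The loop described in this issue can be modelled with a toy example. This is a heavily simplified sketch and not the go-threads implementation: the peer type, its fields and syncRounds are hypothetical, and the sketch assumes the known-check looks only at the newest record of a batch. It shows why a head block that arrived out of band (for example through bitswap) keeps the logstore head from ever being updated.

```go
package main

import "fmt"

// undef is a stand-in for an undefined head in this toy model.
const undef = ""

// peer is a hypothetical, simplified model of one device's local state.
type peer struct {
	blockstore map[string]bool // record IDs whose blocks are stored locally
	head       string          // head of Log B as recorded in the logstore
}

// isKnown mimics the check from the issue: it consults only the blockstore.
func (p *peer) isKnown(id string) bool { return p.blockstore[id] }

// putRecords skips the whole batch when its newest record is already known
// (assumption of this sketch), so the head stays untouched in that case.
func (p *peer) putRecords(recs []string) bool {
	if len(recs) == 0 || p.isKnown(recs[len(recs)-1]) {
		return false
	}
	for _, r := range recs {
		p.blockstore[r] = true
	}
	p.head = recs[len(recs)-1]
	return true
}

// syncRounds repeats the exchange until the heads match or maxRounds is
// reached. It returns the number of rounds used and whether sync succeeded.
func syncRounds(p *peer, remote []string, maxRounds int) (int, bool) {
	remoteHead := remote[len(remote)-1]
	for round := 1; round <= maxRounds; round++ {
		if p.head == remoteHead {
			return round - 1, true
		}
		p.putRecords(remote) // heads differ, so records are requested again
	}
	return maxRounds, p.head == remoteHead
}

func main() {
	remote := []string{"r1", "r2", "r3"}

	// The head block r3 is already in the blockstore, but the head is undefined.
	stuck := &peer{blockstore: map[string]bool{"r3": true}, head: undef}
	rounds, ok := syncRounds(stuck, remote, 5)
	fmt.Println(rounds, ok) // prints 5 false

	// A peer with an empty blockstore syncs in a single round.
	fresh := &peer{blockstore: map[string]bool{}, head: undef}
	rounds, ok = syncRounds(fresh, remote, 5)
	fmt.Println(rounds, ok) // prints 1 true
}
```

In the stuck case every round ends exactly where it started, which matches the endless sequence of unsuccessful record updates reported above; without the maxRounds bound the loop would never terminate.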