Evaluate usage of tracker clock to ensure it's logically correct #21
Comments
I have noticed some failures in Swarm when ITCs are compared. I realized that the Clock.peek() operation is broken: it returns an invalid structure that is not compatible with the other operations, which means that peeked clocks cannot be compared or joined. I put up a pull request with a fix: #77. While investigating this problem, I reviewed the general use of ITCs, and it seems that the current implementation is flawed. Here are a few rules which have to be implemented to make ITCs work:
Also, peeked clocks should never be stored, because otherwise the identity gets lost: swarm/lib/swarm/tracker/tracker.ex Line 654 in e88966c
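To illustrate why a peeked clock must not replace a stored one, here is a minimal Python sketch of the identity half of an ITC stamp. It is not Swarm's Clock module: all function names (`seed`, `fork`, `peek`, `join`, `split`, `sum_ids`, `can_record_events`) are assumptions made for this example, and the event component is reduced to a single integer, which is only a degenerate case of a real event tree.

```python
# Illustrative model of ITC identities; NOT Swarm's Clock implementation.
# An id is 0 (anonymous), 1 (owns the whole interval), or a pair (left, right).
# A stamp is (id, event); the event is simplified to one integer here.

def split(i):
    """Split an id into two disjoint ids that together cover the original."""
    if i == 0:
        return 0, 0
    if i == 1:
        return (1, 0), (0, 1)
    left, right = i
    if left == 0:
        a, b = split(right)
        return (0, a), (0, b)
    if right == 0:
        a, b = split(left)
        return (a, 0), (b, 0)
    return (left, 0), (0, right)

def norm(i):
    """Collapse (0, 0) to 0 and (1, 1) to 1."""
    if i == (0, 0):
        return 0
    if i == (1, 1):
        return 1
    return i

def sum_ids(a, b):
    """Merge two disjoint ids back into one."""
    if a == 0:
        return b
    if b == 0:
        return a
    if a == 1 or b == 1:
        raise ValueError("ids overlap: the same interval is owned twice")
    return norm((sum_ids(a[0], b[0]), sum_ids(a[1], b[1])))

def seed():
    return (1, 0)

def fork(stamp):
    """Give half of the identity to a new replica; both keep the event."""
    i, e = stamp
    left, right = split(i)
    return (left, e), (right, e)

def peek(stamp):
    """Anonymous copy for sending: same event, identity 0."""
    _, e = stamp
    return (0, e)

def join(a, b):
    """Merge two stamps: identities are summed, events take the maximum."""
    return (sum_ids(a[0], b[0]), max(a[1], b[1]))

def can_record_events(stamp):
    """A stamp with the anonymous id 0 cannot register new events."""
    return stamp[0] != 0

node_a, node_b = fork(seed())
message = peek(node_a)            # safe to send over the wire
merged_b = join(node_b, message)  # receiver keeps its own identity
```

In this model, `message` carries the id `0`, so a replica that stores `message` in place of `node_a` would end up with a stamp for which `can_record_events` is false. Joining the peeked stamp into the receiver's own stamp, as in `merged_b`, leaves the receiver's identity intact.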
Let's take a look at an example (handle_replica_event for :update_meta):
I think the code should look like this:
I found this issue because there is one process in our swarm of 3 nodes that does not start and we get this warning:
We have 3 nodes running, but this warning only occurs on one of them. Is this a problem that will be fixed by Swarm?
I've got some pending work which addresses the usages of clocks inside Swarm, but I'm currently in the middle of a big move, so I've had to shelve stuff for a bit until I get settled in my new house. This is definitely on my list of priorities!
This issue is starting to occur more frequently on our system. I hope it can be fixed in the next release so we can keep using this nice package :)
We're running into the same warning/error as @h4cc.
I'm currently working on a new causally consistent model for Swarm. In the meantime, I believe we recently merged a PR which included fixes to the clock usage; you may want to give that a shot until I can get a release out.
To recap, the tracker uses an implementation of an Interval Tree Clock for resolving the causal history of replicated events. My understanding of ITC is that a scenario plays out like so:
In the case of Swarm, we can deterministically resolve conflicts because at any given point in time we know which node in the cluster "owns" a given process/registration, with the exception of processes which are registered but not managed by Swarm. We can ask the conflicted process to shut down, or kill it, and the registration will remain with the "correct" version. However, since synchronization today works by sending the registry to the remote node, I'm not sure a clock is strictly needed. If we assume that it does help us, though, we should ensure that our logic for forking, peeking, and incrementing the clock is correct.
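As a rough illustration of what "resolving causal history" means here, the following Python sketch compares two ITC event trees and falls back to ownership when they are concurrent. It is a sketch under assumptions, not Swarm's code: an event tree is modelled as an integer or a tuple `(base, left, right)` in normal form, and the names `lift`, `base`, `leq`, `compare`, and `pick_winner`, as well as the dict shape with `"node"` and `"clock"` keys, are hypothetical.

```python
# Illustrative comparison of ITC event trees; NOT Swarm's implementation.
# An event tree is an int n, or a tuple (n, left, right) in normal form.

def lift(e, m):
    """Add m to the base of an event tree."""
    if isinstance(e, int):
        return e + m
    n, left, right = e
    return (n + m, left, right)

def base(e):
    """Base value of an event tree (its minimum when in normal form)."""
    return e if isinstance(e, int) else e[0]

def leq(a, b):
    """True if event tree a is causally less than or equal to b."""
    if isinstance(a, int):
        return a <= base(b)
    n1, l1, r1 = a
    if n1 > base(b):
        return False
    if isinstance(b, int):
        return leq(lift(l1, n1), b) and leq(lift(r1, n1), b)
    n2, l2, r2 = b
    return (leq(lift(l1, n1), lift(l2, n2))
            and leq(lift(r1, n1), lift(r2, n2)))

def compare(a, b):
    """Classify the causal relation between two event trees."""
    ab, ba = leq(a, b), leq(b, a)
    if ab and ba:
        return "equal"
    if ab:
        return "before"
    if ba:
        return "after"
    return "concurrent"

def pick_winner(local, remote, owner):
    """Hypothetical resolution: newer wins, ties go to the owning node."""
    order = compare(local["clock"], remote["clock"])
    if order == "before":
        return remote
    if order in ("after", "equal"):
        return local
    # Concurrent updates: fall back to the node that owns the registration.
    return local if local["node"] == owner else remote

# Two replicas that each recorded one event after a fork.
left_event = (0, 1, 0)
right_event = (0, 0, 1)
relation = compare(left_event, right_event)
```

Here `relation` is `"concurrent"`, because neither replica has seen the other's event, so `pick_winner` would decide purely on ownership. A clock that has been joined with both, such as the leaf `1`, compares as `"after"` either of them.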