Thread edge proposal #477
Signed-off-by: Anton Dort-Golts <dortgolts@gmail.com>
Hey @dgtony, this is great! I wonder, could we also include log addresses in the hash? Updated peer addresses should also be considered info worth sharing even if the head hasn't changed.
Hi, Sander, sorry for the late reply. I hope to come up with a solution in a couple of days.
Finally I've got a chance to get back to the PR ) Thanks for the correction; adding peer addresses to the edge computation is totally reasonable for distinguishing thread states in general. But I'm not sure it's worth it in this particular PR, so let me explain. The current PR was intended as a first step in optimizing communication between ThreadsDB nodes. It makes processing of Regarding this PR, it looks like this small improvement preserves current
Hey @dgtony, thanks for the summary. I agree doing this change stepwise makes sense. So, first the renaming ( BTW, have you experimented with testground? We're thinking of spinning that up soon to get a better idea of network performance and bottlenecks.
Ok, so current PR now contains
@carsonfarmer may be interested in taking a look at this. He mentioned this idea sounds something like a vector clock. Maybe there are some ideas from that realm that can also be applied here. Or maybe we can call the edges clocks?
In any case, I'll merge this now and we can follow up with additional ideas.
}
return hs[i].LogID < hs[j].LogID
})
hasher := fnv.New64a()
Nice use of FNV 💪
Yeh this is neat. Almost like a vector clock, except more like a vector clock "summary" or digest. To really take advantage of a vector clock approach, we'd need to "track" more than just the head of the logs here. We'd also have to track a logical clock (an integer) per log that gets incremented on updates. It'd only be a few extra bytes, but the benefit of that approach is that it'd tell us which logs were off, whether they were behind (or ahead), and by how much. This would make it possible to do really fine-grained syncing.
The vector clock approach is something I've been thinking about for a while, and I just need to write some stuff down. At the end of the day, a thread really is just a vector clock of logs. Almost a perfect fit. But it also has me thinking about the approach outlined here. It's a super cool way to create a fixed-length digest of a vector clock as well. This is cool because you get the vector clock on the one hand, and you get a digest to use as a sorting mechanism.
The vector clock thing is worth thinking seriously about and writing up as a new proposal discussion. But of course, it would require some non-trivial code changes, whereas the approach in this PR answers the question "are they off, yes or no?" pretty quickly and easily. So in the short to medium term, this is a super clever move in an awesome direction! Love it.
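To make the vector clock idea above a bit more concrete, here is a minimal sketch in Go. It is not taken from the go-threads codebase: the `Clock` type and the `Diff` function are hypothetical names, and the sketch assumes each log carries a counter that is incremented on every update. It only illustrates how per-log counters could tell which logs are off, in which direction, and by how much.

```go
package main

import (
	"fmt"
	"sort"
)

// Clock is a hypothetical vector clock for a thread: it maps a log ID
// to a logical counter that is incremented on every update of that log.
type Clock map[string]uint64

// Diff compares a remote clock against the local one. For every log whose
// counters differ it reports remote minus local: a positive value means
// the remote side is ahead for that log, a negative value means it is behind.
// Logs with equal counters are omitted from the result.
func Diff(local, remote Clock) map[string]int64 {
	out := make(map[string]int64)
	for id, l := range local {
		// A log missing from remote reads as counter 0.
		if d := int64(remote[id]) - int64(l); d != 0 {
			out[id] = d
		}
	}
	for id, r := range remote {
		// Logs known only to the remote side.
		if _, ok := local[id]; !ok && r != 0 {
			out[id] = int64(r)
		}
	}
	return out
}

func main() {
	local := Clock{"log-a": 3, "log-b": 5}
	remote := Clock{"log-a": 3, "log-b": 7, "log-c": 1}

	diff := Diff(local, remote)

	// Sort the log IDs so the printed order is stable.
	ids := make([]string, 0, len(diff))
	for id := range diff {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	for _, id := range ids {
		fmt.Printf("%s: remote differs by %d\n", id, diff[id])
	}
}
```

An empty result from `Diff` would mean both sides agree on every log, which is the same yes/no answer the edge gives, while a non-empty result additionally says where to sync.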
Hi folks, here is another improvement for GetRecords.
This time we propose a new concept: the thread edge. Basically it's just a deterministic 64-bit hash of all heads in a thread. Using edge values we can easily compare two states of the same thread and decide whether they are equal without fetching all of the thread's information from the logstore. As a first step, this optimization should improve request processing time on the fast path for frequent GetRecords requests.
As a follow-up: introducing thread edges makes it possible to implement fast checks for thread changes without fetching and sending thread information on either of the communicating nodes (a protocol extension is currently in progress).
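The edge computation described above can be sketched roughly as follows. This is an illustration rather than the code from the PR: the `Head` struct, its string fields, and the `ComputeEdge` name are assumptions made for the example. Only the sort by `LogID` and the use of `fnv.New64a()` are visible in the reviewed diff fragment. The zero-byte separators between fields are also an assumption, added so that adjacent fields cannot run together.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// Head is a hypothetical stand-in for a log head: the ID of a log and
// an identifier (for example, a CID string) of its latest record.
type Head struct {
	LogID string
	Head  string
}

// ComputeEdge returns a deterministic 64-bit digest of all heads in a
// thread. Heads are sorted first so that the result does not depend on
// the order in which they were read from the store.
func ComputeEdge(heads []Head) uint64 {
	// Work on a copy to avoid reordering the caller's slice.
	hs := make([]Head, len(heads))
	copy(hs, heads)

	sort.Slice(hs, func(i, j int) bool {
		if hs[i].LogID == hs[j].LogID {
			return hs[i].Head < hs[j].Head
		}
		return hs[i].LogID < hs[j].LogID
	})

	hasher := fnv.New64a()
	for _, h := range hs {
		// Writes to an FNV hash never fail, so the errors are ignored.
		hasher.Write([]byte(h.LogID))
		hasher.Write([]byte{0})
		hasher.Write([]byte(h.Head))
		hasher.Write([]byte{0})
	}
	return hasher.Sum64()
}

func main() {
	local := []Head{{LogID: "log-a", Head: "rec-1"}, {LogID: "log-b", Head: "rec-9"}}
	remote := []Head{{LogID: "log-b", Head: "rec-9"}, {LogID: "log-a", Head: "rec-1"}}

	// Fast path: equal edges mean the heads match, so there is nothing to send.
	if ComputeEdge(local) == ComputeEdge(remote) {
		fmt.Println("thread states match, skipping record exchange")
	} else {
		fmt.Println("thread states differ, falling back to full comparison")
	}
}
```

Since FNV-1a is a non-cryptographic hash, equal edges indicate equal heads only up to a small collision probability, which seems acceptable for a fast-path check between cooperating nodes.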