Thread edge proposal #477

dgtony · 2020-12-24T11:50:12Z

Hi folks, here is another improvement for GetRecords.

This time we propose a new concept: thread edge. Basically it's just a 64-bit deterministic hash of all heads in a thread. Using the edge values we can easily compare two states of the same thread and decide if they are equal without fetching all thread's information from the logstore. Such optimization at first should improve request processing time on a fast path for the frequent GetRecords requests.
As a follow-up: introducing thread edges enables implementing fast checks for thread changes without getting and sending thread information on both communicating nodes (protocol extension in progress for now).

Signed-off-by: Anton Dort-Golts <dortgolts@gmail.com>

sanderpick · 2020-12-31T20:03:35Z

Hey @dgtony, this is great! I wonder, could we also include log addresses in the hash? Updated peer addresses should also be considered info worth sharing even if the head hasn't changed.

dgtony · 2021-01-19T15:04:24Z

Hi, Sander, sorry for the late reply. Hope to come up with solution in a couple of days.

dgtony · 2021-02-02T16:53:27Z

Finally I've got a chance to get back to the PR )

Thanks for your corrective, adding peer addresses to the edge computation is totally reasonable for distinguishing thread states in general. But I'm not sure its worth it in this particular PR, so let me explain.

Current PR was intended as a first step of optimization of ThreadsDB-nodes communication. It makes processing of GetRecords easier for high-degree nodes, but the main idea here is compressing thread's state information into tiny hashes. So the real contribution is ThreadEdge method (btw, it seems to be better renamed into HeadsEdge).
And the next logical step should be extending internal ThreadsDB interface with a new method (lets say GetThreadEdge), enabling very lightweight difference checks between local and remote states of the thread. And at this point we can make a good use of additional information like peer addresses as you suggested. Message GetThreadEdgeResponse will contain pair of such hashes: current state of the thread's heads provided by HeadsEdge method and similar hash of all peer addresses associated with the given thread's logs. The latter will be fetched with unimplemented yet method AddrsEdge.
Having this two hashes in the response, node can compare it with its local values and decide if any of full GetRecords or GetLogs calls are neccessary in order to ensure local state is up-to-date.

Regarding this PR, it looks like this small improvement preserves current GetRecords behaviour. At the moment if some logID was included into request you'll get only new records for it and no log information (and peer address updates) in the response. Otherwise fresh LogInfo will be returned in the response whenever you omit said logID in the request, but currently it may happen only if the log was previously unknown to the requester. But in this case local and remote edge values will be different and request processing will move along the slow path as usual.
So there are two options I can see. Rename ThreadEdge -> HeadsEdge, implement AddrsEdge and PR could be merged in order to make the next one (with GetThreadEdge) smaller. Or we can close current PR and I'll use this branch as a ground for the entire feature, but future merge could be pretty massive.
@sanderpick What do you think?

sanderpick · 2021-02-09T16:27:58Z

Hey @dgtony, thanks for the summary. I agree doing this change stepwise makes sense. So, first the renaming (ThreadEdge -> HeadsEdge), then deal with addresses.

BTW, have you experimented with testground? We're thinking of spinning that up soon to get a better idea of network performance and bottlenecks.

Signed-off-by: Anton Dort-Golts <dortgolts@gmail.com>

dgtony · 2021-02-10T13:48:34Z

Ok, so current PR now contains HeadsEdge implementation and I'll submit AddrsEdge in a separate one.
As for Testground it seems to be very promising for identifying network/performance/consistency issues. It'd be great to make use of it and we've been looking at it during the last couple of months, but unfortunately have no hands-on experience yet.

sanderpick

@carsonfarmer may be interested in taking a look at this. He mentioned this idea sounds something like a vector clock. Maybe there are some ideas from that realm that can also be applied here. Or maybe we can call the edges clocks?

In any case, I'll merge this now and we can follow up with additional ideas.

sanderpick · 2021-02-24T16:19:35Z

util/util.go

+		}
+		return hs[i].LogID < hs[j].LogID
+	})
+	hasher := fnv.New64a()


Nice use of FNV 💪

carsonfarmer · 2021-02-24T17:08:47Z

Yeh this is neat. Almost like a vector clock, except more like a vector clock "summary" or digest.

To really take advantage of a vector clock approach, we'd need to "track" more than just the head of the logs here. We'd also have to track an update logical clock/integer, that would get incremented on updates. It'd only be a few extra bytes, but the benefit with that approach is it'd tell us which logs were off, whether they were behind (or ahead), and by how far off they are. This would make it possible to do really fine-grained syncing. The vector clock approach is something I've been thinking about for a while, and I just need to write some stuff down. At the end of the day, a thread really is just a vector clock of logs. Almost a perfect fit. But it also has me thinking about the approach outlined here. Its a super cool way to create a fixed-length digest of a vector clock as well. This is cool because you get the vector clock on the one hand, and you get a digest to use as a sorting mechanism.

The vector clock thing is worth thinking seriously about and writing up a new proposal discussion for implementing it. But of course, it would require some non-trivial code changes, whereas the approach in this PR answers the question are they off, yes or no pretty quickly and easily. So in the mean/shorter term, this is a super clever move in an awesome direction! Love it.

dgtony added 8 commits December 15, 2020 19:20

Impl deterministic thread edge computation

08b3885

Signed-off-by: Anton Dort-Golts <dortgolts@gmail.com>

Add thread edge computation to persistent headbook

2a090b4

Signed-off-by: Anton Dort-Golts <dortgolts@gmail.com>

Return error if thread doesn't exist

13262b6

Signed-off-by: Anton Dort-Golts <dortgolts@gmail.com>

Add thread edge computation to in-memory headbook

68687ff

Signed-off-by: Anton Dort-Golts <dortgolts@gmail.com>

Extend headbook interface

77e3935

Signed-off-by: Anton Dort-Golts <dortgolts@gmail.com>

Add test for thread edge computation

7437cbb

Signed-off-by: Anton Dort-Golts <dortgolts@gmail.com>

Impl fast path for GetRecords handler

128d55c

Signed-off-by: Anton Dort-Golts <dortgolts@gmail.com>

Fix timings in tests

1e2a5d9

Signed-off-by: Anton Dort-Golts <dortgolts@gmail.com>

dgtony added 2 commits February 10, 2021 16:23

Rename method

eca1db2

Signed-off-by: Anton Dort-Golts <dortgolts@gmail.com>

Change key pattern for heads edges to avoid erroneous traverse

770cdcf

Signed-off-by: Anton Dort-Golts <dortgolts@gmail.com>

sanderpick approved these changes Feb 24, 2021

View reviewed changes

util/util.go

}

return hs[i].LogID < hs[j].LogID

})

hasher := fnv.New64a()

Copy link

Member

sanderpick Feb 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice use of FNV 💪

sanderpick merged commit 7a964ca into textileio:master Feb 24, 2021

dgtony deleted the improvement/get-records-handler branch February 25, 2021 11:33

dgtony mentioned this pull request Mar 8, 2021

Exchange edges #495

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Thread edge proposal #477

Thread edge proposal #477

dgtony commented Dec 24, 2020

sanderpick commented Dec 31, 2020

dgtony commented Jan 19, 2021

dgtony commented Feb 2, 2021

sanderpick commented Feb 9, 2021

dgtony commented Feb 10, 2021

sanderpick left a comment

sanderpick Feb 24, 2021

carsonfarmer commented Feb 24, 2021

Thread edge proposal #477

Thread edge proposal #477

Conversation

dgtony commented Dec 24, 2020

sanderpick commented Dec 31, 2020

dgtony commented Jan 19, 2021

dgtony commented Feb 2, 2021

sanderpick commented Feb 9, 2021

dgtony commented Feb 10, 2021

sanderpick left a comment

Choose a reason for hiding this comment

sanderpick Feb 24, 2021

Choose a reason for hiding this comment

carsonfarmer commented Feb 24, 2021