Proposal for go-threads improvements #547
Comments
This is very useful, thanks for sharing the details.
Yep, sounds useful. On a side note, we have been toying with the idea of moving ThreadDB out of the repo and creating a better interface for "plugins". How is your app layer tied into the core thread layer?
Makes sense. IIRC, we landed on the invariant so that any peer can fully validate a log.
👍
👍 Something to consider when thinking about a common interface to the
👍 These all sound related to snapshotting
Makes sense!
💯
👍 Related to the snapshot questions above: if the user controls snapshotting, it sounds like each peer could get into a state where their snapshots are different / overlapping. Maybe that's fine, but it does add complexity when considering pagination. Snapshots at predictable intervals (based on the new counters) might make things simpler.
👍
💯
👍
So this is like replaying the records? Could this be combined with follow mode with a "since" param? Continuing with the analogy: This all sounds really good! Full support from our side.
Hello guys!
This is a draft proposal for changing the current mechanics of go-threads. Before going into the details, we want to summarise the current state of go-threads sync and the motivation behind the proposed changes.
Current state of go-threads sync
Go-threads has multiple logs for each thread. Each log is a single-writer log, meaning only one peer can write to it, so we keep a counter for each log that tells us how many records it has. If a log has a head, we maintain the invariant that we have every record prior to that head.
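To make the invariant concrete, here is a minimal sketch of why it makes checks cheap and why records that leave a gap get rejected. `logState`, `hasRecord` and `canAppend` are hypothetical names, not go-threads APIs, and counters are assumed to start at 0.

```go
package main

import "fmt"

// logState is a hypothetical, simplified view of one single-writer log.
// Counters are assumed to start at 0; head is -1 for an empty log.
// Only the head counter is tracked: under the current invariant, every
// record with a counter <= head is guaranteed to be stored locally.
type logState struct {
	head int64
}

// hasRecord answers "do we hold the record with this counter?" from the
// head counter alone. This shortcut is only valid while the invariant holds.
func (l *logState) hasRecord(counter int64) bool {
	return counter >= 0 && counter <= l.head
}

// canAppend models the gap check: a batch of new counters (oldest first) is
// accepted only if it continues directly from the current head without holes.
// Anything else would break the invariant, so the batch has to be dropped.
func (l *logState) canAppend(batch []int64) bool {
	next := l.head + 1
	for _, c := range batch {
		if c != next {
			return false
		}
		next++
	}
	return true
}

func main() {
	l := &logState{head: 150}
	fmt.Println(l.hasRecord(42))                // implied by the invariant
	fmt.Println(l.canAppend([]int64{151, 152})) // extends the head
	fmt.Println(l.canAppend([]int64{300, 301})) // would leave a gap
}
```

The cost of this simplicity is that a peer can never hold a record without also holding everything before it.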
Many checks currently rely on this invariant, mainly the `GetRecords` check (if the log has the head counter, it has all records before it) and the `putRecords` check (which also looks at the counter to decide whether a record, and certain records before it, need to be added).
Threads are synced either through `pullThread`/`pushLog`, where we explicitly request the records from all known peers, or through `exchangeEdges`, where we exchange hashes of heads with all our peers to see whether they have more information and we need to call `getRecords`.
How we use go-threads in Anytype
It is best to start with how we use go-threads in our app.
Each thread represents a document, and each document consists of changes. Each change corresponds to a record, with the only difference that it can link to other changes in different logs. Sometimes we capture the current state of the document (e.g. all changes in the document) in a single record called a snapshot. A snapshot-based approach may be useful for other apps as well; for example, ThreadsDB could also benefit from it in the case of a huge DB.
Among other things, the snapshot stores references to the heads of the logs it had “seen” at the time of its creation.
To build the document, we start from the current heads of the logs and work back until we reach the common snapshot. The gist is that we don't need all the records to build the document: we only need the records after the common snapshot of all the changes, and none of the records before it.
We also listen for any records added to the thread, so that we can rebuild the document over time.
What problems we have with the current implementation
a. Our databases only grow in size and are too large
This becomes a problem, especially on mobile devices, once threads are shared by many users and hold more data.
Because the logs only grow and we maintain the invariant that all records before the head are always present, we can't get rid of records even when we don't need them (see the snapshot explanation above).
b. The synchronisation speed can be improved
Depending on the size of the thread, we fetch data through bitswap (see the `putRecords` implementation), and we also fetch some unnecessary records (see a. above).
c. Inconsistent subscribing
We can miss records, because go-threads starts processing records as soon as we create the `app.Net` object, and our app may not be ready for that yet.
d. Pulling records and threads from a cold start takes too long
Mostly because, again, we get a lot of unnecessary records and we can't control which records we download. Everything is decided by go-threads under the hood.
e. No garbage collecting
There is no way for us to get rid of unneeded records or mark them as such.
f. No way to prioritise what go-threads is downloading at the moment
Again everything is decided by go-threads under the hood and there is no way to control it.
The changes we propose
In general, we want synchronisation to be configurable by the client via some strategy (either a config or a component that determines the strategy). That will make go-threads more "dumb" and give the client full control.
Of course, we want the changes to be backwards compatible, so by default the strategy will behave the same way as before.
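One possible shape for such a strategy is sketched below. `SyncStrategy`, its methods and both implementations are hypothetical; they are not part of the go-threads API, only an illustration of a pluggable component whose default keeps today's behaviour.

```go
package main

import "fmt"

// SyncStrategy is a hypothetical hook (not an existing go-threads API)
// through which the client decides what go-threads should download.
type SyncStrategy interface {
	// ShouldFetchAfterHeadUpdate reports whether new records should be
	// pulled when a peer advertises a newer head for the given thread.
	ShouldFetchAfterHeadUpdate(threadID string) bool
	// RequireFullHistory reports whether every record before the head
	// must be kept (the current invariant).
	RequireFullHistory(threadID string) bool
}

// defaultStrategy reproduces today's behaviour: always fetch, keep everything.
type defaultStrategy struct{}

func (defaultStrategy) ShouldFetchAfterHeadUpdate(string) bool { return true }
func (defaultStrategy) RequireFullHistory(string) bool         { return true }

// followOnlyStrategy fetches records only for threads the app follows and
// does not require full history, leaving older records to be paged in.
type followOnlyStrategy struct {
	followed map[string]bool
}

func (s followOnlyStrategy) ShouldFetchAfterHeadUpdate(id string) bool { return s.followed[id] }
func (followOnlyStrategy) RequireFullHistory(string) bool              { return false }

func main() {
	strategies := []SyncStrategy{
		defaultStrategy{},
		followOnlyStrategy{followed: map[string]bool{"doc-1": true}},
	}
	for _, s := range strategies {
		fmt.Println(s.ShouldFetchAfterHeadUpdate("doc-2"), s.RequireFullHistory("doc-2"))
	}
}
```

Keeping the default implementation behaviour-identical is what would let existing users upgrade without noticing the change.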
a. Remove the invariant that we have all the records before head
We will still have heads and their counters synchronised across devices, but we will not guarantee that we have everything before them. That will enable us to “garbage collect” all the records we don’t need to build our documents.
An open question is whether, for convenience, we should maintain a list of ranges of downloaded records, something like:
{(hash A, counter 0), (hash B, counter 150)}, {(hash C, counter 390), (hash D, counter 1000)}...
This would let us check whether we have a record with a given counter just by searching this list.
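If the ranges are kept sorted and non-overlapping, the lookup can be a binary search. This is a minimal sketch using the example ranges above; `counterRange` and `rangeList` are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// counterRange is one contiguous run of downloaded records in a single log,
// identified by the hashes and counters of its first and last records.
type counterRange struct {
	FromHash    string
	FromCounter uint64
	ToHash      string
	ToCounter   uint64
}

// rangeList is assumed to be sorted by FromCounter with non-overlapping
// ranges, so a lookup is a binary search rather than a scan.
type rangeList []counterRange

// contains reports whether the record with the given counter is stored locally.
func (rl rangeList) contains(counter uint64) bool {
	// Index of the first range that starts after counter.
	i := sort.Search(len(rl), func(i int) bool { return rl[i].FromCounter > counter })
	if i == 0 {
		return false // counter lies before the first range
	}
	return counter <= rl[i-1].ToCounter
}

func main() {
	rl := rangeList{
		{"A", 0, "B", 150},
		{"C", 390, "D", 1000},
	}
	for _, c := range []uint64{0, 150, 200, 390, 1000, 1001} {
		fmt.Println(c, rl.contains(c))
	}
}
```

Counter 200 falls into the gap between the two ranges, so it is reported as missing even though it is below the head.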
b. Introduce on-demand thread following
Drawing an analogy with `tail -f`, the user can say that they want to follow a certain thread, and only then will go-threads try to synchronise the records that come after the current head, but not those before it.
c. Introduce pagination
Often we just need to get N records below a specific hash/counter, which can be the head or any other record. At the same time, we don’t want go-threads to download the whole log (because we don’t need it).
Go-threads currently lacks such an API. For example, in `GetRecords` you only provide the offset (end point), but loading always starts from the head of the server’s log; you cannot provide another starting point.
So essentially we want to control how many records go-threads downloads and from which offset. Right now we cannot, because received records are thrown away if we don’t fill the gap between our current head and the oldest received record. This topic is closely related to removing the invariant that we must have all the records before the head.
d. Change `exchangeEdges` so that it only syncs heads
It will not try to get all the records unless we are in follow mode.
e. Subscribe from a particular record/counter
We want to be able to get all the records starting from a given record or counter, so that no matter when we start subscribing, we still receive every record we need.