diffsync should be stateless #2
Is this about something other than https://github.com/janmonschke/diffsync#dataadapter? My thoughts:
My thoughts about your thoughts: 💭 🌀 💭
Oof, I just noticed that I did not implement fetching of client shadow documents asynchronously 😄
Oh no, I'm running into a bigger problem here. Suppose clients can be served by arbitrary nodes, and each node takes care of fetching the correct master document and the correct shadow document for each client. Then, for each sync request, each node has to make up to four DB requests:
These could be reduced to two requests if the shadow documents were embedded inside their master document (which would be easy with a schema-free database). The biggest problem, however, is that the database would have to lock the document from step 1 to step 4 and could release it only afterwards. Otherwise, other nodes could write to the same document in the meantime, which would result in dirty reads and lost data on write. Are my assumptions right? Or am I overlooking a very simple way to scale this to more than one node without a load balancer that routes all clients working on the same document to the same node? @episodeyang How did you handle this problem?
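One way to avoid holding a lock from step 1 to step 4 is optimistic concurrency instead of locking: embed the shadows in the master record, give the record a version number, and only write back if the version has not changed since the read. The sketch below assumes a store with a compare-and-set write; `VersionedStore`, `SyncRecord`, `syncStep` and `applyEdits` are hypothetical names, and `applyEdits` stands in for diffsync's actual diff/patch step. With this layout a conflict-free sync costs exactly two requests (one read, one conditional write); a node that loses the race re-reads and retries instead of overwriting.

```typescript
// Hypothetical record layout: all client shadows embedded in the master document.
interface SyncRecord<T> {
  version: number; // bumped by the store on every successful write
  master: T;
  shadows: Record<string, T>; // one shadow per client id
}

// Deep copy so callers never share references with the "database" (T must be JSON-safe).
function clone<T>(value: T): T {
  return JSON.parse(JSON.stringify(value)) as T;
}

// Hypothetical async store with a compare-and-set write (in-memory stand-in for a real DB).
class VersionedStore<T> {
  private records = new Map<string, SyncRecord<T>>();
  requests = 0; // counts round trips to the store

  async get(id: string): Promise<SyncRecord<T> | undefined> {
    this.requests++;
    const rec = this.records.get(id);
    return rec === undefined ? undefined : clone(rec);
  }

  // Writes only if the stored version still equals expectedVersion (0 = record absent).
  async compareAndSet(id: string, expectedVersion: number, next: SyncRecord<T>): Promise<boolean> {
    this.requests++;
    const current = this.records.get(id);
    const currentVersion = current === undefined ? 0 : current.version;
    if (currentVersion !== expectedVersion) return false;
    this.records.set(id, clone({ ...next, version: expectedVersion + 1 }));
    return true;
  }
}

// One sync step for one client: read master + shadows, apply edits, write back conditionally.
async function syncStep<T>(
  store: VersionedStore<T>,
  docId: string,
  clientId: string,
  applyEdits: (master: T, shadow: T) => { master: T; shadow: T },
  maxRetries = 5,
): Promise<void> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const rec = await store.get(docId); // request 1: master and all shadows at once
    if (rec === undefined) throw new Error(`unknown document ${docId}`);
    const shadow = rec.shadows[clientId];
    if (shadow === undefined) throw new Error(`no shadow for client ${clientId}`);
    const result = applyEdits(rec.master, shadow);
    const next: SyncRecord<T> = {
      version: rec.version, // replaced by the store on a successful write
      master: result.master,
      shadows: { ...rec.shadows, [clientId]: result.shadow },
    };
    if (await store.compareAndSet(docId, rec.version, next)) return; // request 2
    // Another node wrote in between: re-read and retry instead of overwriting its changes.
  }
  throw new Error("too many concurrent writers");
}
```

The trade-off versus a lock is that contended documents cost extra round trips on retry, but no node ever blocks others while it computes diffs.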
Redis seems like a good fit for storing the master and shadow documents, perhaps paired with node-redlock for locking. Another option is to have the servers share the objects directly with each other; Neil Fraser seemed more intrigued by this idea in his talk. @janmonschke I'm interested in working on this problem, so please let me know your thoughts on these two options.
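For the Redis option, the per-document lock could follow the common single-instance Redis pattern: acquire with "set if absent, with expiry" (Redis `SET key value NX PX ttl`) and release with "delete only if the value is still my token". The sketch below simulates those two primitives in memory; `LeaseStore` and `withDocumentLock` are hypothetical names, this is not node-redlock's API, and it leaves out Redlock's quorum over multiple Redis instances. The expiry means a crashed node cannot hold a document forever, and the token check stops a slow node from releasing a lease that already expired and was taken by someone else.

```typescript
// In-memory stand-in for a key-value store offering the two primitives a Redis lock relies on.
class LeaseStore {
  private entries = new Map<string, { value: string; expiresAt: number }>();
  private now: () => number;

  constructor(now: () => number = () => Date.now()) {
    this.now = now; // injectable clock, handy for tests
  }

  // Like SET key value NX PX ttlMs: succeeds only if no unexpired entry exists.
  setIfAbsent(key: string, value: string, ttlMs: number): boolean {
    const entry = this.entries.get(key);
    if (entry !== undefined && entry.expiresAt > this.now()) return false; // still held
    this.entries.set(key, { value, expiresAt: this.now() + ttlMs });
    return true;
  }

  // Deletes only if the stored value matches, so a node cannot release someone else's lease.
  deleteIfEquals(key: string, value: string): boolean {
    const entry = this.entries.get(key);
    if (entry === undefined || entry.value !== value) return false;
    this.entries.delete(key);
    return true;
  }
}

let tokenCounter = 0;
function newToken(): string {
  tokenCounter++;
  return `${Date.now()}-${tokenCounter}-${Math.random()}`;
}

// Runs body while holding the lock for docId; fails fast if another node holds it.
async function withDocumentLock<R>(
  store: LeaseStore,
  docId: string,
  ttlMs: number,
  body: () => Promise<R>,
): Promise<R> {
  const key = `lock:${docId}`;
  const token = newToken();
  if (!store.setIfAbsent(key, token, ttlMs)) {
    throw new Error(`document ${docId} is locked by another node`);
  }
  try {
    return await body(); // e.g. read master + shadow, diff, patch, write back
  } finally {
    store.deleteIfEquals(key, token);
  }
}
```

Note that the TTL must comfortably exceed the time a full sync cycle takes; otherwise the lease can expire mid-cycle and the dirty-read problem returns.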
At the moment, diffsync keeps all user documents needed for the sync cycle in memory. This has two implications:
In my opinion, the users' documents should be kept outside the diffsync node, e.g. in a Redis instance.
Luckily, diffsync already reads data internally through an asynchronous interface, so the code changes should be fairly small. Testing should not be too hard either.
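To make the idea concrete, here is a minimal sketch of a sync node that keeps no documents of its own and goes through a promise-based adapter for every read and write. The interface and method names (`DocumentAdapter`, `getData`, `storeData`, `StatelessSyncNode`) are assumptions for illustration and may not match diffsync's actual data adapter; the `Map`-backed adapter stands in for an external store such as Redis. Because both nodes share only the adapter, either one can serve any client. The sketch ignores concurrent writes to the same document, which is exactly the locking question raised in the comments.

```typescript
// Hypothetical promise-based adapter; a Redis-backed version would implement the same interface.
interface DocumentAdapter<T> {
  getData(id: string): Promise<T | undefined>;
  storeData(id: string, data: T): Promise<void>;
}

// In-memory stand-in for an external store, shared by every node in this sketch.
class SharedMapAdapter<T> implements DocumentAdapter<T> {
  private readonly data = new Map<string, string>(); // serialized copies, as a remote store would hold

  async getData(id: string): Promise<T | undefined> {
    const raw = this.data.get(id);
    return raw === undefined ? undefined : (JSON.parse(raw) as T);
  }

  async storeData(id: string, data: T): Promise<void> {
    this.data.set(id, JSON.stringify(data));
  }
}

// A sync node without per-document state: everything it needs comes from the adapter.
class StatelessSyncNode<T> {
  private readonly adapter: DocumentAdapter<T>;

  constructor(adapter: DocumentAdapter<T>) {
    this.adapter = adapter;
  }

  // Reads the document, applies a client's edit, and persists the result.
  async handle(docId: string, edit: (doc: T) => T, initial: T): Promise<T> {
    const current = (await this.adapter.getData(docId)) ?? initial;
    const next = edit(current);
    await this.adapter.storeData(docId, next);
    return next;
  }
}
```

Swapping `SharedMapAdapter` for a networked store adds one round trip per read and per write, which is where the extra sync time would come from.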
The main consequence for users would be a longer sync time, which depends on the type of data store and on how the application's parts (the node, the intermediate data and the permanent data) are distributed. I think this overhead is acceptable in exchange for getting rid of the extra state.
What do you think?