
Set "Remove" operation removes all tags repeatedly #38

Closed
jsimnz opened this issue Apr 28, 2020 · 6 comments
Labels
kind/enhancement: A net-new feature or improvement to an existing feature
need/analysis: Needs further analysis before proceeding
need/maintainers-input: Needs input from the current maintainer(s)
P1: High: Likely tackled by core team if no one steps up

Comments

@jsimnz
Contributor

jsimnz commented Apr 28, 2020

Currently, calling set.Rmv(key) runs an element prefix query for the given key and tombstones every result it gets back. This means a full element removal happens every time a Rmv() operation runs, which leaves redundant tombstones in the Delta. Shouldn't it tombstone only the elements that haven't yet been tombstoned?

Example.

set.Add("key", 1)     #produces delta with ID: Qa
set.Add("key", 2)     #produces delta with ID: Qb
set.Rmv("key")        # produces delta tombstoning Qa, and Qb
set.Add("key", 3)     # produces delta with ID: Qc
set.Rmv("key")        # produces delta tombstoning Qa, Qb, and Qc

(Assume a Merge operation is called between each action.)
In the example above, Qa and Qb were tombstoned twice (once for each Rmv() call). The problem only gets worse with several hundred updates to a key interleaved with removals: since the entire BlockID is included in a tombstone Delta, this inflates the storage requirements roughly quadratically with the number of add/remove cycles.

Given that each Merge call updates the local state key-value store, the record of which keys have been tombstoned is already available. In the case where we are syncing from a remote store, we already sync the Merkle Clock DAG and its deltas as well, so again we have a consistent view of which elements have been tombstoned and when. In either case, don't we already have the information needed to add tombstones only for elements that haven't actually been tombstoned yet?
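As a rough illustration of what I mean (a minimal in-memory sketch; elements and tombstoned are hypothetical stand-ins for the element prefix query and the tombstone records in the local store, not the actual API):

package main

import "fmt"

// tombstoned stands in for the tombstone records kept in the local state store.
var tombstoned = map[string]bool{}

// elements stands in for the element prefix query: set key -> element IDs added under it.
var elements = map[string][]string{
	"key": {"Qa", "Qb", "Qc"},
}

// rmv builds the list of element IDs to tombstone for key, skipping anything
// the local state already records as tombstoned.
func rmv(key string) []string {
	var delta []string
	for _, id := range elements[key] {
		if tombstoned[id] {
			continue // already tombstoned; no need to repeat it in the delta
		}
		delta = append(delta, id)
		tombstoned[id] = true // in the real code this would happen on Merge
	}
	return delta
}

func main() {
	fmt.Println(rmv("key")) // [Qa Qb Qc]
	fmt.Println(rmv("key")) // [] - nothing left to tombstone
}

With this, the second Rmv produces an empty delta instead of re-tombstoning Qa, Qb, and Qc.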

Is this an intended design for the Merkle CRDT semantics of an ORSet, or is it just taken verbatim from the original design in the delta-state paper?

@jsimnz jsimnz added the need/triage label Apr 28, 2020
@jsimnz jsimnz changed the title from Set "Remove" operation removes all tags event event to Set "Remove" operation removes all tags repeatedly Apr 28, 2020
@hsanjuan hsanjuan added the kind/enhancement, need/analysis, need/maintainers-input, and P1 labels and removed the need/triage label Apr 29, 2020
@hsanjuan
Collaborator

Thanks for looking into this! I think your intuitions are correct and this is an optimization we can probably do.

I want to sleep on it, but my thinking right now is that the order imposed by the DAG allows us to do this without consequences, even when multiple branches are present and are being synced in parallel (something that does not happen now but that I would like to address some day).
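My intuition: since tombstones only ever accumulate and merging them is a union, a duplicate tombstone is harmless for correctness, so skipping it should only affect delta size, not convergence.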

As mentioned, this should improve delta sizes in scenarios where a key has a large number of tombstones because it is being added and removed repeatedly.

@jsimnz
Contributor Author

jsimnz commented Apr 29, 2020

Thanks for the quick feedback. I'll dig into this more; I just wanted to make sure I wasn't missing something. From what I can tell (and obviously, since I created the issue), we should be able to do away with the duplicate tombstoning without issue. The only concern would be #23, but that is independent of this and more of a global issue across the entire design.

I'll follow up after I do some tests with this optimization.

@hsanjuan
Collaborator

Yes, I was thinking of #23 too, and had the same thought.

@hsanjuan
Collaborator

@jsimnz do you have a patch you want to submit for this?

@jsimnz
Contributor Author

jsimnz commented May 16, 2020

I do! I've been caught up with some other work, but I can open a PR.

@hsanjuan
Collaborator

Fixed in #48. Thanks very much for the contribution!
