-
Notifications
You must be signed in to change notification settings - Fork 535
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Shared String: Obliterate op #8690
Comments
annotate seems like it would be the solution here |
Are you saying we could annotate the delete op to indicate that it should have the obliterate semantics? Or something more complicated? |
No. the data they want to obliterate should be stored as annotations, either on the text or in a marker. annotates work like maps, last write wins, which seems like the behavior they want. |
I see. |
I was thinking about this a bit yesterday, and a possible solution is something like below. i'm not wholly convinced it will work right, so it needs more vetting.
i also don't have a clear picture of how we handle properties for this marker on atomic update, do they coalesce. combine, or overwrite? this is also true for pure annotates, how do they mix with atomic updates. also, what happens if there is not marker with the id on atomic update, my hunch is it should be a no-op |
fyi @DLehenbauer and @jack-williams |
Just to understand the original ask. The goal is to really have something like a range replace:
That removes range from The obliterate op being proposed is a special type of delete that additionally removes segments in the delete range that were concurrently inserted, as normal delete will not delete other client's content. Is that right? So to achieve the replace you would do something like:
The replace would need to be sent as a single op and interpreted locally as obliterate + insert to avoid interleaving. I think that could work, but beyond this case I don't know how useful obliterate is as a new primitive. (Haven't thought about it much). @anthony-murphy's marker approach, which if I understand correctly is effectively adding "tokens" to the merge tree, that cannot be split, have identity, and can be updated using a variety of resolutions. That seems quite versatile and powerful to me, and would be worth looking into IMO. I'm not very familiar with markers. Are ID's tracked globally to prevent concurrent insert of markers with the same ID? Is the reason for giving markers arbitrary length only to make it easier to map positions when the sequence is rendered as something like a string. For instance. Could it be possible to use a 0 or 1 length marker, but understand that positions after the marker in the string must be adjusted. For instance:
Visual updates to position 12 get adjusted by -6 (to account more marker content) before being applied to the shared sequence. I'm guessing that if this is added to shared string, having the segments reflect the rendered string is desirable, hence giving the marker the length of the content avoids this translation. If markers have arbitrary length, what happens for inserts or deletes partially overlapping. Is this handled by the segment split logic returning
Yeah agreed. |
Great description @jack-williams. Let me answers a couple of the questions
Yes, at least that is how they are indented to work. They are currently not well used, so there could be existing bugs
Today markers have a default and fixed length of 1, no segment can have a length of 0 (text or marker) that will be rejected by the merge tree on insert. The reason to allow a configurable length is exactly as you laid out, to allow easier mapping between shared string cp space and whatever the application model is.
Today markers cannot be split, there is code to enforce this, but it is weak. This has not been a problem as all markers have length 1, so there is nothing to split. A change like this would introduce complexity. We have a similar problem with annotates. For local operations i would just reject it the specified range didn't cover the whole marker width. Remote is much more complicated. I don't know what the right behavior for remote operations is but whatever we do will need to be:
My hunch is we'd go with expansion to cover the full range. Currently, if you annotate a segment that some else deletes, your annotate basically becomes a no-op. The same thing probably happens with atomic update in this case in that the segment is gone, so your update is a no-op. |
One solution @anthony-murphy and I were kicking around involves inserting markers with length=1 at the beginning and end of the obliterated range. It seems like that would let us cordon off an obliterated subtree in a way that would withstand splitting of segments, etc. That in turn could help with conflict resolution. Keep length=1 also avoids all the complexity around making sure that markers themselves can't be split. It does seem like it would take an op to remove the markers. (Which brings up the question: when, if ever, could the markers be removed? From the partner ask, it seems like the obliterate command continues to prevent changes to the obliterated region after the obliterate's own sequence number. So it seems like we would keep the markers in place until we received an "end obliterate" command from the user. The API should probably be clarified with the partner.) The fact that the markers have non-zero length also means that the app model would have to take them into account. In side discussions, @vladsud has mentioned that "virtualization of cp space" would likely be required. |
If the solution inevitably involves having to map between the clients view and the merge tree, is it also worth considering the approach that uses a single length 1 marker/token with the content stored in an annotation? |
Paul - I'm not sure I follow your last proposal. Tony's proposal essentially is about implementing obliterate for markers. It's simpler than doing same for arbitrary text range (as it has a lot of cases that needs to be taking into account, which makes design really complex). But can we avoid adding obliterate semantics? There is a complexity of updating "user" properties (actual text) and marker length through one op (such that application can maintain consistency here - length of text in one of the properties to be equal to marker's segment size). Have not thought much about it, but I hope we can make it happen. I also think we do not need to have some unique ID for marker. Or at least I do not see (outside above problem) how handling of such marker and its properties today differs from what we need, it seems like app would have everything it needs to be successful without adding more constructs. And there should be no issues with splitting such marker, as there is no API to do so (same as today). One can insert delete markers or update marker properties, but that's about it. Even better if it was a separate layer (not baked into Sequence) that allowed CP virtualization (i.e., only this layer would know about such property as being special). While it may be complex to build, once we have a skeleton, it would be super easy to reuse it in other places where app would have different needs RE virtualization. |
The issue with Markers is that any property will have last write win semantics. however, the partner still wants collaborative edit in the range they wish to obliterate. Their specific scenario is that they store comments inline with the text, this has some nice features around locality. They want comments to be collaboratively editable, both individually, and threads. however, if the whole comment thread is deleted, they want to obliterate the whole range of the comment thread, and throw any incoming changes away. |
If you add virtualization over the sequence, is "obliterate" possible to emulate by toggling the markers as "off" and having the rendering logic skipping all content (regardless of tombstone status) between "off" pairs of markers. This leaves the issue of clearing up concurrently inserted content that does not get rendered by the view, and also deleting the markers. If there is a process to delete text between "off" markers, then eventually will we reach a point where zamboni GC's the segments between the markers and we get adjacent "off" markers? At this point could they be deleted themselves? |
@anthony-murphy - hm, I was assuming this is about fields in Word, and this issue starts with description of fields. Curious, is this new requirement? Would be great to capture in the top comment of this issue. |
yeah. when we talked to them they brought up a bunch of other cases, like comments that don't fit the field model |
Commenting scenario seems like (logically) can be achieved as marker with a handle pointing to another Sequence instance. Not sure RE requirement of CP space / virtualization. I wonder if we are better off investing into cheaper "child" sub-Sequences in some form, also helping other scenarios (like tables). |
Internal docwith a bit more info. |
This issue has been automatically marked as stale because it has had no activity for 180 days. It will be closed if no further activity occurs within 8 days of this comment. Thank you for your contributions to Fluid Framework! |
Feature
Is your feature request related to a problem? Please describe.
Describe the solution you'd like
Existing work
Describe alternatives you've considered
Additional context
Scenario
Here's the original ask from a partner:
We recalculate field results and place those results in the SharedString. If multiple endpoints attempt to update the field at the same time, the deletes will coalesce, but the inserts won’t, leading to duplicated (though not necessarily identical) content in the document.
E.g:
<DateFieldCode | 1/2/2012>
Could become:
<DateFieldCode | 1/2/20121/2/2012>
Proposal: The Obliterate Op
Normally when we update the field code, we’ll end up performing a delete over the range between the field separator character “|” and the field end character “>” followed by an insert at the same location. Instead, we need to delete existing text in this range, and also any text that our client hasn’t yet received that would end up in that range.
When the local client has a locally applied obliterate, any inserts that come within the obliterate range or adjacent to the obliterate range would be immediately marked as deleted (but no delete ops created).
E.g.
Shared string:
“1234obliterated content567890” @ Sequence #1
Local client creates: Obliterate [4, 23), Ref Seq#1
Result: “123467890”
Incoming op:
Insert “!” @ 4 Ref Seq #1 (client1)
Result: “123467890”
Incoming op:
Insert “*” @ 24 Ref Seq#1 (client1)
Result: “123467890”
Incoming op:
Insert “^” @ 16 Ref Seq#1 (client1)
Result: “123467890”
Incoming op:
Insert “!” @ 4 Ref Seq #1 (client1)
Result: “123467890”
When applying the obliterate, the obliterate would similarly expand to include any adjacent text which was unknown to the sender. That is, the obliterate will expand to include any adjacent ranges with a sequence # lower than the reference sequence number of the obliterate op.
Example:
Shared string:
“1234obliterated content567890” @ Sequence #1
“1234!!obliterated content67890” @ Sequence #5
Incoming op:
Obliterate [4, 23), Ref Seq #1 (client2)
Result: “123467890”
AB#158
The text was updated successfully, but these errors were encountered: